Plotting LFS Standard Errors
This script shows how to retrieve several LFS series using the Mountainmath CANSIM retrieval package and how to plot the monthly change in total employment with its associated standard errors. This plot is paired with a plot of the underlying trend-cycle estimates of the level of total employment. The trend-cycle estimate is probably the most appropriate one to use for planning purposes.
In this example the key packages are:
- dplyr – to process the retrieved data – part of tidyverse
- ggplot2 – to do the plots – part of tidyverse
- cansim – to retrieve the series from NDM
- xts – to manage time series
- tbl2xts – to convert the retrieved tibble to an xts data frame
- lubridate – to manipulate dates – part of tidyverse
The first stage is to set a working directory, load the initial required packages, define the list of Labour Force Survey (LFS) vectors required, and retrieve and print the metadata.
#This vignette retrieves vectors from the LFS
#some routines make use of a cache which can be usefully set in options
#the series are defined, the date is adjusted to suit the conventions of the xts time series package
#define the plot window in terms of periodicity units
#analyze meta data
The get_cansim_vector_info function retrieves the title information and the table number for each vector. The get_cansim_cube_metadata function retrieves the metadata for the table(s). In this case, there is only one table. The dplyr left_join function is used to link the series and table metadata. The select function is used to reduce the resulting tibble to just the vector, the table number (NDM productID), and the series and table titles. The common_meta tibble is coerced to a data frame for printing so that all fields are shown in the log file. This metadata provides the documentation for the run.
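The join-and-select step can be sketched on small stand-in tibbles. The column names and values below are hypothetical placeholders, not the actual fields returned by the cansim functions, so treat this as an illustration of the pattern only.

```r
library(dplyr)

# Hypothetical stand-ins for the vector and table metadata;
# the real column names returned by cansim may differ.
vector_info <- tibble(
  vector = c("v0000001", "v0000002"),
  table = c("00-00-0000", "00-00-0000"),
  series_title = c("Employment", "Employment trend-cycle")
)
table_info <- tibble(
  table = "00-00-0000",
  table_title = "Placeholder table title",
  other_field = "not needed in the log"
)

# Link series and table metadata, then keep only the documentation fields.
common_meta <- vector_info %>%
  left_join(table_info, by = "table") %>%
  select(vector, table, series_title, table_title)

# Coerce to a plain data frame so every field is printed in full.
print(as.data.frame(common_meta))
```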
The next code retrieves the data series using get_cansim_vector which returns a tibble with all the series. This tibble is a tall data structure with one row per data point. The NDM REF_DATE field is returned as a character date by the get_cansim_vector routine but is converted to an R date variable for use in time-related calculations.
canada_lfs_xts <- get_cansim_vector(canada_lfs_vbls,start_time=as.Date("1995-01-01")) %>%
#use index function or dates to select rows in xts data frame.
The dplyr mutate function is used to create the vector of R dates required by the tbl_xts function from the tbl2xts package. The tbl_xts function creates an xts data frame with one column per unique vector, filled with the data from the VALUE field of the initial tibble. The tail function is used to show the last 20 observations of the resulting data frame, and the column names and metadata are printed again.
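To illustrate the tall-to-wide reshaping without relying on the exact tbl_xts arguments, here is a minimal sketch that uses only base R and xts as a swapped-in technique. The data values, the vector labels and the "YYYY-MM" date format are assumptions made for the example.

```r
library(xts)

# Made-up tall data: one row per vector per month (not real LFS values).
tall <- data.frame(
  REF_DATE = rep(c("2020-01", "2020-02", "2020-03"), times = 2),
  VECTOR = rep(c("vA", "vB"), each = 3),
  VALUE = c(100, 102, 101, 10, 11, 12),
  stringsAsFactors = FALSE
)

# Assumed "YYYY-MM" strings: append a day to obtain a proper R Date.
tall$date <- as.Date(paste0(tall$REF_DATE, "-01"))

# One xts series per vector, then merge them into a wide xts object.
by_vector <- split(tall, tall$VECTOR)
series_list <- lapply(by_vector, function(d) xts(d$VALUE, order.by = d$date))
wide <- do.call(merge, series_list)
colnames(wide) <- names(series_list)

# Show the last observations, as the text does with tail().
tail(wide, 2)
```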
The xts package was loaded after the dplyr package so that the xts version of merge is used in the next set of code, which calculates the month-to-month difference in the level. When using diff and lag functions in R, check the documentation for the package being used because different conventions may be used for the sign of the lag. An xts data structure is created using the xts version of merge to maintain the xts attributes. The column names must be reset to be useful for the plot.
#the first vector is lfs employment 15+
#the fourth vector is the standard error of the month to month change
#the data have to be merged together to maintain the xts attributes
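A minimal sketch of the difference-and-merge step on made-up numbers follows. The series values and the column names are assumptions for illustration; only diff, merge and colnames are taken from the description above.

```r
library(xts)

dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 4)
level <- xts(c(100, 103, 101, 106), order.by = dates)  # made-up employment levels
std_err <- xts(c(2, 2, 2, 2), order.by = dates)        # made-up standard errors

# diff() on an xts object subtracts the previous observation,
# so the first row of the result is NA.
change <- diff(level, lag = 1)

# merge() keeps the xts attributes; the column names are then reset.
data1 <- merge(level, change, std_err)
colnames(data1) <- c("employment", "change", "se")

data1
```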
The data1 xts data frame contains three series: the level of employment, the monthly change and the standard error. For the initial plot, a plot data structure is created that contains the change series, an upper limit (the change plus half the standard error) and a lower limit (the change minus half the standard error). Appropriate column names are defined to be useful in the plot.
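The bounds calculation can be sketched as follows, using made-up change and standard error values. The half-standard-error width follows the description above; the column names are assumptions.

```r
library(xts)

dates <- seq(as.Date("2020-02-01"), by = "month", length.out = 3)
# Made-up monthly changes and standard errors.
data1 <- xts(cbind(change = c(3, -2, 5), se = c(2, 2, 2)), order.by = dates)

# Upper and lower limits are the change plus or minus half the standard error.
plot_data <- merge(
  data1$change,
  data1$change + 0.5 * data1$se,
  data1$change - 0.5 * data1$se
)
colnames(plot_data) <- c("change", "upper", "lower")

plot_data
```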
The next piece of code converts the xts data frame back to a tibble after using the last function, from the xts package, to select the last portion of the data. The plot window is defined above in terms of the periodicity of the data, in this case, months.
#making it into a wide tibble makes it easier to manage in ggplot because the date is accessible
#last 2 years - use units of series such as months or days as in example below
#now calculate a date string for the caption
A date string is calculated from the first and last values of the date in the plot_data2 frame. The first rows of the plot data tibble are printed for documentation purposes.
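The window selection and the caption date string can be sketched together. The 24-month window, the "%Y-%m" date format and the plain data frame (rather than a tibble) are assumptions for this example.

```r
library(xts)

dates <- seq(as.Date("2018-01-01"), by = "month", length.out = 36)
plot_data <- xts(cbind(change = seq_len(36)), order.by = dates)  # made-up values

# Window length expressed in the periodicity of the data (months here).
plot_window <- 24
windowed <- xts::last(plot_data, plot_window)

# A wide frame with an explicit date column is easier to use in ggplot.
plot_data2 <- data.frame(date = index(windowed), coredata(windowed))

# Date range for the caption, built from the first and last dates.
date_string <- paste(
  format(min(plot_data2$date), "%Y-%m"),
  "to",
  format(max(plot_data2$date), "%Y-%m")
)

head(plot_data2)
date_string
```

Calling xts::last with the package prefix avoids picking up a different last function if dplyr happens to be attached later.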
The next set of code plots the LFS change variable with the standard-error augmented differences as the upper and lower bounds. The basic level change is marked with a point. The point range geom plots the augmented differences as a line between upper and lower. The geom_line portion of the plot draws a line between the change points.
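A minimal sketch of this layering on made-up data follows. The data values, the fixed band width and the object names are assumptions; the geoms are the ones named above.

```r
library(ggplot2)

# Made-up monthly changes with illustrative upper and lower limits.
plot_data2 <- data.frame(
  date = seq(as.Date("2020-01-01"), by = "month", length.out = 6),
  change = c(3, -2, 5, 1, -4, 2)
)
plot_data2$upper <- plot_data2$change + 1
plot_data2$lower <- plot_data2$change - 1

change_plot <- ggplot(plot_data2, aes(x = date, y = change)) +
  # Point at the change, vertical line from lower to upper.
  geom_pointrange(aes(ymin = lower, ymax = upper)) +
  # Line joining the change points.
  geom_line() +
  labs(title = "Monthly Change (illustrative data)", x = NULL, y = "Monthly Change (000)")

change_plot
```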
#load the plot library
labs(title="Monthly Change in LFS 15+ Employment",y="Monthly Change (000)",
subtitle="Seasonally Adjusted with Standard Error Bars",
caption=paste(date_string," ","Statcan:",paste(canada_lfs_vbls[c(1,4)],collapse=", "),"JCI"))
The caption string includes the date string calculated above and the two vector numbers, which are merged into a single comma-separated string. Then the plot is saved to a PNG file. The example is shown below.
The next set of code plots the level of the employment series along with the trend-cycle component. The trend-cycle series provides a better base for analysis than the somewhat more volatile time series of LFS point estimates. We want to plot the last 6 months of the trend-cycle with a different line type, so one variant is created with the last 6 months set to NA and the other contains just the last 6 months. Note the combination of three series with xts merge.
#now start a plot of the trend cycle and the total employment
# review this link about trend cycle https://www.statcan.gc.ca/eng/dai/btd/tce-faq
#we want to drop the last 12 observations from the trend cycle and add a third series which is just the tail of the trend cycle.
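One way to build the two trend-cycle variants is sketched below on made-up values. The tail length is held in a hypothetical tail_length variable so it can be set to whatever number of recent observations should be drawn differently, and the column names are assumptions.

```r
library(xts)

dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 12)
trend_values <- 100 + seq_len(12)                       # made-up trend-cycle
employment_values <- trend_values + rep(c(1, -1), 6)    # made-up point estimates

# Number of most recent observations to draw with a different line type.
tail_length <- 6
is_tail <- seq_along(trend_values) > length(trend_values) - tail_length

employment <- xts(employment_values, order.by = dates)
# Variant 1: trend-cycle with the tail blanked out.
trend_body <- xts(ifelse(is_tail, NA, trend_values), order.by = dates)
# Variant 2: only the tail of the trend-cycle.
trend_tail <- xts(ifelse(is_tail, trend_values, NA), order.by = dates)

plot_data_trend <- merge(employment, trend_body, trend_tail)
colnames(plot_data_trend) <- c("employment", "trend_body", "trend_tail")

tail(plot_data_trend, 8)
```

With no shared observation, the two trend segments leave a one-period gap when drawn as lines; keeping one overlapping observation in both variants is one way to join them.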
Again, the last plot_window observations are selected for the plot.
The next set of code calculates the date string for the caption using the index variable for the xts plot_data_trend frame. The index variable is essentially the time vector for the frame.
#linetype and colour are managed in the alphabetic order of the series group in tibble.
labs(title="Monthly LFS 15+ Employment",x=NULL,y="Monthly Level (000)",
caption=paste(date_string_trend," ","Statcan:",paste(canada_lfs_vbls[c(1,2)],collapse=", "),"JCI"))
The tidy function from the broom package (part of tidyverse) is used to create the required tibble.
The series are processed in alphabetical order, so the line types should be specified in that order. The line type is varied by the series variable, which is created by the tidy function. This variable contains the names of the three series that we are plotting.
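The mapping of line types to series can be sketched on a hand-built long data frame. The column names index, series and value, the series names and the chosen line types are assumptions for this example. Naming the elements of the values vector is one way to avoid depending on alphabetical order.

```r
library(ggplot2)

dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 12)
trend <- 100 + seq_len(12)  # made-up values

# Long format: one row per date per series.
long <- data.frame(
  index = rep(dates, times = 3),
  series = rep(c("employment", "trend_body", "trend_tail"), each = 12),
  value = c(
    trend + rep(c(1, -1), 6),    # employment
    c(trend[1:6], rep(NA, 6)),   # trend-cycle without the tail
    c(rep(NA, 6), trend[7:12])   # tail of the trend-cycle only
  )
)

# Named values are matched to series by name rather than by position.
line_types <- c(employment = "solid", trend_body = "dashed", trend_tail = "dotted")

trend_plot <- ggplot(long, aes(x = index, y = value, linetype = series, colour = series)) +
  geom_line(na.rm = TRUE) +
  scale_linetype_manual(values = line_types) +
  labs(x = NULL, y = "Monthly Level (000)")

trend_plot
```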
The gridExtra package is used to merge the two plots into one larger graphic, with both plots side by side, for easier inclusion in a web site or Word document. The dimensions passed to ggsave are defined in inches.
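A minimal sketch of combining two plots side by side and saving the result follows. The placeholder plots, the output file name and the dimensions are assumptions; the output is written to a temporary directory here.

```r
library(ggplot2)
library(gridExtra)

# Two placeholder plots standing in for the change and trend plots.
df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 4))
p1 <- ggplot(df, aes(x, y)) + geom_point()
p2 <- ggplot(df, aes(x, y)) + geom_line()

# arrangeGrob builds the combined layout without drawing it.
combined <- arrangeGrob(p1, p2, ncol = 2)

# Width and height are in inches by default.
out_file <- file.path(tempdir(), "combined_plots.png")
ggsave(out_file, combined, width = 12, height = 5)
```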
The merged plots are shown below.