### Analyzing Health Expenditure Data From The OECD With R

Musings on Seasonal Adjustment and The Labour Force Survey

# Introduction

This note outlines some thoughts about seasonal adjustment. Examples from R will be supplied. The specific context of the material is in relation to the Labour Force Survey but much of the remarks are relevant to the analysis of other data.[1][2] Where possible, references have been chosen which are freely available on the web.

Many timeseries with a sub-annual periodicity may exhibit regular seasonal patterns each year. These seasonal patterns will arise because of a combination of factors related to calendar effects (school years, holidays, etc.), weather (climate), timing decisions (dividends), etc. Granger emphasizes that timeseries may have different causes of seasonality and thus may show different seasonal patterns.(Granger 1978)

These patterns often obscure the underlying trends in the data. The seasonal adjustment process uses mathematical techniques to decompose the initial raw timeseries into three timeseries components:

1. Seasonal
2. Trend-Cycle
3. Residual

Combined, the three components, processes underlying the original series, are equivalent to the original series. Since it is a mathematical transformation, seasonal decomposition is a statistical estimate. We cannot separately measure the components. Multiplicative models may be appropriate to cases in which the seasonal pattern is proportionate to the level of the series. Additive models may otherwise suffice. This is discussed in Chapter 6 of Hyndman and Athanasopoulos (HA).(Hyndman and Athanasopoulos 2018)

In the context of the Labour Force Survey (LFS), Statistics Canada directly measures each month of data with a household survey instrument. Each individual unweighted observation represents pure measured data. Population-level estimates are developed by applying population weights to each observation to get appropriately scaled values. This is more of a statistical process since the population weights are subject to error and revision. Except when the population weights are revised, with the availability of a new census, the data remain unchanged. The underlying unweighted observation is not revised.(Government of Canada 2018)

The basic seasonal adjustment estimation process is to filter out the seasonal and trend components using some statistical filter such as centered moving averages. The goal is to identify patterns and remove data that are explained by seasonality. The Australian Bureau of Statistics (ABS) has an excellent summary of the basic process involved in the X11 analysis. They explain the general steps on their web site.(Australian Bureau of Statistics n.d.) The basic approach involves three steps:

1. Estimate the trend by a moving average
2. Remove the trend leaving the seasonal and irregular components
3. Estimate the seasonal component using moving averages to smooth out the irregulars.

The process is iterative because the trend cannot really be identified until the seasonal and irregular residual components have been estimated. The ABS site has a step by step outline of the process.

Eurostat publishes a handbook which covers the field very well with chapters by major practitioners.(Mazzi and Ladiray 2018) The chapter on timeseries components reviews the general characteristics of the components and the type of models, generally deterministic or stochastic, used to estimate the trends.(Dagum and Mazzi 2018) In this chapter, Dagum and Mazzi discuss the issues of modelling moveable holidays, such as Easter, and trading day variations.

One issue raised is the treatment of outliers. Special estimate may be required for significant strikes or weather events. The example of the big ice storm in Canada is used as what are termed as redistributive outliers which may shift the pattern of seasonality. This may require special modelling.

# Some Tools

The X11 approach was developed at the Bureau of the Census in the US and substantially refined and enhanced at Statistics Canada. Modern seasonal adjustment practice owes a great deal to our statistics agency. At StatCan, the process was enhanced to use ARIMA modelling to back- and forward-cast the series because otherwise the centered filters have difficulties with endpoints because there are not enough observations for the full range of weights.

ARIMA stands for Auto Regressive Integrated Moving Average. It is a technique for forecasting time series based on moving averages. Chapter 8 in H-A introduces the key concepts and discusses their use in forecasting.

The SEATS “Seasonal Extraction in ARIMA Time Series” decomposition method has been incorporated in the latest version of the X series (X13Arima-SEATS) by the Bureau of the Census in the US.(US Census Bureau n.d.) The SEATS process only works on quarterly and monthly data. The author of the R seasonal package, Christoph Sax, which interfaces to the Census software, has developed an excellent web site which facilitates the online use of the software and experimentation with parameters.(Sax n.d.) It is possible to upload series and directly process them on this site. This site also facilitates the testing of specific modelling assumptions for trading days and outliers. The latter capability is a key attribute of the X-family of seasonal adjustment procedures. In his R package, Sax facilitates the use of the full modeling facilities of X13-SEATS/X-11. He has also implemented a view function which emulates his web site.[3]

Another approach (STL) uses locally-weighted regressions (LOESS) as smoothers to estimate the components.(Cleveland et al. 1990) The STL algorithm, which is included in the base version of R, has the advantage of being useful for daily, weekly and any other timeseries with “regular” periodicity that can be fit into a TS object in R.[4]

Both X-11 (Henderson) and STL (Loess) use smoothing processes or filters to identify the trend-cycle component. Dagum reviews the use of these tools for local trend-cycle estimation.(Dagum 2018) Her analytical charts indicate a strong similarity of the results for the Henderson and Loess filters.

Examples of these decomposition procedures will be shown later in this note. However, there are many other similar techniques which will not be evaluated for this note.

# Usage Considerations

The estimation of the components of a seasonal timeseries will change with each new data point. This implies that history gets revised. The practice at Statistics Canada is to revise only the immediate past observation until the end of the year. This is equivalent to leaving the factors used to extract the seasonal portion unchanged. With the beginning of the new year, the seasonal factors are revised back by estimating the full data series. With the LFS, this is introduced into the public dataset close to the release of the January data. This year, StatCan released the revisions at the beginning of February 2018. The chart below shows the impact of the revisions on month-to-month changes in the aggregate level of employment.

Figure 1 Monthly Season Revisions 2019-02-01

One of the challenges with processing data through seasonal adjustment or other filtering algorithms is the requirement to keep consistency between various levels of aggregation. The LFS has data by province, industry and demographic characteristic so consistency is a challenge. In the LFS case, it is handled by “raking” the series in each direction to maintain consistency of aggregation.(Fortier and Gellatly n.d.)

# Some Examples of Seasonal Adjustment

In this section, we will just review some mechanical examples of seasonal adjustment. Various R packages will be used. These will be outlined in the appendix which reviews the example script. In all cases, the forecast package, developed by Rob Hyndman, is loaded because it provides useful support to the management of ts objects (regular-periodicity time series) in R.(“Forecast Package | R Documentation” n.d.)  There is an excellent introduction to the seasonal decomposition techniques in Chapter 6 of the Hyndman - Athanasopoulos book.(Hyndman and Athanasopoulos 2018) At the start of the chapter, the book provides a brief but useful summary of classical decomposition techniques using moving average filters. These techniques have been replaced with the more modern seasonal adjustment packages.

In these examples, the same series, monthly employment for 15+, is used. Outlier detection has been left enabled for the X-family examples and the robust method has been used for STL.

## X11

The Census software supports both x11ARIMA and the SEATS approach. The seasonal package is used to prepare the interface information and extract results for both.(Sax and Eddelbuettel 2018)   The initial example is just for the X11 variant.

The results from the analysis are shown below:

`x11 Results`
`Call:`
`seas(x = stc_raw, x11 = "")`
`Coefficients:`
`                   Estimate Std. Error z value Pr(>|z|)  `
`LS2008.Nov       -146.75083   34.92917 -4.201 2.65e-05 ***`
`LS2009.Jan       -184.01995   34.92948 -5.268 1.38e-07 ***`
`MA-Nonseasonal-01   0.07721   0.05937   1.300   0.193  `
`MA-Seasonal-12       0.61281   0.04961 12.352 < 2e-16 ***`
`---`
`Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1`
`X11 adj. ARIMA: (0 1 1)(0 1 1) Obs.: 288 Transform: none`
`AICc: 2811, BIC: 2829 QS (no seasonality in final):   0`
`Box-Ljung (no autocorr.): 26.5   Shapiro (normality): 0.9962`

Figure 2 X11 Summary Results

The analysis identifies Nov 2008 and January 2009 as significant, which is the start of the Great Recession. The seasonally adjusted series was shown to have no residual seasonality by the tests applied. The next chart shows the seasonal components extracted from the analysis. Note that scale of each component chart is independent. In reviewing the chart, note that the algorithm identified the break in trend and that the unexplained residuals are relatively small. The seasonal estimates do not increase in size over time.

Figure 3 X11 Decomposition

## SEATS Example

Extending the analysis to the SEATS procedure produces similar results.

`X13 Results`
`Call:`
`seas(x = stc_raw)`
`Coefficients:`
`                   Estimate Std. Error z value Pr(>|z|)  `
`LS2008.Nov       -146.75083   34.92917 -4.201 2.65e-05 ***`
`LS2009.Jan       -184.01995   34.92948 -5.268 1.38e-07 ***`
`MA-Nonseasonal-01   0.07721   0.05937   1.300   0.193  `
`MA-Seasonal-12       0.61281   0.04961 12.352 < 2e-16 ***`
`---`
`Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1`
`SEATS adj. ARIMA: (0 1 1)(0 1 1) Obs.: 288 Transform: none`
`AICc: 2811, BIC: 2829 QS (no seasonality in final):   0`
`Box-Ljung (no autocorr.): 26.5   Shapiro (normality): 0.9962`
`Messages generated by X-13:`
`Warnings:`
`- At least one visually significant trading day peak has been`
`found in one or more of the estimated spectra.`

Figure 4 X13-SEATS Summary Results

The component chart below shows similar results to the X-11 variant

Figure 5 X13SEAT Components

The residual variation is slightly smaller probably reflecting the type of filters applied. However, the differences in scales should be noted.

## STL Example

The final example uses the mstl procedure in the forecast package to iterate the stl procedure automatically to tighten the estimation on the seasonal patterns. The output is relatively simple, showing the distribution of the components.

`stl Results`
`     Data           Trend         Seasonal12         Remainder      `
`Min.   :12894   Min.   :13242   Min.   :-364.936   Min.   :-160.752`
`1st Qu.:14847   1st Qu.:14870   1st Qu.:-239.970   1st Qu.: -15.653`
`Median :16484   Median :16575   Median : 18.923   Median :   0.802`
`Mean   :16195   Mean   :16194   Mean   : -0.175   Mean   :  1.259`
`3rd Qu.:17542   3rd Qu.:17572   3rd Qu.: 202.352   3rd Qu.: 17.801`
`Max.   :18979   Max.   :18730   Max.   : 376.106   Max.   : 249.917`

Figure 6 MSTL Summary

Reviewing the component charts below indicates that the STL algorithm produces a similar trend estimate with a noticeable break but not as sharp. H-A suggest that STL has several advantages:

STL has several advantages over the classical, SEATS and X11 decomposition methods:

• Unlike SEATS and X11, STL will handle any type of seasonality, not only monthly and quarterly data.
• The seasonal component is allowed to change over time, and the rate of change can be controlled by the user.
• The smoothness of the trend-cycle can also be controlled by the user.
• It can be robust to outliers (i.e., the user can specify a robust decomposition), so that occasional unusual observations will not affect the estimates of the trend-cycle and seasonal components. They will, however, affect the remainder component.

On the other hand, STL has some disadvantages. In particular, it does not handle trading day or calendar variation automatically, and it only provides facilities for additive decompositions.[5]

Generally, the residuals are larger for the test case than the X-family examples shown previously. The break in the trend is less extreme than that found by the X-family examples.

Figure 7 STL Decomposition

As in the other example, the analysis was run using the simplest automatic approach. However, a robust estimation procedure was used which moves the effect of outliers to the remainder rather than letting them affect the other components.

The next two charts compare the track of the examples over the full available sample period as well as in the recent period. Note that all the algorithms track the StatCan estimate well and produce a similar break for the Great Recession. However, the trend-cycle estimate from the STL approach has a less sharp break in the trend but wider swings in the irregular remainder at the break point.

Figure 8 Example Comparison over Full Sample

The next graph tightens the focus to recent observations and changes the scale. As would be expected because of the similarity of the approaches, the X-11 and the SEATS variant, used in automatic mode, track the StatCan X-12 estimate best.

Figure 9 Seasonal Adjustment Comparison 2016 forward

Comparison charts of the irregular remainder are shown below. All forms of seasonal adjustment are constrained to the standard identity that the sum of the three components is identical to the original raw series. As noted earlier, the STL procedure used a robust estimation approach which tends to move the impact of outliers to the irregular remainder rather than moving them to the trend estimate.

Figure 10 Comparison of Seasonal Adjustment Residuals

The other components are shown in Appendix A. The plot of the trend-cycle measure has been done with 3 panels and as a common chart in that appendix. The charts show that the STL algorithm has created a smoother trend and allocated the outliers to the Residuals component in comparison to the X-family estimates.

The next table shows the basic statistics for the components for each of the seasonal adjustment methods.

 Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max stl_seasonal 288 -0.175 242.486 -364.936 -239.970 202.352 376.106 x11_seasonal 288 -0.114 240.267 -368.705 -245.108 198.570 389.129 x13_seasonal 288 -0.119 240.648 -376.648 -250.298 187.044 390.193 stl_trendcycle 288 16,193.750 1,607.167 13,241.890 14,869.920 17,572.030 18,729.710 x11_trendcycle 288 16,194.330 1,610.948 13,257.400 14,891.950 17,585.100 18,808.450 x13_trendcycle 288 16,194.950 1,610.782 13,251.060 14,892.060 17,582.950 18,803.610 stl_irregular 288 1.259 44.890 -160.752 -15.653 17.801 249.917 x11_irregular 288 0.616 17.404 -58.303 -8.026 8.947 74.379 x13_irregular 288 -0.000 10.986 -29.557 -7.095 7.522 36.113

Figure 11 Descriptive Statistics for Components

The next figure shows the descriptive statistics for the full sample period for the seasonally adjusted data.

 Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max empl_stl_sa 288 16,195.000 1,611.077 13,228.190 14,883.320 17,586.900 18,864.350 empl_x11_sa 288 16,194.940 1,610.988 13,240.000 14,892.170 17,585.160 18,818.820 empl_x13_sa 288 16,194.950 1,610.884 13,245.640 14,890.680 17,581.590 18,803.690 stc_sa 288 16,196.400 1,611.340 13,262.300 14,887.500 17,605.950 18,808.400

Figure 12 Descriptive Statistics for Seasonally Adjusted Series

The statistics are relatively standard for all the estimates. To the extent that the estimation process shifts variations between the trend-cycle and irregular components, there is no loss of information in the final seasonally adjusted series. However, if the filtering process shifts information to the estimated seasonal patterns inappropriately, information may be lost. That is the challenge with filtering algorithms. It should be noted that the current stc_sa estimates were retrieved prior to the update of the seasonal factors. Both the X variants show the most modest irregulars with X-13 having the shallowest ones. This means that more variation has been moved to the trend component than in the stl alternative.

# A Trend is a Trend is a Trend

The heading is the start of a famous poem attributed to Sir Alex Cairncross:

A trend is a trend is a trend. But the question is, will it bend? Will it alter its course through some unforeseen force and come to a premature end?[6]

One of the main reasons for seasonally adjusting a series is to focus on the trend. If the decompositions are available, the trend, known as the trend-cycle, can be easily extracted. However, if only the seasonally-adjusted final result is available, some other approach is required. StatCan has documented a procedure to extract a trend from its seasonally-adjusted series using (for monthly data), a centered 13-period filter with exponentially declining weights on each side.(Government of Canada et al. 2015) StatCan supplies it for a few key series. However, the technique is applicable to other series. The chart below shows the recent estimate of the seasonally-adjusted LFS15+ employment and the associated trend-cycle estimate.

Figure 13 LFS15+ Employment and Trend-Cycle estimate

The latter part of the trend is shown as a dotted line to indicate that only a portion of the filter was used. The final data point is filtered only with half of the filter. This trend-cycle estimate should likely be used for planning purposes rather than direct analysis of the more volatile monthly series.

# Estimates Have a Distribution

The final chart in this note pairs two charts showing the monthly and year-over-year change in the LFS15+ series. StatCan provides estimates of the standard error for these measures of change. These are shown as error bars at the 68% level. These standard error estimates are discussed in the notes at the end of the LFS Daily release. Since the survey samples are assumed to be normally distributed, the point estimate is assumed to be the most likely result. However, one should always be aware the “true” number might be even outside the line shown but with a very low probability. The 95% confidence interval error bar would be almost twice as long. Thus, caution should be exercised in the interpretation of recent changes. The value scales are much different in the two charts. It should be noted that the error bars for the year-over-year change are much smaller in proportion to the change. This suggests some usefulness in that form of comparison.

# Conclusions

The purpose of this note is to outline how seasonal adjustment works and to provide an example using Canadian data of several processes which are available in R. An appendix reviews the R code used to produce the examples. Results are broadly similar, but the modern X-family examples show why these programs are used for monthly and quarterly data by the major statistical authorities. The important points to consider are:

1. Additional data changes the estimation process resulting in differences in the estimated decomposition between seasonal, trend-cycle and irregular components.
2. This can result in changes in patterns of the period-to-period seasonally adjusted observations but would not likely affect the estimate of the general trend.
3. Seasonal adjustment techniques were developed to facilitate a focus on longer term trends that were obscured by seasonality. The mathematical filters involved may obscure or modify the period-to-period changes in such a way as to modify the information content. Care should always be taken with short-term analysis.
4. Specific consideration of outliers related to major weather events or significant strikes may be required but automatic procedures may suffice.
5. The trend-cycle component is a valuable indicator of the direction of change in the target series.

# Bibliography

Australian Bureau of Statistics. n.d. “Australian Bureau of Statistics Time Series Analysis: Seasonal Adjustment Methods.” Accessed January 22, 2019. http://www.abs.gov.au/websitedbs/d3310114.nsf/4a256353001af3ed4b2562bb00121564/c890aa8e65957397ca256ce10018c9d8!OpenDocument.

Cleveland, Robert B., William S. Cleveland, Jean E. McRae, and Irma Terpenning. 1990. “STL: A Seasonal-Trend Decomposition.” Journal of Official Statistics 6 (1): 3–73.

Dagum, Estela Bee. 2018. “14 - Trend-Cycle Estimation.” In Handbook on Seasonal Adjustment, 2018th ed. Luxembourg: European Union. https://ec.europa.eu/eurostat/documents/3859598/8939616/KS-GQ-18-001-EN-N.pdf/7c4d120a-4b8a-441b-aefd-6afe81a7cf59.

Dagum, Estela Bee, and Gian Luigi Mazzi. 2018. “3 - Time Series Components.” In Handbook on Seasonal Adjustment, 2018th ed. Luxembourg: European Union. https://ec.europa.eu/eurostat/documents/3859598/8939616/KS-GQ-18-001-EN-N.pdf/7c4d120a-4b8a-441b-aefd-6afe81a7cf59.

Government of Canada, Statistics Canada. 2018. “Guide to the Labour Force Survey, 2018.” September 7, 2018. https://www150.statcan.gc.ca/n1/pub/71-543-g/71-543-g2018001-eng.htm.

Government of Canada, Statistics Canada, Suzy Fortier, Steve Matthews, and Guy Gellatly. 2015. “Trend-Cycle Estimates – Frequently Asked Questions.” August 19, 2015. https://www.statcan.gc.ca/eng/dai/btd/tce-faq.

Granger, Clive WJ. 1978. “Seasonality: Causation, Interpretation, and Implications.” In Seasonal Analysis of Economic Time Series, 33–56. NBER.

Hyndman, Rob J., and George Athanasopoulos. 2018. Forecasting: Principles and Practice. OTexts.

Mazzi, Gian Luigi, and Dominique Ladiray. 2018. Handbook on Seasonal Adjustment. 2018th ed. Luxembourg: European Union. https://ec.europa.eu/eurostat/documents/3859598/8939616/KS-GQ-18-001-EN-N.pdf/7c4d120a-4b8a-441b-aefd-6afe81a7cf59.

Sax, Christoph. n.d. “Seasonal: R Interface to X-13ARIMA-SEATS.” Accessed January 23, 2019. http://www.seasonal.website/.

Sax, Christoph, and Dirk Eddelbuettel. 2018. “Seasonal Adjustment by X-13ARIMA-SEATS in R.” Journal of Statistical Software 87 (11): 1–17.

US Census Bureau, Brian C. Monsell. n.d. “US Census Bureau - X-13ARIMA-SEATS Seasonal Adjustment Program - Home Page.” Accessed January 23, 2019. https://www.census.gov/srd/www/x13as/.

# R - Examples - Code

This section reviews the code used to produce the examples in the preceding sections of the report. As well as base R, key packages include:

• dplyr – data frame/tibble manipulation
• cansim – retrieving variables from StatCan NDM (a.k.a. CANSIM)
• tidyyr – a tidyverse component to manipulate data frames and tibbles to and from tidy data structures
• ggplot2 – tidyverse graphics package
• lubridate – tidyverse package to manipulate dates
• tsbox – a package to work with timeseries objects
• forecast – useful timeseries tools including seasonal adjustment
• seasonal – x11/x13 seasonal adjustment

The use of “ts” objects throughout the code is needed to fit the requirements for “regular” periodicity objects. The tsbox package has translation functions to move from other data structures such as tibbles to ts objects. An alternative strategy is to manipulate the time data using packages such as xts/zoo and only convert to “ts” when doing seasonal adjustment. The autoplot extension in the forecast packages supports “ts” objects.

Much of the code is adequately commented. The first stage after initial package loads via the library function is to define a list of series and to retrieve them. The cansim package retrieves the series in vector order (i.e. sorted) into a tibble which is translated into a ts data frame using the ts_ts function from tsbox. If the position in the request list is critical, the retrieved data frame must be reordered. This is good standard practice as is the saveRDS function which is used to save a copy of the data with date.

`library(dplyr)`
`library(cansim)`
`library(tsbox)`
`library(tidyr)`
`library(lubridate)`
`library(ggplot2)`
`library(forecast)`
`#In this vignette, we are working with ts objects to suit the conventions of seasonal adjustment`
`canada_lfs_vbls<-c("v2062811","v100304302","v101884904", "v101884905",`
`"v101884906","v2064890"`
`)`
`#analyze meta data`
`meta1<-get_cansim_vector_info(canada_lfs_vbls)`
`cube_meta<-get_cansim_cube_metadata(unique(meta1\$table))`
`common_meta<-left_join(meta1,cube_meta,by=c("table"="productId")) %>%distinct()%>%`
`select(VECTOR,table,title_en,cubeTitleEn,cubeEndDate)`
`print(as.data.frame(common_meta))`
`#now retrieve the data`
`canada_lfs_ts <- get_cansim_vector(canada_lfs_vbls,start_time=as.Date("1995-01-01")) %>%`
`mutate(Date=as.Date(paste0(REF_DATE,"-01")))%>%`
`select(Date,VECTOR,VALUE)%>%`
`ts_ts()`
`#order the series by the vector order in the request`
`canada_lfs_ts<-canada_lfs_ts[,canada_lfs_vbls]`
`#save the data vintaged`
`saveRDS(canada_lfs_ts,file=paste0("LFS_15_data_",Sys.Date(),".Rds"))`

The meta data for the series and corresponding NDM table are retrieved and displayed as well for documentation purposes.

The next section of code calculates the change variables needed for plotting the monthly change with standard errors.

`last_year<-year(Sys.Date())-1`
`window(canada_lfs_ts,last_year)`
`print(colnames(canada_lfs_ts))`
`print(data.frame(select(meta1,VECTOR,title_en)))`
`#the first vector is lfs employment 15+`
`#the fourth vector is the standard error of the month to month change`
`lfs_change<-diff(canada_lfs_ts[,1],differences=1,lag=1)`
`#the data have to be merged together to maintain the ts attributes`
`data1<-cbind(canada_lfs_ts[,1],lfs_change,canada_lfs_ts[,4])`
`colnames(data1)<-c("LFS15+","DeltaLFS15+","SEDeltalfs15+")`
`tail(data1,12)`
`plot_data<-cbind(data1[,2],data1[,2]+.5*data1[,3],data1[,2]-.5*data1[,3])`
`colnames(plot_data)<-c("DeltaLFS15","Upper","Lower")`
`#making it into a wide data frame makes it easier to manage in ggplot because the date is accessible`
`#subset method from forecast package is used to manage number of observations`
`#that is easier than using the window command`
`numb_obs<-nrow(plot_data)`
`plot_data_tbl<-ts_tbl(subset(plot_data,start=numb_obs-23))`
`#use spread from tidyr to make it wide so that upper and lower are easily`
`#accessed`
`#the ts_tbl function from the tsbox package has extracted the time variable from the`
`#ts objects to make it a separate variable`
`plot_data2<-spread(plot_data_tbl,id,value)`

The cbind command is used to merge the calculated variables into a data frame retaining the ts attributes. Because the forecast package is loaded, the subset function specifically recognizes “ts” objects. One challenge with “ts” objects is that the time concept is hidden as an attribute of the variables so particular procedures are required to manage the attributes. The ts_tbl, from the tsbox package, converts the ts data frame into a tall tibble with the variable names in the “id” variable. The ts_tbl also extracts the time concept into its own separate variable.

However, for the purposes of the plot, we need to access each variable (levels, upper and lower bounds) so the tibble is spread using the corresponding tidyyr function into a wide data structure.

`#now calculate a date string for the caption`
`date_string<-paste0(substr(plot_data2\$time[1],1,7),"/",substr(tail(plot_data2\$time,1),1,7))`
`head(plot_data2)`
`#load the plot library`
`library(ggplot2)`
`plot1<-ggplot(plot_data2,aes(x=time,y=DeltaLFS15))+geom_point()+geom_pointrange(aes(ymin=Lower,ymax=Upper))+`
`geom_line()+`
`labs(title="Monthly Change in LFS 15+ Employment",y="Monthly Change (000)",`
`x=NULL,`
`subtitle="Seasonally Adjusted with Standard Error Bars",`
`caption=paste(date_string," ","STATCAN:",paste(canada_lfs_vbls[c(1,4)],collapse=", "),"JCI"))`
`ggsave(plot1,file="LFS15_monthly_change.png")`

Because the data had been reorganized into a wide tibble, the plot is relatively straight forward. The separate ymin and ymax variables provide the limits for the error bars produced by the pointrange geom overlay.

`#now start a plot of the trend cycle and the total employment`
`# review this link about trend cycle https://www.STATCAN.gc.ca/eng/dai/btd/tce-faq`
`#we want to drop the last 12 observations from the trend cycle and add a third series which is just the tail of the trend cycle.`
`dotted_period<-6`
`data_chart2<-cbind(canada_lfs_ts[,1],head(canada_lfs_ts[,2],length(canada_lfs_ts[,2])-dotted_period),`
`tail(canada_lfs_ts[,2],dotted_period))`
`# tail(data_chart2,24)`
`colnames(data_chart2)<-c("LFS15plus","Trend-Cycle","Trend Recent")`
`numb_obs<-nrow(data_chart2)`
`plot_data_trend<-subset(data_chart2,start=numb_obs-23)`
`trend_tibble<-ts_tbl(plot_data_trend)`
`date_string_trend<-paste0(substr(trend_tibble\$time[1],1,7),"/",substr(tail(trend_tibble\$time,1),1,7))`
`#linetype and colur are managed in the alphabetic order of the series group in tibble.`
`trend_plot<-ggplot(trend_tibble,aes(x=time,y=value,group=id,`
`colour=id))+geom_line(aes(linetype=id))+`
`scale_linetype_manual(values=c("solid","solid","dashed"))+`
`theme(legend.position="bottom",legend.title=element_blank())+`
`labs(title="Monthly LFS 15+ Employment",x=NULL,y="Monthly Level (000)",`
`subtitle="Seasonally Adjusted",`
`caption=paste(date_string_trend," ","STATCAN:",paste(canada_lfs_vbls[c(1,2)],collapse=", "),"JCI"))`
`ggsave(trend_plot,file="LFS15_monthly_trend.png")`

Note the use of the caption to document the Vnumbers used in the chart. The line types are forced in this chart using the scale_linetype_manual function. The next set of code merges the two charts into one png file.

`#gridExtra's grob management is useful`
`#to group the graphic objects from the plots`
`#the graphic objects are plot instructions not pngs`
`library(gridExtra)`
`plot_list<-mget(c("plot1","trend_plot"))`
`two_plots<-marrangeGrob(plot_list,nrow=1,ncol=2,top=NULL)`
`two_plot_name<-"lfs_se_trend.png"`
`ggsave(file=two_plot_name,width=15,height=8,two_plots)`

The next set of code repeats the standard error plot for the annual change in the employment timeseries.

`#now do the annual change plot`
`lfs_change_annual<-diff(canada_lfs_ts[,1],differences=1,lag=12)`
`#the data have to be merged together to maintain the ts attributes`
`data12<-cbind(canada_lfs_ts[,1],lfs_change_annual,canada_lfs_ts[,5])`
`colnames(data12)<-c("LFS15+","DeltaLFS15+12","SEDeltalfs15+12")`
`tail(data12,12)`
`plot_data<-cbind(data12[,2],data12[,2]+.5*data12[,3],data12[,2]-.5*data12[,3])`
`colnames(plot_data)<-c("DeltaLFS15A","Upper","Lower")`
`numb_obs<-nrow(plot_data)`
`plot_data2<-spread(ts_tbl(subset(plot_data,start=numb_obs-23)),id,value) #go for a wide tibble to make the chart easier`
`#now calculate a date string for the caption`
`date_string<-paste0(substr(plot_data2\$time[1],1,7),"/",substr(tail(plot_data2\$time,1),1,7))`
`head(plot_data2)`
`#load the plot library`
`library(ggplot2)`
`plot12<-ggplot(plot_data2,aes(x=time,y=DeltaLFS15A))+geom_point()+geom_pointrange(aes(ymin=Lower,ymax=Upper))+`
`geom_line()+`
`labs(title="Annual Change in LFS 15+ Employment",y="Annual Change (000)",`
`x=NULL,`
`subtitle="Seasonally Adjusted with Standard Error Bars",`
`caption=paste(date_string," ","STATCAN:",paste(canada_lfs_vbls[c(1,5)],collapse=", "),"JCI"))`
`ggsave(plot12,file="LFS15_annual_change.png")`
`plot_list<-mget(c("plot1","plot12"))`
`two_plots<-marrangeGrob(plot_list,nrow=1,ncol=2,top=NULL)`
`two_plot_name<-"lfs_se_monthly_annual.png"`
`ggsave(file=two_plot_name,width=15,height=8,two_plots)`

Now we start the actual seasonal adjustment examples using stl as the first one.

`#now start some seasonal adjustment tests`
`#the raw series is the 6th and last in the list of requested vectors`
`stc_sa<-canada_lfs_ts[,1]`
`stc_raw<-canada_lfs_ts[,6]`
`#first we will use the stl package`
`empl_stl_fit<-mstl(stc_raw,robust=TRUE)`
`print(summary(empl_stl_fit))`
`stl_output<-`
`capture.output(summary(empl_stl_fit))`
`#now write it out`
`cat("stl Results",stl_output,`
`file="LFS15_stl_components_analysis.txt",`
`sep="\n")`
`empl_stl_sa<-seasadj(empl_stl_fit)`
`stl_component_plot<-autoplot(empl_stl_fit)+`
`labs(title="Total Employment Decomposition LFS15+",`
`subtitle="Decomposition using STL - LOESS (000)",`
`caption=paste("STATCAN:",canada_lfs_vbls[6],"JCI"))`
`ggsave(stl_component_plot,file="LFS15_stl_components.png")`

In the code above, the mstl procedure from the forecast package is used rather than the stl version in the base R implementation. The mstl approach provides a slightly tighter fit by iterating the stl process. All of the seasonal adjustment procedures produce some object which can be displayed with the summary function. The capture.output function is used to capture the summary results to a text variable which is written out to a text file. This text file was inserted into the seasonal adjustment report. The autoplot function from the forecast package extends the standard ggplot2 library to facilitate the plotting of timeseries and related objects. For the mstl/stl procedure, the resulting object (empl_stl_fit) includes all of the decomposition components. The seasadj function is used to extract the seasonally-adjusted timeseries.

The next stage uses the seas function from the seasonal package from Sax to access the Bureau of the Census binaries to do the X13-SEATS analysis,

`#now we are going to try x-13`
`library(seasonal)`
`empl_x13_fit<-seas(stc_raw)`
`empl_x13_sa<-final(empl_x13_fit)`
`print(summary(empl_x13_fit))`
`x13_component_plot<-autoplot(empl_x13_fit)+`
`labs(title="Total Employment Decomposition LFS15+",`
`subtitle="Decomposition using X13 -default settings (000)",`
`caption=paste("STATCAN:",canada_lfs_vbls[6],"JCI"))`
`ggsave(x13_component_plot,file="LFS15_x13_components.png")`
`#now lets save the x13 results as a text file`
`x13_output<-`
`capture.output(summary(empl_x13_fit))`
`#now write it out`
`cat("X13 Results",x13_output,`
`file="LFS15_x13_components_analysis.txt",`
`sep="\n")`

In this case, the final function extracts the seasonally-adjusted series.

The next code segment repeats the same process but sets the x11 option to a null string to get the base X11 Arima analysis.

`#now we will do the x11-variant`
`empl_x11_fit<-seas(stc_raw,x11="")`
`empl_x11_sa<-final(empl_x11_fit)`
`print(summary(empl_x11_fit))`
`x11_component_plot<-autoplot(empl_x11_fit)+`
`labs(title="Total Employment Decomposition LFS15+",`
`subtitle="Decomposition using x11 -default settings (000)",`
`caption=paste("STATCAN:",canada_lfs_vbls[6],"JCI"))`
`ggsave(x11_component_plot,file="LFS15_x11_components.png")`
`#now lets save the x11 results as a text file`
`x11_output<-`
`capture.output(summary(empl_x11_fit))`
`#now write it out`
`cat("x11 Results",x11_output,`
`file="LFS15_x11_components_analysis.txt",`
`sep="\n")`

The next section of code plots the seasonally-adjusted time series over the whole range processed and over a shorter range.

`#comparison of three methods`
`plot_data3<-cbind(stc_sa,empl_stl_sa,empl_x13_sa,empl_x11_sa)`
`colnames(plot_data3)<-c("STATCAN","MSTL-LOESS","X13-SEATS","X-11")`
`plot_tibble<-ts_tbl(plot_data3)`
`last_date<-substr(max(plot_tibble\$time),1,7)`
`stc_text<-paste("STATCAN: raw:",canada_lfs_vbls[6],"SA:",canada_lfs_vbls[1],"as of ",last_date,"JCI")`
`sa_plot<-ggplot(plot_tibble,aes(x=time,y=value,colour=id,group=id))+`
`geom_line(aes(linetype=id))+`
`theme(axis.title.x=element_blank(),`
`legend.title=element_blank(),legend.position="bottom")+`
`labs(title="Comparison of Seasonally Adjusted LFS15+ Employment",`
`subtitle="Methods are STL-LOESS, X11, X13-SEATS and STATCAN-X12",`
`y="Employment (000)",`
`caption=stc_text)`
`ggsave(sa_plot,file="LFS15_sa_alternatives_full.png")`
`#shorter time horizaon`
`plot_data4<-window(plot_data3,start=2016)`
`plot_tibble<-ts_tbl(plot_data4)`
`last_date<-substr(max(plot_tibble\$time),1,7)`
`stc_text<-paste("STATCAN: raw:",canada_lfs_vbls[6],"SA:",canada_lfs_vbls[1],"as of ",last_date,"JCI")`
`sa_plot4<-ggplot(plot_tibble,aes(x=time,y=value,colour=id,group=id))+`
`geom_line(aes(linetype=id))+`
`theme(axis.title.x=element_blank(),`
`legend.title=element_blank(),legend.position="bottom")+`
`labs(title="Comparison of Seasonally Adjusted LFS15+ Employment",`
`subtitle="Methods are STL-LOESS,X13-SEATS and STATCAN-X12",`
`y="Employment (000)",`
`caption=stc_text)`
`ggsave(sa_plot4,file="LFS15_sa_alternatives_short.png")`

The next set of code plots the irregular, trend-cycle and seasonal components using facets so that they are compared on a common scale for each component.

`#add a comparison of irregulars`
`stl_irregular<-remainder(empl_stl_fit)`
`x11_irregular<-irregular(empl_x11_fit)`
`x13_irregular<-irregular(empl_x13_fit)`
`plot_data_5<-cbind(stl_irregular,x11_irregular,x13_irregular)`
`plot_tibble<-ts_tbl(plot_data_5)`
`irr_plot<-ggplot(plot_tibble,aes(x=time,y=value,colour=id,group=id,`
`fill=id,facet=id))+theme(axis.title.x=element_blank(),`
`legend.title=element_blank(),legend.position="bottom")+`
`labs(title="Comparison of Seasonal Adjustment Residuals LFS15+ Employment",`
`subtitle="Methods are STL-LOESS, X11, X13-SEATS",`
`y="Employment (000)",`
`caption=stc_text)+`
`geom_col()+facet_wrap(~id,ncol=1)`
`ggsave(irr_plot,file="LFS15_sa_irregular.png")`
`#add a comparison of trendcycles`
`stl_trendcycle<-trendcycle(empl_stl_fit)`
`x11_trendcycle<-trend(empl_x11_fit)`
`x13_trendcycle<-trend(empl_x13_fit)`
`plot_data_5<-cbind(stl_trendcycle,x11_trendcycle,x13_trendcycle)`
`plot_tibble<-ts_tbl(plot_data_5)`
`trend_plot<-ggplot(plot_tibble,aes(x=time,y=value,colour=id,group=id,`
`fill=id,facet=id))+theme(axis.title.x=element_blank(),`
`legend.title=element_blank(),legend.position="bottom")+`
`labs(title="Comparison of Seasonal Adjustment Trendcycle LFS15+ Employment",`
`subtitle="Methods are STL-LOESS, X11, X13-SEATS",`
`y="Employment (000)",`
`caption=stc_text)+`
`geom_line(aes(linetype=id))+facet_wrap(~id,ncol=1)`
`ggsave(trend_plot,file="LFS15_sa_trendcycle.png")`
`plot_tibble<-ts_tbl(plot_data_5)`
`trend_plot<-ggplot(plot_tibble,aes(x=time,y=value,colour=id,group=id,`
`fill=id))+theme(axis.title.x=element_blank(),`
`legend.title=element_blank(),legend.position="bottom")+`
`labs(title="Comparison of Seasonal Adjustment Trendcycle LFS15+ Employment",`
`subtitle="Methods are STL-LOESS, X11, X13-SEATS",`
`y="Employment (000)",`
`caption=stc_text)+`
`geom_line(aes(linetype=id))`
`ggsave(trend_plot,file="LFS15_sa_trendcycle_oneplot.png")`
`#add a comparison of seasonals`
`#the seasonals must be extracted from the object result from seas`
`stl_seasonal<-seasonal(empl_stl_fit)`
`x11_seasonal<-(empl_x11_fit\$data[,"seasonal"])`
`x13_seasonal<-(empl_x13_fit\$data[,"seasonal"])`
`plot_data_6<-cbind(stl_seasonal,x11_seasonal,x13_seasonal)`
`plot_tibble<-ts_tbl(plot_data_6)`
`seasonal_plot<-ggplot(plot_tibble,aes(x=time,y=value,colour=id,group=id,`
`fill=id,facet=id))+theme(axis.title.x=element_blank(),`
`legend.title=element_blank(),legend.position="bottom")+`
`labs(title="Comparison of Seasonal Adjustment Seasonals LFS15+ Employment",`
`subtitle="Methods are STL-LOESS, X11, X13-SEATS",`
`y="Employment (000)",`
`caption=stc_text)+`
`geom_line(aes(linetype=id))+facet_wrap(~id,ncol=1)`
`ggsave(seasonal_plot,file="LFS15_sa_seasonal.png")`

There are functions to extract the three components for stl but not for the seas function. The latter package includes functions for the first two components only. Therefore, the seasonal portion must be extracted directly from the fit object returned by the seas function.

The next set of code uses the summary function to analyze the set of components.

`#compare the seasonals`
`d1<-cbind(stl_seasonal,x11_seasonal,x13_seasonal)`
`summary(d1)`
`#sappply can be used to apply a function to each column of a data frame`
`#alternatively use one of the map functions from purrr for more flexibility`
`sapply(d1,sd,na.rm=TRUE)`
`d2<-cbind(stl_trendcycle,x11_trendcycle,x13_trendcycle)`
`summary(d2)`
`sapply(d2,sd,na.rm=TRUE)`
`d3<-cbind(stl_irregular,x11_irregular,x13_irregular)`
`summary(d3)`
`sapply(d3,sd,na.rm=TRUE)`

The sapply procedure is used to apply the standard deviation (“sd”) procedure separately to each column in the data frame. If you use apply “sd” to the frame, all 3 columns are combined which is not useful.

The results are not very nicely formatted so the final set of code uses the stargazer package to produce a formatted HTML table of statistics. This is written out to an HTML file.

`library(stargazer)`
`#prepare a sett of summaries with stargazer`
`d_components<-cbind(stl_seasonal,x11_seasonal,x13_seasonal,`
`stl_trendcycle,x11_trendcycle,x13_trendcycle,`
`stl_irregular,x11_irregular,x13_irregular)`
`stargazer(d_components,summary=TRUE,type="html",out="statistics_components.html")`
`d_sa<-cbind(empl_stl_sa,empl_x11_sa,empl_x13_sa,stc_sa)`
`stargazer(d_sa,summary=TRUE,type="html",out="statistics_sa.html")`
`d_sa_2010<-window(d_sa,start=2010)`
`stargazer(d_sa_2010,summary=TRUE,type="html",out="statistics_sa_2010.html")`

If the output html files are opened in a web browser, the material can be copied and pasted to a report. There are other packages for doing nicely formatted summaries, but stargazer is useful for reporting regression results and other data. It can also produce LaTex and ascii. The package echoes its HTML output to the console.

[1] This note has benefited from advice and criticism from Dr. Philip Smith. All responsibility for the errors, omissions and opinions rests with the author Paul Jacobson.

[2] The examples were run using Statistics Canada’s data retrieved as of January 28, 2019.

[6] Alexander Cairncross. (n.d.). AZQuotes.com. Retrieved January 23, 2019, from AZQuotes.com Web site: https://www.azquotes.com/quote/757581