GDP Growth Decomposition
In this example, we will use the get_cansim function from Mountain Math to retrieve the monthly GDP $K data. The Laspeyres time series (fix weighted) will be used to analyze the relative role that major aggregates played in the monthly results for total gdp growth. The meta data now available in the NDM will be used to flexibly select the aggregates to be analyzed.
Packages USED
The key package used in this vignette is the cansim package which access the Statcan NDM api. Key packages include
- dplyr – for processing the tibble prior to conversion to xts
- ggplot2 – part of tidyverse for plotting the results
- cansim – package to retrieve metadata and series from Statistics Canada’s NDM
- grid, gridExtra, gtable – for developing a table of the growth rates to be plotted.
Retrieving the DATA
The first major step is to use the get_cansim function to retrieve the complete table for analysis. This table is saved using the saveRDS function in R to the local project folder. This facilitates documentation as well as easy processing. A flag is used to determine whether to retrieve a fresh copy of the table for processing.
library(cansim)
library(tidyverse)
library(lubridate)
#get gdp table and save it in the working directory
gdp_table_id<-"36-10-0434"
file_refresh<-TRUE
file_refresh<-FALSE
save_file<-paste0(gdp_table_id,".rds")
if(!file.exists(save_file)|file_refresh){
gdp_table<-get_cansim(gdp_table_id)
#now save it in the working directory
saveRDS(gdp_table,file=save_file)
} else {
gdp_table<-readRDS(save_file)
}
print(colnames(gdp_table))
#The current column names are
# 1 REF_DATE
# 2 GEO
# 3 DGUID
# 4 Seasonal adjustment
# 5 Prices
# 6 North American Industry Classification System (NAICS)
# 7 UOM
# 8 UOM_ID
# 9 SCALAR_FACTOR
# 10 SCALAR_ID
# 11 VECTOR
# 12 COORDINATE
# 13 VALUE
# 14 STATUS
# 15 SYMBOL
# 16 TERMINATED
# 17 DECIMALS
# 18 GeoUID
# 19 Classification Code for Seasonal adjustment
# 20 Hierarchy for Seasonal adjustment
# 21 Classification Code for Prices
# 22 Hierarchy for Prices
# 23 Classification Code for North American Industry Classification System (NAICS)
# 24 Hierarchy for North American Industry Classification System (NAICS)
#now we will rename columns as needed. We can use numbers for the original columns
The interactive analysis of the columns of the retrieved table should always be an initial step in the process of script development. The column numbers are shown in the R comments. Many are long enough to be unwieldly when trying to write easy to manage code. As noted before, column names with blanks or special characters should be enclosed with back ticks (`) when using dplyr and other tidyverse packages. The file_refresh flag should be set to TRUE to force re-retrieval of the table.
In the next stage the table will be simplified, columns to be used will be renamed as required. The column number is used to simplify the renaming process. An r-standard date will also be created. Columns are also selected using their numbers rather than names for simplicity.
#the working matrix with be set to reduced columns
#an R standard date is also created
gdp_monthly<-gdp_table %>%
rename(`sadj_type`=4,
`NAICS`=6,
`price_type`=21,
`naics_code`=23
) %>%
select(c(1,2,4:6,12,13,23)) %>%
mutate(data_date=as.Date(paste0(REF_DATE,"-1")))
#gdp table could be removed at this point
The next stage is to start the processing to develop the data for the initial and subsequent charts. We will use selected special aggregates as well as a few key service series. These series will be defined using the NAICS code in the meta data provided by the get_cansim retrieval. This variable, defined as naics_code in the rename above is easier to work with than the NAICS text strings for selecting series. In the code below, the text strings are used to select the seasonal adjustment type required and the price measure. For this analysis, we will be using the 2007 fixed weighted Laspeyres measures. The latter are referred to as “2007 constant”. The select verb in the code above uses column numbers rather than variable names for coding convenience.
chart1 - composition of growth special aggregates
target_aggregates<-c("[T001]",
"[T002]",
"[T003]",
#"[T004]",
"[T005]",
"[T006]",
"[T007]",
"[T011]",
"[T012]",
"[T016]",
"[T018]",
"[41]",
"[44-45]",
"[52]","[53]"
)
#build tibble of distinct naics_codes
naics_table<-select(gdp_monthly,NAICS,naics_code)%>%distinct()
#calculate last date
last_date<-max(gdp_monthly$data_date)
date_vector_1<-c(last_date-months(4),last_date-months(2),last_date-months(1),last_date)
#get unique_prices and the string for constant dollars
price_measures<-unique(gdp_monthly$Prices)
laspeyres_flag<-price_measures[grep("constant",price_measures)]
#get_flag for seasonally adjusted at annual rates
seasonal_measures<-unique(gdp_monthly$sadj_type)
seasonal_flag<-seasonal_measures[grep("annual",seasonal_measures)]
#get the data for the growth table with the lag required for percentage change
In the initial analysis, the naics table calculated here was examined for the target aggregate codes defined in the character array in the code above. The next set of code develops the main table for processing. Data are selected only for the required months, in this case, the last 4 months are used since we are going to report 3 months in one of the tables.
The data must be grouped by the naics_code and the COORDINATE then sorted (“arranged”) by date so that lags, percentage changes and contributions can be calculated. The COORDINATE variable preserves ordering.
growth_data<-filter(gdp_monthly,data_date %in% date_vector_1,
Prices==laspeyres_flag,sadj_type==seasonal_flag) %>%
group_by(COORDINATE,naics_code) %>%arrange(data_date)%>%
mutate(monthly_change=100*((VALUE/lag(VALUE,1))-1),
delta_change=VALUE-lag(VALUE,1))%>%ungroup()
head(as.data.frame(growth_data))
The head function uses a data.frame to force the display of the initial rows of all columns for the tibble. The monthly percentage change as well as the delta change (level difference) are calculated. The latter is required for the composition of growth calculations below.
The next set of code defines a tibble for the total all-industries sector. The ratio of the delta change in the target sectors to that of the total sector defines the composition of change. The latter calculation is part of the calculation of the full_table which involves a left_join of the growth_data table to the total table.
#now select the total only
total_only<-filter(growth_data,naics_code=="[T001]")%>%select(data_date,monthly_change,delta_change)%>%
rename(total_change=delta_change,total_pch=monthly_change)
#now the base table and total to calculate the composition of change
full_table<-left_join(growth_data,total_only,by="data_date")%>%
filter(!is.na(delta_change)) %>%
mutate(point_contribution=((delta_change/total_change)*total_pch))
The point contribution is defined as the change share of the percentage change for the total series.
The next code selects the data for just the target aggregates and for the key variables required in the subsequent charts.
#select the chart data for the last months
chart_data_set<-filter(full_table,naics_code %in% target_aggregates) %>%
select(data_date,naics_code,NAICS,point_contribution,monthly_change)
#now plot the last monthly_change
#get_industries
# naics_text<-as.data.frame(filter(naics_table,naics_code %in% target_aggregates)%>%
# select(NAICS))
reverse_index<-seq(length(target_aggregates),1,by=-1)
chart_last_month<-filter(chart_data_set,data_date==last_date)%>%
mutate(naics_text=paste(naics_code,NAICS))
The next code starts the initial chart which is only for the last date.
plot1<-ggplot((chart_last_month),aes(y=point_contribution,x=naics_code,group=data_date,
label=round(point_contribution,2)))+
labs(title="Percentage Point Contribution to Total GDP Change",
caption=paste("NDM:",gdp_table_id,"JCI"),
x=NULL,y="Percentage point contribution",
subtitle=paste("Laspeyres data for selected aggregates for",substr(last_date,1,7)))+
geom_bar(stat="identity",fill="light green")+
geom_text(size=3)+scale_x_discrete(limits=chart_last_month$naics_code[reverse_index],
labels=(chart_last_month$NAICS[reverse_index]))+coord_flip()
ggsave(plot1,file="gdp_laspeyres_contribution.png")
The coordinates of the chart are flipped so that the x axis is plotted on the vertical portion of the chart. Data points are labeled with the rounded value of the point contribution. The X axis is labelled with the text even though the plot is done with the naics_code variable. The labelling is reversed so that the all industries is at the top of the chart. The date is turned into a text string for the purposes of the subtitle. An example of this initial chart is shown below.
The next stage creates an alternative chart to show the data for two months. The data must be grouped to separate it for preparing the bars. The default bar chart is a stacked bar so the “dodge” capability is used to place the bars side by side.
#now lets do it for 2 months
chart_last_two<-filter(chart_data_set,data_date %in% c(last_date,last_date-months(1)))%>%
mutate(naics_text=paste(naics_code,NAICS))
#calculate an adjustment factor to move the text outside the bar
#R considers a date object as a potentially contiguous variabble
#therefore, we convert it to a discrete text string for the group and fill values
text_adjust_factor<-.1*max(abs(chart_last_two$point_contribution))
plot2<-ggplot((chart_last_two),aes(y=point_contribution,x=naics_code,group=substr(data_date,1,7),
fill=substr(data_date,1,7),label=round(point_contribution,2)))+
labs(title="Percentage Point Contribution to Total GDP Change",
caption=paste("NDM:",gdp_table_id,"JCI"),
x=NULL,y="Percentage point contribution",
subtitle="Laspeyres data for selected aggregates")+
geom_bar(stat="identity",position="dodge")+
scale_fill_discrete(name="Month")+
geom_text(size=3,position=position_dodge(0.9),aes(y=(sign(point_contribution)*
(text_adjust_factor+abs(point_contribution)))))+
theme(legend.title=element_blank(),legend.position="bottom")+
scale_x_discrete(limits=chart_last_two$naics_code[reverse_index],
labels=(chart_last_two$NAICS[reverse_index]))+coord_flip()
ggsave(plot2,file="gdp_laspeyres_contribution_2mon.png")
The dodge position is specified in the geom_bar definition. A date construct in R is considered to be a continuous variable because it is represented as number. Therefore, group and fill variable definitions convert the date to text string which is, by definition, discrete. The group defines the separation of the bars. The fill defines the colour choices from the default set for the bars. A text adjustment factor is calculated relative to the maximum of the values plotted to facilitate the positioning of the value labels above the end of the bars. This is used in the geo_text definition which supplies a text layer on top of the bar layer from geom_bar. The theme function is used to move the legend from the default side position to the bottom and remove the superfluous title. An example of the chart is inserted below.
The final chart example uses data for the last 3 months and is defined similarly.
#last three
chart_last_three<-filter(chart_data_set,data_date %in% c(last_date,last_date-months(1),last_date-months(2)))%>%
mutate(naics_text=paste(naics_code,NAICS))
#calculate an adjustment factor to move the text outside the bar
#R considers a date object as a potentially contiguous variabble
#therefore, we convert it to a discrete text string for the group and fill values
text_adjust_factor<-.1*max(abs(chart_last_three$point_contribution))
plot3<-ggplot((chart_last_three),aes(y=point_contribution,x=naics_code,group=substr(data_date,1,7),
fill=substr(data_date,1,7),label=round(point_contribution,2)))+
labs(title="Percentage Point Contribution to Total GDP Change",
caption=paste("NDM:",gdp_table_id,"JCI"),
x=NULL,y="Percentage point contribution",
subtitle="Laspeyres data for selected aggregates")+
geom_bar(stat="identity",position=position_dodge(0.9),width=.8)+
scale_fill_discrete(name="Month")+
geom_text(size=2.5,position=position_dodge(0.9),aes(y=(sign(point_contribution)*
(text_adjust_factor+abs(point_contribution)))))+
theme(legend.title=element_blank(),legend.position="bottom")+
scale_x_discrete(limits=chart_last_two$naics_code[reverse_index],
labels=(chart_last_two$NAICS[reverse_index]))+coord_flip()
ggsave(plot3,file="gdp_laspeyres_contribution_3month.png")
In the two multi-date charts, the geom_text function utilizes an aes (“aesthetics”) specification to supply a y-axis (value) position which is calculated by adding the text adjustment to the absolute value of the point being plotted and then applying the original sign. In this three-date version, the width option in geom_bar is used to slightly narrow the width to provide a bit of space between the bars. A sample chart is shown below.
The final stage in this script is to develop a table that can be plotted with the plotting software. GGPLOT2 is based on the grid package. Several extensions to the grid package, gridExtra and gtable are used to provide a title to the standard table grob (graphic object). The initial step is to develop a tibble which includes a modified character version of the date, converts the naics_text into a factor to preserve order in the table and reduces the dataset to only the columns required, the sector text, the date and the contribution value. The tidyverse spread function is used to create a data frame with columns for each date.
#now build up a table for a chart table
# the date is shortened and the text is converted to factor to preserve order
table_data<-mutate(chart_last_three,date_text=substr(data_date,1,7),
row_text=factor(naics_text,levels=unique(naics_text)),contribution=round(point_contribution,2))%>%
select(row_text,date_text,contribution)%>%
rename(Sectors=row_text)
#now use spread to create a data frame with dates as columns
table_frame<-spread(table_data,key=date_text,value=contribution)
The final segment creates the actual table graphic and finally combines it with the third version of the charts above. The default tableGrob creates a nice table but with centered values. For this purpose, we want to right justify everything. This is done with a modification to the default theme. Then gtable commands are used to put a title at the top of the table and a footnote at the bottom. A link is supplied in the comments to a stack overflow discussion of modifying a table grob in this way. The comments explain what is going on.
library(gridExtra)
t1<-ttheme_default()
#refer to the vignette documentation
tt2 <- ttheme_default(core=list(fg_params=list(hjust=1, x=0.9)),
rowhead=list(fg_params=list(hjust=1, x=0.95)))
grob1<-tableGrob(table_frame,rows=NULL,theme=tt2)
#now add a title based on a stack overflow discussion
library(grid)
library(gtable)
grob_title<-textGrob("Percentage Contributions to Laspeyres Growth Rate",
gp=gpar(fontsize=15))
padding<-unit(.5,"line")
#create a new grob with room for the title at top (pos=0)
full_grob<-gtable_add_rows(grob1,heights=grobHeight(grob_title)+padding,pos=0)
full_grob<-gtable_add_grob(full_grob,list(grob_title),t=1,l=1,r=ncol(full_grob))
grob_footnote<-textGrob(paste("STATCAN NDM: ",gdp_table_id,"JCI"),
gp=gpar(fontsize=8))
full_grob<-gtable_add_rows(full_grob,heights=grobHeight(grob_footnote)+padding,
pos=nrow(full_grob))
full_grob<-gtable_add_grob(full_grob,list(grob_footnote),t=nrow(full_grob),l=2,r=ncol(full_grob))
plot(full_grob)
ggsave(full_grob,file="gdp_contributions_table_3months.png")
plot_list<-mget(c("plot3","full_grob"))
two_plots<-marrangeGrob(plot_list,ncol=2,nrow=1,top=NULL)
two_plot_name<-"gdp_contributions_analysis.png"
ggsave(file=two_plot_name,width=15,height=8,two_plots)
A grob is just a special data frame with instructions for plotting. There is one row for each row of the table and the code simply adds a row at the top and later at the bottom. A graphic version of the title and footnote are inserted in the appropriate position.
The table example is shown below.
An example of the merged graphic with chart 3 and the table is shown below.
The important point in the analysis is to show if there is significant variability in the sources of growth for aggregate GDP in recent months.