Analyzing Financial Indicators By Country From The OECD With R
In this example, we will use the OECD package to access financial indicators data by country. Because it is a different dataset, the approach must be completely adjusted for differences in nomenclature and organization.
Packages USED
The key package used in this vignette is the oecd package which accesses stats database from the OECD.[1] Key packages include
- dplyr – for processing the tibble
- ggplot2 – part of tidyverse for plotting the results
- OECD – package to access data using OECD stats api.[2]
Examing the Data Structure
The initial stage is to search for available databases.
#OECD RUN
library(OECD)
library(tidyverse)
library(gdata)
#get data set list
dataset_list <- get_datasets()
search_results<-search_dataset("Financial Indicators", data = dataset_list)
print(search_results)
#
data_set<-"FIN_IND_FBS"
search_results %>%
filter(id==data_set)%>%
select(title)->data_set_title
typeof(data_set_title)
data_set_title<-as.character(unlist(data_set_title))
print(data_set_title)
dstruct<-get_data_structure(data_set)
str(dstruct,max.level=1)
dstruct$VAR_DESC
# id description
# 1 LOCATION Country
# 2 INDICATOR Indicator
# 3 TIME Time
# 4 OBS_VALUE Observation Value
# 5 TIME_FORMAT Time Format
# 6 OBS_STATUS Observation Status
# 7 UNIT Unit
# 8 POWERCODE Unit multiplier
# 9 REFERENCEPERIOD Reference period
print(dstruct$INDICATOR)
The code above shows how to inspect the structure of a given data set. In this case, we are looking at the “Financial Indicators – Stocks”. Each dataset in OECD.stat (https://stats.oecd.org) has a different structure. Target databases can be evaluated using the stats.oecd.org web site.
There are a large number of indicators. By inspection, we have chosen:
- Debt of general government, as a percentage of GDP
- Debt of households and NPISHs, as a percentage of GDI
- Private sector debt, as a percentage of GDP
- Total net worth of households and NPISHs, as a percentage of GDI
Retrieving the data
indicator_table<-dstruct$INDICATOR[c(13,15,18,20),]
print(indicator_table)
indicator_vector<-unlist(indicator_table["id"])
indicator_title<-unlist(indicator_table["label"])
#define the countries we are going to chart
target_locations<-c("CAN","USA","FRA","DEU","GBR","DNK","NLD","NZL","AUS","NOR","ESP")
filter_list<-list(target_locations,indicator_vector)
data1<-get_dataset(dataset=data_set,filter=filter_list)
Filters have to be set in the order of variables shown in the variable description above to reduce the data retrieval to manageable proportions. The retrieved dataset is saved as an XLSX file in the default directory to provide a snapshot of data. This is often a requirement when delivering research because data gets revised. The advantage of the XLSX data structure is it is now an open data standard. The openxlsx package creates the file without using any Microsoft or Java libraries. If one wanted only to postprocess the data in R, it would be better to save the data1 tibble using saveRDS, a native R command which saves in a compact binary form. The command readRDS could be used to read the file in,
ibrary(openxlsx)
write.xlsx(data1,file=paste0("OECD_Financial_indicators",Sys.Date(),".xlsx"))
Process and Plot the Data by Indicator
The next set of code is the key processing loop which processes the data1 object by indicator and does 4 plots.
or(this_chart in seq_along(indicator_vector)){
# print(this_chart)
this_indicator<-indicator_vector[this_chart]
this_title<-indicator_title[this_chart]
title_sections<-unlist(strsplit(this_title,",",fixed=TRUE))
print(this_title)
plot_data<-data1 %>%
filter(INDICATOR==this_indicator)%>%
group_by(LOCATION) %>%
arrange(obsTime) %>%
slice(n()) %>%
ungroup()
# print(as.data.frame(plot_data))
max_obs_value<-max(plot_data$obsValue)
cat("maximum is ",max_obs_value,"\n")
text_adjust_factor<-.1*max(abs((plot_data$obsValue)))
target_title="Percentage"
plot1<-ggplot(plot_data,aes(x=LOCATION,y=obsValue,
label=paste(round(obsValue,1),"-",obsTime)))+
labs(y=target_title,x=NULL,
title=title_sections[1],
subtitle=title_sections[2],
caption=paste("OECD:",data_set_title,"JCI",Sys.Date()))+
geom_bar(stat="identity",fill="light green")+
theme(plot.title=element_text(size=12),
plot.subtitle=element_text(size=10),plot.caption=element_text(size=7))+
geom_text(size=2.5,angle=90,aes(y=(sign(obsValue)*
# (text_adjust_factor+abs(obsValue)))))
(1.2*abs(obsValue)))))
ggsave(plot1,file=paste0(this_indicator,".png"),width=8,height=10.5)
#rename the plots
mv(from="plot1",to=paste0(this_indicator,"_plot1"))
}
The plot data is constructed by filtering by indicator and then grouping by location, sorting by time and extracting via the slice operator only the last observation. The title is split into two sections to provide a more readable title and subtitle. This code is naturally dependent on the specific title structure of the indicator. In the plot, labels are constructed from the year and value being plotted.
In this deck, we are not using facets, we are going to manage each plot independently. However, the ggplot function returns the same object each time through the loop. Therefore, at the bottom of the loop, the MV command from the gdata package is used to move the plot1 object to a named object which is constructed from the indicator name.
Grouping the Plots
In this example, the named plots are grouped using gridExtra’s functions to provide the example shown below.
library(gridExtra)
plot_names<-paste0(indicator_vector,"_plot1")
plot_list<-mget(plot_names)
multi_plots<-marrangeGrob(plot_list,ncol=2,nrow=2,top=NULL)
multi_plot_name<-"Debt_summary.png"
ggsave(file=multi_plot_name,width=8,height=10.5,multi_plots)