Accessing the World Bank Data
This note focusses on directly accessing indicators in the database of the World Bank. Appendices in the accompanying PDF contain the log and a log file for a sample run.
Packages Used
The key package used in this vignette is wbstats, which accesses the World Bank RESTful API. The packages used include:
- dplyr – for processing the retrieved tibble
- ggplot2 – part of tidyverse for plotting the results
- wbstats – the retrieval package to be used[1].
- gridExtra – for rendering the table as a graphic and combining it with the plot
Detailed instructions for utilizing the wbstats package are available online.[2]
Retrieving the Data
The process starts with loading the relevant packages. Unlike many other statistical databases, data in the World Bank (WB) system are organized by indicator and by country. The structure of the database is held in a cache, which can be retrieved and processed. The cache contains data frames with the codes and documentation for countries, indicators and other constructs. In this first code segment, the cache is retrieved and the country list is searched by country name to return the country codes, which are the most productive way to select countries. The database requires some searching at the start of a project. For example, as well as an aggregate GINI index prepared by the World Bank, there are others submitted by various countries.
# This routine uses wbstats to access the World Bank database
setwd("D:\\OneDrive\\wb_examples")
library(tidyverse)
library(wbstats)
library(ggplot2)
# Examine the cache
current_cache <- wbcache()
str(current_cache, max.level = 1)
g20_countries <- c("Argentina", "Australia", "Brazil", "Canada", "China", "France", "Germany",
                   "India", "Indonesia", "Italy", "Japan", "Korea, Rep.", "Mexico", "Russian Federation",
                   "Saudi Arabia", "South Africa", "Turkey", "United Kingdom", "United States", "European Union")
# Use str_detect to look at partial strings to get their spelling
filter(current_cache$countries, str_detect(country, "Korea"))
# Keep the rows of the country table that match the G20 list
g20_country_list <- filter(current_cache$countries, country %in% g20_countries) %>%
  select(iso2c, iso3c, country, regionID, region)
# Save the ISO codes for subsequent work
write.csv(g20_country_list, file = "g20_country_list.csv")
In this example, we are searching for the 3-character codes (iso3c) for the G20 countries. The initial list of names did not find Russia or Korea. The str_detect function was used interactively in a filter statement to identify the Koreas as well as Russia; one example of this use is shown in the script. The g20_countries list was then edited to match WB standards. The second filter statement returns the rows in the countries data frame matching the G20 list. These are saved as a CSV file for future work.
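The name-matching step described above can be sketched in base R as follows. This is only an illustration: the small countries data frame is a hypothetical stand-in for current_cache$countries, draft_names is an assumed first-draft list, and grepl is used here as the base R counterpart of str_detect.

```r
# Hypothetical stand-in for current_cache$countries (four rows only).
countries <- data.frame(
  iso3c   = c("CAN", "KOR", "PRK", "RUS"),
  country = c("Canada", "Korea, Rep.", "Korea, Dem. People's Rep.",
              "Russian Federation"),
  stringsAsFactors = FALSE
)

# Assumed first-draft list using everyday names rather than WB spellings.
draft_names <- c("Canada", "Korea", "Russia")

# Names in the draft that have no exact match in the country table.
unmatched <- setdiff(draft_names, countries$country)
print(unmatched)

# Partial-string search for each unmatched name to find the WB spelling.
candidates <- lapply(unmatched, function(nm) {
  countries$country[grepl(nm, countries$country, fixed = TRUE)]
})
names(candidates) <- unmatched
print(candidates)
```

Listing the unmatched names first shows which entries need a partial-string search, and the candidates list shows the spellings to copy back into the name list.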
The next code segment looks for the appropriate indicator code, the indicatorID, to retrieve the data. Of course, if the indicatorID and iso3c list have been prepared externally by using the web site or in a previous run, the wbstats process can start directly with the wb function to retrieve the data.
# Now look for GINI by experiment; this returns 1 row
gini_indicator <- filter(current_cache$indicators, str_detect(indicator, "GINI"))
print(data.frame(gini_indicator))
# Retrieve the indicator for the G20 country codes
gini_data <- wb(country = g20_country_list$iso3c, indicator = gini_indicator$indicatorID)
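The search on "GINI" is case-sensitive, so an indicator whose name is spelled "Gini" would be missed. If a search returns more than one row, it is safer to pick the indicatorID explicitly before retrieving data. The sketch below illustrates both points in base R. The indicators data frame is a hypothetical stand-in for current_cache$indicators, and the row with the ID XX.HYPOTHETICAL.GINI is invented for illustration.

```r
# Hypothetical stand-in for current_cache$indicators.
indicators <- data.frame(
  indicatorID = c("SI.POV.GINI", "XX.HYPOTHETICAL.GINI", "NY.GDP.MKTP.CD"),
  indicator   = c("GINI index (World Bank estimate)",
                  "Gini coefficient (illustrative row)",
                  "GDP (current US$)"),
  stringsAsFactors = FALSE
)

# Case-sensitive search, as in the script: matches "GINI" only.
hits_cs <- indicators[grepl("GINI", indicators$indicator, fixed = TRUE), ]

# Case-insensitive search: also matches "Gini".
hits_ci <- indicators[grepl("gini", indicators$indicator, ignore.case = TRUE), ]
print(hits_ci)

# With several hits, choose the wanted indicator code explicitly.
gini_id <- hits_ci$indicatorID[hits_ci$indicatorID == "SI.POV.GINI"]
print(gini_id)
```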
There are many variables in the gini_data data frame, including the date, country codes and other information. In the next code segment, the g20_country_list is simplified to a smaller list with only the code and the region for later merging with the plot data. The gini_data is then grouped by country, sorted by date and reduced to the last observation for each country. The data cannot be selected by a single date because of the ragged edge: the most recent year available differs from country to country, as will be seen below. The dplyr slice verb is used to select the last observation (the nth one) for each country. This works because of the grouping and sorting.
# The next statement groups by country, sorts by year, and takes the last observation in each group,
# creates a country_year variable comprised of the country code and last year,
# and joins to the country tibble to get the region
code_region <- select(g20_country_list, iso3c, region)
gini_data_last <- group_by(gini_data, country) %>%
  arrange(date) %>%
  slice(n()) %>%
  mutate(country_year = paste0(iso3c, "-", date)) %>%
  left_join(code_region, by = c("iso3c"))
write.csv(gini_data_last, file = "g20_wb_gini_index.csv")
A mutate command is used to join the 3-character country code and the date of the last observation for labelling the x axis in the plot. The mutate command adds this column to the gini_data_last tibble. A left_join is used to match the code_region tibble to the retrieved data to add the region as a column.
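The group, sort, slice, mutate and join sequence can be tried on a small example before running it on retrieved data. The sketch below uses made-up values, not World Bank data; the names toy_data, toy_regions and toy_last, and the codes AAA and BBB, are hypothetical. The final ungroup() is an optional addition that is not in the script.

```r
library(dplyr)

# Made-up values for illustration only; the last year differs by country.
toy_data <- tibble(
  iso3c   = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  country = c("Country A", "Country A", "Country A", "Country B", "Country B"),
  date    = c("2014", "2016", "2015", "2012", "2010"),
  value   = c(30, 32, 31, 40, 41)
)
toy_regions <- tibble(iso3c = c("AAA", "BBB"),
                      region = c("Region 1", "Region 2"))

toy_last <- toy_data %>%
  group_by(country) %>%
  arrange(date) %>%                                   # sort rows by date; grouping is kept
  slice(n()) %>%                                      # last row within each country group
  mutate(country_year = paste0(iso3c, "-", date)) %>% # label for the x axis
  left_join(toy_regions, by = "iso3c") %>%            # add the region column
  ungroup()

print(toy_last)
```

Each country keeps only its most recent row, so the labels carry different years, which is the ragged edge in miniature.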
The next code segment produces a bar chart with the country_year variable as the x-axis labels, plotted at a 45-degree angle. The bars are filled with different colours based on region, and the legend is labelled accordingly.
# Start the plot
gini_plot1 <- ggplot(gini_data_last, aes(x = country_year, y = value, fill = region)) +
  geom_bar(stat = "identity") +
  labs(title = gini_indicator$indicator, subtitle = "G20 Countries", y = "GINI Value",
       caption = "Source: World Bank", x = "Country by Indicator Date") +
  scale_fill_discrete(name = "Region") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggsave(filename = "world_bank_gini_g20.png", plot = gini_plot1)
The resulting plot is shown below.
It is often useful to produce a tabular presentation of data as a graphic. This facilitates inclusion in MS Word documents or on web sites. The next code segment uses gridExtra to produce a graphic object (grob) which looks like a table.[3]
# Now start a table
library(gridExtra)
gini_table_data <- select(gini_data_last, iso3c, region, country, date, value) %>%
  arrange(iso3c)
colnames(gini_table_data) <- c("Code", "Region", "Country", "Date", "Gini")
# grid table is documented at
table_plot <- tableGrob(gini_table_data)
ggsave(filename = "world_bank_gini_table_chart.png", plot = table_plot)
The resulting table plot is shown in a graphic below.
It is often useful to combine a plot and a table in one graphic for the web or a report. The next code segment uses marrangeGrob to combine the plot and the table graphic into a single graphic. This works because tableGrob and ggplot2 are both built on the grid package and produce graphic objects (grobs) which can be merged.
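# Collect the plot and the table grob into a named list
plot_list <- mget(c("gini_plot1", "table_plot"))
# Arrange them side by side without a page title
two_plots <- marrangeGrob(plot_list, ncol = 2, nrow = 1, top = NULL)
two_plot_name <- "gini_chart_table.png"
ggsave(filename = two_plot_name, plot = two_plots, width = 15, height = 8)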
The resulting graphic is shown below.
This combination approach is often useful when displaying information on the web.
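The combination step can also be tried on a small example. The sketch below uses made-up values, not World Bank data, and the names toy, toy_plot, toy_table and combined are hypothetical. It swaps in arrangeGrob, the single-page counterpart of marrangeGrob in gridExtra, and geom_col, which draws the same bars as geom_bar(stat = "identity"). The rows = NULL argument is an optional choice that drops the row-name column from the table.

```r
library(ggplot2)
library(gridExtra)

# Made-up values for illustration only.
toy <- data.frame(label = c("AAA-2016", "BBB-2012"), value = c(32, 40))

# A bar chart and a table grob built from the same data.
toy_plot  <- ggplot(toy, aes(x = label, y = value)) + geom_col()
toy_table <- tableGrob(toy, rows = NULL)

# Convert the ggplot to a grob explicitly, then place both side by side.
combined <- arrangeGrob(ggplotGrob(toy_plot), toy_table, ncol = 2)
```

The combined object can then be passed to ggsave through its plot argument, in the same way as two_plots in the script.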
[1] https://www.rdocumentation.org/packages/wbstats/versions/0.2
[2] https://cran.r-project.org/web/packages/wbstats/vignettes/Using_the_wbstats_package.html
[3] https://cran.r-project.org/web/packages/gridExtra/vignettes/tableGrob.html