The vndata and itagdp datasets
Daniele Girolimetto
2024-06-07
Source:vignettes/articles/Dataset-vndata-and-itagdp.qmd
vndata: Grouped monthly time series
The Australian Tourism Demand dataset (Girolimetto et al., 2024; Wickramasuriya et al., 2018) measures the number of nights Australians spent away from home. It includes 228 monthly observations of Visitor Nights (VNs) from January 1998 to December 2016, and has a cross-sectional grouped structure based on a geographic hierarchy crossed with purpose of travel. The monthly bottom time series are available at robjhyndman.com/data/TourismData_v3.csv.
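# Packages assumed by the code in this article (the original does not show
# them; in particular, autoplot() for ts objects is assumed to come from forecast)
library(FoReco)     # vndata, vnaggmat, itagdp, gdpconsmat
library(forecast)   # autoplot() for time series
library(ggplot2)    # theme_minimal(), ggplot()
library(gridExtra)  # marrangeGrob()
library(reshape2)   # melt()
# Dataset
data(vndata)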
str(vndata)
#> Time-Series [1:228, 1:525] from 1998 to 2017: 45151 17295 20725 25389 20330 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : NULL
#> ..$ : chr [1:525] "Total" "A" "B" "C" ...
# Aggregation matrix
data(vnaggmat)
str(vnaggmat)
#> int [1:221, 1:304] 1 1 0 0 0 0 0 0 1 0 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:221] "Total" "A" "B" "C" ...
#> ..$ : chr [1:304] "AAAHol" "AAAVis" "AAABus" "AAAOth" ...
id_bts <- colnames(vnaggmat)[round(seq(1, NCOL(vnaggmat), length.out = 10))]
states <- c("NSW", "VIC", "QLD", "SA", "WA", "TAS", "NT")
colnames(vndata)[colnames(vndata) %in% LETTERS[1:7]] <- states
marrangeGrob(list(autoplot(vndata[, "Total"], y = NULL,
                           main = "Australia (total)") + theme_minimal(),
                  autoplot(vndata[, states], y = NULL,
                           main = "States") + theme_minimal(),
                  autoplot(vndata[, c("Hol", "Vis", "Bus", "Oth")], y = NULL,
                           main = "Purpose of travel") + theme_minimal(),
                  autoplot(vndata[, id_bts], y = NULL,
                           main = "Bottom time series") + theme_minimal()),
             top = NULL, nrow = 2, ncol = 2)
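The role of an aggregation matrix such as vnaggmat can be illustrated on a much smaller grouped structure. The sketch below is a toy example, not the actual tourism data: the two states (A, B), the two purposes (Hol, Bus), and all object names (toy_agg, toy_bts, toy_upper, toy_all) are hypothetical, and the bottom series are simulated.

```r
# Toy grouped structure (hypothetical): 2 states (A, B) crossed with
# 2 purposes (Hol, Bus) -> 4 bottom series and 5 upper series.
bts_names <- c("AHol", "ABus", "BHol", "BBus")
toy_agg <- rbind(
  Total = c(1, 1, 1, 1),
  A     = c(1, 1, 0, 0),
  B     = c(0, 0, 1, 1),
  Hol   = c(1, 0, 1, 0),
  Bus   = c(0, 1, 0, 1)
)
colnames(toy_agg) <- bts_names

# Simulated bottom-level data: 6 periods x 4 series
set.seed(1)
toy_bts <- matrix(rpois(6 * 4, lambda = 100), nrow = 6,
                  dimnames = list(NULL, bts_names))

# Each upper series is a linear combination (here a sum) of bottom series
toy_upper <- toy_bts %*% t(toy_agg)

# Full data matrix: upper series first, followed by the bottom ones
toy_all <- cbind(toy_upper, toy_bts)
dim(toy_all)
```

Each row of the aggregation matrix marks which bottom series add up to one upper series, so the number of rows plus the number of columns gives the total number of series (here 5 + 4 = 9, in the same way as 221 + 304 = 525 for vnaggmat).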
The geographic hierarchy comprises 7 states, 27 zones, and 76 regions, for a total of 111 nested geographic divisions once the national total is included. Six of these zones are each formed by a single region, resulting in 105 unique nodes in the hierarchy. The purpose of travel comprises four categories: holiday, visiting friends and relatives, business, and other. To avoid redundancies (Di Fonzo & Girolimetto, 2024), the duplicated nodes generated by the six single-region zones (including the 24 obtained by crossing them with the four purposes of travel) are not considered, resulting in an unbalanced hierarchy of 525 unique nodes instead of the theoretical 555 with duplicated nodes. The monthly data can be temporally aggregated into periods of two, three, four, six, or twelve months (\(\mathcal{K}=\{12, 6, 4, 3, 2, 1\}\)).
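The node counts and the temporal aggregation orders can be checked with a few lines of base R. This is a minimal sketch under stated assumptions: the count of 111 geographic divisions is assumed to include the national total, the monthly series x is simulated rather than taken from vndata, and non-overlapping sums are assumed as the temporal aggregation rule. All object names are hypothetical.

```r
# Geographic nodes: national total + states + zones + regions (assumption:
# the 111 divisions quoted in the text include the national total)
n_geo_full   <- 1 + 7 + 27 + 76      # 111
n_geo_unique <- n_geo_full - 6       # 105: six zones coincide with a region
n_purpose    <- 1 + 4                # all purposes + four categories

n_nodes_full   <- n_geo_full * n_purpose     # 555 nodes with duplicates
n_nodes_unique <- n_geo_unique * n_purpose   # 525 unique nodes
n_bottom       <- 76 * 4                     # 304 region-by-purpose series

# Temporal aggregation orders: the divisors of the seasonal period m = 12
m <- 12
k_set <- rev(which(m %% seq_len(m) == 0))    # 12 6 4 3 2 1

# Non-overlapping temporal aggregation of a simulated monthly series
set.seed(1)
x <- ts(rpois(228, lambda = 100), start = c(1998, 1), frequency = m)
x_agg <- lapply(k_set, function(k) aggregate(x, nfrequency = m / k, FUN = sum))
names(x_agg) <- paste0("k", k_set)

# Each order k leaves 228 / k observations
sapply(x_agg, length)
```

With 228 monthly observations, every order in \(\mathcal{K}\) divides the sample exactly, so no observation is discarded when aggregating.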
itagdp: General linearly constrained multiple quarterly time series
The National Accounts are a coherent and consistent set of
macroeconomic indicators used mostly for economic research and
forecasting, policy design, and coordination mechanisms. Within this
framework, GDP is a key macroeconomic quantity that is measured using
three main approaches, namely output (or production), income, and
expenditure. Each of these parallel systems has a well-defined
hierarchical structure of variables with relevant economic significance,
such as Final consumption on the expenditure side, Gross operating
surplus and mixed income on the income side, and Total gross value added
on the output side. In the EU countries, the data are processed on the
basis of the ESA 2010 classification and are released by Eurostat. The
dataset itagdp
(https://ec.europa.eu/eurostat/web/national-accounts/)
contains the Italian Gross Domestic Product (GDP) at current prices
(in euro), with time series spanning the period 2000:Q1-2019:Q4.
# Dataset
data(itagdp)
str(itagdp)
#> Time-Series [1:80, 1:21] from 2000 to 2020: 290847 309761 301049 339856 307934 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : NULL
#> ..$ : chr [1:21] "GDP" "D1" "P3_P5" "P3" ...
autoplot(itagdp, y = NULL) + theme_minimal()
The output, income, and expenditure approaches are represented as three different hierarchies that share the same top-level series (\(GDP\)), but not the bottom-level series.
# Plot a matrix with entries in {-1, 0, 1} as a grid of coloured tiles
plot_mat <- function(mat, font_size = 8, caption_label = NULL){
  melt(mat) |>
    ggplot(aes(x = Var2, y = Var1)) +
    geom_tile(aes(fill = as.character(value)), color = "grey") +
    scale_fill_manual(values = c("-1" = "red", "0" = "white", "1" = "black")) +
    labs(x = NULL, y = NULL, title = NULL) +
    scale_y_discrete(limits = rev) +
    scale_x_discrete(position = "top") +
    labs(caption = caption_label) +
    coord_fixed() +
    theme_void() +
    theme(axis.text.x = element_text(size = font_size, vjust = 0.5, angle = 90, hjust = 0),
          axis.text.y = element_text(size = font_size, hjust = 1),
          plot.caption = element_text(size = rel(0.9), hjust = 0.5, face = "bold"),
          legend.position = "none")
}
marrangeGrob(list(plot_mat(incside$agg_mat, caption_label = "Income hierarchy"),
                  plot_mat(outside$agg_mat, caption_label = "Output hierarchy"),
                  plot_mat(expside$agg_mat, caption_label = "Expenditure hierarchy")),
             top = NULL, layout_matrix = matrix(c(1, 2, 3, 3, 3, 3, 3, 3), 2, 4))
The complete \((9 \times 21)\) zero-constraints matrix, which encompasses the output, expenditure, and income sides, is represented in the following figure.
rownames(gdpconsmat)[c(1,2,4)] <- c("GDP[O]", "GDP[I]", "GDP[E]")
rownames(gdpconsmat)[-c(1,2,4)] <- colnames(gdpconsmat)[2:7]
plot_mat(gdpconsmat)
A linear combination matrix for a general linearly constrained multiple time series may be constructed (Girolimetto & Di Fonzo, 2023).
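The idea of a zero-constraints matrix, and of a linear combination matrix derived from it, can be sketched on a toy system in base R. The example is hypothetical and is not the itagdp structure: a variable X is measured from two sides that share only the top series (X = A + B and X = C + D), the data are simulated, and the linear combination matrix shown is just one possible choice written by hand, not the output of any package routine.

```r
# Hypothetical toy system: X is measured from two sides sharing only the top
#   side 1: X = A + B      side 2: X = C + D
vars <- c("X", "A", "B", "C", "D")
toy_cons <- rbind(
  "X[1]" = c(1, -1, -1,  0,  0),
  "X[2]" = c(1,  0,  0, -1, -1)
)
colnames(toy_cons) <- vars

# Simulated coherent data (8 periods)
set.seed(1)
A <- rnorm(8, mean = 50)
B <- rnorm(8, mean = 30)
C <- rnorm(8, mean = 20)
X <- A + B
D <- X - C
toy_y <- cbind(X, A, B, C, D)

# Coherent observations satisfy toy_cons %*% y_t = 0 in every period t
cons_err <- max(abs(toy_y %*% t(toy_cons)))

# One possible linear combination matrix: the "constrained" series (X, C)
# written in terms of the "free" series (A, B, D)
toy_lc <- rbind(X = c(1, 1, 0), C = c(1, 1, -1))
colnames(toy_lc) <- c("A", "B", "D")
lc_err <- max(abs(toy_y[, c("X", "C")] - toy_y[, c("A", "B", "D")] %*% t(toy_lc)))

c(cons_err = cons_err, lc_err = lc_err)   # both zero up to floating-point error
```

As in the figure produced by plot_mat(), each row of the constraint matrix contains only the values -1, 0, and 1, and the two rows share the column of the common top-level series.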