vignettes/articles/FoReco_prob.Rmd
FoReco_prob.Rmd
Probabilistic forecast reconciliation is a statistical procedure for
generating coherent and accurate forecasts across multiple time series.
FoReco
is an R package that provides an implementation of
this technique, using the methods proposed by Panagiotelis et al. (2022)
and Girolimetto et al. (2023). The package is flexible and can be used
to perform reconciliation on cross-sectional, temporal, and
cross-temporal data. In particular we can calculate both point and
probabilistic forecasts, and allows for the specification of other
constraints (like non-negativity issues) to ensure that the forecasts
are consistent with each other. In this vignette, we will walk through
the basics of using FoReco
for probabilistic forecast
reconciliation.
To get started, we need to load the necessary libraries and set a sample size for the probabilistic forecasts sample:
library(forecast) # the data and the forecasting function
library(MASS) # simulate from a multivariate normal distribution
library(FoReco026) # bootstrap and reconciliation phase
B <- 1000 # Sample size for the probabilistic forecasts sample
Let’s take a look at how FoReco
can be used for
cross-sectional reconciliation. In this scenario, we use the UKLungDeaths
dataset containing the monthly deaths from bronchitis, emphysema, and
asthma in the UK from 1974 to 1979 for both sexes
(Total), males (Male) and females
(Female). We want to generate coherent forecasts for
these variables, such that \(Total = Male +
Female\) at any time.
We start by arranging the data frame of deaths, the aggregation
matrix, and the constraint matrix. Then, we generate the base forecasts
with ETS model using the functions ets()
and forecast()
from the package forecast.
# Cross-sectional setup
tdeaths <- mdeaths + fdeaths
agg_mat <- t(c(1,1))
cons_cs <- cbind(1,-agg_mat)
lungDeaths <- cbind(tdeaths, mdeaths, fdeaths)
fit <- apply(lungDeaths, 2, function(x) ets(ts(x, frequency = frequency(lungDeaths))))
forecast_obj <- lapply(fit, forecast, h=12) # forecast object
base <- sapply(forecast_obj, function(x) x$mean) # base mean (point forecasts)
res <- sapply(fit, residuals, type='response') # in-sample residuals (one-step)
We know that a sample from the reconciled distribution can be obtained by reconciling each member of a sample from the incoherent distribution (Panagiotelis et al., 2022). This finding enables us to distinguish the process of generating the base forecast samples from the reconciliation phase. To generate an incoherent sample, we can use a non-parametric (such as the joint block bootstrap) or a parametric approach (by assuming gaussianity).
We can implement the block bootstrap method proposed by Panagiotelis
et al. (2022) using the boot_cs()
function. This function
requires three inputs: the fit parameter, which is a
list of models (e.g., the three ETS models used to forecast); the
boot_size parameter, which specifies the number of
bootstrap replications; h, which denotes the block
size, typically equivalent to the forecast horizon. It is important to
note that the models used have the simulate()
function available and implemented similarly to the package forecast
(Hyndman et al. 2023), with the following mandatory parameters:
object, innov,
future, and nsim. The outputs of the
function are the seed used to sample the errors, and a 3-d array (\(\mathrm{boot\_size} \times n \times h\)),
which can be reconciled using the htsrec()
function.
# Base forecasts' sample:
# we simulate from the base models by sampling errors
# while keeping the cross-sectional dimension fixed.
base_csjb <- boot_cs(fit, B, 12)$sample
norm(apply(base_csjb, 3, function(x) x%*%t(cons_cs))) # Check the coherency for all the samples
#> [1] 119495.1
# Reconciled forecasts' sample:
# we reconcile each member of a sample from the incoherent distribution.
reco_csjb <- apply(base_csjb, 3, htsrec, C = agg_mat, res = res,
comb = "shr", keep = "recf", simplify = FALSE)
reco_csjb <- simplify2array(reco_csjb)
rownames(reco_csjb) <- NULL
norm(apply(reco_csjb, 3, function(x) x%*%t(cons_cs))) # Check the coherency
#> [1] 1.482476e-10
To obtain the base forecasts assuming Gaussianity (Panagiotelis et
al. 2022), we can use packages to simulate from a multivariate normal
distribution such as MASS
(Venables and Ripley 2002). We can then apply the htsrec()
function to obtain the reconciled sample. When dealing with multiple
forecast horizons, it is recommended to utilize multi-step residuals
instead of one-step residuals (see Girolimetto et al. 2023).
# Multi-step residuals
hres <- lapply(1:12, function(h)
sapply(fit, residuals, type='response', h = h))
# List of H=12 covariance matrix (one for each forecast horizon)
cov_shr <- lapply(hres, function(r) shrink_estim(r)$scov)
# Base forecasts' sample:
# we simulate from a multivariate normal distribution.
base_csg <- lapply(1:12, function(h) MASS::mvrnorm(n = B, mu = base[h, ], Sigma = cov_shr[[h]]))
base_csg <- simplify2array(base_csg)
sum(abs(apply(base_csg, 3, function(x) x%*%t(cons_cs)))) # Check the coherency
#> [1] 1225671
# Reconciled forecasts' sample:
# we reconcile each member of the base forecasts' sample.
reco_csg <- apply(base_csg, 3, htsrec, C = agg_mat, res = res, comb = "shr", keep = "recf", simplify = FALSE)
reco_csg <- simplify2array(reco_csg)
rownames(reco_csg) <- NULL
sum(abs(apply(reco_csg, 3, function(x) x%*%t(cons_cs)))) # Check the coherency
#> [1] 1.064137e-09
In the temporal framework, we reconcile base forecasts across different time frequencies (e.g., monthly, quarterly, and annual data) for a single time series. For example, suppose we want to generate reconciled monthly, quarterly, and annual forecasts for the total number of deaths due to bronchitis, emphysema, and asthma in the UK. According to Girolimetto et al. (2023), we can use the same approach as in the cross-sectional framework, but we need to account for the different time frequencies.
In the bootstrap approach, we can use the boot_te()
function, which generates a bootstrap sample for time-series forecasts
while keeping a temporal structure. The function requires the same
inputs as the equivalent cross-sectional boot_cs()
function, with an additional parameter m that indicates
the maximum order of temporal aggregation. The length of the block
bootstrap is determined by both m and
h, where h refers to the forecast
horizons for the most temporally aggregated series.
In the Gaussian approach, we assume that all the base forecasts
follow a multivariate normal distribution, and we calculate the
covariance matrix of the base forecasts using multi-step residuals
organized in matrix form through the residuals_matrix()
function.
To reconcile each sample we use the optimal temporal reconclition
function, thfrec()
.
# Temporal setup
y <- tdeaths
m <- 12
kset <- c(12, 3, 1) # factors subset of m = 12
# (only monthly, quarterly and annual data are considered)
kset <- setNames(kset, paste0("k", kset))
cons_te <- thf_tools(m = kset)$Zt
temp_y <- lapply(kset, agg_ts, x = y) # Aggregated time series list
fit <- lapply(temp_y, function(x) ets(ts(x, frequency = frequency(x))))
forecast_obj <- lapply(fit, function(tsfit)
forecast(tsfit, h=frequency(tsfit$x))) # forecast object
base <- sapply(forecast_obj, function(x) x$mean) # base mean
res <- Reduce("c", sapply(fit, residuals, type='response')) # in-sample residuals (one-step)
# Base forecasts' sample:
# we simulate from the base models by sampling errors
# while keeping the temporal dimension fixed.
base_tejb <- boot_te(fit, B, m = kset)$sample
# Reconciled forecasts' sample:
# we reconcile each member of the base forecasts' sample.
reco_tejb <- t(apply(base_tejb, 1, function(boot_base){
thfrec(boot_base, m = kset, res = res, comb = "wlsv", keep = "recf")}))
# Multi-step residuals
hres <- lapply(fit, function(mod)
lapply(1:frequency(mod$x), function(h)
residuals(mod, type='response', h = h)))
hres <- Reduce("c", lapply(hres, arrange_hres))
# Re-arrenge multi-step residuals in a matrix form
mres <- residuals_matrix(hres, m = kset)
# Base forecasts' sample:
# we simulate from a multivariate normal distribution.
base_teg <- MASS::mvrnorm(n = B, mu = unlist(base), Sigma = shrink_estim(mres)$scov)
# Reconciled forecasts' sample:
# we reconcile each member of the base forecasts' sample.
reco_teg <- t(apply(base_teg, 1, thfrec, m = kset, comb = "wlsv", res = res, keep = "recf"))
Finally, we obtain probabilistic reconciled forecasts in the
cross-temporal framework. We use the boot_ct()
function
with almost the same input as boot_te()
to implement the
bootstrap approach. The fit parameter in this case has
to consider the model for all the series at different frequency.
For the Gaussian approach, we can use the arrange_hres()
and residuals_matrix()
functions to arrange residuals for
the covariance matrix of the base forecasts.
Finally, we can reconcile the base forecasts samples with the optimal
cross-temporal reconciliation octrec()
(or we can use other
heuristic alternatives, e.g. iterec()
).
# Cross-temporal setup
ctlungDeaths <- lapply(kset, agg_ts, x = lungDeaths) # Aggregated multivariate time series list
fit <- lapply(ctlungDeaths, function(tsk)
apply(tsk, 2, function(x) ets(ts(x, frequency = frequency(tsk)))))
forecast_obj <- lapply(fit, function(fitk) # forecast object
lapply(fitk, function(tsfit)
forecast(tsfit, h=frequency(tsfit$x))))
base <- sapply(forecast_obj, function(fmodk) # base mean
sapply(fmodk, function(x) x$mean))
res <- Reduce("rbind", sapply(fit, function(fitk) # in-sample residuals (one-step)
sapply(fitk, residuals, type='response')))
# Base forecasts' sample:
# we simulate from the base models by sampling errors
# while keeping the cross-sectional and temporal dimensions fixed.
base_ctjb <- boot_ct(fit, B, m = kset)$sample
# Reconciled forecasts' sample:
# we reconcile each member of the base forecasts' sample.
reco_ctjb <- lapply(base_ctjb, function(boot_base){
octrec(t(boot_base), m = kset, C = agg_mat, res = t(res), comb = "bdshr", keep = "recf")})
reco_ctjb <- lapply(reco_ctjb, function(x) `rownames<-`(t(x), NULL))
# Multi-step residuals
hres <- lapply(fit, function(fitk) # in-sample residuals (one-step)
lapply(1:frequency(fitk$tdeaths$x), function(h)
sapply(fitk, residuals, type='response', h = h)))
hres <- t(Reduce("rbind", lapply(hres, arrange_hres)))
# Re-arrenge multi-step residuals in a matrix form
mres <- residuals_matrix(hres, m = kset)
# Base forecasts' sample:
# we simulate from a multivariate normal distribution.
base_ctg <- MASS::mvrnorm(n = B, mu = residuals_matrix(t(Reduce("rbind", base)), m = kset),
Sigma = cov(na.omit(mres)))
base_ctg <- apply(base_ctg, 1, function(x) matrix(x, ncol = 3), simplify = FALSE)
# Reconciled forecasts' sample:
# we reconcile each member of the base forecasts' sample.
reco_ctg <- lapply(base_ctg, function(boot_base){
FoReco026::octrec(t(boot_base), m = kset, C = agg_mat, res = t(res), comb = "bdshr",
keep = "recf")})
reco_ctg <- lapply(reco_ctg, function(x) `rownames<-`(t(x), NULL))
Girolimetto, D., Athanasopoulos, G., Di Fonzo, T., and Hyndman, R. J. (2023), Cross-temporal Probabilistic Forecast Reconciliation, https://doi.org/10.48550/arXiv.2303.17277 .
Hyndman R, Athanasopoulos G, Bergmeir C, Caceres G, Chhay L, O’Hara-Wild M, Petropoulos F, Razbash S, Wang E, Yasmeen F (2023). forecast: Forecasting functions for time series and linear models . R package version 8.20, https://pkg.robjhyndman.com/forecast/.
Panagiotelis, A., Gamakumara, P., Athanasopoulos, G. and Hyndman, R. J. (2023), Probabilistic forecast reconciliation: Properties, evaluation and score optimisation, European Journal of Operational Research 306(2), 693–706, https://doi.org/10.1016/j.ejor.2022.07.040 .