Title: | Time Series Forecasting with Machine Learning Methods |
---|---|
Description: | The purpose of 'forecastML' is to simplify the process of multi-step-ahead forecasting with standard machine learning algorithms. 'forecastML' supports lagged, dynamic, static, and grouping features for modeling single and grouped numeric or factor/sequence time series. In addition, simple wrapper functions are used to support model-building with most R packages. This approach to forecasting is inspired by Bergmeir, Hyndman, and Koo's (2018) paper "A note on the validity of cross-validation for evaluating autoregressive time series prediction" <doi:10.1016/j.csda.2017.11.003>. |
Authors: | Nickalus Redell |
Maintainer: | Nickalus Redell <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.9.1 |
Built: | 2025-02-23 04:19:10 UTC |
Source: | https://github.com/nredell/forecastml |
The residuals from model training/fit are sampled i.i.d. for (a) each direct forecast horizon for a single time series and (b) each combination of direct forecast horizon and group for multiple time series.
calculate_intervals( forecasts, residuals, index = NULL, outcome = NULL, keys = NULL, levels = c(0.95), times = 100L, weights = NULL, keep_samples = FALSE )
forecasts |
A data.frame of forecasts. |
residuals |
A data.frame of residuals (e.g., |
index |
Optional for forecasts from |
outcome |
Optional for forecasts from |
keys |
Optional. For grouped time series, a character vector giving the column name(s) of the group columns. The key identifies unique time series of residuals for bootstrap sampling. For direct forecasting, a single time series will have one group per direct forecast horizon. |
levels |
A numeric vector with 1 or more forecast prediction intervals. A level of .95, for example, will return the 0.025 and 0.975 quantiles of the bootstrapped forecast distribution at each forecast horizon. |
times |
Integer. The number of bootstrap samples. |
weights |
Not implemented. |
keep_samples |
Boolean. If |
If forecasts is an object of class 'forecast_results', a forecast_results object with a new column for each lower- and upper-bound forecast in levels. If forecasts is a data.frame, the function return will be the same but without forecastML attributes. If keep_samples is TRUE, a named list of length 2 is returned with 'forecasts' and 'samples'.
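The bootstrap described above can be sketched in base R. This is an illustration of the general technique only, not forecastML's internal code: the data, the object names (residuals_by_horizon, bounds), and the per-horizon grouping of residuals are assumptions made for the example.

```r
# Minimal sketch of residual-bootstrap prediction intervals (base R only).
# Not forecastML's implementation; all object names here are illustrative.
set.seed(224)

# Hypothetical point forecasts for three direct forecast horizons.
forecasts <- data.frame(horizon = c(1, 6, 12), y_pred = c(120, 115, 130))

# Hypothetical training residuals, kept separate for each horizon-specific model.
residuals_by_horizon <- list(
  "1"  = rnorm(150, mean = 0, sd = 5),
  "6"  = rnorm(150, mean = 0, sd = 10),
  "12" = rnorm(150, mean = 0, sd = 15)
)

level <- 0.95
times <- 100L
probs <- c((1 - level) / 2, 1 - (1 - level) / 2)  # 0.025 and 0.975

bounds <- t(sapply(seq_len(nrow(forecasts)), function(i) {
  res <- residuals_by_horizon[[as.character(forecasts$horizon[i])]]
  # Sample residuals i.i.d. with replacement and add them to the point forecast.
  samples <- forecasts$y_pred[i] + sample(res, size = times, replace = TRUE)
  quantile(samples, probs = probs, names = FALSE)
}))

forecasts$y_pred_lower <- bounds[, 1]
forecasts$y_pred_upper <- bounds[, 2]
forecasts
```

With a small number of bootstrap samples (times), the tail quantiles are noisy, so the bounds can shift noticeably from run to run.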
## Not run:
data("data_seatbelts", package = "forecastML")

data_train <- create_lagged_df(data_seatbelts, type = "train", method = "direct",
                               outcome_col = 1, lookback = 1:15, horizons = c(1, 6, 12))

windows <- create_windows(data_train, window_length = 0)

model_fn <- function(data) {
  model <- lm(DriversKilled ~ ., data)
}

model_results <- train_model(data_train, windows, model_name = "OLS", model_function = model_fn)

predict_fn <- function(model, data) {
  data_pred <- as.data.frame(predict(model, data))
}

data_fit <- predict(model_results, prediction_function = list(predict_fn), data = data_train)

residuals <- residuals(data_fit)

data_forecast <- create_lagged_df(data_seatbelts, type = "forecast", method = "direct",
                                  outcome_col = 1, lookback = 1:15, horizons = c(1, 6, 12))

data_forecasts <- predict(model_results, prediction_function = list(predict_fn), data = data_forecast)

data_forecasts <- combine_forecasts(data_forecasts)

data_forecasts <- calculate_intervals(data_forecasts, residuals, times = 30)

plot(data_forecasts)
## End(Not run)
The horizon-specific models can either be combined to (a) produce final forecasts for only those horizons at which they were trained (i.e., shorter-horizon models override longer-horizon models when producing final short-horizon h-step-ahead forecasts) or (b) produce final forecasts using any combination of horizon-specific models that minimized error over the validation/training dataset.
combine_forecasts( ..., type = c("horizon", "error"), aggregate = stats::median, data_error = list(NULL), metric = NULL )
... |
One or more objects of class 'forecast_results' from running |
type |
Default: 'horizon'. A character vector of length 1 that identifies the forecast combination method. |
aggregate |
Default |
data_error |
Optional. A list of objects of class 'validation_error' from running |
metric |
Required if |
An S3 object of class 'forecastML' with final h-step-ahead forecasts.
Forecast combination type:
type = 'horizon': 1 final h-step-ahead forecast is returned for each model object passed in ... .
type = 'error': 1 final h-step-ahead forecast is returned by selecting, for each forecast horizon, the model that minimized the chosen error metric at that horizon on the outer-loop validation data sets.
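As a rough illustration of the error-based selection, the base R sketch below picks, for each forecast horizon, the horizon-specific model with the smallest validation error. The data_error table and its values are made up for the example and do not reflect the structure of a real 'validation_error' object.

```r
# Sketch of error-based model selection per horizon (base R only).
# Hypothetical validation error (e.g., MAE) for each horizon-specific model
# at each forecast horizon that the model is able to forecast.
data_error <- data.frame(
  model_forecast_horizon = c(1, 3, 12, 3, 12, 3, 12, 12),
  horizon                = c(1, 1, 1, 2, 2, 3, 3, 4),
  mae                    = c(10, 12, 15, 11, 9, 13, 14, 16)
)

# For each horizon, keep the row with the smallest error.
by_horizon <- split(data_error, data_error$horizon)
best <- do.call(rbind, lapply(by_horizon, function(d) d[which.min(d$mae), ]))
rownames(best) <- NULL
best
```

In this toy table the 12-step model is preferred at horizon 2 even though a shorter-horizon model is available, which is the situation where 'error' and 'horizon' combinations differ.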
Columns in returned 'forecastML' data.frame:
model: User-supplied model name in train_model().
model_forecast_horizon: The direct-forecasting time horizon that the model was trained on.
horizon: Forecast horizons, 1:h, measured in dataset rows.
forecast_period: The forecast period in row indices or dates. The forecast period starts at either attributes(create_lagged_df())$data_stop + 1 for row indices or attributes(create_lagged_df())$data_stop + 1 * frequency for date indices.
"groups": If given, the user-supplied groups in create_lagged_df().
"outcome_name"_pred: The final forecasts.
"outcome_name"_pred_lower: If given, the lower forecast bounds returned by the user-supplied prediction function.
"outcome_name"_pred_upper: If given, the upper forecast bounds returned by the user-supplied prediction function.
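The forecast_period rule can be checked with a few lines of base R. The values below are assumptions chosen for illustration (a 192-row dataset and a monthly series ending in December 1984); they are not read from a 'lagged_df' object.

```r
# Sketch: the first forecast period follows the last training index.

# Row-index case: data_stop is the last row of the training data.
data_stop_row <- 192
horizons <- 1:3
forecast_period_rows <- data_stop_row + horizons
forecast_period_rows

# Date-index case with a hypothetical monthly frequency.
data_stop_date <- as.Date("1984-12-01")
frequency <- "1 month"
# seq() steps forward by 'frequency'; drop the first element (data_stop itself).
forecast_period_dates <- seq(data_stop_date, by = frequency,
                             length.out = max(horizons) + 1)[-1]
forecast_period_dates
```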
The output of combine_forecasts() has the following generic S3 methods
# Example with "type = 'horizon'".
data("data_seatbelts", package = "forecastML")

horizons <- c(1, 3, 12)
lookback <- 1:15

data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               lookback = lookback, horizons = horizons)

windows <- create_windows(data_train, window_length = 0)

model_function <- function(data, my_outcome_col) {
  model <- lm(DriversKilled ~ ., data = data)
  return(model)
}

model_results <- train_model(data_train, windows, model_name = "LM", model_function)

data_forecast <- create_lagged_df(data_seatbelts, type = "forecast", outcome_col = 1,
                                  lookback = lookback, horizons = horizons)

prediction_function <- function(model, data_features) {
  x <- data_features
  data_pred <- data.frame("y_pred" = predict(model, newdata = x))
  return(data_pred)
}

data_forecasts <- predict(model_results, prediction_function = list(prediction_function),
                          data = data_forecast)

data_combined <- combine_forecasts(data_forecasts)

plot(data_combined)
Create a list of datasets with lagged, grouped, dynamic, and static features to (a) train forecasting models for specified forecast horizons and (b) forecast into the future with a trained ML model.
create_lagged_df( data, type = c("train", "forecast"), method = c("direct", "multi_output"), outcome_col = 1, horizons, lookback = NULL, lookback_control = NULL, dates = NULL, frequency = NULL, dynamic_features = NULL, groups = NULL, static_features = NULL, predict_future = NULL, use_future = FALSE, keep_rows = FALSE )
data |
A data.frame with the (a) target to be forecasted and (b) features/predictors. An optional date column can be given in the
|
type |
The type of dataset to return–(a) model training or (b) forecast prediction. The default is |
method |
The type of modeling dataset to create. |
outcome_col |
The column index–an integer–of the target to be forecasted. If |
horizons |
A numeric vector of one or more forecast horizons, h, measured in dataset rows.
If |
lookback |
A numeric vector giving the lags–in dataset rows–for creating the lagged features. All non-grouping,
non-static, and non-dynamic features in the input dataset, |
lookback_control |
A list of numeric vectors, specifying potentially unique lags for each feature. The length
of the list should equal |
dates |
A vector or 1-column data.frame of dates/times with class 'Date' or 'POSIXt'. The length
of |
frequency |
Date/time frequency. Required if |
dynamic_features |
A character vector of column names that identify features that change through time but which are not lagged (e.g., weekday or year).
If |
groups |
A character vector of column names that identify the groups/hierarchies when multiple time series are present. These columns are used as model features but
are not lagged. Note that combining feature lags with grouped time series will result in |
static_features |
For grouped time series only. A character vector of column names that identify features that do not change through time.
These columns are not lagged. If |
predict_future |
When |
use_future |
Boolean. If |
keep_rows |
Boolean. For non-grouped time series, keep the |
An S3 object of class 'lagged_df' or 'grouped_lagged_df': A list of data.frames with new columns for the lagged/non-lagged features.
For method = "direct", the length of the returned list is equal to the number of forecast horizons and is in the order of horizons supplied to the horizons argument. Horizon-specific datasets can be accessed with my_lagged_df$horizon_h where 'h' gives the forecast horizon.
For method = "multi_output", the length of the returned list is 1. Horizon-specific datasets can be accessed with my_lagged_df$horizon_1_3_5 where "1_3_5" represents the forecast horizons passed in horizons.
The contents of the returned data.frames are as follows:
type = 'train', non-grouped: A data.frame with the outcome and lagged/dynamic features.
type = 'train', grouped: A data.frame with the outcome and unlagged grouping columns followed by lagged, dynamic, and static features.
type = 'forecast', non-grouped: (1) An 'index' column giving the row index or date of the forecast periods (e.g., a 100 row non-date-based training dataset would start with an index of 101). (2) A 'horizon' column that indicates the forecast period from 1:max(horizons). (3) Lagged features identical to the 'train', non-grouped dataset.
type = 'forecast', grouped: (1) An 'index' column giving the date of the forecast periods. The first forecast date for each group is the maximum date from the dates argument + 1 * frequency, where frequency is the user-supplied date/time frequency. (2) A 'horizon' column that indicates the forecast period from 1:max(horizons). (3) Lagged, static, and dynamic features identical to the 'train', grouped dataset.
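To make the idea of a horizon-specific training dataset concrete, here is a minimal base R sketch for a single outcome and a single horizon. It is not the package's implementation: make_lagged() is a hypothetical helper, and the rule that lags shorter than the horizon are dropped is an assumption based on what is available at forecast time in direct forecasting.

```r
# Sketch of a direct-forecasting training dataset for one horizon (base R only).
# make_lagged() is a hypothetical helper, not a forecastML function.
make_lagged <- function(y, horizon, lookback) {
  # Assumption: for an h-step-ahead model, only lags >= h are known at forecast time.
  lags <- lookback[lookback >= horizon]
  n <- length(y)
  # Shift the series down by k rows to create the lag-k feature.
  lagged <- lapply(lags, function(k) c(rep(NA, k), y[seq_len(n - k)]))
  names(lagged) <- paste0("y_lag_", lags)
  data.frame(y = y, lagged)
}

y <- c(10, 12, 11, 15, 14, 18, 17, 20)
data_h2 <- make_lagged(y, horizon = 2, lookback = 1:3)
data_h2
```

The leading rows contain NA in the lagged columns; a real training dataset would drop or otherwise handle these rows before modeling.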
names: The horizon-specific datasets that can be accessed with my_lagged_df$horizon_h.
type: Training, train, or forecasting, forecast, dataset(s).
method: direct or multi_output.
horizons: Forecast horizons measured in dataset rows.
outcome_col: The column index of the target being forecasted.
outcome_cols: If method = multi_output, the column indices of the multiple outputs in the transformed dataset.
outcome_name: The name of the target being forecasted.
outcome_names: If method = multi_output, the column names of the multiple outputs in the transformed dataset. The names take the form "outcome_name_h" where 'h' is a horizon passed in horizons.
predictor_names: The predictor or feature names from the input dataset.
row_indices: The row.names() of the output dataset. For non-grouped datasets, the first lookback + 1 rows are removed from the beginning of the dataset to remove NA values in the lagged features.
date_indices: If dates are given, the vector of dates.
frequency: If dates are given, the date/time frequency.
data_start: min(row_indices) or min(date_indices).
data_stop: max(row_indices) or max(date_indices).
groups: If groups are given, a vector of group names.
class: grouped_lagged_df, lagged_df, list
The output of create_lagged_df() is passed into
and has the following generic S3 methods
# Sampled Seatbelts data from the R package datasets.
data("data_seatbelts", package = "forecastML")
#------------------------------------------------------------------------------
# Example 1 - Training data for 2 horizon-specific models w/ common lags per predictor.
horizons <- c(1, 12)
lookback <- 1:15

data <- data_seatbelts

data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               horizons = horizons, lookback = lookback)
head(data_train[[length(horizons)]])

# Example 1 - Forecasting dataset
# The last 'nrow(data_seatbelts) - horizon' rows are automatically used from data_seatbelts.
data_forecast <- create_lagged_df(data_seatbelts, type = "forecast", outcome_col = 1,
                                  horizons = horizons, lookback = lookback)
head(data_forecast[[length(horizons)]])
#------------------------------------------------------------------------------
# Example 2 - Training data for one 3-month horizon model w/ unique lags per predictor.
horizons <- 3
lookback <- list(c(3, 6, 9, 12), c(4:12), c(6:15), c(8))

data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               horizons = horizons, lookback_control = lookback)
head(data_train[[length(horizons)]])
create_skeleton() strips the feature data from a create_lagged_df() object but keeps the outcome column(s), any grouping columns, and meta-data, which allows the resulting lagged_df to be used downstream in the forecastML pipeline. The main benefit is that the custom modeling function passed in train_model() can read data directly from the disk or a database when the dataset is too large to fit into memory.
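The general pattern can be sketched in base R as follows. This is an assumption-laden illustration rather than the package's workflow: the file layout, the model_fn() signature, its feature_path argument, and the row-name join are all hypothetical.

```r
# Sketch of the idea behind a "skeleton" dataset (base R only; names are hypothetical).
full <- data.frame(
  y       = c(3.1, 4.9, 7.2, 8.8),
  x_lag_1 = c(1, 2, 3, 4),
  x_lag_2 = c(2, 1, 4, 3)
)

# Store the features on disk and keep only the outcome in memory.
feature_file <- tempfile(fileext = ".rds")
saveRDS(full[, c("x_lag_1", "x_lag_2")], feature_file)
skeleton <- full[, "y", drop = FALSE]

# A modeling function that loads the features itself, matching rows by row name.
model_fn <- function(data, feature_path) {
  features <- readRDS(feature_path)
  data <- cbind(data, features[rownames(data), , drop = FALSE])
  lm(y ~ ., data = data)
}

model <- model_fn(skeleton, feature_file)
names(coef(model))
```

In practice the features would be read in chunks or queried from a database; the single RDS file here only keeps the example self-contained.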
create_skeleton(lagged_df)
lagged_df |
An object of class 'lagged_df' from |
An S3 object of class 'lagged_df' or 'grouped_lagged_df': A list of data.frames with the outcome column(s) and any grouping columns but with all other features removed. A special attribute skeleton = TRUE is added.
The output of create_skeleton can be passed into
Flexibly create blocks of time-contiguous validation datasets to assess the forecast accuracy of trained models at various times in the past. These validation datasets are similar to the outer loop of a nested cross-validation model training setup.
create_windows( lagged_df, window_length = 12L, window_start = NULL, window_stop = NULL, skip = 0, include_partial_window = TRUE )
lagged_df |
An object of class 'lagged_df' or 'grouped_lagged_df' from |
window_length |
An integer that defines the length of the contiguous validation dataset in dataset rows/dates.
If dates were given in |
window_start |
Optional. A row index or date identifying the row/date to start creating contiguous validation datasets. A
vector of start rows/dates can be supplied for greater control. The length and order of |
window_stop |
Optional. An index or date identifying the row/date to stop creating contiguous validation datasets. A
vector of stop rows/dates can be supplied for greater control. The length and order of |
skip |
An integer giving a fixed number of dataset rows/dates to skip between validation datasets. If dates were given
in |
include_partial_window |
Boolean. If |
An S3 object of class 'windows': A data.frame giving the indices for the validation datasets.
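A simplified version of the window layout can be written in base R. The sketch assumes row-index windows that begin at row 1; make_windows() is a hypothetical helper and ignores details such as window_start, window_stop, and date-based indices.

```r
# Sketch of contiguous validation windows (base R only).
# make_windows() is a hypothetical helper, not a forecastML function.
make_windows <- function(n_rows, window_length, skip = 0, include_partial_window = TRUE) {
  # Each window starts window_length + skip rows after the previous one.
  starts <- seq(1, n_rows, by = window_length + skip)
  stops <- pmin(starts + window_length - 1, n_rows)
  windows <- data.frame(start = starts, stop = stops)
  if (!include_partial_window) {
    # Drop a final window that is shorter than window_length.
    windows <- windows[windows$stop - windows$start + 1 == window_length, ]
  }
  windows
}

windows_all <- make_windows(n_rows = 30, window_length = 12)
windows_full <- make_windows(n_rows = 30, window_length = 12, include_partial_window = FALSE)
windows_all
windows_full
```

With 30 rows and a window length of 12, the third window covers only rows 25 to 30, so it is kept or dropped depending on include_partial_window.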
The output of create_windows() is passed into
and has the following generic S3 methods
# Sampled Seatbelts data from the R package datasets.
data("data_seatbelts", package = "forecastML")

# Example - Training data for 2 horizon-specific models w/ common lags per feature.
horizons <- c(1, 12)
lookback <- 1:15

data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               lookback = lookback, horizons = horizons)

# All historical window lengths of 12 plus any partial windows at the end of the dataset.
windows <- create_windows(data_train, window_length = 12)
windows

# Two custom validation windows with different lengths.
windows <- create_windows(data_train, window_start = c(20, 80), window_stop = c(30, 100))
windows
A dataset containing daily average sensor measurements of several environmental conditions collected by 14 buoys in Lake Michigan from 2012 through 2018.
data_buoy
A data.frame with 30,821 rows and 9 columns:
date
average daily wind speed in kts
the station ID for each buoy
latitude
longitude
day of year
calendar year
air temperature in degrees Fahrenheit
water temperature in degrees Fahrenheit
A dataset containing daily average sensor measurements of several environmental conditions collected by 14 buoys in Lake Michigan from 2012 through 2018. This dataset is identical to the data_buoy dataset except that there are gaps in the daily sensor data. Running fill_gaps() on data_buoy_gaps will produce data_buoy.
data_buoy_gaps
A data.frame with 23,646 rows and 9 columns:
date
average daily wind speed in kts
the station ID for each buoy
latitude
longitude
day of year
calendar year
air temperature in degrees Fahrenheit
water temperature in degrees Fahrenheit
This is the Seatbelts dataset from the datasets package.
data_seatbelts
A data.frame with 192 rows and 8 columns
Harvey, A.C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, pp. 519–523.
Durbin, J. and Koopman, S. J. (2001). Time Series Analysis by State Space Methods. Oxford University Press.
https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/UKDriverDeaths.html
In order to create a modeling dataset with feature lags that are temporally correct, the entry function in forecastML, create_lagged_df, needs evenly-spaced time series with no gaps in data collection. fill_gaps() can help here.
This function takes a data.frame with (a) dates, (b) the outcome being forecasted, and, optionally, (c) dynamic features that change through time, (d) group columns for multiple time series modeling, and (e) static or non-dynamic features for multiple time series modeling and returns a data.frame with rows evenly spaced in time. Specifically, this function adds rows to the input dataset while filling in (a) dates, (b) grouping information, and (c) static features. The (a) outcome and (b) dynamic features will be NA for any missing time periods; these NA values can be left as-is, user-imputed, or removed from modeling in the user-supplied modeling wrapper function for train_model.
fill_gaps(data, date_col = 1, frequency, groups = NULL, static_features = NULL)
data |
A data.frame or object coercible to a data.frame with, minimally, dates and the outcome being forecasted. |
date_col |
The column index–an integer–of the date index. This column should have class 'Date' or 'POSIXt'. |
frequency |
Date/time frequency. A string taking the same input as |
groups |
Optional. A character vector of column names that identify the unique time series (i.e., groups/hierarchies) when multiple time series are present. |
static_features |
Optional. For grouped time series only. A character vector of column names that identify features that do not change through time. These columns are expected to be used as model features but are not lagged (e.g., a ZIP code column). The most recent values for each static feature for each group are used to fill in the resulting missing data in static features when new rows are added to the dataset. |
An object of class 'data.frame': The returned data.frame has the same number of columns and column order but with additional rows to account for gaps in data collection. For grouped data, any new rows added to the returned data.frame will appear between the minimum (oldest) date for that group and the maximum (most recent) date across all groups. If the user-supplied forecasting algorithm(s) cannot handle missing outcome values or missing dynamic features, these should either be imputed prior to create_lagged_df() or filtered out in the user-supplied modeling function for train_model.
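For a single, non-grouped series, the gap-filling step amounts to joining the data onto a complete date sequence. The base R sketch below illustrates that idea with made-up data; it is not the fill_gaps() implementation and does not handle groups or static features.

```r
# Sketch of gap filling for a single time series (base R only; not fill_gaps() itself).
data_gaps <- data.frame(
  date = as.Date(c("2018-01-01", "2018-01-02", "2018-01-05")),
  wind = c(10, 12, 8)
)

# Build the complete, evenly spaced date sequence and left-join the data onto it.
all_dates <- data.frame(
  date = seq(min(data_gaps$date), max(data_gaps$date), by = "1 day")
)
data_no_gaps <- merge(all_dates, data_gaps, by = "date", all.x = TRUE)
data_no_gaps
```

The two added rows (January 3 and 4) carry NA for the outcome, which matches the behavior described above for missing time periods.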
The output of fill_gaps() is passed into
# NOAA buoy dataset with gaps in data collection
data("data_buoy_gaps", package = "forecastML")

data_buoy_no_gaps <- fill_gaps(data_buoy_gaps, date_col = 1, frequency = '1 day',
                               groups = 'buoy_id', static_features = c('lat', 'lon'))

# The returned data.frame has the same number of columns but the time-series
# are now evenly spaced at 1 day apart. Additionally, the unchanging grouping
# columns and static features columns have been filled in for the newly created dataset rows.
dim(data_buoy_gaps)
dim(data_buoy_no_gaps)

# Running create_lagged_df() is the next step in the forecastML forecasting
# process. If there are long gaps in data collection, like in this buoy dataset,
# and the user-supplied modeling algorithm cannot handle missing outcomes data,
# the best option is to filter these rows out in the user-supplied modeling function
# for train_model()
Plot forecast error at various levels of aggregation.
## S3 method for class 'forecast_error'
plot( x, type = c("global"), metric = NULL, facet = NULL, models = NULL, horizons = NULL, windows = NULL, group_filter = NULL, ... )
x |
An object of class 'forecast_error' from |
type |
Select plot type; |
metric |
Select error metric to plot (e.g., "mae"); |
facet |
Optional. A formula with any combination of |
models |
Optional. A vector of user-defined model names from |
horizons |
Optional. A numeric vector to filter results by horizon. |
windows |
Optional. A numeric vector to filter results by validation window number. |
group_filter |
A string for filtering plot results for grouped time series (e.g., |
... |
Not used. |
Forecast error plots of class 'ggplot'.
Plot hyperparameter stability and relationship with error metrics across validation datasets and horizons.
## S3 method for class 'forecast_model_hyper'
plot( x, data_results, data_error, type = c("stability", "error"), horizons = NULL, windows = NULL, ... )
x |
An object of class 'forecast_model_hyper' from |
data_results |
An object of class 'training_results' from
|
data_error |
An object of class 'validation_error' from
|
type |
Select plot type; 'stability' is the default. |
horizons |
Optional. A numeric vector to filter results by horizon. |
windows |
Optional. A numeric vector to filter results by validation window number. |
... |
Not used. |
Hyper-parameter plots of class 'ggplot'.
# Sampled Seatbelts data from the R package datasets.
data("data_seatbelts", package = "forecastML")

# Example - Training data for 2 horizon-specific models w/ common lags per predictor.
horizons <- c(1, 12)
lookback <- 1:15

data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               lookback = lookback, horizons = horizons)

# One custom validation window at the end of the dataset.
windows <- create_windows(data_train, window_start = 181, window_stop = 192)

# User-defined model - LASSO
# A user-defined wrapper function for model training that takes the following
# arguments: (1) a horizon-specific data.frame made with create_lagged_df(..., type = "train")
# (e.g., my_lagged_df$horizon_h) and, optionally, (2) any number of additional named arguments
# which are passed as '...' in train_model().
library(glmnet)
model_function <- function(data, my_outcome_col) {
  x <- data[, -(my_outcome_col), drop = FALSE]
  y <- data[, my_outcome_col, drop = FALSE]
  x <- as.matrix(x, ncol = ncol(x))
  y <- as.matrix(y, ncol = ncol(y))
  model <- glmnet::cv.glmnet(x, y, nfolds = 3)
  return(model)
}

# my_outcome_col = 1 is passed in ... but could have been defined in model_function().
model_results <- train_model(data_train, windows, model_name = "LASSO", model_function,
                             my_outcome_col = 1)

# User-defined prediction function - LASSO
# The predict() wrapper takes two positional arguments. First,
# the returned model from the user-defined modeling function (model_function() above).
# Second, a data.frame of predictors--identical to the datasets returned from
# create_lagged_df(..., type = "train"). The function can return a 1- or 3-column data.frame
# with either (a) point forecasts or (b) point forecasts plus lower and upper forecast
# bounds (column order and column names do not matter).
prediction_function <- function(model, data_features) {
  x <- as.matrix(data_features, ncol = ncol(data_features))
  data_pred <- data.frame("y_pred" = predict(model, x, s = "lambda.min"))
  return(data_pred)
}

# Predict on the validation datasets.
data_valid <- predict(model_results, prediction_function = list(prediction_function),
                      data = data_train)

# User-defined hyperparameter function - LASSO
# The hyperparameter function should take one positional argument--the returned model
# from the user-defined modeling function (model_function() above). It should
# return a 1-row data.frame of the optimal hyperparameters.
hyper_function <- function(model) {
  lambda_min <- model$lambda.min
  lambda_1se <- model$lambda.1se
  data_hyper <- data.frame("lambda_min" = lambda_min, "lambda_1se" = lambda_1se)
  return(data_hyper)
}

data_error <- return_error(data_valid)

data_hyper <- return_hyper(model_results, hyper_function)

plot(data_hyper, data_valid, data_error, type = "stability", horizons = c(1, 12))
A forecast plot for each horizon for each model in predict.forecast_model().
## S3 method for class 'forecast_results'
plot(
  x,
  data_actual = NULL,
  actual_indices = NULL,
  facet = horizon ~ model,
  models = NULL,
  horizons = NULL,
  windows = NULL,
  group_filter = NULL,
  ...
)
x |
An object of class 'forecast_results' from |
data_actual |
A data.frame containing the target/outcome name and any grouping columns. The data can be historical actuals and/or holdout/test data. |
actual_indices |
Required if |
facet |
Optional. For numeric outcomes, a formula with any combination of |
models |
Optional. Filter results by user-defined model name from |
horizons |
Optional. Filter results by horizon. |
windows |
Optional. Filter results by validation window number. |
group_filter |
Optional. A string for filtering plot results for grouped time-series (e.g., |
... |
Not used. |
Forecast plot of class 'ggplot'.
A forecast plot of h-step-ahead forecasts produced from multiple horizon-specific forecast models using combine_forecasts().
## S3 method for class 'forecastML'
plot(
  x,
  data_actual = NULL,
  actual_indices = NULL,
  facet = ~model,
  models = NULL,
  group_filter = NULL,
  drop_facet = FALSE,
  interval_fill = NULL,
  interval_alpha = NULL,
  ...
)
x |
An object of class 'forecastML' from |
data_actual |
A data.frame containing the target/outcome name and any grouping columns. The data can be historical actuals and/or holdout/test data. |
actual_indices |
Required if |
facet |
Optional. A formula with any combination of |
models |
Optional. Filter results by user-defined model name from |
group_filter |
Optional. A string for filtering plot results for grouped time-series (e.g., |
drop_facet |
Optional. Boolean. If actuals are given when forecasting factors, the plot facet with 'actual' data can be dropped. |
interval_fill |
A character vector of color names or hex codes to fill the prediction intervals. For intervals with multiple levels, the first color corresponds to the fill with the widest interval. |
interval_alpha |
A numeric vector of alpha values to shade the prediction intervals. For intervals with multiple levels, the first value corresponds to the shading with the widest interval. |
... |
Not used. |
Forecast plot of class 'ggplot'.
Plot datasets with lagged features to view the direct forecasting setup across horizons.
## S3 method for class 'lagged_df'
plot(x, ...)
x |
An object of class 'lagged_df' from |
... |
Not used. |
A single plot of class 'ggplot' if lookback was specified in create_lagged_df(); a list of plots, one per feature, of class 'ggplot' if lookback_control was specified.
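The direct forecasting setup shown in these plots can be approximated outside the package. The sketch below is a minimal base R illustration, not forecastML's internal code. It assumes that a model trained for horizon h can only use feature lags of at least h, because more recent values are not yet observed at forecast time. The helper name `usable_lags` is hypothetical.

```r
# Hypothetical helper (not part of forecastML): keep only the lags that would
# still be observable when forecasting 'horizon' steps ahead.
usable_lags <- function(lookback, horizon) {
  lookback[lookback >= horizon]
}

horizons <- c(1, 6, 12)
lookback <- 1:15

# One vector of usable lags per horizon-specific model.
lags_by_horizon <- lapply(horizons, function(h) usable_lags(lookback, h))
names(lags_by_horizon) <- paste0("horizon_", horizons)

# Number of usable lags per horizon-specific model.
n_lags <- lengths(lags_by_horizon)
print(n_lags)
```

Under this assumption, longer horizons leave fewer lagged features available, which is the pattern the plot is meant to make visible.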
# Sampled Seatbelts data from the R package datasets.
data("data_seatbelts", package = "forecastML")
#------------------------------------------------------------------------------
# Example 1 - Training data for 3 horizon-specific models w/ common lags per predictor.
horizons <- c(1, 6, 12)
lookback <- 1:15

data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               lookback = lookback, horizon = horizons)
plot(data_train)
#------------------------------------------------------------------------------
# Example 2 - Training data for one 3-month horizon model w/ unique lags per predictor.
horizons <- 3
lookback <- list(c(3, 6, 9, 12), c(4:12), c(6:15), c(8))

data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               lookback_control = lookback, horizon = horizons)
plot(data_train)
Several diagnostic plots can be returned to assess the quality of the forecasts based on predictions on the validation datasets.
## S3 method for class 'training_results'
plot(
  x,
  type = c("prediction", "residual", "forecast_stability"),
  facet = horizon ~ model,
  models = NULL,
  horizons = NULL,
  windows = NULL,
  valid_indices = NULL,
  group_filter = NULL,
  keep_missing = FALSE,
  ...
)
x |
An object of class 'training_results' from |
type |
Plot type. The default plot is "prediction" for validation dataset predictions. |
facet |
Optional. For numeric outcomes, a formula with any combination of |
models |
Optional. Filter results by user-defined model name from |
horizons |
Optional. A numeric vector of model forecast horizons to filter results by horizon-specific model. |
windows |
Optional. A numeric vector of window numbers to filter results. |
valid_indices |
Optional. A numeric or date vector to filter results by validation row indices or dates. |
group_filter |
Optional. A string for filtering plot results for grouped time series
(e.g., |
keep_missing |
Boolean. If |
... |
Not used. |
Diagnostic plots of class 'ggplot'.
Plot forecast error at various levels of aggregation across validation datasets.
## S3 method for class 'validation_error'
plot(
  x,
  type = c("window", "horizon", "global"),
  metric = NULL,
  facet = NULL,
  models = NULL,
  horizons = NULL,
  windows = NULL,
  group_filter = NULL,
  ...
)
x |
An object of class 'validation_error' from |
type |
Select plot type; |
metric |
Select error metric to plot (e.g., "mae"); |
facet |
Optional. A formula with any combination of |
models |
Optional. A vector of user-defined model names from |
horizons |
Optional. A numeric vector to filter results by horizon. |
windows |
Optional. A numeric vector to filter results by validation window number. |
group_filter |
A string for filtering plot results for grouped time series (e.g., |
... |
Not used. |
Forecast error plots of class 'ggplot'.
Plot validation datasets across time.
## S3 method for class 'windows'
plot(x, lagged_df, show_labels = TRUE, group_filter = NULL, ...)
x |
An object of class 'windows' from |
lagged_df |
An object of class 'lagged_df' from |
show_labels |
Boolean. If |
group_filter |
Optional. A string for filtering plot results for grouped time series (e.g., |
... |
Not used. |
A plot of the outer-loop nested cross-validation windows of class 'ggplot'.
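To make the idea of validation windows concrete, the following base R sketch builds contiguous windows of a fixed length, with a shorter partial window at the end. It is an illustration under stated assumptions rather than the code behind create_windows(): the dataset length `n_rows` is hypothetical, and the sketch assumes that the rows inside a window are held out for validation while the remaining rows are available for training.

```r
n_rows <- 100L          # hypothetical number of rows in the training dataset
window_length <- 12L

# Contiguous windows; the last one is truncated at the end of the dataset.
window_start <- seq(1L, n_rows, by = window_length)
window_stop <- pmin(window_start + window_length - 1L, n_rows)

windows_sketch <- data.frame(
  start = window_start,
  stop = window_stop,
  length = window_stop - window_start + 1L
)

# Rows held out in the last window versus rows left for training (assumption).
last <- nrow(windows_sketch)
valid_rows <- seq(windows_sketch$start[last], windows_sketch$stop[last])
train_rows <- setdiff(seq_len(n_rows), valid_rows)

print(windows_sketch)
```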
# Sampled Seatbelts data from the R package datasets.
data("data_seatbelts", package = "forecastML")

# Example - Training data for 3 horizon-specific models w/ common lags per predictor.
horizons <- c(1, 6, 12)
lookback <- 1:15

data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               lookback = lookback, horizon = horizons)

# All historical window lengths of 12 plus any partial windows at the end of the dataset.
windows <- create_windows(data_train, window_length = 12)
plot(windows, data_train)

# Two custom validation windows with different lengths.
windows <- create_windows(data_train, window_start = c(20, 80), window_stop = c(30, 100))
plot(windows, data_train)
Predict with a 'forecast_model' object from train_model(). If data = create_lagged_df(..., type = "train"), predictions are returned for the outer-loop nested cross-validation datasets. If data is an object of class 'lagged_df' from create_lagged_df(..., type = "forecast"), predictions are returned for the horizons specified in create_lagged_df(horizons = ...).
## S3 method for class 'forecast_model'
predict(..., prediction_function = list(NULL), data)
... |
One or more trained models from |
prediction_function |
A list of user-defined prediction functions with length equal to
the number of models supplied in |
data |
If |
If data = create_lagged_df(..., type = "train"), an S3 object of class 'training_results'. If data = create_lagged_df(..., type = "forecast"), an S3 object of class 'forecast_results'.
Columns in returned 'training_results' data.frame:
model: User-supplied model name in train_model().
model_forecast_horizon: The direct-forecasting time horizon that the model was trained on.
window_length: Validation window length measured in dataset rows.
window_number: Validation dataset number.
valid_indices: Validation dataset row names from attributes(create_lagged_df())$row_indices.
date_indices: If given and method = "direct", validation dataset date indices from attributes(create_lagged_df())$date_indices. If given and method = "multi_output", date_indices represents the date of the forecast.
"groups": If given, the user-supplied groups in create_lagged_df().
"outcome_name": The target being forecasted.
"outcome_name"_pred: The model predictions.
"outcome_name"_pred_lower: If given, the lower prediction bounds returned by the user-supplied prediction function.
"outcome_name"_pred_upper: If given, the upper prediction bounds returned by the user-supplied prediction function.
forecast_indices: If method = "multi_output", the validation index of the h-step-ahead forecast.
forecast_date_indices: If method = "multi_output", the validation date index of the h-step-ahead forecast.
Columns in returned 'forecast_results' data.frame:
model: User-supplied model name in train_model().
model_forecast_horizon: If method = "direct", the direct-forecasting time horizon that the model was trained on.
horizon: Forecast horizons, 1:h, measured in dataset rows.
window_length: Validation window length measured in dataset rows.
forecast_period: The forecast period in row indices or dates. The forecast period starts at either attributes(create_lagged_df())$data_stop + 1 for row indices or attributes(create_lagged_df())$data_stop + 1 * frequency for date indices.
"groups": If given, the user-supplied groups in create_lagged_df().
"outcome_name": The target being forecasted.
"outcome_name"_pred: The model forecasts.
"outcome_name"_pred_lower: If given, the lower forecast bounds returned by the user-supplied prediction function.
"outcome_name"_pred_upper: If given, the upper forecast bounds returned by the user-supplied prediction function.
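The lower and upper bound columns listed above are filled only when the user-supplied prediction function returns them. As a hedged illustration of such a function, the sketch below wraps a model fit with stats::lm() and returns a 3-column data.frame of point forecasts plus 95% prediction bounds. The function name `prediction_function_lm`, the column names, and the toy data are all assumptions made for this example; the sketch is run on its own here and is not passed through predict.forecast_model().

```r
# Hypothetical prediction function for a model fit with stats::lm().
# Returns point forecasts plus lower and upper 95% prediction bounds.
prediction_function_lm <- function(model, data_features) {

  pred <- predict(model, newdata = data_features, interval = "prediction", level = 0.95)

  data_pred <- data.frame(
    "y_pred" = pred[, "fit"],
    "y_pred_lower" = pred[, "lwr"],
    "y_pred_upper" = pred[, "upr"]
  )
  return(data_pred)
}

# Toy data, invented for illustration only.
set.seed(1)
toy <- data.frame(x = 1:20)
toy$y <- 2 * toy$x + rnorm(20)

fit <- lm(y ~ x, data = toy)
out <- prediction_function_lm(fit, data.frame(x = 21:23))
print(out)
```

A function that returns only the point forecast column would be the 1-column variant used in the LASSO examples.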
# Sampled Seatbelts data from the R package datasets.
data("data_seatbelts", package = "forecastML")

# Example - Training data for 2 horizon-specific models w/ common lags per predictor.
horizons <- c(1, 12)
lookback <- 1:15

data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               lookback = lookback, horizon = horizons)

# One custom validation window at the end of the dataset.
windows <- create_windows(data_train, window_start = 181, window_stop = 192)

# User-defined model - LASSO
# A user-defined wrapper function for model training that takes the following
# arguments: (1) a horizon-specific data.frame made with create_lagged_df(..., type = "train")
# (e.g., my_lagged_df$horizon_h) and, optionally, (2) any number of additional named arguments
# which are passed as '...' in train_model().
library(glmnet)
model_function <- function(data, my_outcome_col) {

  x <- data[, -(my_outcome_col), drop = FALSE]
  y <- data[, my_outcome_col, drop = FALSE]
  x <- as.matrix(x, ncol = ncol(x))
  y <- as.matrix(y, ncol = ncol(y))

  model <- glmnet::cv.glmnet(x, y, nfolds = 3)
  return(model)
}

# my_outcome_col = 1 is passed in ... but could have been defined in model_function().
model_results <- train_model(data_train, windows, model_name = "LASSO", model_function,
                             my_outcome_col = 1)

# User-defined prediction function - LASSO
# The predict() wrapper takes two positional arguments. First,
# the returned model from the user-defined modeling function (model_function() above).
# Second, a data.frame of predictors--identical to the datasets returned from
# create_lagged_df(..., type = "train"). The function can return a 1- or 3-column data.frame
# with either (a) point forecasts or (b) point forecasts plus lower and upper forecast
# bounds (column order and column names do not matter).
prediction_function <- function(model, data_features) {

  x <- as.matrix(data_features, ncol = ncol(data_features))

  data_pred <- data.frame("y_pred" = predict(model, x, s = "lambda.min"))
  return(data_pred)
}

# Predict on the validation datasets.
data_valid <- predict(model_results, prediction_function = list(prediction_function),
                      data = data_train)

# Forecast.
data_forecast <- create_lagged_df(data_seatbelts, type = "forecast", outcome_col = 1,
                                  lookback = lookback, horizon = horizons)

data_forecasts <- predict(model_results, prediction_function = list(prediction_function),
                          data = data_forecast)
The purpose of forecast reconciliation is to produce a single coherent forecast from multiple forecasts produced at (a) different time horizons (e.g., monthly and quarterly) and/or (b) different levels of aggregation (e.g., classroom, school, and school district). After forecast reconciliation, the bottom-level or most disaggregated forecast can simply be summed up to produce all higher-level forecasts.
reconcile_forecasts(
  forecasts,
  frequency,
  index,
  outcome,
  keys = NULL,
  method,
  keep_all = TRUE,
  keep_non_reconciled = FALSE
)
forecasts |
A list of 2 or more dataframes with forecasts. Each dataframe must have
a date column named |
frequency |
A character vector of |
index |
A string giving the column name of the date column which should be common across |
outcome |
A string giving the column name of the forecast which should be common across |
keys |
Optional. For forecast reconciliation across groups, a |
method |
One of |
keep_all |
Boolean. For |
keep_non_reconciled |
Boolean. For |
A data.frame of reconciled forecasts.
method = 'temporal': Forecasts are reconciled across forecast horizons.
Structural scaling with weights from temporal hierarchies from Athanasopoulos et al. (2017).
To produce correct forecast reconciliations, all forecasts at the lowest/disaggregated level should be present for all horizons contained in the forecasts with the higher levels of aggregation (e.g., 24 monthly forecasts for 2 annual forecasts or 21 daily forecasts for 3 weekly forecasts).
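Before reconciling, it can help to confirm that the disaggregated forecasts fully cover each higher-level period and to see how far the two levels disagree. The base R sketch below does this for a daily and a monthly forecast, using the same values as the first temporal example. It only aggregates and compares; it does not perform the reconciliation itself, and the object names are chosen for illustration.

```r
# Daily and monthly forecasts covering January and February 2020.
daily <- data.frame(
  index = seq(as.Date("2020-01-01"), as.Date("2020-02-29"), by = "1 day"),
  forecast = c(rep(5, 31), rep(7, 29))
)
monthly <- data.frame(
  index = as.Date(c("2020-01-01", "2020-02-01")),
  forecast = c(150, 200)
)

# Aggregate the daily forecasts up to calendar months.
daily$month <- format(daily$index, "%Y-%m")
daily_sums <- tapply(daily$forecast, daily$month, sum)

# Coverage check: number of daily forecasts present in each month.
n_days <- tapply(daily$forecast, daily$month, length)

# Incoherence before reconciliation: monthly forecast minus summed daily forecasts.
gap <- monthly$forecast - as.numeric(daily_sums)

print(data.frame(month = names(daily_sums), n_days = as.integer(n_days),
                 daily_sum = as.numeric(daily_sums), monthly = monthly$forecast,
                 gap = gap))
```

A non-zero gap means the two levels are not yet coherent; a month with missing days would show up as a low `n_days` count.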
method = 'group': Forecasts are reconciled across groups independently at each forecast horizon.
Structural scaling from Hyndman et al. (2011).
A key column is not needed for the forecast at the highest level of aggregation.
Having input forecasts at each level of aggregation is not a requirement. For example, forecasts by nation, state, and city could be reconciled with only 2 input forecasts: 1 for nation (highest aggregation) and 1 for the combination of nation by state by city (lowest/no aggregation) without the 2 intermediate-level forecasts at the state and city levels.
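The idea behind group reconciliation can be sketched with a small summing matrix. The base R example below applies weighted least squares with structural scaling weights (each series weighted by the number of bottom-level series it aggregates) to one forecast period of a total-by-state hierarchy. This is an illustration of the general technique under those assumptions; it has not been checked against the numbers that reconcile_forecasts() returns, and the object names are hypothetical.

```r
# Base forecasts for a single forecast period, ordered as total, IL, WI.
y_hat <- c(total = 50, IL = 20, WI = 25)

# Summing matrix: each row states which bottom-level series add up to that series.
S <- rbind(total = c(1, 1), IL = c(1, 0), WI = c(0, 1))
colnames(S) <- c("IL", "WI")

# Structural scaling: the weight of a series is the number of bottom-level
# series it contains, so the inverse weights are 1 / rowSums(S).
W_inv <- diag(1 / rowSums(S))

# Weighted least squares estimate of the bottom-level forecasts.
bottom <- solve(t(S) %*% W_inv %*% S, t(S) %*% W_inv %*% y_hat)

# Coherent forecasts at every level are obtained by summing the bottom level.
reconciled <- drop(S %*% bottom)
print(reconciled)
```

After this step the total equals the sum of the state-level forecasts, which is the property described at the start of this section.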
Athanasopoulos, G., Hyndman, R. J., Kourentzes, N., & Petropoulos, F. (2017). Forecasting with temporal hierarchies. European Journal of Operational Research, 262(1), 60-74. https://robjhyndman.com/papers/temporalhierarchies.pdf
Hyndman, R. J., Ahmed, R. A., Athanasopoulos, G., & Shang, H. L. (2011). Optimal combination forecasts for hierarchical time series. Computational statistics & data analysis, 55(9), 2579-2589. http://robjhyndman.com/papers/hierarchical
#------------------------------------------------------------------------------
# Temporal example 1: 2 forecasts, daily/monthly, 2 forecast periods at highest aggregation.
freq <- c("1 day", "1 month")

data_1_day <- data.frame("index" = seq(as.Date("2020-1-1"), as.Date("2020-2-29"), by = freq[1]),
                         "forecast" = c(rep(5, 31), rep(7, 29)))

data_1_month <- data.frame("index" = seq(as.Date("2020-1-1"), as.Date("2020-2-1"), by = freq[2]),
                           "forecast" = c(150, 200))

forecasts_reconciled <- reconcile_forecasts(list(data_1_day, data_1_month), freq,
                                            index = "index", outcome = "forecast",
                                            method = "temporal")
#------------------------------------------------------------------------------
# Temporal example 2: 3 forecasts, monthly/4-monthly/annually, 1 forecast period at highest aggregation.
freq <- c("1 month", "4 months", "1 year")

data_1_month <- data.frame("index" = seq(as.Date("2020-1-1"), as.Date("2020-12-1"), by = freq[1]),
                           "forecast" = rep(10, 12))

data_4_months <- data.frame("index" = seq(as.Date("2020-1-1"), as.Date("2020-12-1"), by = freq[2]),
                            "forecast" = c(40, 50, 45))

data_1_year <- data.frame("index" = as.Date("2020-01-01"),
                          "forecast" = c(110))

forecasts_reconciled <- reconcile_forecasts(list(data_1_month, data_4_months, data_1_year), freq,
                                            index = "index", outcome = "forecast",
                                            method = "temporal")
#------------------------------------------------------------------------------
# Temporal example 3: 2 forecasts, weekly/monthly, 2 forecast periods at highest aggregation.
freq <- c("1 week", "1 month")

data_1_week <- data.frame("index" = seq(as.Date("2020-1-1"), as.Date("2020-3-1"), by = freq[1]),
                          "forecast" = c(rep(3, 5), rep(2, 4)))

data_1_month <- data.frame("index" = seq(as.Date("2020-1-1"), as.Date("2020-2-1"), by = freq[2]),
                           "forecast" = c(11, 12))

forecasts_reconciled <- reconcile_forecasts(list(data_1_week, data_1_month), freq,
                                            index = "index", outcome = "forecast",
                                            method = "temporal")
#------------------------------------------------------------------------------
# Temporal example 4: 2 forecasts, hourly/daily, 3 forecast periods at highest aggregation.
freq <- c("1 hour", "1 day")
timezone <- "UTC"

data_1_hour <- data.frame("index" = seq(as.POSIXct("2020-01-01 00:00:00", tz = timezone),
                                        as.POSIXct("2020-01-03 23:00:00", tz = timezone),
                                        by = freq[1]),
                          "forecast" = rep(c(3, 5), 72 / 2))

data_1_day <- data.frame("index" = seq(as.Date("2020-1-1"), as.Date("2020-1-3"), by = freq[2]),
                         "forecast" = c(90, 100, 105))

forecasts_reconciled <- reconcile_forecasts(list(data_1_hour, data_1_day), freq,
                                            index = "index", outcome = "forecast",
                                            method = "temporal")
#------------------------------------------------------------------------------
# Grouped example 1: 2 forecasts, completely nested/hierarchical.
freq <- c("1 month")

dates <- seq(as.Date("2020-1-1"), as.Date("2020-3-1"), by = freq)

data_total <- data.frame("index" = dates,
                         "forecast" = c(50, 100, 75))

data_state <- data.frame("index" = rep(dates, 2),
                         "state" = c(rep("IL", length(dates)), rep("WI", length(dates))),
                         "forecast" = c(20, 60, 40, 25, 40, 50))

forecasts <- list("total" = data_total, "state" = data_state)

forecasts_reconciled <- reconcile_forecasts(forecasts, freq,
                                            index = "index", outcome = "forecast",
                                            method = "group")
#------------------------------------------------------------------------------
# Grouped example 2: 4 forecasts, non-nested.
freq <- c("1 month")

dates <- seq(as.Date("2020-1-1"), as.Date("2020-3-1"), by = freq)

data_total <- data.frame("index" = dates,
                         "forecast" = c(50, 100, 75))

data_state <- data.frame("index" = rep(dates, 2),
                         "state" = c(rep("IL", length(dates)), rep("WI", length(dates))),
                         "forecast" = c(20, 60, 40, 25, 40, 50))

data_sex <- data.frame("index" = rep(dates, 2),
                       "sex" = c(rep("M", length(dates)), rep("F", length(dates))),
                       "forecast" = c(25, 45, 40, 35, 40, 20))

data_state_sex <- data.frame("index" = rep(dates, 4),
                             "state" = c(rep("IL", length(dates)*2), rep("WI", length(dates)*2)),
                             "sex" = c(rep("M", 3), rep("F", 3), rep("M", 3), rep("F", 3)),
                             "forecast" = c(5, 15, 10, 30, 10, 10, 25, 30, 20, 10, 10, 15))

forecasts <- list("total" = data_total, "state" = data_state,
                  "sex" = data_sex, "state_sex" = data_state_sex)

forecasts_reconciled <- reconcile_forecasts(forecasts, freq,
                                            index = "index", outcome = "forecast",
                                            method = "group")
Return model residuals
residuals(object, ...)
object |
An object of class 'training_results' from running |
... |
Not used. |
A data.frame of model residuals of class 'training_residuals'.
Compute forecast error metrics on the validation datasets or a new test dataset.
return_error(
  data_results,
  data_test = NULL,
  test_indices = NULL,
  aggregate = stats::median,
  metrics = c("mae", "mape", "mdape", "smape", "rmse", "rmsse"),
  models = NULL,
  horizons = NULL,
  windows = NULL,
  group_filter = NULL
)
data_results |
An object of class 'training_results' or 'forecast_results' from running (a)
|
data_test |
Required for forecast results only. If |
test_indices |
Required if |
aggregate |
Default |
metrics |
A character vector of common forecast error metrics. The default behavior is to return all metrics. |
models |
Optional. A character vector of user-defined model names supplied to |
horizons |
Optional. A numeric vector to filter results by horizon. |
windows |
Optional. A numeric vector to filter results by validation window number. |
group_filter |
Optional. A string for filtering plot results for grouped time series
(e.g., |
An S3 object of class 'validation_error', 'forecast_error', or 'forecastML_error': A list of data.frames of error metrics for the validation or forecast dataset depending on the class of data_results: 'training_results', 'forecast_results', or 'forecastML' from combine_forecasts().
A list containing:
Error metrics by model, horizon, and validation window
Error metrics by model and horizon, collapsed across validation windows
Global error metrics by model collapsed across horizons and validation windows
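The three levels of aggregation in this list can be pictured with a small base R sketch. It is one plausible reading only: the toy results are invented, mean absolute error is used as the metric, and the window-level results are collapsed with stats::median (the default shown for the aggregate argument). Whether return_error() collapses window-level metrics in exactly this order is an assumption, not something stated in this documentation.

```r
# Hypothetical validation results: 2 horizons x 2 windows x 2 rows each.
results <- data.frame(
  model = "LASSO",
  horizon = rep(c(1, 12), each = 4),
  window = rep(rep(1:2, each = 2), times = 2),
  actual = c(10, 12, 11, 13, 10, 12, 11, 13),
  pred = c(9, 12, 13, 13, 7, 14, 11, 16)
)
results$abs_error <- abs(results$actual - results$pred)

# Level 1: error by model, horizon, and validation window.
by_window <- aggregate(abs_error ~ model + horizon + window, data = results, FUN = mean)
names(by_window)[names(by_window) == "abs_error"] <- "mae"

# Level 2: collapse across validation windows (assumed: median of window-level MAE).
by_horizon <- aggregate(mae ~ model + horizon, data = by_window, FUN = stats::median)

# Level 3: collapse across horizons as well.
global <- aggregate(mae ~ model, data = by_horizon, FUN = stats::median)

print(by_window)
print(by_horizon)
print(global)
```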
mae: Mean absolute error (works with factor outcomes)
mape: Mean absolute percentage error
mdape: Median absolute percentage error
smape: Symmetrical mean absolute percentage error
rmse: Root mean squared error
rmsse: Root mean squared scaled error from the M5 competition
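For orientation, the metrics in this list can be written out in a few lines of base R. The sketch below uses common textbook definitions and is not the package's implementation. In particular, it assumes that sMAPE divides by the sum of the absolute actual and predicted values and that RMSSE scales the forecast mean squared error by the in-sample mean squared error of a one-step naive forecast, as in the M5 competition. The function name `forecast_metrics` and the toy numbers are hypothetical.

```r
# Hypothetical helper: common definitions of the listed error metrics.
# 'train_actual' is the historical series used to scale the RMSSE.
forecast_metrics <- function(actual, pred, train_actual) {

  error <- actual - pred

  c(
    mae = mean(abs(error)),
    mape = mean(abs(error / actual)) * 100,
    mdape = stats::median(abs(error / actual)) * 100,
    smape = mean(2 * abs(error) / (abs(actual) + abs(pred))) * 100,
    rmse = sqrt(mean(error^2)),
    rmsse = sqrt(mean(error^2) / mean(diff(train_actual)^2))
  )
}

# Toy values, invented for illustration only.
actual <- c(100, 110, 120)
pred <- c(90, 110, 130)
train_actual <- c(80, 90, 95, 100)

metrics <- forecast_metrics(actual, pred, train_actual)
print(round(metrics, 3))
```

The percentage-based metrics are undefined when an actual value is zero, which is one reason scaled metrics such as RMSSE are often reported alongside them.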
The output of return_error() has the following generic S3 methods
# Sampled Seatbelts data from the R package datasets.
data("data_seatbelts", package = "forecastML")

# Example - Training data for 2 horizon-specific models w/ common lags per predictor.
horizons <- c(1, 12)
lookback <- 1:15

data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               lookback = lookback, horizon = horizons)

# One custom validation window at the end of the dataset.
windows <- create_windows(data_train, window_start = 181, window_stop = 192)

# User-defined model - LASSO
# A user-defined wrapper function for model training that takes the following
# arguments: (1) a horizon-specific data.frame made with create_lagged_df(..., type = "train")
# (e.g., my_lagged_df$horizon_h) and, optionally, (2) any number of additional named arguments
# which are passed as '...' in train_model().
library(glmnet)
model_function <- function(data, my_outcome_col) {

  x <- data[, -(my_outcome_col), drop = FALSE]
  y <- data[, my_outcome_col, drop = FALSE]
  x <- as.matrix(x, ncol = ncol(x))
  y <- as.matrix(y, ncol = ncol(y))

  model <- glmnet::cv.glmnet(x, y, nfolds = 3)
  return(model)
}

# my_outcome_col = 1 is passed in ... but could have been defined in model_function().
model_results <- train_model(data_train, windows, model_name = "LASSO", model_function,
                             my_outcome_col = 1)

# User-defined prediction function - LASSO
# The predict() wrapper takes two positional arguments. First,
# the returned model from the user-defined modeling function (model_function() above).
# Second, a data.frame of predictors--identical to the datasets returned from
# create_lagged_df(..., type = "train"). The function can return a 1- or 3-column data.frame
# with either (a) point forecasts or (b) point forecasts plus lower and upper forecast
# bounds (column order and column names do not matter).
prediction_function <- function(model, data_features) {

  x <- as.matrix(data_features, ncol = ncol(data_features))

  data_pred <- data.frame("y_pred" = predict(model, x, s = "lambda.min"))
  return(data_pred)
}

# Predict on the validation datasets.
data_valid <- predict(model_results, prediction_function = list(prediction_function),
                      data = data_train)

# Forecast error metrics for validation datasets.
data_error <- return_error(data_valid)
The purpose of this function is to support investigation into the stability of hyperparameters in the nested cross-validation and across forecast horizons.
return_hyper(forecast_model, hyper_function)
forecast_model |
An object of class 'forecast_model' from |
hyper_function |
A user-defined function for retrieving model hyperparameters. See the example below for details. |
An S3 object of class 'forecast_model_hyper': A data.frame of model-specific hyperparameters.
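To illustrate the idea of collecting one row of hyperparameters per model and then checking their stability, here is a minimal base R sketch. It does not call `return_hyper()`: the `mock_models` list, its lambda values, and the names `data_hyper_sketch` and `lambda_range` are hypothetical stand-ins, and the actual columns of the returned data.frame may differ.

```r
# Mock stand-ins for fitted cv.glmnet models, keyed by horizon and window.
# Real models would come from train_model(); these values are made up.
mock_models <- list(
  horizon_1 = list(
    window_1 = list(lambda.min = 0.8, lambda.1se = 2.0),
    window_2 = list(lambda.min = 1.1, lambda.1se = 2.4)
  ),
  horizon_12 = list(
    window_1 = list(lambda.min = 3.5, lambda.1se = 6.0),
    window_2 = list(lambda.min = 2.9, lambda.1se = 5.2)
  )
)

# A user-defined hyperparameter function: one model in, a 1-row data.frame out.
hyper_function <- function(model) {
  data.frame(lambda_min = model$lambda.min, lambda_1se = model$lambda.1se)
}

# Apply the function to every horizon/window combination and stack the rows.
rows <- list()
for (h in names(mock_models)) {
  for (w in names(mock_models[[h]])) {
    hyper <- hyper_function(mock_models[[h]][[w]])
    rows[[length(rows) + 1]] <- data.frame(horizon = h, window = w, hyper,
                                           stringsAsFactors = FALSE)
  }
}
data_hyper_sketch <- do.call(rbind, rows)

# A crude stability check: the range of lambda_min across windows, per horizon.
lambda_range <- tapply(data_hyper_sketch$lambda_min, data_hyper_sketch$horizon,
                       function(x) diff(range(x)))
```

A small range within a horizon would suggest the selected hyperparameter is stable across validation windows; large differences between horizons would suggest horizon-specific tuning matters.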
The output of return_hyper() has the following generic S3 methods
library(forecastML)

# Sampled Seatbelts data from the R package datasets.
data("data_seatbelts", package = "forecastML")

# Example - Training data for 2 horizon-specific models w/ common lags per predictor.
horizons <- c(1, 12)
lookback <- 1:15

data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               lookback = lookback, horizon = horizons)

# One custom validation window at the end of the dataset.
windows <- create_windows(data_train, window_start = 181, window_stop = 192)

# User-defined model - LASSO
# A user-defined wrapper function for model training that takes the following
# arguments: (1) a horizon-specific data.frame made with create_lagged_df(..., type = "train")
# (e.g., my_lagged_df$horizon_h) and, optionally, (2) any number of additional named arguments
# which are passed as '...' in train_model().
library(glmnet)
model_function <- function(data, my_outcome_col) {

  x <- data[, -(my_outcome_col), drop = FALSE]
  y <- data[, my_outcome_col, drop = FALSE]
  x <- as.matrix(x, ncol = ncol(x))
  y <- as.matrix(y, ncol = ncol(y))

  model <- glmnet::cv.glmnet(x, y, nfolds = 3)
  return(model)
}

# my_outcome_col = 1 is passed in ... but could have been defined in model_function().
model_results <- train_model(data_train, windows, model_name = "LASSO", model_function,
                             my_outcome_col = 1)

# User-defined prediction function - LASSO
# The predict() wrapper takes two positional arguments. First,
# the returned model from the user-defined modeling function (model_function() above).
# Second, a data.frame of predictors--identical to the datasets returned from
# create_lagged_df(..., type = "train"). The function can return a 1- or 3-column data.frame
# with either (a) point forecasts or (b) point forecasts plus lower and upper forecast
# bounds (column order and column names do not matter).
prediction_function <- function(model, data_features) {

  x <- as.matrix(data_features, ncol = ncol(data_features))

  data_pred <- data.frame("y_pred" = predict(model, x, s = "lambda.min"))
  return(data_pred)
}

# Predict on the validation datasets.
data_valid <- predict(model_results, prediction_function = list(prediction_function),
                      data = data_train)

# User-defined hyperparameter function - LASSO
# The hyperparameter function should take one positional argument--the returned model
# from the user-defined modeling function (model_function() above). It should
# return a 1-row data.frame of the optimal hyperparameters.
hyper_function <- function(model) {

  lambda_min <- model$lambda.min
  lambda_1se <- model$lambda.1se

  data_hyper <- data.frame("lambda_min" = lambda_min, "lambda_1se" = lambda_1se)
  return(data_hyper)
}

data_error <- return_error(data_valid)
data_hyper <- return_hyper(model_results, hyper_function)

plot(data_hyper, data_valid, data_error, type = "stability", horizons = c(1, 12))
Return a summary of a lagged_df object
## S3 method for class 'lagged_df'
summary(object, ...)
object |
An object of class 'lagged_df' from |
... |
Not used. |
A printed summary of the contents of the lagged_df object.
Train a user-defined forecast model for each horizon, 'h', and across the validation datasets, 'd'. If method = "direct", a total of 'h' * 'd' models are trained. If method = "multi_output", a total of 1 * 'd' models are trained. These models can be trained in parallel with the future package.
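The model counts above can be checked with a little arithmetic. The sketch below uses only base R and does not call the package. The value of `n_windows` and the variable names are hypothetical and chosen for illustration.

```r
# Two forecast horizons, as in the package examples.
horizons <- c(1, 12)

# Hypothetical number of validation windows ('d').
n_windows <- 3

# Direct forecasting: one model per horizon/window combination.
direct_grid <- expand.grid(horizon = horizons, window = seq_len(n_windows))
n_models_direct <- nrow(direct_grid)   # 'h' * 'd'

# Multi-output forecasting: one model per window covers all horizons.
n_models_multi_output <- n_windows     # 1 * 'd'
```

Each row of `direct_grid` corresponds to one independent model fit, which is why the direct method is a natural candidate for parallel training.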
train_model( lagged_df, windows, model_name, model_function, ..., use_future = FALSE, python = FALSE )
lagged_df |
An object of class 'lagged_df' from |
windows |
An object of class 'windows' from |
model_name |
A name for the model. |
model_function |
A user-defined wrapper function for model training that takes the following
arguments: (1) a horizon-specific data.frame made with |
... |
Optional. Named arguments passed into the user-defined |
use_future |
Boolean. If |
python |
Boolean. If |
An S3 object of class 'forecast_model': A nested list of trained models. Models can be accessed with my_trained_model$horizon_h$window_w$model where 'h' gives the forecast horizon and 'w' gives the validation dataset window number from create_windows().
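The access pattern can be tried on a mock object that mirrors the naming scheme described above. This is a sketch under stated assumptions: the list is built by hand rather than by `train_model()`, `lm()` stands in for a user-defined model, and a real 'forecast_model' object may carry additional elements and attributes.

```r
# Toy data and a stand-in model; both are hypothetical.
set.seed(1)
toy <- data.frame(y = rnorm(20), x = rnorm(20))

# Build a nested list named horizon_h -> window_w -> model.
my_trained_model <- list()
for (h in c(1, 12)) {
  windows_h <- list()
  for (w in 1:2) {
    windows_h[[paste0("window_", w)]] <- list(model = lm(y ~ x, data = toy))
  }
  my_trained_model[[paste0("horizon_", h)]] <- windows_h
}

# Access a single model: horizon 12, validation window 2.
fit <- my_trained_model$horizon_12$window_2$model

# Walk every horizon/window combination and record the model class.
model_classes <- sapply(my_trained_model, function(horizon) {
  sapply(horizon, function(window) class(window$model)[1])
})
```

Looping over the nested names in this way is one option for inspecting or comparing models across horizons and windows without hard-coding each path.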
The output of train_model
can be passed into
and has the following generic S3 methods
plot (from predict.forecast_model(data = create_lagged_df(..., type = "train")))
plot (from predict.forecast_model(data = create_lagged_df(..., type = "forecast")))
library(forecastML)

# Sampled Seatbelts data from the R package datasets.
data("data_seatbelts", package = "forecastML")

# Example - Training data for 2 horizon-specific models w/ common lags per predictor.
horizons <- c(1, 12)
lookback <- 1:15

data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               lookback = lookback, horizon = horizons)

# One custom validation window at the end of the dataset.
windows <- create_windows(data_train, window_start = 181, window_stop = 192)

# User-defined model - LASSO
# A user-defined wrapper function for model training that takes the following
# arguments: (1) a horizon-specific data.frame made with create_lagged_df(..., type = "train")
# (e.g., my_lagged_df$horizon_h) and, optionally, (2) any number of additional named arguments
# which are passed as '...' in train_model().
library(glmnet)
model_function <- function(data, my_outcome_col) {

  x <- data[, -(my_outcome_col), drop = FALSE]
  y <- data[, my_outcome_col, drop = FALSE]
  x <- as.matrix(x, ncol = ncol(x))
  y <- as.matrix(y, ncol = ncol(y))

  model <- glmnet::cv.glmnet(x, y, nfolds = 3)
  return(model)
}

# my_outcome_col = 1 is passed in ... but could have been defined in model_function().
model_results <- train_model(data_train, windows, model_name = "LASSO", model_function,
                             my_outcome_col = 1)

# View the results for the model (a) trained on the first horizon
# and (b) to be assessed on the first outer-loop validation window.
model_results$horizon_1$window_1$model