Hierarchical Forecasting

The finnts package leverages the great work of the hts package. It’s currently retired but we think old dogs can still learn new tricks! It’s still of great use to finnts and allows for both standard and grouped hierarchical forecasting.

Standard Hierarchy

A standard hierarchy of data is pretty straight forward. Each combo variable can be aggregated into the next combo variable when producing forecasts at higher aggregations of the data. This structure resembles a pyramid, with the bottom being the lowest granularity of time series and the top being a single time series of the grand total of the data. Below is a good example with geographical combo variables that can be aggregated into one another when building a standard hierarchy forecast.

#> Standard Hierarchical Time Series Data
#> # A tibble: 9 × 5
#>   Continent     Country       City        Date       Target
#>   <chr>         <chr>         <chr>       <date>      <dbl>
#> 1 North America United States Kansas City 2020-01-01    100
#> 2 North America United States Kansas City 2020-02-01    250
#> 3 North America United States Kansas City 2020-03-01    320
#> 4 North America United States Seattle     2020-01-01     80
#> 5 North America United States Seattle     2020-02-01    200
#> 6 North America United States Seattle     2020-03-01    270
#> 7 North America Mexico        Mexico City 2020-01-01     50
#> 8 North America Mexico        Mexico City 2020-02-01     80
#> 9 North America Mexico        Mexico City 2020-03-01    120

In the above example, “City” was the lowest level of the hierarchy, which feeds into “Country”, which then feeds into “Continent”. Finn will take this data and will forecast by City, total Country, and total Continent. After each model is ran for every level in the hierarchy, the best model is chosen at each level, then the “Best Model” and every other model is reconciled back down to the lowest level.

Grouped Hierarchy

Grouped hierarchies are very different than the traditional hierarchy approach described above. There are some data sets that can be aggregated in various ways, meaning they need to follow another approach the hts package calls “grouped”. A good example is a data set that contains historical time series by geography, customer segment, and product.

#> Grouped Hierarchical Time Series Data
#> # A tibble: 12 × 5
#>    Country       Segment       Product Date       Target
#>    <chr>         <chr>         <chr>   <date>      <dbl>
#>  1 United States Enterprise    Coffee  2020-01-01     10
#>  2 United States Enterprise    Coffee  2020-02-01     20
#>  3 United States Enterprise    Coffee  2020-03-01     30
#>  4 United States Public Sector Coffee  2020-01-01      5
#>  5 United States Public Sector Coffee  2020-02-01      8
#>  6 United States Public Sector Coffee  2020-03-01     11
#>  7 Mexico        Enterprise    Coffee  2020-01-01     20
#>  8 Mexico        Enterprise    Coffee  2020-02-01     23
#>  9 Mexico        Enterprise    Coffee  2020-03-01     27
#> 10 Mexico        Enterprise    Tea     2020-01-01     50
#> 11 Mexico        Enterprise    Tea     2020-02-01     55
#> 12 Mexico        Enterprise    Tea     2020-03-01     60

It would be hard to aggregate the above data in a traditional hierarchy. The same products are found in different segments and countries, also the same segments are found in multiple countries. Finn will follow a similar modeling process as the one described for a traditional hierarchy, but instead will create forecasts at the below levels.

Grand Total: model “Target” across the sum of “Country”, “Segment”, and “Product”
Country: model “Target” across the sum of “Country”. Creating forecasts for the grand total of “United States” and “Mexico”
Segment: model “Target” across the sum of “Segment”. Creating forecasts for the grand total of “Enterprise” and “Public Sector”
Product: model “Target” across the sum of “Product”. Creating forecasts for the grand total of “Coffee” and “Tea”.

External Regressors

The aggregation process is automatically calculated for each external regressor, depending on how the regressor maps to the combo variables. If a regressor is unique for each time series, then the standard aggregation process is implemented (same as the process for the target variable). If the regressor repeats the same values across all time series, then only one global copy is retained and applied to all new aggregated time series. If the regressor specifically maps to one or more combo variables, then the relationship is respected while still being able to aggregate to the total level. Aggregations of drivers are always summed.

The only limitation is when an external regressor maps to the middle of a standard hierarchy (not at the global level or bottom level), in this case the regressor will be summed at the global level across the hierarchy. This is due to limitations of aggregation naming in the hts package used in finnts.

Explore the final results of the aggregations by seeing the end result using get_prepped_data().

Spark Parallel Processing

There is also a small limitation when doing hierarchical forecasting using spark as the parallel computing back-end. The hts package finnts uses cannot handle spark data frames. So behind the scenes finnts has to bring all of the data into memory on the spark cluster driver node in order to create the hierarchies. Please keep that in mind. If you have a lot of historical data that is greater than the RAM on the spark driver node, please consider using a different VM size with more memory. The same issue holds true when reconciling the hierarchical forecast, where all of the forecasts across all time series for a specific model is loaded into the same executor node to be reconciled. Because of these limitations, we are now exploring other options outside of the hts package, including the building of our own hierarchical forecasting package. Stay tuned.

Corner Cases in Forecast Reconciliation

Some forecasts can be widely off from the target value. This can cause issues when finnts tries to use historical back testing residuals as the initial weights in hts to reconcile the forecast at various aggregations of your data down to the lowest level. To prevent reconciliation issues, finnts will shrink these residuals to more appropriate numbers that still convey they are considerably off from the initial target values, but not astronomical. For example a residual value of 1,000,000,000,000 compared to a target value of 100 might get shrunk down to 900, since keeping the initial residual will throw the weighted reconciliation process out of whack and cause issues. Most forecasts in finnts will not have this problem, but if your data is very noisy then this fix will help prevent any issues when trying to run hierarchical forecasts in finnts.