The “Best-Model” shown in both the back testing and future forecast outputs is chosen based on which model had the best accuracy during back testing. After all individual, ensemble, and average model forecasts are created for both back testing and the future forecast, a weighted MAPE calculation is applied to each unique combination of data combo and model.
A standard MAPE is calculated first for each back test observation. Then, instead of taking a simple average to get the final MAPE, a weighted average is taken, with each observation weighted by the size of its target variable value. Please see below for an example of the process.
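As a compact way to read the example, the calculation can be written as follows. The symbols are assumptions introduced here for illustration, not notation from finnts: $F_t$ is the forecast (`FCST`), $A_t$ is the target value (`Target`, positive in this example), and $w_t$ corresponds to the `Percent_Total` column.

$$
\text{MAPE}_t = \frac{|F_t - A_t|}{A_t}, \qquad
w_t = \frac{A_t}{\sum_s A_s}, \qquad
\text{Weighted MAPE} = \sum_t w_t \, \text{MAPE}_t
$$

The simple MAPE, by contrast, gives every observation the same weight: $\frac{1}{n} \sum_t \text{MAPE}_t$.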
#> Simple Back Test Results
#> # A tibble: 10 × 8
#>    Combo     Date       Model  FCST Target   MAPE Target_Total Percent_Total
#>    <chr>     <date>     <chr> <dbl>  <dbl>  <dbl>        <dbl>         <dbl>
#>  1 Country_1 2020-01-01 arima     9     10 0.1             150        0.0667
#>  2 Country_1 2020-02-01 arima    23     20 0.15            150        0.133
#>  3 Country_1 2020-03-01 arima    35     30 0.167           150        0.2
#>  4 Country_1 2020-04-01 arima    41     40 0.025           150        0.267
#>  5 Country_1 2020-05-01 arima    48     50 0.04            150        0.333
#>  6 Country_1 2020-01-01 ets       7     10 0.3             150        0.0667
#>  7 Country_1 2020-02-01 ets      22     20 0.1             150        0.133
#>  8 Country_1 2020-03-01 ets      29     30 0.0333          150        0.2
#>  9 Country_1 2020-04-01 ets      42     40 0.05            150        0.267
#> 10 Country_1 2020-05-01 ets      53     50 0.06            150        0.333
#>
#> Overall Model Accuracy by Combo
#> # A tibble: 2 × 4
#>   Combo     Model   MAPE Weighted_MAPE
#>   <chr>     <chr>  <dbl>         <dbl>
#> 1 Country_1 arima 0.0963        0.08
#> 2 Country_1 ets   0.109         0.0733
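
The numbers in the two tables above can be reproduced with a few lines of base R. This is a minimal sketch for illustration, not the code finnts runs internally. The object names `back_test`, `accuracy`, and `best_model` are hypothetical, and the data is copied by hand from the example table.

```r
# Back test results copied from the example table (illustrative data only)
back_test <- data.frame(
  Combo  = "Country_1",
  Date   = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 5), times = 2),
  Model  = rep(c("arima", "ets"), each = 5),
  FCST   = c(9, 23, 35, 41, 48, 7, 22, 29, 42, 53),
  Target = c(10, 20, 30, 40, 50, 10, 20, 30, 40, 50)
)

# Standard absolute percentage error for each back test observation
back_test$MAPE <- abs(back_test$FCST - back_test$Target) / abs(back_test$Target)

# Share of each observation in the total target for its combo and model
back_test$Target_Total <- ave(back_test$Target, back_test$Combo, back_test$Model, FUN = sum)
back_test$Percent_Total <- back_test$Target / back_test$Target_Total

# Collapse to one row per combo and model: simple vs. weighted MAPE
groups <- split(back_test, list(back_test$Combo, back_test$Model), drop = TRUE)
accuracy <- do.call(rbind, lapply(groups, function(df) {
  data.frame(
    Combo         = df$Combo[1],
    Model         = df$Model[1],
    MAPE          = mean(df$MAPE),
    Weighted_MAPE = sum(df$MAPE * df$Percent_Total)
  )
}))
rownames(accuracy) <- NULL

# Best model per combo: the one with the lowest weighted MAPE
min_by_combo <- ave(accuracy$Weighted_MAPE, accuracy$Combo, FUN = min)
best_model <- accuracy[accuracy$Weighted_MAPE == min_by_combo, ]

print(accuracy)
print(best_model)
```

Here `accuracy` matches the "Overall Model Accuracy by Combo" table, and `best_model` contains the ets row. When all target values are positive, the weighted MAPE is equivalent to the sum of absolute errors divided by the sum of targets, which is 12/150 for arima and 11/150 for ets.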
In the simple back test above, arima looks like the better model from a pure MAPE perspective, but ets ends up the winner when using weighted MAPE. Weighted MAPE lets finnts find the model that performs best on the largest components of a forecast. It also tends to put more weight on recent observations, since those are more likely to have larger target values than ones further in the past. Finn also puts more weight on recent observations by overlapping its back testing scenarios, which means the most recent observations are tested for accuracy at different forecast horizons (H=1, H=2, etc.). More information on this is available in the back testing vignette.
Users of Finn can also take the Finn outputs, create their own accuracy metrics, and choose their own best models, since all model results are written to disk.