In the world of time series forecasting, benchmarking methods against established competitors is critical. Today I’m sharing the results of an extensive comparison between two forecasting approaches: the classic Theta method (notoriously difficult to beat on these benchmark datasets) and the newer Dynrmf method (from the R and Python package `ahead`), tested on three major forecasting competitions.
The setup
I evaluated both methods on:
- M3 competition: 3,003 diverse time series
- M1 competition: the original M-competition dataset (1,001 time series)
- Tourism competition: tourism-specific time series (1,311 series)
For each dataset, I trained both models on the historical (training) portion and evaluated their forecasts against the provided test sets using standard accuracy metrics (ME, RMSE, MAE, MPE, MAPE, MASE, ACF1). Each model used its default parameters.
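As a refresher on the scaled metric used throughout, MASE divides the out-of-sample MAE by the in-sample MAE of a one-step naive forecast. A minimal base-R sketch on a made-up toy series (not one of the competition series; the post itself relies on forecast::accuracy()):

```r
# Hand-computed MASE on a hypothetical toy series (non-seasonal case).
train <- c(10, 12, 11, 13, 14, 13, 15)
test  <- c(16, 15, 17)
fc    <- c(14.5, 14.5, 14.5)  # a flat forecast, for illustration

mae_fc    <- mean(abs(test - fc))    # out-of-sample MAE
mae_naive <- mean(abs(diff(train)))  # in-sample MAE of the naive forecast
mase      <- mae_fc / mae_naive      # MASE = 1 means "as good as naive"
print(mase)
```

A MASE below 1 means the forecast beats the in-sample naive benchmark on average.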
Implementation details
The analysis used dual-core parallel processing to speed up computation over thousands of series. Both methods received identical train/test splits for a fair comparison; the full code is at the end of the post.
# Example forecasting loop structure
metrics <- foreach(i = 1:n, .combine = rbind) %dopar% {
  series <- dataset[[i]]
  train <- series$x
  test <- series$xx
  fit <- method(y = train, h = length(test))
  forecast::accuracy(fit)
}
Results
It’s worth mentioning that Theta was the clear winner of the original M3 competition (and has since been refined in many ways), and of course it’s always easier to beat the winner in retrospect.
I performed paired t-tests on the per-series metric differences (Theta – Dynrmf) in each competition. Here’s what the numbers reveal:
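Concretely, for each metric I take the per-series difference between the two methods and test whether its mean is zero. A base-R sketch with simulated values standing in for one metric column (the variable names are illustrative, not from the full code below):

```r
# Paired t-test on per-series metric differences (H0: mean difference = 0).
# Simulated RMSE values stand in for the real per-series results.
set.seed(123)
dynrmf_rmse <- rexp(100)  # 100 fake series
theta_rmse  <- dynrmf_rmse + rnorm(100, mean = 0.1, sd = 0.2)

res <- t.test(theta_rmse - dynrmf_rmse)
print(c(statistic = unname(res$statistic), p.value = res$p.value))
# A positive t-statistic means Theta's errors are larger on average.
```
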
M3 competition results
| Metric | t-statistic | p-value | Interpretation |
|---|---|---|---|
| ME | 16.96 | <0.001 | Theta significantly more biased |
| RMSE | 10.17 | <0.001 | Theta significantly worse |
| MAE | 7.65 | <0.001 | Theta significantly worse |
| MPE | -5.59 | <0.001 | Dynrmf more biased |
| MAPE | 1.29 | 0.197 | No significant difference |
| MASE | 14.09 | <0.001 | Theta significantly worse |
| ACF1 | -1.37 | 0.171 | No significant difference |
Verdict: Dynrmf demonstrates significantly better accuracy on M3 data for most error metrics.
M1 competition results
| Metric | t-statistic | p-value | Interpretation |
|---|---|---|---|
| ME | 2.56 | 0.011 | Theta is more biased |
| RMSE | 2.94 | 0.003 | Theta significantly worse |
| MAE | 2.28 | 0.023 | Theta significantly worse |
| MPE | -0.61 | 0.540 | No significant difference |
| MAPE | 0.38 | 0.703 | No significant difference |
| MASE | -0.74 | 0.461 | No significant difference |
| ACF1 | -6.50 | <0.001 | Dynrmf residuals better |
Verdict: Dynrmf shows moderate but statistically significant improvements, especially in RMSE and MAE.
Tourism competition results
| Metric | t-statistic | p-value | Interpretation |
|---|---|---|---|
| ME | 1.62 | 0.105 | No significant difference |
| RMSE | 1.88 | 0.060 | Marginal benefit for Dynrmf |
| MAE | 1.16 | 0.245 | No significant difference |
| MPE | 4.52 | <0.001 | Theta less biased |
| MAPE | -5.03 | <0.001 | Dynrmf significantly better |
| MASE | -5.17 | <0.001 | Dynrmf significantly better |
| ACF1 | -8.65 | <0.001 | Dynrmf residuals better |
Verdict: On tourism data, Dynrmf excels in scaled metrics (MAPE, MASE) and produces better-behaved residuals.
Key Takeaways
- Dynrmf matches or outperforms Theta on accuracy metrics such as RMSE, MAE and MASE across the three competitions
- The advantage is strongest on M3, which suggests that Dynrmf handles diverse series well
- Tourism data highlights Dynrmf's strength in percentage and scaled error statistics
- Residual autocorrelation (ACF1) is consistently better with Dynrmf, indicating that patterns are captured more completely
- Model-agnostic flexibility: unlike method-specific approaches, Dynrmf accepts any fitting function via its `fit_func` parameter (the default is an automatic Ridge regression that minimizes the generalized cross-validation error), allowing the use of Ridge regression (default), Random Forest, XGBoost, Support Vector Machines, or any other model with a standard fit/predict interface
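As a hedged sketch of that flexibility (assuming the `ahead` and `randomForest` packages are installed; the Random Forest choice is an illustration, not what was benchmarked above):

```r
# Swapping the fitting function in ahead::dynrmf via fit_func/predict_func.
library(ahead)
library(randomForest)

fit_rf <- ahead::dynrmf(y = AirPassengers, h = 12,
                        fit_func = randomForest::randomForest,
                        predict_func = predict)
print(fit_rf$mean)  # 12-step-ahead point forecasts
```
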
Code availability
The full R implementation is available below. The analysis uses parallel processing for efficiency on the large competition datasets.
library(ahead)
library(Mcomp)
library(Tcomp)
library(foreach)
library(doParallel)
library(doSNOW)
setwd("~/Documents/Papers/to_submit/2026-01-01-dynrmf-vs-Theta-on-M1-M3-Tourism")
# 1. M3 comp. results --------------
data(M3)
n <- length(Mcomp::M3)
training_indices <- seq_len(n)
# Theta -----------
metrics_theta <- rep(NA, n)
pb <- utils::txtProgressBar(max = n, style = 3)
doParallel::registerDoParallel(cl=2)
# looping on all the training set time series
metrics_theta <- foreach::foreach(i = 1:n,
                                  .combine = rbind, .verbose = TRUE) %dopar% {
  series <- M3[[i]]
  train <- series$x
  test <- series$xx
  fit <- forecast::thetaf(y = train,
                          h = length(test))
  utils::setTxtProgressBar(pb, i)
  forecast::accuracy(fit)
}
close(pb)
print(metrics_theta)
# Dynrmf -----------
metrics_dynrmf <- rep(NA, n)
pb <- utils::txtProgressBar(max = n, style = 3)
doParallel::registerDoParallel(cl=2)
# looping on all the training set time series
metrics_dynrmf <- foreach::foreach(i = 1:n,
                                   .combine = rbind, .verbose = TRUE) %dopar% {
  series <- M3[[i]]
  train <- series$x
  test <- series$xx
  fit <- ahead::dynrmf(y = train,
                       h = length(test))
  utils::setTxtProgressBar(pb, i)
  forecast::accuracy(fit)
}
close(pb)
print(metrics_dynrmf)
diff_metrics <- metrics_theta - metrics_dynrmf
M3_results <- apply(diff_metrics, 2, function(x) {z <- t.test(x); return(c(z$statistic, z$p.value))})
rownames(M3_results) <- c("statistic", "p-value")
# 2. M1 comp. results -----------------
data(M1)
n <- length(Mcomp::M1)
training_indices <- seq_len(n)
# Theta -----------
metrics_theta <- rep(NA, n)
pb <- utils::txtProgressBar(max = n, style = 3)
doParallel::registerDoParallel(cl=2)
# looping on all the training set time series
metrics_theta <- foreach::foreach(i = 1:n,
                                  .combine = rbind, .verbose = TRUE) %dopar% {
  series <- M1[[i]]
  train <- series$x
  test <- series$xx
  fit <- forecast::thetaf(y = train,
                          h = length(test))
  utils::setTxtProgressBar(pb, i)
  forecast::accuracy(fit)
}
close(pb)
print(metrics_theta)
# Dynrmf -----------
metrics_dynrmf <- rep(NA, n)
pb <- utils::txtProgressBar(max = n, style = 3)
doParallel::registerDoParallel(cl=2)
# looping on all the training set time series
metrics_dynrmf <- foreach::foreach(i = 1:n,
                                   .combine = rbind, .verbose = TRUE) %dopar% {
  series <- M1[[i]]
  train <- series$x
  test <- series$xx
  fit <- ahead::dynrmf(y = train,
                       h = length(test))
  utils::setTxtProgressBar(pb, i)
  forecast::accuracy(fit)
}
close(pb)
print(metrics_dynrmf)
diff_metrics <- metrics_theta - metrics_dynrmf
M1_results <- apply(diff_metrics, 2, function(x) {z <- t.test(x); return(c(z$statistic, z$p.value))})
rownames(M1_results) <- c("statistic", "p-value")
# 3. Tourism comp. results -----------------
data(tourism)
n <- length(Tcomp::tourism)
training_indices <- seq_len(n)
# Theta -----------
metrics_theta <- rep(NA, n)
pb <- utils::txtProgressBar(max = n, style = 3)
doParallel::registerDoParallel(cl=2)
# looping on all the training set time series
metrics_theta <- foreach::foreach(i = 1:n,
                                  .combine = rbind, .verbose = TRUE) %dopar% {
  series <- tourism[[i]]
  train <- series$x
  test <- series$xx
  fit <- forecast::thetaf(y = train,
                          h = length(test))
  utils::setTxtProgressBar(pb, i)
  forecast::accuracy(fit)
}
close(pb)
print(metrics_theta)
# Dynrmf -----------
metrics_dynrmf <- rep(NA, n)
pb <- utils::txtProgressBar(max = n, style = 3)
doParallel::registerDoParallel(cl=2)
# looping on all the training set time series
metrics_dynrmf <- foreach::foreach(i = 1:n,
                                   .combine = rbind, .verbose = TRUE) %dopar% {
  series <- tourism[[i]]
  train <- series$x
  test <- series$xx
  fit <- ahead::dynrmf(y = train,
                       h = length(test))
  utils::setTxtProgressBar(pb, i)
  forecast::accuracy(fit)
}
close(pb)
print(metrics_dynrmf)
diff_metrics <- metrics_theta - metrics_dynrmf
diff_metrics <- diff_metrics[is.finite(diff_metrics[, 4]), ]  # drop series with a non-finite MPE
Tourism_results <- apply(diff_metrics, 2, function(x) {z <- t.test(x); return(c(z$statistic, z$p.value))})
rownames(Tourism_results) <- c("statistic", "p-value")
# Display results (differences Theta - Dynrmf, as tabulated above)
kableExtra::kable(M3_results)
kableExtra::kable(M1_results)
kableExtra::kable(Tourism_results)
What forecasting methods have you found most reliable in your work? Share your experiences in the comments below.
After seriously researching my background and contributions to the field, sign and share this petition: https://www.change.org/stop_torturing_T_Moudiki