In the world of time series forecasting, benchmarking methods against established competitors is critical. Today I’m sharing the results of an extensive comparison between two forecasting approaches: the classic Theta method (notoriously difficult to beat on these benchmark datasets) and the newer Dynrmf method (from the R and Python package `ahead`), tested on three major forecasting competitions.
The setup
I evaluated both methods on:
- M3 competition: 3,003 diverse time series
- M1 competition: the original M-competition dataset (1,001 time series)
- Tourism competition: tourism-specific time series (1,311 series)
For each dataset, I trained both models on the historical (training) portion and evaluated their forecasts against the provided test sets using standard accuracy metrics (ME, RMSE, MAE, MPE, MAPE, MASE, ACF1). Each model used its default parameters.
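As a refresher on the scaled metric used throughout, MASE divides the out-of-sample MAE by the in-sample MAE of a one-step naive forecast. A minimal base-R sketch on a made-up toy series (not one of the competition series; the post itself relies on forecast::accuracy()):

```r
# Hand-computed MASE on a hypothetical toy series (non-seasonal case).
train <- c(10, 12, 11, 13, 14, 13, 15)
test  <- c(16, 15, 17)
fc    <- c(14.5, 14.5, 14.5)  # a flat forecast, for illustration

mae_fc    <- mean(abs(test - fc))    # out-of-sample MAE
mae_naive <- mean(abs(diff(train)))  # in-sample MAE of the naive forecast
mase      <- mae_fc / mae_naive      # MASE = 1 means "as good as naive"
print(mase)
```

A MASE below 1 means the forecast beats the in-sample naive benchmark on average.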
Implementation details
The analysis used dual-core parallel processing to speed up computation over thousands of series. Both methods received identical train/test splits for a fair comparison; the full code is at the end of the post.
# Example forecasting loop structure
metrics <- foreach(i = 1:n, .combine = rbind) %dopar% {
  series <- dataset[[i]]
  train <- series$x
  test <- series$xx
  fit <- method(y = train, h = length(test))
  forecast::accuracy(fit)
}
Results
It’s worth mentioning that Theta was the clear winner of the original M3 competition (and has since been refined in many ways), and of course it’s always easier to beat the winner in retrospect.
I performed paired t-tests on the per-series metric differences (Theta – Dynrmf) in each competition. Here’s what the numbers reveal:
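Concretely, for each metric I take the per-series difference between the two methods and test whether its mean is zero. A base-R sketch with simulated values standing in for one metric column (the variable names are illustrative, not from the full code below):

```r
# Paired t-test on per-series metric differences (H0: mean difference = 0).
# Simulated RMSE values stand in for the real per-series results.
set.seed(123)
dynrmf_rmse <- rexp(100)  # 100 fake series
theta_rmse  <- dynrmf_rmse + rnorm(100, mean = 0.1, sd = 0.2)

res <- t.test(theta_rmse - dynrmf_rmse)
print(c(statistic = unname(res$statistic), p.value = res$p.value))
# A positive t-statistic means Theta's errors are larger on average.
```
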
M3 competition results
| Metric | t-statistic | p-value | Interpretation |
|---|---|---|---|
| ME | 16.96 | <0.001 | Theta significantly more biased |
| RMSE | 10.17 | <0.001 | Theta significantly worse |
| MAE | 7.65 | <0.001 | Theta significantly worse |
| MPE | -5.59 | <0.001 | Dynrmf more biased |
| MAPE | 1.29 | 0.197 | No significant difference |
| MASE | 14.09 | <0.001 | Theta significantly worse |
| ACF1 | -1.37 | 0.171 | No significant difference |
Verdict: Dynrmf demonstrates significantly better accuracy on M3 data for most error metrics.
M1 competition results
| Metric | t-statistic | p-value | Interpretation |
|---|---|---|---|
| ME | 2.56 | 0.011 | Theta is more biased |
| RMSE | 2.94 | 0.003 | Theta significantly worse |
| MAE | 2.28 | 0.023 | Theta significantly worse |
| MPE | -0.61 | 0.540 | No significant difference |
| MAPE | 0.38 | 0.703 | No significant difference |
| MASE | -0.74 | 0.461 | No significant difference |
| ACF1 | -6.50 | <0.001 | Dynrmf residuals better |
Verdict: Dynrmf shows moderate but statistically significant improvements, especially in RMSE and MAE.
Tourism competition results
| Metric | t-statistic | p-value | Interpretation |
|---|---|---|---|
| ME | 1.62 | 0.105 | No significant difference |
| RMSE | 1.88 | 0.060 | Marginal benefit for Dynrmf |
| MAE | 1.16 | 0.245 | No significant difference |
| MPE | 4.52 | <0.001 | Theta less biased |
| MAPE | -5.03 | <0.001 | Dynrmf significantly better |
| MASE | -5.17 | <0.001 | Dynrmf significantly better |
| ACF1 | -8.65 | <0.001 | Dynrmf residuals better |
Verdict: On tourism data, Dynrmf excels in scaled metrics (MAPE, MASE) and produces better-behaved residuals.
Key Takeaways
- Dynrmf matches or outperforms Theta on accuracy metrics such as RMSE, MAE and MASE across the three competitions
- The advantage is strongest on M3, which suggests that Dynrmf handles diverse series well
- Tourism data highlights Dynrmf's strength in percentage and scaled error statistics
- Residual autocorrelation (ACF1) is consistently better with Dynrmf, indicating that patterns are captured more completely
- Model-agnostic flexibility: unlike method-specific approaches, Dynrmf accepts any fitting function via its `fit_func` parameter (the default is an automatic Ridge regression that minimizes the generalized cross-validation error), allowing the use of Ridge regression (default), Random Forest, XGBoost, Support Vector Machines, or any other model with a standard fit/predict interface
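As a hedged sketch of that flexibility (assuming the `ahead` and `randomForest` packages are installed; the Random Forest choice is an illustration, not what was benchmarked above):

```r
# Swapping the fitting function in ahead::dynrmf via fit_func/predict_func.
library(ahead)
library(randomForest)

fit_rf <- ahead::dynrmf(y = AirPassengers, h = 12,
                        fit_func = randomForest::randomForest,
                        predict_func = predict)
print(fit_rf$mean)  # 12-step-ahead point forecasts
```
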
Code availability
The full R implementation is available below. The analysis uses parallel processing for efficiency on the large competition datasets.
library(ahead)
library(Mcomp)
library(Tcomp)
library(foreach)
library(doParallel)
library(doSNOW)
setwd("~/Documents/Papers/to_submit/2026-01-01-dynrmf-vs-Theta-on-M1-M3-Tourism")
# 1. M3 comp. results --------------
data(M3)
n <- length(Mcomp::M3)
training_indices <- seq_len(n)
# Theta -----------
metrics_theta <- rep(NA, n)
pb <- utils::txtProgressBar(max = n, style = 3)
doParallel::registerDoParallel(cl=2)
# looping on all the training set time series
metrics_theta <- foreach::foreach(i = 1:n,
                                  .combine = rbind, .verbose = TRUE) %dopar% {
  series <- M3[[i]]
  train <- series$x
  test <- series$xx
  fit <- forecast::thetaf(y = train,
                          h = length(test))
  utils::setTxtProgressBar(pb, i)
  forecast::accuracy(fit)
}
close(pb)
print(metrics_theta)
# Dynrmf -----------
metrics_dynrmf <- rep(NA, n)
pb <- utils::txtProgressBar(max = n, style = 3)
doParallel::registerDoParallel(cl=2)
# looping on all the training set time series
metrics_dynrmf <- foreach::foreach(i = 1:n,
                                   .combine = rbind, .verbose = TRUE) %dopar% {
  series <- M3[[i]]
  train <- series$x
  test <- series$xx
  fit <- ahead::dynrmf(y = train,
                       h = length(test))
  utils::setTxtProgressBar(pb, i)
  forecast::accuracy(fit)
}
close(pb)
print(metrics_dynrmf)
diff_metrics <- metrics_theta - metrics_dynrmf
M3_results <- apply(diff_metrics, 2, function(x) {z <- t.test(x); return(c(z$statistic, z$p.value))})
rownames(M3_results) <- c("statistic", "p-value")
# 2. M1 comp. results -----------------
data(M1)
n <- length(Mcomp::M1)
training_indices <- seq_len(n)
# Theta -----------
metrics_theta <- rep(NA, n)
pb <- utils::txtProgressBar(max = n, style = 3)
doParallel::registerDoParallel(cl=2)
# looping on all the training set time series
metrics_theta <- foreach::foreach(i = 1:n,
                                  .combine = rbind, .verbose = TRUE) %dopar% {
  series <- M1[[i]]
  train <- series$x
  test <- series$xx
  fit <- forecast::thetaf(y = train,
                          h = length(test))
  utils::setTxtProgressBar(pb, i)
  forecast::accuracy(fit)
}
close(pb)
print(metrics_theta)
# Dynrmf -----------
metrics_dynrmf <- rep(NA, n)
pb <- utils::txtProgressBar(max = n, style = 3)
doParallel::registerDoParallel(cl=2)
# looping on all the training set time series
metrics_dynrmf <- foreach::foreach(i = 1:n,
                                   .combine = rbind, .verbose = TRUE) %dopar% {
  series <- M1[[i]]
  train <- series$x
  test <- series$xx
  fit <- ahead::dynrmf(y = train,
                       h = length(test))
  utils::setTxtProgressBar(pb, i)
  forecast::accuracy(fit)
}
close(pb)
print(metrics_dynrmf)
diff_metrics <- metrics_theta - metrics_dynrmf
M1_results <- apply(diff_metrics, 2, function(x) {z <- t.test(x); return(c(z$statistic, z$p.value))})
rownames(M1_results) <- c("statistic", "p-value")
# 3. Tourism comp. results -----------------
data(tourism)
n <- length(Tcomp::tourism)
training_indices <- seq_len(n)
# Theta -----------
metrics_theta <- rep(NA, n)
pb <- utils::txtProgressBar(max = n, style = 3)
doParallel::registerDoParallel(cl=2)
# looping on all the training set time series
metrics_theta <- foreach::foreach(i = 1:n,
                                  .combine = rbind, .verbose = TRUE) %dopar% {
  series <- tourism[[i]]
  train <- series$x
  test <- series$xx
  fit <- forecast::thetaf(y = train,
                          h = length(test))
  utils::setTxtProgressBar(pb, i)
  forecast::accuracy(fit)
}
close(pb)
print(metrics_theta)
# Dynrmf -----------
metrics_dynrmf <- rep(NA, n)
pb <- utils::txtProgressBar(max = n, style = 3)
doParallel::registerDoParallel(cl=2)
# looping on all the training set time series
metrics_dynrmf <- foreach::foreach(i = 1:n,
                                   .combine = rbind, .verbose = TRUE) %dopar% {
  series <- tourism[[i]]
  train <- series$x
  test <- series$xx
  fit <- ahead::dynrmf(y = train,
                       h = length(test))
  utils::setTxtProgressBar(pb, i)
  forecast::accuracy(fit)
}
close(pb)
print(metrics_dynrmf)
diff_metrics <- metrics_theta - metrics_dynrmf
diff_metrics <- diff_metrics[is.finite(diff_metrics[, 4]), ]  # drop series with a non-finite MPE
Tourism_results <- apply(diff_metrics, 2, function(x) {z <- t.test(x); return(c(z$statistic, z$p.value))})
rownames(Tourism_results) <- c("statistic", "p-value")
# Display results (differences Theta - Dynrmf, as tabulated above)
kableExtra::kable(M3_results)
kableExtra::kable(M1_results)
kableExtra::kable(Tourism_results)
What forecasting methods have you found most reliable in your work? Share your experiences in the comments below.
After seriously researching my background and contributions to the field, sign and share this petition: https://www.change.org/stop_torturing_T_Moudiki