DuckDB
Parquet
Calibration
Ranking
Bayesian
Chance TS
Backtesting
Race analysis as an inference and decision system
Thoroughbred flat racing is not a binary classification problem. It’s one
multi-competitor outcome process
hierarchy (horse / trainer / jockey / track),
time dependence (form cycles, market movements), and decision layers (how to deal with probabilities).
This macro post is intentionally code heavy. The goal is to show a canonical, reproducible R workflow that you can customize: from data contract to modeling, from evaluation to implementation.
- Model layer: calibrated probabilities and uncertainty.
- Decision layer: explicit assumptions, risk management, stress testing.
If you only do one thing, do this: never judge models based on ROI alone. First, use proper scoring rules and calibration.
1) Canonical data contract: race ā runner ā odds snapshots
The fastest way to avoid analytics chaos is to standardize your storage model. A minimal outline allows each chapter, notebook, or experiment to start from the same tables.
DuckDB schema (SQL)
-- sql/schema.sql CREATE TABLE IF NOT EXISTS races ( race_id VARCHAR PRIMARY KEY, race_date DATE, region VARCHAR, track VARCHAR, surface VARCHAR, going VARCHAR, distance_m INTEGER, race_class VARCHAR, purse DOUBLE, field_size INTEGER ); CREATE TABLE IF NOT EXISTS runners ( race_id VARCHAR, runner_id VARCHAR, horse_id VARCHAR, jockey_id VARCHAR, trainer_id VARCHAR, draw INTEGER, weight_kg DOUBLE, age INTEGER, sex VARCHAR, odds_decimal DOUBLE, finish_pos INTEGER, finish_time_s DOUBLE, win INTEGER, PRIMARY KEY (race_id, runner_id) ); CREATE TABLE IF NOT EXISTS odds_snapshots ( race_id VARCHAR, runner_id VARCHAR, ts TIMESTAMP, odds_decimal DOUBLE, traded_vol DOUBLE, PRIMARY KEY (race_id, runner_id, ts) );
It queries Parquet locally with high performance (predicate pushdown, parallel scans) and avoids running a server database for most use cases.
Columnar storage + compression + fast scans. Perfect for large runner-level tables and time series snapshots.
2) Generate a reproducible simulated data set
High quality race data is often licensed. A robust approach is to make each lab executable on a simulator that matches your canonical schema, and then provide adapters to ingest real data only if access is legal.
Racing Simulator (R)
# R/simulate.R
library(dplyr)
library(tidyr)
simulate_races <- function(
n_races = 200, n_horses = 600, n_jockeys = 200, n_trainers = 150,
min_field = 6, max_field = 14, seed = 1
) {
set.seed(seed)
horses <- tibble(horse_id = sprintf("H%04d", 1:n_horses),
ability = rnorm(n_horses, 0, 1))
jockeys <- tibble(jockey_id = sprintf("J%03d", 1:n_jockeys),
skill = rnorm(n_jockeys, 0, 0.4))
trainers<- tibble(trainer_id= sprintf("T%03d", 1:n_trainers),
skill = rnorm(n_trainers, 0, 0.5))
surfaces <- c("Turf","Dirt","AllWeather")
goings <- c("Firm","Good","Gd-Fm","Gd-Sft","Soft","Heavy","Standard","Fast","Sloppy")
races <- tibble(
race_id = sprintf("R%05d", 1:n_races),
race_date = as.Date("2025-01-01") + sample(0:365, n_races, TRUE),
region = sample(c("GB","IE","US","AU","FR"), n_races, TRUE),
track = sample(c("TRACK_A","TRACK_B","TRACK_C"), n_races, TRUE),
surface = sample(surfaces, n_races, TRUE, prob = c(0.45,0.35,0.20)),
going = sample(goings, n_races, TRUE),
distance_m = sample(c(1000,1200,1400,1600,1800,2000,2400,2800), n_races, TRUE),
race_class = sample(c("G1","G2","G3","HCP","ALW","CLM"), n_races, TRUE),
purse = round(exp(rnorm(n_races, log(25000), 0.7))),
field_size = sample(min_field:max_field, n_races, TRUE)
)
runners <- races |>
rowwise() |>
do({
r <- .
k <- r$field_size
df <- tibble(
race_id = r$race_id,
runner_id = paste0(r$race_id, "_", seq_len(k)),
horse_id = sample(horses$horse_id, k, FALSE),
jockey_id = sample(jockeys$jockey_id, k, TRUE),
trainer_id = sample(trainers$trainer_id, k, TRUE),
draw = sample(1:k, k, FALSE),
weight_kg = pmax(45, rnorm(k, 55, 3)),
age = sample(2:7, k, TRUE),
sex = sample(c("C","F","G","H"), k, TRUE, prob = c(0.2,0.35,0.35,0.1))
) |>
left_join(horses, by="horse_id") |>
left_join(jockeys, by="jockey_id", suffix=c("","_j")) |>
left_join(trainers, by="trainer_id", suffix=c("","_t")) |>
mutate(
draw_effect = ifelse(k >= 12, (k + 1 - draw)/k, 0),
surface_effect = case_when(r$surface=="Turf" ~ 0.15,
r$surface=="AllWeather" ~ 0.05,
TRUE ~ 0),
weight_penalty = -0.03*(weight_kg - 55),
utility = ability + skill + skill_t + 0.12*draw_effect +
surface_effect + weight_penalty + rnorm(k, 0, 0.25)
)
# Sequential proportional-to-exp(utility) finish ordering
remaining <- seq_len(k)
w <- exp(df$utility - max(df$utility))
order_idx <- integer(0)
for (pos in seq_len(k)) {
probs <- w[remaining]/sum(w[remaining])
pick <- sample(remaining, 1, prob = probs)
order_idx <- c(order_idx, pick)
remaining <- setdiff(remaining, pick)
}
df$finish_pos <- match(seq_len(k), order_idx)
df$win <- as.integer(df$finish_pos == 1)
base_time <- r$distance_m/16
df$finish_time_s <- base_time - 0.8*df$ability + rnorm(k, 0, 1.2)
# Market-like odds with an overround
p <- exp(df$utility - max(df$utility)); p <- p/sum(p)
overround <- 1.18
df$odds_decimal <- pmin(200, pmax(1.01, 1/(overround*p) * exp(rnorm(k, 0, 0.06))))
df |>
select(race_id, runner_id, horse_id, jockey_id, trainer_id,
draw, weight_kg, age, sex, odds_decimal,
finish_pos, finish_time_s, win)
}) |>
ungroup()
list(races = races, runners = runners)
}
sim <- simulate_races(seed = 42)Rapid EDA health checks
library(ggplot2) ggplot(sim$races, aes(field_size)) + geom_histogram(bins = 12) + labs(x = "Runners per race", y = "Count") ggplot(sim$runners, aes(odds_decimal)) + geom_histogram(bins = 50) + scale_x_log10() + labs(x = "Decimal odds (log scale)", y = "Count")
3) Continue using Parquet and querying DuckDB
A clean pattern is: write Parquet (arrow) ā create tables in DuckDB read_parquet(). This provides a fast, local warehouse without heavy infrastructure.
library(arrow)
library(DBI)
library(duckdb)
dir.create("data/derived", recursive = TRUE, showWarnings = FALSE)
write_parquet(sim$races, "data/derived/races.parquet")
write_parquet(sim$runners, "data/derived/runners.parquet")
con <- dbConnect(duckdb(), "data/warehouse/racing.duckdb")
dbExecute(con, "INSTALL parquet; LOAD parquet;")
dbExecute(con, "
CREATE OR REPLACE TABLE races AS
SELECT * FROM read_parquet('data/derived/races.parquet');
")
dbExecute(con, "
CREATE OR REPLACE TABLE runners AS
SELECT * FROM read_parquet('data/derived/runners.parquet');
")
dbGetQuery(con, "SELECT COUNT(*) AS n_races FROM races;")
dbGetQuery(con, "SELECT COUNT(*) AS n_runners FROM runners;")
dbDisconnect(con, shutdown = TRUE)Search patterns you’ll use all the time
-- Average odds and win rate by surface
SELECT
surface,
AVG(odds_decimal) AS avg_odds,
AVG(win) AS win_rate,
COUNT(*) AS n
FROM runners r
JOIN races x USING (race_id)
GROUP BY surface
ORDER BY n DESC;
-- Within-race normalization of implied probability
WITH base AS (
SELECT
race_id,
runner_id,
odds_decimal,
1.0 / odds_decimal AS implied_raw
FROM runners
)
SELECT
race_id,
runner_id,
implied_raw / SUM(implied_raw) OVER (PARTITION BY race_id) AS implied_p_norm
FROM base;4) Baselines: probabilistic profit modeling + calibration
In race analysis, your model output is typically a probability. This means that evaluation must be given priority correct scoring rules And calibrationnot just AUC.
Feature store (leak proof)
library(dplyr)
features <- sim$runners |>
left_join(sim$races, by = "race_id") |>
transmute(
race_id, runner_id,
win,
# pre-race covariates only
draw, weight_kg, age, sex,
surface, going, distance_m, race_class, purse,
odds_decimal,
log_odds = log(odds_decimal),
implied_p_raw = 1/odds_decimal
)
stopifnot(all(is.finite(features$log_odds)))
stopifnot(all(features$odds_decimal >= 1.01))propermodels GLM baseline + log loss / Brier / AUC
library(tidymodels)
set.seed(123)
df <- features |>
group_by(race_id) |>
mutate(implied_p = implied_p_raw / sum(implied_p_raw)) |>
ungroup() |>
mutate(win = factor(win, levels = c(0,1), labels = c("no","yes")))
spl <- initial_split(df, strata = win)
train<- training(spl)
test <- testing(spl)
rec <- recipe(win ~ draw + weight_kg + age + sex + surface + going +
distance_m + odds_decimal, data = train) |>
step_dummy(all_nominal_predictors()) |>
step_zv(all_predictors()) |>
step_normalize(all_numeric_predictors())
mod <- logistic_reg() |> set_engine("glm")
wf <- workflow() |> add_recipe(rec) |> add_model(mod)
fit <- fit(wf, train)
pred <- predict(fit, test, type = "prob") |>
bind_cols(test |> select(win))
metric_set(mn_log_loss, brier_class, roc_auc)(pred, truth = win, .pred_yes)Calibration plot
library(probably) # Binned calibration plot cal_plot_breaks(pred, truth = win, estimate = .pred_yes, num_breaks = 10)
Calibration Mindset: a model can be ‘accurate’ in ranking, yet systematically overconfident. In the context of betting or pricing, a miscalibration is usually more dangerous than a small drop in AUC.
5) Multi-runner outcome modeling: rankings, not just winners
Binary win models throw away a lot of structure. Racing naturally results in a ranking: finishing positions over several runners. Ranking models allow you to target that structure directly.
Plackett-Luce (direct ranking probability)
library(PlackettLuce)
set.seed(99)
items <- paste0("H", 1:12)
n_races <- 40
ability <- rnorm(length(items)); names(ability) <- items
Rmat <- matrix(0, nrow = n_races, ncol = length(items))
colnames(Rmat) <- items
for (r in 1:n_races) {
field <- sample(items, size = sample(6:10, 1), replace = FALSE)
util <- ability[field] + rnorm(length(field), 0, 0.3)
ord <- field[order(util, decreasing = TRUE)]
Rmat[r, ord] <- seq_along(ord) # 1 = best; 0 = absent
}
rankings <- as.rankings(Rmat)
pl <- PlackettLuce(rankings)
summary(pl)
coef(pl) # worth parametersBradley-Terry via pairwise decomposition
library(dplyr)
library(BradleyTerry2)
pairwise <- tibble()
for (r in 1:n_races) {
in_race <- names(which(Rmat[r, ] > 0))
ranks <- Rmat[r, in_race]
for (i in 1:(length(in_race) - 1)) {
for (j in (i + 1):length(in_race)) {
h1 <- in_race[i]; h2 <- in_race[j]
win1 <- as.integer(ranks[h1] < ranks[h2])
pairwise <- bind_rows(pairwise, tibble(h1=h1, h2=h2, win1=win1, win2=1-win1))
}
}
}
agg <- pairwise |>
group_by(h1, h2) |>
summarise(win1 = sum(win1), win2 = sum(win2), .groups = "drop")
bt <- BTm(cbind(win1, win2), h1, h2, data = agg, id = "horse")
BTabilities(bt)Both families estimate āpower-likeā latent parameters, but they do so with different data representations: full rankings versus aggregated pairwise outcomes.
If you value place markets, exactas/quinellas, or modeling finish times/positions beyond āwinning,ā ranking-aware methods are often better suited to the goal.
6) Hierarchy: GLMMs and Bayesian multilevel models
Racing is fundamentally hierarchical: repeated horses, trainers, jockeys, tracks, regions. Fixed effects can be excessive; Multi-level models allow for shrinkage and better accounting for uncertainty.
GLMM with random trainer/jockey effects (lme4)
library(lme4)
library(dplyr)
sim2 <- simulate_races(n_races=500, n_horses=800, n_jockeys=120, n_trainers=100, seed=202)
df2 <- sim2$runners |> left_join(sim2$races, by="race_id")
m_glmm <- glmer(
win ~ scale(weight_kg) + scale(draw) + surface + going +
(1 | trainer_id) + (1 | jockey_id),
data = df2,
family = binomial()
)
summary(m_glmm)Bayesian multi-level model (brms)
library(brms)
priors <- c(
prior(normal(0, 1), class = "b"),
prior(exponential(1), class = "sd")
)
fit_bayes <- brm(
win ~ scale(weight_kg) + scale(draw) + surface + going +
(1 | trainer_id) + (1 | jockey_id),
data = df2,
family = bernoulli(),
prior = priors,
chains = 2, iter = 1000, cores = 2, seed = 202
)
summary(fit_bayes)
pp_check(fit_bayes) Why Bayesian here?
You get a principled way to encode priors (smaller effects are more likely than large ones), propagate uncertainty via derived probabilities, and perform posterior predictive checks.
7) Time-to-event: modeling survival and vulnerability
Some questions in racing obviously relate to time to event: time to first win, time to peak performance, time to retirement, time between races. Survival analysis gives you the right probability and handles censorship.
Cox proportional hazards
library(dplyr)
library(survival)
set.seed(1)
n <- 1000
horses <- tibble(
horse_id = sprintf("H%04d", 1:n),
talent = rnorm(n),
trainer_id= sample(sprintf("T%03d", 1:80), n, TRUE),
debut_age = pmax(2, rnorm(n, 3.0, 0.6))
)
base_rate <- 0.002
linpred <- 0.65 * horses$talent - 0.25 * (horses$debut_age - 3)
rate <- base_rate * exp(linpred)
time_days <- rexp(n, rate = rate)
censor_days <- runif(n, 60, 700)
status <- as.integer(time_days mutate(time_days = time_obs, status = status)
cox <- coxph(Surv(time_days, status) ~ scale(talent) + scale(debut_age), data = df_surv)
summary(cox)
cox.zph(cox)Vulnerability term (heterogeneity at trainer level)
cox_f <- coxph( Surv(time_days, status) ~ scale(talent) + scale(debut_age) + frailty(trainer_id), data = df_surv ) summary(cox_f)
8) Modern ML: XGBoost in closerdmodels and mlr3
Gradient enhancement can capture nonlinearities and interactions without manual feature engineering. But if you don’t evaluate probabilistically (log loss, calibration), you can accidentally ship a model that “rates well” but is still poorly priced.
neatmodels: tuned XGBoost
library(tidymodels)
library(dplyr)
set.seed(314)
sim3 <- simulate_races(n_races = 500, n_horses = 1200, seed = 314)
df3 <- sim3$runners |>
left_join(sim3$races, by="race_id") |>
mutate(win = factor(win, levels=c(0,1), labels=c("no","yes")))
spl <- initial_split(df3, strata = win)
train <- training(spl)
test <- testing(spl)
rec <- recipe(win ~ draw + weight_kg + age + sex + surface + going +
distance_m + odds_decimal, data=train) |>
step_dummy(all_nominal_predictors()) |>
step_zv(all_predictors())
folds <- vfold_cv(train, v = 5, strata = win)
xgb_spec <- boost_tree(
trees = 800,
tree_depth = tune(),
learn_rate = tune(),
loss_reduction = tune(),
sample_size = tune(),
mtry = tune()
) |>
set_engine("xgboost") |>
set_mode("classification")
wf <- workflow() |> add_recipe(rec) |> add_model(xgb_spec)
grid <- grid_space_filling(
tree_depth(),
learn_rate(),
loss_reduction(),
sample_size = sample_prop(),
mtry(range = c(5, 60)),
size = 20
)
res <- tune_grid(
wf,
resamples = folds,
grid = grid,
metrics = metric_set(mn_log_loss, roc_auc)
)
best <- select_best(res, "mn_log_loss")
final_fit <- finalize_workflow(wf, best) |> fit(train)
pred <- predict(final_fit, test, type="prob") |> bind_cols(test |> select(win))
metric_set(mn_log_loss, brier_class, roc_auc)(pred, truth = win, .pred_yes)mlr3: similar workflow
library(data.table)
library(mlr3)
library(mlr3learners)
set.seed(99)
sim4 <- simulate_races(n_races = 400, n_horses = 900, seed = 99)
df4 <- as.data.table(merge(sim4$runners, sim4$races, by = "race_id"))
df4[, win := factor(ifelse(win == 1, "yes", "no"))]
task <- as_task_classif(df4, target = "win", id = "win_task")
task$set_col_roles(
c("race_id","runner_id","horse_id","jockey_id","trainer_id"),
roles = "name"
)
rsmp <- rsmp("cv", folds = 5)
lrn_glm <- lrn("classif.log_reg", predict_type = "prob")
rr_glm <- resample(task, lrn_glm, rsmp)
rr_glm$aggregate(msr("classif.logloss"))
rr_glm$aggregate(msr("classif.auc"))
lrn_xgb <- lrn("classif.xgboost", predict_type = "prob",
nrounds = 300, eta = 0.05, max_depth = 4)
rr_xgb <- resample(task, lrn_xgb, rsmp)
rr_xgb$aggregate(msr("classif.logloss"))
rr_xgb$aggregate(msr("classif.auc"))9) Odds time series: from snapshots to features
Exchange markets produce timestamps and volumes. Treat them as time series and develop features such as drift, volatility, late steam and liquidity. Keep it honest: the microstructure of the market is noisy and backtests require strong assumptions.
Simulated snapshots as a tsibble + feature extraction
library(dplyr)
library(tsibble)
library(feasts)
library(ggplot2)
set.seed(1)
snap <- tibble(
runner_id = "R00001_01",
ts = as.POSIXct("2025-06-01 14:00:00", tz = "UTC") + seq(0, 60*60, by = 60),
odds_decimal = pmax(1.01, 6 + cumsum(rnorm(61, 0, 0.05))),
traded_vol = cumsum(pmax(0, rnorm(61, 10, 3)))
)
tsbl <- snap |> as_tsibble(index = ts, key = runner_id)
feat <- tsbl |>
features(odds_decimal, list(
mean = ~ mean(.),
sd = ~ sd(.),
min = ~ min(.),
max = ~ max(.),
slope = ~ coef(lm(. ~ seq_along(.)))[2]
))
feat
ggplot(tsbl, aes(ts, odds_decimal)) +
geom_line() +
labs(x = "Time", y = "Decimal odds")Schema normalization: table of snapshots
# Example transformation skeleton for real exchange data:
# 1) parse raw feed messages
# 2) map to normalized columns: race_id, runner_id, ts, odds_decimal, traded_vol
# 3) write Parquet and load into DuckDB
# write_parquet(odds_snapshots, "data/derived/odds_snapshots.parquet")
# dbExecute(con, "CREATE OR REPLACE TABLE odds_snapshots AS
# SELECT * FROM read_parquet('data/derived/odds_snapshots.parquet');")10) Backtesting as a decision layer (fractional Kelly + guardrails)
After you calibrate the probabilities, you can add a decision policy. A commonly used framework is fractional Kelly sizing, but this must be capped and stress tested. The same opportunities can produce very different outcomes depending on execution constraints, market impact, limits and selection filters.
Simulated fractional Kelly example with risk limits
library(dplyr)
library(ggplot2)
sim5 <- simulate_races(n_races = 180, n_horses = 500, seed = 9)
runners <- sim5$runners |>
group_by(race_id) |>
mutate(
p_market = (1/odds_decimal) / sum(1/odds_decimal),
# toy "model": market + noise (for demonstration)
p_model = pmin(0.99, pmax(0.001, p_market + rnorm(n(), 0, 0.02))),
edge = p_model - p_market
) |>
ungroup()
fractional_kelly <- function(p, odds_dec, fraction = 0.25, cap = 0.05) {
b <- odds_dec - 1
f_star <- (b*p - (1 - p)) / b
f <- pmax(0, f_star) * fraction
pmin(f, cap)
}
bankroll <- 1
max_drawdown <- 0.25
peak <- bankroll
hist <- tibble()
for (rid in unique(runners$race_id)) {
if (bankroll < (1 - max_drawdown) * peak) break # guardrail: stop
race <- runners |> filter(race_id == rid)
pick <- race |> arrange(desc(edge)) |> slice(1)
if (pick$edge > 0.01 && pick$odds_decimal mutate(drawdown = 1 - bankroll/peak)
ggplot(hist, aes(seq_along(bankroll), bankroll)) + geom_line() +
labs(x = "Race index", y = "Bankroll")
ggplot(hist, aes(seq_along(drawdown), drawdown)) + geom_line() +
labs(x = "Race index", y = "Drawdown")Practical warning: this example is deliberately simplified. Real backtests require: commissions, slippage, betting limits, late price changes, liquidity restrictions, and selection bias checks.
11) Implementation: Versioned models + HTTP prediction API
If you want your work to be reusable, build a deployable artifact. A clean R-native pattern is:
pins for version control + vetiver for model packaging +
plumber for the API server.
library(tidymodels)
library(pins)
library(vetiver)
library(plumber)
library(dplyr)
sim6 <- simulate_races(n_races = 250, n_horses = 700, seed = 101)
df6 <- sim6$runners |>
left_join(sim6$races, by="race_id") |>
mutate(win = factor(win, levels=c(0,1), labels=c("no","yes")))
rec <- recipe(win ~ draw + weight_kg + age + sex + surface + going +
distance_m + odds_decimal, data = df6) |>
step_dummy(all_nominal_predictors()) |>
step_zv(all_predictors())
fit <- workflow() |>
add_recipe(rec) |>
add_model(logistic_reg() |> set_engine("glm")) |>
fit(df6)
board <- board_folder("pins", versioned = TRUE)
v <- vetiver_model(fit, model_name = "win_prob_glm")
vetiver_pin_write(board, v)
api <- pr() |> vetiver_api(v)
# Run locally:
# api |> pr_run(host = "0.0.0.0", port = 8080)
apiMinimum request payload form (JSON)
{
"draw": 7,
"weight_kg": 55.5,
"age": 4,
"sex": "G",
"surface": "Turf",
"going": "Good",
"distance_m": 1600,
"odds_decimal": 6.2
}12) Reproducibility mechanics: renv + deterministic builds
Reproducibility is not a vibration. Lock dependencies, commit versions, and standardize builds. This is especially important when your analyzes become a book, a report, or a product artifact.
renv workflow
# Initialize (once) # renv::init() # Snapshot current library into renv.lock renv::snapshot() # Restore exact package versions from renv.lock renv::restore()
Dockerfile for deterministic PDF builds (Quarto)
FROM rocker/verse:4.4.2
WORKDIR /book
COPY renv.lock renv.lock
COPY renv/ renv/
COPY .Rprofile .Rprofile
RUN R -q -e "install.packages('renv'); renv::restore(prompt = FALSE)"
COPY . .
CMD ["quarto", "render", "book", "--to", "pdf"]CI Skeleton (GitHub Actions)
name: Render PDF
on:
push:
branches: [ main ]
workflow_dispatch:
jobs:
pdf:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: r-lib/actions/setup-r@v2
- uses: r-lib/actions/setup-renv@v2
- name: Render Quarto book
run: quarto render book --to pdf13) A ‘technical checklist’ that you can actually follow
- Canonical scheme
- Parquet + DuckDB
- Adapters, no redistribution
- GLM/GLMM baselines
- Ranking probabilities
- Bayesian uncertainty
- Wood loss / Brier
- Calibration curves
- Stress-tested backtests
- Opportunity snapshots
- Drift/volatility functions
- Liquidity constraints
- Explicit assumptions
- Fractional Kelly caps
- Extendable railings
- pin version control
- vetiver packaging
- plumber API
Another code snippet: within-race probability normalization
Many race tables store the ‘odds’ per runner, but as an inference you often want a normalized probability within a field. Here is a standard pattern that also emphasizes exaggerated problems.
library(dplyr)
normalize_field_probs <- function(df, odds_col = odds_decimal) {
df |>
group_by(race_id) |>
mutate(
implied_raw = 1 / {{ odds_col }},
overround = sum(implied_raw),
p_market = implied_raw / overround
) |>
ungroup()
}
df_norm <- normalize_field_probs(sim$runners)
df_norm |>
summarise(
avg_overround = mean(overround),
pmin_overround= min(overround),
pmax_overround= max(overround)
)Interpretation: overround > 1 implies the sum of the market’s implied probabilities above 1 (built-in margin). Separating ‘market-implied probability’ from ‘model probability’ is essential if you want a fair evaluation.
#Quantitative #horse #racing #calibration #backtesting #implementation #bloggers

