Quantitative horse racing with R: calibration, backtesting and implementation

R
DuckDB
Parquet
Calibration
Ranking
Bayesian
Chance TS
Backtesting

Race analysis as an inference and decision system

Thoroughbred flat racing is not a binary classification problem. It’s one
multi-competitor outcome process
hierarchy (horse / trainer / jockey / track),
time dependence (form cycles, market movements), and decision layers (how to deal with probabilities).

This macro post is intentionally code heavy. The goal is to show a canonical, reproducible R workflow that you can customize: from data contract to modeling, from evaluation to implementation.

Design principle: treat predictive skills and deployment decisions as separate layers.

Model layer: calibrated probabilities and uncertainty.
Decision layer: explicit assumptions, risk management, stress testing.

If you only do one thing, do this: never judge models based on ROI alone. First, use proper scoring rules and calibration.

1) Canonical data contract: race → runner → odds snapshots

The fastest way to avoid analytics chaos is to standardize your storage model. A minimal outline allows each chapter, notebook, or experiment to start from the same tables.

DuckDB schema (SQL)

-- sql/schema.sql
CREATE TABLE IF NOT EXISTS races (
  race_id     VARCHAR PRIMARY KEY,
  race_date   DATE,
  region      VARCHAR,
  track       VARCHAR,
  surface     VARCHAR,
  going       VARCHAR,
  distance_m  INTEGER,
  race_class  VARCHAR,
  purse       DOUBLE,
  field_size  INTEGER
);

CREATE TABLE IF NOT EXISTS runners (
  race_id        VARCHAR,
  runner_id      VARCHAR,
  horse_id       VARCHAR,
  jockey_id      VARCHAR,
  trainer_id     VARCHAR,
  draw           INTEGER,
  weight_kg      DOUBLE,
  age            INTEGER,
  sex            VARCHAR,
  odds_decimal   DOUBLE,
  finish_pos     INTEGER,
  finish_time_s  DOUBLE,
  win            INTEGER,
  PRIMARY KEY (race_id, runner_id)
);

CREATE TABLE IF NOT EXISTS odds_snapshots (
  race_id      VARCHAR,
  runner_id    VARCHAR,
  ts           TIMESTAMP,
  odds_decimal DOUBLE,
  traded_vol   DOUBLE,
  PRIMARY KEY (race_id, runner_id, ts)
);

Why DuckDB?

It queries Parquet locally with high performance (predicate pushdown, parallel scans) and avoids running a server database for most use cases.

Why parquet?

Columnar storage + compression + fast scans. Perfect for large runner-level tables and time series snapshots.

2) Generate a reproducible simulated data set

High quality race data is often licensed. A robust approach is to make each lab executable on a simulator that matches your canonical schema, and then provide adapters to ingest real data only if access is legal.

Racing Simulator (R)

# R/simulate.R
library(dplyr)
library(tidyr)

simulate_races <- function(
  n_races = 200, n_horses = 600, n_jockeys = 200, n_trainers = 150,
  min_field = 6, max_field = 14, seed = 1
) {
  set.seed(seed)

  horses  <- tibble(horse_id  = sprintf("H%04d", 1:n_horses),
                    ability   = rnorm(n_horses, 0, 1))
  jockeys <- tibble(jockey_id = sprintf("J%03d", 1:n_jockeys),
                    skill     = rnorm(n_jockeys, 0, 0.4))
  trainers<- tibble(trainer_id= sprintf("T%03d", 1:n_trainers),
                    skill     = rnorm(n_trainers, 0, 0.5))

  surfaces <- c("Turf","Dirt","AllWeather")
  goings   <- c("Firm","Good","Gd-Fm","Gd-Sft","Soft","Heavy","Standard","Fast","Sloppy")

  races <- tibble(
    race_id    = sprintf("R%05d", 1:n_races),
    race_date  = as.Date("2025-01-01") + sample(0:365, n_races, TRUE),
    region     = sample(c("GB","IE","US","AU","FR"), n_races, TRUE),
    track      = sample(c("TRACK_A","TRACK_B","TRACK_C"), n_races, TRUE),
    surface    = sample(surfaces, n_races, TRUE, prob = c(0.45,0.35,0.20)),
    going      = sample(goings, n_races, TRUE),
    distance_m = sample(c(1000,1200,1400,1600,1800,2000,2400,2800), n_races, TRUE),
    race_class = sample(c("G1","G2","G3","HCP","ALW","CLM"), n_races, TRUE),
    purse      = round(exp(rnorm(n_races, log(25000), 0.7))),
    field_size = sample(min_field:max_field, n_races, TRUE)
  )

  runners <- races |>
    rowwise() |>
    do({
      r <- .
      k <- r$field_size

      df <- tibble(
        race_id    = r$race_id,
        runner_id  = paste0(r$race_id, "_", seq_len(k)),
        horse_id   = sample(horses$horse_id, k, FALSE),
        jockey_id  = sample(jockeys$jockey_id, k, TRUE),
        trainer_id = sample(trainers$trainer_id, k, TRUE),
        draw       = sample(1:k, k, FALSE),
        weight_kg  = pmax(45, rnorm(k, 55, 3)),
        age        = sample(2:7, k, TRUE),
        sex        = sample(c("C","F","G","H"), k, TRUE, prob = c(0.2,0.35,0.35,0.1))
      ) |>
        left_join(horses, by="horse_id") |>
        left_join(jockeys, by="jockey_id", suffix=c("","_j")) |>
        left_join(trainers, by="trainer_id", suffix=c("","_t")) |>
        mutate(
          draw_effect    = ifelse(k >= 12, (k + 1 - draw)/k, 0),
          surface_effect = case_when(r$surface=="Turf" ~ 0.15,
                                     r$surface=="AllWeather" ~ 0.05,
                                     TRUE ~ 0),
          weight_penalty = -0.03*(weight_kg - 55),
          utility        = ability + skill + skill_t + 0.12*draw_effect +
                           surface_effect + weight_penalty + rnorm(k, 0, 0.25)
        )

      # Sequential proportional-to-exp(utility) finish ordering
      remaining <- seq_len(k)
      w <- exp(df$utility - max(df$utility))
      order_idx <- integer(0)
      for (pos in seq_len(k)) {
        probs <- w[remaining]/sum(w[remaining])
        pick <- sample(remaining, 1, prob = probs)
        order_idx <- c(order_idx, pick)
        remaining <- setdiff(remaining, pick)
      }
      df$finish_pos <- match(seq_len(k), order_idx)
      df$win <- as.integer(df$finish_pos == 1)

      base_time <- r$distance_m/16
      df$finish_time_s <- base_time - 0.8*df$ability + rnorm(k, 0, 1.2)

      # Market-like odds with an overround
      p <- exp(df$utility - max(df$utility)); p <- p/sum(p)
      overround <- 1.18
      df$odds_decimal <- pmin(200, pmax(1.01, 1/(overround*p) * exp(rnorm(k, 0, 0.06))))

      df |>
        select(race_id, runner_id, horse_id, jockey_id, trainer_id,
               draw, weight_kg, age, sex, odds_decimal,
               finish_pos, finish_time_s, win)
    }) |>
    ungroup()

  list(races = races, runners = runners)
}

sim <- simulate_races(seed = 42)

Rapid EDA health checks

library(ggplot2)

ggplot(sim$races, aes(field_size)) +
  geom_histogram(bins = 12) +
  labs(x = "Runners per race", y = "Count")

ggplot(sim$runners, aes(odds_decimal)) +
  geom_histogram(bins = 50) +
  scale_x_log10() +
  labs(x = "Decimal odds (log scale)", y = "Count")

3) Continue using Parquet and querying DuckDB

A clean pattern is: write Parquet (arrow) → create tables in DuckDB read_parquet(). This provides a fast, local warehouse without heavy infrastructure.

library(arrow)
library(DBI)
library(duckdb)

dir.create("data/derived", recursive = TRUE, showWarnings = FALSE)
write_parquet(sim$races,   "data/derived/races.parquet")
write_parquet(sim$runners, "data/derived/runners.parquet")

con <- dbConnect(duckdb(), "data/warehouse/racing.duckdb")
dbExecute(con, "INSTALL parquet; LOAD parquet;")

dbExecute(con, "
  CREATE OR REPLACE TABLE races AS
  SELECT * FROM read_parquet('data/derived/races.parquet');
")

dbExecute(con, "
  CREATE OR REPLACE TABLE runners AS
  SELECT * FROM read_parquet('data/derived/runners.parquet');
")

dbGetQuery(con, "SELECT COUNT(*) AS n_races FROM races;")
dbGetQuery(con, "SELECT COUNT(*) AS n_runners FROM runners;")

dbDisconnect(con, shutdown = TRUE)

Search patterns you’ll use all the time

-- Average odds and win rate by surface
SELECT
  surface,
  AVG(odds_decimal) AS avg_odds,
  AVG(win)          AS win_rate,
  COUNT(*)          AS n
FROM runners r
JOIN races   x USING (race_id)
GROUP BY surface
ORDER BY n DESC;

-- Within-race normalization of implied probability
WITH base AS (
  SELECT
    race_id,
    runner_id,
    odds_decimal,
    1.0 / odds_decimal AS implied_raw
  FROM runners
)
SELECT
  race_id,
  runner_id,
  implied_raw / SUM(implied_raw) OVER (PARTITION BY race_id) AS implied_p_norm
FROM base;

4) Baselines: probabilistic profit modeling + calibration

In race analysis, your model output is typically a probability. This means that evaluation must be given priority correct scoring rules And calibrationnot just AUC.

Feature store (leak proof)

library(dplyr)

features <- sim$runners |>
  left_join(sim$races, by = "race_id") |>
  transmute(
    race_id, runner_id,
    win,
    # pre-race covariates only
    draw, weight_kg, age, sex,
    surface, going, distance_m, race_class, purse,
    odds_decimal,
    log_odds = log(odds_decimal),
    implied_p_raw = 1/odds_decimal
  )

stopifnot(all(is.finite(features$log_odds)))
stopifnot(all(features$odds_decimal >= 1.01))

propermodels GLM baseline + log loss / Brier / AUC

library(tidymodels)

set.seed(123)

df <- features |>
  group_by(race_id) |>
  mutate(implied_p = implied_p_raw / sum(implied_p_raw)) |>
  ungroup() |>
  mutate(win = factor(win, levels = c(0,1), labels = c("no","yes")))

spl  <- initial_split(df, strata = win)
train<- training(spl)
test <- testing(spl)

rec <- recipe(win ~ draw + weight_kg + age + sex + surface + going +
                distance_m + odds_decimal, data = train) |>
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors()) |>
  step_normalize(all_numeric_predictors())

mod <- logistic_reg() |> set_engine("glm")
wf  <- workflow() |> add_recipe(rec) |> add_model(mod)

fit  <- fit(wf, train)
pred <- predict(fit, test, type = "prob") |>
  bind_cols(test |> select(win))

metric_set(mn_log_loss, brier_class, roc_auc)(pred, truth = win, .pred_yes)

Calibration plot

library(probably)

# Binned calibration plot
cal_plot_breaks(pred, truth = win, estimate = .pred_yes, num_breaks = 10)

Calibration Mindset: a model can be ‘accurate’ in ranking, yet systematically overconfident. In the context of betting or pricing, a miscalibration is usually more dangerous than a small drop in AUC.

5) Multi-runner outcome modeling: rankings, not just winners

Binary win models throw away a lot of structure. Racing naturally results in a ranking: finishing positions over several runners. Ranking models allow you to target that structure directly.

Plackett-Luce (direct ranking probability)

library(PlackettLuce)

set.seed(99)
items  <- paste0("H", 1:12)
n_races <- 40
ability <- rnorm(length(items)); names(ability) <- items

Rmat <- matrix(0, nrow = n_races, ncol = length(items))
colnames(Rmat) <- items

for (r in 1:n_races) {
  field <- sample(items, size = sample(6:10, 1), replace = FALSE)
  util  <- ability[field] + rnorm(length(field), 0, 0.3)
  ord   <- field[order(util, decreasing = TRUE)]
  Rmat[r, ord] <- seq_along(ord) # 1 = best; 0 = absent
}

rankings <- as.rankings(Rmat)
pl <- PlackettLuce(rankings)

summary(pl)
coef(pl)  # worth parameters

Bradley-Terry via pairwise decomposition

library(dplyr)
library(BradleyTerry2)

pairwise <- tibble()
for (r in 1:n_races) {
  in_race <- names(which(Rmat[r, ] > 0))
  ranks   <- Rmat[r, in_race]
  for (i in 1:(length(in_race) - 1)) {
    for (j in (i + 1):length(in_race)) {
      h1 <- in_race[i]; h2 <- in_race[j]
      win1 <- as.integer(ranks[h1] < ranks[h2])
      pairwise <- bind_rows(pairwise, tibble(h1=h1, h2=h2, win1=win1, win2=1-win1))
    }
  }
}

agg <- pairwise |>
  group_by(h1, h2) |>
  summarise(win1 = sum(win1), win2 = sum(win2), .groups = "drop")

bt <- BTm(cbind(win1, win2), h1, h2, data = agg, id = "horse")
BTabilities(bt)

Interpretation

Both families estimate “power-like” latent parameters, but they do so with different data representations: full rankings versus aggregated pairwise outcomes.

When rankings matter

If you value place markets, exactas/quinellas, or modeling finish times/positions beyond “winning,” ranking-aware methods are often better suited to the goal.

6) Hierarchy: GLMMs and Bayesian multilevel models

Racing is fundamentally hierarchical: repeated horses, trainers, jockeys, tracks, regions. Fixed effects can be excessive; Multi-level models allow for shrinkage and better accounting for uncertainty.

GLMM with random trainer/jockey effects (lme4)

library(lme4)
library(dplyr)

sim2 <- simulate_races(n_races=500, n_horses=800, n_jockeys=120, n_trainers=100, seed=202)
df2  <- sim2$runners |> left_join(sim2$races, by="race_id")

m_glmm <- glmer(
  win ~ scale(weight_kg) + scale(draw) + surface + going +
    (1 | trainer_id) + (1 | jockey_id),
  data   = df2,
  family = binomial()
)

summary(m_glmm)

Bayesian multi-level model (brms)

library(brms)

priors <- c(
  prior(normal(0, 1), class = "b"),
  prior(exponential(1), class = "sd")
)

fit_bayes <- brm(
  win ~ scale(weight_kg) + scale(draw) + surface + going +
    (1 | trainer_id) + (1 | jockey_id),
  data   = df2,
  family = bernoulli(),
  prior  = priors,
  chains = 2, iter = 1000, cores = 2, seed = 202
)

summary(fit_bayes)
pp_check(fit_bayes)

Why Bayesian here?
You get a principled way to encode priors (smaller effects are more likely than large ones), propagate uncertainty via derived probabilities, and perform posterior predictive checks.

7) Time-to-event: modeling survival and vulnerability

Some questions in racing obviously relate to time to event: time to first win, time to peak performance, time to retirement, time between races. Survival analysis gives you the right probability and handles censorship.

Cox proportional hazards

library(dplyr)
library(survival)

set.seed(1)
n <- 1000

horses <- tibble(
  horse_id  = sprintf("H%04d", 1:n),
  talent    = rnorm(n),
  trainer_id= sample(sprintf("T%03d", 1:80), n, TRUE),
  debut_age = pmax(2, rnorm(n, 3.0, 0.6))
)

base_rate <- 0.002
linpred   <- 0.65 * horses$talent - 0.25 * (horses$debut_age - 3)
rate      <- base_rate * exp(linpred)

time_days   <- rexp(n, rate = rate)
censor_days <- runif(n, 60, 700)

status   <- as.integer(time_days  mutate(time_days = time_obs, status = status)

cox <- coxph(Surv(time_days, status) ~ scale(talent) + scale(debut_age), data = df_surv)
summary(cox)
cox.zph(cox)

Vulnerability term (heterogeneity at trainer level)

cox_f <- coxph(
  Surv(time_days, status) ~ scale(talent) + scale(debut_age) + frailty(trainer_id),
  data = df_surv
)

summary(cox_f)

8) Modern ML: XGBoost in closerdmodels and mlr3

Gradient enhancement can capture nonlinearities and interactions without manual feature engineering. But if you don’t evaluate probabilistically (log loss, calibration), you can accidentally ship a model that “rates well” but is still poorly priced.

neatmodels: tuned XGBoost

library(tidymodels)
library(dplyr)

set.seed(314)
sim3 <- simulate_races(n_races = 500, n_horses = 1200, seed = 314)

df3 <- sim3$runners |>
  left_join(sim3$races, by="race_id") |>
  mutate(win = factor(win, levels=c(0,1), labels=c("no","yes")))

spl   <- initial_split(df3, strata = win)
train <- training(spl)
test  <- testing(spl)

rec <- recipe(win ~ draw + weight_kg + age + sex + surface + going +
               distance_m + odds_decimal, data=train) |>
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors())

folds <- vfold_cv(train, v = 5, strata = win)

xgb_spec <- boost_tree(
  trees = 800,
  tree_depth     = tune(),
  learn_rate     = tune(),
  loss_reduction = tune(),
  sample_size    = tune(),
  mtry           = tune()
) |>
  set_engine("xgboost") |>
  set_mode("classification")

wf <- workflow() |> add_recipe(rec) |> add_model(xgb_spec)

grid <- grid_space_filling(
  tree_depth(),
  learn_rate(),
  loss_reduction(),
  sample_size = sample_prop(),
  mtry(range = c(5, 60)),
  size = 20
)

res <- tune_grid(
  wf,
  resamples = folds,
  grid = grid,
  metrics = metric_set(mn_log_loss, roc_auc)
)

best <- select_best(res, "mn_log_loss")
final_fit <- finalize_workflow(wf, best) |> fit(train)

pred <- predict(final_fit, test, type="prob") |> bind_cols(test |> select(win))
metric_set(mn_log_loss, brier_class, roc_auc)(pred, truth = win, .pred_yes)

mlr3: similar workflow

library(data.table)
library(mlr3)
library(mlr3learners)

set.seed(99)
sim4 <- simulate_races(n_races = 400, n_horses = 900, seed = 99)

df4 <- as.data.table(merge(sim4$runners, sim4$races, by = "race_id"))
df4[, win := factor(ifelse(win == 1, "yes", "no"))]

task <- as_task_classif(df4, target = "win", id = "win_task")
task$set_col_roles(
  c("race_id","runner_id","horse_id","jockey_id","trainer_id"),
  roles = "name"
)

rsmp <- rsmp("cv", folds = 5)

lrn_glm <- lrn("classif.log_reg", predict_type = "prob")
rr_glm  <- resample(task, lrn_glm, rsmp)
rr_glm$aggregate(msr("classif.logloss"))
rr_glm$aggregate(msr("classif.auc"))

lrn_xgb <- lrn("classif.xgboost", predict_type = "prob",
               nrounds = 300, eta = 0.05, max_depth = 4)
rr_xgb <- resample(task, lrn_xgb, rsmp)
rr_xgb$aggregate(msr("classif.logloss"))
rr_xgb$aggregate(msr("classif.auc"))

9) Odds time series: from snapshots to features

Exchange markets produce timestamps and volumes. Treat them as time series and develop features such as drift, volatility, late steam and liquidity. Keep it honest: the microstructure of the market is noisy and backtests require strong assumptions.

Simulated snapshots as a tsibble + feature extraction

library(dplyr)
library(tsibble)
library(feasts)
library(ggplot2)

set.seed(1)
snap <- tibble(
  runner_id = "R00001_01",
  ts = as.POSIXct("2025-06-01 14:00:00", tz = "UTC") + seq(0, 60*60, by = 60),
  odds_decimal = pmax(1.01, 6 + cumsum(rnorm(61, 0, 0.05))),
  traded_vol = cumsum(pmax(0, rnorm(61, 10, 3)))
)

tsbl <- snap |> as_tsibble(index = ts, key = runner_id)

feat <- tsbl |>
  features(odds_decimal, list(
    mean  = ~ mean(.),
    sd    = ~ sd(.),
    min   = ~ min(.),
    max   = ~ max(.),
    slope = ~ coef(lm(. ~ seq_along(.)))[2]
  ))

feat

ggplot(tsbl, aes(ts, odds_decimal)) +
  geom_line() +
  labs(x = "Time", y = "Decimal odds")

Schema normalization: table of snapshots

# Example transformation skeleton for real exchange data:
# 1) parse raw feed messages
# 2) map to normalized columns: race_id, runner_id, ts, odds_decimal, traded_vol
# 3) write Parquet and load into DuckDB

# write_parquet(odds_snapshots, "data/derived/odds_snapshots.parquet")
# dbExecute(con, "CREATE OR REPLACE TABLE odds_snapshots AS
#                 SELECT * FROM read_parquet('data/derived/odds_snapshots.parquet');")

10) Backtesting as a decision layer (fractional Kelly + guardrails)

After you calibrate the probabilities, you can add a decision policy. A commonly used framework is fractional Kelly sizing, but this must be capped and stress tested. The same opportunities can produce very different outcomes depending on execution constraints, market impact, limits and selection filters.

Simulated fractional Kelly example with risk limits

library(dplyr)
library(ggplot2)

sim5 <- simulate_races(n_races = 180, n_horses = 500, seed = 9)
runners <- sim5$runners |>
  group_by(race_id) |>
  mutate(
    p_market = (1/odds_decimal) / sum(1/odds_decimal),
    # toy "model": market + noise (for demonstration)
    p_model  = pmin(0.99, pmax(0.001, p_market + rnorm(n(), 0, 0.02))),
    edge     = p_model - p_market
  ) |>
  ungroup()

fractional_kelly <- function(p, odds_dec, fraction = 0.25, cap = 0.05) {
  b <- odds_dec - 1
  f_star <- (b*p - (1 - p)) / b
  f <- pmax(0, f_star) * fraction
  pmin(f, cap)
}

bankroll <- 1
max_drawdown <- 0.25
peak <- bankroll
hist <- tibble()

for (rid in unique(runners$race_id)) {
  if (bankroll < (1 - max_drawdown) * peak) break  # guardrail: stop

  race <- runners |> filter(race_id == rid)
  pick <- race |> arrange(desc(edge)) |> slice(1)

  if (pick$edge > 0.01 && pick$odds_decimal  mutate(drawdown = 1 - bankroll/peak)

ggplot(hist, aes(seq_along(bankroll), bankroll)) + geom_line() +
  labs(x = "Race index", y = "Bankroll")

ggplot(hist, aes(seq_along(drawdown), drawdown)) + geom_line() +
  labs(x = "Race index", y = "Drawdown")

Practical warning: this example is deliberately simplified. Real backtests require: commissions, slippage, betting limits, late price changes, liquidity restrictions, and selection bias checks.

11) Implementation: Versioned models + HTTP prediction API

If you want your work to be reusable, build a deployable artifact. A clean R-native pattern is:
pins for version control + vetiver for model packaging +
plumber for the API server.

library(tidymodels)
library(pins)
library(vetiver)
library(plumber)
library(dplyr)

sim6 <- simulate_races(n_races = 250, n_horses = 700, seed = 101)

df6 <- sim6$runners |>
  left_join(sim6$races, by="race_id") |>
  mutate(win = factor(win, levels=c(0,1), labels=c("no","yes")))

rec <- recipe(win ~ draw + weight_kg + age + sex + surface + going +
                distance_m + odds_decimal, data = df6) |>
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors())

fit <- workflow() |>
  add_recipe(rec) |>
  add_model(logistic_reg() |> set_engine("glm")) |>
  fit(df6)

board <- board_folder("pins", versioned = TRUE)
v <- vetiver_model(fit, model_name = "win_prob_glm")

vetiver_pin_write(board, v)

api <- pr() |> vetiver_api(v)

# Run locally:
# api |> pr_run(host = "0.0.0.0", port = 8080)

api

Minimum request payload form (JSON)

{
  "draw": 7,
  "weight_kg": 55.5,
  "age": 4,
  "sex": "G",
  "surface": "Turf",
  "going": "Good",
  "distance_m": 1600,
  "odds_decimal": 6.2
}

12) Reproducibility mechanics: renv + deterministic builds

Reproducibility is not a vibration. Lock dependencies, commit versions, and standardize builds. This is especially important when your analyzes become a book, a report, or a product artifact.

renv workflow

# Initialize (once)
# renv::init()

# Snapshot current library into renv.lock
renv::snapshot()

# Restore exact package versions from renv.lock
renv::restore()

Dockerfile for deterministic PDF builds (Quarto)

FROM rocker/verse:4.4.2
WORKDIR /book

COPY renv.lock renv.lock
COPY renv/ renv/
COPY .Rprofile .Rprofile

RUN R -q -e "install.packages('renv'); renv::restore(prompt = FALSE)"

COPY . .
CMD ["quarto", "render", "book", "--to", "pdf"]

CI Skeleton (GitHub Actions)

name: Render PDF
on:
  push:
    branches: [ main ]
  workflow_dispatch:

jobs:
  pdf:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - uses: r-lib/actions/setup-renv@v2
      - name: Render Quarto book
        run: quarto render book --to pdf

13) A ‘technical checklist’ that you can actually follow

Data contract

Canonical scheme
Parquet + DuckDB
Adapters, no redistribution

Modeling

GLM/GLMM baselines
Ranking probabilities
Bayesian uncertainty

Evaluation

Wood loss / Brier
Calibration curves
Stress-tested backtests

Time dependence

Opportunity snapshots
Drift/volatility functions
Liquidity constraints

Decision layer

Explicit assumptions
Fractional Kelly caps
Extendable railings

Stake

pin version control
vetiver packaging
plumber API

Another code snippet: within-race probability normalization

Many race tables store the ‘odds’ per runner, but as an inference you often want a normalized probability within a field. Here is a standard pattern that also emphasizes exaggerated problems.

library(dplyr)

normalize_field_probs <- function(df, odds_col = odds_decimal) {
  df |>
    group_by(race_id) |>
    mutate(
      implied_raw = 1 / {{ odds_col }},
      overround   = sum(implied_raw),
      p_market    = implied_raw / overround
    ) |>
    ungroup()
}

df_norm <- normalize_field_probs(sim$runners)

df_norm |>
  summarise(
    avg_overround = mean(overround),
    pmin_overround= min(overround),
    pmax_overround= max(overround)
  )

Interpretation: overround > 1 implies the sum of the market’s implied probabilities above 1 (built-in margin). Separating ‘market-implied probability’ from ‘model probability’ is essential if you want a fair evaluation.

#Quantitative #horse #racing #calibration #backtesting #implementation #bloggers