Debunking the GAMLSS Myth | R bloggers

Debunking the GAMLSS Myth | R bloggers

[This article was first published on Achim Zeileis, and kindly contributed to R-bloggers]. (You can report a problem with the content on this page here)


Want to share your content on R bloggers? click here if you have a blog, or here if you don’t.

Generalized additive models for location, scale and shape (GAMLSS) are popular for estimating reference equations, for example for lung function measurements. A recent paper claimed to ‘debunk’ the GAMLSS myth in this context by showing that much simpler piecewise linear models work just as well. A new letter to the editor refutes these claims.

Quote

Robert A. Rigby, Mikis D. Statesinapolos, Achim Zeileis, Sanjala Stanojevic, Gillian Heller, Fernanda de Bastiani, Thomas Kneib, Andreas Mayr, Reto Stauffer, Nikolaus Ulauf (2025). “Letter to the editor refuting ‘debunking the Gamlss myth: Syplicity reigns in lung function diagnostics’.” Coming soon Respiratory medicine. Preprint available at arXiv.org E-print archive arXiv:2512.09179 [stat.AP]. doi:10.48550/arXiv.2512.09179

Abstract

We received the above article with interest Zavorsky (2025, Respiratory Medicine) regarding reference equations for lung function tests. The author compares a Generalized Additive Model for Location, Scale, and Shape (GAMLSS), the standard adopted by the Global Lung Function Initiative (GLI), with a segmented linear regression model (SLR) for lung function variables. The author presents an interesting comparison; however, there are some fundamental problems with the approach. We welcome this opportunity to discuss the issues it raises. The author’s claim is that (1) SLR “provides prediction accuracies comparable to GAMLSS”; and (2) the GAMLSS model equations are “complicated and require additional spline tables,” while the SLR is “simpler, more parsimonious, and more accessible to a broader audience.” We respectfully disagree with both points.

Read full letter ›

Replication

Zavorsky submits the data for his analysis NHANES_2007_2012_Only_Acceptable_Spirometry_Values.csv.

Using Zavorsky’s code, we were able to replicate most of his analyzes and reassess the models. The replication code for the letter to the editor is available in debunk.R.

To illustrate, a pair of models for one of the response variables (in the male subgroup) is fitted and assessed below.

Illustration

To illustrate the workflow we used to assess the segmented linear regression (SLR) and the generalized additive models for location, scale, and shape (GAMLSS) for the spirometry data, we consider the response variable FEV1 for the male subgroup. FEV1 is the forced expiratory volume in 1 second in liters, ie the volume of air that can be blown out with force in the first second after full inspiration.

After downloading the data (see link above) we can read the CSV, calculate the squared age (which Zavorsky somewhat confusingly uses instead of age in SLR) and select the male subset.

d <- "NHANES_2007_2012_Only_Acceptable_Spirometry_Values.csv" |>
  read.csv() |>
  transform(Age2 = Age^2) |>
  subset(Sex == "Male")

Then we load the packages we need, all available from CRAN except the top models package that is still on R-Forge.

library("segmented")
library("gamlss")
library("distributions3")
library("topmodels")

To streamline the workflow for probabilistic forecasting and model assessments distributions3 And topmodels packages for the SLR models mounted by segmented we need one prodist() method to extract/predict the corresponding probability distribution. This doesn’t work out-of-the-box because segmented uses only one constant variance, while Zavorsky uses a piecewise constant variance. The function below implements this for the special case of a segmented file lm() with exactly one breakpoint.

prodist.segmented <- function(object, newdata = NULL, ...) {
  ## in-sample and out-of-sample data
  fitdata <- model.frame(object)
  if (is.null(newdata)) newdata <- fitdata

  ## predicted mean (m) and standard deviation (s)
  m <- predict(object, newdata = newdata)
  psi <- object$psi[, "Est."]
  seg <- as.numeric(fitdata[[object$nameUV$Z]] > psi) + 1
  s <- sqrt(tapply(residuals(object)^2, seg, mean))
  s <- s[as.numeric(newdata[[object$nameUV$Z]] > psi) + 1]

  ## Normal distribution object
  distributions3::Normal(mu = m, sigma = s)
}

We then fit the SLR and GAMLSS models into the (somewhat complicated) specification that Zavorsky considered for his paper.

lm_fev1_m <- lm(Baseline_FEV1_L ~ Age2 + Height + I(Height^2) + Weight, data = d)
seg_fev1_m <- segmented(lm_fev1_m, seg.Z = ~ Age2, psi = 20^2)

gam_fev1_m <- gamlss(Baseline_FEV1_L ~ log(Height) + log(Age) + cs(Age, df = 3),
  sigma.formula = ~ log(Age) + cs(Age, df = 3), family = BCCGo(mu.link = "log"), 
  data = d, method = mixed(50, 500),
  control = gamlss.control(trace = TRUE, maxit = 500))

To demonstrate that the SLR with the piecewise constant variance does not provide a satisfactory fit for different age groups, we provide the visualization below. It shows the empirical proportions of observations below the 5% lower limit of normal (LLN) over different age groups (7.5 year intervals). The red line with triangles corresponds to the SLR model and the blue line with circles corresponds to the GAMLSS. The corresponding pointwise 95% intervals are added in gray in each panel.

Although the proportion of observations below the LLN roughly matches the nominal 5% level for the GAMLSS across all age intervals, the SLR performs quite poorly. For young people (5 to 12.5 years old) the LLN is much too small, which means that very few observations actually fall under the LLN. Conversely, for people between 12.5 and 20 years old, the LLN is too large, meaning that too many observations fall under the LLN. In other words, the SLR would fail to indicate critically low FEV1 values, either by producing too many or too few cases.

Using the procast() function of topmodels we predict the corresponding 5% LLN for each person and then aggregate the excess rates into 7.5 year age intervals.

## proportion of observations below LLN
d_lln <- rbind(
  data.frame(
    proportion = d$Baseline_FEV1_L < procast(gam_fev1_m, type = "quantile", at = 0.05, drop = TRUE),
    fit = "gamlss",
    age = d$Age),
  data.frame(
    proportion = d$Baseline_FEV1_L < procast(seg_fev1_m, type = "quantile", at = 0.05, drop = TRUE),
    fit = "segmented",
    age = d$Age)
)

## set up age intervals
breaks <- seq(5, 80, by = 7.5)
mid <- (head(breaks, -1) + tail(breaks, -1))/2
d_lln <- transform(d_lln,
  age = cut(age, breaks, include.lowest = TRUE)
)

## aggregate in age groups
a_lln <- d_lln |>
  aggregate(x = proportion ~ fit + age, FUN = mean) |>
  transform(nobs = aggregate(proportion ~ fit + age, data = d_lln, FUN = length)$proportion) |>
  transform(alpha = 0.05) |>
  transform(se = sqrt(alpha * (1 - alpha)/nobs)) |>
  transform(lower = alpha + qnorm(0.025) * se) |>
  transform(upper = alpha + qnorm(0.975) * se) |>
  transform(age = mid[age])

To display these aggregated values ​​in an attractive way, we use the recent values smallplot package.

library("tinyplot")
pal <- palette.colors()[6:7]
tinytheme("clean2", grid.lwd = 1, grid.lty = 2)
tinyplot(
  proportion ~ age | fit,
  data = a_lln,
  type = "o",
  pch = c(19, 17),
  lwd = 1.5,
  cex = 1.5,
  col = pal,
  ylim = c(0.01, 0.09),
  yaxl = "%",
  xlab = "Age (mid points of 7.5 year intervals)",
  ylab = "Proportion of observations below LLN"
)
tinyplot_add(
  alpha ~ age | fit,
  ymin = a_lln$lower,
  ymax = a_lln$upper,
  type = "ribbon",
  col = "gray"
)
tinyplot_add()

To clarify that not only is the 5% LLN misspecified, but that the SLR more generally provides a poor probabilistic fit for FEV1, we use a detrended QQ plot (aka worm plot) based on z-scores (aka quantile residuals). Using the qqrplot() function of topmodels this can be set as follows.

qqrplot(seg_fev1_m, detrend = TRUE, confint = "polygon",
  col = pal[2], pch = 17, main = "")
qqrplot(gam_fev1_m, detrend = TRUE, plot = FALSE) |>
  points(col = pal[1], pch = 19)

QQ plots of the z-scores from the SLR and GAMLSS models for a selected age group.

In the letter, the QQ plots are used without detrending and are considered separately for different age groups.

See the full replication code (linked above) for more details.


#Debunking #GAMLSS #Myth #bloggers

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *