Choose Your Fighter: data-driven selection of the best marathon | R bloggers


[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers].




Running a marathon is quite a challenge. It takes a lot of time to train to run a good time, and it takes a while to recover. So if you’re chasing a marathon PB time, you need to choose wisely which marathon to target. How can we use data to support our decision? Let’s use R to find out!

For the impatient: just show me the marathon data! or I want to see how I can code this!

Let’s leave aside for a moment the fact that for the most popular marathons, whether or not you can register may not be your choice. What factors should we consider to choose the best one?

  • Flat course
  • Favorable weather
  • Travel considerations

The flattest course is ideal: any gain in altitude will slow us down. We could look at finishing times to know how “fast” a course is in practice, but finishing times depend heavily on who is running and how many participants there are. It is somewhat circular, with the bigger marathons attracting the faster runners. To keep things simple, I ignored time measurements and only used elevation data.

Most people agree that running in cool, ideally dry, conditions is best. That is why marathons are usually organized in spring and autumn. So we need an idea of the likely conditions on the day.

The ideal marathon would also be easy to get to. As I live in Britain, I made a list of popular marathons in Britain, then added the World Marathon Majors for comparison, as well as a few others in Europe that people I know have run. For each event I pulled a GPX file of the route from Garmin (more on this below) and noted the dates of the last three editions (for the weather data). Using these, and taking advantage of some R libraries, I was able to generate images to compare the marathon routes.

The marathon data

Click on each image to enlarge it:

The course profile for each race is shown on the same scale to give an idea of how challenging it is. Here the most important data is organized in a table, arranged by date.

Marathon             Date      Height gain (m)  Typical maximum temperature (°C)
Tokyo                1/3/26    150              14.8
Great Welsh          8/3/26    118              11.4
Cambridge Boundary   15/3/26   158              10.4
Boston Lincs.        12/4/26   46               13.4
Brighton             12/4/26   160              13.1
Paris                12/4/26   194              14.6
Manchester           19/4/26   121              14.2
Newport              19/4/26   77               13.6
Boston               20/4/26   234              17.9
Blackpool            26/4/26   172              13.1
London               26/4/26   162              14.6
Stratford-upon-Avon  26/4/26   195              14.2
Milton Keynes        4/5/26    205              14.8
Leeds Rob Burrow     10/5/26   400              19.4
Worcester            17/5/26   296              19.6
Edinburgh            24/5/26   113              14.4
Sydney               30/8/26   369              21.8
Berlin               27/9/26   101              19.7
Chester              4/10/26   213              17.0
Chicago              11/10/26  105              16.2
Abingdon             18/10/26  97               15.6
Yorkshire            18/10/26  148              14.1
Amsterdam            18/10/26  174              13.8
Frankfurt            25/10/26  142              13.0
New York             1/11/26   179              15.2
Valencia             6/12/26   144              17.2

and here is a graphical representation of the same data:

Breakdown

Let’s face it: most marathons sell themselves as flat and fast. Who can really claim that?

The three flattest on our list are Boston (Lincs.), Newport and Abingdon. The following are all under 150 m of gain and therefore quite flat: Great Welsh, Manchester, Edinburgh, Berlin, Chicago, Yorkshire, Frankfurt and Valencia. Between 150 and 200 m, which is still fairly flat, we have Tokyo, Cambridge Boundary, Brighton, Paris, Blackpool, London, Stratford-upon-Avon, Amsterdam and New York. Above 200 m we are into rolling territory: Boston, Milton Keynes, Leeds, Worcester, Sydney and Chester all have more than 200 m of elevation gain.

Of the flattest marathons on our list, the coolest temperatures are likely at Great Welsh, Frankfurt, Boston (Lincolnshire) and Newport, while Berlin, Valencia and Chicago are probably the warmest. So this gives us an idea of where a PB is most likely to be achievable.

Data accuracy

Getting the total elevation gain is difficult. I used a single data source (Garmin Connect) for the GPS data to reduce variation, but even with this single source the calculated total gain varied considerably.

The elevation data for a GPS location must of course be correct. This is not necessarily the case if the data comes from a watch, where the barometer can be inaccurate or where tall buildings distort the location (which is a problem in city marathons).

If the data is correct, the calculation may still be inaccurate due to the sampling rate. If we add up all the elevation changes for a track sampled every 10 meters versus one sampled every 50 meters, we get different answers, because the latter is smoother than the former. To deal with this, I resampled the elevation data onto a uniform distance scale to get the most accurate elevation gain I could from the data I had. This caveat applies to any marathon elevation figure you find online: our comparison lets us say that one marathon has more or less elevation gain than another, but it does not allow a direct comparison with figures from other sources.
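To see the size of this effect, here is a toy sketch (synthetic data, not one of the marathon courses): summing raw elevation differences over a noisy profile reports noticeably more climbing at a finer sampling interval.

```r
# Toy illustration: total gain from summing raw elevation differences
# depends on the sampling interval, because finer sampling picks up
# more noise as "climbing".
set.seed(42)
dist_m <- seq(0, 1000, by = 10)                       # 10 m sampling
ele <- 50 + 5 * sin(dist_m / 100) +                   # gentle hill
  rnorm(length(dist_m), sd = 0.5)                     # GPS-style noise

gain <- function(e) sum(pmax(diff(e), 0))             # sum of upward steps

gain_10m <- gain(ele)                                 # every point (10 m)
gain_50m <- gain(ele[seq(1, length(ele), by = 5)])    # every 5th point (50 m)

gain_10m > gain_50m                                   # finer sampling reports more gain
```

Resampling every track onto a common distance grid removes this dependence on each file's native sampling rate.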

The typical weather is estimated from the last three editions – with the exception of Valencia, where the 2025 edition has not yet taken place. I used the average of the maximum temperatures over those editions. A more accurate picture would come from including a number of days either side of each event, since the weather at one or more editions may have been quite atypical.

Finally, I collected the data manually, so errors are possible. Apologies for any errors!

The code

If you came here for the R coding instead of the running, here’s the bit where I show how the analysis works! Besides general R stuff – importing data, calculations, creating graphs – we need to do a few other things:

  • read the GPX data and calculate the elevation data – we will use {gpxtoolbox} to help with this
  • get weather data – we will use {openmeteo} for this
  • convert WMO codes into icons, load the icons and display them

We have two functions stored in a script that is sourced by the main script. The goal is to convert WMO weather codes into icons. I found a gist with the WMO codes and the corresponding URLs of the day and night versions of the pictograms. The first function converts this data (in JSON format) into a data frame that we can use in the main script. The second function converts the wind direction into a text arrow for display.

library(jsonlite)
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
library(tibble)

# Example: read the JSON into `lst`
# lst <- jsonlite::fromJSON("Data/descriptions.json", simplifyVector = FALSE)

descriptions_to_df <- function(lst) {
  # lst is expected to be a named list or a list of entries where each element corresponds to a WMO code.
  # Support either:
  #  - named list where names(lst) are WMO codes and each element is a list with day/night fields
  #  - or a list of objects where each object has a "wmo" or "WMO" field + nested day/night fields
  
  # Helper to normalize keys for day/night entries
  norm_field <- function(item, keys) {
    # keys: possible key names (vector), returns first non-NULL value or NA
    for (k in keys) {
      if (!is.null(item[[k]])) return(item[[k]])
    }
    return(NA_character_)
  }
  
  # If lst is a named list with codes as names
  if (!is.null(names(lst)) && all(names(lst) != "")) {
    codes <- names(lst)
    rows <- map2_df(lst, codes, function(item, code) {
      # item may have elements like $day$description, or $day_description, etc.
      # Try several common variants.
      day <- item[["day"]]    # might be a list
      night <- item[["night"]]
      day_description <- if (!is.null(day) && is.list(day)) norm_field(day, c("description", "desc", "text")) else norm_field(item, c("day_description", "dayDescription", "day-desc"))
      day_image <- if (!is.null(day) && is.list(day)) norm_field(day, c("image", "img", "image_url")) else norm_field(item, c("day_image", "dayImage", "day-img"))
      night_description <- if (!is.null(night) && is.list(night)) norm_field(night, c("description", "desc", "text")) else norm_field(item, c("night_description", "nightDescription", "night-desc"))
      night_image <- if (!is.null(night) && is.list(night)) norm_field(night, c("image", "img", "image_url")) else norm_field(item, c("night_image", "nightImage", "night-img"))
      
      tibble(
        wmo = code,
        day_description = as.character(day_description),
        day_image = as.character(day_image),
        night_description = as.character(night_description),
        night_image = as.character(night_image)
      )
    })
    
    return(rows)
  }
  
  # Otherwise treat as array of objects, each with a wmo field
  rows <- map_df(lst, function(item) {
    code <- norm_field(item, c("wmo", "WMO", "WMO_code", "wmo_code", "id"))
    day <- item[["day"]]
    night <- item[["night"]]
    day_description <- if (!is.null(day) && is.list(day)) norm_field(day, c("description", "desc", "text")) else norm_field(item, c("day_description", "dayDescription"))
    day_image <- if (!is.null(day) && is.list(day)) norm_field(day, c("image", "img", "image_url")) else norm_field(item, c("day_image", "dayImage"))
    night_description <- if (!is.null(night) && is.list(night)) norm_field(night, c("description", "desc", "text")) else norm_field(item, c("night_description", "nightDescription"))
    night_image <- if (!is.null(night) && is.list(night)) norm_field(night, c("image", "img", "image_url")) else norm_field(item, c("night_image", "nightImage"))
    
    tibble(
      wmo = as.character(code),
      day_description = as.character(day_description),
      day_image = as.character(day_image),
      night_description = as.character(night_description),
      night_image = as.character(night_image)
    )
  })
  
  # If wmo NA but names exist in original list, try to fill
  if (all(is.na(rows$wmo)) && !is.null(names(lst))) {
    rows$wmo <- names(lst)
  }
  
  # Ensure first column is wmo
  rows %>% select(wmo, everything())
}

windsymbol <- function(degree) {
  # Return wind direction symbol based on degree
  if (is.na(degree)) {
    return("-")
  }
  directions <- c("↓", "↙", "←", "↖", "↑", "↗", "→", "↘", "↓")
  index <- round(degree / 45) + 1
  return(directions[index])
}
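A few spot checks of windsymbol() (the function is repeated here so the snippet stands alone). The degrees follow the meteorological convention, i.e. the direction the wind blows from, so a northerly wind (0°) gets a downward arrow:

```r
# windsymbol() as defined above: map a wind direction in degrees
# (meteorological convention, direction the wind blows FROM)
# to a text arrow showing where the wind blows TO.
windsymbol <- function(degree) {
  if (is.na(degree)) return("-")
  directions <- c("↓", "↙", "←", "↖", "↑", "↗", "→", "↘", "↓")
  directions[round(degree / 45) + 1]
}

windsymbol(0)    # northerly wind -> "↓"
windsymbol(90)   # easterly wind  -> "←"
windsymbol(270)  # westerly wind  -> "→"
windsymbol(NA)   # missing data   -> "-"
```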

Okay, so now the main script. I had the marathon event list as a tab-delimited file in the Data folder, and a gpx file for each marathon in the same folder. The name of each gpx file is the same as the event name. The event listing also has an alias for displaying the marathon name. These are the contents of the file:

event   date2023    date2024    date2025    date2026    alias
Leeds   14/5/23 12/5/24 11/5/25 10/5/26 Leeds Rob Burrow
GreatWelsh  2/4/23  17/3/24 16/3/25 8/3/26  Great Welsh
Cambridge   12/3/23 10/3/24 16/3/25 15/3/26 Cambridge Boundary
Boston  16/4/23 28/4/24 13/4/25 12/4/26 Boston Lincs.
Brighton    2/4/23  7/4/24  6/4/25  12/4/26 Brighton
Manchester  16/4/23 14/4/24 27/4/25 19/4/26 Manchester
Newport 16/4/23 28/4/24 19/4/25 19/4/26 Newport
Blackpool   23/4/23 21/4/24 27/4/25 26/4/26 Blackpool
London  23/4/23 21/4/24 27/4/25 26/4/26 London
Shakespeare 23/4/23 21/4/24 27/4/25 26/4/26 Stratford-upon-Avon
MK  1/5/23  6/5/24  5/5/25  4/5/26  Milton Keynes
Worcester   21/5/23 19/5/24 18/5/25 17/5/26 Worcester
Edinburgh   28/5/23 26/5/24 25/5/25 24/5/26 Edinburgh
Chester 8/10/23 6/10/24 5/10/25 4/10/26 Chester
Abingdon    22/10/23    20/10/24    19/10/25    18/10/26    Abingdon
Yorkshire   15/10/23    20/10/24    19/10/25    18/10/26    Yorkshire
Tokyo   5/3/23  3/3/24  2/3/25  1/3/26  Tokyo
BostonUSA   17/4/23 15/4/24 21/4/25 20/4/26 Boston
Sydney  17/9/23 15/9/24 31/8/25 30/8/26 Sydney
Berlin  24/9/23 29/9/24 21/9/25 27/9/26 Berlin
Chicago 8/10/23 13/10/24    12/10/25    11/10/26    Chicago
NewYork 5/11/23 3/11/24 2/11/25 1/11/26 New York
Frankfurt   29/10/23    27/10/24    26/10/25    25/10/26    Frankfurt
Valencia    3/12/23 1/12/24 7/12/25 6/12/26 Valencia
Amsterdam   15/10/23    20/10/24    19/10/25    18/10/26    Amsterdam
Paris   2/4/23  7/4/24  13/4/25 12/4/26 Paris

From here we can read it and use it to control data collection and processing.

library(ggplot2)
library(dplyr)
library(lubridate)
library(gpxtoolbox)
library(openmeteo)
library(cowplot)
library(png)
library(ggrepel)

## Functions ----

load_weather_image <- function(thisyear) {
  wcode <- weather_df$daily_weather_code[weather_df$yr == thisyear]
  # if wcode is missing or length 0 return a blank image
  if (length(wcode) == 0 || is.na(wcode)) {
    return(NULL)
  }
  # get the image url from wmo_df
  img_url <- wmo_df$day_image[wmo_df$wmo == wcode]
  f <- tempfile()
  download.file(img_url, f, mode = "wb") # mode = "wb" so the PNG survives on Windows
  img <- readPNG(f)
  as.raster(img)
}

# load tsv of dates
date_df <- read.delim("Data/marathons.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# the column "event" each row has the name of a marathon that can be loaded by appending ".gpx" to the name

source("Script/mwo.R")
# load the json file descriptions.json from Data/
descriptions <- jsonlite::fromJSON("Data/descriptions.json", simplifyVector = FALSE)
wmo_df <- descriptions_to_df(descriptions)
# we will collate the weather data for each marathon and the max temp etc.
summary_df <- data.frame()

# Loop over each marathon event
for (i in 1:nrow(date_df)) {
  # pick a city, we will automate this later
  city <- date_df$event[i]
  date2026 <- date_df$date2026[date_df$event == city]
  
  # Analyse the example GPX file and get summary statistics
  gpx_path <- paste0("Data/",city,".gpx")
  
  # Get summary statistics
  # stats <- analyse_gpx(gpx_path, return = "stats")
  
  # Get processed track points data
  track_data <- analyse_gpx(gpx_path, return = "data")
  # find the mid point in lat long
  mid_lat <- mean(range(track_data$lat))
  mid_lon <- mean(range(track_data$lon))
  # convert lat and lon coordinates to km for distance calculation
  track_data <- track_data %>%
    mutate(
      lat_km = (lat - mid_lat) * 111.32,
      lon_km = (lon - mid_lon) * 111.32 * cos(mid_lat * pi / 180)
    ) %>%
    arrange(time) %>%
    mutate(
      delta_dist = sqrt((lat_km - lag(lat_km, default = first(lat_km)))^2 + (lon_km - lag(lon_km, default = first(lon_km)))^2),
      cumulative_distance = cumsum(delta_dist)
    )
  # Create route plot
  route <- ggplot() +
    geom_path(data = track_data, aes(x = lon_km, y = lat_km), color = "darkgrey", linewidth = 1) +
    coord_equal() +
    theme_void()
  # x axis is too narrow so expand limits by 10% on each side
  x_range <- range(track_data$lon_km)
  x_expand <- (x_range[2] - x_range[1]) * 0.1
  y_range <- range(track_data$lat_km)
  y_expand <- (y_range[2] - y_range[1]) * 0.1
  # check that x_expand and y_expand are at least 1.5 km
  x_expand <- max(x_expand, 1.5)
  y_expand <- max(y_expand, 1.5)
  route <- route +
    xlim(x_range[1] - x_expand, x_range[2] + x_expand) +
    ylim(y_range[1] - y_expand, y_range[2] + y_expand)
  # remove title and axis labels and tick labels
  route <- route + 
    # add a scale bar at bottom right
    ggspatial::annotation_scale(location = "br", width_hint = 0.2,
                                plot_unit = "km", bar_cols = c("grey", "white"),
                                line_col = "darkgrey",
                                text_col = "darkgrey",
                                text_cex = 0.8)
  
  # we have delta_dist which is the distance from one point to the next
  # calculate the cumulative distance along the path
  track_data$cum_dist <- cumsum(c(0, track_data$delta_dist[-nrow(track_data)]))
  # we have the ele which is the elevation at each point
  # resample ele so that we have elevation at regular intervals along the cumulative distance, use 0.1 km intervals
  resampled_dist <- seq(0, max(track_data$cum_dist), by = 0.1)
  resampled_ele <- approx(track_data$cum_dist, track_data$ele, xout = resampled_dist)$y
  # calculate the elevation gain and loss over each 0.1 km segment
  ele_diff <- diff(resampled_ele)
  ele_gain <- sum(ele_diff[ele_diff > 0], na.rm = TRUE)
  ele_loss <- sum(-ele_diff[ele_diff < 0], na.rm = TRUE)
  stats <- list(
    total_elevation_gain_m = ele_gain,
    total_elevation_loss_m = ele_loss,
    max_elevation_m = max(track_data$ele, na.rm = TRUE),
    min_elevation_m = min(track_data$ele, na.rm = TRUE)
  )
  new_track_data <- data.frame(resampled_dist, resampled_ele)
  
  # Create elevation profile plot
  # the biggest difference between min and max is 141 m so set y axis limits to min -5 to 150 above that
  ele_plot <- ggplot(new_track_data, aes(x = resampled_dist, y = resampled_ele)) +
    geom_ribbon(aes(ymin = stats$min_elevation_m - 5, ymax = resampled_ele), fill = "#55aa55") +
    geom_line() +
    labs(x = "Distance (km)", y = "Elevation (m)") +
    ylim(stats$min_elevation_m - 5, stats$min_elevation_m + 150) +
    theme_minimal()
  
  yearcols <- c("date2023", "date2024", "date2025")
  weather_df <- data.frame()
  
  for (yr in yearcols) {
    # select column using variable yr
    date_for_yr <- date_df[date_df$event == city, yr]
    # if date_for_yr is na then skip to next iteration
    if (is.na(date_for_yr)) {
      next
    }
    # the date is written in dd/mm/yy format, convert to yyyy-mm-dd
    date_for_yr <- dmy(date_for_yr)
    # if date is in the future, skip to next iteration
    if (date_for_yr > Sys.Date()) {
      next
    }
    
    weather_forecast <- weather_history(
      location = c(mid_lat, mid_lon),
      daily = c("temperature_2m_max",
                "temperature_2m_min",
                "precipitation_sum",
                "windspeed_10m_max",
                "wind_direction_10m_dominant",
                "weather_code"),
      start = date_for_yr,
      end = date_for_yr
    )
    
    weather_forecast$event <- city
    weather_forecast$yr <- year(date_for_yr)
    
    weather_df <- rbind(weather_df, weather_forecast)
  }
  
  # Get alias for city from date_df
  alias <- date_df$alias[date_df$event == city]
  
  # Make an object to display Marathon stats
  p <- ggdraw() +
    draw_label(
      alias,
      fontfamily = 'serif',
      fontface = 'bold',
      x = 0.05,
      y = 0.95,
      hjust = 0,
      vjust = 1,
      size = 24
    ) +
    draw_label(
      # print the date which is stored as 16/4/26 in date2026 column
      # Should say 16th April 2026
      format(dmy(date2026), "%d %B %Y"),
      fontfamily = 'serif',
      fontface = 'italic',
      x = 0.05,
      y = 0.9,
      hjust = 0,
      vjust = 1,
      size = 16
    ) +
    draw_label(
      paste0(
        "Gain: ", round(stats$total_elevation_gain_m, 0), " m\n",
        "Loss: ", round(stats$total_elevation_loss_m, 0), " m\n",
        "Max: ", round(stats$max_elevation_m, 0), " m\n",
        "Min: ", round(stats$min_elevation_m, 0), " m\n"
      ),
      fontface = 'plain',
      x = 0.5,
      y = 0.8,
      hjust = 0.5,
      vjust = 1,
      size = 14
    )
  # Add weather info for each year
  w2023 <- load_weather_image(2023)
  w2024 <- load_weather_image(2024)
  w2025 <- load_weather_image(2025)
  
  for(yr in c(2023, 2024, 2025)) {
    p <- p +
      draw_label(
        paste0(yr),
        fontface = 'bold',
        x = ifelse(yr == 2023, 0.2, ifelse(yr == 2024, 0.5, 0.8)),
        y = 0.5,
        vjust = 0,
        size = 16
      )
    # if there is no row corresponding to yr in weather_df, skip to next iteration
    if (nrow(weather_df[year(weather_df$date) == yr, ]) == 0) {
      next
    }
    p <- p +
      draw_label(
        paste0(
          "High: ", round(weather_df$daily_temperature_2m_max[year(weather_df$date) == yr], 1), " °C\n",
          "Low: ", round(weather_df$daily_temperature_2m_min[year(weather_df$date) == yr], 1), " °C\n",
          "Precip: ", round(weather_df$daily_precipitation_sum[year(weather_df$date) == yr], 1), " mm\n",
          windsymbol(round(weather_df$daily_wind_direction_10m_dominant[year(weather_df$date) == yr], 0))," ", round(weather_df$daily_windspeed_10m_max[year(weather_df$date) == yr], 1), " km/h\n"
        ),
        fontface = 'plain',
        x = ifelse(yr == 2023, 0.2, ifelse(yr == 2024, 0.5, 0.8)),
        y = 0.25,
        size = 12
      )
  }
  
  # check is w2023, w2024, w2025 are not null before adding to plot
  if (!is.null(w2023)) {
    p <- p +
      draw_image(w2023, x = 0.2, y = 0.4, width = 0.2, height = 0.2, hjust = 0.5, vjust = 0.5)
  }
  if (!is.null(w2024)) {
    p <- p +
      draw_image(w2024, x = 0.5, y = 0.4, width = 0.2, height = 0.2, hjust = 0.5, vjust = 0.5)
  }
  if (!is.null(w2025)) {
    p <- p +
      draw_image(w2025, x = 0.8, y = 0.4, width = 0.2, height = 0.2, hjust = 0.5, vjust = 0.5)
  }
  
  # Make a cowplot and assemble the plots
  top_row <- plot_grid(p, route, ncol = 2)
  combined_plot <- plot_grid(top_row, ele_plot, ncol = 1, align = "v", rel_heights = c(3, 1))
  # Save the combined plot to a file
  ggsave(filename = paste0("Output/Plots/",city,"_summary.png"),
         plot = combined_plot, width = 12, height = 8, dpi = 300, bg = "white")
  
  # add to summary_df
  summary_df <- rbind(summary_df, data.frame(
    alias = alias,
    date2026 = date2026,
    total_elevation_gain_m = round(stats$total_elevation_gain_m, 0),
    avg_daily_temp_max = round(mean(weather_df$daily_temperature_2m_max, na.rm = TRUE),1)
  ))
}

# reorder summary_df by date2026
summary_df <- summary_df %>%
  arrange(dmy(date2026))

# Save summary_df to a tsv file
write.table(summary_df, file = "Output/Data/marathon_summary.tsv", sep = "\t", row.names = FALSE, quote = FALSE)


p1 <- ggplot() +
  # add coloured rectangles to indicate elevation
  geom_rect(aes(xmin = 7, xmax = 24, ymin = 0, ymax = 150), fill = "#d0f0d0", alpha = 0.5) +
  geom_rect(aes(xmin = 7, xmax = 24, ymin = 150, ymax = 200), fill = "#fff0b0", alpha = 0.5) +
  geom_rect(aes(xmin = 7, xmax = 24, ymin = 200, ymax = 420), fill = "#f0d0d0", alpha = 0.5) +
  # add points and labels from summary_df
  geom_point(data = summary_df, aes(x = avg_daily_temp_max, y = total_elevation_gain_m)) +
  geom_text_repel(data = summary_df, aes(x = avg_daily_temp_max, y = total_elevation_gain_m, label = alias), size = 3.5, max.overlaps = 1000, segment.color = "#7f7f7f7f", segment.size = 0.2) +
  lims(x = c(7, NA), y = c(0, NA)) +
  labs(x = "Average Daily Max Temperature (°C)", y = "Total Elevation Gain (m)") +
  theme_cowplot(11)
ggsave(filename = "Output/Plots/marathonComparison.png",
       plot = p1, width = 12, height = 8, dpi = 300, bg = "white")

We load in the event list and, for each row (event), first load the gpx file. Using analyse_gpx() we read the data from the file, which contains elevation data and latitude/longitude coordinates. We convert these to Cartesian coordinates because we are dealing with coordinate sets from different parts of the world. This data is used to generate the route map. The elevation data is resampled at 100 m intervals so that we get uniform elevation measurements from which to calculate total elevation gain and loss. This is used to create the elevation profile plot.
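The conversion to Cartesian coordinates is a standard equirectangular approximation: one degree of latitude is roughly 111.32 km everywhere, while one degree of longitude shrinks by the cosine of the latitude. A minimal sketch with two illustrative points (not taken from any of the GPX files):

```r
# Equirectangular approximation: project lat/lon onto a local km grid
# centred on the track's midpoint, then use plain Euclidean distances.
lat <- c(51.50, 51.51)   # illustrative points near London
lon <- c(-0.12, -0.10)

mid_lat <- mean(lat)
lat_km <- (lat - mid_lat) * 111.32
lon_km <- (lon - mean(lon)) * 111.32 * cos(mid_lat * pi / 180)

# straight-line distance between the two points, in km (~1.8 km here)
d <- sqrt(diff(lat_km)^2 + diff(lon_km)^2)
d
```

This is accurate enough over a 42 km course; for tracks spanning much larger areas a proper projection would be needed.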

Course statistics can be requested via analyse_gpx(), but as discussed above, I recalculated the elevation data and saved it along with the other metrics I needed. This saved an extra call to analyse_gpx(), which sped up execution.

Using the dates of the last three editions, I looked up the historical weather on those dates for a location at the center of the latitude/longitude coordinates. This was possible with {openmeteo}, a client for the Open-Meteo API. I discovered that openweathermap (which I’ve used for other projects) charges for access to historical weather data, whereas Open-Meteo is free. Once we have the weather data, we can use the WMO code to retrieve the appropriate icon. These codes reflect the most extreme weather of the day rather than a perfect summary, but I just wanted something that reflected the weather at the previous editions. The icons can be loaded from a URL using {png}. Finally, we have a wind direction that can be converted into an arrow using the function above.
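The icon lookup itself is just a filter on the data frame produced by descriptions_to_df(). A toy stand-in to show the shape of that lookup (these codes, descriptions and URLs are illustrative examples, not the contents of the gist):

```r
# Hypothetical miniature version of wmo_df, for illustration only
wmo_df <- data.frame(
  wmo = c("0", "61", "95"),
  day_description = c("Sunny", "Light rain", "Thunderstorm"),
  day_image = c("https://example.com/clear.png",
                "https://example.com/rain.png",
                "https://example.com/storm.png"),
  stringsAsFactors = FALSE
)

wcode <- 61   # a daily weather code as returned by {openmeteo}
desc <- wmo_df$day_description[wmo_df$wmo == as.character(wcode)]
desc
# "Light rain"
```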

To compose the image, I used {cowplot} to combine the graphic and text elements. This “plot”, the route and the elevation profile were then assembled with plot_grid(), also from {cowplot}. This is done for each event and saved as a file. The summary data is saved as we go, so we can create the table and plot shown above.

Conclusion

I’m quite happy with the result, but I can see a few ways to improve it. For example, I think it would be good to have a more advanced measure of marathon hardness. I could also use some custom fonts and improve the colors of the images to get a more professional look. Anyway, the goal was to figure out which marathon I would run in 2026 and I succeeded.

The title of the post comes from Choose Your Fighter by Nova Twins. I saw their NPR Tiny Desk Concert this week.

