Visual summaries of Pacific Island populations by @ellis2013nz

This is the first of several posts in which I share some code and visualizations of population issues in the Pacific. The analysis and visualizations are quite simple. Together they show how to create (with publicly available data) all the statistical graphics used in a presentation I recently gave in Wellington on migration and mobility in the Pacific.

This was a side event to the Pacific ‘Heads of Planning and Statistics’ meeting, which takes place every two years and is the largest event that my team at the Pacific Community (SPC) organizes. All papers and presentations related to the meeting are available online, which is definitely transparency in action.

It was nice to have a chance at this side event to talk about the substantive issues that emerge from the data, rather than (as usual in my meetings) how to improve the data, improve its use, and generally strategize and set priorities to improve the metrics. These things are important and (probably) fun, but it’s nice to put them aside and talk about some actual development issues every now and then. My talk was followed by a great panel discussion with speakers from academia, a UN agency, Stats NZ and a national planner from a Pacific island.

Today’s post is quite simple and involves producing two statistical graphs (one of which has both a “bare” and a “highlighted” version) that set the scene for the Pacific population.

Download data

First I download and clean up the data. Everything I need for these charts is already in the Pacific Data Hub, making this quite easy. The only thing that requires some hassle is converting the country codes into user-friendly country names, and classifying each country into Melanesia, Polynesia, or Micronesia.

# This script produces a couple of general use plots on population growth in the Pacific
# for use in presentations on data issues

library(tidyverse)
library(rsdmx)
library(scales)
library(janitor)
library(ISOcodes)
library(glue)
library(spcstyle)
library(extrafont)
library(Cairo)
library(ggrepel)

# general use caption and font:
the_caption <- "Source: UN World Population Prospects, via the Pacific Data Hub"
the_font <- "Roboto" 

# Download all the mid year population estimates from PDH.stat
d <- readSDMX("https://stats-sdmx-disseminate.pacificdata.org/rest/data/SPC,DF_POP_PROJ,3.0/A.AS+CK+FJ+PF+GU+KI+MH+FM+NR+NC+NU+MP+PW+PG+PN+WS+SB+TK+TO+TV+VU+WF+_T+MEL+MIC+POL+_TXPNG+MELXPNG.MIDYEARPOPEST._T._T?startPeriod=1950&endPeriod=2050&dimensionAtObservation=AllDimensions") |> 
  as_tibble() |> 
  clean_names() |> 
  mutate(time_period = as.numeric(time_period))

# Some subregional classifications.
mel <- c("Melanesia", "Papua New Guinea", "Fiji", "Solomon Islands", "Vanuatu", "New Caledonia")
pol <- c("Polynesia", "Tonga", "Samoa", "Cook Islands", "Tuvalu", "American Samoa", "Pitcairn", "Wallis and Futuna", "French Polynesia", "Niue", "Tokelau")

# lookup table with country codes, names, and which subregion they are in
pict_names <- tribble(~Alpha_2, ~Name,
                      "_T", "All PICTs",
                      "MEL", "Melanesia",
                      "_TXPNG", "Total excluding PNG",
                      "POL", "Polynesia",
                      "MIC", "Micronesia")  |> 
  bind_rows(select(ISO_3166_1, Alpha_2, Name)) |> 
  rename(geo_pict = Alpha_2,
         pict = Name) |> 
  mutate(region = case_when(
    pict %in% mel ~ "Melanesia",
    pict %in% pol ~ "Polynesia",
    grepl("^_T", geo_pict) ~ "Total",
    TRUE ~ "Micronesia"
  ))

# Dataset that combines the original PDH.stat data with the country names and regional classifications
d2 <- d |> 
  mutate(era = ifelse(time_period <= 2025, "Past", "Future")) |> 
  inner_join(pict_names, by = "geo_pict") |> 
  mutate(pict = gsub("Federated States of", "Fed. States", pict)) |> 
  # Order country names from smallest to largest population in 2050:
  mutate(pict = fct_reorder(pict, obs_value, .fun = last))
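
Because `inner_join()` silently drops rows whose code has no match in the lookup table, it can be worth checking for unmatched codes before relying on the joined data. The following is a minimal sketch with toy data rather than the real download; `toy_data`, `toy_lookup` and the code `"XX"` are made up for illustration.

```r
library(dplyr)

# Toy stand-ins for the downloaded data and the lookup table (hypothetical values)
toy_data <- tibble(geo_pict = c("FJ", "TO", "XX"),
                   obs_value = c(900, 100, 5))
toy_lookup <- tibble(geo_pict = c("FJ", "TO"),
                     pict = c("Fiji", "Tonga"))

# Codes present in the data but missing from the lookup; these rows
# would disappear in an inner_join()
unmatched <- toy_data |>
  anti_join(toy_lookup, by = "geo_pict") |>
  distinct(geo_pict)

unmatched
```

If a check like this returns no rows, every code in the data has a name in the lookup; any rows it does return can either be added to the lookup or left out deliberately.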

Line plot

This puts us in a position to simply draw our first plot:

It’s very intuitive and I think it’s a necessary introduction to all the countries and areas we’re talking about. When we first created a version of this plot, I thought it would never be neat enough to use in a presentation, but in fact it works fine on a large conference screen, as long as we (as I have done) exclude the various regional and sub-regional totals.

All the hard work to produce this plot was done previously in data management, so producing the plot is just a single piece of code:

#----------------------time series line plot-------------

# This version just has 21 individual PICTs, no subregional totals. 21 fits
# ok on the screen in 3 rows of 7:
d2 |> 
  # remove subregional and regional totals, so only actual countries
  filter(!(pict %in% c("Micronesia", "Polynesia", "Melanesia") | 
             grepl("total", pict, ignore.case = TRUE) | 
             pict %in% c("All PICTs", "Pitcairn"))) |> 
  ggplot(aes(x = time_period, y = obs_value, colour = era)) +
  facet_wrap(~pict, scales = "free_y", ncol = 7) +
  geom_line() +
  theme(legend.position = "none",
        panel.grid.minor = element_blank(),
        strip.text = element_text(face="plain"),
        plot.caption = element_text(colour = "grey50")) +
  scale_y_continuous(label = comma) +
  scale_colour_manual(values = spc_cols(c(4, 2))) +
  # force all y axes to go to zero (but because free_y in the facet_wrap call, 
  # they will be on different scales for readability):
  expand_limits(y = 0) +
  labs(x = "", y = "",
       title = "Population in the Pacific, 1950 to 2050",
       subtitle = "Countries listed in sequence of projected population in 2050",
       caption = the_caption)

Scatter plot

The line chart is a nice introduction to population and, most importantly, it is easy to understand. But unless people look carefully at the labels on the vertical axis, it gives no idea of the absolute size of the different countries, and only a very rough visual picture of the different growth rates.

When I was looking for a single image that would summarize both of these things, I came up with this diagram:

This is something we had prepared well before this talk and hadn’t yet had occasion to use, but it was made for exactly this kind of use case: a single-slide summary of the absolute size and growth rates of the Pacific Island countries and territories.

It takes some explanation and concentration from an audience, especially to explain why the negative growth area is shaded and what that means. The logarithmic scale for population size means that people are unlikely to realize how overwhelmingly large Papua New Guinea is compared to the rest of the Pacific; to show that properly, we really need another graph. But overall this is simple enough for people to understand.

What I like about this plot is that it makes clear the two broad categories of Pacific Island countries and territories in terms of population: relatively large (i.e. >100,000 people!) and growing, which includes all of Melanesia and a few others; and small and shrinking, encompassing most of Polynesia and parts of Micronesia. Tonga, with an estimated population of about 104,000, is the borderline case: all countries larger than Tonga are growing in population, and almost all of the smaller ones are shrinking.
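
The growth rate on the vertical axis is a compound annual growth rate, computed from the 2020 and 2025 populations. As a minimal sketch of that calculation, here it is wrapped in a small helper; the function name `cagr()` and the populations below are hypothetical, chosen only to show one growing and one shrinking case.

```r
# Compound annual growth rate between two population estimates
# that are `years` apart
cagr <- function(pop_start, pop_end, years) {
  (pop_end / pop_start) ^ (1 / years) - 1
}

# Hypothetical populations, not real estimates
growing   <- cagr(pop_start = 100000, pop_end = 110000, years = 5)
shrinking <- cagr(pop_start = 2000, pop_end = 1900, years = 5)

round(c(growing = growing, shrinking = shrinking), 4)
```

A 10% rise over five years works out to a little under 2% a year, and a 5% fall to roughly minus 1% a year, which is why the shaded negative-growth region of the plot sits so close to the zero line.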

There are two countries and territories I left out of this plot because the UN 2024 population projections for them, the data used here, are materially out of date and I didn’t want to get sidetracked into explaining why. Hopefully we can include them in future versions of the chart soon.

Again, it was quite easy to create the plot with the data we already have. Here’s the R code to do that:

#----------------scatter plot comparing growth to totals---------------
# Summary data as one row per country for use in scatter plot
d3 <- d2 |> 
  group_by(pict, region) |> 
  summarise(pop2025 = obs_value[time_period == 2025],
            pop2020 = obs_value[time_period == 2020]) |> 
  mutate(cagr = (pop2025 / pop2020) ^ (1/5) - 1) |> 
  mutate(point_type = if_else(pict %in% c("Micronesia", "Polynesia", "Melanesia") | region == "Total", "total_like", "country"),
         # font type has to use identity scale, no scale to map it        
         font_type = ifelse(point_type == "total_like", 4, 1),
         # couldn't get Melanesia in the right spot with ggrepel so have to make a specific adjustment for it:
         adjusted_x = ifelse(pict == "Melanesia", pop2025 * 1.35, pop2025))

# For a presentation used at HOPS7, I want
# 1) scatter plot but without the region and subregions, to avoid clutter
# 2) as 1 but with the shared sovereign countries highlighted eg with a circle around them.
#
# I also excluded two countries that had conspicuously out-of-date data that I didn't
# want visually prominent.

d4 <- d3 |> 
  filter(point_type == "country") |> 
  # two countries/territories have materially wrong estimates that
  # are distracting, better to just drop them from the chart
  filter(!pict %in% c("Tokelau", "Micronesia, Fed. States"))

p2b <- d4 |> 
  ggplot(aes(x = pop2025, y = cagr, colour = region)) +
  # Draw a pale (transparent, alpha) background rectangle for the negative growth countries:
  annotate("rect", xmin = 30, xmax = Inf, ymin = 0, ymax = -Inf, alpha = 0.1, fill = "red") +
  # Largish points for each country:
  geom_point(size = 2.5) +
  # labels for each country:
  # geom_label_repel(aes(label = pict), seed = 7, family = the_font, size = 2.7, label.size = 0, fill = "transparent") +
  geom_text_repel(aes(x = adjusted_x, label = pict, fontface = font_type), seed = 6, family = the_font, size = 2.7) +
  # For the smaller countries, use actual populations as the points for markers on the axis.
  # For larger than 10,000, there are too many countries and it would be cluttered, so use 3, 10, 30, 100, etc.
  scale_x_log10(label = comma, 
                breaks = signif(c(sort(unique(d3$pop2025))[c(1:4, 8, 9, 12, 23:25)], 3e5), 3)) +
  scale_y_continuous(label = percent) +
  # Use SPC colours for the four subregion types:
  scale_colour_manual(values = c("Micronesia" = spc_cols(1), "Polynesia" = spc_cols(3), "Melanesia" = spc_cols(4), "Total" = "grey50")) +
  # Readable x axis tick marks (at an angle); and not too many vertical gridlines:
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        panel.grid.minor = element_blank(),
        plot.caption = element_text(colour = "grey50")) +
  # labels for the axes, plot title, legend:
  labs(x = "Population in 2025 (logarithmic scale)",
       y = "Compound annual population growth rate 2020 to 2025",
       colour = "",
       title = "Current population and recent growth in the Pacific",
       subtitle = "Populations of the Pacific Island country and territory members of the Pacific Community (SPC). 
",
       caption = the_caption)

There are a few tricks being used here, the most important of which is probably the way I’ve used the exact population sizes as labels on the horizontal axis. This is something that works well with a small number of points, and that I learned from a Tufte book.
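
The axis trick can be seen in isolation in the sketch below. It assumes a toy data frame (`toy`, with invented populations and growth rates) and simply uses every observed value, rounded to three significant figures, as an axis break; the real plot picks only a subset of the values to avoid clutter.

```r
library(ggplot2)
library(scales)

# Hypothetical populations and growth rates, for illustration only
toy <- data.frame(pop = c(1234, 45678, 910111),
                  growth = c(-0.01, 0.005, 0.02))

# Round each observed value to 3 significant figures and use them as breaks
x_breaks <- signif(sort(unique(toy$pop)), 3)

p <- ggplot(toy, aes(x = pop, y = growth)) +
  geom_point() +
  scale_x_log10(labels = comma, breaks = x_breaks) +
  scale_y_continuous(labels = percent)

p
```

Each tick on the horizontal axis then sits directly under a point and reads as that point's (rounded) population, so the axis doubles as a small data table.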

Scatter plot with highlights

Finally, for today I wanted a version of the same plot highlighting the countries and territories whose people can easily move to a bigger and richer country – i.e. France (three territories), the US (three territories and three self-governing countries), New Zealand (three members of the ‘Realm of New Zealand’) or the United Kingdom (Pitcairn). One of the themes of my talk was that where people living in these countries can move, a good number of them generally do. This is a very politically and culturally sensitive point, and I am not going to attempt to explore the reasons for it here, but we can certainly note it as a fact of central importance for understanding the demographic dynamics of the Pacific. It is one of two or three crucial points that explain many of the differences between, for example, Kiribati (very densely populated on Tarawa and relatively poor) and the Marshall Islands (less obviously excessive population density, higher living standards).

My plot with the highlights (which are just extra-large point geoms with shape number 1, a hollow circle) shows this nicely, I believe:

And here’s the code for that plot:

easy_mobility <- c("Pitcairn", 
                   "Niue", "Tokelau", "Cook Islands", 
                   "Wallis and Futuna", "New Caledonia", "French Polynesia",
                   "Guam", "Northern Mariana Islands", "American Samoa",
                   "Marshall Islands", "Palau", "Micronesia, Fed. States")

# check all are in data apart from the two we deliberately dropped
stopifnot(sum(!easy_mobility %in% d4$pict) == 2)

d4 |> 
  ggplot(aes(x = pop2025, y = cagr, colour = region)) +
  # Draw a pale (transparent, alpha) background rectangle for the negative growth countries:
  annotate("rect", xmin = 30, xmax = Inf, ymin = 0, ymax = -Inf, alpha = 0.1, fill = "red") +
  # Largish points for each country:
  geom_point(size = 2.5, alpha = 0.5) +
  # labels for each country:
  # geom_label_repel(aes(label = pict), seed = 7, family = the_font, size = 2.7, label.size = 0, fill = "transparent") +
  geom_text_repel(aes(x = adjusted_x, label = pict, fontface = font_type), seed = 6, family = the_font, size = 2.7) +
  # Large hollow circles (shape 1) around the countries with easy migration access:
  geom_point(data = filter(d4, pict %in% easy_mobility), size = 6, shape = 1, colour = "black") +
  # For the smaller countries, use actual populations as the points for markers on the axis.
  # For larger than 10,000, there are too many countries and it would be cluttered, so use 3, 10, 30, 100, etc.
  scale_x_log10(label = comma, 
                breaks = signif(c(sort(unique(d3$pop2025))[c(1:4, 8, 9, 12, 23:25)], 3e5), 3)) +
  scale_y_continuous(label = percent) +
  # Use SPC colours for the four subregion types:
  scale_colour_manual(values = c("Micronesia" = spc_cols(1), "Polynesia" = spc_cols(3), "Melanesia" = spc_cols(4), "Total" = "grey50")) +
  theme_minimal(base_family = the_font) +
  # Readable x axis tick marks (at an angle); and not too many vertical gridlines:
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        panel.grid.minor = element_blank(),
        plot.caption = element_text(colour = "grey50")) +
  # labels for the axes, plot title, legend:
  labs(x = "Population in 2025 (logarithmic scale)",
       y = "Compound annual population growth rate 2020 to 2025",
       colour = "",
       title = "Current population and recent growth in the Pacific",
       subtitle = "Populations of the Pacific Island country and territory members of the Pacific Community (SPC). 
Countries and territories with easy migration access to a larger country are highlighted.",
       caption = the_caption)

That’s all for today. In subsequent posts I will show how I drew the other graphs in the original presentation, showing net migration, size of the diaspora, populations of Pacific Islanders in different global cities, and remittances.
