Some Papua New Guinea data doodles by @ellis2013nz | R-bloggers

Last week I wrote a review of Struggle, reform, boom and bust: an economic history of Papua New Guinea since independence by Stephen Howes and others. The review was published on the Devpolicy Blog of the Australian National University's Development Policy Centre, in the Crawford School of Public Policy. My post today teases out a few data-related issues that I thought of and explored while reading the book and writing that review.

What is ‘real’?

First, consider this graph of real gross domestic product (GDP) per person, which differs a bit from the one I included in my review:

It is closer to a graph used by PNG's treasurer Ian Ling-Stuckey when he launched the book at the University of PNG on August 20. The treasurer chose to use the consumer price index (CPI) instead of the GDP deflator to make prices comparable over time. His reasoning was to emphasize changes in the standard of living of ordinary Papua New Guineans. For the same reason, he also dropped the total GDP series to concentrate on non-resource GDP. These are both reasonable choices. Their combined impact is to make very visible the decline in the average standard of living over the independence period, from around 11,000 kina per person (in 2024 prices) 50 years ago down to around 8,500 today, during the "slow bust" period identified in the book.

That's right: the economic well-being of the average Papua New Guinean, measured in terms of what they can buy with their 'share' of the country's GDP, is today more than 20% lower than it was at independence 50 years ago.
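As a quick check on the "more than 20% lower" claim, the rounded figures quoted above can be plugged into a one-line calculation. This is only a sketch using the approximate values read from the text (11,000 and 8,500 kina), not the exact series values from the database.

```r
# Approximate figures quoted in the text (kina per person, 2024 prices, CPI-deflated)
gdp_pp_independence <- 11000
gdp_pp_today        <- 8500

# Proportional change relative to the starting value, as a percentage
pct_change <- (gdp_pp_today / gdp_pp_independence - 1) * 100
round(pct_change, 1)
# -22.7
```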

Here is the graph that I actually included in the book review, which is essentially the same as Figure 1.2 in the book apart from a few small aesthetic changes. It uses the GDP deflator, and it is noticeable that there has been less inflation by this measure (so GDP per person appears to have increased rather than decreased considerably).

The choice of CPI versus GDP deflator is all about the basket of goods used to form the price index: what the average household consumes, or what is produced in the country. In an earlier work from 2022, Howes explicitly proposed using the CPI as a deflator when the interest is in how much consumers can buy with their notional share of GDP, and the GDP deflator when comparing the value of what is produced in the country. For a general discussion of economic well-being in Papua New Guinea, which is what the treasurer wanted in August, I agree with the use of CPI. In my review I went with the GDP deflator instead, simply because that meant I was effectively replicating a graph from the book; I didn't want to go too far into my own analysis.
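Whichever index is chosen, the mechanical step is the same: divide the current-price series by the index and multiply by the index value in the reference year. The sketch below uses base R and entirely made-up toy numbers (the data frame `toy` and the helper `to_constant_prices()` are hypothetical, not part of the database or the book) to show how the same nominal series can appear to fall under one deflator and rise under another.

```r
# Hypothetical toy data: NOT taken from the PNG Economic Database
toy <- data.frame(
  year     = c(2022, 2023, 2024),
  nominal  = c(100, 110, 121),  # values in current prices
  cpi      = c(90, 100, 110),   # consumer price index (rises faster here)
  gdp_defl = c(95, 100, 105)    # GDP deflator (rises more slowly here)
)

# Re-express a current-price series in the prices of `base_year`
to_constant_prices <- function(nominal, index, year, base_year) {
  nominal / index * index[year == base_year]
}

toy$real_cpi  <- with(toy, to_constant_prices(nominal, cpi, year, 2024))
toy$real_defl <- with(toy, to_constant_prices(nominal, gdp_defl, year, 2024))

# real_cpi falls slightly over the period; real_defl rises
toy
```

Because the toy CPI grows faster than the toy GDP deflator, the CPI-deflated series declines while the deflator-based series increases, which is the same qualitative contrast seen between the two charts in this post.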

The issue of total GDP versus non-resource GDP is interesting and is discussed extensively in the book. I chose to show both because the divergence, especially during the latest "bust" period, is important in itself. Economic activity that is indirectly caused by resources is still included in non-resource GDP (for example, government activities financed by taxes on the resource sector, and value added in domestic industries from which resource-industry employees buy goods and services). Howes et al would have used gross national income or a variant of it if they could, but it is not available.

The other points of discussion about this graph are just a few finer points of graph polishing: my decision to use grey background rectangles rather than vertical lines (which in my opinion are messier) to distinguish the four phases; where to place the label annotation for each phase; and the use of coloured direct labels on the two time series instead of a more conventional legend (which demands more effort from the reader).

One of the great things about this book is that all the economic time series data behind it have been published and are hosted by the ANU as the PNG Economic Database.

Data availability is a major problem for Papua New Guinea. First and foremost, there is the question of whether the data exist at all; I have already noted that there is no gross national income measure, and other serious gaps include a household income and expenditure survey that can be used to measure poverty and food security, and a labour force survey for understanding employment. There are efforts to improve all of this. But even when data exist, they can be difficult to find, even more so for historical data. The ANU's PNG Economic Database is a brilliant public response to this aspect of the problem, making the economic time series available in one place.

Note that the 'About' information for this database is a bit outdated; the data are now more up to date than it claims (for example, population goes up to 2024, but the 'About' page only claims that it goes to 2021).
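One way to check how current each series really is, rather than relying on the 'About' page, is to look for the latest year with a non-missing value for each variable. The sketch below uses base R and a tiny made-up stand-in (`toy_db`, hypothetical values) with the same `Variable`, `Year` and `Amount` columns used elsewhere in this post; once the real CSV has been read in, the same two steps could be applied to it instead.

```r
# Hypothetical stand-in for the database: same column names, made-up values
toy_db <- data.frame(
  Variable = c("Population", "Population", "CPI deflator", "CPI deflator"),
  Year     = c(2023, 2024, 2021, 2022),
  Amount   = c(1, 2, NA, 3),
  stringsAsFactors = FALSE
)

# Keep only rows with an observed value, then find the latest year per variable
observed <- toy_db[!is.na(toy_db$Amount), ]
latest   <- aggregate(Year ~ Variable, data = observed, FUN = max)
latest
```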

Anyway, the existence of this database, which is accessible as an interactive Tableau tool or in bulk as a CSV, is what makes it possible for us to re-create graphs like the ones above, using the same data as the book. Here is the R code for drawing the first of the two GDP graphs above. The code for the second graph is omitted here, but can be found on GitHub.

library(tidyverse)
library(spcstyle)
library(scales)
library(ggtext)
library(RColorBrewer)
library(rsdmx)

#---------------download data, set up palette---------------

# Read in the ANU's PNG economic database. Download from 
# https://pngeconomic.devpolicy.org/
pnged <- read_csv("PNG economic database.csv")

# era_cols <- brewer.pal(6, "Set1")[1:4]
era_cols <- c("grey10", "white", "grey10", "white")
gdp_cols <- brewer.pal(7, "Set1")[c(5,7)]


#-------------------------CPI so we see prices facing consumers---------

 # ratio of CPI to non-resource GDP deflator
 cpi_def <- pnged |> 
   filter(Variable %in% c("Non-resource GDP deflator", "CPI deflator")) |> 
   select(Variable, Year, Amount) |> 
   spread(Variable, Amount) |> 
   # rebase to a set year:
   mutate(across(`CPI deflator`:`Non-resource GDP deflator`, 
                 function(x){x / x[Year == 1990]})) |> 
   mutate(ratio = `CPI deflator` / `Non-resource GDP deflator`) 
 
 # from 1990 to 2022, CPI has increased about 40% more than the GDP deflator
 # so if you want to see the living standards of PNGans, there is a case to use
 # the CPI instead
 
 # draw plot:
pnged |> 
   filter(Variable %in% c("Non-resource GDP (current prices, new series)", 
                          "GDP (current prices, new series)", "Population")) |> 
   select(Variable, Year, Amount) |> 
   spread(Variable, Amount) |> 
   mutate(nr_gdp_pp = `Non-resource GDP (current prices, new series)` / Population * 1e6,
          gdp_pp = `GDP (current prices, new series)` / Population * 1e6 ) |> 
   select(Year, nr_gdp_pp, gdp_pp) |> 
   gather(variable, value, -Year) |> 
   drop_na() |>
   left_join(cpi_def, by = "Year") |> 
   mutate(value = value  / `CPI deflator` * filter(cpi_def, Year == 2024)$`CPI deflator`) |> 
   ggplot(aes(x = Year, y = value, colour = variable)) +
   annotate("rect", xmin = 1975, xmax = 1988.5, ymin = -Inf, ymax = Inf, fill = era_cols[1], alpha = 0.1) +
   annotate("rect", xmin = 1988.5, xmax = 2003.5, ymin = -Inf, ymax = Inf, fill = era_cols[2], alpha = 0.1) +
   annotate("rect", xmin = 2003.5, xmax = 2013.5, ymin = -Inf, ymax = Inf, fill = era_cols[3], alpha = 0.1) +
   annotate("rect", xmin = 2013.5, xmax = 2022.5, ymin = -Inf, ymax = Inf, fill = era_cols[4], alpha = 0.1) +
   geom_line(linewidth = 2) +
  # option, can uncomment this and you get a point showing each observation. 
  # It is helpful to see the actual point, but adds clutter.
  #   geom_point(colour = "white") +
   annotate("text", label = c("'Struggle'", "'Reform'", "'Boom'", "'Bust'"), y = 14100, 
            x = c(1981.5, 1996, 2009, 2018), hjust = 0.5, fontface = 4, alpha = 0.8) +
   annotate("text", colour = gdp_cols, x = 2020, y = c(10200, 7600), 
            label = c("All GDP", "Non-resources GDP")) +
   scale_colour_manual(values = gdp_cols) +
   scale_y_continuous(label = comma, breaks = 6:14 * 1000) +
   labs(y = "Kina (2024 prices, based on CPI deflator)",
        x = "",
        title = "Real gross domestic product per person in Papua New Guinea",
        subtitle = "Annotated with the periods used in Struggle, reform, boom and bust: an economic history of Papua New Guinea since independence",
        caption = "Source: ANU's PNG Economic Database, https://pngeconomic.devpolicy.org/")  +
   theme(legend.position ="none",
         plot.subtitle = element_markdown())

Population

The most fundamental national statistic is always population, and unfortunately for PNG there is more than the usual uncertainty about how many people live in the country. A census in PNG, with its geographical, linguistic, cultural, political and security challenges, is one of the more difficult exercises in official statistics collection anywhere in the world. These are the most important facts with regard to population estimates:

  • Birth and death registration has insufficient coverage to estimate death rates. Instead, these must be estimated from survey or census questions such as "Woman X in this household, has she given birth in the last 12 months; and if so, is the child still alive?", which can be linked to model life tables.
  • The 2024 census, delayed from 2021 because of COVID and reduced to a minimalist six questions per household (which do not include any questions like the example above, but do at least include sex and age), is due to report soon.
  • The 2011 census (the report is available online) has been criticized, and its population estimates are considered by many (for example in the Struggle, reform, boom and bust book) to be unsuitable for use.
  • The 2000 census is often referred to as "the last credible population estimate" or in similar terms. The population estimates in the ANU PNG Economic Database take the 2000 census population as a reference point and assume a steady growth rate from then on.
  • An exercise by the National Statistics Office and WorldPop, published in 2023, estimated the population for 2021 (as of early September 2025 the results were on the official PNG statistics website). A statistical learning model was trained on satellite imagery, with a malaria survey providing the "ground truth" population. The result (11.8 million in 2021) was high by existing standards, but not outside the plausible range.
  • Various modelled estimates exist, drawing on some or all of the above (plus earlier censuses) in different ways.
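To make the "reference point plus steady growth" idea concrete, here is a minimal base R sketch. The base population is the 2000 census figure (citizens plus non-citizens) that appears in the code later in this post; the growth rate `assumed_rate` and the helper `project_pop()` are hypothetical placeholders for illustration, not the rate or method actually used in the ANU database.

```r
base_year    <- 2000
base_pop     <- 5171548 + 19235  # 2000 census, citizens plus non-citizens
assumed_rate <- 0.03             # HYPOTHETICAL annual growth rate, for illustration only

# Project a population forward (or backward) from a base year at a constant rate
project_pop <- function(year, base_pop, base_year, rate) {
  base_pop * (1 + rate) ^ (year - base_year)
}

years <- c(2000, 2011, 2021)
projection <- data.frame(
  year      = years,
  projected = round(project_pop(years, base_pop, base_year, assumed_rate))
)
projection
```

Small changes in the assumed rate compound into large differences after two decades, which is one reason the different modelled series diverge so much by the 2020s.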

The state of play with regard to PNG population estimates is shown in the following graph:

At the time of writing (early September 2025), the census and WorldPop point estimates are the official statistics on the website of the PNG National Statistics Office. The UN population projections, which are re-disseminated through the Pacific Community's Pacific Data Hub, and the population estimates in the ANU PNG Economic Database are shown as continuous lines.

The growth between 2011 and 2021 implied by accepting both the 2011 census and the 2021 WorldPop estimate is incredibly fast (4.9% per year). However, there is no way to determine whether the 2011 figure is an undercount, the 2021 figure an overestimate, or both.
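The 4.9% figure is a compound annual growth rate between the two point estimates. A minimal sketch of the calculation, using the 2011 census total (citizens plus non-citizens) and the 2021 WorldPop-based estimate as they appear in the chart code in this post:

```r
pop_2011 <- 7254442 + 20882  # 2011 census, citizens plus non-citizens
pop_2021 <- 11781779         # 2021 WorldPop-based estimate

# Compound annual growth rate over the ten years between the two estimates
annual_growth <- (pop_2021 / pop_2011) ^ (1 / (2021 - 2011)) - 1
round(annual_growth * 100, 1)
# 4.9
```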

The ANU estimates, which ultimately go back to modelling efforts by Bourke et al in 2021, are probably a bit low, a point made by the treasurer when he launched the book. But again, they are not outside the plausible range.

Of course, all this uncertainty flows through to other statistics: the denominator for GDP per capita, registration rates, and so on; and the construction of sampling frames and survey weights for population surveys.

To improve this, a lot depends on obtaining reliable census data.

Here is the code for drawing that population estimates chart:

#--------------population, comparison of data sources--------------
pop_anu <- pnged |> 
  filter(Variable %in% c("Population")) |> 
  select(Year, population = Amount)

pop_pdh <- readSDMX("https://stats-sdmx-disseminate.pacificdata.org/rest/data/SPC,DF_POP_PROJ,3.0/A.PG.MIDYEARPOPEST._T._T?startPeriod=1975&endPeriod=2025&dimensionAtObservation=AllDimensions") |> 
  as_tibble() |> 
  select(year = TIME_PERIOD,
         `UN method` = obsValue) |> 
  mutate(year = as.numeric(year))

# sources:
# https://png-data.sprep.org/system/files/2011%20Census%20National%20Report.pdf
# https://www.nso.gov.pg/statistics/population/ (for WorldPop, accessed 6/9/2025)
specifics <- tribble(~year, ~variable, ~value,
                      2021, "WorldPop method", 11781779,
                      2011, "Census method", 7254442 + 20882, # including both citizens and non-citizens
                      2000, "Census method", 5171548 + 19235,
                      1990, "Census method", 3582333 + 25621,
                      1980, "Census method", 2978057 + 32670) |> 
  # make WorldPop appear first in the legend, better visually:
  mutate(variable = fct_relevel(variable, "WorldPop method"))

# Draw plot
pop_anu |> 
  select(year = Year, `ANU method` = population) |> 
  full_join(pop_pdh, by = "year") |> 
  gather(variable, value, -year) |> 
  # make UN appear first in legend, better visually:
  mutate(variable = fct_relevel(variable, "UN method")) |> 
  ggplot(aes(x = year, y = value, colour = variable)) +
  geom_line(data = filter(specifics, grepl("Census", variable)), colour = "grey50", linetype = 2) +
  geom_line() +
  geom_point(data = specifics, aes(colour = NULL, shape = variable), size = 3) +
  scale_shape_manual(values = c("Census method" = 19, "WorldPop method" = 15)) +
  scale_y_continuous(label = comma) +
  labs(shape = "Single-year", colour = "Multi-year",
       x = "", y = "",
      title = "Different estimates of Papua New Guinea's population",
    subtitle = "Independence to 2025",
  caption = "Source: PNG National Statistics Office (for WorldPop); 2011 National Census Report; ANU PNG economic database; Pacific Data Hub.stat")

Employment

I mentioned in my book review that formal employment is less than 5% of the total population. A more usual measure would be the proportion of the working-age population, but I would have had to get that denominator from elsewhere and did not have time. Here is the graph that I drew for myself to check whether this throwaway comment was justified:

What is especially interesting to me is the very low and declining proportion of the population in formal employment. However, it is also interesting to notice the data gaps for public sector employment, and the correlation of changes in total employment with the "boom" and "bust" periods that frame the book.

That graph was drawn with this code.

pnged |> 
  filter(Variable %in% c("Total (excluding public service) employment",
                         "Public service employment")) |> 
  left_join(pop_anu, by = "Year") |>
  mutate(Amount = Amount / population) |>
  mutate(Variable = fct_reorder(str_wrap(Variable, 30), Amount, .desc = TRUE)) |> 
  ggplot(aes(x = Year, y = Amount, colour = Variable)) +
  annotate("rect", xmin = 1975, xmax = 1988.5, ymin = -Inf, ymax = Inf, fill = era_cols[1], alpha = 0.1) +
  annotate("rect", xmin = 1988.5, xmax = 2003.5, ymin = -Inf, ymax = Inf, fill = era_cols[2], alpha = 0.1) +
  annotate("rect", xmin = 2003.5, xmax = 2013.5, ymin = -Inf, ymax = Inf, fill = era_cols[3], alpha = 0.1) +
  annotate("rect", xmin = 2013.5, xmax = 2022.5, ymin = -Inf, ymax = Inf, fill = era_cols[4], alpha = 0.1) +
  geom_line() +
  annotate("text", label = c("'Struggle'", "'Reform'", "'Boom'", "'Bust'"), y = 0.0585, 
            x = c(1981.5, 1996, 2009, 2018), hjust = 0.5, fontface = 4, alpha = 0.8) +
  scale_y_continuous(label = percent, limits = c(0, 0.06)) +
  labs(x = "", y = "Proportion of population",
        title = "Formal employment in Papua New Guinea",
      subtitle = "As a proportion of the population (including children and elderly)")

Vaccination

Finally, I noticed in the ANU PNG Economic Database some data on vaccination, which are referred to in the book but do not yet have a source listed in the database documentation. There are too many observations for these to be survey data, so they must be some kind of health administrative data. I would treat them with great caution. But the point made in the book is undoubtedly sound: these vaccination rates are low by world standards and are not heading in the right direction:

The code to produce that graph follows a similar pattern to all the code so far.

pnged |> 
  filter(grepl("Immunization", Variable)) |> 
  ggplot(aes(x = Year, y = Amount, colour = Variable)) +
  annotate("rect", xmin = 1975, xmax = 1988.5, ymin = -Inf, ymax = Inf, fill = era_cols[1], alpha = 0.1) +
  annotate("rect", xmin = 1988.5, xmax = 2003.5, ymin = -Inf, ymax = Inf, fill = era_cols[2], alpha = 0.1) +
  annotate("rect", xmin = 2003.5, xmax = 2013.5, ymin = -Inf, ymax = Inf, fill = era_cols[3], alpha = 0.1) +
  annotate("rect", xmin = 2013.5, xmax = 2022.5, ymin = -Inf, ymax = Inf, fill = era_cols[4], alpha = 0.1) +
  geom_line() +
  annotate("text", label = c("'Struggle'", "'Reform'", "'Boom'", "'Bust'"), y = 85, 
            x = c(1981.5, 1996, 2009, 2018), hjust = 0.5, fontface = 4, alpha = 0.8) +
  scale_y_continuous(label = percent_format(scale = 1)) +
  labs(x = "", y = "", colour = "",
       title = "Immunization rates in Papua New Guinea",
      subtitle = "Proportion of children 12-23 months for measles and DPT; one-year old children for HepB3. Treat data with caution.") 
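The four shaded era rectangles are repeated verbatim in each chart in this post. One possible way to avoid the repetition, sketched here as a suggestion rather than as the approach actually used, is to keep the era boundaries in a small data frame and build the layers with a helper. The names `eras` and `era_rects()` are hypothetical; the boundaries and fill colours are copied from the `annotate()` calls above, and the demonstration plot uses placeholder data.

```r
library(ggplot2)

# Era boundaries and fills copied from the annotate() calls used in this post
eras <- data.frame(
  era  = c("Struggle", "Reform", "Boom", "Bust"),
  xmin = c(1975, 1988.5, 2003.5, 2013.5),
  xmax = c(1988.5, 2003.5, 2013.5, 2022.5),
  fill = c("grey10", "white", "grey10", "white"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns a list of layers that can be added to a plot with `+`
era_rects <- function(eras, alpha = 0.1) {
  lapply(seq_len(nrow(eras)), function(i) {
    annotate("rect", xmin = eras$xmin[i], xmax = eras$xmax[i],
             ymin = -Inf, ymax = Inf, fill = eras$fill[i], alpha = alpha)
  })
}

# Placeholder data purely to demonstrate that the layers attach to a plot
demo_plot <- ggplot(data.frame(Year = 1975:2022, Amount = 0), aes(x = Year, y = Amount)) +
  era_rects(eras) +
  geom_line()
```

With this in place, each chart would need only a single `era_rects(eras)` line in place of four `annotate()` calls, and a change to an era boundary would be made in one place.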

Well, that's all for today. I just thought I would put some of these things in a blog post while they were on my mind. At some point I will definitely return to PNG topics; and of course this whole area is a considerable part of my day job.

