ANES 2024 is out! How to analyze the data with R | R-bloggers


[This article was first published on R Works, and kindly contributed to R-bloggers.]



Last fall, my co-authors, Stephanie Zimmer and Rebecca Powell, and I released Exploring Complex Survey Data Analysis Using R. Earlier in August, we had the joy of leading a workshop based on the book at useR! 2025.

One of our favorite examples in both the book and the workshop comes from the American National Election Studies (ANES). ANES is a long-running project from Stanford University and the University of Michigan, supported by the National Science Foundation. Since 1948, ANES has conducted surveys almost every two years, making it one of the richest resources for studying public opinion and voting behavior in American elections. The data cover everything from party affiliation to trust in government, and they are carefully designed to represent the voting population.

Only a few days after returning from useR!, I was thrilled to see that the brand-new ANES 2024 dataset had been released!

The full release of the ANES 2024 Time Series #data is now available. More details here: electionstudies.org/anes-announc…


– American National Election Studies (@electionstudies.bsky.social) August 12, 2025 at 12:24 pm

Survey data such as ANES are not as simple as a simple random sample. Some groups are deliberately oversampled, others are harder to reach, and not every respondent has the same chance of being selected. To correct for this, surveys provide "weights": numbers that tell us how much each response should "count" when estimating results for the entire population.

If we ignore these weights and simply calculate raw percentages, we risk drawing the wrong conclusions. That's where the {survey} and {srvyr} packages come in. They handle weights, strata, and clusters behind the scenes, so that our estimates properly reflect the survey design.
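To make the risk concrete, here is a minimal sketch on simulated data. Nothing in it comes from ANES. In a hypothetical population, a small group (B) is oversampled, so B makes up half the sample despite being only 10% of the population. The names pop, samp, and wt are made up for illustration. The unweighted share is pulled toward the oversampled group, while the weighted {srvyr} estimate lands close to the true population value.

```r
library(dplyr)
library(srvyr)

set.seed(2024)

# Hypothetical population: group "B" is 10% of people and more likely to say yes
pop <- tibble(
  group = rep(c("A", "B"), times = c(9000, 1000)),
  yes   = c(rbinom(9000, 1, 0.30), rbinom(1000, 1, 0.70))
)

# Oversample group B: 500 people from each group, so B is half the sample
samp <- pop |>
  group_by(group) |>
  slice_sample(n = 500) |>
  ungroup() |>
  # Weight = group population size / group sample size
  mutate(wt = if_else(group == "A", 9000 / 500, 1000 / 500))

# Unweighted share of "yes": pulled toward the oversampled group
unweighted <- mean(samp$yes)

# Design-based share of "yes" using the weights (groups treated as strata)
weighted <- samp |>
  as_survey_design(strata = group, weights = wt) |>
  summarize(p = survey_mean(yes)) |>
  pull(p)

c(truth = mean(pop$yes), unweighted = unweighted, weighted = weighted)
```

Here each weight is simply the number of population members a sampled person stands for. Real ANES weights are more involved, but they play the same role.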

So let's walk through how to start analyzing this dataset in R with the {survey} and {srvyr} packages. For more depth, the Getting Started chapter of our book is a handy companion.

Download the data

Go to the Election Studies website and download the CSV version of the 2024 Time Series Study. The download comes as a ZIP file containing both the dataset and the documentation.

Loading packages and data

We'll use the tidyverse for data wrangling, and {survey} plus {srvyr} for analysis:

library(tidyverse)
library(survey)
library(srvyr)

anes_2024 <-
  read_csv("anes_timeseries_2024_csv_20250808.csv")

Review the documentation

Reviewing the survey documentation is a crucial part of survey analysis. The ANES materials include details about the sample, the weighting, and the codebook.

Sample

The 2024 sample was designed to represent U.S. citizens aged 18 or older. Respondents were recruited from both face-to-face and web samples, with careful notes about eligibility and exclusions.

Codebook

The codebook translates variable names into something human-readable.

Here are two variables that we will use:

  • V240002b: Mode of interview: Post-election. This variable captures the mode in which the respondent completed the majority of the post-election survey.
    • -7. Insufficient partial, interview deleted
    • -6. No post-election interview
    • 1. Face-to-face (FTF)
    • 2. Internet (web)
    • 3. Paper (PAPI)
    • 4. Telephone
    • 5. Video
  • V241229: PRE: How often trust government in Washington to do what is right [REVISED]
    • -9. Refused
    • -8. Don't know
    • -1. Not applicable
    • 1. Always
    • 2. Most of the time
    • 3. About half the time
    • 4. Some of the time
    • 5. Never

Let's recode V241229 for easier analysis:

anes_2024_code <-
  anes_2024 |> 
  mutate(
    TrustGovernment = factor(
      case_when(
        V241229 == 1 ~ "Always",
        V241229 == 2 ~ "Most of the time",
        V241229 == 3 ~ "About half the time",
        V241229 == 4 ~ "Some of the time",
        V241229 == 5 ~ "Never",
        .default = NA
      ),
      levels = c("Always", "Most of the time", "About half the time", "Some of the time", "Never")
    )
  )

See more about codebooks.
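Because codes 1 to 5 map one-to-one onto labels, the same recode can also be written with base R's factor(). Any value not listed in levels becomes NA, so the negative missing-data codes (-9, -8, -1) drop out automatically. The sketch below applies both approaches to a small hypothetical vector of codes (trust_codes is a made-up name) and checks that they agree.

```r
library(dplyr)

# Hypothetical V241229-style codes, including the negative missing-data codes
trust_codes <- c(1, 2, 3, 4, 5, -9, -8, -1, 3)

trust_labels <- c("Always", "Most of the time", "About half the time",
                  "Some of the time", "Never")

# factor(): values not in `levels` (here -9, -8, -1) become NA
trust_factor <- factor(trust_codes, levels = 1:5, labels = trust_labels)

# The case_when() approach from the post, applied to the same codes
trust_case_when <- factor(
  case_when(
    trust_codes == 1 ~ "Always",
    trust_codes == 2 ~ "Most of the time",
    trust_codes == 3 ~ "About half the time",
    trust_codes == 4 ~ "Some of the time",
    trust_codes == 5 ~ "Never",
    .default = NA
  ),
  levels = trust_labels
)

identical(trust_factor, trust_case_when)
table(trust_factor, useNA = "ifany")
```

The case_when() version is more explicit and easier to adapt when codes don't map one-to-one. The factor() version is shorter for simple cases like this one.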

Make a design object

The design object is the backbone of survey analysis. Survey data use weights, strata, and clusters to represent the population. That means we need a survey design object before we run any analyses. The design object tells R which weights to use, how respondents were sampled, and what adjustments are needed. Without it, the estimates we calculate can be misleading.

To build the design object, we need three important ingredients:

  1. Weights: ensure that the results reflect the population.
  2. Strata: groups used during sampling to guarantee coverage.
  3. Clusters (PSUs): primary sampling units, such as households or addresses.
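To see how these three ingredients map onto as_survey_design(), here is a minimal sketch on a hypothetical dataset. It has 4 strata, 3 PSUs per stratum, and 10 respondents per PSU. The weights and the yes/no outcome y are made up. Because PSU numbers restart at 1 within each stratum, nest = TRUE is needed. Without it, {survey} reports that clusters are not nested in strata.

```r
library(dplyr)
library(srvyr)

set.seed(42)

# Hypothetical data: 4 strata x 3 PSUs x 10 respondents = 120 rows
toy <- expand.grid(stratum = 1:4, psu = 1:3, person = 1:10) |>
  as_tibble() |>
  mutate(
    weight = runif(n(), 50, 150),  # made-up analysis weights
    y      = rbinom(n(), 1, 0.4)   # made-up yes/no outcome
  )

toy_des <- toy |>
  as_survey_design(
    ids = psu,         # clusters (PSUs)
    strata = stratum,  # strata
    weights = weight,  # analysis weights
    nest = TRUE        # PSU numbers are reused across strata
  )

# Weighted proportion with a design-based 95% confidence interval
res <- toy_des |> summarize(p = survey_mean(y, vartype = "ci"))
res
```

The point estimate is just the weighted mean of y. The strata and clusters change only the width of the confidence interval.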

Find the right weight

The ANES documentation provides an overview of which weight variables apply to which sample.

Because we are looking at the fresh face-to-face sample in the post-election wave, we use the weight variable V240101b.

An important note: ANES weights sum to the sample size, not to the population size. Let's confirm:

anes_2024_code |> filter(V240101b > 0) |> summarize(sum = sum(V240101b))
# A tibble: 1 × 1
    sum
  <dbl>
1   925

This matches the 925 respondents shown in the sample table of the documentation.

Adjusting to the population

If we want to draw conclusions about the entire U.S. population, we have to adjust the weights. The target population for this sample is roughly 232.5 million U.S. citizens aged 18 or older, but that is only an approximation. According to the documentation, we can get slightly more accurate population figures from the 2023 American Community Survey (ACS) estimates.

Thank you, Stephanie, for writing the code for the population estimates!

library(tidycensus)

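# Browse the ACS 1-year variable list (handy for looking up variable codes)
varlist_2023 <- load_variables(2023, "acs1")

# Citizens aged 18+ by state, from ACS subject table S2901
citizen_pop_18_plus <-
  get_acs(
    geography = "state",
    variables = "S2901_C01_001",
    year = 2023,
    survey = "acs1"
  )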

pums_vars_2023 <- pums_variables |>
  as_tibble() |>
  filter(year == 2023, survey == "acs1")

pums_vars_2023 |>
  filter(var_code %in% c("AGEP", "TYPEHUGQ", "CIT")) |>
  select(var_code, var_label, val_min, val_max, val_label) |>
  print(n = 50)

# Count citizens (CIT 1-4) aged 18+ living in group quarters (TYPEHUGQ 2-3), by state
get_citizen_pop_18_plus_gq <- function(state) {
  get_pums(
    variables = c("AGEP", "TYPEHUGQ"),
    state = state,
    year = 2023,
    survey = "acs1",
    variables_filter = list(
      TYPEHUGQ = 2:3 ,
      CIT = 1:4,
      AGEP = (18:200)
    )
  ) |>
    summarize(estimate = sum(PWGTP), .by = STATE)
}

citizen_pop_18_plus_gq_l <- c(state.abb, "DC") |>
  map(get_citizen_pop_18_plus_gq)

citizen_pop_18_plus_gq_df <-
  citizen_pop_18_plus_gq_l |>
  list_rbind() |>
  rename(estimate_gq = estimate)

state_pops <- citizen_pop_18_plus |>
  select(GEOID, NAME, estimate_tot = estimate) |>
  full_join(citizen_pop_18_plus_gq_df, by = c("GEOID" = "STATE")) |>
  filter(NAME != "Puerto Rico") |>
  mutate(estimate_scope = estimate_tot - estimate_gq)

# Total without AK and HI, extracted as a single number
targetpop <-
  state_pops |>
  filter(!GEOID %in% c("02", "15")) |>
  summarize(TargetPop = sum(estimate_scope)) |>
  pull(TargetPop)

# Total with AK and HI
state_pops |>
  summarize(TargetPop = sum(estimate_scope))

The estimated target population (targetpop) is 232,449,541. Now we can rescale the ANES weights so that they sum to the population size:

anes_adjwgt <- anes_2024_code |>
  mutate(Weight = V240101b / sum(V240101b, na.rm = TRUE) * targetpop)
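Why does rescaling matter? The sketch below uses hypothetical weights (toy, w, and y are made-up names, not ANES variables). Like ANES weights, they are scaled to sum to the sample size of 925. The sketch then applies the same rescaling formula as above, using the 232,449,541 target population from this post. Proportions are unchanged by the rescaling. Totals, however, only make sense as population counts after it.

```r
library(dplyr)
library(srvyr)

targetpop <- 232449541  # target population reported in this post

set.seed(1)
# Hypothetical weights scaled to sum to the sample size, as ANES weights do
toy <- tibble(
  w = runif(925, 0.2, 3),
  y = rbinom(925, 1, 0.3)
) |>
  mutate(
    w = w / sum(w) * n(),              # weights now sum to 925
    Weight = w / sum(w) * targetpop    # same rescaling as in the post
  )

sum(toy$w)       # the sample size
sum(toy$Weight)  # the target population

# Proportions do not depend on the scale of the weights...
p_sample <- toy |> as_survey_design(weights = w) |>
  summarize(p = survey_mean(y)) |> pull(p)
p_pop <- toy |> as_survey_design(weights = Weight) |>
  summarize(p = survey_mean(y)) |> pull(p)

# ...but totals do: only the rescaled weights give population counts
total_pop <- toy |> as_survey_design(weights = Weight) |>
  summarize(t = survey_total(y)) |> pull(t)
```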

Strata and clusters

The documentation also tells us which strata and PSU (cluster) variables go with each weight.

For V240101b these are:

  • Strata: V240101d
  • PSU: V240101c

More about specifying sample designs can be found in this chapter.

Putting it all together

Finally, we can build the design object. We also filter to respondents who completed the post-election interview face-to-face (V240002b == 1):

# If a stratum has only one PSU, center its variance contribution at the grand mean
options(survey.lonely.psu = "adjust")

anes_des <- anes_adjwgt |>
  filter(V240002b == 1) |>
  as_survey_design(
    weights = Weight,
    strata = V240101d,
    ids = V240101c,
    nest = TRUE
  )

anes_des

And there we go! anes_des is our fully specified survey design object. From now on, every estimate we calculate from it reflects the design of the survey.
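The options(survey.lonely.psu = "adjust") setting above matters whenever a stratum contains only one PSU, which can easily happen after filtering. The sketch below uses a tiny hypothetical dataset in which stratum 3 has a single PSU. The names toy and estimate_mean are made up for illustration. With the {survey} default, "fail", variance estimation stops with an error. With "adjust", an estimate and standard error come back.

```r
library(dplyr)
library(srvyr)

# Hypothetical data: strata 1 and 2 have two PSUs each, stratum 3 has one
toy <- tibble(
  stratum = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
  psu     = c(1, 1, 2, 2, 1, 1, 2, 2, 1, 1),
  weight  = 10,
  y       = c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1)
)

# Estimate the mean of y under a given lonely-PSU option, then restore options
estimate_mean <- function(lonely_option) {
  old <- options(survey.lonely.psu = lonely_option)
  on.exit(options(old))
  toy |>
    as_survey_design(ids = psu, strata = stratum, weights = weight, nest = TRUE) |>
    summarize(m = survey_mean(y))
}

# Default behavior: an error about a stratum with only one PSU
try(estimate_mean("fail"))

# "adjust": the single-PSU stratum is centered at the overall mean
estimate_mean("adjust")
```

The "adjust" choice is a conservative option for the variance. The {survey} package also offers alternatives such as "remove", "certainty", and "average".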

Analyze the data

Now for the fun part! Let's see how people answered the trust-in-government question:

(
  trustgov <- anes_des |>
    drop_na(TrustGovernment) |>
    group_by(TrustGovernment) |>
    summarize(p = survey_prop(vartype = "ci")) |>
    mutate(Variable = "V241229") |>
    rename(Answer = TrustGovernment) |>
    select(Variable, everything())
)
# A tibble: 5 × 5
  Variable Answer                    p   p_low  p_upp
  <chr>    <fct>                 <dbl>   <dbl>  <dbl>
1 V241229  Always              0.00774 0.00288 0.0206
2 V241229  Most of the time    0.174   0.122   0.242 
3 V241229  About half the time 0.283   0.225   0.351 
4 V241229  Some of the time    0.402   0.341   0.465 
5 V241229  Never               0.133   0.0857  0.201 

And a quick plot:

The results show how varied public trust is. The most common answer is "Some of the time" (about 40%), while fewer than 1% of respondents say they always trust the government to do what is right.

We can also look at the data by subgroup:
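A plot like the one described above could be sketched with {ggplot2}. To keep the example self-contained, the estimates and confidence limits below are copied from the trustgov output printed earlier instead of being recomputed from anes_des. The styling choices (bar color, labels) are arbitrary.

```r
library(tidyverse)

answer_levels <- c("Always", "Most of the time", "About half the time",
                   "Some of the time", "Never")

# Estimates copied from the trustgov output shown in this post
trustgov <- tibble(
  Answer = factor(answer_levels, levels = answer_levels),
  p      = c(0.00774, 0.174, 0.283, 0.402, 0.133),
  p_low  = c(0.00288, 0.122, 0.225, 0.341, 0.0857),
  p_upp  = c(0.0206, 0.242, 0.351, 0.465, 0.201)
)

trust_plot <- ggplot(trustgov, aes(x = Answer, y = p)) +
  geom_col(fill = "steelblue") +
  # Design-based 95% confidence intervals from survey_prop(vartype = "ci")
  geom_errorbar(aes(ymin = p_low, ymax = p_upp), width = 0.2) +
  scale_y_continuous(labels = scales::percent) +
  labs(
    x = NULL,
    y = "Weighted proportion",
    title = "How often can you trust the government in Washington to do what is right?",
    caption = "Source: ANES 2024 Time Series"
  )

trust_plot
```

In a real analysis you would pass the trustgov tibble produced by survey_prop() straight into ggplot().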

anes_des <-
  anes_des |>
  mutate(Gender = factor(
    case_when(V241550 == 1 ~ "Male", V241550 == 2 ~ "Female", .default = NA),
    levels = c("Male", "Female")
  ))

(
  trustgov_gender <- anes_des |>
    drop_na(Gender, TrustGovernment) |>
    group_by(Gender, TrustGovernment) |>
    summarize(p = survey_prop(vartype = "ci")) |>
    mutate(Variable = "V241229") |>
    rename(Answer = TrustGovernment) |>
    select(Variable, everything())
)
# A tibble: 10 × 6
# Groups:   Gender [2]
   Variable Gender Answer                    p    p_low  p_upp
   <chr>    <fct>  <fct>                 <dbl>    <dbl>  <dbl>
 1 V241229  Male   Always              0.0123  0.00379  0.0393
 2 V241229  Male   Most of the time    0.194   0.109    0.322 
 3 V241229  Male   About half the time 0.249   0.178    0.336 
 4 V241229  Male   Some of the time    0.423   0.321    0.533 
 5 V241229  Male   Never               0.121   0.0659   0.212 
 6 V241229  Female Always              0.00336 0.000744 0.0150
 7 V241229  Female Most of the time    0.153   0.0984   0.230 
 8 V241229  Female About half the time 0.320   0.208    0.457 
 9 V241229  Female Some of the time    0.381   0.285    0.487 
10 V241229  Female Never               0.142   0.0736   0.257 
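Note that, with two grouping variables, survey_prop() estimates the proportion of the last grouping variable (here, the trust answer) within each level of the earlier ones (here, gender). That is why each gender's proportions add up to 1. The sketch below checks this on hypothetical data (toy, Answer, and wt are made-up names). It also confirms that one cell matches a hand-computed weighted share.

```r
library(dplyr)
library(srvyr)

set.seed(7)
# Hypothetical respondents with a gender, a three-category answer, and a weight
toy <- tibble(
  Gender = sample(c("Male", "Female"), 300, replace = TRUE),
  Answer = sample(c("Low", "Medium", "High"), 300, replace = TRUE),
  wt     = runif(300, 0.5, 2)
)

props <- toy |>
  as_survey_design(weights = wt) |>
  group_by(Gender, Answer) |>
  summarize(p = survey_prop())

# Within each gender, the proportions of the answers sum to 1
totals <- props |>
  group_by(Gender) |>
  summarize(total = sum(p))
totals

# Hand-computed weighted share of "High" among males, for comparison
manual <- with(subset(toy, Gender == "Male"), sum(wt[Answer == "High"]) / sum(wt))
```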

Wrapping up

The ANES 2024 release is full of insights, and what we have covered here is just the beginning. With hundreds of variables covering attitudes, demographics, and behaviors, you will certainly find questions worth exploring.

If you try out the dataset, I'd love to hear what you discover. Feel free to share your own analyses or visualizations!

Find me on Bluesky: @IVelasQ3
