AE 04: Gerrymandering + data exploration I

Suggested answers

Important

These are suggested answers. This document should be used as a reference only; it is not designed to be an exhaustive key.

Getting started

Packages

We’ll use the tidyverse and usdata packages for this analysis.

library(tidyverse)
library(usdata)

gerrymander <- usdata::gerrymander

Data

The data are available in the usdata package.

glimpse(gerrymander)
Rows: 435
Columns: 12
$ district   <chr> "AK-AL", "AL-01", "AL-02", "AL-03", "AL-04", "AL-05", "AL-0…
$ last_name  <chr> "Young", "Byrne", "Roby", "Rogers", "Aderholt", "Brooks", "…
$ first_name <chr> "Don", "Bradley", "Martha", "Mike D.", "Rob", "Mo", "Gary",…
$ party16    <chr> "R", "R", "R", "R", "R", "R", "R", "D", "R", "R", "R", "R",…
$ clinton16  <dbl> 37.6, 34.1, 33.0, 32.3, 17.4, 31.3, 26.1, 69.8, 30.2, 41.7,…
$ trump16    <dbl> 52.8, 63.5, 64.9, 65.3, 80.4, 64.7, 70.8, 28.6, 65.0, 52.4,…
$ dem16      <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0,…
$ state      <chr> "AK", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AR", "AR",…
$ party18    <chr> "R", "R", "R", "R", "R", "R", "R", "D", "R", "R", "R", "R",…
$ dem18      <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0,…
$ flip18     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
$ gerry      <fct> mid, high, high, high, high, high, high, high, mid, mid, mi…

Since this dataset is shipped with a package, it has documentation that you can access via ?gerrymander. The flip18 variable is stored as a number (<dbl>), but it is categorical with three levels:

  • -1: control of the district flipped from Democrats to Republicans between 2016 and 2018;
  • 0: the district did not flip. If Democrats controlled it in 2016, they kept it in 2018. If Republicans controlled it in 2016, they kept it in 2018;
  • 1: control of the district flipped from Republicans to Democrats between 2016 and 2018.
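
Because flip18 is stored as a number, it can help to attach readable labels before plotting or tabulating it. The following is a minimal sketch in base R; the helper name label_flip18 and the label text are hypothetical and not part of the usdata package.

```r
# Hypothetical helper: convert the numeric flip18 codes (-1, 0, 1)
# into a factor with descriptive labels. The label wording is an assumption.
label_flip18 <- function(x) {
  factor(
    x,
    levels = c(-1, 0, 1),
    labels = c("Flipped D to R", "No flip", "Flipped R to D")
  )
}

# Small demonstration on a handful of codes
label_flip18(c(0, 1, -1, 0))
```

With the course data, the helper could be used inside mutate(), for example to create a labelled copy of flip18 before calling count() or ggplot().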

Districts at the tails

Make side-by-side box plots of the percent of the vote received by Trump in the 2016 Presidential Election by prevalence of gerrymandering. Identify any Congressional Districts that are potential outliers. Are they different from the rest of the Congressional Districts due to high support or low support for Trump in the 2016 Presidential Election? Which state are they in? Which city are they in?

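The two potential outliers, NY-15 and NY-13, are both in the state of New York, in New York City. They stand apart from the rest of the Congressional Districts because of their very low support for Trump in the 2016 election (4.9% and 5.4%), and both have a low prevalence of gerrymandering.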

ggplot(gerrymander, aes(x = trump16, y = gerry, color = gerry)) +
  geom_boxplot() +
  labs(title = "Trump 2016 Vote Share by Gerrymandering Prevalence",
       x = "Percent of votes received by Trump (2016)",
       y = "Level of Gerrymandering") +
  theme_minimal() +
  theme(legend.position = "none") 

gerrymander |>
  arrange(trump16) |>
  slice(1:2)
# A tibble: 2 × 12
  district last_name first_name party16 clinton16 trump16 dem16 state party18
  <chr>    <chr>     <chr>      <chr>       <dbl>   <dbl> <dbl> <chr> <chr>  
1 NY-15    Serrano   Jose       D            93.8     4.9     1 NY    D      
2 NY-13    Espaillat Adriano    D            92.3     5.4     1 NY    D      
# ℹ 3 more variables: dem18 <dbl>, flip18 <dbl>, gerry <fct>
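
Sorting by trump16 and taking the first two rows works here because the outliers are already visible in the plot. As a complementary check, the usual box plot rule (points more than 1.5 times the interquartile range beyond the quartiles) can be written out directly. This is a minimal sketch in base R; the helper name is_boxplot_outlier is hypothetical, the toy values are made up for illustration, and the quartiles geom_boxplot() draws may be computed slightly differently.

```r
# Hypothetical helper: flag values outside the 1.5 * IQR fences.
is_boxplot_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Toy values (made up, not taken from the gerrymander data)
toy <- c(5, 50, 52, 55, 58, 60, 62)
toy[is_boxplot_outlier(toy)]  # flags only the value 5

# With the course data, the helper could be applied within each level of gerry:
# gerrymander |>
#   group_by(gerry) |>
#   filter(is_boxplot_outlier(trump16))
```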

Flips

Is a Congressional District more likely to have high prevalence of gerrymandering if a Democrat was able to flip the seat in the 2018 election? Support your answer with a visualization as well as summary statistics. Hint: Calculate the conditional distribution of prevalence of gerrymandering based on whether a Democrat was able to flip the seat in the 2018 election.

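Based on the visualization and the summary statistics, there is no evidence that a Congressional District was more likely to have high prevalence of gerrymandering if a Democrat was able to flip the seat in the 2018 election. If anything, the share of high-prevalence districts is lower among seats that Democrats flipped (5 of 38, about 13%) than among seats that did not flip (98 of 392, 25%).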

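# Treat flip18 as categorical so the x-axis shows only -1, 0, and 1
ggplot(gerrymander, aes(x = factor(flip18), fill = gerry)) +
  geom_bar(position = "fill") +
  labs(x = "flip18", y = "Proportion")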

gerrymander |>
  count(flip18, gerry) |>
  group_by(flip18) |>
  mutate(prop = n / sum(n))
# A tibble: 8 × 4
# Groups:   flip18 [3]
  flip18 gerry     n  prop
   <dbl> <fct> <int> <dbl>
1     -1 low       2 0.4  
2     -1 mid       3 0.6  
3      0 low      52 0.133
4      0 mid     242 0.617
5      0 high     98 0.25 
6      1 low       8 0.211
7      1 mid      25 0.658
8      1 high      5 0.132
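
The same conditional distribution can be laid out as a two-way table, which makes the comparison across flip18 groups easier to read. This is a minimal sketch in base R that re-enters the counts printed above by hand; the zero for the combination of flip18 = -1 and high prevalence is inferred from the absence of that row in the output.

```r
# Counts copied from the count(flip18, gerry) output above.
# The -1 / high cell is entered as 0 because no such row was printed.
counts <- matrix(
  c( 2,   3,  0,
    52, 242, 98,
     8,  25,  5),
  nrow = 3, byrow = TRUE,
  dimnames = list(flip18 = c("-1", "0", "1"),
                  gerry  = c("low", "mid", "high"))
)

# Conditional distribution of gerry within each flip18 group (rows sum to 1)
cond <- prop.table(counts, margin = 1)
round(cond, 3)
```

Reading across the rows, the high column for flip18 = 1 can be compared directly with the high column for flip18 = 0.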