AE 05: Gerrymandering + data exploration II

Getting started

Packages

We’ll use the tidyverse package for this analysis, along with usdata for the data and ggbeeswarm for the beeswarm plot.

library(tidyverse)
library(usdata)
library(ggbeeswarm)

gerrymander <- usdata::gerrymander # force df to appear in environment

Data

The data are available in the usdata package.

glimpse(gerrymander)

Rows: 435
Columns: 12
$ district   <chr> "AK-AL", "AL-01", "AL-02", "AL-03", "AL-04", "AL-05", "AL-0…
$ last_name  <chr> "Young", "Byrne", "Roby", "Rogers", "Aderholt", "Brooks", "…
$ first_name <chr> "Don", "Bradley", "Martha", "Mike D.", "Rob", "Mo", "Gary",…
$ party16    <chr> "R", "R", "R", "R", "R", "R", "R", "D", "R", "R", "R", "R",…
$ clinton16  <dbl> 37.6, 34.1, 33.0, 32.3, 17.4, 31.3, 26.1, 69.8, 30.2, 41.7,…
$ trump16    <dbl> 52.8, 63.5, 64.9, 65.3, 80.4, 64.7, 70.8, 28.6, 65.0, 52.4,…
$ dem16      <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0,…
$ state      <chr> "AK", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AR", "AR",…
$ party18    <chr> "R", "R", "R", "R", "R", "R", "R", "D", "R", "R", "R", "R",…
$ dem18      <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0,…
$ flip18     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
$ gerry      <fct> mid, high, high, high, high, high, high, high, mid, mid, mi…

Congressional districts per state

Which state has the most congressional districts? How many congressional districts are there in this state?

California has the most, with 53 congressional districts.

gerrymander |>
  count(state, sort = TRUE) |>
  slice(1)

# A tibble: 1 × 2
  state     n
  <chr> <int>
1 CA       53
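One caveat with `slice(1)` is that it keeps only the first row even if two states were tied for the most districts. A minimal sketch of an alternative, assuming dplyr 1.0.0 or later, uses `slice_max()`, which keeps ties by default. The object name `most_districts` is only illustrative.

```r
library(tidyverse)
library(usdata)

# Count districts per state, then keep the state(s) with the largest count.
# slice_max() keeps ties by default (with_ties = TRUE), unlike slice(1).
most_districts <- gerrymander |>
  count(state) |>
  slice_max(n, n = 1)

most_districts
```

For these data there is no tie, so this should return the same single row for California as above.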

Gerrymandering and flipping

Is a Congressional District more likely to be flipped to a Democratic seat if it has high prevalence of gerrymandering or low prevalence of gerrymandering? Support your answer with a visualization and summary statistics.

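Based on the plot and summary table below, a congressional district is more likely to have flipped to a Democratic seat if it has a low prevalence of gerrymandering: about 12.9% of low-gerrymandering districts flipped, compared with about 9.3% of mid and 4.9% of high.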

gerrymander |>
  mutate(flip18 = as_factor(flip18)) |>
  ggplot(aes(x = gerry, fill = flip18)) +
  geom_bar(position = "fill") +
  labs(title = "'Flip' status by level of gerrymandering",
       x = "Level of gerrymandering preceding the 2018 House election",
       y = "Proportion of observations",
       fill = "'Flip' status") +
  theme_minimal()

gerrymander |>
  count(gerry, flip18) |>
  group_by(gerry) |>
  mutate(prop = n / sum(n))

# A tibble: 8 × 4
# Groups:   gerry [3]
  gerry flip18     n   prop
  <fct>  <dbl> <int>  <dbl>
1 low       -1     2 0.0323
2 low        0    52 0.839 
3 low        1     8 0.129 
4 mid       -1     3 0.0111
5 mid        0   242 0.896 
6 mid        1    25 0.0926
7 high       0    98 0.951 
8 high       1     5 0.0485
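The table above splits each gerrymandering level into three flip statuses, so the Democratic flip rate has to be read off one row at a time. A minimal sketch that collapses it to one row per level follows. It assumes, based on the values in the table, that `flip18 == 1` marks a flip to a Democratic seat and `-1` a flip in the other direction. The column names `n_flipped_dem` and `prop_flipped_dem` are illustrative.

```r
library(tidyverse)
library(usdata)

# One row per gerrymandering level: how many districts flipped to a
# Democratic seat in 2018, and what share of the level that represents.
# Assumption: flip18 == 1 means "flipped to Democratic"; 0 and -1 do not.
flip_rates <- gerrymander |>
  group_by(gerry) |>
  summarize(
    n_districts = n(),
    n_flipped_dem = sum(flip18 == 1),
    prop_flipped_dem = mean(flip18 == 1)
  )

flip_rates
```

The proportions should match the `flip18 == 1` rows of the table above (8 of 62 low, 25 of 270 mid, 5 of 103 high).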

Aesthetic mappings

Recreate the following visualization, and then improve it.

library(scales)


Attaching package: 'scales'

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor

## Recreate
ggplot(gerrymander, aes(x = gerry, y = clinton16)) +
  geom_beeswarm(alpha = .5, color = "grey") +
  geom_boxplot(aes(color = gerry), alpha = .5) +
  theme_minimal()

## Improve
gerrymander |>
  ggplot(aes(x = gerry, y = clinton16)) +
    geom_beeswarm(alpha = .5, color = "grey") +
    geom_boxplot(aes(color = gerry), alpha = .5) +
    theme_minimal() +
    scale_y_continuous(labels = label_percent(scale = 1)) +
    labs(title = "Distribution of % Vote for Clinton in 2016",
         subtitle = "By Level of Gerrymandering",
         x = "Level of Gerrymandering",
         y = "% Vote for Clinton in 2016") +
    theme(legend.position = "none")
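To read the box plots more precisely, it can help to put numbers next to them. The following is a sketch, not part of the original exercise, that computes the median and interquartile range of `clinton16` within each gerrymandering level. The summary column names are illustrative, and `na.rm = TRUE` is included only as a precaution in case any district is missing a vote share.

```r
library(tidyverse)
library(usdata)

# Numeric companion to the box plots: center and spread of Clinton's 2016
# vote share within each level of gerrymandering.
clinton_summary <- gerrymander |>
  group_by(gerry) |>
  summarize(
    n_districts = n(),
    median_clinton16 = median(clinton16, na.rm = TRUE),
    iqr_clinton16 = IQR(clinton16, na.rm = TRUE)
  )

clinton_summary
```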