AE 05: Gerrymandering + data exploration II

Suggested answers

Important

These are suggested answers. Use this document as a reference only; it is not designed to be an exhaustive key.

Getting started

Packages

We’ll use the tidyverse package for this analysis, along with usdata for the data and ggbeeswarm for the beeswarm plots.

library(tidyverse)
library(usdata)
library(ggbeeswarm)

gerrymander <- usdata::gerrymander # force df to appear in environment

Data

The data are available in the usdata package.

glimpse(gerrymander)
Rows: 435
Columns: 12
$ district   <chr> "AK-AL", "AL-01", "AL-02", "AL-03", "AL-04", "AL-05", "AL-0…
$ last_name  <chr> "Young", "Byrne", "Roby", "Rogers", "Aderholt", "Brooks", "…
$ first_name <chr> "Don", "Bradley", "Martha", "Mike D.", "Rob", "Mo", "Gary",…
$ party16    <chr> "R", "R", "R", "R", "R", "R", "R", "D", "R", "R", "R", "R",…
$ clinton16  <dbl> 37.6, 34.1, 33.0, 32.3, 17.4, 31.3, 26.1, 69.8, 30.2, 41.7,…
$ trump16    <dbl> 52.8, 63.5, 64.9, 65.3, 80.4, 64.7, 70.8, 28.6, 65.0, 52.4,…
$ dem16      <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0,…
$ state      <chr> "AK", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AR", "AR",…
$ party18    <chr> "R", "R", "R", "R", "R", "R", "R", "D", "R", "R", "R", "R",…
$ dem18      <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0,…
$ flip18     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
$ gerry      <fct> mid, high, high, high, high, high, high, high, mid, mid, mi…

Congressional districts per state

Which state has the most congressional districts? How many congressional districts are there in this state?

California, with 53 congressional districts in the state.

gerrymander |>
  count(state, sort = TRUE) |>
  slice(1)
# A tibble: 1 × 2
  state     n
  <chr> <int>
1 CA       53
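
Note that slice(1) keeps only the first row, so a tie for the most districts would go unnoticed. A minimal sketch of a tie-aware alternative, assuming dplyr's slice_max() with its default with_ties = TRUE; the names most_districts and n_districts are illustrative choices, not part of the original answer:

```r
library(tidyverse)
library(usdata)

# Count districts per state, then keep every state tied for the largest count.
# slice_max() keeps ties by default, unlike slice(1) after sorting.
most_districts <- usdata::gerrymander |>
  count(state, name = "n_districts") |>
  slice_max(n_districts, n = 1)

most_districts
```

With these data the result is still the single California row, since no other state matches its count.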

Gerrymandering and flipping

Is a Congressional District more likely to be flipped to a Democratic seat if it has high prevalence of gerrymandering or low prevalence of gerrymandering? Support your answer with a visualization and summary statistics.

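Based on the plot and summary statistics below, a Congressional District is more likely to be flipped to a Democratic seat if it has low prevalence of gerrymandering: about 12.9% of low-gerrymandering districts flipped to a Democratic seat (flip18 = 1), compared with about 9.3% of mid- and 4.9% of high-gerrymandering districts.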

gerrymander |>
  mutate(flip18 = as_factor(flip18)) |>
  ggplot(aes(x = gerry, fill = flip18)) +
  geom_bar(position = "fill") +
  labs(title = "Level of gerrymandering by 'flip' status",
       x = "Level of gerrymandering preceding the 2018 House election",
       y = "Proportion of observations",
       fill = "'Flip' status") +
  theme_minimal()

gerrymander |>
  count(gerry, flip18) |>
  group_by(gerry) |>
  mutate(prop = n / sum(n))
# A tibble: 8 × 4
# Groups:   gerry [3]
  gerry flip18     n   prop
  <fct>  <dbl> <int>  <dbl>
1 low       -1     2 0.0323
2 low        0    52 0.839 
3 low        1     8 0.129 
4 mid       -1     3 0.0111
5 mid        0   242 0.896 
6 mid        1    25 0.0926
7 high       0    98 0.951 
8 high       1     5 0.0485
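
The table above lists every combination of gerry and flip18. To isolate the quantity the question asks about, one option is to compute the share of districts that flipped to a Democratic seat within each gerrymandering level. This is a sketch that reads flip18 == 1 as a flip to a Democratic seat (the coding the answer above relies on); the object and column names are illustrative:

```r
library(tidyverse)
library(usdata)

# For each gerrymandering level, count districts and those with flip18 == 1,
# then take the mean of the logical to get the proportion flipped to Democratic.
flip_to_dem <- usdata::gerrymander |>
  group_by(gerry) |>
  summarize(
    n_districts   = n(),
    n_flip_dem    = sum(flip18 == 1),
    prop_flip_dem = mean(flip18 == 1)
  )

flip_to_dem
```

The proportions should match the flip18 = 1 rows of the table above (8 of 62 low, 25 of 270 mid, 5 of 103 high).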

Aesthetic mappings

Recreate the following visualization, and then improve it.


## Recreate
ggplot(gerrymander, aes(x = gerry, y = clinton16)) +
  geom_beeswarm(alpha = .5, color = "grey") +
  geom_boxplot(aes(color = gerry), alpha = .5) +
  theme_minimal()

## Improve
gerrymander |>
  ggplot(aes(x = gerry, y = clinton16)) +
    geom_beeswarm(alpha = .5, color = "grey") +
    geom_boxplot(aes(color = gerry), alpha = .5) +
    theme_minimal() +
    scale_y_continuous(labels = scales::label_percent(scale = 1)) +
    labs(title = "Distribution of % Vote for Clinton in 2016",
         subtitle = "By Level of Gerrymandering",
         x = "Level of Gerrymandering",
         y = "% Vote for Clinton in 2016") +
    theme(legend.position = "none")
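
The center and spread that the boxplots display can also be checked numerically. A minimal sketch, assuming the median and interquartile range are the summaries of interest; na.rm = TRUE is a defensive choice in case any clinton16 values are missing, and the object and column names are illustrative:

```r
library(tidyverse)
library(usdata)

# Median and IQR of Clinton's 2016 vote share within each gerrymandering level,
# the same quantities drawn as the center line and box height in the boxplots.
clinton_summary <- usdata::gerrymander |>
  group_by(gerry) |>
  summarize(
    median_clinton16 = median(clinton16, na.rm = TRUE),
    iqr_clinton16    = IQR(clinton16, na.rm = TRUE)
  )

clinton_summary
```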