ggplot(bechdel, aes(x=budget_2013,y=gross_2013,color=binary,size=roi)) +
geom_point(alpha = 0.5) + facet_wrap(~ clean_test)
Grammar of Data Transformation
Lecture 3
Announcements / Reminders
First “official” lab takes place immediately after this lecture (well, 15 mins after… stretch your legs & touch grass in between)
Office hours start tomorrow; you can find time / location details here
Come to office hours and / or post on Ed for help!
Lab Assignments (& Exams): Code Style (Do these things!)
Which of these pieces of code is easier to read?
ggplot(bechdel, aes(x = budget_2013, y = gross_2013,
                    color = binary, size = roi)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~ clean_test)
Lab Assignments (& Exams): Code Style (Do these things!)
Code should follow the tidyverse style:
- there should be spaces before and line breaks after each + when building a ggplot
- there should also be spaces before and line breaks after each |> in a data transformation pipeline (we will introduce the “pipe” today!)
- code should be properly indented (check: Code -> Reindent Lines; equivalently, ⌘I)
- there should be spaces around = signs and spaces after commas
- you can find all tidyverse style guidelines here
All code should be visible in the PDF output (should not run off the page)! Use line breaks to prevent this.
Outline
Last Time: Grammar of data viz in R (via ggplot())
Today: Grammar of ‘data wrangling’
Alison Bechdel


The Bechdel Test
(Dykes to Watch Out For - 1985)
Film passes if it has…
- two (named) female characters;
- who talk to each other;
- about something besides a man.
Recent releases
| Title | Year | Bechdel | Director |
|---|---|---|---|
| Dune 2 | 2024 | ❌ | M |
| Conclave | 2024 | ❌ | M |
| Wicked 1 | 2024 | ✅ | M |
| Bugonia | 2025 | ❌ | M |
| Wicked 2 | 2025 | ✅ | M |
| Marty Supreme | 2025 | ❌ | M |
| Wuthering Heights | 2026 | ✅ | F |
| The Devil Wears Prada 2 | 2026 | ✅ | M |
Data Transformation
dplyr
Primary package in the tidyverse for data wrangling and transformation

What is data transformation?
Creating new variables (perhaps as a function of some existing variables, but not necessarily so…)
Reshaping your data frame
Summarizing information about your variables
And more!
The pipe
- The pipe, |>, is an operator (a tool) that allows us to link two functions together in a way that is readable from left to right.
- Use |> to pass the output of the previous line of code as the first argument to the function in the following line of code.
- When reading code “in English”, say “(and) then” whenever you see a pipe.
- You can string multiple pipes together to continue passing upstream outputs along to downstream functions; a string of pipes is still referred to as a “single pipeline”.
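As a small illustration of the points above, the sketch below writes the same computation once as a nested call and once as a pipeline. The value 16.5 and the object names are made up for illustration; the native pipe needs R 4.1 or later.

```r
# Nested call: read from the inside out.
nested <- round(sqrt(16.5), 1)

# Piped call: read left to right ("take 16.5, then sqrt, then round").
# Each |> passes the result on its left as the FIRST argument
# of the function call on its right.
piped <- 16.5 |>
  sqrt() |>
  round(1)

# Both spellings compute the same thing.
identical(nested, piped)
```

The pipeline is longer on the page, but each line is one step, which is what makes long data transformations easier to read.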
Readability
Consider the following sequence of actions that describe the process of getting to campus in the morning:
I need to find my key, then unlock my car, then start my car, then drive to school, then park.
Expressed as a set of nested functions in R pseudocode this would look like:
park(drive(start_car(unlock_car(find("keys"))), to = "campus"))
Writing it out using pipes gives it a more natural (and easier to read) structure:
find("keys") |>
  unlock_car() |>
  start_car() |>
  drive(to = "campus") |>
  park()
A Grammar of Data Manipulation
dplyr is based on the concepts of functions as verbs that manipulate data frames.
Core single data frame functions / verbs:
- filter() / slice(): pick rows based on criteria
- select() / rename(): select columns by name
- pull(): grab a column as a vector
- arrange(): reorder rows
- mutate() / transmute(): create or modify columns
- distinct(): filter for unique rows
- summarize() / count(): reduce variables to values
- group_by() / ungroup(): modify other verbs to act on subsets
- relocate(): change column order
- … (many more)
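Three of the verbs in this list are not demonstrated later in the lecture, so here is a minimal sketch of pull(), distinct(), and relocate(). It assumes the dplyr package is installed; `bechdel_small` is a hypothetical stand-in built from a few columns of a few rows printed in these slides, not the full `bechdel` data.

```r
library(dplyr)

# Hypothetical stand-in for `bechdel`: a few rows as printed in the slides
# (roi values are the rounded values shown in the printed output).
bechdel_small <- tribble(
  ~title,       ~year, ~roi, ~binary, ~clean_test,
  "21 & Over",   2013, 5.22, "FAIL",  "notalk",
  "Dredd 3D",    2012, 1.21, "PASS",  "ok",
  "2 Guns",      2013, 3.41, "FAIL",  "notalk",
  "42",          2013, 4.75, "FAIL",  "men",
  "About Time",  2013, 8.55, "PASS",  "ok"
)

# pull(): extract one column as a plain vector (not a data frame).
titles <- bechdel_small |>
  pull(title)

# distinct(): keep one row per unique combination of the listed columns.
outcomes <- bechdel_small |>
  distinct(binary, clean_test)

# relocate(): move columns; by default they are moved to the front.
reordered <- bechdel_small |>
  relocate(binary, clean_test)
```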
Row Operations
slice()
- slice(): chooses rows based on location
Ex: Display the first five rows of bechdel:
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
# A tibble: 5 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a Sla… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
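The call that produced the five-row output is not shown above. A minimal sketch of how slice() picks rows by position follows; it assumes dplyr is installed and uses `bechdel_small`, a hypothetical stand-in built from a few printed rows, so it takes three rows rather than five.

```r
library(dplyr)

# Hypothetical stand-in for `bechdel`: a few rows as printed in the slides.
bechdel_small <- tribble(
  ~title,       ~year, ~roi, ~binary, ~clean_test,
  "21 & Over",   2013, 5.22, "FAIL",  "notalk",
  "Dredd 3D",    2012, 1.21, "PASS",  "ok",
  "2 Guns",      2013, 3.41, "FAIL",  "notalk",
  "42",          2013, 4.75, "FAIL",  "men",
  "About Time",  2013, 8.55, "PASS",  "ok"
)

# slice(): keep rows by position; 1:3 means rows 1, 2 and 3.
first_three <- bechdel_small |>
  slice(1:3)

# slice_head() is an equivalent helper for "the first n rows".
first_three_again <- bechdel_small |>
  slice_head(n = 3)
```

On the full data, the analogous call for the first five rows would be slice(1:5).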
arrange()
- arrange(): changes the order of the rows; default is ascending order
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Back to the F… 1990 590818548 71319016 8.28 FAIL notalk
2 Child's Play 2 1990 108888347 23178680 4.70 PASS ok
3 Dark Angel (I… 1990 15592338 12480828 1.25 FAIL nowomen
4 Die Hard 2 1990 636768095 124808278 5.10 FAIL dubious
5 Edward Scisso… 1990 192479280 35659508 5.40 PASS ok
6 Flatliners 1990 218621858 46357360 4.72 PASS ok
7 Ghost 1990 1310899333 39225459 33.4 FAIL men
8 Goodfellas 1990 166686124 44574385 3.74 FAIL men
9 Home Alone 1990 1359422317 26744631 50.8 FAIL men
10 Nikita 1990 17893838 12480828 1.43 PASS ok
# ℹ 1,605 more rows
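The second output above appears to be sorted by year, although the call itself is not shown. A minimal sketch of arrange() in ascending and descending order follows, assuming dplyr is installed; `bechdel_small` is a hypothetical stand-in built from a few printed rows.

```r
library(dplyr)

# Hypothetical stand-in for `bechdel`: a few rows as printed in the slides.
bechdel_small <- tribble(
  ~title,       ~year, ~roi, ~binary, ~clean_test,
  "21 & Over",   2013, 5.22, "FAIL",  "notalk",
  "Dredd 3D",    2012, 1.21, "PASS",  "ok",
  "2 Guns",      2013, 3.41, "FAIL",  "notalk",
  "42",          2013, 4.75, "FAIL",  "men",
  "About Time",  2013, 8.55, "PASS",  "ok"
)

# Ascending order (the default): earliest year first.
by_year <- bechdel_small |>
  arrange(year)

# Descending order: wrap the column in desc().
by_roi <- bechdel_small |>
  arrange(desc(roi))
```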
sample_n()
- sample_n(): take a random subset of the rows
Display five random rows of bechdel:
bechdel
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
# A tibble: 5 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Saving Grace 2000 54069594 5411634 9.99 PASS ok
2 The Adventures… 2011 467703233 134639802 3.47 FAIL notalk
3 Dazed and Conf… 1993 25640964 11125966 2.30 PASS ok
4 Side Effects 2013 92461120 30000000 3.08 PASS ok
5 Virus 1999 62423681 104884652 0.595 FAIL dubious
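A minimal sketch of drawing random rows follows, assuming dplyr is installed. `bechdel_small` is a hypothetical stand-in built from a few printed rows, and the seed value is arbitrary. In current dplyr, sample_n() is superseded by slice_sample(), so both spellings are shown.

```r
library(dplyr)

# Hypothetical stand-in for `bechdel`: a few rows as printed in the slides.
bechdel_small <- tribble(
  ~title,       ~year, ~roi, ~binary, ~clean_test,
  "21 & Over",   2013, 5.22, "FAIL",  "notalk",
  "Dredd 3D",    2012, 1.21, "PASS",  "ok",
  "2 Guns",      2013, 3.41, "FAIL",  "notalk",
  "42",          2013, 4.75, "FAIL",  "men",
  "About Time",  2013, 8.55, "PASS",  "ok"
)

# Setting a seed only makes the random draw repeatable between runs.
set.seed(3)

# sample_n(): draw 3 rows at random (without replacement by default).
random_three <- bechdel_small |>
  sample_n(3)

# slice_sample(): the newer dplyr spelling of the same idea.
random_three_again <- bechdel_small |>
  slice_sample(n = 3)
```

Because the rows are random, which movies come back depends on the seed, so no expected output is listed here.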
filter()
- filter(): chooses rows based on column values
- You should think of the logic you provide within a filter() function call as telling R which observations it should keep
Keep only the rows of bechdel that pass the test:
bechdel
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
# A tibble: 753 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
2 About Time 2013 102648667 12000000 8.55 PASS ok
3 Admission 2013 36014634 13000000 2.77 PASS ok
4 American Hust… 2013 397915817 40000000 9.95 PASS ok
5 August: Osage… 2013 87609748 25000000 3.50 PASS ok
6 Beautiful Cre… 2013 75392809 50000000 1.51 PASS ok
7 Blue Jasmine 2013 101793664 18000000 5.66 PASS ok
8 Carrie 2013 120268278 30000000 4.01 PASS ok
9 Despicable Me… 2013 1338831390 76000000 17.6 PASS ok
10 Elysium 2013 379242208 120000000 3.16 PASS ok
# ℹ 743 more rows
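The filtering step that keeps only passing movies can be sketched as follows, assuming dplyr is installed; `bechdel_small` is a hypothetical stand-in built from a few printed rows. Note the double equals sign: == asks "is equal to?", whereas a single = would be read as naming an argument.

```r
library(dplyr)

# Hypothetical stand-in for `bechdel`: a few rows as printed in the slides.
bechdel_small <- tribble(
  ~title,       ~year, ~roi, ~binary, ~clean_test,
  "21 & Over",   2013, 5.22, "FAIL",  "notalk",
  "Dredd 3D",    2012, 1.21, "PASS",  "ok",
  "2 Guns",      2013, 3.41, "FAIL",  "notalk",
  "42",          2013, 4.75, "FAIL",  "men",
  "About Time",  2013, 8.55, "PASS",  "ok"
)

# Keep the rows for which the condition is TRUE; all other rows are dropped.
passing <- bechdel_small |>
  filter(binary == "PASS")
```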
filter()
Keep only the movies released before 2000
# A tibble: 337 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 10 Things I H… 1999 137877156 18180006 7.58 PASS ok
2 8MM 1999 185774868 55938481 3.32 FAIL notalk
3 American Beau… 1999 680094591 20976930 32.4 PASS ok
4 American Pie 1999 470616170 16781544 28.0 FAIL men
5 Analyze This 1999 396843410 41953861 9.46 FAIL notalk
6 Anna and the … 1999 109782424 104884652 1.05 FAIL men
7 Anywhere But … 1999 52172744 32164627 1.62 FAIL dubious
8 Austin Powers… 1999 722127642 48946171 14.8 FAIL notalk
9 Being John Ma… 1999 77252870 18180006 4.25 PASS ok
10 Black and Whi… 1999 14659560 13984620 1.05 PASS ok
# ℹ 327 more rows
filter()
Often (but not always), looks like:
filter(variable [logical operator] value)
Some logical operators
| operator | definition |
|---|---|
| < | is less than? |
| <= | is less than or equal to? |
| > | is greater than? |
| >= | is greater than or equal to? |
| == | is exactly equal to? |
| != | is not equal to? |
More logical operators
| operator | definition |
|---|---|
| x & y | is x AND y? |
| x \| y | is x OR y? |
| is.na(x) | is x NA? |
| !is.na(x) | is x not NA? |
| x %in% y | is x in y? |
| !(x %in% y) | is x not in y? |
| !x | is not x? (only makes sense if x is TRUE or FALSE) |
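To make the two operator tables concrete, here is a minimal sketch that uses several of them inside filter(). It assumes dplyr is installed; `bechdel_small` is a hypothetical stand-in built from a few rows printed in these slides, and the vector `x` is made up to illustrate is.na().

```r
library(dplyr)

# Hypothetical stand-in for `bechdel`: rows from several years, as printed
# in the slides (roi values are the rounded values shown in the output).
bechdel_small <- tribble(
  ~title,       ~year, ~roi,  ~binary, ~clean_test,
  "Ghost",       1990, 33.4,  "FAIL",  "men",
  "Home Alone",  1990, 50.8,  "FAIL",  "men",
  "8MM",         1999, 3.32,  "FAIL",  "notalk",
  "Dick",        1999, 0.966, "PASS",  "ok",
  "Dredd 3D",    2012, 1.21,  "PASS",  "ok",
  "About Time",  2013, 8.55,  "PASS",  "ok"
)

# %in%: is the year one of the listed values?
nineties <- bechdel_small |>
  filter(year %in% c(1990, 1999))

# |: keep a row if EITHER condition is TRUE.
old_or_profitable <- bechdel_small |>
  filter(year < 2000 | roi > 5)

# !=: keep rows whose clean_test is anything other than "ok".
not_ok <- bechdel_small |>
  filter(clean_test != "ok")

# is.na() and !is.na() work element by element on a vector.
x <- c(1, NA, 3)
has_value <- !is.na(x)
```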
filter()
Keep only the movies from before 2000 AND that pass the test
# A tibble: 147 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 10 Things I … 1999 137877156 18180006 7.58 PASS ok
2 American Bea… 1999 680094591 20976930 32.4 PASS ok
3 Being John M… 1999 77252870 18180006 4.25 PASS ok
4 Black and Wh… 1999 14659560 13984620 1.05 PASS ok
5 Boys Don't C… 1999 45144602 2796924 16.1 PASS ok
6 But I'm a Ch… 1999 6761310 1678154 4.03 PASS ok
7 Carrie 2: Th… 1999 49674054 29367703 1.69 PASS ok
8 Cruel Intent… 1999 159471926 15383082 10.4 PASS ok
9 Dick 1999 17555926 18180006 0.966 PASS ok
10 Drop Dead Go… 1999 29567426 13984620 2.11 PASS ok
# ℹ 137 more rows
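A minimal sketch of combining two conditions with AND follows, assuming dplyr is installed; `bechdel_small` is a hypothetical stand-in built from a few printed rows. In filter(), separating conditions with a comma means the same as joining them with &.

```r
library(dplyr)

# Hypothetical stand-in for `bechdel`: rows from several years, as printed
# in the slides.
bechdel_small <- tribble(
  ~title,       ~year, ~roi,  ~binary, ~clean_test,
  "Ghost",       1990, 33.4,  "FAIL",  "men",
  "Home Alone",  1990, 50.8,  "FAIL",  "men",
  "8MM",         1999, 3.32,  "FAIL",  "notalk",
  "Dick",        1999, 0.966, "PASS",  "ok",
  "Dredd 3D",    2012, 1.21,  "PASS",  "ok",
  "About Time",  2013, 8.55,  "PASS",  "ok"
)

# Explicit AND: both conditions must be TRUE for a row to be kept.
with_and <- bechdel_small |>
  filter(year < 2000 & binary == "PASS")

# Comma-separated conditions are combined with AND as well.
with_comma <- bechdel_small |>
  filter(year < 2000, binary == "PASS")
```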
Column Operations
select()
- select(): changes whether or not a column is included.
Keep only the title and test status.
bechdel
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
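The select() call for this task is not shown above. A minimal sketch follows, assuming dplyr is installed and assuming that "test status" refers to the binary column; `bechdel_small` is a hypothetical stand-in built from a few printed rows.

```r
library(dplyr)

# Hypothetical stand-in for `bechdel`: a few rows as printed in the slides.
bechdel_small <- tribble(
  ~title,       ~year, ~roi, ~binary, ~clean_test,
  "21 & Over",   2013, 5.22, "FAIL",  "notalk",
  "Dredd 3D",    2012, 1.21, "PASS",  "ok",
  "2 Guns",      2013, 3.41, "FAIL",  "notalk",
  "42",          2013, 4.75, "FAIL",  "men",
  "About Time",  2013, 8.55, "PASS",  "ok"
)

# Name the columns to keep; every other column is dropped.
title_and_status <- bechdel_small |>
  select(title, binary)
```

The number of rows is unchanged; only the set of columns shrinks.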
select()
- select() also allows you to exclude particular columns using the - symbol
Again, keep only the title and test status but this time by explicitly excluding all other columns
bechdel
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
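Excluding columns can be sketched as follows, assuming dplyr is installed and again treating binary as the "test status" column. `bechdel_small` is a hypothetical stand-in with fewer columns than the real data, so fewer exclusions are needed here than on the full data frame.

```r
library(dplyr)

# Hypothetical stand-in for `bechdel`: a few rows as printed in the slides.
bechdel_small <- tribble(
  ~title,       ~year, ~roi, ~binary, ~clean_test,
  "21 & Over",   2013, 5.22, "FAIL",  "notalk",
  "Dredd 3D",    2012, 1.21, "PASS",  "ok",
  "2 Guns",      2013, 3.41, "FAIL",  "notalk",
  "42",          2013, 4.75, "FAIL",  "men",
  "About Time",  2013, 8.55, "PASS",  "ok"
)

# A minus sign in front of a column name drops that column.
by_exclusion <- bechdel_small |>
  select(-year, -roi, -clean_test)

# The same result, naming the columns to keep instead.
by_inclusion <- bechdel_small |>
  select(title, binary)
```

Excluding is usually the shorter spelling when you want to drop one or two columns from a wide data frame; including is shorter when you want to keep only a few.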
rename()
- rename(): changes the name of columns.
Rename clean_test to test_result
bechdel
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary test_result
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a … 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day … 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
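A minimal sketch of the renaming step follows, assuming dplyr is installed; `bechdel_small` is a hypothetical stand-in built from a few printed rows. The new name goes on the left of the = sign and the existing name on the right.

```r
library(dplyr)

# Hypothetical stand-in for `bechdel`: a few rows as printed in the slides.
bechdel_small <- tribble(
  ~title,       ~year, ~roi, ~binary, ~clean_test,
  "21 & Over",   2013, 5.22, "FAIL",  "notalk",
  "Dredd 3D",    2012, 1.21, "PASS",  "ok",
  "2 Guns",      2013, 3.41, "FAIL",  "notalk",
  "42",          2013, 4.75, "FAIL",  "men",
  "About Time",  2013, 8.55, "PASS",  "ok"
)

# rename(new_name = old_name): the column keeps its position and values.
renamed <- bechdel_small |>
  rename(test_result = clean_test)
```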
rename()
Generally, looks like:
rename(new_variable_name = old_variable_name)
mutate()
- mutate(): changes the values of columns (i.e., modifies existing columns) and creates new columns.
Create a new variable for the budget in millions
bechdel
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
# A tibble: 1,615 × 8
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
# ℹ 1 more variable: budget_million <dbl>
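The output above gains a budget_million column, but the call that created it is not shown. One way to produce such a column is sketched below, assuming dplyr is installed and assuming the new variable is simply budget_2013 divided by one million; `bechdel_small` is a hypothetical stand-in built from a few printed rows.

```r
library(dplyr)

# Hypothetical stand-in for `bechdel`: a few rows as printed in the slides.
bechdel_small <- tribble(
  ~title,       ~budget_2013, ~binary,
  "21 & Over",      13000000, "FAIL",
  "Dredd 3D",       45658735, "PASS",
  "2 Guns",         61000000, "FAIL",
  "42",             40000000, "FAIL",
  "About Time",     12000000, "PASS"
)

# mutate() adds the new column at the end; existing columns are untouched.
with_millions <- bechdel_small |>
  mutate(budget_million = budget_2013 / 1e6)
```

Using an existing column name on the left of the = sign would overwrite that column instead of adding a new one.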
mutate()
Generally, looks like:
mutate(new_variable_name = function(existing_variable))
Groups of rows
count()
- count(): count unique values of one or more variables.
Count how many movies pass or fail the Bechdel test.
bechdel
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
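The counting step is not shown above. A minimal sketch follows, assuming dplyr is installed; `bechdel_small` is a hypothetical stand-in built from a few printed rows, so its counts are not the counts for the full data.

```r
library(dplyr)

# Hypothetical stand-in for `bechdel`: a few rows as printed in the slides.
bechdel_small <- tribble(
  ~title,       ~year, ~roi, ~binary, ~clean_test,
  "21 & Over",   2013, 5.22, "FAIL",  "notalk",
  "Dredd 3D",    2012, 1.21, "PASS",  "ok",
  "2 Guns",      2013, 3.41, "FAIL",  "notalk",
  "42",          2013, 4.75, "FAIL",  "men",
  "About Time",  2013, 8.55, "PASS",  "ok"
)

# One row per unique value of binary, with the number of rows in column n.
pass_fail <- bechdel_small |>
  count(binary)

# Counting by two variables gives one row per combination that occurs.
by_reason <- bechdel_small |>
  count(binary, clean_test)
```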
group_by()
- group_by(): groups the rows by the values of a variable, so that the verbs that follow act separately on each group
Group by movies passing or failing the test
bechdel
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
bechdel |>
  group_by(binary)
# A tibble: 1,615 × 7
# Groups: binary [2]
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
summarize()
- summarize(): collapses a group into a single row.
Compute average budget
bechdel
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
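The summarizing step is not shown above. A minimal sketch follows, assuming dplyr is installed; `bechdel_small` is a hypothetical stand-in built from a few printed rows, and the result column names are made up for illustration. Without a preceding group_by(), the whole data frame is treated as one group, so the result has a single row.

```r
library(dplyr)

# Hypothetical stand-in for `bechdel`: a few rows as printed in the slides.
bechdel_small <- tribble(
  ~title,       ~budget_2013, ~binary,
  "21 & Over",      13000000, "FAIL",
  "Dredd 3D",       45658735, "PASS",
  "2 Guns",         61000000, "FAIL",
  "42",             40000000, "FAIL",
  "About Time",     12000000, "PASS"
)

# Collapse all rows into one row of summary values.
overall <- bechdel_small |>
  summarize(
    mean_budget = mean(budget_2013),  # average budget across all rows
    n_movies = n()                    # number of rows that were summarized
  )
```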
summarize()
Generally, looks like:
summarize(result_variable_name = function(existing_variable))
group_by() + summarize()
Group by movies passing/failing and compute within-group average budget
bechdel |>
  group_by(binary) |>
  summarize(mean_budget = mean(budget_2013))
# A tibble: 2 × 2
binary mean_budget
<chr> <dbl>
1 FAIL 65877024.
2 PASS 46913086.
The pipe, in action
Find movies that pass the Bechdel test and display their titles and ROIs in descending order of ROI.
Start with the bechdel data frame:
bechdel
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
Filter for rows where binary is equal to "PASS":
bechdel |>
  filter(binary == "PASS")
# A tibble: 753 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
2 About Time 2013 102648667 12000000 8.55 PASS ok
3 Admission 2013 36014634 13000000 2.77 PASS ok
4 American Hust… 2013 397915817 40000000 9.95 PASS ok
5 August: Osage… 2013 87609748 25000000 3.50 PASS ok
6 Beautiful Cre… 2013 75392809 50000000 1.51 PASS ok
7 Blue Jasmine 2013 101793664 18000000 5.66 PASS ok
8 Carrie 2013 120268278 30000000 4.01 PASS ok
9 Despicable Me… 2013 1338831390 76000000 17.6 PASS ok
10 Elysium 2013 379242208 120000000 3.16 PASS ok
# ℹ 743 more rows
Arrange the rows in descending order of roi:
bechdel |>
  filter(binary == "PASS") |>
  arrange(desc(roi))
# A tibble: 753 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 The Blair Wit… 1999 543776715 839077 648. PASS ok
2 The Devil Ins… 2012 157289709 1014639 155. PASS ok
3 My Big Fat Gr… 2002 768922942 6475896 119. PASS ok
4 Chasing Amy 1997 39417963 362810 109. PASS ok
5 Slacker 1991 4200140 39349 107. PASS ok
6 Insidious 2010 164379554 1602348 103. PASS ok
7 Paranormal Ac… 2010 280159759 3204696 87.4 PASS ok
8 Paranormal Ac… 2011 322170936 5178454 62.2 PASS ok
9 The Last Exor… 2010 118787648 1922817 61.8 PASS ok
10 Cinderella 1997 246710482 4208591 58.6 PASS ok
# ℹ 743 more rows
Select columns title and roi:
bechdel |>
  filter(binary == "PASS") |>
  arrange(desc(roi)) |>
  select(title, roi)
# A tibble: 753 × 2
title roi
<chr> <dbl>
1 The Blair Witch Project 648.
2 The Devil Inside 155.
3 My Big Fat Greek Wedding 119.
4 Chasing Amy 109.
5 Slacker 107.
6 Insidious 103.
7 Paranormal Activity 2 87.4
8 Paranormal Activity 3 62.2
9 The Last Exorcism 61.8
10 Cinderella 58.6
# ℹ 743 more rows
