Lecture 10
Duke University
STA 199 Summer 2026: Session I
May 8, 2026
Cohesive, thoughtful analysis of a dataset of your team’s choosing (subject to instructor/TA approval), drawing on and expanding upon the techniques and methods learned in this course
R code and raw output should not appear in your final writeup (i.e., you will need to suppress your code chunks via Quarto settings)
Sequential reveal: Motivation, then resolution
Instant reveal: Resolution, and motivation hidden within
When you’re trying to show too much data at once, you may end up not showing anything.
Never assume your audience can rapidly process complex visual displays
Don’t add variables to your plot that are tangential to your story
Don’t jump straight to a highly complex figure; first show an easily digestible subset (e.g., show one facet first)
Aim for memorable, but clear
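A minimal sketch of the "show one facet first" advice, assuming the palmerpenguins package (which provides the `penguins` data used later in these notes) is installed; the object names `p_adelie` and `p_all` are hypothetical. The first plot shows a single species, and the second adds the other species as facets once the audience knows how to read one panel.

```r
library(tidyverse)
library(palmerpenguins)

# Step 1: a single, easily digestible subset (Adelie penguins only)
p_adelie <- penguins |>
  filter(species == "Adelie") |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE) +
  labs(title = "Adelie penguins", x = "Flipper length (mm)", y = "Body mass (g)")

# Step 2: the full picture, one facet per species
p_all <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE) +
  facet_wrap(~species) +
  labs(title = "All three species", x = "Flipper length (mm)", y = "Body mass (g)")

p_adelie
p_all
```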
Project note: Make sure to leave time to iterate on your plots after you practice your presentation. If certain plots or outputs take too many words to explain, take time to simplify them!
Be consistent but don’t be repetitive.
Use consistent features throughout plots (e.g., same color represents same level on all plots)
Aim to use a different type of summary or visualization for each distinct analysis
Reading a report with ALL boxplots is like walking into an ice cream shoppe that only sells versions of vanilla (e.g., Madagascar, Vanilla Bean (the best), French Vanilla, Old Fashioned, etc…) when I want a scoop of coffee and a scoop of cinnamon!
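One way to make the same color mean the same level on every plot is to define the mapping once as a named vector and pass it to each `scale_*_manual()` call. A minimal sketch, assuming the palmerpenguins package; the name `species_colors` and the hex values are arbitrary choices for illustration. It also pairs a scatterplot with a density plot rather than repeating one plot type.

```r
library(tidyverse)
library(palmerpenguins)

# Define the species-to-color mapping once (hex values are arbitrary)
species_colors <- c(
  Adelie = "#1b9e77",
  Chinstrap = "#d95f02",
  Gentoo = "#7570b3"
)

# Plot 1: scatterplot with species mapped to point color
p_scatter <- ggplot(
  penguins,
  aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point(na.rm = TRUE) +
  scale_color_manual(values = species_colors)

# Plot 2: a different plot type that reuses the same mapping for fill
p_density <- ggplot(penguins, aes(x = flipper_length_mm, fill = species)) +
  geom_density(alpha = 0.5, na.rm = TRUE) +
  scale_fill_manual(values = species_colors)

p_scatter
p_density
```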
A tibble: 5 × 2

| category (chr) | value (dbl) |
|---|---|
| Cutting tools | 0.03 |
| Buildings and administration | 0.22 |
| Labor | 0.31 |
| Machinery | 0.27 |
| Workplace materials | 0.17 |

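The values in this table sum to 1, so they read as shares of a whole. A minimal sketch of one simple way to display such shares, a bar chart sorted by value; it is an illustration, not necessarily the figure that accompanied this table, and the names `category_shares` and `p_shares` are hypothetical.

```r
library(tidyverse)

# Rebuild the table (values copied from it)
category_shares <- tibble(
  category = c(
    "Cutting tools", "Buildings and administration", "Labor",
    "Machinery", "Workplace materials"
  ),
  value = c(0.03, 0.22, 0.31, 0.27, 0.17)
)

# Horizontal bars ordered by value make the ranking easy to read
p_shares <- ggplot(
  category_shares,
  aes(x = value, y = fct_reorder(category, value))
) +
  geom_col() +
  scale_x_continuous(labels = scales::label_percent()) +
  labs(x = NULL, y = NULL)

p_shares
```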

fig-width (smaller value): for a zoomed-in look
fig-width (larger value): for a zoomed-out look
fig-width affects text size
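In Quarto, `fig-width` (in inches) is set as a chunk option with a `#|` comment at the top of a code chunk. A minimal sketch, assuming the palmerpenguins package; the width of 4 and the `fig-asp` of 0.618 are arbitrary example values, and `p_width` is a hypothetical name. Because plot text is drawn at a fixed point size, a figure rendered at a smaller width and then displayed at the same size on the slide looks zoomed in, while a larger width makes the text look smaller.

```r
#| fig-width: 4
#| fig-asp: 0.618

# The `#|` lines above are Quarto chunk options; plain R treats them as
# comments. Try a larger fig-width (e.g., 10) to compare the text size.
library(tidyverse)
library(palmerpenguins)

p_width <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE) +
  labs(x = "Flipper length (mm)", y = "Body mass (g)")
p_width
```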

First, ask yourself: must you include multiple plots on a slide? For example, is your narrative about comparing results from two plots?
If no, then don’t! Move the second plot to the next slide!
If yes, use columns and sequential reveal.
Figure sizing: fig-width, fig-height, etc. in code chunks.
Figure layout: layout-ncol for placing multiple figures in a chunk.
Further control over figure layout with the patchwork package.
Chunk options around what makes it in your final report: message, echo, etc.
Cross-referencing figures and tables.
Adding footnotes and citations.
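Several of these options can be combined in one chunk. A minimal sketch, assuming the palmerpenguins and patchwork packages are installed; the object names are hypothetical. `echo: false` keeps the code out of the rendered report, `message: false` and `warning: false` hide package startup messages and warnings, and patchwork's `+` with `plot_layout()` places two plots side by side.

```r
#| echo: false
#| message: false
#| warning: false

library(tidyverse)
library(palmerpenguins)
library(patchwork)

p_flipper <- ggplot(penguins, aes(x = flipper_length_mm)) +
  geom_histogram(binwidth = 5, na.rm = TRUE) +
  labs(x = "Flipper length (mm)", y = "Count")

p_mass <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(binwidth = 250, na.rm = TRUE) +
  labs(x = "Body mass (g)", y = "Count")

# `+` combines the plots; plot_layout(ncol = 2) puts them in one row
p_combined <- p_flipper + p_mass + plot_layout(ncol = 2)
p_combined
```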
As seen in Figure 1, there is a positive and relatively strong relationship between body mass and flipper length of penguins.
Table 1 displays summaries of flipper length by species.
library(tidyverse)
library(palmerpenguins)

# Mean, median, and SD of flipper length for each species
penguins |>
  group_by(species) |>
  summarize(
    Mean = mean(flipper_length_mm, na.rm = TRUE),
    Median = median(flipper_length_mm, na.rm = TRUE),
    SD = sd(flipper_length_mm, na.rm = TRUE)
  ) |>
  knitr::kable(digits = 3)

| species | Mean | Median | SD |
|---|---|---|---|
| Adelie | 189.954 | 190 | 6.539 |
| Chinstrap | 195.824 | 196 | 7.132 |
| Gentoo | 217.187 | 216 | 6.485 |
@tbl-penguins displays summaries of flipper length by species.
```{r}
#| label: tbl-penguins
#| tbl-cap: Flipper length summaries by species
penguins |>
  group_by(species) |>
  summarize(
    Mean = mean(flipper_length_mm, na.rm = TRUE),
    Median = median(flipper_length_mm, na.rm = TRUE),
    SD = sd(flipper_length_mm, na.rm = TRUE)
  ) |>
  knitr::kable(digits = 3)
```
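Figures are cross-referenced the same way as tables: give the chunk a label that starts with `fig-` plus a `fig-cap`, then cite it in the text with `@`. A minimal sketch of a chunk that could produce a figure like Figure 1, assuming the palmerpenguins package; the label `fig-mass-flipper`, the caption text, and the name `p_mass_flipper` are hypothetical. Writing `@fig-mass-flipper` in the text would then render as a numbered reference such as "Figure 1".

```r
#| label: fig-mass-flipper
#| fig-cap: Body mass vs. flipper length of penguins

library(tidyverse)
library(palmerpenguins)

# Scatterplot of the relationship described in the text
p_mass_flipper <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE) +
  labs(x = "Flipper length (mm)", y = "Body mass (g)")
p_mass_flipper
```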
The American Association of University Professors (AAUP) is a nonprofit membership association of faculty and other academic professionals. This AAUP report shows trends in instructional staff employment between 1975 and 2011 and contains the following image. What trends are apparent in this visualization?

library(tidyverse)
library(scales)

# Read the AAUP staff data (one column per year)
staff <- read_csv("data/instructional-staff.csv")

# Reshape to long format: one row per faculty type and year
staff_long <- staff |>
  pivot_longer(
    cols = -faculty_type, names_to = "year",
    values_to = "percentage"
  ) |>
  mutate(
    percentage = as.numeric(percentage),
    # Set the order of faculty types
    faculty_type = fct_relevel(
      faculty_type,
      "Full-Time Tenured Faculty",
      "Full-Time Tenure-Track Faculty",
      "Full-Time Non-Tenure-Track Faculty",
      "Part-Time Faculty",
      "Graduate Student Employees"
    ),
    year = as.numeric(year),
    # Highlight part-time faculty in red; everything else in gray
    faculty_type_color = if_else(faculty_type == "Part-Time Faculty", "firebrick1", "gray40")
  )

p <- ggplot(
  staff_long,
  aes(
    x = year,
    y = percentage,
    color = faculty_type_color, group = faculty_type
  )
) +
  geom_line(linewidth = 1, show.legend = FALSE) +
  labs(
    x = NULL,
    y = "Percent of Total Instructional Staff",
    color = NULL,
    title = "Trends in Instructional Staff Employment Status, 1975-2011",
    subtitle = "All Institutions, National Totals",
    caption = "Source: US Department of Education, IPEDS Fall Staff Survey"
  ) +
  # Values are already percentages, so scale = 1 (no multiplication by 100)
  scale_y_continuous(labels = label_percent(accuracy = 1, scale = 1)) +
  # Use the color names stored in faculty_type_color as-is
  scale_color_identity() +
  theme(
    plot.caption = element_text(size = 8, hjust = 0),
    # Extra right margin leaves room for the direct labels
    plot.margin = margin(0.1, 0.6, 0.1, 0.1, unit = "in")
  ) +
  # Allow annotations to be drawn outside the plotting panel
  coord_cartesian(clip = "off") +
  annotate(
    geom = "text",
    x = 2012, y = 41, label = "Part-Time\nFaculty",
    color = "firebrick1", hjust = "left", size = 5
  ) +
  annotate(
    geom = "text",
    x = 2012, y = 13.5, label = "Other\nFaculty",
    color = "gray40", hjust = "left", size = 5
  ) +
  annotate(
    geom = "segment",
    x = 2011.5, xend = 2011.5,
    y = 7, yend = 20,
    color = "gray40", linetype = "dotted"
  )
p
# Replace the descriptive title with one that states the main finding
p +
  labs(
    title = "Instruction by part-time faculty on a steady increase",
    subtitle = "Trends in Instructional Staff Employment Status, 1975-2011\nAll Institutions, National Totals",
    caption = "Source: US Department of Education, IPEDS Fall Staff Survey",
    y = "Percent of Total Instructional Staff",
    x = NULL
  )
p +
  labs(
    title = "Instruction by part-time faculty on a steady increase",
    subtitle = "Trends in Instructional Staff Employment Status, 1975-2011\nAll Institutions, National Totals",
    caption = "Source: US Department of Education, IPEDS Fall Staff Survey",
    y = "Percent of Total Instructional Staff",
    x = NULL
  ) +
  # Remove minor grid lines to reduce visual clutter
  theme(panel.grid.minor = element_blank())