typeof(TRUE)
[1] "logical"
Lecture 8
An object’s type indicates how it is stored in memory
You’ll commonly encounter:
logical
integer
double
character
You’ll less commonly encounter:
list
NULL
complex
raw
Can you use different types of data together? Sometimes… but be careful!
"3" + 3
Error in `"3" + 3`:
! non-numeric argument to binary operator
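A quick way to check each of the types listed above is typeof(). The short sketch below also shows one way to make the failing addition work: convert the string explicitly with as.numeric() before adding.

```r
# typeof() reports how each object is stored in memory
typeof(TRUE)          # "logical"
typeof(1L)            # "integer"  (the L suffix makes an integer)
typeof(1.5)           # "double"
typeof("a")           # "character"
typeof(list(1, "a"))  # "list"
typeof(NULL)          # "NULL"
typeof(1i)            # "complex"
typeof(as.raw(255))   # "raw"

# "3" + 3 fails, but converting the string to a number first works
as.numeric("3") + 3   # 6
```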
Vectors are constructed using the c() function
without intention…
c(2, "Just this one!")
[1] "2"              "Just this one!"
R will happily convert between various types without complaint when different types of data are concatenated in a vector. This is NOT always a good thing.
without intention…
c(FALSE, 3L)
[1] 0 3
c(1.2, 3L)
[1] 1.2 3.0
c(2L, "two")
[1] "2"   "two"
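The three examples above follow one rule: when types are mixed, R picks the most flexible type that can hold every element, in the order logical < integer < double < character. A small sketch that checks the resulting type with typeof():

```r
# Mixed vectors are coerced up the hierarchy:
# logical -> integer -> double -> character
typeof(c(FALSE, 3L))   # "integer"   (FALSE becomes 0L)
typeof(c(1.2, 3L))     # "double"    (3L becomes 3)
typeof(c(2L, "two"))   # "character" (2L becomes "2")

# One character element is enough to turn everything into strings
mixed <- c(TRUE, 1L, 2.5, "a")
mixed                  # "TRUE" "1" "2.5" "a"
```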
with intention…
Explicit coercion:
When you call a function like:
Implicit coercion:
Happens when you use a vector in a specific context that expects a certain type of vector.
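A minimal sketch of both kinds of coercion in base R. The as.*() converters used here are standard base functions, chosen for illustration; the passed vector is a made-up example.

```r
# Explicit coercion: we ask for the conversion ourselves
as.numeric("3.5")        # 3.5
as.character(42L)        # "42"
as.logical(c(0, 1, 2))   # FALSE TRUE TRUE (any nonzero number is TRUE)

# A conversion that makes no sense gives NA (R normally also warns)
suppressWarnings(as.integer("two"))  # NA

# Implicit coercion: sum() and mean() expect numbers,
# so the logicals are silently treated as 0/1
passed <- c(TRUE, FALSE, TRUE, TRUE)
sum(passed)    # 3    (number of TRUEs)
mean(passed)   # 0.75 (proportion of TRUEs)
```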
We can think of data frames as vectors of equal length glued together
df <- data.frame(x = 1:2, y = 3:4)
df
  x y
1 1 3
2 2 4
typeof(df)
[1] "list"
class(df)
[1] "data.frame"
## alternatively, "bind" two columns together
x <- data.frame(x = 1:2)
y <- data.frame(y = 3:4)
df <- cbind(x, y)
df
  x y
1 1 3
2 2 4
typeof(df)
[1] "list"
class(df)
[1] "data.frame"
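Because typeof(df) is "list", ordinary list tools work on a data frame, with each column being one element of the list. A short sketch; the last part shows what happens when the "equal length" requirement is broken:

```r
df <- data.frame(x = 1:2, y = 3:4)

is.list(df)     # TRUE
length(df)      # 2 (one list element per column)
names(df)       # "x" "y"
df[["y"]]       # 3 4 (each column is a plain vector)
lengths(df)     # x = 2, y = 2: every column has the same length

# Columns of unequal length cannot be glued together
msg <- tryCatch(
  data.frame(x = 1:2, y = 1:3),
  error = function(e) conditionMessage(e)
)
msg   # "arguments imply differing number of rows: 2, 3"
```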
With the pull() function (from dplyr), we extract a vector from the data frame; this is functionally the same as accessing a column with df$col_name
df |>
  pull(y)
[1] 3 4
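pull() comes from dplyr. Without loading any package, base R gets the same vector in a few ways. A sketch for comparison:

```r
df <- data.frame(x = 1:2, y = 3:4)

# Three base-R ways to get column y as a plain vector
df$y         # 3 4
df[["y"]]    # 3 4
df[, "y"]    # 3 4 (selecting one column this way drops to a vector)

# Single brackets without a comma keep the data frame wrapper instead
df["y"]           # a one-column data frame
class(df["y"])    # "data.frame"
```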
today <- as.Date("2026-05-26")
today
[1] "2026-05-26"
typeof(today)
[1] "double"
class(today)
[1] "Date"
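We can think of a date as a number (the count of days since the origin, 1 Jan 1970, aka “the Unix epoch”) glued together with the origin itself: R stores only the count (as a double, which is why typeof() says "double"), and the "Date" class tells R to read it relative to the origin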
as.integer(today)
[1] 20599
as.integer(today) / 365 # roughly 56 yrs
[1] 56.43562
as.integer(as.Date("1970-01-01"))
[1] 0
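Since a Date is a day count plus a class, arithmetic on dates works in days. A small base-R sketch using the same date as above:

```r
today <- as.Date("2026-05-26")

# Strip the class to see the stored number of days since 1970-01-01
unclass(today)                  # 20599
typeof(unclass(today))          # "double"

# Adding a number adds days
today + 7                       # "2026-06-02"

# Subtracting two dates gives a difference in days
today - as.Date("1970-01-01")   # Time difference of 20599 days

# Going the other way: a day count plus the origin gives a Date back
as.Date(20599, origin = "1970-01-01")  # "2026-05-26"
```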
The lubridate package makes it easy to work with dates and access their components
year(today)
[1] 2026
month(today)
[1] 5
day(today)
[1] 26
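year(), month(), and day() are lubridate functions, so they need library(lubridate). For comparison, here is a base-R sketch that pulls the same components without any package, via format() or the POSIXlt representation (note its offsets):

```r
today <- as.Date("2026-05-26")

# format() returns character strings, so convert to integer
as.integer(format(today, "%Y"))   # 2026
as.integer(format(today, "%m"))   # 5
as.integer(format(today, "%d"))   # 26

# POSIXlt stores the components in a list, with some offsets
lt <- as.POSIXlt(today)
lt$year + 1900                    # 2026 (stored as years since 1900)
lt$mon + 1                        # 5    (months are 0-based)
lt$mday                           # 26
```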
R uses factors to handle categorical variables with a fixed and known set of possible values
factor(x = ...): “The default (ordering of levels) is the unique set of values taken by as.character(x), sorted into increasing (alphabetical) order of x”
glimpse(summer_factor)
 Factor w/ 3 levels "August","July",..: 3 2 3 1 3
as.integer(summer_factor)
[1] 3 2 3 1 3
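The code that created summer_factor isn't shown. One input consistent with the output above is the hypothetical vector below (June, July, June, August, June). It reproduces both the alphabetical levels and the integer codes 3 2 3 1 3, and shows how supplying levels yourself changes the codes:

```r
# Hypothetical input, chosen to match the glimpse() output above
summer <- c("June", "July", "June", "August", "June")
summer_factor <- factor(summer)

levels(summer_factor)       # "August" "July" "June" (alphabetical, not calendar order)
as.integer(summer_factor)   # 3 2 3 1 3 (each value's position in the levels)

# Supplying the levels explicitly gives calendar order instead
summer_cal <- factor(summer, levels = c("June", "July", "August"))
as.integer(summer_cal)      # 1 2 1 3 1
```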
We can use the forcats package (in tidyverse) to work with factors!
Some commonly used functions are:
fct_relevel(): reorder factor levels by hand
fct_reorder(): reorder factor levels by another variable
fct_infreq(): reorder factor levels by frequency
fct_rev(): reverse the order of factor levels
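To build intuition for what two of these helpers do to the levels, here is a base-R sketch of reversing and frequency-ordering. This is not how forcats implements them, and the sizes vector is made up for illustration:

```r
sizes <- factor(c("small", "large", "small", "medium", "small", "large"))
levels(sizes)        # "large" "medium" "small" (alphabetical default)

# Similar in spirit to fct_rev(): reverse the order of the levels
rev_sizes <- factor(sizes, levels = rev(levels(sizes)))
levels(rev_sizes)    # "small" "medium" "large"

# Similar in spirit to fct_infreq(): most frequent level first
counts <- table(sizes)   # large 2, medium 1, small 3
freq_sizes <- factor(sizes, levels = names(sort(counts, decreasing = TRUE)))
levels(freq_sizes)   # "small" "large" "medium"
```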
amounts <- c("low", "medium", "high", "high", "medium")
amounts_factor <- factor(amounts)
amounts_factor
[1] low    medium high   high   medium
Levels: high low medium
fct_relevel(amounts_factor, c("low", "medium", "high"))
[1] low    medium high   high   medium
Levels: low medium high
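The same ordering can also be set when the factor is created, by passing levels to factor(). A sketch; note the caution at the end: any value not listed in levels (such as the hypothetical typo "hgih") silently becomes NA rather than raising an error.

```r
amounts <- c("low", "medium", "high", "high", "medium")

# Set the level order at creation time
amounts_factor <- factor(amounts, levels = c("low", "medium", "high"))
levels(amounts_factor)   # "low" "medium" "high"

# Values missing from `levels` turn into NA without an error
typo <- factor(c("low", "hgih"), levels = c("low", "medium", "high"))
typo                     # low <NA>
sum(is.na(typo))         # 1
```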
Referencing a column in a pipeline:
df |>
  filter("x" > 0)
# A tibble: 5 × 3
x `2011` `my var`
<dbl> <dbl> <dbl>
1 -2 -2 -2
2 -0.5 -0.5 -1
3 0.5 0.5 0
4 1 1 1
5 2 2 2
"x" means the literal character string.
df |>
  filter(x > 0)
# A tibble: 3 × 3
x `2011` `my var`
<dbl> <dbl> <dbl>
1 0.5 0.5 0
2 1 1 1
3 2 2 2
x means the column name in df.
df |>
  filter(`x` > 0)
# A tibble: 3 × 3
x `2011` `my var`
<dbl> <dbl> <dbl>
1 0.5 0.5 0
2 1 1 1
3 2 2 2
`x` also means the column name in df.
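Why does filter("x" > 0) keep every row? The condition never looks at the column. Comparing a string with a number implicitly coerces 0 to "0", and the strings are then compared alphabetically (in typical locales, letters sort after digits). The result is a single TRUE, which filter() recycles to every row. A base-R check:

```r
# The number is coerced to character before comparing
"x" > 0          # TRUE, same as "x" > "0"
"x" > "0"        # TRUE

# The condition is one value, not one value per row,
# so recycling it over 5 rows keeps all of them
rep("x" > 0, 5)  # TRUE TRUE TRUE TRUE TRUE
```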
Referencing a column in a pipeline:
df |>
  filter("2011" > 0)
# A tibble: 5 × 3
x `2011` `my var`
<dbl> <dbl> <dbl>
1 -2 -2 -2
2 -0.5 -0.5 -1
3 0.5 0.5 0
4 1 1 1
5 2 2 2
"2011" means the literal character string.
df |>
  filter(2011 > 0)
# A tibble: 5 × 3
x `2011` `my var`
<dbl> <dbl> <dbl>
1 -2 -2 -2
2 -0.5 -0.5 -1
3 0.5 0.5 0
4 1 1 1
5 2 2 2
2011 means the literal number.
df |>
  filter(`2011` > 0)
# A tibble: 3 × 3
x `2011` `my var`
<dbl> <dbl> <dbl>
1 0.5 0.5 0
2 1 1 1
3 2 2 2
`2011` means the column name in df.
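Both "2011" > 0 and 2011 > 0 happen to be TRUE, so each keeps every row, but for different reasons: the first is a string comparison, the second a numeric one. A small base-R sketch showing that the two kinds of comparison can disagree:

```r
# A quoted number is still a string: the 0 is coerced to "0"
# and the comparison is alphabetical
"2011" > 0       # TRUE  (same as "2011" > "0")
2011 > 0         # TRUE  (plain numeric comparison)

# Alphabetical and numeric order can disagree
"10" > 9         # FALSE ("1" sorts before "9")
10 > 9           # TRUE
```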
Referencing a column in a pipeline:
df |>
  filter("my var" > 0)
# A tibble: 5 × 3
x `2011` `my var`
<dbl> <dbl> <dbl>
1 -2 -2 -2
2 -0.5 -0.5 -1
3 0.5 0.5 0
4 1 1 1
5 2 2 2
"my var" means the literal character string.
df |>
  filter(my var > 0)
Error in parse(text = input): <text>:2:13: unexpected symbol
1: df |>
2: filter(my var
^
my var means nothing: unquoted and containing a space, it isn't valid R syntax, so the code fails before it even runs.
df |>
  filter(`my var` > 0)
# A tibble: 2 × 3
x `2011` `my var`
<dbl> <dbl> <dbl>
1 1 1 1
2 2 2 2
`my var` means the column name in df.
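Outside a pipeline, base R follows the same backtick rule for non-syntactic names. The code that built df isn't shown, so this sketch rebuilds a data frame from the values printed above; check.names = FALSE stops data.frame() from renaming the columns to syntactic names.

```r
# Rebuild a data frame matching the printed tibble (values read off the output)
df <- data.frame(
  x        = c(-2, -0.5, 0.5, 1, 2),
  `2011`   = c(-2, -0.5, 0.5, 1, 2),
  `my var` = c(-2, -1, 0, 1, 2),
  check.names = FALSE   # keep the non-syntactic names as written
)
names(df)                     # "x" "2011" "my var"

# Backticks after $, or a quoted name inside [[ ]], refer to the columns
pos_2011  <- df[df$`2011` > 0, ]         # 3 rows
pos_myvar <- df[df[["my var"]] > 0, ]    # 2 rows
pos_2011
pos_myvar
```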