Data Types and Classes

Lecture 8

Author: Katie Solarz

Affiliation: Duke University, STA 199 Summer 2026: Session I

Published: May 26, 2026

Types and classes

An object’s type indicates how it is stored in memory

Common data types

You’ll commonly encounter:

logical
integer
double
character

You’ll less commonly encounter:

list
NULL
complex
raw
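
The less common types can be inspected with typeof() in the same way as the common ones. A minimal sketch in base R; the values are purely illustrative:

```r
# typeof() on the less commonly encountered types (illustrative values)
typeof(list(1, "a"))   # "list"
typeof(NULL)           # "NULL"
typeof(2 + 3i)         # "complex"
typeof(as.raw(255))    # "raw"
```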

Logical and Character

Logical: Boolean TRUE / FALSE values

<lgl>

typeof(TRUE)

[1] "logical"

typeof(FALSE)

[1] "logical"

Character: single characters, or strings of characters; wrap in quotes

<chr>

typeof("x")

[1] "character"

typeof("Hello!")

[1] "character"

typeof("TRUE")

[1] "character"

Numeric: Double and integer

Double: floating point numerical values (default numerical type)

<dbl>

typeof(2.5)

[1] "double"

typeof(3)

[1] "double"

Integer: whole-number numerical values; indicated with an L suffix

<int>

typeof(3L)

[1] "integer"
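
To check how a number is stored without reading the typeof() string, base R also offers is.*() helpers. A small sketch with illustrative values:

```r
# Both integers and doubles count as "numeric"
is.numeric(3L)   # TRUE
is.numeric(3)    # TRUE

# ...but only values written with the L suffix are stored as integers
is.integer(3)    # FALSE
is.integer(3L)   # TRUE

# The colon operator produces integers
typeof(1:3)      # "integer"
```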

Type Compatibility

Can you use different types of data together? Sometimes… but be careful!

"3" + 3

Error in `"3" + 3`:
! non-numeric argument to binary operator

3L + 3

[1] 6

typeof(3L + 3)

[1] "double"

TRUE + 3L

[1] 4

typeof(TRUE + 3L)

[1] "integer"
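
One way to make the failing "3" + 3 example work is to convert the character value yourself before doing arithmetic. A minimal sketch, assuming the string really does contain a number:

```r
# Convert the character value explicitly, then add
as.numeric("3") + 3            # 6
typeof(as.numeric("3") + 3)    # "double"

# as.integer() keeps the result an integer when both operands are integers
typeof(as.integer("3") + 3L)   # "integer"
```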

Concatenation

Vectors are constructed using the c() function, which concatenates its arguments

Double vector:

x <- c(1, 2, 3, 5)
typeof(x)

[1] "double"

Integer vector:

x <- c(1L, 2L, 3L, 5L)
typeof(x)

[1] "integer"

Character vector:

x <- c("1", "2", "3", "5")
typeof(x)

[1] "character"

Logical vector:

x <- c(TRUE, FALSE, FALSE)
typeof(x)

[1] "logical"

Converting between types

without intention…

c(2, "Just this one!")

[1] "2"              "Just this one!"

R will happily convert between various types without complaint when different types of data are concatenated in a vector. This is NOT always a good thing.

c(FALSE, 3L)

[1] 0 3

c(1.2, 3L)

[1] 1.2 3.0

c(2L, "two")

[1] "2"   "two"
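
The results above are consistent with the ordering R follows when c() mixes types: logical, then integer, then double, then character, with the result taking the most flexible type present. A short sketch with illustrative values:

```r
typeof(c(TRUE, 1L))    # "integer"
typeof(c(1L, 1.5))     # "double"
typeof(c(1.5, "a"))    # "character"

# Logical values become the strings "TRUE"/"FALSE", not "1"/"0"
c(TRUE, "a")           # "TRUE" "a"
```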

Converting between types

with intention…

x <- 1:3
x

[1] 1 2 3

typeof(x)

[1] "integer"

y <- as.character(x)
y

[1] "1" "2" "3"

typeof(y)

[1] "character"

x <- c(TRUE, FALSE)
x

[1]  TRUE FALSE

typeof(x)

[1] "logical"

y <- as.numeric(x)
y

[1] 1 0

typeof(y)

[1] "double"
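
Intentional conversion does not always succeed. A minimal sketch of what happens when a value cannot be converted; the object names raw_values and parsed are made up for this example:

```r
raw_values <- c("1", "two", "3")

# "two" cannot be read as a number, so it becomes NA. R also emits a warning,
# silenced here with suppressWarnings() to keep the output short.
parsed <- suppressWarnings(as.numeric(raw_values))
parsed            # 1 NA 3
is.na(parsed)     # FALSE TRUE FALSE

# as.integer() truncates toward zero rather than rounding
as.integer(2.9)   # 2
as.integer(-2.9)  # -2
```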

Explicit vs. implicit coercion

Explicit coercion:

When you call a function like as.character() or as.numeric() to convert a vector yourself, as in the examples above.

Implicit coercion:

Happens when you use a vector in a specific context that expects a certain type of vector.
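
The two kinds of coercion can be put side by side. A small sketch; the vector passed is hypothetical data:

```r
passed <- c(TRUE, FALSE, TRUE, TRUE)  # hypothetical data

# Explicit: we ask for the conversion ourselves
as.integer(passed)   # 1 0 1 1

# Implicit: sum() and mean() expect numbers, so TRUE/FALSE are treated as 1/0
sum(passed)          # 3
mean(passed)         # 0.75

# Implicit: paste() expects character, so the number is converted for us
paste("Lecture", 8)  # "Lecture 8"
```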

You’ve seen explicit coercion before

statsci |>
  pivot_longer(
    cols = -degree_type,
    values_to = "n",
    names_to = "year",
    names_transform = as.numeric
  )
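
Since statsci is not defined on these slides, here is a self-contained sketch of the same idea. The tibble toy and its values are a hypothetical stand-in, and the code assumes the tidyr and tibble packages are installed:

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for statsci: one row per degree type, one column per year
toy <- tibble(
  degree_type = c("BS", "MS"),
  `2019` = c(10, 5),
  `2020` = c(12, 7)
)

long <- toy |>
  pivot_longer(
    cols = -degree_type,
    values_to = "n",
    names_to = "year",
    names_transform = as.numeric  # explicit coercion of the former column names
  )

typeof(long$year)  # "double"
```

Without names_transform, the year column would hold the old column names as character strings.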

Data classes

Data types are like Lego building blocks
We can stick them together to build more complicated constructs, e.g. representations of data
The class determines this construct
Examples: factors, dates, and data frames

Data frames

We can think of data frames as vectors of equal length glued together

df <- data.frame(x = 1:2, y = 3:4)
df

  x y
1 1 3
2 2 4

typeof(df)

[1] "list"

class(df)

[1] "data.frame"

## alternatively, "bind" two columns together
x <- data.frame(x = 1:2)
y <- data.frame(y = 3:4)

df <- cbind(x, y)
df

  x y
1 1 3
2 2 4

typeof(df)

[1] "list"

class(df)

[1] "data.frame"
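
Because a data frame is stored as a list, each column keeps its own type. A minimal sketch in base R; the data frame scores and its values are made up for illustration:

```r
scores <- data.frame(id = 1:2, passed = c(TRUE, FALSE))  # illustrative values

is.list(scores)          # TRUE
length(scores)           # 2, the number of columns
sapply(scores, typeof)   # id is "integer", passed is "logical"
```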

When we use the pull() function, we extract a vector from the data frame; this is functionally the same as accessing a column with df$col_name

df |>
  pull(y)

[1] 3 4
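
For comparison, the base R ways of extracting a column look like this. A sketch using a separate object, example_df, so the df above is left untouched:

```r
example_df <- data.frame(x = 1:2, y = 3:4)

example_df$y        # 3 4
example_df[["y"]]   # 3 4
identical(example_df$y, example_df[["y"]])  # TRUE

# Single brackets keep the data frame structure instead of returning a vector
class(example_df["y"])  # "data.frame"
```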

Dates

today <- as.Date("2026-05-26")
today

[1] "2026-05-26"

typeof(today)

[1] "double"

class(today)

[1] "Date"

More on dates

We can think of a date as a number (the number of days since the origin) and a reference point (the origin, 1 Jan 1970, aka “the Unix epoch”) glued together

as.integer(today)

[1] 20599

as.integer(today) / 365 # roughly 56 yrs

[1] 56.43562

as.integer(as.Date("1970-01-01"))

[1] 0
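
Because a date is stored as a count of days, ordinary arithmetic works on it. A small sketch in base R; lecture_day is a name chosen for this example:

```r
lecture_day <- as.Date("2026-05-26")

unclass(lecture_day)                  # 20599, the stored number of days
lecture_day + 7                       # "2026-06-02"
lecture_day - as.Date("2026-01-01")   # Time difference of 145 days
```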

More on dates

The lubridate package allows you to work with / access elements of dates seamlessly

library(lubridate)

year(today)

[1] 2026

month(today)

[1] 5

day(today)

[1] 26
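
If lubridate is not available, similar pieces can be pulled out in base R with format(), at the cost of an extra conversion step. A sketch, not a replacement for the lubridate helpers:

```r
lecture_day <- as.Date("2026-05-26")

format(lecture_day, "%Y")               # "2026", a character string
as.integer(format(lecture_day, "%Y"))   # 2026
as.integer(format(lecture_day, "%m"))   # 5
as.integer(format(lecture_day, "%d"))   # 26
```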

Factors

R uses factors to handle categorical variables with a fixed and known set of possible values
factor(x = ...): “The default (ordering of levels) is the unique set of values taken by as.character(x), sorted into increasing (alphabetical) order of x”

summer_months <- c("June", "July", "June", "August", "June")
typeof(summer_months)

[1] "character"

summer_factor <- factor(summer_months)
summer_factor

[1] June   July   June   August June  
Levels: August July June

levels(summer_factor) ## print levels

[1] "August" "July"   "June"

typeof(summer_factor)

[1] "integer"

class(summer_factor)

[1] "factor"
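
The alphabetical default is rarely what we want for months. One option is to supply the levels yourself when creating the factor. A minimal sketch; summer_calendar is a name chosen for this example:

```r
summer_months <- c("June", "July", "June", "August", "June")

# Supply the levels in calendar order instead of alphabetical order
summer_calendar <- factor(summer_months, levels = c("June", "July", "August"))

levels(summer_calendar)   # "June" "July" "August"
table(summer_calendar)    # counts follow the level order: June 3, July 1, August 1
```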

More on factors

We can think of a factor as a character vector (the level labels) and an integer vector (the level numbers) glued together

glimpse(summer_factor)

 Factor w/ 3 levels "August","July",..: 3 2 3 1 3

as.integer(summer_factor)

[1] 3 2 3 1 3
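
This structure matters when the labels look like numbers: converting the factor directly returns the level numbers, not the labels. A short sketch with hypothetical data:

```r
sizes <- factor(c("10", "20", "10"))  # hypothetical data

as.integer(sizes)                 # 1 2 1, the level numbers
as.integer(as.character(sizes))   # 10 20 10, the labels read as numbers
```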

Example factor re-order

amounts <- c("low", "medium", "high", "high", "medium")
amounts_factor <- factor(amounts)
amounts_factor

[1] low    medium high   high   medium
Levels: high low medium

fct_relevel(amounts_factor, c("low", "medium", "high"))

[1] low    medium high   high   medium
Levels: low medium high
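
When the levels have a natural ranking, base R can also record that ranking with an ordered factor, which then supports comparisons. A sketch of this alternative, not the approach used above:

```r
amounts <- c("low", "medium", "high", "high", "medium")

amounts_ordered <- factor(
  amounts,
  levels = c("low", "medium", "high"),
  ordered = TRUE
)

levels(amounts_ordered)        # "low" "medium" "high"
amounts_ordered >= "medium"    # FALSE TRUE TRUE TRUE TRUE
max(amounts_ordered)           # high
```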

FAQ

Quotes vs. no quotes vs. backticks

df <- tibble(
  x = c(-2, -0.5, 0.5, 1, 2),
  `2011` = c(-2, -0.5, 0.5, 1, 2),
  `my var` = c(-2, -1, 0, 1, 2)
)
df

# A tibble: 5 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1  -2     -2         -2
2  -0.5   -0.5       -1
3   0.5    0.5        0
4   1      1          1
5   2      2          2

Referencing a column in a pipeline:

df |>
  filter("x" > 0)

# A tibble: 5 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1  -2     -2         -2
2  -0.5   -0.5       -1
3   0.5    0.5        0
4   1      1          1
5   2      2          2

"x" means the literal character string, not the column, so the comparison gives the same answer for every row and nothing is filtered out.

df |>
  filter(x > 0)

# A tibble: 3 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1   0.5    0.5        0
2   1      1          1
3   2      2          2

x means the column name in df.

df |>
  filter(`x` > 0)

# A tibble: 3 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1   0.5    0.5        0
2   1      1          1
3   2      2          2

`x` also means the column name in df.

df |>
  filter("2011" > 0)

# A tibble: 5 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1  -2     -2         -2
2  -0.5   -0.5       -1
3   0.5    0.5        0
4   1      1          1
5   2      2          2

"2011" means the literal character string.

df |>
  filter(2011 > 0)

# A tibble: 5 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1  -2     -2         -2
2  -0.5   -0.5       -1
3   0.5    0.5        0
4   1      1          1
5   2      2          2

2011 means the literal number, so 2011 > 0 is TRUE for every row and nothing is filtered out.

df |>
  filter(`2011` > 0)

# A tibble: 3 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1   0.5    0.5        0
2   1      1          1
3   2      2          2

`2011` means the column name in df.

df |>
  filter("my var" > 0)

# A tibble: 5 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1  -2     -2         -2
2  -0.5   -0.5       -1
3   0.5    0.5        0
4   1      1          1
5   2      2          2

"my var" means the literal character string.

df |>
  filter(my var > 0)

Error in parse(text = input): <text>:2:13: unexpected symbol
1: df |>
2:   filter(my var
               ^

my var means nothing: a bare name cannot contain a space, so R cannot even parse the code.

df |>
  filter(`my var` > 0)

# A tibble: 2 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1     1      1        1
2     2      2        2

`my var` means the column name in df.
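
The rules above are for referring to columns as code inside a pipeline. In base R, [[ ]] takes the column name as a string, so there quotes are the right choice. A sketch using a hypothetical data frame base_df with the same columns:

```r
base_df <- data.frame(
  x = c(-2, -0.5, 0.5, 1, 2),
  `2011` = c(-2, -0.5, 0.5, 1, 2),
  `my var` = c(-2, -1, 0, 1, 2),
  check.names = FALSE  # keep the non-syntactic names as written
)

names(base_df)        # "x" "2011" "my var"
base_df$`my var`      # backticks: the name is used as code
base_df[["my var"]]   # quotes: the name is passed as a string

# Same filter as above, written in base R
nrow(base_df[base_df$`my var` > 0, ])  # 2
```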

AE 08: Working with Factors