Data Types and Classes

Lecture 8

Author: Katie Solarz
Affiliation: Duke University, STA 199 Summer 2026: Session I
Published: May 26, 2026

Types and classes

An object’s type indicates how it is stored in memory

Common data types

You’ll commonly encounter:

  • logical
  • integer
  • double
  • character

You’ll less commonly encounter:

  • list
  • NULL
  • complex
  • raw
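
The less common types can be inspected with typeof() as well. A minimal sketch in base R; the example values are made up for illustration:

```r
# Each call returns the storage type as a character string.
typeof(list(1, "a"))  # "list": a list can hold elements of different types
typeof(NULL)          # "NULL": the empty object
typeof(1 + 2i)        # "complex"
typeof(as.raw(255))   # "raw": a single byte
```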

Logical and Character

Logical: Boolean TRUE / FALSE values

  • <lgl>
typeof(TRUE)
[1] "logical"


typeof(FALSE)
[1] "logical"

Character: single characters, or strings of characters; wrap in quotes

  • <chr>
typeof("x")
[1] "character"


typeof("Hello!")
[1] "character"


typeof("TRUE")
[1] "character"

Numeric: Double and integer

Double: floating point numerical values (default numerical type)

  • <dbl>
typeof(2.5)
[1] "double"


typeof(3)
[1] "double"

Integer: integer numerical values; indicated with an L

  • <int>
typeof(3L)
[1] "integer"
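
The difference between the two numeric types is easy to miss because the values print identically. A small sketch, with hypothetical variable names, of how one might check it:

```r
x_int <- 3L  # integer, because of the L suffix
x_dbl <- 3   # double, the default for numbers

typeof(x_int)            # "integer"
typeof(x_dbl)            # "double"
x_int == x_dbl           # TRUE: the values are equal
identical(x_int, x_dbl)  # FALSE: the storage types differ
is.numeric(x_int) && is.numeric(x_dbl)  # TRUE: both count as numeric
```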

Type Compatibility

Can you use different types of data together? Sometimes… but be careful!

"3" + 3
Error in `"3" + 3`:
! non-numeric argument to binary operator
3L + 3
[1] 6
typeof(3L + 3)
[1] "double"
TRUE + 3L
[1] 4
typeof(TRUE + 3L)
[1] "integer"
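
One way to make the failing example work is to convert the string before doing arithmetic. A minimal sketch in base R:

```r
# "3" + 3 fails because "3" is a character string, not a number.
# Converting the string first makes the intent explicit.
as.numeric("3") + 3          # 6
typeof(as.numeric("3") + 3)  # "double"

# Addition keeps the integer type only when both operands are integers.
typeof(3L + 3L)              # "integer"
typeof(3L + 3)               # "double"
```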

Concatenation

Vectors are constructed using the c() function, which concatenates its arguments

  • Double vector:

    x <- c(1, 2, 3, 5)
    typeof(x)
    [1] "double"
  • Integer vector:

    x <- c(1L, 2L, 3L, 5L)
    typeof(x)
    [1] "integer"
  • Character vector:

    x <- c("1", "2", "3", "5")
    typeof(x)
    [1] "character"
  • Logical vector:

    x <- c(TRUE, FALSE, FALSE)
    typeof(x)
    [1] "logical"

Converting between types

without intention…

c(2, "Just this one!")
[1] "2"              "Just this one!"


R will happily convert between various types without complaint when different types of data are concatenated in a vector. This is NOT always a good thing.

Converting between types

without intention…

c(FALSE, 3L)
[1] 0 3


c(1.2, 3L)
[1] 1.2 3.0


c(2L, "two")
[1] "2"   "two"
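
The results above follow a fixed ordering of types. A short sketch that checks the type of each mixed vector with typeof():

```r
# When types are mixed, c() converts everything to the most flexible type present:
# logical -> integer -> double -> character
typeof(c(FALSE, 3L))           # "integer"
typeof(c(1.2, 3L))             # "double"
typeof(c(2L, "two"))           # "character"
typeof(c(TRUE, 1L, 2.5, "a"))  # "character"
```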

Converting between types

with intention…

x <- 1:3
x
[1] 1 2 3
typeof(x)
[1] "integer"
y <- as.character(x)
y
[1] "1" "2" "3"
typeof(y)
[1] "character"

Converting between types

with intention…

x <- c(TRUE, FALSE)
x
[1]  TRUE FALSE
typeof(x)
[1] "logical"
y <- as.numeric(x)
y
[1] 1 0
typeof(y)
[1] "double"
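
Intentional conversion does not always succeed. A few base R cases worth trying, shown here as a sketch with made-up inputs:

```r
as.integer(2.9)     # 2: the decimal part is dropped, not rounded
as.logical("TRUE")  # TRUE
as.logical("yes")   # NA: not a recognised logical string
as.numeric("two")   # NA, with a warning that NAs were introduced by coercion
```

Checking for NA after a conversion is a quick way to spot values that could not be converted.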

Explicit vs. implicit coercion

Explicit coercion:

When you call a function like as.character() or as.numeric() to convert an object to a specific type.

Implicit coercion:

Happens when you use a vector in a specific context that expects a certain type of vector.
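
A common case of implicit coercion is passing a logical vector to a function that expects numbers. A minimal sketch with hypothetical data:

```r
passed <- c(TRUE, FALSE, TRUE, TRUE)  # hypothetical example data

# sum() and mean() expect numbers, so TRUE/FALSE are silently treated as 1/0.
sum(passed)   # 3: the number of TRUE values
mean(passed)  # 0.75: the proportion of TRUE values

# Comparison operators return logicals, so the same trick counts matches.
scores <- c(55, 80, 91, 67)
sum(scores > 60)  # 3
```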

You’ve seen explicit coercion before

statsci |>
  pivot_longer(
    cols = -degree_type,
    values_to = "n",
    names_to = "year",
    names_transform = as.numeric
  )
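
To see what names_transform does, here is a self-contained sketch. The degrees tibble is a hypothetical stand-in for statsci, with made-up counts:

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for statsci: one row per degree type, one column per year.
degrees <- tibble(
  degree_type = c("BS", "MS"),
  `2011` = c(10, 4),
  `2012` = c(13, 6)
)

degrees_long <- degrees |>
  pivot_longer(
    cols = -degree_type,
    values_to = "n",
    names_to = "year",
    names_transform = as.numeric  # explicit coercion of the former column names
  )

typeof(degrees_long$year)  # "double"; without names_transform it would be "character"
```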

Data classes

  • Data types are like Lego building blocks
  • We can stick them together to build more complicated constructs, e.g. representations of data
  • The class determines this construct
  • Examples: factors, dates, and data frames

Data frames

We can think of data frames like vectors of equal length glued together

df <- data.frame(x = 1:2, y = 3:4)
df
  x y
1 1 3
2 2 4


typeof(df)
[1] "list"


class(df)
[1] "data.frame"
## alternatively, "bind" two columns together
x <- data.frame(x = 1:2)
y <- data.frame(y = 3:4)

df <- cbind(x, y)
df
  x y
1 1 3
2 2 4


typeof(df)
[1] "list"


class(df)
[1] "data.frame"
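
Because typeof(df) is "list", the usual list tools work on a data frame. A base R sketch of what that looks like:

```r
df <- data.frame(x = 1:2, y = 3:4)

is.list(df)        # TRUE: a data frame is stored as a list
length(df)         # 2: one list element per column
names(df)          # "x" "y"
df[["y"]]          # 3 4: a single column is an ordinary vector
typeof(df[["y"]])  # "integer"

# Columns of different lengths cannot be glued together:
# data.frame(x = 1:2, y = 3:5) stops with an error.
```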

Data frames

We can think of data frames like vectors of equal length glued together

df <- data.frame(x = 1:2, y = 3:4)
df
  x y
1 1 3
2 2 4
typeof(df)
[1] "list"
class(df)
[1] "data.frame"
  • When we use the pull() function, we extract a vector from the data frame; this is functionally the same as accessing a column with df$col_name
df |>
  pull(y)
[1] 3 4
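
The equivalence between pull() and $ can be checked directly. A small sketch, assuming dplyr is installed:

```r
library(dplyr)

df <- data.frame(x = 1:2, y = 3:4)

identical(pull(df, y), df$y)       # TRUE
identical(pull(df, y), df[["y"]])  # TRUE

# select() keeps the data frame structure instead of returning a vector.
class(select(df, y))  # "data.frame"
```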

Dates

today <- as.Date("2026-05-26")
today
[1] "2026-05-26"
typeof(today)
[1] "double"
class(today)
[1] "Date"

More on dates

We can think of dates like a number (the count of days since the origin, 1 Jan 1970) and the origin itself (aka “the Unix epoch”) glued together

as.integer(today)
[1] 20599
as.integer(today) / 365 # roughly 56 yrs
[1] 56.43562
as.integer(as.Date("1970-01-01"))
[1] 0
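
Since a date is stored as a count of days, ordinary arithmetic works on it. A base R sketch:

```r
today <- as.Date("2026-05-26")

unclass(today)                         # 20599: the stored number of days
today - as.Date("1970-01-01")          # Time difference of 20599 days
today + 7                              # "2026-06-02": adding a number adds days
as.Date(20599, origin = "1970-01-01")  # back to "2026-05-26"
```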

More on dates

The lubridate package allows you to work with / access elements of dates seamlessly

year(today)
[1] 2026
month(today)
[1] 5
day(today)
[1] 26
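
lubridate also helps with creating dates from strings in different layouts. A minimal sketch, assuming the package is installed; the input strings are made up for illustration:

```r
library(lubridate)

# Parsing helpers are named after the order of the components in the string.
ymd("2026-05-26")
mdy("05/26/2026")
dmy("26/05/2026")

# All three produce the same Date object.
ymd("2026-05-26") == mdy("05/26/2026")  # TRUE
class(dmy("26/05/2026"))                # "Date"
```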

Factors

  • R uses factors to handle categorical variables with a fixed and known set of possible values

  • factor(x = ...): “The default (ordering of levels) is the unique set of values taken by as.character(x), sorted into increasing (alphabetical) order of x”

summer_months <- c("June", "July", "June", "August", "June")
typeof(summer_months)
[1] "character"
summer_factor <- factor(summer_months)
summer_factor
[1] June   July   June   August June  
Levels: August July June
levels(summer_factor) ## print levels
[1] "August" "July"   "June"  
typeof(summer_factor)
[1] "integer"
class(summer_factor)
[1] "factor"
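
Alphabetical order is rarely what we want for months. One option, sketched here, is to pass the levels argument to factor() to set the order up front:

```r
summer_months <- c("June", "July", "June", "August", "June")

summer_factor <- factor(
  summer_months,
  levels = c("June", "July", "August")  # calendar order instead of alphabetical
)

levels(summer_factor)      # "June" "July" "August"
as.integer(summer_factor)  # 1 2 1 3 1
table(summer_factor)       # counts are reported in level order
```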

More on factors

  • We can think of factors like a character vector (level labels) and an integer vector (level numbers) glued together
glimpse(summer_factor)
 Factor w/ 3 levels "August","July",..: 3 2 3 1 3
as.integer(summer_factor)
[1] 3 2 3 1 3
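
The integer representation matters most when the labels themselves look like numbers. A short sketch of a common pitfall, using made-up values:

```r
f <- factor(c("10", "20", "10"))

as.integer(f)                # 1 2 1: the level numbers, not the labels
as.numeric(as.character(f))  # 10 20 10: convert to labels first, then to numbers
```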

More on factors

We can use the forcats package (in tidyverse) to work with factors!

Some commonly used functions are:

  • fct_relevel(): reorder factor levels by hand

  • fct_reorder(): reorder factor levels by another variable

  • fct_infreq(): reorder factor levels by frequency

  • fct_rev(): reverse the order of factor levels
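
A quick way to compare these functions is to look at the levels each one returns. A sketch with hypothetical data, assuming forcats is installed:

```r
library(forcats)

# Hypothetical example data.
sizes  <- factor(c("small", "large", "small", "medium", "small", "large"))
prices <- c(1, 9, 2, 5, 3, 10)

levels(sizes)                       # "large" "medium" "small" (alphabetical default)
levels(fct_infreq(sizes))           # "small" "large" "medium" (most frequent first)
levels(fct_rev(sizes))              # "small" "medium" "large" (default order reversed)
levels(fct_reorder(sizes, prices))  # "small" "medium" "large" (by median price)
```

fct_reorder() summarises the second variable within each level, using the median by default, and sorts the levels by that summary.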

Example factor re-order

amounts <- c("low", "medium", "high", "high", "medium")
amounts_factor <- factor(amounts)
amounts_factor
[1] low    medium high   high   medium
Levels: high low medium
fct_relevel(amounts_factor, c("low", "medium", "high"))
[1] low    medium high   high   medium
Levels: low medium high
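
fct_relevel() does not need the full list of levels. A sketch of two shorter calls, assuming forcats is installed:

```r
library(forcats)

amounts_factor <- factor(c("low", "medium", "high", "high", "medium"))

# Naming one level moves it to the front and leaves the rest in place.
levels(fct_relevel(amounts_factor, "low"))                # "low" "high" "medium"

# after = Inf moves the named level to the end instead.
levels(fct_relevel(amounts_factor, "high", after = Inf))  # "low" "medium" "high"
```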

FAQ

Quotes VS no quotes VS backticks

df <- tibble(
  x = c(-2, -0.5, 0.5, 1, 2),
  `2011` = c(-2, -0.5, 0.5, 1, 2),
  `my var` = c(-2, -1, 0, 1, 2)
)
df
# A tibble: 5 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1  -2     -2         -2
2  -0.5   -0.5       -1
3   0.5    0.5        0
4   1      1          1
5   2      2          2

Quotes VS no quotes VS backticks

df <- tibble(
  x = c(-2, -0.5, 0.5, 1, 2),
  `2011` = c(-2, -0.5, 0.5, 1, 2),
  `my var` = c(-2, -1, 0, 1, 2)
)

Referencing a column in a pipeline:

df |>
  filter("x" > 0)
# A tibble: 5 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1  -2     -2         -2
2  -0.5   -0.5       -1
3   0.5    0.5        0
4   1      1          1
5   2      2          2

"x" means the literal character string.

df |>
  filter(x > 0)
# A tibble: 3 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1   0.5    0.5        0
2   1      1          1
3   2      2          2

x means the column name in df.

df |>
  filter(`x` > 0)
# A tibble: 3 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1   0.5    0.5        0
2   1      1          1
3   2      2          2

`x` also means the column name in df.
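
It may help to see why the quoted version keeps every row. The sketch below, assuming dplyr is installed, evaluates the comparison on its own:

```r
library(dplyr)

df <- tibble(x = c(-2, -0.5, 0.5, 1, 2))

# The number 0 is converted to the string "0", and "x" sorts after "0",
# so the condition is a single TRUE that has nothing to do with the column.
"x" > 0                    # TRUE

nrow(filter(df, "x" > 0))  # 5: the single TRUE applies to every row
nrow(filter(df, x > 0))    # 3: the column values are compared row by row
```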

Quotes VS no quotes VS backticks

df <- tibble(
  x = c(-2, -0.5, 0.5, 1, 2),
  `2011` = c(-2, -0.5, 0.5, 1, 2),
  `my var` = c(-2, -1, 0, 1, 2)
)

Referencing a column in a pipeline:

df |>
  filter("2011" > 0)
# A tibble: 5 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1  -2     -2         -2
2  -0.5   -0.5       -1
3   0.5    0.5        0
4   1      1          1
5   2      2          2

"2011" means the literal character string.

df |>
  filter(2011 > 0)
# A tibble: 5 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1  -2     -2         -2
2  -0.5   -0.5       -1
3   0.5    0.5        0
4   1      1          1
5   2      2          2

2011 means the literal number, so 2011 > 0 is always TRUE and every row is kept.

df |>
  filter(`2011` > 0)
# A tibble: 3 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1   0.5    0.5        0
2   1      1          1
3   2      2          2

`2011` means the column name in df.

Quotes VS no quotes VS backticks

df <- tibble(
  x = c(-2, -0.5, 0.5, 1, 2),
  `2011` = c(-2, -0.5, 0.5, 1, 2),
  `my var` = c(-2, -1, 0, 1, 2)
)

Referencing a column in a pipeline:

df |>
  filter("my var" > 0)
# A tibble: 5 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1  -2     -2         -2
2  -0.5   -0.5       -1
3   0.5    0.5        0
4   1      1          1
5   2      2          2

"my var" means the literal character string.

df |>
  filter(my var > 0)
Error in parse(text = input): <text>:2:13: unexpected symbol
1: df |>
2:   filter(my var
               ^

my var is not valid R syntax: a name containing a space must be wrapped in backticks.

df |>
  filter(`my var` > 0)
# A tibble: 2 × 3
      x `2011` `my var`
  <dbl>  <dbl>    <dbl>
1     1      1        1
2     2      2        2

`my var` means the column name in df.
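
Two ways to reduce the need for backticks are sketched below, assuming dplyr is installed. The new names year_2011 and my_var are hypothetical choices:

```r
library(dplyr)

df <- tibble(
  x = c(-2, -0.5, 0.5, 1, 2),
  `2011` = c(-2, -0.5, 0.5, 1, 2),
  `my var` = c(-2, -1, 0, 1, 2)
)

# Option 1: rename once, then refer to the columns without backticks.
df_clean <- df |>
  rename(year_2011 = `2011`, my_var = `my var`)

df_clean |>
  filter(my_var > 0)

# Option 2: look the column up by its name as a string with the .data pronoun.
df |>
  filter(.data[["my var"]] > 0)
```

Both filters return the same two rows as the backtick version.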

AE 08: Working with Factors