AE 01: Meet the penguins

The goal of this application exercise is to get exposure to using the computational toolkit. Let’s get started!

Quarto Code Chunks

  • Code goes in “code chunks”: these are grey boxes and can be recognized by {r} at the top.
  • To run a code chunk, click the little green right-facing arrow; to run all the code chunks above it, click the downward-pointing arrow.
  • To run part of a code chunk, highlight the lines you want to run and press Cmd + Return (Mac) or Ctrl + Enter (Windows).
  • Text goes outside of the code chunks!
#| label: code-chunk
# this is a code chunk
What’s going on?

What’s that text in the code chunk?

  • #| label: code-chunk: this blue text at the top is a label. It names the code chunk for easy reference. Code chunk labels cannot be repeated!

  • # this is a code chunk: this green text is a comment. A comment sits inside a code chunk but reads like normal text, because R ignores everything after the # and does not run it.
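To see this in action, here is a small R sketch (the label comment-demo and the variable total are made up for illustration). Quarto treats the #| line as a chunk option, while R treats every line starting with # as a comment and skips it, so only the arithmetic actually runs.

```r
#| label: comment-demo
# Quarto reads the "#|" line above as a chunk option; R sees it as a comment
total <- 2 + 3  # a comment can also follow code on the same line
# total <- 100  # "commented out" code is not run either
total
```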

Load Packages

For this application exercise, we’ll use the tidyverse and palmerpenguins packages.

library(tidyverse)
library(palmerpenguins)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.3     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Examine Data

The dataset we will use is called penguins; it was loaded with the palmerpenguins package. You’ll notice it’s not visible yet in the Environment pane, so let’s put it there.

# copy the penguins data frame from palmerpenguins into the environment
penguins <- palmerpenguins::penguins

Two useful functions for examining it are glimpse() and View().

Let’s glimpse() at it.

  • Your turn: Replace # add code here with code that “glimpses” at the penguins data frame: glimpse(penguins). Render the document and view the output.
# add code here

Now, let’s View() it.

  • Your turn: In your console, type code that “views” the penguins data frame: View(penguins).
Why in the console?

Note that View() opens the data frame in a new tab of the data viewer rather than printing information to your screen. As a best practice, avoid using View() in a Quarto code chunk, as it can cause errors when you render the document. Reserve View() for the console.

What information can you see from these two operations? How are they different?

Some R Fundamentals

You just used some functions above: library(), glimpse(), and View(). Let’s practice with some more!

Getting Help

There is a function that tells you how many rows a data frame has: nrow(). If this is your first time using it and you aren’t sure how it works, you can use ? to see its documentation.

  • Your turn: Write code to get help with the nrow() function.
#add code here

(This works for any function, not just nrow!)

  • Your turn: Now, compute the number of rows in the penguins data frame.
#add code here
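Here is a minimal sketch on a small made-up data frame, toy, rather than penguins, so the exercise above is still yours to do. The values are invented for illustration, not real penguin measurements. Alongside nrow(), it uses two other base R functions, ncol() and dim(), which report the number of columns and both dimensions at once.

```r
# a small made-up data frame standing in for penguins
toy <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  bill_depth_mm = c(18.7, 13.2, 17.9)
)

nrow(toy)  # number of rows
ncol(toy)  # number of columns
dim(toy)   # rows and columns together, as a length-2 vector
```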

Inline code

Inline code in Quarto lets you run code inside your narrative text, so the numbers you report always reflect the most up-to-date computations. The syntax is similar to a code chunk, except that you wrap the code in single backticks (`) rather than triple backticks (```) and start it with r, as in `r 2 + 2`. Let’s practice using inline code by filling in the blanks:

The penguins dataset has ___ rows and ___ columns.

Click render to see whether your inline code has worked!

Oh no!! Errors!!

Unfortunately, functions don’t always run the way you expect, especially when you give them input they weren’t designed for.

What happens if you run mean() on the data frame? Does this even make sense???

  • Your turn: try running this function on the penguins data frame and see what happens!
# add code here
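If you want a small, self-contained case to compare against, the sketch below runs mean() on a made-up two-column data frame, toy (hypothetical, not the penguins data). Because mean() expects a numeric or logical vector, R returns NA with a warning instead of averaging each column.

```r
# made-up two-column data frame
toy <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# mean() expects a numeric or logical vector, not a data frame,
# so R warns and returns NA rather than averaging the columns
result <- mean(toy)
result
```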

Accessing Columns

As we saw with the mean example, not every function works on a whole data frame. Sometimes you need just one column, which you can access with $, as in dataframe$column_name.

  • Your turn: In the code chunk below, compute the mean of the bill_depth_mm variable in the penguins data frame.
#add code here
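A minimal sketch of $ on a made-up data frame, toy (invented values, not penguin measurements): pulling out one column gives a plain numeric vector, which mean() can handle.

```r
# made-up data frame with one text column and one numeric column
toy <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  bill_depth_mm = c(18.7, 13.2, 17.9)
)

depths <- toy$bill_depth_mm  # pull out one column as a numeric vector
depths
mean(depths)  # works because mean() now receives a numeric vector
```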

Hmmm… something weird is still happening! What does this NA value mean? Do you have any guesses? How can we fix this?
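In R, NA marks a missing value. A sketch with a made-up vector, depths, shows why a single NA can spoil a summary: if one value is unknown, the average is unknown too, so mean() returns NA. The base R function is.na() flags which entries are missing.

```r
# made-up measurements with one missing value
depths <- c(18.7, NA, 17.9)

is.na(depths)       # TRUE where an entry is missing
sum(is.na(depths))  # counts the missing entries
mean(depths)        # one unknown value makes the average unknown, so NA
```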

Adding arguments

To fix our issue with mean(), we need to tell the function something more; that is, give it more than one argument.

  • Your turn: First, use ? to get help with mean(). Then, compute the mean of bill_depth_mm again, this time ignoring the NA values.
#add code here
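One way to see what “ignoring the NA values” means, sketched on a made-up vector, depths: mean() has an na.rm argument, and setting it to TRUE drops missing values before averaging. The by-hand version, which filters with is.na(), should give the same number.

```r
# made-up measurements with one missing value
depths <- c(18.7, NA, 17.9)

# na.rm = TRUE tells mean() to drop missing values before averaging
with_na_removed <- mean(depths, na.rm = TRUE)

# the same result by hand: keep only the non-missing entries, then average
by_hand <- mean(depths[!is.na(depths)])

with_na_removed
by_hand
```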

How is the document looking?

Click render to see!

Let’s push our changes to GitHub!

Remember:

  • Stage changes with the checkboxes

  • Commit with an informative message

  • Push!

Miscellaneous:

If there is extra time in class, we’ll add some other tips here!