2025-05-09
One of the big impediments to learning R in the past was the vast ecosystem of packages. Along came the tidyverse collection of packages! While not the most efficient method for programming, the tidyverse provides a coherent ecosystem for almost all common data tasks!
tidyverse Syntax
As the tidyverse is mostly concerned with the analysis and manipulation of data, the main data object used is a special version of a data frame called a tibble.
iris_tbl <- dplyr::as_tibble(iris)
class(iris_tbl)
[1] "tbl_df"     "tbl"        "data.frame"
You can see the classes of a tibble include data.frame. When a generic R function does (S3) method dispatch, it looks through the class vector from first to last. If there is a method for tbl_df, it uses that; if not, it looks for a method for tbl. If that doesn’t exist, it uses the method for data.frame.
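The dispatch order described above can be sketched in base R. This is a minimal illustration, not how any tidyverse function is actually written: the generic describe() and the names before and after are invented here, and x is a bare stand-in object that only carries the same class vector as a tibble, so the sketch runs without loading any package.

```r
# Hypothetical generic, invented for this sketch
describe <- function(x, ...) UseMethod("describe")

# Only a data.frame method exists at first
describe.data.frame <- function(x, ...) "data.frame method"

# Stand-in object with the same class vector as a tibble (not a real tibble)
x <- structure(list(), class = c("tbl_df", "tbl", "data.frame"))

# No tbl_df or tbl method exists yet, so dispatch falls through to data.frame
before <- describe(x)

# Once a tbl_df method is defined, it is found first
describe.tbl_df <- function(x, ...) "tbl_df method"
after <- describe(x)

c(before = before, after = after)
```

The same mechanism is why a tibble still works with functions that only know about data frames: if no tibble-specific method exists, the data.frame method is used.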
str(iris_tbl)
tibble [150 x 5] (S3: tbl_df/tbl/data.frame)
$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
We see the structure looks very similar to that of a data.frame.
iris_tbl
# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>
 1          5.1         3.5          1.4         0.2 setosa
 2          4.9         3            1.4         0.2 setosa
 3          4.7         3.2          1.3         0.2 setosa
 4          4.6         3.1          1.5         0.2 setosa
 5          5           3.6          1.4         0.2 setosa
 6          5.4         3.9          1.7         0.4 setosa
 7          4.6         3.4          1.4         0.3 setosa
 8          5           3.4          1.5         0.2 setosa
 9          4.4         2.9          1.4         0.2 setosa
10          4.9         3.1          1.5         0.1 setosa
# i 140 more rows
However, the default print() method for a tibble (which is used when you just type the object’s name into the console) differs from the one for a data.frame. The output is more useful and doesn’t clog up our output space: we get the number of observations, the column names and types, and only the first few rows (and as many columns as fit on screen).
Almost all of the tidyverse functions are built to work on a tibble. That is, they usually take in a tibble and output a tibble.
(Almost) all functions have similar syntax!
function_name(tibble, other_arg, ...)
This makes them perfect for chaining!
tibble |>
  function_name(other_arg, ...) |>
...
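As a concrete instance of the pattern above, here is a short sketch that assumes the dplyr package is installed. The particular steps (filter, group, summarize) and the names nested, piped, and mean_petal are chosen only for illustration; the point is that each function takes a tibble as its first argument and returns a tibble, so the nested call and the chain do the same work.

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# Nested calls: the steps have to be read from the inside out
nested <- summarize(
  group_by(filter(iris_tbl, Sepal.Length > 5), Species),
  mean_petal = mean(Petal.Length)
)

# The same steps as a chain: each result becomes the first argument of the next call
piped <- iris_tbl |>
  filter(Sepal.Length > 5) |>
  group_by(Species) |>
  summarize(mean_petal = mean(Petal.Length))

# Both versions should produce the same tibble
all.equal(nested, piped)
```

The chained version reads top to bottom in the order the steps happen, which is the main reason for preferring it.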
Note: you’ll often see the pipe from the magrittr package (%>%) used for chaining. Due to the popularity of this operator, R added its own native pipe (|>) in version 4.1.0. At this point, the functionality is almost the same, so we’ll use the base R pipe since it doesn’t require a package load.
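To see why the base R pipe needs no package, it can help to look at what the parser does with it. The sketch below assumes R 4.1.0 or later and uses only base R; the name piped_call is invented for illustration.

```r
# The native pipe is rewritten by the parser into an ordinary function call
piped_call <- quote(iris |> head(3))
deparse(piped_call)  # "head(iris, 3)"

# So the piped and unpiped forms give the same result
identical(iris |> head(3), head(iris, 3))
```

One of the small differences between the two pipes: the right-hand side of |> must be written as a call with parentheses, such as head(), whereas %>% also accepts a bare function name.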
tidyverse Packages
The tidyverse consists of a large number of packages. However, library(tidyverse) loads only the core packages (which sometimes load other packages, of course). The eight described below are taken from the tidyverse website; tidyverse 2.0.0 and later also attach lubridate as a ninth core package.
ggplot2 - ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
dplyr - dplyr provides a grammar of data manipulation, providing a consistent set of verbs that solve the most common data manipulation challenges.
tidyr - tidyr provides a set of functions that help you get to tidy data. Tidy data is data with a consistent form: in brief, every variable goes in a column, and every column is a variable.
readr - readr provides a fast and friendly way to read rectangular data (like csv, tsv, and fwf). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.
purrr - purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors. Once you master the basic concepts, purrr allows you to replace many for loops with code that is easier to write and more expressive.
tibble - tibble is a modern re-imagining of the data frame, keeping what time has proven to be effective, and throwing out what it has not. Tibbles are data.frames that are lazy and surly: they do less and complain more, forcing you to confront problems earlier, typically leading to cleaner, more expressive code.
stringr - stringr provides a cohesive set of functions designed to make working with strings as easy as possible. It is built on top of stringi, which uses the ICU C library to provide fast, correct implementations of common string manipulations.
forcats - forcats provides a suite of useful tools that solve common problems with factors. R uses factors to handle categorical variables, variables that have a fixed and known set of possible values.
We’ll spend a good bit of time on ggplot2, dplyr, tidyr, and readr. tibble will get used implicitly along the way.
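Before following along, it can be useful to check which of the eight packages listed above are available on your machine. The sketch below uses only base R; the names core and installed are invented for illustration, and nothing here installs or attaches a package.

```r
# The eight core packages named in the list above
core <- c("ggplot2", "dplyr", "tidyr", "readr",
          "purrr", "tibble", "stringr", "forcats")

# TRUE for each package whose namespace can be loaded, FALSE otherwise
installed <- vapply(core, requireNamespace, logical(1), quietly = TRUE)

# Names of any core packages that are missing
names(installed)[!installed]
```

If nothing is missing, library(tidyverse) (or loading the individual packages you need) should work as described.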
The tidyverse provides a coherent ecosystem for almost all common data tasks!
Most of its functions work on tibbles (special data frames).