Files corresponding to Short Course: Introduction to Data Science Using R
This course introduces the powerful and popular R statistical software through the RStudio integrated development environment. R is a fully developed programming language and one of the major platforms for doing data science. This course covers frequently used data structures, importing raw data, common data manipulations, summary statistics, and data visualizations through the suite of packages called the tidyverse.
R is a versatile programming language: it can fit a wide array of statistical and machine learning models, it supports collaboration well, and it makes it easy to share your analyses widely.
Before we can use these capabilities, we must import the data, likely create new variables, and subset the data appropriately. We also want to understand and validate our data through summaries. R can handle these tasks in a multitude of ways. However, that flexibility also makes R difficult to learn: there are often many ways to do the same task, and it can be overwhelming at first to determine the best method.
This course will help you gain a solid foundation in the modern use of R for the common tasks mentioned above.
The course provides a modern introduction to R through the popular suite of packages called the tidyverse. A rough outline is given below:
Day 1:
Basics of how R stores data
R Packages and the tidyverse
Reading data from common formats into R (readr package)
Using R Markdown for reproducibility (rmarkdown and knitr packages)
Common data manipulations and creating new variables (dplyr package)
Day 2:
Reshaping data for summarizing and modeling (tidyr package)
Types of data and numeric summaries (including across groups)
Creating publication ready graphs (ggplot2 package)
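To give a feel for the topics in the outline, here is a minimal sketch of a tidyverse workflow. It is not taken from the course materials: it assumes the dplyr, tidyr, and ggplot2 packages are installed, it uses the mtcars data set that ships with R in place of an imported file, and the object names (cars, by_cyl, and so on) are chosen only for illustration.

```r
# Assumption: the packages are installed, e.g. install.packages("tidyverse")
library(dplyr)
library(tidyr)
library(ggplot2)

# mtcars ships with R; convert it to a tibble and keep the car names as a column
cars <- as_tibble(mtcars, rownames = "model")

# dplyr: create a new variable and subset rows
cars_small <- cars %>%
  mutate(kpl = mpg * 0.425144) %>%   # miles per gallon -> kilometres per litre
  filter(wt < 5)                     # keep cars under 5000 lbs

# dplyr: numeric summaries across groups
by_cyl <- cars %>%
  group_by(cyl) %>%
  summarize(n = n(), mean_mpg = mean(mpg), mean_hp = mean(hp))

# tidyr: reshape the summary from wide to long format
by_cyl_long <- by_cyl %>%
  pivot_longer(cols = c(mean_mpg, mean_hp),
               names_to = "statistic", values_to = "value")

# ggplot2: a scatterplot with axis labels
p <- ggplot(cars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders")
```

Printing p (or calling print(p)) draws the plot, and printing by_cyl or by_cyl_long shows the grouped summaries in the two layouts.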
This course will make heavy use of hands-on programming. We’ll generally introduce a topic and then have exercises to practice and explore. As such, participants must bring their own laptop computer that has access to the internet and the ability to install programs and download files. This course assumes a strong working knowledge of computers and, although not required, it would be beneficial to have past experience with the logic of programming and/or executing statistical analyses.