R Projects
Markdown basics
Git & Github
Creating a blog & website with R and github
Automating R Markdown
Writing a Book with bookdown
Creating interactive apps with R shiny
Often have many files associated with an analysis
With multiple analyses things get cluttered
Want to associate different environments, histories, working directories, and source documents with each analysis
Create two new projects (with new empty folders):
One called ‘github_website’
One called ‘automation_of_markdown’
(We’ll also create one later from git.)
renv option
Let’s look at project options via Tools --> Project options
Switch between projects with the upper right menu
Modify and save the project. Note how the behavior of your R session differs depending on your project options
Open the .Rproj file in notepad
# in R) in script
Work in your ‘github_website’ project
Create a new markdown doc via menus and let’s explore it!
R Markdown file contains three important types of content:
(Optional) YAML header surrounded by ---s
Chunks of R code
Text mixed with simple text formatting instructions
---
title: "Untitled"
author: "First Last"
date: "xxxx"
output: html_document
---
CTRL/CMD + Shift + k knits (creates the output document) via this info
Can also knit via the little arrow to knit to a different format
Can knit via the rmarkdown::render() function
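As a rough illustration of knitting from code, the sketch below writes a small R Markdown file to a temporary folder and passes it to rmarkdown::render(). The file name and its contents are made up for this example, and the call is skipped when the rmarkdown package or pandoc is not available.

```r
# Build a tiny .Rmd file in a temporary folder (hypothetical content)
fence <- strrep("`", 3)  # three backticks, used to delimit the code chunk
rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(c(
  "---",
  "title: \"Untitled\"",
  "output: html_document",
  "---",
  "",
  "## R Markdown",
  "",
  "Some **bold** text.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_file)

# Knit only if rmarkdown and pandoc can be found on this machine
out_file <- NULL
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  # render() returns the path of the document it created
  out_file <- rmarkdown::render(rmd_file, output_format = "html_document", quiet = TRUE)
}
out_file
```

Inside RStudio pandoc is normally bundled, so the check mainly matters when running R from elsewhere.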
## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.
**Knit**: bold font
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.
Syntax is really easy to learn via the Cheat sheet
Quick look at the following:
Markdown syntax
Code chunks and their options
Changing type of output
# Header 1 becomes a large font header
## Header 2 becomes a slightly smaller font header
Goes up to 6 header levels
**bold** and __bold__
`code` becomes inline code
Can do lists: be sure to end each line with two spaces!
* unordered list
* item 2
    + sub-item 1
    + sub-item 2

1. ordered list
2. item 2
    + sub-item 1
    + sub-item 2
Many options depending on chunk purpose!
Can hide/show code with echo = FALSE/TRUE
Can choose if code is evaluated with eval = TRUE/FALSE
message = TRUE/FALSE and warning = TRUE/FALSE can turn on/off displaying messages/warnings
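In a document these options usually go in the chunk header, for example {r, echo = FALSE}. As a small sketch, the same options can also be set as defaults for every chunk with knitr::opts_chunk$set(), typically from a setup chunk at the top of the file. The particular values below are only an illustration.

```r
# Set default chunk options (normally done in a setup chunk of the .Rmd)
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(
    echo = FALSE,     # hide the code
    eval = TRUE,      # still run it
    message = FALSE,  # do not display messages
    warning = FALSE   # do not display warnings
  )
  # Inspect the defaults that are now in effect
  current <- knitr::opts_chunk$get(c("echo", "eval", "message", "warning"))
  str(current)
}
```

Options written in an individual chunk header override these defaults for that chunk.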
R Markdown = Digital “Notebook”: Program that weaves word processing and code.
Designed to be used in three ways (R for Data Science)
Communicating to decision makers (focus on conclusions not code)
Collaborating with other data scientists (including future you!)
As environment to do data science (can evaluate and edit/reevaluate code chunks and document what you did and what you were thinking)
R Markdown really flexible!
Change output type in the YAML header and use CTRL/CMD + Shift + k
Knit via the menu
Use code explicitly: rmarkdown::render("file.Rmd", output_format = "html_document")
For PDF output, install the tinytex package and run tinytex::install_tinytex()
Check out the R Markdown definitive guide for cool options for each type of output!
Try to implement code folding, tabsets, and a TOC in an HTML output doc. Can you do it via render?
Try to implement the kable method of printing a data frame and a TOC in a PDF. Can you do it via render?
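One possible way to approach the HTML exercise from code is to pass a format object rather than a format name to render(). The sketch below assumes a made-up input file; toc and code_folding are arguments of rmarkdown::html_document(), while the tabset comes from the {.tabset} attribute on a header inside the document. Rendering is skipped if rmarkdown or pandoc is missing.

```r
# Hypothetical input file written to a temporary folder
fence <- strrep("`", 3)  # three backticks for the chunk delimiters
rmd_file <- file.path(tempdir(), "options_demo.Rmd")
writeLines(c(
  "---",
  "title: \"Options demo\"",
  "---",
  "",
  "# Results {.tabset}",
  "",
  "## Summary",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence,
  "",
  "## Plot",
  "",
  paste0(fence, "{r}"),
  "plot(cars)",
  fence
), rmd_file)

html_out <- NULL
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  html_out <- rmarkdown::render(
    rmd_file,
    # format object carrying the options instead of the string "html_document"
    output_format = rmarkdown::html_document(toc = TRUE, code_folding = "hide"),
    quiet = TRUE
  )
}
html_out
```

The same options could instead be written under output: in the YAML header; passing them to render() keeps the source file unchanged.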
Multiple outputs in one call:
rmarkdown::render("yourfile.Rmd", output_format = c("html_document", "pdf_document", "word_document"))
You can also change the name of the output files via the output_file argument
Can you get it to output three files called ‘my.html’, ‘your.pdf’, ‘their.docx’ with one function call?
Let’s try it! (Note: You can’t specify options when rendering to multiple formats)
Great for documenting your code/thoughts and for sharing your analyses easily!
Next, we’ll learn about git/github and how RStudio can work with them
A similar Markdown language is used to render documents on github so we’ve already learned enough to create some basic webpages too :)
Ideally we want to document our process, easily collaborate, and widely share our work
To make our workflow for a project reproducible, ideally we would save different versions of our analysis, write-up, etc. along the way
git is a version control software that easily allows multiple users to work on the same project. It simply tracks the changes that we commit to the files.
clone the repo locally (or pull the files down to update it)
add your modified/new files to the repo so others can use them.
commit (i.e. prepare everything to send back to the remote repo)
push your local committed changes up to the repo on github
Everyone can work on the same branch and update as needed (sometimes there will be merge conflicts, covered shortly)
fork the repo or create their own branch to work on, rather than modify the main repository
push up your changes until they are tested or the new ‘feature’ is debugged
merge request to combine your modified repo with the main repo branch
(from https://git.logikum.hu)
Each circle represents a ‘commit’ to that repository/branch (all versions of the files at each commit are kept!)
Let’s look at an example repo and the commits done/how it is tracked!
Sign into github.com and then go to this repo
Click fork in the top right corner. This will give you a copy of this repo under your account! Under settings, rename the repo to yourname.github.io
Visit https://yourgithubusername.github.io to see the default blog page (i.e. you already have a blog :)
Let’s make some changes! Click on the _posts folder. Edit the file you see there by clicking on the name. You can then click on the pencil in the right hand header for the file display.
It may take 2 minutes but visit https://yourgithubusername.github.io again to see the changes.
To follow our previous idea (and to start doing this with R), we really don’t want to use the web interface
We can clone the repo (i.e. download the entire repo locally)
Repo main page has a green button. Click on that.
Open RStudio, go to new project, from Version Control, choose Git, and paste in the repo link. Select a directory to save this in and hit Create project!
Now have the files locally!
We need to make sure RStudio and github can communicate. Do the following:
Go to the Git tab in your Environment area
You should see some files there. These are the ones that have changed from the remote repo (the one on github)
Here you can add files that you’d like to commit up to the remote repo
Click on all of the boxes (equivalent to git add -A) and click the Commit button
This brings up a window that allows you to compare changes. If you are happy you can put a commit message in the box in the top right and click the commit button (equivalent to git commit -m "message")
Hit close on the window (you should see no errors, just a message about the commit)
Now click the push button in the top right (equivalent to git push)
You should be prompted to log in in some way. Do so!
Go to your repo on github.com and see the changes!
When working by myself on a repo, I’m not worried about merge conflicts with other people’s changes. As such, my workflow is as follows:
Open the appropriate project in RStudio
Go to the Terminal (switched to Bash) and type git pull (or use the git tab)
Work… at a good spot for saving, back to the terminal
Type git add -A to add all files that have been modified
Type git commit -m "Message" to commit the staged changes
Type git push to push the local changes to the remote repo
Go to your blog repo and make a change via the web interface
Your local repo is no longer up to date! (Type git status in the terminal to check). You’ll need to pull down the changes.
Now, let’s update from RStudio!
Update the about page of your blog by editing ‘about.md’. Make changes like before and commit!
Push up the changes. In 2-3 minutes you should see the About page updates :)
Let’s create a repo with our already created ‘github_website’ project
Go to github.com and create a repo with that name (Use the + in the top right corner - don’t initiate a git ignore file)
Open the github_website project we worked on earlier
Go to Tools –> Project Options –> Git (restart R as requested)
Now in the terminal do the following:
git remote add origin [paste the clone link here] (initiate the tracking of the project on github)
git pull origin main (download all remote files)
git push -u origin main (track changes on this machine)
Add, commit, and push your files up!
Very easy! Two steps:
github_document
Let’s take a break from git!
Other markdown functionality: Using parameters
Can be added to the YAML header and can be used to automate reports!
Suppose we are dealing with a football box score data set
library(tidyverse) #provides read_csv(), tibble(), filter(), and pwalk()
NFLData <- read_csv("https://www4.stat.ncsu.edu/~online/datasets/scoresFull.csv")
NFLData
## # A tibble: 3,471 x 82
##   week  date  day   season awayTeam     AQ1   AQ2   AQ3   AQ4   AOT  AOT2 AFinal
##   <chr> <chr> <chr>  <dbl> <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 1     5-Sep Thu     2002 San Franc~     3     0     7     6    -1    -1     16
## 2 1     8-Sep Sun     2002 Minnesota~     3    17     0     3    -1    -1     23
## 3 1     8-Sep Sun     2002 New Orlea~     6     7     7     0     6    -1     26
## 4 1     8-Sep Sun     2002 New York ~     0    17     3    11     6    -1     37
## 5 1     8-Sep Sun     2002 Arizona C~    10     3     3     7    -1    -1     23
## # ... with 3,466 more rows, and 70 more variables: homeTeam <chr>, HQ1 <dbl>,
## #   HQ2 <dbl>, HQ3 <dbl>, HQ4 <dbl>, HOT <dbl>, HOT2 <dbl>, HFinal <dbl>,
## #   stadium <chr>, startTime <time>, toss <chr>, roof <chr>, surface <chr>,
## #   duration <dbl>, attendance <chr>, weather <chr>, vegasLine <chr>, OU <chr>,
## #   AfirstDowns <dbl>, AnetPassYds <dbl>, AtotalYds <dbl>, Aturnovers <dbl>,
## #   AtotalPlays <dbl>, HfirstDowns <dbl>, HnetPassYds <dbl>, HtotalYds <dbl>,
## #   Hturnovers <dbl>, HtotalPlays <dbl>, OUvalue <dbl>, OUresult <chr>,
## #   awayRushAtt <dbl>, awayRushYds <dbl>, awayRushTD <dbl>, awayPassComp <dbl>,
## #   awayPassAtt <dbl>, awayPassYds <dbl>, awayPassTD <dbl>, awayPassInt <dbl>,
## #   awayTimesSacked <dbl>, awaySackYdsLost <dbl>, awayFum <dbl>,
## #   awayFumLost <dbl>, awayNumPen <dbl>, awayPenYds <dbl>, away3rdConv <dbl>,
## #   away3rdAtt <dbl>, away4thConv <dbl>, away4thAtt <dbl>, awayTOP <dbl>,
## #   homeRushAtt <dbl>, homeRushYds <dbl>, homeRushTD <dbl>, homePassComp <dbl>,
## #   homePassAtt <dbl>, homePassYds <dbl>, homePassTD <dbl>, homePassInt <dbl>,
## #   homeTimesSacked <dbl>, homeSackYdsLost <dbl>, homeFum <dbl>,
## #   homeFumLost <dbl>, homeNumPen <dbl>, homePenYds <dbl>, home3rdConv <dbl>,
## #   home3rdAtt <dbl>, home4thConv <dbl>, home4thAtt <dbl>, homeTOP <dbl>,
## #   HminusAScore <dbl>, homeSpread <dbl>
Parameters can be added to the YAML header
title: "NFL Reports"
author: "Justin Post"
output: html_document
params:
  team: "Pittsburgh Steelers"
Can ‘Knit with parameters’
In .Rmd, access via params$team
May want to create a similar document/output for all 32 teams
rmarkdown::render("NFL.Rmd", output_file = "Cleveland Browns.html", params = list(team = "Cleveland Browns"))
Plan:
Create data frame that has
file names to output to
list with each team name for using in render()
For one team the row would be (last column’s value is really a list with one value in it)
##                output_file                team
## 1 Pittsburgh Steelers.html Pittsburgh Steelers
#get unique teams
teamIDs <- unique(NFLData$awayTeam)
#create filenames
output_file <- paste0(teamIDs, ".html")
#create a list for each team with just the team name parameter
params <- lapply(teamIDs, FUN = function(x){list(team = x)})
#put into a data frame
reports <- tibble(output_file, params)
reports
## # A tibble: 32 x 2
##   output_file              params
##   <chr>                    <list>
## 1 San Francisco 49ers.html <named list [1]>
## 2 Minnesota Vikings.html   <named list [1]>
## 3 New Orleans Saints.html  <named list [1]>
## 4 New York Jets.html       <named list [1]>
## 5 Arizona Cardinals.html   <named list [1]>
## # ... with 27 more rows
Now knit using apply() or via purrr::pwalk()
library(rmarkdown)
#need to use x[[1]] to get at elements since tibble doesn't simplify
apply(reports, MARGIN = 1,
      FUN = function(x){
        render(input = "files/NFL.Rmd", output_file = x[[1]], params = x[[2]])
      })
#or with pwalk (args are .l, .f, and ...)
#.l is a list of lists, .f is function, formula, or vector
pwalk(reports, render, input = "files/NFL.Rmd")
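To see how each file name gets paired with its parameter list without rendering anything, here is a minimal base R sketch. The two team names are stand-ins for unique(NFLData$awayTeam), and describe_render() is a hypothetical placeholder for render() that only reports what would be produced.

```r
# Stand-in for unique(NFLData$awayTeam)
teamIDs <- c("Pittsburgh Steelers", "Cleveland Browns")

# One output file name and one parameter list per team
output_file <- paste0(teamIDs, ".html")
params <- lapply(teamIDs, function(x) list(team = x))

# Hypothetical placeholder for rmarkdown::render(): returns a description only
describe_render <- function(output_file, params) {
  paste0("render ", output_file, " with team = ", params$team)
}

# Map() walks the two vectors in parallel, like pwalk() over the rows of reports
planned <- unlist(Map(describe_render, output_file, params), use.names = FALSE)
print(planned)
```

Swapping describe_render for a function that calls render(input = "files/NFL.Rmd", ...) would give the same row-by-row behavior as the calls above.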
This can be done with multiple parameters.
Nice way to automate creation
Could put into a nice pipeline
Create file to update NFL data each week (scrape new data and add to .csv file)
Create .Rmd file that you want for each team
Create file to submit creation of documents with params
Put all into one file (say with source)
Try to do a similar process where you use the iris data frame and create three separate analyses based on the Species column.
Create a parameter called species
Run the following code in your markdown doc:
myIris <- filter(iris, Species == params$species)
summary(myIris)
plot(myIris)
Use apply() or pwalk() to generate the three reports.
You can easily write a book using RStudio too!
Can be done via File –> New Project –> New Directory –> Book Project Using bookdown
Instead we’ll clone this repo and then create an R project from a git repository
Let’s see how to build the book!
Chapters are just .Rmd files (named appropriately, e.g. 01-intro.Rmd) that start with a #
Seems like a good idea to talk about merging branches!
If you do a commit on your branch, you may notice something like this
Suppose you like your commit and you think I will too!
Create a pull request
If you are lucky, there won’t be any merge conflicts.
Allows the owner of the original repo to accept the pull request without needing to modify things
The owner will get a notification that a pull request has been made
Owner can then investigate the request and choose whether or not to accept it or they can ask for more details
Owner sees a notification about conflicts that must be resolved
They can view the issues and pick which to include or to include both with a modification
<<<<<<< is a conflict marker
<<< === >>> lines