class: center, middle, inverse, title-slide

.title[
# Workflows & Git/GitHub Basics
]
.author[
### Justin Post
]

---

# What do we want to be able to do?

Data Science!

- Read in raw data and manipulate it
- Combine data sources
- Summarize data to glean insights
- Apply common analysis methods
- Communicate effectively

---

# Workflow

Important considerations for a data analysis workflow:

- **Reproducibility**

<br>
<br>
<br>

- **Version control**

<br>
<br>
<br>

- **Collaboration**

---

# What are Git and GitHub?

- **Git** is version control software
- **GitHub** is an online hosting service for Git-based projects

## Git Tracking

- You associate Git with a folder (repo)
- Git keeps track of all files in the folder (repo)
- To keep changes you've made, you **commit** them to the repo and **push** them to the remote copy (e.g., on GitHub)

---

# GitHub

- GitHub allows you to have a remote file repository (folder) tracked by Git
    + Let's create a repository on github.com
    + **Add** some files and **commit** the changes
    + Modify some files on GitHub
    + Investigate the version control!

---

# Local vs Remote Work

Mostly you'll want to work on your local computer. Install `git` on your computer!

## Workflow

1. (Initially) **clone** the repo locally. (Later) **pull** to get the most recent versions of files
2. Work and make changes
3. **add** and **commit** the changes you like
4. **push** changes to the remote repo (on GitHub)

Let's clone our repo and work on it locally!

---

# Git & RStudio

Git and RStudio work great together!

- Works through **R Projects**

<br>
<br>
<br>

- Start a new project from a Git repo
    + Update with the command line or the Git menu!
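---

# Workflow from the Command Line: a Sketch

The four workflow steps (clone or pull, make changes, add and commit, push) can be sketched from the command line roughly as follows. This is only an illustration under stated assumptions: a local bare repository named `remote.git` stands in for the GitHub remote so that no network or account is needed, and the file name, commit message, and user identity are placeholders rather than anything from the course repo.

```shell
#!/bin/sh
set -e

# Work in a scratch directory so nothing here touches real projects.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the remote repo on GitHub (assumption: a local bare repo).
git init --quiet --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

# 1. (Initially) clone the repo locally.
#    Cloning an empty repo prints a harmless warning, hidden here.
git clone --quiet remote.git project 2>/dev/null
cd project
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Example User"        # placeholder identity for commits
git config user.email "user@example.com"

# 2. Work and make changes.
echo "# My analysis" > README.md

# 3. add and commit the changes you like.
git add README.md
git commit --quiet -m "Add README"

# 4. push changes to the remote repo.
git push --quiet -u origin main

# (Later) pull to get the most recent versions of files.
git pull --quiet

# Show the history that is now stored in the remote.
git -C ../remote.git log --oneline
```

With a real GitHub repo, the `git init --bare` lines would be dropped and `git clone` would be given the repository's URL instead of `remote.git`.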
---

# Collaboration Idea

- Everyone can work on the same `branch` and just take turns working

<img src="data:image/png;base64,#img/git_diagram.png" width="500px" style="display: block; margin: auto;" />

---

# Working on Branches

- Alternatively, you can have separate **branches** of the repo

<img src="data:image/png;base64,#img/git_branch_diagram.png" width="500px" style="display: block; margin: auto;" />

- Working on a branch is similar to working on the main branch
- You can merge when happy!

---

# Forking

- People often `fork` the repo
- This creates a copy of the repo on your account
- You can then work as normal
- If you make a commit on your branch, you may notice something like this

<img src="data:image/png;base64,#img/forkcommit.png" width="45%" style="display: block; margin: auto;" />

---

# Merging Branches

Suppose you like your commit and you think the original owner will too!

- You can open a **pull request**

<img src="data:image/png;base64,#img/pullrequest.png" width="79%" style="display: block; margin: auto;" />

---

# Merging Branches

Suppose you like your commit and you think the original owner will too!

- You can open a **pull request**

<img src="data:image/png;base64,#img/pullinfo.png" width="75%" style="display: block; margin: auto;" />

---

# Merging Branches

If you are lucky, there won't be any merge conflicts.
- Having no conflicts allows the owner of the original repo to accept the pull request without needing to modify anything
- The owner will get a notification that a pull request has been made

<img src="data:image/png;base64,#img/pullrequestnoted.png" width="45%" style="display: block; margin: auto;" />

---

# Merging Branches

The owner can then investigate the request and choose whether to accept it, or ask for more details

<img src="data:image/png;base64,#img/pullcheck2.png" width="70%" style="display: block; margin: auto;" />

---

# Dealing with conflicts

- Sometimes the requested changes conflict with changes already made

<img src="data:image/png;base64,#img/mergeconflict.png" width="70%" style="display: block; margin: auto;" />

---

# Dealing with conflicts

The owner sees a notification about conflicts that must be resolved

<img src="data:image/png;base64,#img/resolve.png" width="70%" style="display: block; margin: auto;" />

---

# Dealing with conflicts

They can view the conflicting changes and pick which to include, or include both with a modification

`<<<<<<<` is a conflict marker

<img src="data:image/png;base64,#img/resolve2.png" width="70%" style="display: block; margin: auto;" />

- Decide what to keep and delete the `<<<<<<<`, `=======`, and `>>>>>>>` marker lines

---

# Recap

- **Git** is version control software
    + Associated with a folder (repo)
    + Tracks changes to files
- **GitHub** is an online hosting service for Git-based projects
- Workflow:
    + Pull down the most recent files (`git pull`) or do the initial download (`git clone`)
    + Add files you want to keep changes to (`git add`)
    + Commit the changes (`git commit`)
    + Push the changes to the remote repo (`git push`)
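---

# Conflict Markers: a Sketch

To see where the conflict markers from the "Dealing with conflicts" slides come from, the sketch below recreates a conflict locally instead of through a GitHub pull request. Everything specific in it is an assumption made for illustration: the scratch repo, the `feature` branch name, the file `notes.txt`, its contents, and the placeholder user identity.

```shell
#!/bin/sh
set -e

# Scratch repo so nothing here touches real projects.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Example User"        # placeholder identity for commits
git config user.email "user@example.com"

# A shared starting point on main.
echo "Analysis of sales data" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# A collaborator edits the line on their own branch...
git checkout --quiet -b feature
echo "Analysis of 2023 sales data" > notes.txt
git commit --quiet -am "Mention the year"

# ...while the same line is changed differently on main.
git checkout --quiet main
echo "Analysis of regional sales data" > notes.txt
git commit --quiet -am "Mention the regions"

# The merge stops with a conflict; '|| true' keeps the script going.
git merge feature >/dev/null 2>&1 || true

# The file now contains the <<<<<<<, =======, and >>>>>>> marker lines.
conflicted=$(cat notes.txt)
printf '%s\n' "$conflicted"

# Resolve: decide what to keep, remove the markers, then add and commit.
echo "Analysis of 2023 regional sales data" > notes.txt
git add notes.txt
git commit --quiet -m "Merge feature, keeping both changes"
```

In the printed file, the version from `main` appears between `<<<<<<<` and `=======`, and the version from `feature` appears between `=======` and `>>>>>>>`. The final commit here keeps both changes with a modification, which is one of the options described on the conflict slides.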