class: center, middle, inverse, title-slide

.title[
# Workflows & Git/GitHub Basics
]
.author[
### Justin Post
]

---

# What do we want to be able to do?

Data Science!

- Read in raw data and manipulate it
- Combine data sources
- Summarize data to glean insights
- Apply common analysis methods
- Communicate effectively

---

# Workflow

Important considerations for a data analysis workflow:

- **Reproducibility**

<br> <br> <br>

- **Version control**

<br> <br> <br>

- **Collaboration**

---

# What are Git and GitHub?

- **Git** is version control software
- **GitHub** is an online hosting service for Git-based projects

---

# What are Git and GitHub?

- **Git** is version control software
- **GitHub** is an online hosting service for Git-based projects

## Git Tracking

- You associate Git with a folder (repo)
- Git keeps track of all files in the folder (repo)
- If you want to keep changes you've made, you **commit** them to the repo and **push** them to its remote copy

---

# GitHub

- GitHub allows you to have a remote file repository (folder) tracked by Git
    + Let's create a repository on github.com
    + **Add** some files and **commit** to the changes
    + Modify some files on GitHub
    + Investigate the version control!

---

# Local vs Remote Work

Mostly you'll want to work on your local computer. Install `git` on your computer!

---

# Local vs Remote Work

Mostly you'll want to work on your local computer. Install `git` on your computer!

## Workflow

1. (Initially) **clone** the repo locally. (Later) **pull** to get the most recent versions of files
2. Work and make changes
3. **add** and **commit** to changes you like
4. **push** changes to the remote repo (on GitHub)

Let's clone our repo and work on it locally!

---

# Git & RStudio

Git and RStudio work great together!

- Works through **R Projects**

<br> <br> <br>

- Start a new project from a Git repo
    + Update with the command line or the Git menu!
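
---

# Workflow from the Command Line

The four workflow steps can be tried without a GitHub account. This is only a sketch under some assumptions: a local bare repo (`remote.git`) stands in for the repo on github.com, and the folder names, file names, and the "Demo User" identity are made up for illustration.

```shell
set -e                          # stop at the first failing command
demo=$(mktemp -d)               # throwaway folder for the demo
cd "$demo"

# Stand-in for a repo created on github.com (hypothetical names throughout)
git init -q seed
cd seed
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# demo" > README.md
git add README.md
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed remote.git

# 1. clone the repo locally
git clone -q remote.git work
cd work
git config user.name "Demo User"
git config user.email "demo@example.com"

# 2. work and make changes
echo "x <- 1" > analysis.R

# 3. add and commit to the changes you like
git add analysis.R
git commit -q -m "Add analysis script"

# 4. push the changes to the remote repo
git push -q origin HEAD

# Later: pull to get the most recent versions of files
git pull -q
```

In real use, step 1 would be `git clone` followed by the URL GitHub shows for your repo, and the seeding part at the top would not be needed.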
---

# Collaboration Idea

- Everyone can work on the same `branch` and just take turns working

<img src="data:image/png;base64,#img/git_diagram.png" alt="Flow chart with four bubbles moving from left to right. The first bubble is the initial repo, then the second represents a commit, the third and fourth also represent future commits." width="500px" style="display: block; margin: auto;" />

---

# Working on Branches

- Alternatively, you can have separate **branches** of the repo

<img src="data:image/png;base64,#img/git_branch_diagram.png" alt="Flow chart with many bubbles moving from left to right. The first bubble is the initial repo, the other bubbles represent different commits. On one commit a 'branch' comes off and has a separate flow with its own commits. It then 'merges' back to the main branch at a later commit." width="500px" style="display: block; margin: auto;" />

- Working on a branch is similar to working on the main branch
- You can merge when happy!

---

# Forking

- People often `fork` the repo
- This creates a copy of the repo on your account
- You can then work as normal
- If you make a commit on your branch, you may notice something like this

<img src="data:image/png;base64,#img/forkcommit.png" alt="A message on github shows a message saying 'This branch is 1 commit ahead of jbpost2:main.'" width="45%" style="display: block; margin: auto;" />

---

# Merging Branches

Suppose you like your commit and you think the original owner will too!

- You can open a `pull request`

<img src="data:image/png;base64,#img/pullrequest.png" alt="The 'Pull requests' tab on github is shown. Any pull requests are listed there." width="79%" style="display: block; margin: auto;" />

---

# Merging Branches

Suppose you like your commit and you think the original owner will too!

- You can open a `pull request`

<img src="data:image/png;base64,#img/pullinfo.png" alt="A pull request is shown. A file and its changes are displayed."
width="75%" style="display: block; margin: auto;" />

---

# Merging Branches

If you are lucky, there won't be any merge conflicts.

- This allows the owner of the original repo to accept the pull request without needing to modify things
- The owner will get a notification that a pull request has been made

<img src="data:image/png;base64,#img/pullrequestnoted.png" alt="A notification on github is shown next to the 'pull requests' menu item." width="45%" style="display: block; margin: auto;" />

---

# Merging Branches

The owner can then investigate the request and choose whether to accept it, or ask for more details

<img src="data:image/png;base64,#img/pullcheck2.png" alt="The github pull request screen is shown. Here one can choose to merge a pull request." width="70%" style="display: block; margin: auto;" />

---

# Dealing with conflicts

- Sometimes the requested changes conflict with changes already made

<img src="data:image/png;base64,#img/mergeconflict.png" alt="An example of a pull request that cannot be merged automatically is shown. The user must then manage the merge themselves." width="70%" style="display: block; margin: auto;" />

---

# Dealing with conflicts

The owner sees a notification about conflicts that must be resolved

<img src="data:image/png;base64,#img/resolve.png" alt="A list of conflicts on github is shown. A 'resolve conflicts' button appears, allowing the user to resolve issues." width="70%" style="display: block; margin: auto;" />

---

# Dealing with conflicts

They can view the conflicts and pick which version to include, or include both with a modification

`<<<<<<<` is a conflict marker

<img src="data:image/png;base64,#img/resolve2.png" alt="An example conflict is shown with <<<< ===== >>>> designating the areas that must be manually selected."
width="70%" style="display: block; margin: auto;" />

- Figure out what to keep, then delete the `<<<<<<<`, `=======`, and `>>>>>>>` marker lines

---

# Recap

- **Git** is version control software
    + Associated with a folder (repo)
    + Tracks changes to files
- **GitHub** is an online hosting service for Git-based projects
- Workflow:
    + Pull down the most recent files (`git pull`) or do the initial download (`git clone`)
    + Add files you want to keep changes to (`git add`)
    + Commit to the changes (`git commit`)
    + Push the changes to the remote repo (`git push`)
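
---

# Try It: A Merge Conflict

The conflict markers from the "Dealing with conflicts" slides can be reproduced on your own computer. The sketch below uses a plain local merge rather than a GitHub pull request, and the repo name (`conflict-demo`), file name (`notes.txt`), branch name (`feature`), and "Demo User" identity are all made up for illustration.

```shell
set -e                                   # stop at the first failing command
demo=$(mktemp -d)                        # throwaway folder for the demo
cd "$demo"
git init -q conflict-demo
cd conflict-demo
git config user.name "Demo User"
git config user.email "demo@example.com"

# Starting point shared by both branches
echo "title: draft" > notes.txt
git add notes.txt
git commit -q -m "Start notes"
base=$(git rev-parse --abbrev-ref HEAD)  # name of the default branch

# Change the line on a separate branch
git checkout -q -b feature
echo "title: feature version" > notes.txt
git commit -q -a -m "Edit title on feature"

# Change the same line differently on the default branch
git checkout -q "$base"
echo "title: base version" > notes.txt
git commit -q -a -m "Edit title on base branch"

# The merge stops with a conflict; '|| true' lets the script continue
git merge feature || true
cat notes.txt                            # shows the conflict markers

# Resolve: keep the content you want, with the marker lines removed
echo "title: combined version" > notes.txt
git add notes.txt
git commit -q -m "Resolve conflict in notes.txt"
```

With Git's default settings, the `cat` step should show a `<<<<<<< HEAD` section, a `=======` divider, and a `>>>>>>> feature` line around the two competing versions. Here the resolution simply overwrites the file; in practice you would edit it by hand and decide what to keep.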