layout: false class: title-slide-section-red, middle # `SQL` Joins Justin Post --- # Relational Databases - Often want to combine data from multiple tables to summarize/model <img src="https://www4.stat.ncsu.edu/online/datasets/chinook-database-diagram.png" width="600px" style="display: block; margin: auto;" /> --- # Create Our Own Database and Do Joins! ```python import sqlite3 import pandas as pd con = sqlite3.connect(':memory:') cursor = con.cursor() cursor.execute("CREATE TABLE IF NOT EXISTS dept (name TEXT, rank TEXT);") ``` ``` ## <sqlite3.Cursor object at 0x0000022F83CABE30> ``` <div style="float: left; width: 45%;"> ```python cursor.execute( """ INSERT INTO dept (name, rank) VALUES ("Justin", "Associate"), ("Jung-Ying", "Full"), ("Arnab", "Associate"), ("Spencer", "Full"); """) ``` ``` ## <sqlite3.Cursor object at 0x0000022F83CABE30> ``` </div> <div style="float: right; width: 45%;"> ```python pd.read_sql("SELECT * FROM dept", con) ``` ``` ## name rank ## 0 Justin Associate ## 1 Jung-Ying Full ## 2 Arnab Associate ## 3 Spencer Full ``` </div> <!--comment--> --- # Create Our Own Database and Do Joins! ```python cursor = con.cursor() cursor.execute("CREATE TABLE IF NOT EXISTS seminar (name TEXT, topic TEXT);") ``` ``` ## <sqlite3.Cursor object at 0x0000022F83CDC260> ``` <div style="float: left; width: 45%;"> ```python cursor.execute( """ INSERT INTO seminar (name, topic) VALUES ("Jung-Ying", "Genetics"), ("Jonathan", "Design"), ("Arnab", "ML"), ("Dennis", "Non-parametrics"); """) ``` ``` ## <sqlite3.Cursor object at 0x0000022F83CDC260> ``` </div> <div style="float: right; width: 45%;"> ```python pd.read_sql("SELECT * FROM seminar", con) ``` ``` ## name topic ## 0 Jung-Ying Genetics ## 1 Jonathan Design ## 2 Arnab ML ## 3 Dennis Non-parametrics ``` </div> <!--comment--> --- # Joins Combining two (or more) tables in `SQL` is called doing a **join** - Many types of joins: `left_join()`, `right_join()`, `inner_join()`, `full_join()` are most common - Inner Join: Returns records with matching keys in both tables <img src="img/inner_join.png" width="350px" style="display: block; margin: auto;" /> --- # Inner Join: Returns records with matching keys <div style="float: left; width: 45%;"> `Dept` ``` ## name rank ## 0 Justin Associate ## 1 Jung-Ying Full ## 2 Arnab Associate ## 3 Spencer Full ``` </div> <div style="float: right; width: 45%;"> `seminar` ``` ## name topic ## 0 Jung-Ying Genetics ## 1 Jonathan Design ## 2 Arnab ML ## 3 Dennis Non-parametrics ``` </div> <div style = "content: ''; clear: both; display: table;"> </div> <!--comment--> <br> ```python inner = """ SELECT d.name, d.rank, s.topic FROM dept as d INNER JOIN seminar as s ON s.name = d.name """ pd.read_sql(inner, con) ``` ``` ## name rank topic ## 0 Jung-Ying Full Genetics ## 1 Arnab Associate ML ``` --- # Inner Join: Returns records with matching keys <div style="float: left; width: 45%;"> `Dept` ``` ## name rank ## 0 Justin Associate ## 1 Jung-Ying Full ## 2 Arnab Associate ## 3 Spencer Full ``` </div> <div style="float: right; width: 45%;"> `seminar` ``` ## name topic ## 0 Jung-Ying Genetics ## 1 Jonathan Design ## 2 Arnab ML ## 3 Dennis Non-parametrics ``` </div> <div style = "content: ''; clear: both; display: table;"> </div> <!--comment--> <br> ```python pd.merge( left = pd.read_sql("SELECT * FROM dept", con), right = pd.read_sql("SELECT * FROM seminar", con), how = "inner", on = "name") ``` ``` ## name rank topic ## 0 Jung-Ying Full Genetics ## 1 Arnab Associate ML ``` --- # Joins - Left Join: Returns all records from the 'left' table and any matching records from the 'right' table <img src="img/left_join.png" width="350px" style="display: block; margin: auto;" /> --- # Left Join: Return left table and matching right records <div style="float: left; width: 45%;"> `Dept` ``` ## name rank ## 0 Justin Associate ## 1 Jung-Ying Full ## 2 Arnab Associate ## 3 Spencer Full ``` </div> <div style="float: right; width: 45%;"> `seminar` ``` ## name topic ## 0 Jung-Ying Genetics ## 1 Jonathan Design ## 2 Arnab ML ## 3 Dennis Non-parametrics ``` </div> <div style = "content: ''; clear: both; display: table;"> </div> <!--comment--> <br> ```python left = """ SELECT d.name, d.rank, s.topic FROM dept as d LEFT JOIN seminar as s ON s.name = d.name """ pd.read_sql(left, con) ``` ``` ## name rank topic ## 0 Justin Associate None ## 1 Jung-Ying Full Genetics ## 2 Arnab Associate ML ## 3 Spencer Full None ``` --- # Left Join: Return left table and matching right records <div style="float: left; width: 45%;"> `Dept` ``` ## name rank ## 0 Justin Associate ## 1 Jung-Ying Full ## 2 Arnab Associate ## 3 Spencer Full ``` </div> <div style="float: right; width: 45%;"> `seminar` ``` ## name topic ## 0 Jung-Ying Genetics ## 1 Jonathan Design ## 2 Arnab ML ## 3 Dennis Non-parametrics ``` </div> <div style = "content: ''; clear: both; display: table;"> </div> <!--comment--> <br> ```python pd.merge( left = pd.read_sql("SELECT * FROM dept", con), right = pd.read_sql("SELECT * FROM seminar", con), how = "left", on = "name") ``` ``` ## name rank topic ## 0 Justin Associate NaN ## 1 Jung-Ying Full Genetics ## 2 Arnab Associate ML ## 3 Spencer Full NaN ``` --- # Joins - Right Join: Returns all records from the 'right' table and any matching records from the 'left' table <img src="img/right_join.png" width="350px" style="display: block; margin: auto;" /> - Not supported in `sqlite`'s `SQL`! Just do a left join and switch the tables. --- # Joins - Outer Join: Returns all records when there is a match from the 'left' or 'right' table <img src="img/outer_join.png" width="350px" style="display: block; margin: auto;" /> - Not supported in `sqlite`'s `SQL`! Have to do some work! --- # Outer Join: Return all matches from both tables <div style="float: left; width: 45%;"> `Dept` ``` ## name rank ## 0 Justin Associate ## 1 Jung-Ying Full ## 2 Arnab Associate ## 3 Spencer Full ``` </div> <div style="float: right; width: 45%;"> `seminar` ``` ## name topic ## 0 Jung-Ying Genetics ## 1 Jonathan Design ## 2 Arnab ML ## 3 Dennis Non-parametrics ``` </div> <div style = "content: ''; clear: both; display: table;"> </div> <!--comment--> <br> ```python outer = """ SELECT d.name, d.rank, s.topic FROM dept as d LEFT JOIN seminar as s ON s.name = d.name UNION SELECT s.name, d.rank, s.topic FROM seminar as s LEFT JOIN dept as d ON s.name = d.name """ pd.read_sql(outer, con) ``` ``` ## name rank topic ## 0 Arnab Associate ML ## 1 Dennis None Non-parametrics ## 2 Jonathan None Design ## 3 Jung-Ying Full Genetics ## 4 Justin Associate None ## 5 Spencer Full None ``` --- # Outer Join: Return all matches from both tables <div style="float: left; width: 45%;"> `Dept` ``` ## name rank ## 0 Justin Associate ## 1 Jung-Ying Full ## 2 Arnab Associate ## 3 Spencer Full ``` </div> <div style="float: right; width: 45%;"> `seminar` ``` ## name topic ## 0 Jung-Ying Genetics ## 1 Jonathan Design ## 2 Arnab ML ## 3 Dennis Non-parametrics ``` </div> <div style = "content: ''; clear: both; display: table;"> </div> <!--comment--> <br> ```python pd.merge( left = pd.read_sql("SELECT * FROM dept", con), right = pd.read_sql("SELECT * FROM seminar", con), how = "outer", on = "name") ``` ``` ## name rank topic ## 0 Justin Associate NaN ## 1 Jung-Ying Full Genetics ## 2 Arnab Associate ML ## 3 Spencer Full NaN ## 4 Jonathan NaN Design ## 5 Dennis NaN Non-parametrics ``` --- # Cross Join Other `sqlite` supported join is the cross join - Returns every combination of rows from the left table with the right table ```python cross = """ SELECT * FROM dept CROSS JOIN seminar """ pd.read_sql(cross, con) ``` ``` ## name rank name topic ## 0 Justin Associate Jung-Ying Genetics ## 1 Justin Associate Jonathan Design ## 2 Justin Associate Arnab ML ## 3 Justin Associate Dennis Non-parametrics ## 4 Jung-Ying Full Jung-Ying Genetics ## 5 Jung-Ying Full Jonathan Design ## 6 Jung-Ying Full Arnab ML ## 7 Jung-Ying Full Dennis Non-parametrics ## 8 Arnab Associate Jung-Ying Genetics ## 9 Arnab Associate Jonathan Design ## 10 Arnab Associate Arnab ML ## 11 Arnab Associate Dennis Non-parametrics ## 12 Spencer Full Jung-Ying Genetics ## 13 Spencer Full Jonathan Design ## 14 Spencer Full Arnab ML ## 15 Spencer Full Dennis Non-parametrics ``` --- # Other Joins Lots of other joins out there! - [See here for examples](https://www.sqlitetutorial.net/sqlite-join/) of how to implement them in sqlite! + The right sidebar has more than the standard joins. - Also ways to do [if then else type logic](https://www.sqlitetutorial.net/sqlite-case/), [intersections](https://www.sqlitetutorial.net/sqlite-intersect/), etc. - Can do basic [summaries using `SQL`](https://www.sqlitetutorial.net/sqlite-avg/) as well (including [grouping](https://www.sqlitetutorial.net/sqlite-group-by/)), but we'll just use `python` for that! --- # To JupyterLab! - Let's look at the chinook database and more involved joins! --- # Recap - Joins allows us to combine two (or more) tables into one - inner, left, and cross are all supported by `sqlite` - `pandas` allows for left, right, outer, inner, and cross via the `pd.merge()` function - Can write `SQL` code and use `pd.read_sql()`