import sqlite3
import pandas as pd
= sqlite3.connect("chinook.db") con
SQL Joins on chinook
Now let’s do some joins on the chinook database. We’ll reread in the packages are make our connection to the database.
Using pandas we can check out the tables returned as a data frame.
"SELECT * FROM sqlite_schema WHERE type = 'table';", con) pd.read_sql(
type | name | tbl_name | rootpage | sql | |
---|---|---|---|---|---|
0 | table | albums | albums | 2 | CREATE TABLE "albums"\r\n(\r\n [AlbumId] IN... |
1 | table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
2 | table | artists | artists | 4 | CREATE TABLE "artists"\r\n(\r\n [ArtistId] ... |
3 | table | customers | customers | 5 | CREATE TABLE "customers"\r\n(\r\n [Customer... |
4 | table | employees | employees | 8 | CREATE TABLE "employees"\r\n(\r\n [Employee... |
5 | table | genres | genres | 10 | CREATE TABLE "genres"\r\n(\r\n [GenreId] IN... |
6 | table | invoices | invoices | 11 | CREATE TABLE "invoices"\r\n(\r\n [InvoiceId... |
7 | table | invoice_items | invoice_items | 13 | CREATE TABLE "invoice_items"\r\n(\r\n [Invo... |
8 | table | media_types | media_types | 15 | CREATE TABLE "media_types"\r\n(\r\n [MediaT... |
9 | table | playlists | playlists | 16 | CREATE TABLE "playlists"\r\n(\r\n [Playlist... |
10 | table | playlist_track | playlist_track | 17 | CREATE TABLE "playlist_track"\r\n(\r\n [Pla... |
11 | table | tracks | tracks | 20 | CREATE TABLE "tracks"\r\n(\r\n [TrackId] IN... |
12 | table | sqlite_stat1 | sqlite_stat1 | 864 | CREATE TABLE sqlite_stat1(tbl,idx,stat) |
13 | table | justin_music | justin_music | 865 | CREATE TABLE justin_music (\n album TEX... |
The tracks and albums tables each share a variable/column/key of AlbumId
. Let’s do an inner join on those using pd.merge()
. Remember that we can pull the full tables into pandas using pd.read_sql()
and then use pandas
to do the join!
= pd.merge(left = pd.read_sql("SELECT * FROM tracks", con), right = pd.read_sql("SELECT * FROM albums", con),
tracks_albums = "inner",
how = "AlbumId")
on tracks_albums
TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice | Title | ArtistId | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 | For Those About To Rock We Salute You | 1 |
1 | 6 | Put The Finger On You | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 205662 | 6713451 | 0.99 | For Those About To Rock We Salute You | 1 |
2 | 7 | Let's Get It Up | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 233926 | 7636561 | 0.99 | For Those About To Rock We Salute You | 1 |
3 | 8 | Inject The Venom | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 210834 | 6852860 | 0.99 | For Those About To Rock We Salute You | 1 |
4 | 9 | Snowballed | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 203102 | 6599424 | 0.99 | For Those About To Rock We Salute You | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3498 | 3499 | Pini Di Roma (Pinien Von Rom) \ I Pini Della V... | 343 | 2 | 24 | None | 286741 | 4718950 | 0.99 | Respighi:Pines of Rome | 226 |
3499 | 3500 | String Quartet No. 12 in C Minor, D. 703 "Quar... | 344 | 2 | 24 | Franz Schubert | 139200 | 2283131 | 0.99 | Schubert: The Late String Quartets & String Qu... | 272 |
3500 | 3501 | L'orfeo, Act 3, Sinfonia (Orchestra) | 345 | 2 | 24 | Claudio Monteverdi | 66639 | 1189062 | 0.99 | Monteverdi: L'Orfeo | 273 |
3501 | 3502 | Quintet for Horn, Violin, 2 Violas, and Cello ... | 346 | 2 | 24 | Wolfgang Amadeus Mozart | 221331 | 3665114 | 0.99 | Mozart: Chamber Music | 274 |
3502 | 3503 | Koyaanisqatsi | 347 | 2 | 10 | Philip Glass | 206005 | 3305164 | 0.99 | Koyaanisqatsi (Soundtrack from the Motion Pict... | 275 |
3503 rows × 11 columns
Now we’ll join that table we just made with another table, invoice_items
and do a left join (treating invoice_items
as the left table). We’ll join on the key that is shared, trackId
.
next = pd.merge(left = pd.read_sql("SELECT * FROM invoice_items", con),
= tracks_albums,
right= "left",
how = "TrackId")
on next
InvoiceLineId | InvoiceId | TrackId | UnitPrice_x | Quantity | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice_y | Title | ArtistId | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 2 | 0.99 | 1 | Balls to the Wall | 2 | 2 | 1 | None | 342562 | 5510424 | 0.99 | Balls to the Wall | 2 |
1 | 2 | 1 | 4 | 0.99 | 1 | Restless and Wild | 3 | 2 | 1 | F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. D... | 252051 | 4331779 | 0.99 | Restless and Wild | 2 |
2 | 3 | 2 | 6 | 0.99 | 1 | Put The Finger On You | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 205662 | 6713451 | 0.99 | For Those About To Rock We Salute You | 1 |
3 | 4 | 2 | 8 | 0.99 | 1 | Inject The Venom | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 210834 | 6852860 | 0.99 | For Those About To Rock We Salute You | 1 |
4 | 5 | 2 | 10 | 0.99 | 1 | Evil Walks | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 263497 | 8611245 | 0.99 | For Those About To Rock We Salute You | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2235 | 2236 | 411 | 3136 | 0.99 | 1 | Looking For Love | 141 | 1 | 3 | Sykes | 391941 | 12769847 | 0.99 | Greatest Hits | 100 |
2236 | 2237 | 411 | 3145 | 0.99 | 1 | Sweet Lady Luck | 141 | 1 | 3 | Vandenberg | 273737 | 8919163 | 0.99 | Greatest Hits | 100 |
2237 | 2238 | 411 | 3154 | 0.99 | 1 | Feirinha da Pavuna/Luz do Repente/Bagaço da La... | 248 | 1 | 7 | Arlindo Cruz/Franco/Marquinhos PQD/Negro, Jove... | 107206 | 3593684 | 0.99 | Ao Vivo [IMPORT] | 155 |
2238 | 2239 | 411 | 3163 | 0.99 | 1 | Samba pras moças | 248 | 1 | 7 | Grazielle/Roque Ferreira | 152816 | 5121366 | 0.99 | Ao Vivo [IMPORT] | 155 |
2239 | 2240 | 412 | 3177 | 1.99 | 1 | Hot Girl | 249 | 3 | 19 | None | 1325458 | 267836576 | 1.99 | The Office, Season 1 | 156 |
2240 rows × 15 columns
Lastly, we’ll now combine that table with the invoices
table using an outer join on the key InvoiceId
. Note that you can actually do all of these joins in SQL using one call (which is likely much more efficient!). Please see the additional readings for topic 2 for more information.
= pd.read_sql("SELECT * FROM invoices", con),
pd.merge(left = next,
right = "outer",
how = "InvoiceId") on
InvoiceId | CustomerId | InvoiceDate | BillingAddress | BillingCity | BillingState | BillingCountry | BillingPostalCode | Total | InvoiceLineId | ... | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice_y | Title | ArtistId | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 2009-01-01 00:00:00 | Theodor-Heuss-Straße 34 | Stuttgart | None | Germany | 70174 | 1.98 | 1 | ... | Balls to the Wall | 2 | 2 | 1 | None | 342562 | 5510424 | 0.99 | Balls to the Wall | 2 |
1 | 1 | 2 | 2009-01-01 00:00:00 | Theodor-Heuss-Straße 34 | Stuttgart | None | Germany | 70174 | 1.98 | 2 | ... | Restless and Wild | 3 | 2 | 1 | F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. D... | 252051 | 4331779 | 0.99 | Restless and Wild | 2 |
2 | 2 | 4 | 2009-01-02 00:00:00 | Ullevålsveien 14 | Oslo | None | Norway | 0171 | 3.96 | 3 | ... | Put The Finger On You | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 205662 | 6713451 | 0.99 | For Those About To Rock We Salute You | 1 |
3 | 2 | 4 | 2009-01-02 00:00:00 | Ullevålsveien 14 | Oslo | None | Norway | 0171 | 3.96 | 4 | ... | Inject The Venom | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 210834 | 6852860 | 0.99 | For Those About To Rock We Salute You | 1 |
4 | 2 | 4 | 2009-01-02 00:00:00 | Ullevålsveien 14 | Oslo | None | Norway | 0171 | 3.96 | 5 | ... | Evil Walks | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 263497 | 8611245 | 0.99 | For Those About To Rock We Salute You | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2235 | 411 | 44 | 2013-12-14 00:00:00 | Porthaninkatu 9 | Helsinki | None | Finland | 00530 | 13.86 | 2236 | ... | Looking For Love | 141 | 1 | 3 | Sykes | 391941 | 12769847 | 0.99 | Greatest Hits | 100 |
2236 | 411 | 44 | 2013-12-14 00:00:00 | Porthaninkatu 9 | Helsinki | None | Finland | 00530 | 13.86 | 2237 | ... | Sweet Lady Luck | 141 | 1 | 3 | Vandenberg | 273737 | 8919163 | 0.99 | Greatest Hits | 100 |
2237 | 411 | 44 | 2013-12-14 00:00:00 | Porthaninkatu 9 | Helsinki | None | Finland | 00530 | 13.86 | 2238 | ... | Feirinha da Pavuna/Luz do Repente/Bagaço da La... | 248 | 1 | 7 | Arlindo Cruz/Franco/Marquinhos PQD/Negro, Jove... | 107206 | 3593684 | 0.99 | Ao Vivo [IMPORT] | 155 |
2238 | 411 | 44 | 2013-12-14 00:00:00 | Porthaninkatu 9 | Helsinki | None | Finland | 00530 | 13.86 | 2239 | ... | Samba pras moças | 248 | 1 | 7 | Grazielle/Roque Ferreira | 152816 | 5121366 | 0.99 | Ao Vivo [IMPORT] | 155 |
2239 | 412 | 58 | 2013-12-22 00:00:00 | 12,Community Centre | Delhi | None | India | 110017 | 1.99 | 2240 | ... | Hot Girl | 249 | 3 | 19 | None | 1325458 | 267836576 | 1.99 | The Office, Season 1 | 156 |
2240 rows × 23 columns