#bring in the module
import sqlite3
#make the connection to the .db file. My file is in the main folder on the left
= sqlite3.connect("chinook.db") con
Data Storage and Basic SQL
Justin Post
Before we get into Big Data, we need to understand the most common way to store many different, related, datasets. We’ve already looked at basic ways to store data in a single file:
.csv
(comma separated value).txt
(generic, usually delimited, data).dat
(generic, usually delimited, data).JSON
(think dictionary)
When we have multiple data sets (or sources) a database is often used along with a database management system (DBMS).
Consider the diagram below. This shows many different tables (think data frames) such as playlists
, playlist_track
, and tracks
. These tables are linked through keys such as the Playlistid
or Trackid
.
- The data, the DBMS, and the applications associated with them are often simply called a ‘database’
- As the data tables are generally linked, we call this a relational database
- The DBMS software is used to create, edit, and maintain database files and records, enabling easier file and record creation, data entry, data editing, updating, and reporting.
In this notebook we explore databases and the basics of the common structured query language (SQL) used to interact with them.
Note: These types of webpages are built from Jupyter notebooks (.ipynb
files). You can access your own versions of them by clicking here. It is highly recommended that you go through and run the notebooks yourself, modifying and rerunning things where you’d like!
Common Database Software
Many common types of relational databases management systems (RDBMS) exist. Some are free and some are not. A few common ones are:
Most RDBMS have their own Structured Query Language (SQL), however the basic functionality is similar across them!
Actions on Databases
There are a few common actions we often want to perform on a database. The acronym CRUD is used to describe four actions:
- Create data
- Read data
- Update data
- Delete data
We will write SQL code to do these actions!
You could imagine that, with multiple users possibly accessing a database, we need to be very careful in how we do these actions. There are four properties that relational database transactions must have:
- Atomicity defines all the elements that make up a complete database transaction.
- Consistency defines the rules for maintaining data points in a correct state after a transaction.
- Isolation keeps the effect of a transaction invisible to others until it is committed, to avoid confusion.
- Durability ensures that data changes become permanent once the transaction is committed.
This is commonly referred to as ACID properties.
Let’s explore using SQLite
with python
!
Accessing an SQLite Database in python
The sqlite3
module provides the ability to connect and perform actions on an SQLite
database.
- We need to import this module
- Then we just need a path to our database file (if we already have one, otherwise one will be created at the path - if possible)
- We’ll look at the commonly used
chinook
database the diagram above described. - As we need to have the ability to write to the database from our notebook, we can’t share the same database. That means you’ll need to download the
chinook.db
file and upload it to your folder area on the left! - If you want the changes to remain after you are done, you’d need to download that file before closing the session!
Accessing the Table Schema
Every SQLite database contains a “schema table” with information about that database. This describes the - tables - indices (special lookup tables to improve efficiency of queries) - triggers (named database object that is executed automatically when an INSERT, UPDATE or DELETE statement is issued against the associated table) - views (read-only tables, combinations of tables, etc.)
A schema file contains one row for each table, index, view, and trigger in the schema.
We can get the schema by issuing an SQL command! A common SQL command for querying a database looks like this: - SELECT column1, column2 FROM table WHERE logical_of_some_kind;
For SQLite
from python
, usually we’ll follow this structure:
- Create a
cursor
object usingcursor = con.cursor()
- Write our SQL command as a string
- Execute the SQL code using the
.execute()
method (cursor.execute()
) on ourcursor
object - Use the
.fetchall()
method (cursor.fetchall()
) to actually return the data requested - Close the conneciton made by the cursor (
cursor.close()
)
- Below we use the multiline comment (three quotation marks to start and end) in order to write more legible SQL code
#create a 'cursor' object from our connection
= con.cursor()
cursor
#SQL query to return all table names in the data base
#The * indicates we want to select everything
= '''
get_schema SELECT *
FROM sqlite_schema
WHERE type = "table";
'''
#execute the SQL query on the database!
cursor.execute(get_schema)
#The information for the query is stored in memory. We use the fetchall() method to actually return the information
= cursor.fetchall()
result
#finall we close the connection the cursor made with that query
cursor.close()
We can now look at what is stored in result
. Here it is actually a list
that we can cycle through and print information from.
print(type(result))
for i in result:
print(i)
<class 'list'>
('table', 'albums', 'albums', 2, 'CREATE TABLE "albums"\r\n(\r\n [AlbumId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,\r\n [Title] NVARCHAR(160) NOT NULL,\r\n [ArtistId] INTEGER NOT NULL,\r\n FOREIGN KEY ([ArtistId]) REFERENCES "artists" ([ArtistId]) \r\n\t\tON DELETE NO ACTION ON UPDATE NO ACTION\r\n)')
('table', 'sqlite_sequence', 'sqlite_sequence', 3, 'CREATE TABLE sqlite_sequence(name,seq)')
('table', 'artists', 'artists', 4, 'CREATE TABLE "artists"\r\n(\r\n [ArtistId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,\r\n [Name] NVARCHAR(120)\r\n)')
('table', 'customers', 'customers', 5, 'CREATE TABLE "customers"\r\n(\r\n [CustomerId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,\r\n [FirstName] NVARCHAR(40) NOT NULL,\r\n [LastName] NVARCHAR(20) NOT NULL,\r\n [Company] NVARCHAR(80),\r\n [Address] NVARCHAR(70),\r\n [City] NVARCHAR(40),\r\n [State] NVARCHAR(40),\r\n [Country] NVARCHAR(40),\r\n [PostalCode] NVARCHAR(10),\r\n [Phone] NVARCHAR(24),\r\n [Fax] NVARCHAR(24),\r\n [Email] NVARCHAR(60) NOT NULL,\r\n [SupportRepId] INTEGER,\r\n FOREIGN KEY ([SupportRepId]) REFERENCES "employees" ([EmployeeId]) \r\n\t\tON DELETE NO ACTION ON UPDATE NO ACTION\r\n)')
('table', 'employees', 'employees', 8, 'CREATE TABLE "employees"\r\n(\r\n [EmployeeId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,\r\n [LastName] NVARCHAR(20) NOT NULL,\r\n [FirstName] NVARCHAR(20) NOT NULL,\r\n [Title] NVARCHAR(30),\r\n [ReportsTo] INTEGER,\r\n [BirthDate] DATETIME,\r\n [HireDate] DATETIME,\r\n [Address] NVARCHAR(70),\r\n [City] NVARCHAR(40),\r\n [State] NVARCHAR(40),\r\n [Country] NVARCHAR(40),\r\n [PostalCode] NVARCHAR(10),\r\n [Phone] NVARCHAR(24),\r\n [Fax] NVARCHAR(24),\r\n [Email] NVARCHAR(60),\r\n FOREIGN KEY ([ReportsTo]) REFERENCES "employees" ([EmployeeId]) \r\n\t\tON DELETE NO ACTION ON UPDATE NO ACTION\r\n)')
('table', 'genres', 'genres', 10, 'CREATE TABLE "genres"\r\n(\r\n [GenreId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,\r\n [Name] NVARCHAR(120)\r\n)')
('table', 'invoices', 'invoices', 11, 'CREATE TABLE "invoices"\r\n(\r\n [InvoiceId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,\r\n [CustomerId] INTEGER NOT NULL,\r\n [InvoiceDate] DATETIME NOT NULL,\r\n [BillingAddress] NVARCHAR(70),\r\n [BillingCity] NVARCHAR(40),\r\n [BillingState] NVARCHAR(40),\r\n [BillingCountry] NVARCHAR(40),\r\n [BillingPostalCode] NVARCHAR(10),\r\n [Total] NUMERIC(10,2) NOT NULL,\r\n FOREIGN KEY ([CustomerId]) REFERENCES "customers" ([CustomerId]) \r\n\t\tON DELETE NO ACTION ON UPDATE NO ACTION\r\n)')
('table', 'invoice_items', 'invoice_items', 13, 'CREATE TABLE "invoice_items"\r\n(\r\n [InvoiceLineId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,\r\n [InvoiceId] INTEGER NOT NULL,\r\n [TrackId] INTEGER NOT NULL,\r\n [UnitPrice] NUMERIC(10,2) NOT NULL,\r\n [Quantity] INTEGER NOT NULL,\r\n FOREIGN KEY ([InvoiceId]) REFERENCES "invoices" ([InvoiceId]) \r\n\t\tON DELETE NO ACTION ON UPDATE NO ACTION,\r\n FOREIGN KEY ([TrackId]) REFERENCES "tracks" ([TrackId]) \r\n\t\tON DELETE NO ACTION ON UPDATE NO ACTION\r\n)')
('table', 'media_types', 'media_types', 15, 'CREATE TABLE "media_types"\r\n(\r\n [MediaTypeId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,\r\n [Name] NVARCHAR(120)\r\n)')
('table', 'playlists', 'playlists', 16, 'CREATE TABLE "playlists"\r\n(\r\n [PlaylistId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,\r\n [Name] NVARCHAR(120)\r\n)')
('table', 'playlist_track', 'playlist_track', 17, 'CREATE TABLE "playlist_track"\r\n(\r\n [PlaylistId] INTEGER NOT NULL,\r\n [TrackId] INTEGER NOT NULL,\r\n CONSTRAINT [PK_PlaylistTrack] PRIMARY KEY ([PlaylistId], [TrackId]),\r\n FOREIGN KEY ([PlaylistId]) REFERENCES "playlists" ([PlaylistId]) \r\n\t\tON DELETE NO ACTION ON UPDATE NO ACTION,\r\n FOREIGN KEY ([TrackId]) REFERENCES "tracks" ([TrackId]) \r\n\t\tON DELETE NO ACTION ON UPDATE NO ACTION\r\n)')
('table', 'tracks', 'tracks', 20, 'CREATE TABLE "tracks"\r\n(\r\n [TrackId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,\r\n [Name] NVARCHAR(200) NOT NULL,\r\n [AlbumId] INTEGER,\r\n [MediaTypeId] INTEGER NOT NULL,\r\n [GenreId] INTEGER,\r\n [Composer] NVARCHAR(220),\r\n [Milliseconds] INTEGER NOT NULL,\r\n [Bytes] INTEGER,\r\n [UnitPrice] NUMERIC(10,2) NOT NULL,\r\n FOREIGN KEY ([AlbumId]) REFERENCES "albums" ([AlbumId]) \r\n\t\tON DELETE NO ACTION ON UPDATE NO ACTION,\r\n FOREIGN KEY ([GenreId]) REFERENCES "genres" ([GenreId]) \r\n\t\tON DELETE NO ACTION ON UPDATE NO ACTION,\r\n FOREIGN KEY ([MediaTypeId]) REFERENCES "media_types" ([MediaTypeId]) \r\n\t\tON DELETE NO ACTION ON UPDATE NO ACTION\r\n)')
('table', 'sqlite_stat1', 'sqlite_stat1', 864, 'CREATE TABLE sqlite_stat1(tbl,idx,stat)')
Often we want to put this information into our common pandas
data frame format!
import pandas as pd
#create a data frame
= pd.DataFrame(result)
schema_df schema_df
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | table | albums | albums | 2 | CREATE TABLE "albums"\r\n(\r\n [AlbumId] IN... |
1 | table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
2 | table | artists | artists | 4 | CREATE TABLE "artists"\r\n(\r\n [ArtistId] ... |
3 | table | customers | customers | 5 | CREATE TABLE "customers"\r\n(\r\n [Customer... |
4 | table | employees | employees | 8 | CREATE TABLE "employees"\r\n(\r\n [Employee... |
5 | table | genres | genres | 10 | CREATE TABLE "genres"\r\n(\r\n [GenreId] IN... |
6 | table | invoices | invoices | 11 | CREATE TABLE "invoices"\r\n(\r\n [InvoiceId... |
7 | table | invoice_items | invoice_items | 13 | CREATE TABLE "invoice_items"\r\n(\r\n [Invo... |
8 | table | media_types | media_types | 15 | CREATE TABLE "media_types"\r\n(\r\n [MediaT... |
9 | table | playlists | playlists | 16 | CREATE TABLE "playlists"\r\n(\r\n [Playlist... |
10 | table | playlist_track | playlist_track | 17 | CREATE TABLE "playlist_track"\r\n(\r\n [Pla... |
11 | table | tracks | tracks | 20 | CREATE TABLE "tracks"\r\n(\r\n [TrackId] IN... |
12 | table | sqlite_stat1 | sqlite_stat1 | 864 | CREATE TABLE sqlite_stat1(tbl,idx,stat) |
Pretty good, just missing column names. For the schema table, those are standardized.
#current column names
schema_df.columns
RangeIndex(start=0, stop=5, step=1)
={0: 'type', 1: 'name', 2: 'tbl_name', 3: 'rootpage', 4: 'sql'}, inplace = True)
schema_df.rename(columns schema_df
type | name | tbl_name | rootpage | sql | |
---|---|---|---|---|---|
0 | table | albums | albums | 2 | CREATE TABLE "albums"\r\n(\r\n [AlbumId] IN... |
1 | table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
2 | table | artists | artists | 4 | CREATE TABLE "artists"\r\n(\r\n [ArtistId] ... |
3 | table | customers | customers | 5 | CREATE TABLE "customers"\r\n(\r\n [Customer... |
4 | table | employees | employees | 8 | CREATE TABLE "employees"\r\n(\r\n [Employee... |
5 | table | genres | genres | 10 | CREATE TABLE "genres"\r\n(\r\n [GenreId] IN... |
6 | table | invoices | invoices | 11 | CREATE TABLE "invoices"\r\n(\r\n [InvoiceId... |
7 | table | invoice_items | invoice_items | 13 | CREATE TABLE "invoice_items"\r\n(\r\n [Invo... |
8 | table | media_types | media_types | 15 | CREATE TABLE "media_types"\r\n(\r\n [MediaT... |
9 | table | playlists | playlists | 16 | CREATE TABLE "playlists"\r\n(\r\n [Playlist... |
10 | table | playlist_track | playlist_track | 17 | CREATE TABLE "playlist_track"\r\n(\r\n [Pla... |
11 | table | tracks | tracks | 20 | CREATE TABLE "tracks"\r\n(\r\n [TrackId] IN... |
12 | table | sqlite_stat1 | sqlite_stat1 | 864 | CREATE TABLE sqlite_stat1(tbl,idx,stat) |
Alternatively, we can use the function read_sql()
from pandas
to automatically put the result into a data frame!
- Here we don’t need a
cursor
object, we just pass theSQL
string and the connection object.
= pd.read_sql(get_schema, con)
schema_df2 schema_df2
type | name | tbl_name | rootpage | sql | |
---|---|---|---|---|---|
0 | table | albums | albums | 2 | CREATE TABLE "albums"\r\n(\r\n [AlbumId] IN... |
1 | table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
2 | table | artists | artists | 4 | CREATE TABLE "artists"\r\n(\r\n [ArtistId] ... |
3 | table | customers | customers | 5 | CREATE TABLE "customers"\r\n(\r\n [Customer... |
4 | table | employees | employees | 8 | CREATE TABLE "employees"\r\n(\r\n [Employee... |
5 | table | genres | genres | 10 | CREATE TABLE "genres"\r\n(\r\n [GenreId] IN... |
6 | table | invoices | invoices | 11 | CREATE TABLE "invoices"\r\n(\r\n [InvoiceId... |
7 | table | invoice_items | invoice_items | 13 | CREATE TABLE "invoice_items"\r\n(\r\n [Invo... |
8 | table | media_types | media_types | 15 | CREATE TABLE "media_types"\r\n(\r\n [MediaT... |
9 | table | playlists | playlists | 16 | CREATE TABLE "playlists"\r\n(\r\n [Playlist... |
10 | table | playlist_track | playlist_track | 17 | CREATE TABLE "playlist_track"\r\n(\r\n [Pla... |
11 | table | tracks | tracks | 20 | CREATE TABLE "tracks"\r\n(\r\n [TrackId] IN... |
12 | table | sqlite_stat1 | sqlite_stat1 | 864 | CREATE TABLE sqlite_stat1(tbl,idx,stat) |
Ahh, that’s better!
Querying a Table
Now that we know which tables exist, we can query them! Let’s return all the albums in the albums
table.
We go through the same process as before (as we closed our cursor
object we need to create a new one):
- Create our cursor using
con.cursor()
- Write our SQL command as a string
- Execute the SQL code using the
.execute()
method (cursor.execute()
) on ourcursor
object - Use the
.fetchall()
method (cursor.fetchall()
) to actually return the data requested - Close the conneciton made by the cursor (
cursor.close()
)
The FROM
function allows us to choose which table to query. LIMIT 20
says to limit what is returned to 20 rows.
#create the cursor instance
= con.cursor()
cursor #create the SQL string
= '''
get_albums SELECT *
FROM albums
LIMIT 20;
'''
#execute the query
= cursor.execute(get_albums)
albums #grab the results
= albums.fetchall()
album_results #close the cursor
cursor.close()#check the results
album_results
[(1, 'For Those About To Rock We Salute You', 1),
(2, 'Balls to the Wall', 2),
(3, 'Restless and Wild', 2),
(4, 'Let There Be Rock', 1),
(5, 'Big Ones', 3),
(6, 'Jagged Little Pill', 4),
(7, 'Facelift', 5),
(8, 'Warner 25 Anos', 6),
(9, 'Plays Metallica By Four Cellos', 7),
(10, 'Audioslave', 8),
(11, 'Out Of Exile', 8),
(12, 'BackBeat Soundtrack', 9),
(13, 'The Best Of Billy Cobham', 10),
(14, 'Alcohol Fueled Brewtality Live! [Disc 1]', 11),
(15, 'Alcohol Fueled Brewtality Live! [Disc 2]', 11),
(16, 'Black Sabbath', 12),
(17, 'Black Sabbath Vol. 4 (Remaster)', 12),
(18, 'Body Count', 13),
(19, 'Chemical Wedding', 14),
(20, 'The Best Of Buddy Guy - The Millenium Collection', 15)]
Ok, when just reading a table, let’s go with pd.read_sql()
instead since it returns things in a nicer format!
= pd.read_sql(get_albums, con)
album_results album_results
AlbumId | Title | ArtistId | |
---|---|---|---|
0 | 1 | For Those About To Rock We Salute You | 1 |
1 | 2 | Balls to the Wall | 2 |
2 | 3 | Restless and Wild | 2 |
3 | 4 | Let There Be Rock | 1 |
4 | 5 | Big Ones | 3 |
5 | 6 | Jagged Little Pill | 4 |
6 | 7 | Facelift | 5 |
7 | 8 | Warner 25 Anos | 6 |
8 | 9 | Plays Metallica By Four Cellos | 7 |
9 | 10 | Audioslave | 8 |
10 | 11 | Out Of Exile | 8 |
11 | 12 | BackBeat Soundtrack | 9 |
12 | 13 | The Best Of Billy Cobham | 10 |
13 | 14 | Alcohol Fueled Brewtality Live! [Disc 1] | 11 |
14 | 15 | Alcohol Fueled Brewtality Live! [Disc 2] | 11 |
15 | 16 | Black Sabbath | 12 |
16 | 17 | Black Sabbath Vol. 4 (Remaster) | 12 |
17 | 18 | Body Count | 13 |
18 | 19 | Chemical Wedding | 14 |
19 | 20 | The Best Of Buddy Guy - The Millenium Collection | 15 |
Process for doing a CRUD activity
(Create, Read, Update, Delete)
If we wanted to do something other than just read a table, we need to follow our general structure from above:
- Create a connection using
sqlite3.connect(path)
- Create a
cursor
object associated with the connection - Write an SQL query as a string
- Use
cursor.execute()
(orcursor.executemany()
) to execute the SQL - Close the
cursor
object
Common SQL commands
- CREATE TABLE - creates a new table
- INSERT INTO - adds records to a table
- UPDATE - modify existing records
- DELETE FROM - deletes data
- DROP TABLE - removes a table
- SELECT - reads data (use
fetchone()
,fetchall()
, or the returned value as an iterator but we’ll usepd.read_sql()
for simple read commands)
Also many important joins we’ll cover shortly.
Let’s go through a few of these actions to show how we can write our SQL code and execute it! Note that SQL code is not case sensitive.
Create a Table
Let’s start by creating a table. We write the SQL code and specify the name of the table and and variables we want to create.
- We specify the type of data the variable will hold after naming the variable
- Check this reference for possible data types!
#create the cursor instance
= con.cursor()
cursor #write our SQL to create a table
#here we also specify two variables: album and artist along with the type of data they'll hold
= """
ct CREATE TABLE IF NOT EXISTS justin_music (
album TEXT,
artist TEXT);
"""
#execute the SQL code
cursor.execute(ct)
<sqlite3.Cursor at 0x7a23d8f825c0>
Let’s check that it worked before we close the connection!
'''
pd.read_sql( SELECT *
FROM justin_music;
''', con)
album | artist |
---|
Cool, now let’s close the connection.
= con.cursor() cursor
You can see we’ll do this process a lot. It may be best to create a function to help us out. (Modified from https://realpython.com/python-sql-libraries/)
def execute_query(connection, query):
= connection.cursor()
cursor try:
cursor.execute(query)print("Query executed successfully")
except Error as e:
print(f"The error '{e}' occurred")
cursor.close()
The function will execute the query for us and close the cursor.
execute_query(con, ct)
Query executed successfully
INSERT INTO
Now lets add some data to our table using INSERT INTO
. We need to pass the table name, optionally the columns we’ll specify, and the values to fill with.
= """
create_rows INSERT INTO
justin_music (album, artist)
VALUES
("Sixteen Stone", "Bush"),
("Listener Supported", "Dave Matthews Band"),
("Chris Stapleton", "Traveler"),
("1989", "Taylor Swift");
"""
execute_query(con, create_rows)
Query executed successfully
Let’s check if it worked!
"SELECT * FROM justin_music", con) pd.read_sql(
album | artist | |
---|---|---|
0 | Sixteen Stone | Bush |
1 | Listener Supported | Dave Matthews Band |
2 | Chris Stapleton | Traveler |
3 | 1989 | Taylor Swift |
Now let’s add a row with only an artist.
= """
create_row INSERT INTO
justin_music (artist)
VALUES
("Taylor Swift");
"""
execute_query(con, create_row)
Query executed successfully
This inserts a None
into the data frame but note that the value in the original table is actually a NULL
.
"SELECT * FROM justin_music", con) pd.read_sql(
album | artist | |
---|---|---|
0 | Sixteen Stone | Bush |
1 | Listener Supported | Dave Matthews Band |
2 | Chris Stapleton | Traveler |
3 | 1989 | Taylor Swift |
4 | None | Taylor Swift |
UPDATE
We can use update
to modify an existing row of data. We use
- UPDATE to specify the table
- SET to specify the values of the columns
- WHERE to describe which records (rows) to modify
We can use AND
, OR
, IN
, and other logical operators in our SQL code! (See Operators and Parse-Affecting Attributes)
= """
mod_row UPDATE justin_music
SET album = "Red (Taylor's Version)", artist = "Taylor Swift"
WHERE (album is null) AND (artist = "Taylor Swift");
"""
execute_query(con, mod_row)
Query executed successfully
"SELECT * FROM justin_music", con) pd.read_sql(
album | artist | |
---|---|---|
0 | Sixteen Stone | Bush |
1 | Listener Supported | Dave Matthews Band |
2 | Chris Stapleton | Traveler |
3 | 1989 | Taylor Swift |
4 | Red (Taylor's Version) | Taylor Swift |
DELETE FROM
We can also delete certain rows. We use
- DELETE FROM to specify the table
- a WHERE condition to determine what is deleted
= """
del_row DELETE FROM justin_music
WHERE artist = "Taylor Swift";
"""
execute_query(con, del_row)
Query executed successfully
"SELECT * FROM justin_music", con) pd.read_sql(
album | artist | |
---|---|---|
0 | Sixteen Stone | Bush |
1 | Listener Supported | Dave Matthews Band |
2 | Chris Stapleton | Traveler |
DROP TABLE
And of course we can remove an entire table using DROP TABLE. Just specify the table to remove!
"DROP TABLE justin_music") execute_query(con,
Query executed successfully
pd.read_sql(get_schema, con)
type | name | tbl_name | rootpage | sql | |
---|---|---|---|---|---|
0 | table | albums | albums | 2 | CREATE TABLE "albums"\r\n(\r\n [AlbumId] IN... |
1 | table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
2 | table | artists | artists | 4 | CREATE TABLE "artists"\r\n(\r\n [ArtistId] ... |
3 | table | customers | customers | 5 | CREATE TABLE "customers"\r\n(\r\n [Customer... |
4 | table | employees | employees | 8 | CREATE TABLE "employees"\r\n(\r\n [Employee... |
5 | table | genres | genres | 10 | CREATE TABLE "genres"\r\n(\r\n [GenreId] IN... |
6 | table | invoices | invoices | 11 | CREATE TABLE "invoices"\r\n(\r\n [InvoiceId... |
7 | table | invoice_items | invoice_items | 13 | CREATE TABLE "invoice_items"\r\n(\r\n [Invo... |
8 | table | media_types | media_types | 15 | CREATE TABLE "media_types"\r\n(\r\n [MediaT... |
9 | table | playlists | playlists | 16 | CREATE TABLE "playlists"\r\n(\r\n [Playlist... |
10 | table | playlist_track | playlist_track | 17 | CREATE TABLE "playlist_track"\r\n(\r\n [Pla... |
11 | table | tracks | tracks | 20 | CREATE TABLE "tracks"\r\n(\r\n [TrackId] IN... |
12 | table | sqlite_stat1 | sqlite_stat1 | 864 | CREATE TABLE sqlite_stat1(tbl,idx,stat) |
My table is gone :(
SELECT
As a statistician, you’d spend most of your time pulling data of interest and then summarizing it, running models, etc.
SELECT
is the workhorse for that task! As we are just reading in data from the database, we can just use pd.read_sql()
.
With SELECT we
- SELECT the column(s) we want
- FROM the table of interest
- Specifing the records (rows) of interest with a WHERE
First, let’s see a way to look at all columns in a table.
'SELECT * FROM albums', con) pd.read_sql(
AlbumId | Title | ArtistId | |
---|---|---|---|
0 | 1 | For Those About To Rock We Salute You | 1 |
1 | 2 | Balls to the Wall | 2 |
2 | 3 | Restless and Wild | 2 |
3 | 4 | Let There Be Rock | 1 |
4 | 5 | Big Ones | 3 |
... | ... | ... | ... |
342 | 343 | Respighi:Pines of Rome | 226 |
343 | 344 | Schubert: The Late String Quartets & String Qu... | 272 |
344 | 345 | Monteverdi: L'Orfeo | 273 |
345 | 346 | Mozart: Chamber Music | 274 |
346 | 347 | Koyaanisqatsi (Soundtrack from the Motion Pict... | 275 |
347 rows × 3 columns
Now let’s select the Title and ArtistID columns only. LIMIT
is a good way to only return (up to) a certain number of results.
"SELECT Title, artistID FROM albums LIMIT 10;", con) pd.read_sql(
Title | ArtistId | |
---|---|---|
0 | For Those About To Rock We Salute You | 1 |
1 | Balls to the Wall | 2 |
2 | Restless and Wild | 2 |
3 | Let There Be Rock | 1 |
4 | Big Ones | 3 |
5 | Jagged Little Pill | 4 |
6 | Facelift | 5 |
7 | Warner 25 Anos | 6 |
8 | Plays Metallica By Four Cellos | 7 |
9 | Audioslave | 8 |
WHERE allows us to specify conditions on rows.
"SELECT * FROM albums WHERE artistID = 2 LIMIT 10;", con) pd.read_sql(
AlbumId | Title | ArtistId | |
---|---|---|---|
0 | 2 | Balls to the Wall | 2 |
1 | 3 | Restless and Wild | 2 |
Clean Up
If we want to save any changes made to our database, we need to do a commit()
to the connection via con.commit()
. This saves the changes made and releases any locks on the data. Other connections to the database can only see changes you’ve made if you commit them.
When you are done working you should also close your connection. This is done with the close()
method.
con.close()
We skimmed over some of the syntax that SQL follows. You should read over the arithmetic operators, comparison operators, and the other operators linked on the left of that page.
Recap
Database is a term usually used to represent a set of related tables and the software used to interact with them.
In python
you can connect to most any kind of database. We looked at connecting to the easy, light-weight, sqlite
database.
The most important actions are given by CRUD (Create, Read, Update, and Delete).
We’ll get much more into python when we start dealing with pyspark
!
If you are on the course website, use the table of contents on the left or the arrows at the bottom of this page to navigate to the next learning material!
If you are on Google Colab, head back to our course website for our next lesson!