SQL Resources
There is a ton more to learn about SQL if you are interested. We’ll be using it a bit more to interact with Spark. Spark offers both a pandas-on-Spark API and a Spark SQL API (two ways of interacting with the same data). While the pandas-style API is generally easier to use, it doesn’t cover everything the SQL interface can do.
Luckily, we’ll mostly use SQL to rename variables, select data, filter observations, and maybe create a new observation or two. As such, it is useful to understand a bit more about it than we covered in lecture/readings.
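Those four tasks (renaming, selecting, filtering, and creating variables) all fit in a single `SELECT` statement. Here is a minimal sketch using Python's built-in `sqlite3` module; the `penguins` table and its columns are made up purely for illustration.

```python
import sqlite3

# In-memory database with a small hypothetical table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penguins (species TEXT, body_mass_g REAL)")
conn.executemany(
    "INSERT INTO penguins VALUES (?, ?)",
    [("Adelie", 3700.0), ("Gentoo", 5000.0), ("Adelie", 3450.0)],
)

# Rename a variable (AS), select columns, filter rows (WHERE),
# and create a new variable (mass in kg) in one query.
rows = conn.execute(
    """
    SELECT species,
           body_mass_g AS mass_g,
           body_mass_g / 1000.0 AS mass_kg
    FROM penguins
    WHERE body_mass_g > 3500
    ORDER BY mass_g
    """
).fetchall()
print(rows)  # [('Adelie', 3700.0, 3.7), ('Gentoo', 5000.0, 5.0)]
conn.close()
```

The same `SELECT ... AS ... WHERE ...` logic carries over almost unchanged to Spark SQL later in the course.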
Variable Types
We briefly discussed the difference between Python using None and SQLite using NULL. More generally, Python data types get converted to SQLite types (or, later, Spark types) when we move data in and out.
SQLite and Python types:
SQLite natively supports the following types:

- NULL
- INTEGER
- REAL
- TEXT
- BLOB
The following Python types can be sent to SQLite and will be converted as follows:
| Python type | SQLite type |
|---|---|
| None | NULL |
| int | INTEGER |
| float | REAL |
| str (depends on text_factory) | TEXT |
| bytes | BLOB |
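The conversions in the table above can be checked directly with SQLite's `typeof()` function, which reports the storage class of a stored value. A short sketch with an in-memory database (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Columns with no declared type, so SQLite stores whatever it receives.
conn.execute("CREATE TABLE demo (n, i, r, t, b)")

# Each Python value is converted to the matching SQLite storage class.
conn.execute(
    "INSERT INTO demo VALUES (?, ?, ?, ?, ?)",
    (None, 42, 3.14, "hello", b"\x00\x01"),
)

# Values round-trip back to the equivalent Python types.
row = conn.execute("SELECT n, i, r, t, b FROM demo").fetchone()
print(row)  # (None, 42, 3.14, 'hello', b'\x00\x01')

# typeof() shows the SQLite side of each conversion.
classes = conn.execute(
    "SELECT typeof(n), typeof(i), typeof(r), typeof(t), typeof(b) FROM demo"
).fetchone()
print(classes)  # ('null', 'integer', 'real', 'text', 'blob')
conn.close()
```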
SQLite Practice (Non-graded)
You should:

- head to the sqlitetutorial.net site and read through the tutorials based around the `SELECT` statement (there are 7 or so listed on the page linked),
- then read through the five SQLite function pages (for example: https://www.sqlitetutorial.net/sqlite-avg/),
- lastly, for those that want more practice, head to datalemur.com and try some of the ‘easy’ tasks! This will help prepare you for the logic of our later SQL work in Spark!
Use the table of contents on the left or the arrows at the bottom of this page to navigate to the next learning material!