Spark MLlib Basics
The video below discusses how to use Spark MLlib through pyspark.
I highly recommend watching the video using the ‘full’ Panopto player. There is a ‘pop out’ button in the bottom right of the video to enter this viewer.
The pyspark code used in the notes is available in this notebook and the data set is also available online. You’ll need to download the .ipynb file and upload it to your JupyterHub environment. Make sure that the kernel used to run the notebook is a pyspark kernel!
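If you aren't sure which kernel a notebook is using, a quick sanity check is to ask Python whether the pyspark package is visible to it. The cell below is only a minimal sketch: the helper names pyspark_available and describe_kernel are made up for illustration, and the check only tells you that pyspark can be imported, not that a Spark session is configured on our JupyterHub.

```python
import importlib.util
import sys


def pyspark_available() -> bool:
    """Return True if the pyspark package can be located by this interpreter."""
    # find_spec returns None when a top-level package is not installed
    return importlib.util.find_spec("pyspark") is not None


def describe_kernel() -> str:
    """Build a short status message about the running interpreter (hypothetical helper)."""
    status = "found" if pyspark_available() else "NOT found"
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return f"Python {version}: pyspark {status}"


# Run this in the first cell of the notebook
print(describe_kernel())
```

If the message reports that pyspark was not found, switch the notebook's kernel from the Kernel menu and run the cell again.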
Remember, if you are off campus you should log in to the VPN and then you can access our JupyterHub.
Notes
Additional Readings
MLlib Guide
- Available here!
- Python documentation (using the search in the upper left has been pretty useful)
MLflow & MLOps
Use the table of contents on the left or the arrows at the bottom of this page to navigate to the next learning material!