Streaming Joins

Published

2025-03-31

The video below covers how to join two streams, how to join a stream with a static data frame, and the requirements for each kind of join!

I highly recommend watching the video using the ‘full’ Panopto player. There is a ‘pop out’ button in the bottom right of the video to enter this viewer.

The notebook used in the video is available here. You’ll need to download this .ipynb file and upload it to your JupyterHub environment. Make sure the notebook runs on a PySpark kernel!

Remember, if you are off campus you should log in to the VPN and then you can access our JupyterHub.

The impressions and clicks data sets are available at https://www4.stat.ncsu.edu/online/datasets/.
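If you want something to experiment with alongside the video, here is a minimal PySpark sketch of both kinds of join. It is not the code from the course notebook. It uses Spark's built-in rate source rather than the impressions and clicks files, and the column names (impressionAdId, impressionTime, clickAdId, clickTime), the helper function names, the tiny ads lookup table, and the watermark and delay thresholds are all assumptions made up for illustration. The main requirement it tries to show: for a stream-stream inner join, watermarks and a time-range condition are optional, but they let Spark throw away old state. For outer joins between two streams they are required.

```python
# Sketch only: column names, rates, and thresholds are assumptions,
# not the schema of the course data sets.


def click_join_condition(max_delay: str = "1 hour") -> str:
    """SQL condition: same ad, and the click arrives no later than
    `max_delay` after the impression (the time-range constraint)."""
    return (
        "clickAdId = impressionAdId"
        " AND clickTime >= impressionTime"
        f" AND clickTime <= impressionTime + interval {max_delay}"
    )


def make_streams(spark):
    """Build two synthetic streams from the 'rate' source, which emits
    rows with a `timestamp` column and an increasing `value` column."""
    impressions = (
        spark.readStream.format("rate").option("rowsPerSecond", 5).load()
        .selectExpr("value AS impressionAdId", "timestamp AS impressionTime")
    )
    clicks = (
        spark.readStream.format("rate").option("rowsPerSecond", 5).load()
        .where("value % 10 = 0")  # pretend only every tenth ad is clicked
        .selectExpr("value AS clickAdId", "timestamp AS clickTime")
    )
    return impressions, clicks


def join_two_streams(impressions, clicks):
    """Stream-stream inner join with a watermark on each side."""
    # Imported here so the condition helper above works without Spark.
    from pyspark.sql.functions import expr

    return impressions.withWatermark("impressionTime", "2 hours").join(
        clicks.withWatermark("clickTime", "3 hours"),
        expr(click_join_condition("1 hour")),
    )


def join_stream_with_static(spark, impressions):
    """Stream-static inner join: no watermark is needed because Spark
    keeps no streaming state for the static side."""
    ads = spark.createDataFrame(
        [(0, "shoes"), (10, "hats"), (20, "socks")],  # hypothetical lookup
        ["impressionAdId", "campaign"],
    )
    return impressions.join(ads, on="impressionAdId", how="inner")
```

In a notebook with a PySpark kernel, where a session named spark already exists, you could try it with impressions, clicks = make_streams(spark) and then join_two_streams(impressions, clicks).writeStream.outputMode("append").format("console").start(). Remember to call stop() on the returned query when you are done.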

Notes

This ends the course! I hope you enjoyed it :)

Head back to the Moodle site to work on your final project.