Spark Structured Streaming

Published

2025-03-31

The video below gives our first example of working with Spark Structured Streaming. We work through an example that reads text data as it arrives in a folder and keeps a running count of the occurrences of each word.
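As a rough preview of the kind of code involved, here is a minimal sketch of a streaming word count over a folder of text files. It is not the notebook from the video: the function name `start_word_count`, the query name `word_counts`, the choice of the in-memory sink, and the whitespace-based word splitting are all assumptions made for illustration.

```python
def start_word_count(input_dir, query_name="word_counts"):
    """Start a streaming word count over text files arriving in input_dir.

    Hypothetical helper for illustration; the names here are assumptions,
    not taken from the course notebook.
    """
    # Imported inside the function so the definition can be loaded
    # even where PySpark is not on the path.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode, split

    spark = SparkSession.builder.appName("StreamingWordCount").getOrCreate()

    # The text source yields one row per line, in a single column named "value".
    lines = spark.readStream.format("text").load(input_dir)

    # Split each line on whitespace, produce one row per word, drop empty strings.
    words = (
        lines
        .select(explode(split(col("value"), r"\s+")).alias("word"))
        .where(col("word") != "")
    )

    # Running count of each word across all files seen so far.
    counts = words.groupBy("word").count()

    # "complete" mode rewrites the full result table on every trigger.
    # The memory sink (an assumption here) keeps results in a temporary
    # table that can be queried from a notebook cell.
    query = (
        counts.writeStream
        .outputMode("complete")
        .format("memory")
        .queryName(query_name)
        .start()
    )
    return query
```

In a notebook you might call `query = start_word_count("stream_input/")` (the folder name is a placeholder), copy a few text files into that folder, inspect the current counts with `spark.sql("SELECT * FROM word_counts").show()`, and finish with `query.stop()`.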

I highly recommend watching the video using the ‘full’ Panopto player. There is a ‘pop out’ button in the bottom right of the video to enter this viewer.

The notebook used in the video is available here. Download the .ipynb file and upload it to your JupyterHub environment. Make sure the notebook runs with a PySpark kernel!

Remember, if you are off campus, you need to log in to the VPN before you can access our JupyterHub.

Notes
