#A comment - this text is not evaluated
5 + 6
10 * 2
5**2
25
2025-03-31
Justin Post (Some notes modified from Dr. Eric Chi)
In preparation for dealing with big data, we need to learn a programming language and settle on a good coding environment. We’ll learn python and code in Google Colab/JupyterLab.
We choose python due to its popularity and the ease of programming in spark (a big data software) through pyspark.
We use JupyterLab as it is widely used software for creating python notebooks. Google Colab is built on Jupyter!
Note: These types of webpages are built from Jupyter notebooks (.ipynb files). You can access your own versions of them by clicking here. It is highly recommended that you go through and run the notebooks yourself, modifying and rerunning things where you’d like!
When you open a new notebook in colab, by default it will use python to run any ‘code cells’ (this can be changed in the ‘notebook settings’ under the View -> ‘Notebook info’ menu).
There are two types of cells:
- Code cells: allow you to submit code
- Text cells: allow you to write text using ‘markdown’ (we’ll learn more about that shortly!)
These can be added in the top left of the notebook (+ Code and + Text). Below is a python code cell. A code cell can be run by clicking on it and pressing ‘shift-enter’.
Results can be displayed with the print() function.
You can create an object using =. This saves the result in a variable you can call later.
Strings can be combined (concatenated) with the + operator. As with most programming languages, there are special characters like \ that signal something other than their literal meaning. For instance, \n is a line break. These appear differently depending on whether you print something or just view the object.
degrees_celsius = 26.0
print(9 / 5 * degrees_celsius + 32)
degrees_celsius = 100
print(9 / 5 * degrees_celsius + 32)
78.80000000000001
212.0
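As a minimal sketch of the + operator and the \n escape described above (the strings here are made up for illustration and are not from the original notebook):

```python
# Hypothetical example strings, chosen only for illustration
first_part = "Big data"
second_part = "is fun!"

# + concatenates strings; "\n" inserts a line break
message = first_part + " " + second_part + "\nLet's code."

# print() renders the line break, so the text appears on two lines
print(message)

# repr() gives the form a notebook shows when you just view the object:
# the line break appears as the two characters \ and n
viewed = repr(message)
print(viewed)
```

Printing is usually what you want for readable output, while viewing the object is handy for spotting hidden characters.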
You might try adding a code cell to this notebook, creating a string of your own, and using + to concatenate it with the strings above.
There are a number of built-in objects you can create. Some important ones are listed below:
- str
- int, float
- bool
- list, tuple
- dict
We’ll cover these data types and their uses shortly!
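As a quick preview, here is a small sketch that creates one object of each type and checks it with the built-in type() function. The values are arbitrary placeholders chosen for illustration.

```python
# One made-up example value per built-in type named above
examples = {
    "str": "spark",
    "int": 10,
    "float": 2.5,
    "bool": True,
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "dict": {"course": "big data"},
}

# type(value).__name__ gives the name of the type as a string
type_names = {label: type(value).__name__ for label, value in examples.items()}

for label, name in type_names.items():
    print(label, "->", name)
```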
python also allows multiple assignment, where several variables are assigned in one statement. The use of * can allow you to ‘pack’ the remaining values into one object. Placement of the * is important here!
We’ll utilize packing and unpacking to simplify our code in many places!
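A minimal sketch of multiple assignment and of packing with *, using made-up values. Moving the * changes which variable collects the leftover values.

```python
# Multiple assignment: unpack two values into two variables
first, second = 1, 2

# * packs the remaining values into a list
head, *rest = [1, 2, 3, 4]

# Moving the * changes which variable collects the leftovers
*initial, last = [1, 2, 3, 4]

print(first, second)
print(head, rest)
print(initial, last)
```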
_ Variable
When doing python interactively (as with a JupyterLab notebook), the last evaluated expression is assigned to the variable _. This carries across code cells.
We’ll use this _ variable when doing computations where we don’t need to save things. For instance, after a cell evaluates 5 + 6, _ holds 11 in the next cell.
Where it really comes in handy is as a placeholder variable when doing computations in a for loop or list comprehension (again covered later more fully!).
Here we replace the index of the for loop with _.
sum_numbers = 0
#no need to create a variable for the index
for _ in range(1,101):
    sum_numbers += _
sum_numbers
5050
Careful when modifying elements of a compound object: ‘assignment statements do not copy objects, they create bindings between a target (a spot in computer memory) and an object’!
If you come from R, this is a very different behavior!
#Changing the original compound object (list) modifies both variables
#First, create a 'list' of four values
x = [1, 2, 3, "Cats Rule!"]
#Make y an alias for x (reference the same memory - this differs from how R works)
y = x
#note that they are the same when printing
print(x, y)
[1, 2, 3, 'Cats Rule!'] [1, 2, 3, 'Cats Rule!']
We can modify a list element by using [] after the object name. Note that python starts counting at 0.
- Here we access and overwrite the element at index 3 (the fourth element in the list)
[1, 2, 10, 'Dogs rule!'] [1, 2, 3, 'Dogs rule!']
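A minimal, self-contained sketch of this aliasing behavior, starting from a freshly created list. The `is` check is an addition for illustration; it confirms that both names refer to the same object.

```python
# Start from a fresh list
x = [1, 2, 3, "Cats Rule!"]

# y is another name for the same list, not a copy
y = x

# Overwrite the element at index 3 (the fourth element) through x
x[3] = "Dogs rule!"

# The change is visible through y as well
same_object = x is y
print(x, y)
print(same_object)
```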
To avoid this, we can use the .copy() method. Methods are like functions, but we append them to the end of the object after a . (period).
#Can create a (shallow) copy of the object rather than point to the same object in memory
y = x.copy()
x[2] = 10
x[3]= "No cats rule!"
#Note that y doesn't change its value
print(x, y)
[1, 2, 10, 'No cats rule!'] [1, 2, 10, 'Dogs rule!']
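The word ‘shallow’ matters once a list contains other mutable objects. Here is a minimal sketch, assuming a made-up nested list and using copy.deepcopy from the standard library for comparison:

```python
import copy

# A made-up list that contains another list
nested = [1, 2, ["a", "b"]]

shallow = nested.copy()        # new outer list, but the inner list is shared
deep = copy.deepcopy(nested)   # inner list is copied as well

nested[0] = 100                # only affects the original outer list
nested[2][0] = "changed"       # affects the shared inner list

print(nested)
print(shallow)
print(deep)
```

For flat lists of numbers and strings, like the one above, .copy() is enough.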
Variable names can use letters, digits, and the underscore symbol (but cannot start with a digit)
Ok variable names: X, species5618, and degrees_celsius
Bad variable names:
- 777 (begins with a digit)
- no-way! (includes punctuation)
Python has lots of shorthand notation!
An augmented assignment (for example, x += 1) is executed in the following way:
- Evaluate the expression on the right of the = sign to produce a value
- Apply the operator to the variable on the left and the value produced
- Store this new value in the memory address of the variable on the left of the =.
This means the operator is applied after the expression on the right is evaluated.
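The order of evaluation can be checked with a small sketch. The numbers are arbitrary, and the comparison with an ordinary expression is added only to highlight the difference.

```python
x = 5
# The right side is evaluated first (2 + 3 = 5), then x = x * 5
x *= 2 + 3

# Contrast: an ordinary expression follows the usual operator precedence
y = 5
y = y * 2 + 3

print(x)
print(y)
```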
A long line of code can be split across multiple lines by ending each line with \ or by wrapping the code in ().
Using \ is going to come in very handy when we want to apply multiple methods on one object later in the semester!
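Both styles can be sketched as follows. The string and the chained methods (.strip() and .lower()) are assumptions chosen for illustration, not the examples used later in the course.

```python
# Continue a statement onto the next line with a trailing backslash
total = 1 + 2 + \
        3 + 4

# Or wrap the expression in parentheses
total_paren = (1 + 2
               + 3 + 4)

# Chaining several methods on one object, one method per line
cleaned = "  Big Data  " \
    .strip() \
    .lower()

# The same chain written with parentheses instead
cleaned_paren = (
    "  Big Data  "
    .strip()
    .lower()
)

print(total, total_paren)
print(cleaned, cleaned_paren)
```

Note that nothing, not even a space, may follow the \ on a line.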
Two major ways to do an operation on a variable/object:
- Functions: function_name(myvar, other_args)
- Methods: myvar.method(other_args)
Functions are usually more generic actions that you could take on multiple types of objects. For instance, len() is a function we can run to see the ‘length’ of an object.
Similarly, max() is another function we can use on many types of objects.
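A short sketch of len() and max() applied to two different types of object (the list and string are made-up examples):

```python
numbers = [3, 10, 7]
word = "pyspark"

# len() works on lists and strings alike
length_of_list = len(numbers)
length_of_word = len(word)

# max() returns the largest number, or the character that sorts last
largest_number = max(numbers)
largest_letter = max(word)

print(length_of_list, length_of_word)
print(largest_number, largest_letter)
```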
Methods on the other hand are specific to the type of object you are dealing with. Lists will have different methods than a dictionary, for instance.
Here we use the .pop() method on a list. This returns and removes the last element from the list.
The .append() method adds an element to the end of the list.
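A minimal sketch of these two list methods, using a made-up list:

```python
pets = ["cat", "dog", "fish"]

# .pop() removes the last element and returns it
removed = pets.pop()

# .append() adds a new element to the end of the list
pets.append("bird")

print(removed)
print(pets)
```

Both methods modify the list in place rather than returning a new list.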
This quick video shows how to open a new Google Colab notebook and run some basic python code. I’d pop the video out into the panopto player using the arrow icon in the bottom right.
The notebook written in the video is available here.
from IPython.display import IFrame
IFrame(src = 'https://ncsu.hosted.panopto.com/Panopto/Pages/Embed.aspx?id=bae161a8-bac0-4c44-a7a1-b0ef0163e90d&autoplay=false&offerviewer=true&showtitle=true&showbrand=true&captions=false&interactivity=all', width = '620', height = '380')
Create variables with =
Many built-in data structures
Python shorthands (multiple assignment, _ variable, augmented assignment)
Careful when copying a variable
Functions and Methods
If you are on the course website, use the table of contents on the left or the arrows at the bottom of this page to navigate to the next learning material!
If you are on Google Colab, head back to our course website for our next lesson!