Published

2025-03-31

Basic Use of Python

Justin Post (Some notes modified from Dr. Eric Chi)


In preparation for dealing with big data, we need to learn a programming language and settle on a good coding environment. We’ll learn python and code in Google Colab/JupyterLab.

We choose python due to its popularity and the ease of programming in Spark (big data software) through pyspark.

We use JupyterLab as it is widely used software for creating python notebooks. Google Colab is a hosted service built on Jupyter notebooks as well!

Note: These types of webpages are built from Jupyter notebooks (.ipynb files). You can access your own versions of them by clicking here. It is highly recommended that you go through and run the notebooks yourself, modifying and rerunning things where you’d like!


Getting Started

When you open a new notebook in Colab, it will use python by default to run any ‘code cells’ (this can be changed in the ‘notebook settings’ under the View -> ‘Notebook info’ menu).

There are two types of cells:

  • Code cells: allow you to submit code
  • Text cells: allow you to write text using ‘markdown’ (we’ll learn more about that shortly!)

These can be added in the top left of the notebook (+ Code and + Text). Below is a python code cell. A cell can be run by clicking on it and then pressing ‘shift-enter’.

#A comment - this text is not evaluated
5 + 6
10 * 2
5**2
25
  • Only the result of the last expression in a cell is ‘printed’ unless you explicitly print the others. We’ll do this much of the time with the print() function.
# % is mod (remainder), // is floor division
print(10 / 3)
print(10 % 3)
print(10 // 3)
3.3333333333333335
1
3
  • Operators of equal precedence are applied left to right, except for exponentiation, which is applied right to left
3 + 4 - 5
2
(3 + 4) - 5
2
3**2**4
43046721
#interpreted this way
3**(2**4)
43046721
#not this
(3**2)**4
6561
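
The examples above only mix operators of the same precedence. As a small sketch of what happens when precedence differs (the variable names a through d are just for illustration), multiplication is evaluated before addition, and exponentiation is evaluated before a leading minus sign:

```python
# Multiplication binds tighter than addition, so it is evaluated first
a = 2 + 3 * 4        # same as 2 + (3 * 4)
b = (2 + 3) * 4      # parentheses force the addition to happen first

# Exponentiation binds tighter than a leading minus sign
c = -2**2            # same as -(2**2)
d = (-2)**2

print(a, b, c, d)    # 14 20 -4 4
```

When in doubt, adding parentheses makes the intended order explicit.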

Creating Variables

You can create an object using =. This saves the result in a variable you can call later.

x = "Hello! "
y = 'How are you?'
print(x)
print(x + y)
Hello! 
Hello! How are you?
  • Strings can be concatenated using the + operator. As with most programming languages, the backslash \ starts an escape sequence with a special meaning. For instance, \n is a line break. These appear differently depending on whether you print something or just view the object.
x = "Hello! \n"
y = 'Then I asked, "How are you?"'
x
'Hello! \n'
print(x)
Hello! 
x + y
'Hello! \nThen I asked, "How are you?"'
print(x + y)
Hello! 
Then I asked, "How are you?"
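
Viewing an object shows its representation, which is what the built-in repr() function returns, while print() renders the escape sequences. A minimal sketch of the difference (the strings and variable names here are made up for illustration):

```python
s = "Tab:\tdone\nNext line"
print(repr(s))   # escape sequences are shown as typed
print(s)         # the tab and the line break are rendered

# An escape sequence counts as a single character
newline_length = len("\n")

# A raw string (r prefix) turns escape sequences off, so \n stays as two characters
raw = r"C:\new_folder"
print(newline_length, len(raw))   # 1 13
```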
  • Variables can be used to simplify and generalize your code
degrees_celsius = 26.0
print(9 / 5 * degrees_celsius + 32)
degrees_celsius = 100
print(9 / 5 * degrees_celsius + 32)
78.80000000000001
212.0

You might try to add a code cell to this notebook and

  • Create a new string variable
  • Use + to concatenate it with the strings above

Object Types

There are a number of built-in objects you can create. Some important ones are listed below:

  • Text Type: str
y = "text string"
type(y)
str
  • Numeric Types: int, float
y = 10
print(type(y))
x = 10.4
print(type(x))
<class 'int'>
<class 'float'>
  • Boolean Type: bool
y = True
type(y)
bool
  • Sequence Types: list, tuple
z = [10, "a", 11.5, True]
type(z)
list
  • Mapping Type: dict
w = {"key1": "value1",
     "key2": ["value2", 10]}
type(w)
dict

We’ll cover these data types and their uses shortly!
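
The list above mentions tuples without showing one. Here is a short sketch, using illustrative variable names, of creating a tuple and of converting between a few of the built-in types with int(), float(), str(), and bool():

```python
# A tuple is written with parentheses instead of square brackets
t = (10, "a", 11.5)
print(type(t))

# isinstance() checks whether an object is of a given type
print(isinstance(t, tuple), isinstance(t, list))

# Built-in functions convert between types
n = int("42")      # str -> int
f = float(n)       # int -> float
s = str(f)         # float -> str
b = bool(0)        # 0 converts to False, other numbers to True
print(n, f, s, b)  # 42 42.0 42.0 False
```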


Multiple Assignment

  • Assigning multiple variables on one line is easy in python
x, y, z = "Orange", "Banana", "Cherry"
print(x)
print(y)
print(z)
Orange
Banana
Cherry
x = y = z = "Orange"
print(x)
print(y)
Orange
Orange

The use of * can allow you to ‘pack’ the remaining values into one object. Placement of the * is important here!

x, *y = "Orange", "Banana", "Cherry"
print(x)
print(y)
type(y)
Orange
['Banana', 'Cherry']
list
*x, y = "Orange", "Banana", "Cherry"
print(x)
print(y)
['Orange', 'Banana']
Cherry
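
The starred name does not have to be first or last. As a sketch with made-up variable names, it can sit in the middle, the right-hand side can be any sequence (a list or a string, say), and multiple assignment also gives a compact way to swap two variables:

```python
# The starred variable collects whatever is left over in the middle
first, *middle, last = [1, 2, 3, 4, 5]
print(first, middle, last)   # 1 [2, 3, 4] 5

# Unpacking works on strings too, one character at a time
head, *rest = "abc"
print(head, rest)            # a ['b', 'c']

# Swap two values without a temporary variable
a, b = 1, 2
a, b = b, a
print(a, b)                  # 2 1
```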

We’ll utilize packing and unpacking to simplify our code in many places!


_ Variable

When doing python interactively (as with a JupyterLab notebook), the last evaluated expression is assigned to the variable _. This carries across code cells.

x, y, z = "Orange", "Banana", "Cherry"
x
'Orange'
_
'Orange'
x
'Orange'
#print doesn't count toward the _!
print(y)
Banana
_
'Orange'
y
'Banana'
_
'Banana'

We’ll use this _ variable when doing computations where we don’t need to save things. For instance,

degrees_celsius = 100
(9 / 5) * degrees_celsius + 32
212.0
_ - 10
202.0
(9 / 5) * degrees_celsius + 32 - 10
202.0
_ * 10
2020.0

Where it really comes in handy is as a placeholder variable when doing computations in a for loop or list comprehension (again covered later more fully!).

Here we replace the index of the for loop with _.

sum_numbers = 0
#no need to create a variable for the index
for _ in range(1,101):
  sum_numbers += _
sum_numbers
5050
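
A common convention, sketched here with made-up names, is to use _ for a value that is required by the syntax but never read. This is a stylistic habit and not something python enforces:

```python
# Repeat an action three times; the loop index itself is never used
greetings = []
for _ in range(3):
    greetings.append("hi")
print(greetings)     # ['hi', 'hi', 'hi']

# Ignore a field we don't need when unpacking
name, _, year = ("Ada", "not needed", 1815)
print(name, year)    # Ada 1815
```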

Copying vs Referencing

Careful when modifying elements of a compound object: ‘assignment statements do not copy objects, they create bindings between a target (a spot in computer memory) and an object’!

If you come from R, this is a very different behavior!

#Changing the original compound object (list) modifies both variables
#First, create a 'list' of four values
x = [1, 2, 3, "Cats Rule!"]
#Make y an alias for x (reference the same memory - this differs from how R works)
y = x
#note that they are the same when printing
print(x, y)
[1, 2, 3, 'Cats Rule!'] [1, 2, 3, 'Cats Rule!']

We can modify a list element by using [] after the object name. Note that python starts counting at 0.

  • Here we access and overwrite the element at index 3 (the fourth element in the list)

#Modifying x here actually modifies y too!
x[3] = "Dogs rule!"
print(x, y)
[1, 2, 3, 'Dogs rule!'] [1, 2, 3, 'Dogs rule!']
  • If you want to avoid this behavior, you can create a copy of the object instead of a reference
  • To do so, we use the .copy() method. Methods are like functions, but we call them by appending them to the object after a .
#Can create a (shallow) copy of the object rather than point to the same object in memory
y = x.copy()
x[2] = 10
x[3]= "No cats rule!"
#Note that y doesn't change its value
print(x, y)
[1, 2, 10, 'No cats rule!'] [1, 2, 3, 'Dogs rule!']
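
The word ‘shallow’ matters when a list contains other lists. The sketch below (variable names are illustrative) uses deepcopy() from the standard library copy module as one way to get a fully independent copy. A shallow copy makes a new outer list but still shares the inner lists:

```python
import copy

outer = [[1, 2], [3, 4]]
shallow = outer.copy()        # new outer list, same inner lists
deep = copy.deepcopy(outer)   # new outer list and new inner lists

outer[0][0] = 99              # changes an inner list that shallow shares
outer.append([5, 6])          # changes only the outer list of `outer`

print(shallow)   # [[99, 2], [3, 4]]
print(deep)      # [[1, 2], [3, 4]]

# `is` checks whether two names refer to the same object in memory
print(outer is shallow, outer[0] is shallow[0])   # False True
```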

Variable Names

Variable names can use letters, digits, and the underscore symbol (but cannot start with a digit).

Ok variable names:

  • X, species5618, and degrees_celsius

Bad variable names:

  • 777 (begins with a digit)
  • no-way! (includes punctuation)
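
One way to check a candidate name, sketched below, is the string method .isidentifier() together with the standard library keyword module. The candidates list and the usable dictionary are hypothetical names for this example, and "class" is an extra case not listed above: it follows the naming rules but is reserved by python, so it cannot be used either.

```python
import keyword

candidates = ["X", "species5618", "degrees_celsius", "777", "no-way!", "class"]

usable = {}
for name in candidates:
    # A usable name follows the naming rules and is not a reserved word
    usable[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, usable[name])
```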

Augmented Assignment

Python has lots of shorthand notation!

  • Quite often we want to take a value, add to it, and replace the old value
winnings = 100
winnings = winnings + 20
winnings
120
  • ‘Augmented assignment’ gives a shorthand for doing this
winnings = 100
winnings += 20
winnings
120
  • This works for all binary arithmetic operators (negation takes a single value, so it has no augmented form)
#subtraction
winnings
winnings -= 30
winnings
90
#multiplication
winnings *= 40
winnings
3600
#exponentiation
winnings **= 1/2
winnings
60.0

Augmented Assignment Execution

Executed in the following way:

  1. Evaluate the expression on the right of the = sign to produce a value

  2. Apply the operator to the variable on the left and the value produced

  3. Store this new value in the memory address of the variable on the left of the =.

This means the operator is applied after the expression on the right is evaluated.

winnings = 100
winnings += 100*10
winnings
1100
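
The order of evaluation is easier to see with multiplication, where it changes the answer. In this sketch (the variable other is made up for comparison), the right-hand side 2 + 3 is evaluated first and the multiplication is applied afterwards:

```python
winnings = 100
winnings *= 2 + 3          # same as winnings = winnings * (2 + 3)
print(winnings)            # 500

# Writing it out by hand without parentheses gives a different result
other = 100
other = other * 2 + 3      # (other * 2) + 3
print(other)               # 203
```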

Continuing a Line of Code

  • For long lines of code, we can break the code across multiple lines using \ or by wrapping the code in ()
10 + 20 - 100 * 60 \
/ 20
-270.0
(10 + 20 - 100 * 60
/20)
-270.0

Using \ is going to come in very handy when we want to apply multiple methods on one object later in the semester!
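
As a rough preview of what that can look like, here is a sketch that chains three string methods, once with \ and once with (). The text variable and the particular methods are just an illustration; both versions produce the same result.

```python
text = "  Hello, World!  "

# Each \ must be the very last character on its line
cleaned = text \
    .strip() \
    .lower() \
    .replace("world", "python")

# Wrapping in parentheses avoids the need for \ entirely
cleaned_again = (
    text
    .strip()
    .lower()
    .replace("world", "python")
)

print(cleaned)                    # hello, python!
print(cleaned == cleaned_again)   # True
```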


Functions & Methods

Two major ways to do an operation on a variable/object:

  • Functions: function_name(myvar, other_args)
  • Methods: myvar.method(other_args)

Functions are usually more generic actions that you could take on multiple types of objects. For instance, len() is a function we can run to see the ‘length’ of an object.

myList = [1, 10, 100, 1000]
#len function
len(myList)
4

Similarly, max() is another function we can use on many types of objects.

#max function
max(myList)
1000

Methods on the other hand are specific to the type of object you are dealing with. Lists will have different methods than a dictionary, for instance.

Here we use the .pop() method on this list. Calling .pop(3) returns and removes the element at index 3, which is the last element here (with no argument, .pop() removes the last element).

#pop method
myList.pop(3)
1000
#last element removed
myList
[1, 10, 100]

The .append() method adds an element to the end of the list.

myList.append(100000)
myList
[1, 10, 100, 100000]
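
To make the distinction concrete, here is a sketch with made-up objects: the functions len() and max() accept a string, a list, and a dictionary alike, while a method such as .keys() exists on dictionaries but not on lists. The built-in hasattr() is used only to check whether a method is available.

```python
word = "hello"
numbers = [1, 10, 100]
lookup = {"a": 1, "b": 2}

# The same functions work across different object types
lengths = [len(word), len(numbers), len(lookup)]
largest = [max(word), max(numbers), max(lookup)]   # max of a dict compares its keys
print(lengths)   # [5, 3, 2]
print(largest)   # ['o', 100, 'b']

# Methods belong to a specific type
print(list(lookup.keys()))          # ['a', 'b']
print(hasattr(numbers, "keys"))     # False: lists have no .keys() method
print(hasattr(lookup, "append"))    # False: dictionaries have no .append() method
```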

Video Demo

This quick video shows how to open a new Google Colab notebook and run some basic python code. I’d pop the video out into the panopto player using the arrow icon in the bottom right.

The notebook written in the video is available here.

from IPython.display import IFrame
IFrame(src = 'https://ncsu.hosted.panopto.com/Panopto/Pages/Embed.aspx?id=bae161a8-bac0-4c44-a7a1-b0ef0163e90d&autoplay=false&offerviewer=true&showtitle=true&showbrand=true&captions=false&interactivity=all', width = '620', height = '380')

Recap

  • Create variables with =

  • Many built-in data structures

  • Python shorthands (multiple assignment, _ variable, augmented assignment)

  • Careful when copying a variable

  • Functions and Methods

If you are on the course website, use the table of contents on the left or the arrows at the bottom of this page to navigate to the next learning material!

If you are on Google Colab, head back to our course website for our next lesson!