Published

2025-03-31


Pandas Series

Justin Post

  • The pandas library provides a DataFrame object similar to R’s data frame. Each column of a DataFrame is a pandas Series object.

> Image from https://www.altexsoft.com/blog/pandas-library/

  • Convention is to import the module as pd

  • First we’ll learn about the Series objects. These make up a DataFrame object, which we’ll use to handle many rectangular datasets

  • A pandas Series is

    • a 1D labeled array that can hold any data type
    • made up of values and indices, where the indices are used to extract the values
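
As a small illustration of these two pieces, the sketch below builds a Series by hand and looks at its index and its values separately. The name `temps` and the city labels are made up for this example.

```python
import pandas as pd

# Hypothetical data: three temperatures labeled by city
temps = pd.Series([71.5, 64.0, 80.2], index=["Raleigh", "Boston", "Austin"])

print(temps.index.tolist())   # the labels
print(temps.to_numpy())       # the values, as a NumPy array
print(temps["Boston"])        # use a label to extract its value
```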

Note: These types of webpages are built from Jupyter notebooks (.ipynb files). You can access your own versions of them by clicking here. It is highly recommended that you go through and run the notebooks yourself, modifying and rerunning things where you’d like!

Creating a pandas Series

  • Create a series using the pd.Series() function
import numpy as np
import pandas as pd
rng = np.random.default_rng(2) #set a seed
s = pd.Series(rng.normal(size = 10, loc = 2, scale = 4)) #mean of 2 and std of 4
s
0 2.756214
1 -0.090994
2 0.347746
3 -7.765870
4 9.198830
5 6.576663
6 0.698309
7 5.095226
8 3.124843
9 -0.215291


Indexing a Series

  • Like lists, the ordering starts at 0
  • Like NumPy arrays, a Series has a single data type (dtype) for all of its elements; mixed values are stored under the general object dtype
  • Unlike NumPy arrays, a Series can be indexed by the labels in its index (not just by numeric position)
  • The .index attribute returns just these indices
s.index
RangeIndex(start=0, stop=10, step=1)
s[0] #0 is both the numeric position and the index label here
2.756213527174132
s2 = pd.Series(rng.normal(size = 10, loc = 2, scale = 4),
               index = [x for x in "abcdefghij"])
s2
a 5.910270
b 0.757774
c 0.684704
d -1.168587
e 3.819832
f 1.603208
g 4.181155
h -0.428743
i 2.507311
j -1.569096

s2.index
Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'], dtype='object')
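
The point about element types can be checked through the `.dtype` attribute. This is a minimal sketch with made-up values (`ints` and `mixed` are illustrative names): a Series of integers reports an integer dtype, while a Series mixing numbers and strings falls back to the generic `object` dtype and keeps each element as its original Python object.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
mixed = pd.Series([1, "two", 3.0])

print(ints.dtype)                          # an integer dtype
print(mixed.dtype)                         # object
print([type(x).__name__ for x in mixed])   # each element keeps its own Python type
```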

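We can access elements by index label. With this string index, an integer key such as s2[2] is still treated as a position, but that behavior is deprecated: in a future version of pandas, integer keys will always be treated as labels. Use the .iloc[] indexer for positional access instead (we discuss the similar DataFrame .iloc[] indexer shortly).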

s2[2]
FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  s2[2]
0.6847043837681492
s2["c"]
0.6847043837681492
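
To avoid the warning, the two kinds of access can be written explicitly with `.iloc[]` (position) and `.loc[]` (label). The sketch below uses small made-up Series rather than `s2`. The second Series has integer labels in a shuffled order, where a plain integer key is read as a label, not a position.

```python
import pandas as pd

# Hypothetical Series with string labels
ser = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])
by_position = ser.iloc[2]   # third element, regardless of labels
by_label = ser.loc["c"]     # element whose label is "c"
print(by_position, by_label)

# Hypothetical Series whose integer labels do not match the positions
ser2 = pd.Series([10, 20, 30], index=[2, 0, 1])
print(ser2[0])        # 20: the element labeled 0
print(ser2.iloc[0])   # 10: the element in the first position
```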
  • We can obtain just the values of a Series using the .values attribute
s.values
array([ 2.75621353, -0.09099377,  0.34774583, -7.76586953,  9.19882953,
        6.57666349,  0.69830865,  5.09522635,  3.12484268, -0.21529135])
s2.values
array([ 5.9102698 ,  0.75777381,  0.68470438, -1.16858702,  3.81983228,
        1.60320779,  4.18115486, -0.4287428 ,  2.50731139, -1.56909617])
  • Note that when you return the values you get back just a numpy array!
type(s2.values)
numpy.ndarray
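
The pandas documentation also describes a `.to_numpy()` method as a way to get the values as a NumPy array. A brief sketch on a made-up Series, showing that it returns the same numbers as `.values` and drops the labels:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.5, 2.5, 3.5], index=["x", "y", "z"])

arr = ser.to_numpy()   # NumPy array of the values; the labels are dropped
print(type(arr))
print(np.array_equal(arr, ser.values))
```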

Series Relation to Other Common Objects

Relation to Dictionaries

  • Recall a dictionary consists of key/value pairs
  • When creating a Series from a dictionary, the keys of the dictionary become the indices
d = {'b': 1,
     'a': 0,
     'c': 2}
pd.Series(d)
b 1
a 0
c 2
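
Because the keys become index labels, an explicit `index` argument can be used to pick and order entries from the dictionary. In this sketch the label `"z"` is deliberately not a key of `d`, so its value comes back as missing (`NaN`) and the Series is stored as floats.

```python
import pandas as pd

d = {"b": 1, "a": 0, "c": 2}

# Ask for labels in a chosen order; "z" is not a key in d
ordered = pd.Series(d, index=["a", "b", "z"])
print(ordered)
```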

Here’s an example with more complex values, which shows that the values of a Series can be lists!

AFCDivisions = {
  "AFCNorth": ["Steelers", "Browns", "Ravens", "Bengals"],
  "AFCEast" : ["Patriots", "Jets", "Dolphins", "Bills"],
  "AFCWest" : ["Raiders", "Chiefs", "Chargers", "Broncos"],
  "AFCSouth": ["Texans", "Colts", "Jaguars", "Titans"]
  }
div_series = pd.Series(AFCDivisions)
div_series
AFCNorth [Steelers, Browns, Ravens, Bengals]
AFCEast [Patriots, Jets, Dolphins, Bills]
AFCWest [Raiders, Chiefs, Chargers, Broncos]
AFCSouth [Texans, Colts, Jaguars, Titans]

  • Series are like a fixed-size dict object

    • Can get and set values within a Series using the index label
    • But Series have an ordering to them so, unlike a dictionary, we can use a numeric index (although again, .iloc[] is now the preferred way to do numeric index subsetting)
div_series["AFCNorth"]
['Steelers', 'Browns', 'Ravens', 'Bengals']
div_series[0]
FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  div_series[0]
['Steelers', 'Browns', 'Ravens', 'Bengals']
div_series.iloc[0]
['Steelers', 'Browns', 'Ravens', 'Bengals']
  • We can check whether an index label is present, just as we check whether a key is in a dictionary
print("AFCNorth" in AFCDivisions)
print("AFCNorth" in div_series)
True
True
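
As with a dictionary, `in` looks at the labels and not at the values. A minimal sketch with made-up data (`counts` is an illustrative name) that also sets a value by label:

```python
import pandas as pd

counts = pd.Series({"apples": 3, "pears": 5})   # hypothetical data

counts["apples"] = 4            # set an existing value by label
print("pears" in counts)        # True: "pears" is an index label
print(5 in counts)              # False: 5 is a value, not a label
print(5 in counts.to_numpy())   # True: check the values explicitly
```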

Relation to NumPy Arrays

  • Series behave very similarly to NumPy’s 1D ndarray
  • In fact, NumPy functions can typically take series as input!
s #was created from a numpy array!
0 2.756214
1 -0.090994
2 0.347746
3 -7.765870
4 9.198830
5 6.576663
6 0.698309
7 5.095226
8 3.124843
9 -0.215291

np.exp(s)
0 15.740130
1 0.913023
2 1.415872
3 0.000424
4 9885.551552
5 718.139247
6 2.010350
7 163.240789
8 22.756315
9 0.806306

  • Numerical operations are done element-wise
s * 3
0 8.268641
1 -0.272981
2 1.043237
3 -23.297609
4 27.596489
5 19.729990
6 2.094926
7 15.285679
8 9.374528
9 -0.645874
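
One difference from plain NumPy arrays is worth a quick look: when two Series are combined, pandas matches the values by index label and not by position. The sketch below uses two small made-up Series whose labels only partly overlap. Labels found in only one of them produce `NaN` in the result.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0, 30.0], index=["z", "y", "w"])

print(np.sqrt(a))   # NumPy function applied element-wise; labels x, y, z are kept

total = a + b       # values are matched by label, not by position
print(total)        # y and z have sums; w and x are NaN
```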


Relation to lists

  • Series are like a list object in that you can

    • get and set values by integer index (with the default index used here, the labels 0, 1, 2, ... match the positions)
    • use slicing with :
s[4] = 0
s
0 2.756214
1 -0.090994
2 0.347746
3 -7.765870
4 0.000000
5 6.576663
6 0.698309
7 5.095226
8 3.124843
9 -0.215291

s[:5]
0 2.756214
1 -0.090994
2 0.347746
3 -7.765870
4 0.000000

s[3:5]
3 -7.76587
4 0.00000
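
Slicing can also be written explicitly by position or by label. In this sketch on a made-up Series with string labels, `.iloc[]` slices exclude the stop position (as with lists), while `.loc[]` slices include the stop label.

```python
import pandas as pd

ser = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))

by_position = ser.iloc[1:3]   # positions 1 and 2; the stop is excluded
by_label = ser.loc["b":"d"]   # labels b, c, and d; the stop label is included

print(by_position.tolist())   # [20, 30]
print(by_label.tolist())      # [20, 30, 40]
```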


Recap

  • Pandas Series will make up Pandas DataFrames
    • Each column of a DataFrame is made up of a Series
  • Series are:
    • a 1D data structure with indices and values
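
To connect the recap back to DataFrames, here is a short sketch with a made-up two-column DataFrame. Pulling out one column gives back a Series that shares the DataFrame's index and carries the column name.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.0, 2.5, 4.0]})

col = df["score"]                     # a single column
print(type(col))                      # a pandas Series
print(col.name)                       # score
print(col.index.equals(df.index))     # True: same index as the DataFrame
```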

If you are on the course website, use the table of contents on the left or the arrows at the bottom of this page to navigate to the next learning material!

If you are on Google Colab, head back to our course website for our next lesson!