The pandas library supports a DataFrame object similar to R’s data frame. A DataFrame’s columns are made up of pandas Series objects.
> Image from https://www.altexsoft.com/blog/pandas-library/
The convention is to import the module as pd.
First we’ll learn about Series objects. These make up a DataFrame object, which we’ll use to handle many rectangular datasets.
A pandas Series is
a 1D labeled array that can hold any data type
made up of values and indices, where the indices are used to extract the values
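As a minimal sketch of the two points above (the data and the names temps and names are made up for illustration), a Series pairs each value with a label, and it is not limited to numbers:

```python
import pandas as pd

# Hypothetical data, chosen only for illustration
temps = pd.Series([21.5, 19.0, 23.2], index=["mon", "tue", "wed"])

print(temps.index.tolist())   # the labels
print(temps.to_numpy())       # the underlying values
print(temps["tue"])           # a label extracts its value

# A Series can also hold non-numeric data, such as strings
names = pd.Series(["Ada", "Grace"], index=["first", "second"])
print(names["second"])
```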
Note: These types of webpages are built from Jupyter notebooks (.ipynb files). You can access your own versions of them by clicking here. It is highly recommended that you go through and run the notebooks yourself, modifying and rerunning things where you’d like!
Creating a pandas Series
Create a Series using the pd.Series() function
import numpy as np
import pandas as pd
rng = np.random.default_rng(2)  # set a seed
s = pd.Series(rng.normal(size=10, loc=2, scale=4))  # mean of 2 and std of 4
s
0    2.756214
1   -0.090994
2    0.347746
3   -7.765870
4    9.198830
5    6.576663
6    0.698309
7    5.095226
8    3.124843
9   -0.215291
Indexing a Series
Like lists, the ordering starts at 0
Like NumPy arrays, all elements in a Series must be of the same type
Unlike NumPy arrays, a Series can be indexed by the labels in its index (not just by numeric position)
The .index attribute returns just these indices
s.index
RangeIndex(start=0, stop=10, step=1)
s[0]  # 0 is both the numeric position and an index label here
2.756213527174132
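To illustrate the same-type rule mentioned above, here is a small sketch (the variable names are hypothetical). When the inputs are mixed, pandas picks one type that can represent everything, falling back to the generic object type if needed:

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
mixed_numbers = pd.Series([1, 2.5, 3])   # ints are upcast to float
mixed_types = pd.Series([1, "a", 3.0])   # falls back to the generic object dtype

print(ints.dtype)
print(mixed_numbers.dtype)
print(mixed_types.dtype)
```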
s2 = pd.Series(rng.normal(size=10, loc=2, scale=4), index=list("abcdefghij"))
s2
We can access elements by numeric position or by index label. Using a plain integer key for positional access (as in s2[2]) is deprecated, so .iloc[] should be used instead (we discuss the similar DataFrame.iloc[] shortly).
s2[2]
FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
s2[2]
0.6847043837681492
s2["c"]
0.6847043837681492
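A minimal sketch of the non-deprecated way to do both kinds of access, using a small made-up Series (letters is a hypothetical name): .iloc[] selects by position and .loc[] selects by label.

```python
import pandas as pd

# Hypothetical labeled Series for illustration
letters = pd.Series([10.0, 20.0, 30.0, 40.0], index=list("abcd"))

by_position = letters.iloc[2]   # third element, counting from 0
by_label = letters.loc["c"]     # element whose label is "c"

print(by_position, by_label)
```

Both lookups return the same element here because "c" is the label at position 2.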
We can obtain just the values of a Series using the .values attribute
We can get and set values within a Series using the index label
But a Series has an ordering, so, unlike a dictionary, we can also use a numeric index (although again, .iloc[] is now the preferred way to subset by numeric position)
div_series["AFCNorth"]
['Steelers', 'Browns', 'Ravens', 'Bengals']
div_series[0]
FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
div_series[0]
['Steelers', 'Browns', 'Ravens', 'Bengals']
div_series.iloc[0]
['Steelers', 'Browns', 'Ravens', 'Bengals']
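The points above about .values and about getting and setting by label can be sketched with a small Series built from a dictionary. The name scores and its contents are made up for illustration and are not part of the lesson's data.

```python
import pandas as pd

# Hypothetical Series built from a dictionary; the names are illustrative only
scores = pd.Series({"alice": 90, "bob": 75, "cara": 82})

print(scores.values)     # just the values, as a NumPy array

scores["bob"] = 80       # set a value by its label
print(scores["bob"])     # get a value by its label

print(scores.iloc[0])    # first element by position
```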
We can check whether a label occurs in the index, similar to how we check whether a key occurs in a dictionary
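For example, assuming a small made-up Series named scores, the in operator tests membership against the index labels rather than the values:

```python
import pandas as pd

# Hypothetical data for illustration
scores = pd.Series({"alice": 90, "bob": 75})

has_alice = "alice" in scores   # "alice" is an index label
has_dave = "dave" in scores     # "dave" is not
has_90 = 90 in scores           # values are not searched, only labels

print(has_alice, has_dave, has_90)
```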
Series behave very similarly to NumPy’s 1D ndarray
In fact, NumPy functions can typically take series as input!
s  # was created from a NumPy array!
0    2.756214
1   -0.090994
2    0.347746
3   -7.765870
4    9.198830
5    6.576663
6    0.698309
7    5.095226
8    3.124843
9   -0.215291
np.exp(s)
0      15.740130
1       0.913023
2       1.415872
3       0.000424
4    9885.551552
5     718.139247
6       2.010350
7     163.240789
8      22.756315
9       0.806306
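One detail worth seeing in a smaller case: when an element-wise NumPy function is applied to a Series, the result is again a Series with the same index. The sketch below uses made-up data (x is a hypothetical name).

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

roots = np.sqrt(x)    # element-wise; the result is still a Series
total = np.sum(x)     # a reduction returns a single number

print(type(roots).__name__)
print(roots.index.tolist())
print(total)
```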
Numerical operations are done element-wise
s * 3
0     8.268641
1    -0.272981
2     1.043237
3   -23.297609
4    27.596489
5    19.729990
6     2.094926
7    15.285679
8     9.374528
9    -0.645874
Relation to lists
Series are like a list object in that you can
get and set values by integer index location
use slicing with :
s[4] = 0
s
0    2.756214
1   -0.090994
2    0.347746
3   -7.765870
4    0.000000
5    6.576663
6    0.698309
7    5.095226
8    3.124843
9   -0.215291
s[:5]
0    2.756214
1   -0.090994
2    0.347746
3   -7.765870
4    0.000000
s[3:5]
3   -7.76587
4    0.00000
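The same list-like operations can also be written with .iloc[], which always works by position regardless of what the index labels are. This is a sketch on made-up data (vals is a hypothetical name), not a replacement for the cells above.

```python
import pandas as pd

vals = pd.Series([5.0, 6.0, 7.0, 8.0, 9.0])

vals.iloc[4] = 0.0             # set by integer position
first_three = vals.iloc[:3]    # positions 0, 1, 2 (the end is excluded, as with lists)
middle = vals.iloc[1:3]        # positions 1 and 2

print(first_three.tolist())
print(middle.tolist())
```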
Recap
pandas Series make up pandas DataFrames
Each column of a DataFrame is made up of a Series
Series are:
a 1D data structure with indices and values
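As a quick check of the recap, selecting one column from a DataFrame gives back a Series. The DataFrame below is a made-up example (df, height, and name are hypothetical).

```python
import pandas as pd

# Hypothetical two-column DataFrame for illustration
df = pd.DataFrame({"height": [1.6, 1.8], "name": ["Ann", "Bo"]})

col = df["height"]             # selecting a single column
print(type(col).__name__)      # the column is a Series
print(col.index.tolist())      # it shares the DataFrame's row index
```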