Pandas Series

The Series is a data structure provided by the Pandas library. A Series is a one-dimensional labeled array and can also be thought of as a single column of tabular data. It can hold data of any type (integers, strings, floating-point numbers, Python objects, etc.). A single Series can even mix types (heterogeneous data), in which case its dtype is reported as object.

A Series is value-mutable: existing values can be changed in place. Its length is normally treated as fixed, and methods that add or drop elements typically return a new Series rather than resizing the original. The basic way to create a Series is to call the Series() constructor.
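
As a minimal sketch of value mutability (the variable name s and the sample numbers are made up for illustration), the snippet below overwrites one value and checks that the length is unchanged:

```python
import pandas as pd

# A small Series with the default integer index (0, 1, 2).
s = pd.Series([10, 20, 30])

# Overwrite the value stored under label 1 in place.
s.loc[1] = 99

print(s.tolist())  # [10, 99, 30]
print(len(s))      # 3
```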

Series():

The Series() function of the Pandas library creates a Series. It returns a one-dimensional array with axis labels.

Syntax:

Series(data=None, index=None)

The data parameter contains the data stored in the Series. The data can be a Python list, a Python dictionary, a numpy array, or a scalar value. If the data is a dictionary, the order of its keys is preserved.

The index parameter supplies the axis labels. The index can be a list or an array of values. This parameter is optional; if it is omitted, the index consists of the integers from 0 to len(data) - 1.
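
The following sketch illustrates the two parameters described above; the variable names and sample values are illustrative only. It compares the default index with an explicit one and shows that dictionary key order carries over to the index:

```python
import pandas as pd

# No index given: labels default to 0 .. len(data) - 1.
default_idx = pd.Series([5, 6, 7])

# Explicit index: one label per value.
custom_idx = pd.Series([5, 6, 7], index=['x', 'y', 'z'])

# Dictionary input: keys become the index, in insertion order.
from_dict = pd.Series({'b': 1, 'a': 2})

print(list(default_idx.index))  # [0, 1, 2]
print(list(custom_idx.index))   # ['x', 'y', 'z']
print(list(from_dict.index))    # ['b', 'a']
```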

Series Creation:

The Series data structure of Pandas can be created in several ways:

  • Series object from a scalar value.
  • Series object from a Python list.
  • Series object from a Python dictionary.
  • Series object from a numpy array.
  • Series object from a text file and a csv file.

Series object from a scalar value:

Creating a Series object from a scalar value generally relies on the index parameter of the Series() function. The scalar value is assigned to each of the index values, so it is repeated to match the length of the index.

import pandas as pd #Importing Pandas library
scalar_series = pd.Series(12, index = ['a', 'b', 'c', 'd', 'e'])
print(scalar_series)

#Output:
a    12
b    12
c    12
d    12
e    12
dtype: int64

The import statement only needs to appear once in your code, and it is good programming practice to place it at the top of the file.

The output shows the Series created by a scalar value with string indices from ‘a’ to ‘e’. The Series also shows the dtype i.e. the datatype of the data in the Series. Since we passed an integer value as data, the dtype is int64. Note that the dtype does not show the datatype of index values.

Series object from a Python list:

A Series object can be created from a Python list. The advantage of a Series over a Python list is that the indices are not restricted to integer positions; they can be strings, floating-point numbers, date-time objects, etc.

plist = ['a', 'b', 'c', 'd']
list_series = pd.Series(plist, index = [0.5, 1, 1.5, 2])
print(list_series)

#Output:
0.5    a
1.0    b
1.5    c
2.0    d
dtype: object

Note: String values in this Series are reported with dtype: object.

Series object from a Python dictionary:

A Series object can be created from a Python dictionary. The dictionary keys are used as index values. The advantage of a Series over a Python dictionary is that the indices of a Series don’t have to be unique (they can contain duplicates).

pdict = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
dict_series = pd.Series(pdict)
print(dict_series)

#Output:
a    1
b    2
c    3
d    4
dtype: int64

Series with duplicate indices:

pdict = {'a': [1, 2], 'b': 3}
duplicateindex_series = pd.Series(pdict, index = ['a', 'b', 'a', 'a'])
print(duplicateindex_series)

#Output:
a    [1, 2]
b         3
a    [1, 2]
a    [1, 2]
dtype: object

Series object from a numpy array:

A Series object can be created from a numpy array. For this, we import the NumPy (Numerical Python) library and build the array with NumPy's array() function.

import numpy as np
numpy_array = np.array([1.8, 2.4, 3.6, 4.9])
numpyarray_series = pd.Series(numpy_array)
print(numpyarray_series)

#Output:
0    1.8
1    2.4
2    3.6
3    4.9
dtype: float64

Series object from a text file and a csv file:

A Series object can be created from a text file or a CSV file. Both kinds of file can be read with the Pandas read_csv() function.

The read_csv() function reads delimited text. The comma is the default delimiter; for whitespace-separated data, pass a separator such as sep=r'\s+'. The function returns a DataFrame, i.e. the complete tabular data. The file path (including the filename) is passed as the first parameter of read_csv().

To get only a Series (the values of a single column), use the usecols parameter to select that column and then squeeze the resulting one-column DataFrame down to a Series. Older Pandas versions accepted squeeze=True directly in read_csv(), but that parameter was deprecated in Pandas 1.4 and removed in 2.0, so the examples here call the squeeze() method on the result instead.

txt_series = pd.read_csv("C:/Users/risha/Documents/text_data.txt",
                         usecols=['a_col']).squeeze("columns")
print(txt_series)
#Output:
0    1a
1    2b
2    3c
3    4d
Name: a_col, dtype: object

csv_series = pd.read_csv("C:/Users/risha/Documents/csv_data.csv",
                         usecols=['Period']).squeeze("columns")
print(csv_series)
#Output:
0     2020.06
1     2020.06
2     2020.06
3     2020.06
4     2020.06
       ...   
73    2020.06
74    2020.06
75    2020.06
76    2020.06
77    2020.06
Name: Period, Length: 78, dtype: float64
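
The file-based examples above depend on files that exist only on the author's machine. As a self-contained sketch, the snippet below feeds read_csv() an in-memory string through io.StringIO. The CSV content and the b_col column are hypothetical stand-ins, and the one-column DataFrame is reduced to a Series with the DataFrame.squeeze() method, which works in current Pandas releases where the squeeze keyword of read_csv() is no longer available:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "a_col,b_col\n1a,x\n2b,y\n3c,z\n"

# usecols keeps only the requested column; the result is still a DataFrame.
df = pd.read_csv(io.StringIO(csv_text), usecols=['a_col'])

# squeeze("columns") turns the one-column DataFrame into a Series.
col = df.squeeze("columns")

print(type(col).__name__)  # Series
print(col.tolist())        # ['1a', '2b', '3c']
print(col.name)            # a_col
```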

Series Attributes:

An attribute is a property or characteristic. Attributes do not transform or operate on the data; instead, they give you more information about it. Pandas Series objects come with a number of built-in attributes.

  • dtype: The dtype attribute returns the datatype of the underlying data in the Series.
  • at: The at attribute accesses a single value by its axis label.
  • axes: The axes attribute returns a list containing the row axis labels.
  • index: The index attribute returns the index (axis labels) of the Series.
  • is_unique: The is_unique attribute returns True if the values in the Series are unique.
  • nbytes: The nbytes attribute returns the number of bytes in the underlying data of the Series.
  • shape: The shape attribute returns a tuple of the shape of the Series.
  • size: The size attribute returns the number of elements in the Series.
  • values: The values attribute returns the values of the Series as an array.

series = pd.Series([0.1, 0.2, 0.3], index=['a', 'b', 'c'])
print(series.dtype)
#Output:
float64

print(series.at['c'])
#Output:
0.3

print(series.axes)
#Output:
[Index(['a', 'b', 'c'], dtype='object')]

print(series.index)
#Output:
Index(['a', 'b', 'c'], dtype='object')

print(series.is_unique)
#Output:
True

print(series.nbytes)
#Output:
24

print(series.shape)
#Output:
(3,)

print(series.size)
#Output:
3

print(series.values)
#Output:
[0.1 0.2 0.3]

Series Functions:

A function is a named block of code that performs a specific operation on the data it is given. Pandas Series objects have several built-in functions, which are called as methods on the Series itself.

Arithmetic Functions:

  • add(): The add() function returns the element-wise sum of the series and another series or a scalar value.
  • sub(): The sub() function returns the element-wise difference of the series and another series or a scalar value.
  • mul(): The mul() function returns the element-wise product of the series and another series or a scalar value.
  • div(): The div() function returns the element-wise floating-point division of the series by another series or a scalar value.
  • mod(): The mod() function returns the element-wise modulo of the series and another series or a scalar value.
  • pow(): The pow() function raises the series to the power of another series or a scalar value, element-wise.
  • abs(): The abs() function returns a Series with the absolute numeric value of each element.

series1 = pd.Series([0.1, -0.2, 0.3])
series2 = pd.Series([1, 4, 7])
series3 = pd.Series([-9, 3, 8.9])
print(series1.add(series2))
#Output:
0    1.1
1    3.8
2    7.3
dtype: float64
    
print(series2.sub(series1))
#Output:
0    0.9
1    4.2
2    6.7
dtype: float64
    
print(series1.mul(series2))
#Output:
0    0.1
1   -0.8
2    2.1
dtype: float64
    
print(series2.div(series1))
#Output:
0    10.000000
1   -20.000000
2    23.333333
dtype: float64
    
print(series1.mod(series2))
#Output:
0    0.1
1    3.8
2    0.3
dtype: float64
    
print(series2.pow(series1))
#Output:
0    1.000000
1    0.757858
2    1.792790
dtype: float64
    
print(series3.abs())
#Output:
0    9.0
1    3.0
2    8.9
dtype: float64
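
The examples above combine two Series. As a small sketch of the scalar case mentioned in the list (the variable names are illustrative), the same functions accept a single number, which is applied to every element:

```python
import pandas as pd

s = pd.Series([1, 4, 7])

# A scalar argument is applied to each element in turn.
plus_ten = s.add(10)
doubled = s.mul(2)

print(plus_ten.tolist())  # [11, 14, 17]
print(doubled.tolist())   # [2, 8, 14]

# The + operator gives the same result as add().
print((s + 10).equals(plus_ten))  # True
```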

Comparison(Relational) Functions:

  • eq(): The eq() function returns True where the series is equal to the other series or a scalar value, element-wise.
  • ne(): The ne() function returns True where the series is not equal to the other series or a scalar value, element-wise.
  • le(): The le() function returns True where the series is less than or equal to the other series or a scalar value, element-wise.
  • lt(): The lt() function returns True where the series is less than the other series or a scalar value, element-wise.
  • ge(): The ge() function returns True where the series is greater than or equal to the other series or a scalar value, element-wise.
  • gt(): The gt() function returns True where the series is greater than the other series or a scalar value, element-wise.
  • compare(): The compare() function compares two Series and shows the differences.

series1 = pd.Series(['a', 'e', 'i', 'o', 'u'])
series2 = pd.Series(['k', 'b', 'i', 'o', 't'])
print(series1.eq(series2))
#Output:
0    False
1    False
2     True
3     True
4    False
dtype: bool
    
print(series1.ne(series2))
#Output:
0     True
1     True
2    False
3    False
4     True
dtype: bool
    
print(series1.le(series2))
#Output:
0     True
1    False
2     True
3     True
4    False
dtype: bool
    
print(series1.lt(series2))
#Output:
0     True
1    False
2    False
3    False
4    False
dtype: bool
    
print(series1.ge(series2))
#Output:
0    False
1     True
2     True
3     True
4     True
dtype: bool

print(series1.gt(series2))
#Output:
0    False
1     True
2    False
3    False
4     True
dtype: bool
    
print(series1.compare(series2))
#Output:
  self other
0    a     k
1    e     b
4    u     t
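
The comparison functions also accept a scalar value. The sketch below, with made-up sample values, compares every element against one number and then uses the resulting boolean Series to select the matching elements, which is one common use of these functions:

```python
import pandas as pd

s = pd.Series([15, 23, 34])

# Compare each element with the scalar 20.
mask = s.gt(20)
print(mask.tolist())  # [False, True, True]

# A boolean Series can be used to keep only the True positions.
selected = s[mask]
print(selected.tolist())  # [23, 34]
```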

Statistical Functions:

  • max(): The max() function returns the maximum of the values in the series.
  • min(): The min() function returns the minimum of the values in the series.
  • sum(): The sum() function returns the sum of the values in the series.
  • prod(): The prod() function returns the product of the values in the series.
  • count(): The count() function returns the number of non-NA/null observations in the series.
  • mean(): The mean() function returns the mean of the values in the series.
  • median(): The median() function returns the median of the values in the series.
  • mode(): The mode() function returns the mode(s) of the series. The mode is the value that appears most often, and there can be multiple modes.
  • var(): The var() function returns the unbiased variance (normalized by N-1 by default).
  • std(): The std() function returns the sample standard deviation (normalized by N-1 by default).
  • cumsum(): The cumsum() function returns the cumulative sum over a series.
  • cumprod(): The cumprod() function returns the cumulative product over a series.
  • describe(): The describe() function generates descriptive statistics. These include the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum of the series.

data = list(range(1, 100, 15))
series = pd.Series(data)
print(series.max())
#Output:
91

print(series.min())
#Output:
1

print(series.sum())
#Output:
322

print(series.prod())
#Output:
9625522816

print(series.count())
#Output:
7

print(series.mean())
#Output:
46.0

print(series.median())
#Output:
46.0

print(series.mode())
#Output:
0     1
1    16
2    31
3    46
4    61
5    76
6    91
dtype: int64
    
print(series.var())
#Output:
1050.0

print(series.std())
#Output:
32.4037034920393

print(series.cumsum())
#Output:
0      1
1     17
2     48
3     94
4    155
5    231
6    322
dtype: int64

print(series.cumprod())
#Output:
0             1
1            16
2           496
3         22816
4       1391776
5     105774976
6    9625522816
dtype: int64

print(series.describe())
#Output:
count     7.000000
mean     46.000000
std      32.403703
min       1.000000
25%      23.500000
50%      46.000000
75%      68.500000
max      91.000000
dtype: float64
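
To make the "unbiased" wording for var() concrete, the sketch below recomputes the variance of the same data by hand. It assumes the default ddof=1 (division by N-1) and uses the ddof parameter of var() to request the population variance (division by N) for comparison; the helper variable names are illustrative:

```python
import math

import pandas as pd

series = pd.Series(list(range(1, 100, 15)))

n = series.count()
# Sum of squared deviations from the mean.
sum_sq = ((series - series.mean()) ** 2).sum()

sample_var = sum_sq / (n - 1)   # what var() returns by default
population_var = sum_sq / n     # what var(ddof=0) returns

print(math.isclose(sample_var, series.var()))            # True
print(math.isclose(population_var, series.var(ddof=0)))  # True
print(math.isclose(series.std(), math.sqrt(sample_var))) # True
```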

Index functions:

  • argmin(): The argmin() function returns the integer position of the smallest value in the series.
  • argmax(): The argmax() function returns the integer position of the largest value in the series.
  • first_valid_index(): The first_valid_index() function returns the index of the first non-NA value. If all elements are NA/null, or if the series is empty, it returns None.
  • last_valid_index(): The last_valid_index() function returns the index of the last non-NA value, or None if there is none.
  • keys(): The keys() function returns the index of the series. If the indices are the default sequential integers, it returns a RangeIndex.

series = pd.Series([float('nan'), 23, 34, 15, 56, 67, 76, 49, float('nan')])
print(series.argmin())
#Output:
3

print(series.argmax())
#Output:
6

print(series.first_valid_index())
#Output:
1

print(series.last_valid_index())
#Output:
7

print(series.keys())
#Output:
RangeIndex(start=0, stop=9, step=1)
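
The None case of first_valid_index() and last_valid_index() does not appear in the example above. A minimal sketch with two made-up Series, one containing only missing values and one empty, shows it:

```python
import pandas as pd

# Every element is missing.
all_missing = pd.Series([float('nan'), float('nan')])

# No elements at all (dtype given explicitly for the empty Series).
empty = pd.Series([], dtype='float64')

print(all_missing.first_valid_index())  # None
print(all_missing.last_valid_index())   # None
print(empty.first_valid_index())        # None
```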