
NumPy Statistical Functions

NumPy provides a number of functions for performing statistical operations on array elements, including:

  • amin()
  • amax()
  • nanmin()
  • nanmax()
  • ptp()
  • percentile()
  • nanpercentile()
  • quantile()
  • nanquantile()
  • median()
  • average()
  • mean()
  • std()
  • var()
  • nanmean()
  • nanmedian()
  • nanstd()
  • nanvar()
  • corrcoef()
  • correlate()
  • cov()

amin():

The NumPy amin() function fetches the minimum values of the input array along a given axis.

Syntax:

amin(a, axis=None)

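The a parameter is the input array, and the optional axis parameter is the axis or axes along which amin() operates. With axis=0 the function reduces down the rows and returns one value per column; with axis=1 it reduces across the columns and returns one value per row. The same convention applies to every function in this article.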

import numpy as np
a = np.arange(50, 10, -10).reshape(2,2)  # [[50, 40], [30, 20]]

print(np.amin(a, axis=0))  # axis=0: minimum of each column
#Output:
[30 20]

print(np.amin(a, axis=1))  # axis=1: minimum of each row
#Output:
[40 20]
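When axis is left at its default of None, amin() considers every element of the flattened array and returns a single scalar. The sketch below reuses the 2×2 array from the example above; the variable name overall_min is only illustrative.

```python
import numpy as np

a = np.arange(50, 10, -10).reshape(2, 2)  # [[50, 40], [30, 20]]

overall_min = np.amin(a)  # axis=None: minimum over all four elements
print(overall_min)        # 20

# A tuple of axes reduces over each listed axis; (0, 1) covers the whole 2-D array here
print(np.amin(a, axis=(0, 1)))  # 20
```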

amax():

The amax() function returns maximum values of the input array along a given axis.

Syntax:

amax(a, axis=None)

a = np.arange(50, 10, -10).reshape(2,2)  # [[50, 40], [30, 20]]

print(np.amax(a, axis=0))  # axis=0: maximum of each column
#Output:
[50 40]

print(np.amax(a, axis=1))  # axis=1: maximum of each row
#Output:
[50 30]
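Unlike the nan-prefixed functions that follow, amax() and amin() do not skip missing values: a single NaN in a slice makes the result for that slice NaN. A minimal sketch with a small illustrative array c:

```python
import numpy as np

c = np.array([[1.0, np.nan, 3.0], [4.0, 5.0, 6.0]])

row_max = np.amax(c, axis=1)  # the first row contains NaN, so its maximum is NaN
print(row_max)
print(np.nanmax(c, axis=1))   # NaN skipped: the row maxima are 3.0 and 6.0
```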

nanmin():

The nanmin() function returns the minimum values of the input array along a given axis, ignoring any NaN values. NaN (Not a Number) is commonly used to represent missing values in data.

Syntax:

nanmin(a, axis=None)

b = np.array([[10, np.nan, 40], [8, 23, np.nan]])

print(np.nanmin(b, axis=0))  # axis=0: minimum of each column, skipping NaN
#Output:
[ 8. 23. 40.]

print(np.nanmin(b, axis=1))  # axis=1: minimum of each row, skipping NaN
#Output:
[10.  8.]
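When every value in a slice is NaN there is nothing left to compare, so nanmin() returns NaN for that slice and issues a RuntimeWarning ("All-NaN slice encountered"). The sketch below uses an illustrative array d and temporarily silences that warning with Python's standard warnings module so the result can be inspected.

```python
import warnings
import numpy as np

d = np.array([[np.nan, np.nan], [3.0, np.nan]])

with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # silence the All-NaN slice warning
    col_min = np.nanmin(d, axis=0)  # column 0 -> 3.0; column 1 is all NaN -> nan

print(col_min)
```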

nanmax():

The nanmax() function of NumPy returns the maximum values of the input array along a given axis, ignoring any NaN values.

Syntax:

nanmax(a, axis=None)

b = np.array([[10, np.nan, 40], [8, 23, np.nan]])

print(np.nanmax(b, axis=0))  # axis=0: maximum of each column, skipping NaN
#Output:
[10. 23. 40.]

print(np.nanmax(b, axis=1))  # axis=1: maximum of each row, skipping NaN
#Output:
[40. 23.]
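One way to see what nanmax() does over the flattened array (axis=None) is to drop the NaN entries with a boolean mask and take an ordinary maximum. This comparison is a sketch of the equivalence, not how NumPy implements the function; valid is an illustrative name.

```python
import numpy as np

b = np.array([[10, np.nan, 40], [8, 23, np.nan]])

valid = b[~np.isnan(b)]           # 1-D array of the non-NaN values, in row-major order
print(np.nanmax(b), valid.max())  # both are 40.0
```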

ptp():

The NumPy ptp() function returns the range of values (maximum minus minimum) of the input array along a given axis. The name ptp is an acronym for “peak to peak”.

Syntax:

ptp(a, axis=None)

arr = np.array([[4, 3, 6, 10], [1, 8, 9, 12]])

print(np.ptp(arr, axis=0))  # axis=0: range of each column
#Output:
[3 5 3 2]

print(np.ptp(arr, axis=1))  # axis=1: range of each row
#Output:
[ 7 11]
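Because ptp() is the maximum minus the minimum, the same per-row ranges can be rebuilt from amax() and amin(). A quick check, reusing arr from the example above:

```python
import numpy as np

arr = np.array([[4, 3, 6, 10], [1, 8, 9, 12]])

manual = np.amax(arr, axis=1) - np.amin(arr, axis=1)  # per-row max minus per-row min
print(manual)                                        # [ 7 11]
print(np.array_equal(manual, np.ptp(arr, axis=1)))   # True
```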

percentile():

The percentile() function computes the q-th percentile of the input array along the specified axis. The q-th percentile of a set of numbers is the value below which q percent of the values fall; for example, the 50th percentile is the median.

Syntax:

percentile(a, q, axis=None)

The a parameter is the input array, the q parameter is the percentile or sequence of percentiles to compute (each must be between 0 and 100, inclusive), and the axis parameter is the axis along which the percentiles are computed.

arr = np.array([[4, 3, 6, 10], [1, 8, 9, 12]])

print(np.percentile(arr, 50, axis=0))  # axis=0: 50th percentile of each column
#Output:
[ 2.5  5.5  7.5 11. ]

print(np.percentile(arr, 50, axis=1))  # axis=1: 50th percentile of each row
#Output:
[5.  8.5]
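By default, percentile() uses linear interpolation between the two nearest values of the sorted data (in NumPy 1.22 and later this is the method="linear" option). The hypothetical helper linear_percentile below re-implements that rule for a 1-D input so the arithmetic is visible; it is a sketch for illustration, not NumPy's actual implementation.

```python
import numpy as np

def linear_percentile(values, q):
    """Hypothetical helper: q-th percentile of a 1-D sequence via linear interpolation."""
    s = np.sort(np.asarray(values, dtype=float))
    pos = (q / 100) * (len(s) - 1)  # fractional position in the sorted data
    lo = int(np.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] + frac * (s[hi] - s[lo])  # interpolate between neighbours

arr = np.array([[4, 3, 6, 10], [1, 8, 9, 12]])
flat = arr.ravel()  # sorted: [1, 3, 4, 6, 8, 9, 10, 12]

for q in (25, 50, 75):
    print(q, linear_percentile(flat, q), np.percentile(flat, q))  # 3.75, 7.0, 9.25
```

For q=25, the position is 0.25 × 7 = 1.75, so the result lies three quarters of the way from 3 to 4.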

nanpercentile():

The nanpercentile() function calculates the percentile of the input array along the specified axis while ignoring the NaN values.

Syntax:

nanpercentile(a, q, axis=None)

b = np.array([[30, np.nan, 17], [6, 23, np.nan]])

print(np.nanpercentile(b, 50, axis=0))  # axis=0: 50th percentile of each column, skipping NaN
#Output:
[18. 23. 17.]

print(np.nanpercentile(b, 50, axis=1))  # axis=1: 50th percentile of each row, skipping NaN
#Output:
[23.5 14.5]
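Plain percentile() does not skip missing values, which is why nanpercentile() exists. In the sketch below, reusing the array b from the example above, every row contains a NaN, so percentile() returns NaN for each row while nanpercentile() returns the medians of the remaining values. Depending on the NumPy version, the percentile() call may also emit a RuntimeWarning.

```python
import numpy as np

b = np.array([[30, np.nan, 17], [6, 23, np.nan]])

plain = np.percentile(b, 50, axis=1)       # every row contains a NaN -> NaN for each row
ignored = np.nanpercentile(b, 50, axis=1)  # NaNs skipped -> 23.5 and 14.5
print(plain)
print(ignored)
```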

quantile():

The quantile() function of NumPy computes the q-th quantile of the input array along the specified axis. A quantile is a percentile expressed as a fraction: the q-th quantile (0 ≤ q ≤ 1) is the value below which a fraction q of the data falls, so quantile(a, 0.5) is the median.

Syntax:

quantile(a, q, axis=None)

The a parameter is the input array, the q parameter is the quantile or sequence of quantiles to compute (each must be between 0 and 1, inclusive), and the axis parameter is the axis along which the quantiles are calculated.

arr = np.array([[4, 3, 6, 10], [1, 8, 9, 12]])

print(np.quantile(arr, 0.3, axis=0))  # axis=0: 0.3 quantile of each column
#Output:
[ 1.9  4.5  6.9 10.6]

print(np.quantile(arr, 0.3, axis=1))  # axis=1: 0.3 quantile of each row
#Output:
[3.9 7.3]
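Since a quantile is a percentile expressed as a fraction, quantile(a, q) should agree with percentile(a, 100 * q). The sketch below checks this for the array above and also passes a list of quantiles; in that case the quantiles index the first axis of the result.

```python
import numpy as np

arr = np.array([[4, 3, 6, 10], [1, 8, 9, 12]])

q_result = np.quantile(arr, 0.3, axis=0)
p_result = np.percentile(arr, 30, axis=0)  # the same cut point expressed as a percentage
print(np.allclose(q_result, p_result))     # True

# With a list of quantiles, the first axis of the result follows the list
quartiles = np.quantile(arr, [0.25, 0.5, 0.75], axis=1)
print(quartiles.shape)  # (3, 2): three quantiles for each of the two rows
```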

nanquantile():

The nanquantile() function of NumPy computes the quantile of the input array along the specified axis while ignoring NaN values.

Syntax:

nanquantile(a, q, axis=None)

b = np.array([[30, np.nan, 17], [6, 23, np.nan]])

print(np.nanquantile(b, 0.4, axis=0))  # axis=0: 0.4 quantile of each column, skipping NaN
#Output:
[15.6 23.  17. ]

print(np.nanquantile(b, 0.4, axis=1))  # axis=1: 0.4 quantile of each row, skipping NaN
#Output:
[22.2 12.8]
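Applying nanquantile() to a row gives the same result as removing that row's NaN values first and calling quantile() on what remains. A row-by-row sketch of that equivalence, reusing b from the example above (pairs is an illustrative name):

```python
import numpy as np

b = np.array([[30, np.nan, 17], [6, 23, np.nan]])

pairs = []
for row in b:
    clean = row[~np.isnan(row)]  # drop this row's NaN values
    pairs.append((np.quantile(clean, 0.4), np.nanquantile(row, 0.4)))

for by_hand, by_nanquantile in pairs:
    print(by_hand, by_nanquantile)  # about 22.2 for the first row, 12.8 for the second
```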

average():

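The average() function of NumPy computes the weighted average of the input array along a given axis. If no weights are given, every element gets equal weight and the result is the ordinary arithmetic mean.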

Syntax:

average(a, axis=None, weights=None)

The a parameter is the input array, axis is the axis along which the averages are calculated, and weights is an array of weights associated with the elements of a. Each element contributes to the average in proportion to its weight.

arr = np.arange(11, 20).reshape(3,3)         # [[11, 12, 13], [14, 15, 16], [17, 18, 19]]
weight = np.arange(10, 1, -1).reshape(3,3)   # [[10, 9, 8], [7, 6, 5], [4, 3, 2]]

print(np.average(arr, axis=0, weights=weight))  # axis=0: weighted average of each column
#Output:
[13.14285714 14.         14.8       ]

print(np.average(arr, axis=1, weights=weight))  # axis=1: weighted average of each row
#Output:
[11.92592593 14.88888889 17.77777778]
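The weighted average is sum(w * x) / sum(w). The sketch below recomputes the column-wise result by hand, reusing arr and weight from the example above, and uses the returned=True option of average() to also obtain the sum of the weights for each column.

```python
import numpy as np

arr = np.arange(11, 20).reshape(3, 3)
weight = np.arange(10, 1, -1).reshape(3, 3)

# Weighted average down each column: sum(w * x) / sum(w)
manual = (arr * weight).sum(axis=0) / weight.sum(axis=0)
print(np.allclose(manual, np.average(arr, axis=0, weights=weight)))  # True

# returned=True also gives the sum of the weights used for each column (21, 18, 15)
avg, weight_sums = np.average(arr, axis=0, weights=weight, returned=True)
print(weight_sums)
```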

median():

The median() function returns the median of the input array along a given axis. The median is the middle value of a sorted sequence of numbers; when the sequence has an even number of elements, it is the average of the two middle values.

Syntax:

median(a, axis=None)

The a parameter is the input array, and the optional axis parameter is the axis or axes along which the medians are computed.

arr = np.arange(11, 20).reshape(3,3)  # [[11, 12, 13], [14, 15, 16], [17, 18, 19]]

print(np.median(arr, axis=0))  # axis=0: median of each column
#Output:
[14. 15. 16.]

print(np.median(arr, axis=1))  # axis=1: median of each row
#Output:
[12. 15. 18.]
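For an even number of elements there is no single middle value, so median() averages the two middle values of the sorted data. Because the median is the 50th percentile, percentile(a, 50) gives the same result. A sketch with an illustrative four-element array:

```python
import numpy as np

even = np.array([7, 1, 5, 3])  # sorted: [1, 3, 5, 7]

print(np.median(even))          # 4.0, the mean of the two middle values 3 and 5
print(np.percentile(even, 50))  # 4.0, because the median is the 50th percentile
```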

mean():

The NumPy mean() function calculates the arithmetic mean of the input array along the specified axis. The mean is the sum of all the values in a sequence divided by the number of elements in the sequence.

Syntax:

mean(a, axis=None)

arr = np.arange(1, 10).reshape(3,3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(np.mean(arr, axis=0))  # axis=0: mean of each column
#Output:
[4. 5. 6.]

print(np.mean(arr, axis=1))  # axis=1: mean of each row
#Output:
[2. 5. 8.]
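The mean along an axis is the sum along that axis divided by the number of elements along it, and with the default axis=None the whole array is averaged. A sketch reusing the 3×3 array from the example above (col_means is an illustrative name):

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

col_means = arr.sum(axis=0) / arr.shape[0]  # column sums divided by the number of rows
print(col_means)                            # [4. 5. 6.]
print(np.mean(arr))                         # 5.0, the mean of all nine elements (axis=None)
```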

std():

The std() function returns the standard deviation of the input array along the specified axis. The standard deviation measures how widely a sequence of numbers is spread around its mean; it is the square root of the variance. By default it is computed over the flattened array; otherwise it is computed along the specified axis.

Syntax:

std(a, axis=None)

arr = np.arange(1, 10).reshape(3,3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(np.std(arr, axis=0))  # axis=0: standard deviation of each column
#Output:
[2.44948974 2.44948974 2.44948974]

print(np.std(arr, axis=1))  # axis=1: standard deviation of each row
#Output:
[0.81649658 0.81649658 0.81649658]
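By default std() divides by N (ddof=0), which gives the population standard deviation. Passing ddof=1 divides by N - 1 instead, giving the sample standard deviation that is commonly used when the data is a sample from a larger population. A sketch on the rows of the array above:

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

pop = np.std(arr, axis=1)             # ddof=0: divide by N (population)
sample = np.std(arr, axis=1, ddof=1)  # ddof=1: divide by N - 1 (sample)
print(pop)     # [0.81649658 0.81649658 0.81649658]
print(sample)  # [1. 1. 1.]
```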

var():

The var() function returns the variance of the input array along the specified axis. The variance is the average of the squared deviations of each value from the mean. By default it is computed over the flattened array; otherwise it is computed along the specified axis.

Syntax:

var(a, axis=None)

arr = np.arange(1, 10).reshape(3,3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(np.var(arr, axis=0))  # axis=0: variance of each column
#Output:
[6. 6. 6.]

print(np.var(arr, axis=1))  # axis=1: variance of each row
#Output:
[0.66666667 0.66666667 0.66666667]
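Following the definition above, the variance is the mean of the squared deviations from the mean, and the standard deviation is its square root. The sketch below rebuilds the column-wise result by hand from the array above (deviations and manual_var are illustrative names).

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

deviations = arr - arr.mean(axis=0)          # distance of each value from its column mean
manual_var = (deviations ** 2).mean(axis=0)  # average squared deviation per column
print(manual_var)                            # [6. 6. 6.]
print(np.allclose(np.sqrt(manual_var), np.std(arr, axis=0)))  # True
```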

nanmean():

The nanmean() function calculates the arithmetic mean of the input array along the specified axis, ignoring NaN values.

Syntax:

nanmean(a, axis=None)

b = np.array([[11, 13, np.nan, 17], [19, 23, 29, np.nan]])

print(np.nanmean(b, axis=0))  # axis=0: mean of each column, skipping NaN
#Output:
[15. 18. 29. 17.]

print(np.nanmean(b, axis=1))  # axis=1: mean of each row, skipping NaN
#Output:
[13.66666667 23.66666667]
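Conceptually, nanmean() is the NaN-ignoring sum divided by the number of non-NaN values. A sketch of that computation, reusing b from the example above, with the NumPy helpers np.nansum() and np.count_nonzero():

```python
import numpy as np

b = np.array([[11, 13, np.nan, 17], [19, 23, 29, np.nan]])

counts = np.count_nonzero(~np.isnan(b), axis=1)  # number of non-NaN values per row: [3 3]
manual = np.nansum(b, axis=1) / counts           # NaN-ignoring sum divided by that count
print(manual)  # [13.66666667 23.66666667]
```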

nanmedian():

The nanmedian() function returns the median of the input array along a given axis while ignoring NaN values.

Syntax:

nanmedian(a, axis=None)

b = np.array([[11, 13, np.nan, 17], [19, 23, 29, np.nan]])

print(np.nanmedian(b, axis=0))  # axis=0: median of each column, skipping NaN
#Output:
[15. 18. 29. 17.]

print(np.nanmedian(b, axis=1))  # axis=1: median of each row, skipping NaN
#Output:
[13. 23.]
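Like the other plain statistics, median() propagates NaN, whereas nanmedian() ignores it. With the default axis=None, nanmedian() below takes the median of the six non-NaN values of b, which is the average of the two middle values 17 and 19. Depending on the NumPy version, the median() call may also emit a RuntimeWarning.

```python
import numpy as np

b = np.array([[11, 13, np.nan, 17], [19, 23, 29, np.nan]])

print(np.median(b))     # nan: the NaN values propagate
print(np.nanmedian(b))  # 18.0: median of 11, 13, 17, 19, 23, 29
```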

nanstd():

The nanstd() function returns the standard deviation of the input array along the specified axis ignoring the NaN values.

Syntax:

nanstd(a, axis=None)

The a parameter is the input array, and the optional axis parameter is the axis or axes along which the standard deviations are computed.

b = np.array([[11, 13, np.nan, 17], [19, 23, 29, np.nan]])

print(np.nanstd(b, axis=0))  # axis=0: standard deviation of each column, skipping NaN
#Output:
[4. 5. 0. 0.]

print(np.nanstd(b, axis=1))  # axis=1: standard deviation of each row, skipping NaN
#Output:
[2.49443826 4.10960934]
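As with std() and var(), the result of nanstd() is the square root of the corresponding nanvar() result. A short check, reusing b from the example above:

```python
import numpy as np

b = np.array([[11, 13, np.nan, 17], [19, 23, 29, np.nan]])

std_rows = np.nanstd(b, axis=1)
var_rows = np.nanvar(b, axis=1)
print(np.allclose(std_rows, np.sqrt(var_rows)))  # True
```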

nanvar():

The nanvar() function is used to get the variance of the input array along the specified axis while ignoring the NaN values.

Syntax:

nanvar(a, axis=None)

The a parameter is the input array, and the optional axis parameter is the axis or axes along which the variances are computed.

b = np.array([[11, 13, np.nan, 17], [19, 23, 29, np.nan]])

print(np.nanvar(b, axis=0))  # axis=0: variance of each column, skipping NaN
#Output:
[16. 25.  0.  0.]

print(np.nanvar(b, axis=1))  # axis=1: variance of each row, skipping NaN
#Output:
[ 6.22222222 16.88888889]
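nanvar() can be reproduced by hand: subtract each row's nanmean(), square the deviations, and divide their NaN-ignoring sum by the number of non-NaN values. The sketch below does this for the rows of b, using keepdims=True so the row means broadcast against the 2×4 array; the intermediate names are illustrative.

```python
import numpy as np

b = np.array([[11, 13, np.nan, 17], [19, 23, 29, np.nan]])

row_means = np.nanmean(b, axis=1, keepdims=True)  # shape (2, 1) so it broadcasts across each row
sq_dev = (b - row_means) ** 2                     # NaN entries stay NaN
counts = np.count_nonzero(~np.isnan(b), axis=1)   # non-NaN values per row
manual = np.nansum(sq_dev, axis=1) / counts       # average squared deviation, NaN ignored
print(manual)  # [ 6.22222222 16.88888889]
```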