Aggregations: Min, Max, etc.
Summing the Values in an Array
In [1]:
import numpy as np
In [2]:
L = np.random.random(100)
sum(L)
Out[2]:
In [3]:
np.sum(L)
Out[3]:
In [4]:
big_array = np.random.rand(1000000)
%timeit sum(big_array)
%timeit np.sum(big_array)
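The `%timeit` magic only works inside IPython or Jupyter. As a rough sketch of the same comparison in a plain Python script, the standard-library `timeit` module can be used instead. The array size, the seed, and the names `arr`, `t_builtin`, and `t_numpy` are illustrative assumptions, not part of the original notebook. Timings vary by machine, but `np.sum` is typically much faster because the loop runs in compiled code.

```python
import timeit
import numpy as np

# Hypothetical test array; the size and seed are arbitrary choices.
rng = np.random.default_rng(0)
arr = rng.random(100_000)

# Both approaches compute the same total (up to floating-point rounding).
builtin_total = sum(arr)
numpy_total = np.sum(arr)

# Time three repetitions of each call; the absolute numbers depend on the machine.
t_builtin = timeit.timeit(lambda: sum(arr), number=3)
t_numpy = timeit.timeit(lambda: np.sum(arr), number=3)

print(f"built-in sum: {t_builtin:.4f} s for 3 runs")
print(f"np.sum:       {t_numpy:.4f} s for 3 runs")
```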
Minimum and Maximum
In [5]:
min(big_array), max(big_array)
Out[5]:
In [6]:
np.min(big_array), np.max(big_array)
Out[6]:
In [7]:
%timeit min(big_array)
%timeit np.min(big_array)
In [8]:
print(big_array.min(), big_array.max(), big_array.sum())
Multidimensional Aggregates
In [9]:
M = np.random.random((3, 4))
print(M)
In [10]:
M.sum()
Out[10]:
In [11]:
M.min(axis=0)
Out[11]:
In [12]:
M.max(axis=1)
Out[12]:
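The `axis` argument names the dimension that is collapsed, not the one that is returned. A minimal sketch on a small array with known values (the array `A` is a hypothetical example, chosen so the results can be checked by hand):

```python
import numpy as np

# Hypothetical 3x4 array with easy-to-verify values.
A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

total = A.sum()          # aggregates over every element
col_min = A.min(axis=0)  # collapses the rows: one minimum per column
row_max = A.max(axis=1)  # collapses the columns: one maximum per row

print(total)    # 78
print(col_min)  # [1 2 3 4]
print(row_max)  # [ 4  8 12]
```

With `axis=0` the result has four entries (one per column); with `axis=1` it has three (one per row).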
Other Aggregation Functions
The following table provides a list of useful aggregation functions available in NumPy:
Function Name | NaN-safe Version | Description |
---|---|---|
np.sum | np.nansum | Compute sum of elements |
np.prod | np.nanprod | Compute product of elements |
np.mean | np.nanmean | Compute mean of elements |
np.std | np.nanstd | Compute standard deviation |
np.var | np.nanvar | Compute variance |
np.min | np.nanmin | Find minimum value |
np.max | np.nanmax | Find maximum value |
np.argmin | np.nanargmin | Find index of minimum value |
np.argmax | np.nanargmax | Find index of maximum value |
np.median | np.nanmedian | Compute median of elements |
np.percentile | np.nanpercentile | Compute rank-based statistics of elements |
np.any | N/A | Evaluate whether any elements are true |
np.all | N/A | Evaluate whether all elements are true |
We will see these aggregates often throughout the rest of the book.
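To illustrate the difference between the two columns of the table, here is a small sketch on a hypothetical array containing one missing value. The standard aggregate propagates the `NaN`, while the NaN-safe version ignores it.

```python
import numpy as np

# Hypothetical data with one missing value marked as NaN.
values = np.array([1.0, 2.0, np.nan, 4.0])

plain_sum = np.sum(values)              # NaN propagates through the sum
nan_safe_sum = np.nansum(values)        # NaN is ignored: 1 + 2 + 4
nan_safe_mean = np.nanmean(values)      # mean of the three valid entries
nan_safe_argmax = np.nanargmax(values)  # index of the largest valid entry

print(plain_sum)        # nan
print(nan_safe_sum)     # 7.0
print(nan_safe_argmax)  # 3
```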
Example: Average Height of US Presidents
In [13]:
!head -4 data/president_heights.csv
In [14]:
import pandas as pd
data = pd.read_csv('data/president_heights.csv')
heights = np.array(data['height(cm)'])
print(heights)
In [15]:
print("Mean height: ", heights.mean())
print("Standard deviation:", heights.std())
print("Minimum height: ", heights.min())
print("Maximum height: ", heights.max())
In [16]:
print("25th percentile: ", np.percentile(heights, 25))
print("Median: ", np.median(heights))
print("75th percentile: ", np.percentile(heights, 75))
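`np.percentile` also accepts a sequence of percentiles, so the three quartiles can be computed in one call. The sketch below uses a small made-up sample (`sample` is hypothetical and is not the presidents data) so that the results are easy to check by hand.

```python
import numpy as np

# Hypothetical sample of five heights in cm; not taken from the CSV file.
sample = np.array([160, 165, 170, 175, 180])

# One call returns the 25th, 50th, and 75th percentiles together.
q25, q50, q75 = np.percentile(sample, [25, 50, 75])

print(q25, q50, q75)  # 165.0 170.0 175.0
```

The 50th percentile is the same value that `np.median` returns.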
In [17]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set() # set plot style
In [18]:
plt.hist(heights)
plt.title('Height Distribution of US Presidents')
plt.xlabel('height (cm)')
plt.ylabel('number');