
Summing the Values in an Array

In [1]:

import numpy as np

In [2]:

L = np.random.random(100)
sum(L)

Out[2]:

55.61209116604941

In [3]:

np.sum(L)

Out[3]:

55.612091166049424

In [4]:

big_array = np.random.rand(1000000)
%timeit sum(big_array)
%timeit np.sum(big_array)

10 loops, best of 3: 104 ms per loop
1000 loops, best of 3: 442 µs per loop
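In the timings above, `np.sum` is a couple of hundred times faster than the built-in `sum`, because the NumPy version runs in compiled code rather than looping in Python. The two functions are also not interchangeable: their second positional argument means different things. The small array `x` below is made up for illustration and is not part of the original notebook.

```python
import numpy as np

# Illustrative 2x2 array (an assumption, not data from the notebook)
x = np.array([[1, 2], [3, 4]])

# Built-in sum: the second argument is the start value.
# It iterates over the rows: 1 + [1, 2] + [3, 4]
builtin_result = sum(x, 1)

# np.sum: the second positional argument is the axis to sum along.
# Here it sums across each row.
numpy_result = np.sum(x, 1)

print(builtin_result)  # [5 7]
print(numpy_result)    # [3 7]
```

Passing `axis=` by keyword to `np.sum` avoids this ambiguity.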

Minimum and Maximum

In [5]:

min(big_array), max(big_array)

Out[5]:

(1.1717128136634614e-06, 0.9999976784968716)

In [6]:

np.min(big_array), np.max(big_array)

Out[6]:

(1.1717128136634614e-06, 0.9999976784968716)

In [7]:

%timeit min(big_array)
%timeit np.min(big_array)

10 loops, best of 3: 82.3 ms per loop
1000 loops, best of 3: 497 µs per loop

In [8]:

print(big_array.min(), big_array.max(), big_array.sum())

1.17171281366e-06 0.999997678497 499911.628197

Multidimensional Aggregates

In [9]:

M = np.random.random((3, 4))
print(M)

[[ 0.8967576   0.03783739  0.75952519  0.06682827]
 [ 0.8354065   0.99196818  0.19544769  0.43447084]
 [ 0.66859307  0.15038721  0.37911423  0.6687194 ]]

In [10]:

M.sum()

Out[10]:

6.0850555667307118

In [11]:

M.min(axis=0)

Out[11]:

array([ 0.66859307,  0.03783739,  0.19544769,  0.06682827])

In [12]:

M.max(axis=1)

Out[12]:

array([ 0.8967576 ,  0.99196818,  0.6687194 ])
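The `axis` keyword names the dimension that is collapsed, not the one that is returned: `axis=0` aggregates down the rows and yields one value per column, while `axis=1` aggregates across the columns and yields one value per row. A minimal sketch with a deterministic array (the array `A` is an illustrative stand-in for the random `M` above):

```python
import numpy as np

# Illustrative 3x4 array with known values (stand-in for the random M)
A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

# axis=0 collapses the 3 rows, leaving one minimum per column (4 values)
col_min = A.min(axis=0)

# axis=1 collapses the 4 columns, leaving one maximum per row (3 values)
row_max = A.max(axis=1)

print(col_min, col_min.shape)  # [1 2 3 4] (4,)
print(row_max, row_max.shape)  # [ 4  8 12] (3,)
```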

Other Aggregation Functions

The following table provides a list of useful aggregation functions available in NumPy:

Function Name	NaN-safe Version	Description
`np.sum`	`np.nansum`	Compute sum of elements
`np.prod`	`np.nanprod`	Compute product of elements
`np.mean`	`np.nanmean`	Compute mean of elements
`np.std`	`np.nanstd`	Compute standard deviation
`np.var`	`np.nanvar`	Compute variance
`np.min`	`np.nanmin`	Find minimum value
`np.max`	`np.nanmax`	Find maximum value
`np.argmin`	`np.nanargmin`	Find index of minimum value
`np.argmax`	`np.nanargmax`	Find index of maximum value
`np.median`	`np.nanmedian`	Compute median of elements
`np.percentile`	`np.nanpercentile`	Compute rank-based statistics of elements
`np.any`	N/A	Evaluate whether any elements are true
`np.all`	N/A	Evaluate whether all elements are true

We will see these aggregates often throughout the rest of the book.
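As a brief illustration of the NaN-safe column in the table, the sketch below compares a standard aggregate with its NaN-safe counterpart on an array containing a missing value. The array `values` is made up for this example.

```python
import numpy as np

# Illustrative data with one missing value (an assumption, not from the notebook)
values = np.array([1.0, np.nan, 3.0])

# The standard aggregate propagates NaN
plain_sum = np.sum(values)

# The NaN-safe versions ignore NaN entries
safe_sum = np.nansum(values)
safe_mean = np.nanmean(values)

print(plain_sum)  # nan
print(safe_sum)   # 4.0
print(safe_mean)  # 2.0
```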

Example: Average Height of US Presidents

In [13]:

!head -4 data/president_heights.csv

order,name,height(cm)
1,George Washington,189
2,John Adams,170
3,Thomas Jefferson,189

In [14]:

import pandas as pd
data = pd.read_csv('data/president_heights.csv')
heights = np.array(data['height(cm)'])
print(heights)

[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173
 174 183 183 168 170 178 182 180 183 178 182 188 175 179 183 193 182 183
 177 185 188 188 182 185]

In [15]:

print("Mean height:       ", heights.mean())
print("Standard deviation:", heights.std())
print("Minimum height:    ", heights.min())
print("Maximum height:    ", heights.max())

Mean height:        179.738095238
Standard deviation: 6.93184344275
Minimum height:     163
Maximum height:     193

In [16]:

print("25th percentile:   ", np.percentile(heights, 25))
print("Median:            ", np.median(heights))
print("75th percentile:   ", np.percentile(heights, 75))

25th percentile:    174.25
Median:             182.0
75th percentile:    183.0
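The three quantiles above can also be computed in a single call, since `np.percentile` accepts a sequence of percentiles, and the median is simply the 50th percentile. A minimal sketch on a small made-up array (`sample` is illustrative, not the presidents' data):

```python
import numpy as np

# Illustrative data (an assumption, not the heights from the CSV file)
sample = np.array([1, 2, 3, 4, 5])

# One call returns the 25th, 50th, and 75th percentiles together
q25, q50, q75 = np.percentile(sample, [25, 50, 75])

print(q25, q50, q75)             # 2.0 3.0 4.0
print(q50 == np.median(sample))  # True
```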

In [17]:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()  # set plot style

In [18]:

plt.hist(heights)
plt.title('Height Distribution of US Presidents')
plt.xlabel('height (cm)')
plt.ylabel('number');
