This is an excerpt from the Python Data Science Handbook by Jake VanderPlas; Jupyter notebooks are available on GitHub.

The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please consider supporting the work by buying the book!

Fancy Indexing

Exploring Fancy Indexing

In [1]:
import numpy as np
rand = np.random.RandomState(42)

x = rand.randint(100, size=10)
print(x)
[51 92 14 71 60 20 82 86 74 74]
In [2]:
[x[3], x[7], x[2]]
Out[2]:
[71, 86, 14]
In [3]:
ind = [3, 7, 4]
x[ind]
Out[3]:
array([71, 86, 60])
In [4]:
ind = np.array([[3, 7],
                [4, 5]])
x[ind]
Out[4]:
array([[71, 86],
       [60, 20]])
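
The two cells above suggest that the result of fancy indexing takes the shape of the index array, not the shape of the array being indexed. A minimal sketch of that rule follows; the array `x` is rebuilt from the printed output above, and the names `ind_1d`, `ind_2d`, `picked_1d`, and `picked_2d` are illustrative only.

```python
import numpy as np

# Values copied from the printed output of cell [1] above
x = np.array([51, 92, 14, 71, 60, 20, 82, 86, 74, 74])

ind_1d = np.array([3, 7, 4])
ind_2d = np.array([[3, 7],
                   [4, 5]])

picked_1d = x[ind_1d]   # result has the shape of ind_1d
picked_2d = x[ind_2d]   # result has the shape of ind_2d

print(picked_1d.shape)  # (3,)
print(picked_2d.shape)  # (2, 2)

# Fancy indexing returns a copy, so editing the result leaves x untouched
picked_1d[0] = -1
print(x[3])             # 71
```

Because the result is a copy rather than a view, writes have to go through `x[ind] = ...` to reach the original array.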
In [5]:
X = np.arange(12).reshape((3, 4))
X
Out[5]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [6]:
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]
Out[6]:
array([ 2,  5, 11])
In [7]:
X[row[:, np.newaxis], col]
Out[7]:
array([[ 2,  1,  3],
       [ 6,  5,  7],
       [10,  9, 11]])
In [8]:
row[:, np.newaxis] * col
Out[8]:
array([[0, 0, 0],
       [2, 1, 3],
       [4, 2, 6]])
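
Cells [6] to [8] indicate that the index arrays are broadcast against each other before the lookup happens. The sketch below makes the broadcast shape explicit and compares the result with `np.ix_`, which builds the same open mesh of indices; the variable names `paired`, `grid`, `b_shape`, and `grid_ix` are illustrative only.

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

# Same-shape index arrays are paired element by element: (0,2), (1,1), (2,3)
paired = X[row, col]

# A column vector of rows against a row vector of columns broadcasts to 3x3
b_shape = np.broadcast(row[:, np.newaxis], col).shape
grid = X[row[:, np.newaxis], col]

# np.ix_ constructs the same pair of broadcastable index arrays
grid_ix = X[np.ix_(row, col)]

print(paired)                        # [ 2  5 11]
print(b_shape)                       # (3, 3)
print(np.array_equal(grid, grid_ix)) # True
```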

Combined Indexing

In [9]:
print(X)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
In [10]:
X[2, [2, 0, 1]]
Out[10]:
array([10,  8,  9])
In [11]:
X[1:, [2, 0, 1]]
Out[11]:
array([[ 6,  4,  5],
       [10,  8,  9]])
In [12]:
mask = np.array([1, 0, 1, 0], dtype=bool)
X[row[:, np.newaxis], mask]
Out[12]:
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
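
One way to read cell [12] is that a Boolean mask, when combined with an integer index array, behaves like the integer positions of its `True` entries. The following sketch checks that reading, assuming the same `X`, `row`, and `mask` as above; `cols_from_mask`, `by_mask`, `by_index`, and `sliced` are illustrative names.

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
mask = np.array([1, 0, 1, 0], dtype=bool)

# Positions where the mask is True
cols_from_mask = np.flatnonzero(mask)

by_mask = X[row[:, np.newaxis], mask]
by_index = X[row[:, np.newaxis], cols_from_mask]

print(cols_from_mask)                     # [0 2]
print(np.array_equal(by_mask, by_index))  # True

# A slice combined with a fancy index keeps the sliced axis as is
sliced = X[1:, [2, 0, 1]]
print(sliced.shape)                       # (2, 3)
```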

Example: Selecting Random Points

In [13]:
mean = [0, 0]
cov = [[1, 2],
       [2, 5]]
X = rand.multivariate_normal(mean, cov, 100)
X.shape
Out[13]:
(100, 2)
In [14]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()  # for plot styling

plt.scatter(X[:, 0], X[:, 1]);
In [15]:
indices = np.random.choice(X.shape[0], 20, replace=False)
indices
Out[15]:
array([93, 45, 73, 81, 50, 10, 98, 94,  4, 64, 65, 89, 47, 84, 82, 80, 25,
       90, 63, 20])
In [16]:
selection = X[indices]  # fancy indexing here
selection.shape
Out[16]:
(20, 2)
In [17]:
plt.scatter(X[:, 0], X[:, 1], alpha=0.3)
plt.scatter(selection[:, 0], selection[:, 1],
            facecolor='none', s=200);
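
The same row-selection idea can be used to partition a dataset, for instance into test and training subsets. The sketch below is one possible approach, not part of the original notebook: it uses `np.random.default_rng` with an arbitrarily chosen seed, and the names `perm`, `test_idx`, `train_idx`, `X_test`, and `X_train` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
mean = [0, 0]
cov = [[1, 2],
       [2, 5]]
X = rng.multivariate_normal(mean, cov, 100)

# Shuffle the row indices once, then cut the shuffled list in two
perm = rng.permutation(X.shape[0])
test_idx, train_idx = perm[:20], perm[20:]

# Fancy indexing along the first axis picks whole rows
X_test = X[test_idx]
X_train = X[train_idx]

print(X_test.shape, X_train.shape)  # (20, 2) (80, 2)
```

Splitting a single permutation guarantees that no row lands in both subsets, which two independent calls to `choice` would not.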

Modifying Values with Fancy Indexing

In [18]:
x = np.arange(10)
i = np.array([2, 1, 8, 4])
x[i] = 99
print(x)
[ 0 99 99  3 99  5  6  7 99  9]
In [19]:
x[i] -= 10
print(x)
[ 0 89 89  3 89  5  6  7 89  9]
In [20]:
x = np.zeros(10)
x[[0, 0]] = [4, 6]
print(x)
[ 6.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
In [21]:
i = [2, 3, 3, 4, 4, 4]
x[i] += 1
x
Out[21]:
array([ 6.,  0.,  1.,  1.,  1.,  0.,  0.,  0.,  0.,  0.])
In [22]:
x = np.zeros(10)
np.add.at(x, i, 1)
print(x)
[ 0.  0.  1.  2.  3.  0.  0.  0.  0.  0.]
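
Cells [21] and [22] differ because `x[i] += 1` is shorthand for `x[i] = x[i] + 1`: the right-hand side is evaluated once, and the same value is then assigned to each repeated index. The sketch below puts the two behaviours side by side and adds `np.bincount` as an alternative for the specific case of counting non-negative integer indices; the names `x_buffered`, `x_unbuffered`, and `counts` are illustrative.

```python
import numpy as np

i = [2, 3, 3, 4, 4, 4]

# Buffered: x[i] + 1 is computed once, so repeated indices get the same value
x_buffered = np.zeros(10)
x_buffered[i] += 1

# Unbuffered: np.add.at applies the addition once per occurrence of each index
x_unbuffered = np.zeros(10)
np.add.at(x_unbuffered, i, 1)

# For plain counting of non-negative integers, bincount gives the same totals
counts = np.bincount(i, minlength=10)

print(x_buffered)    # index 2, 3, 4 each hold 1
print(x_unbuffered)  # index 2, 3, 4 hold 1, 2, 3
print(counts)
```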

Example: Binning Data

In [23]:
np.random.seed(42)
x = np.random.randn(100)

# compute a histogram by hand
bins = np.linspace(-5, 5, 20)
counts = np.zeros_like(bins)

# find the appropriate bin for each x
i = np.searchsorted(bins, x)

# add 1 to each of these bins
np.add.at(counts, i, 1)
In [24]:
# plot the results
# 'steps' is a draw style; recent Matplotlib no longer accepts it as a linestyle
plt.plot(bins, counts, drawstyle='steps');
plt.hist(x, bins, histtype='step');
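
The hand-built counts can also be compared numerically with `np.histogram`. The sketch below assumes the same seed and bin edges as cell [23]; `hist` and `edges` are illustrative names. Note the offset of one: `np.searchsorted` returns the index of the right-hand edge of the interval holding each value, so `counts[k + 1]` corresponds to the histogram bin between `bins[k]` and `bins[k + 1]`.

```python
import numpy as np

np.random.seed(42)
x = np.random.randn(100)

bins = np.linspace(-5, 5, 20)
counts = np.zeros_like(bins)

# Index of the first edge that is >= each value of x
i = np.searchsorted(bins, x)
np.add.at(counts, i, 1)

hist, edges = np.histogram(x, bins)

# counts has one entry per edge, hist has one entry per interval
print(len(counts), len(hist))             # 20 19
print(np.array_equal(counts[1:], hist))   # True
```

The two only agree this cleanly when no value falls exactly on an edge or outside the range of `bins`, which holds for this sample.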
In [25]:
print("NumPy routine:")
%timeit counts, edges = np.histogram(x, bins)

print("Custom routine:")
%timeit np.add.at(counts, np.searchsorted(bins, x), 1)
NumPy routine:
10000 loops, best of 3: 97.6 µs per loop
Custom routine:
10000 loops, best of 3: 19.5 µs per loop
In [26]:
x = np.random.randn(1000000)
print("NumPy routine:")
%timeit counts, edges = np.histogram(x, bins)

print("Custom routine:")
%timeit np.add.at(counts, np.searchsorted(bins, x), 1)
NumPy routine:
10 loops, best of 3: 68.7 ms per loop
Custom routine:
10 loops, best of 3: 135 ms per loop
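
The `%timeit` magic only works inside IPython or Jupyter. A rough equivalent with the standard-library `timeit` module is sketched below; the helper names `numpy_routine` and `custom_routine`, the sample sizes, and the repeat counts are all assumptions made for illustration, and the measured times will vary by machine and NumPy version. Unlike the cells above, this version allocates one extra slot in `counts`, since `np.searchsorted` can return `len(bins)` for values above the last edge.

```python
import timeit
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for this sketch
bins = np.linspace(-5, 5, 20)

def numpy_routine(x):
    # Built-in histogram: returns (counts, edges)
    return np.histogram(x, bins)

def custom_routine(x):
    # searchsorted can return indices 0..len(bins), hence the extra slot
    counts = np.zeros(len(bins) + 1)
    np.add.at(counts, np.searchsorted(bins, x), 1)
    return counts

timings = {}
for n in (100, 100_000):
    x = rng.standard_normal(n)
    # Best of 3 runs, 10 calls per run, reported as seconds per call
    t_numpy = min(timeit.repeat(lambda: numpy_routine(x), number=10, repeat=3)) / 10
    t_custom = min(timeit.repeat(lambda: custom_routine(x), number=10, repeat=3)) / 10
    timings[n] = (t_numpy, t_custom)
    print(f"n={n}: np.histogram {t_numpy:.2e} s, custom {t_custom:.2e} s")
```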