This is an excerpt from the Python Data Science Handbook by Jake VanderPlas; Jupyter notebooks are available on GitHub.

The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please consider supporting the work by buying the book!

Comparisons, Masks, and Boolean Logic

Example: Counting Days

In [1]:
import numpy as np
import pandas as pd

# use pandas to extract rainfall (in tenths of a millimeter) as a NumPy array
rainfall = pd.read_csv('data/Seattle2014.csv')['PRCP'].values
inches = rainfall / 254.0  # 1/10mm -> inches
inches.shape
Out[1]:
(365,)
In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()  # set plot styles
In [3]:
plt.hist(inches, 40);

Comparison Operators as ufuncs

  • In Array Computations 1: Ufuncs, we introduced ufuncs, in particular the arithmetic operators +, -, *, /, and others

  • NumPy also implements comparison operators such as < (less than) and > (greater than) as element-wise ufuncs

In [4]:
x = np.array([1, 2, 3, 4, 5])
In [5]:
x < 3  # less than
Out[5]:
array([ True,  True, False, False, False], dtype=bool)
In [6]:
x > 3  # greater than
Out[6]:
array([False, False, False,  True,  True], dtype=bool)
In [7]:
x <= 3  # less than or equal
Out[7]:
array([ True,  True,  True, False, False], dtype=bool)
In [8]:
x >= 3  # greater than or equal
Out[8]:
array([False, False,  True,  True,  True], dtype=bool)
In [9]:
x != 3  # not equal
Out[9]:
array([ True,  True, False,  True,  True], dtype=bool)
In [10]:
x == 3  # equal
Out[10]:
array([False, False,  True, False, False], dtype=bool)
In [11]:
(2 * x) == (x ** 2)
Out[11]:
array([False,  True, False, False, False], dtype=bool)

As in the case of arithmetic operators, the comparison operators are implemented as ufuncs in NumPy; for example, when you write x < 3, internally NumPy uses np.less(x, 3). A summary of the comparison operators and their equivalent ufunc is shown here:

Operator    Equivalent ufunc    Operator    Equivalent ufunc
==          np.equal            !=          np.not_equal
<           np.less             <=          np.less_equal
>           np.greater          >=          np.greater_equal
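As a quick check of the correspondence between operators and ufuncs, the following sketch compares each comparison operator with its ufunc on the five-element array used earlier. The names `pairs` and `all_match` are illustrative only and are not part of the original notebook.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# each operator result paired with the result of its equivalent ufunc
pairs = [
    (x == 3, np.equal(x, 3)),
    (x != 3, np.not_equal(x, 3)),
    (x < 3, np.less(x, 3)),
    (x <= 3, np.less_equal(x, 3)),
    (x > 3, np.greater(x, 3)),
    (x >= 3, np.greater_equal(x, 3)),
]

# every pair should match element by element
all_match = all(np.array_equal(op, uf) for op, uf in pairs)
print(all_match)
```

Writing the operator form is usually more readable. The ufunc form can be handy when a function object is needed, for example to pass the comparison as an argument.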
In [12]:
rng = np.random.RandomState(0)
x = rng.randint(10, size=(3, 4))
x
Out[12]:
array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])
In [13]:
x < 6
Out[13]:
array([[ True,  True,  True,  True],
       [False, False,  True,  True],
       [ True,  True, False, False]], dtype=bool)

Working with Boolean Arrays

In [14]:
print(x)
[[5 0 3 3]
 [7 9 3 5]
 [2 4 7 6]]

Counting Entries

In [15]:
# how many values less than 6?
np.count_nonzero(x < 6)
Out[15]:
8
In [16]:
np.sum(x < 6)
Out[16]:
8
In [17]:
# how many values less than 6 in each row?
np.sum(x < 6, axis=1)
Out[17]:
array([4, 2, 2])
In [18]:
# are there any values greater than 8?
np.any(x > 8)
Out[18]:
True
In [19]:
# are there any values less than zero?
np.any(x < 0)
Out[19]:
False
In [20]:
# are all values less than 10?
np.all(x < 10)
Out[20]:
True
In [21]:
# are all values equal to 6?
np.all(x == 6)
Out[21]:
False

np.all and np.any can be used along particular axes as well. For example:

In [22]:
# are all values in each row less than 8?
np.all(x < 8, axis=1)
Out[22]:
array([ True, False,  True], dtype=bool)
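To make the role of the axis argument concrete, here is a small sketch that applies np.all and np.any along both axes. It assumes the same 3x4 values as the array x shown earlier, written out explicitly so the snippet does not depend on the random seed. The variable names are illustrative.

```python
import numpy as np

# same values as the 3x4 array x shown earlier, written out explicitly
x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

# axis=0 collapses the rows, giving one answer per column
cols_all_lt_8 = np.all(x < 8, axis=0)
cols_any_gt_8 = np.any(x > 8, axis=0)

# axis=1 collapses the columns, giving one answer per row
rows_all_lt_8 = np.all(x < 8, axis=1)

print(cols_all_lt_8)
print(cols_any_gt_8)
print(rows_all_lt_8)
```

The result has one entry per column for axis=0 (four here) and one entry per row for axis=1 (three here).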

Boolean Operators

In [23]:
np.sum((inches > 0.5) & (inches < 1))
Out[23]:
29
In [24]:
np.sum(~( (inches <= 0.5) | (inches >= 1) ))
Out[24]:
29

Operator    Equivalent ufunc    Operator    Equivalent ufunc
&           np.bitwise_and      |           np.bitwise_or
^           np.bitwise_xor      ~           np.bitwise_not
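The sketch below illustrates how the bitwise operators map onto their ufuncs, using the same kind of range test applied to the rainfall data. The array `sample` holds made-up values chosen for illustration. It is not the Seattle data.

```python
import numpy as np

# hypothetical rainfall values in inches, made up for illustration
sample = np.array([0.0, 0.3, 0.6, 0.9, 1.2])

# operator form; the parentheses are needed because & binds more
# tightly than the comparison operators
with_operators = (sample > 0.5) & (sample < 1)

# the same mask built from the equivalent ufuncs
with_ufuncs = np.bitwise_and(np.greater(sample, 0.5), np.less(sample, 1))

# the negated form, mirroring a ~(... | ...) expression
negated_form = np.bitwise_not(
    np.bitwise_or(np.less_equal(sample, 0.5), np.greater_equal(sample, 1))
)

print(with_operators)
print(np.array_equal(with_operators, with_ufuncs))
print(np.array_equal(with_operators, negated_form))
```

All three expressions select the same entries, those strictly between 0.5 and 1.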
In [25]:
print("Number days without rain:      ", np.sum(inches == 0))
print("Number days with rain:         ", np.sum(inches != 0))
print("Days with more than 0.5 inches:", np.sum(inches > 0.5))
print("Rainy days with < 0.2 inches  :", np.sum((inches > 0) &
                                                (inches < 0.2)))
Number days without rain:       215
Number days with rain:          150
Days with more than 0.5 inches: 37
Rainy days with < 0.2 inches  : 75

Boolean Arrays as Masks

In [26]:
x
Out[26]:
array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])
In [27]:
x < 5
Out[27]:
array([[False,  True,  True,  True],
       [False, False,  True, False],
       [ True,  True, False, False]], dtype=bool)
In [28]:
x[x < 5]
Out[28]:
array([0, 3, 3, 3, 2, 4])
In [29]:
# construct a mask of all rainy days
rainy = (inches > 0)

# construct a mask of all summer days (June 21st is the 172nd day)
days = np.arange(365)
summer = (days > 172) & (days < 262)

print("Median precip on rainy days in 2014 (inches):   ",
      np.median(inches[rainy]))
print("Median precip on summer days in 2014 (inches):  ",
      np.median(inches[summer]))
print("Maximum precip on summer days in 2014 (inches): ",
      np.max(inches[summer]))
print("Median precip on non-summer rainy days (inches):",
      np.median(inches[rainy & ~summer]))
Median precip on rainy days in 2014 (inches):    0.194881889764
Median precip on summer days in 2014 (inches):   0.0
Maximum precip on summer days in 2014 (inches):  0.850393700787
Median precip on non-summer rainy days (inches): 0.200787401575

Using the Keywords and/or Versus the Operators &/|

In [30]:
bool(42), bool(0)
Out[30]:
(True, False)
In [31]:
bool(42 and 0)
Out[31]:
False
In [32]:
bool(42 or 0)
Out[32]:
True
In [33]:
bin(42)
Out[33]:
'0b101010'
In [34]:
bin(59)
Out[34]:
'0b111011'
In [35]:
bin(42 & 59)
Out[35]:
'0b101010'
In [36]:
bin(42 | 59)
Out[36]:
'0b111011'
In [37]:
A = np.array([1, 0, 1, 0, 1, 0], dtype=bool)
B = np.array([1, 1, 1, 0, 1, 1], dtype=bool)
A | B
Out[37]:
array([ True,  True,  True, False,  True,  True], dtype=bool)
In [38]:
A or B
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-5d8e4f2e21c0> in <module>()
----> 1 A or B

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
In [39]:
x = np.arange(10)
(x > 4) & (x < 8)
Out[39]:
array([False, False, False, False, False,  True,  True,  True, False, False], dtype=bool)
In [40]:
(x > 4) and (x < 8)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-40-3d24f1ffd63d> in <module>()
----> 1 (x > 4) and (x < 8)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
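The error message points to a.any() and a.all(). The following sketch contrasts the element-wise test with what those reductions compute. The variable names are illustrative. Note that np.logical_and is a different ufunc from np.bitwise_and, although the two agree on Boolean arrays.

```python
import numpy as np

x = np.arange(10)

# element-wise: one Boolean per entry
mask = (x > 4) & (x < 8)

# np.logical_and is another element-wise spelling of the same test
mask_logical = np.logical_and(x > 4, x < 8)

# reducing each array to a single truth value first makes `and` legal,
# but it answers a different question:
# is any entry > 4, and is any entry < 8?
single = (x > 4).any() and (x < 8).any()

print(x[mask])
print(np.array_equal(mask, mask_logical))
print(single)
```

In short, `and` and `or` evaluate the truth of an entire object, while `&` and `|` act on the individual elements. For a Boolean test on each entry of an array, the element-wise operators are almost always the ones to use.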