Missing Data
Missing Data in Pandas
None
In [1]:
import numpy as np
import pandas as pd
In [2]:
vals1 = np.array([1, None, 3, 4])
vals1
Out[2]:
In [3]:
for dtype in ['object', 'int']:
    print("dtype =", dtype)
    %timeit np.arange(1E6, dtype=dtype).sum()
    print()
In [4]:
# Raises TypeError: integers and None cannot be added together
vals1.sum()
NaN
In [5]:
vals2 = np.array([1, np.nan, 3, 4])
vals2.dtype
Out[5]:
In [6]:
1 + np.nan
Out[6]:
In [7]:
0 * np.nan
Out[7]:
In [8]:
vals2.sum(), vals2.min(), vals2.max()
Out[8]:
In [9]:
np.nansum(vals2), np.nanmin(vals2), np.nanmax(vals2)
Out[9]:
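The cells above can be condensed into one short sketch. The variable names below are illustrative and not from the original notebook; the point is that NaN propagates through ordinary arithmetic and aggregates, while the `nan`-prefixed NumPy functions skip it.

```python
import numpy as np

vals = np.array([1, np.nan, 3, 4])  # stored as float64, since NaN is a float

# Any arithmetic involving NaN yields NaN, so a plain aggregate is "infected".
plain_sum = vals.sum()

# The nan-aware variants ignore the missing entry instead.
safe_sum = np.nansum(vals)
safe_min = np.nanmin(vals)
safe_max = np.nanmax(vals)

# NaN never compares equal to itself, so test for it with np.isnan.
nan_mask = np.isnan(vals)
```

Using a floating-point sentinel keeps the array in a native dtype, which is why these aggregates stay fast compared with an object array holding None.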
NaN and None in Pandas
In [10]:
pd.Series([1, np.nan, 2, None])
Out[10]:
In [11]:
x = pd.Series(range(2), dtype=int)
x
Out[11]:
In [12]:
x[0] = None
x
Out[12]:
Typeclass | Conversion When Storing NAs | NA Sentinel Value |
---|---|---|
floating | No change | np.nan |
object | No change | None or np.nan |
integer | Cast to float64 | np.nan |
boolean | Cast to object | None or np.nan |
Keep in mind that Pandas has traditionally stored string data with an object
dtype; recent versions also provide a dedicated string dtype.
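The upcasting rules in the table can be checked directly. The following is a minimal sketch with made-up example data (the names `ints`, `bools`, and `floats` are illustrative), assuming the default NumPy-backed dtypes rather than the nullable extension dtypes:

```python
import numpy as np
import pandas as pd

# Integer data with a missing entry is upcast to float64, with NaN as the sentinel.
ints = pd.Series([1, 2, None])

# Boolean data with a missing entry falls back to the generic object dtype.
bools = pd.Series([True, False, None])

# Floating-point data already has a native NaN, so the dtype does not change.
floats = pd.Series([1.5, np.nan])

dtypes = {name: str(s.dtype)
          for name, s in [("ints", ints), ("bools", bools), ("floats", floats)]}
print(dtypes)
```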
Operating on Null Values
- isnull(): Generate a boolean mask indicating missing values
- notnull(): Opposite of isnull()
- dropna(): Return a filtered version of the data
- fillna(): Return a copy of the data with missing values filled or imputed
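As a quick side-by-side illustration of the four methods listed above, here is a small sketch on a made-up Series (the data and variable names are illustrative only):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])  # illustrative data

missing = s.isnull()    # True where a value is missing
present = s.notnull()   # element-wise negation of isnull()
dropped = s.dropna()    # new Series without the missing entries
filled = s.fillna(0.0)  # new Series with missing entries replaced by 0.0

# None of these calls modifies the original Series.
print(s.isnull().sum())
```

Because each method returns a new object, the results can be chained or assigned back explicitly when the change should persist.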
Detecting Null Values
In [13]:
data = pd.Series([1, np.nan, 'hello', None])
In [14]:
data.isnull()
Out[14]:
In [15]:
data[data.notnull()]
Out[15]:
Dropping/Ignoring Null Values
In [16]:
data.dropna()
Out[16]:
In [17]:
df = pd.DataFrame([[1,      np.nan, 2],
                   [2,      3,      5],
                   [np.nan, 4,      6]])
df
Out[17]:
In [18]:
df.dropna()
Out[18]:
In [19]:
df.dropna(axis='columns')
Out[19]:
In [20]:
df[3] = np.nan
df
Out[20]:
In [21]:
df.dropna(axis='columns', how='all')
Out[21]:
In [22]:
df.dropna(axis='rows', thresh=3)
Out[22]:
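The `how` and `thresh` options used above are easiest to compare on the same frame. This sketch rebuilds the DataFrame from the earlier cells so that it runs on its own; the result names (`any_rows`, `all_cols`, `thresh_rows`) are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan, 2],
                   [2, 3, 5],
                   [np.nan, 4, 6]])
df[3] = np.nan  # a column made up entirely of missing values

# Default (how='any'): drop every row containing at least one missing value.
# Each row has NaN in column 3, so no rows survive.
any_rows = df.dropna()

# how='all' drops only columns whose values are all missing (column 3 here).
all_cols = df.dropna(axis='columns', how='all')

# thresh=3 keeps rows with at least three non-null values (only row 1 here).
thresh_rows = df.dropna(axis='rows', thresh=3)
```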
Filling Null Values
In [23]:
data = pd.Series([1, np.nan, 2, None, 3], index=list('abcde'))
data
Out[23]:
In [24]:
data.fillna(0)
Out[24]:
In [25]:
# forward-fill: propagate the last valid value forward
data.ffill()
Out[25]:
In [26]:
# back-fill: propagate the next valid value backward
data.bfill()
Out[26]:
In [27]:
df
Out[27]:
In [28]:
df.ffill(axis=1)
Out[28]:
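One detail of forward filling along an axis is worth spelling out: a missing value with no valid value before it stays missing. The sketch below rebuilds the frame from the earlier cells so it is self-contained, and uses `DataFrame.ffill`, the method form of forward filling; the name `filled` is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan, 2],
                   [2, 3, 5],
                   [np.nan, 4, 6]])
df[3] = np.nan

# Fill along each row (axis=1), taking the previous valid value to the left.
filled = df.ffill(axis=1)

# Row 2 starts with NaN, and there is nothing to its left to copy from,
# so that entry remains NaN after the fill.
print(filled)
```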