This is an excerpt from the Python Data Science Handbook by Jake VanderPlas; Jupyter notebooks are available on GitHub.

The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please consider supporting the work by buying the book!

Operating on Data in Pandas

Ufuncs: Index Preservation

In [1]:
import pandas as pd
import numpy as np
In [2]:
rng = np.random.RandomState(42)
ser = pd.Series(rng.randint(0, 10, 4))
ser
Out[2]:
0    6
1    3
2    7
3    4
dtype: int64
In [3]:
df = pd.DataFrame(rng.randint(0, 10, (3, 4)),
                  columns=['A', 'B', 'C', 'D'])
df
Out[3]:
   A  B  C  D
0  6  9  2  6
1  7  4  3  7
2  7  2  5  4
In [4]:
np.exp(ser)
Out[4]:
0     403.428793
1      20.085537
2    1096.633158
3      54.598150
dtype: float64
In [5]:
np.sin(df * np.pi / 4)
Out[5]:
          A             B         C             D
0 -1.000000  7.071068e-01  1.000000 -1.000000e+00
1 -0.707107  1.224647e-16  0.707107 -7.071068e-01
2 -0.707107  1.000000e+00 -0.707107  1.224647e-16
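
The cells above suggest that a NumPy ufunc applied to a Pandas object returns another Pandas object with the same labels. A minimal sketch that checks this directly is shown here; the Series `labeled` and its string index are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with string labels (not from the original notebook).
labeled = pd.Series([0.0, 1.0, 2.0], index=['a', 'b', 'c'])

# Apply a NumPy ufunc; the result is expected to be a Series, not a bare array.
result = np.exp(labeled)

# Check that the type and the index labels survive the operation.
is_series = isinstance(result, pd.Series)
same_index = result.index.equals(labeled.index)
print(is_series, same_index)
```

If both checks hold, the ufunc has operated on the values only and left the index untouched.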

UFuncs: Index Alignment

Index alignment in Series

In [6]:
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

Let's see what happens when we divide these to compute the population density:

In [7]:
population / area
Out[7]:
Alaska              NaN
California    90.413926
New York            NaN
Texas         38.018740
dtype: float64
In [8]:
# Index.union gives the set union; the `|` operator no longer does this in recent pandas
area.index.union(population.index)
Out[8]:
Index(['Alaska', 'California', 'New York', 'Texas'], dtype='object')
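
The division above can be read as: take the union of the two indices, then mark every label missing from either input as NaN. The sketch below, which rebuilds the same two Series, is one way to confirm that reading. The names `density`, `expected_labels`, and `missing_labels` are introduced here for illustration.

```python
import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

density = population / area

# The result should carry exactly the labels in the union of both indices.
expected_labels = set(area.index.union(population.index))
labels_match = set(density.index) == expected_labels

# Labels present in only one input have no partner value, so they become NaN.
missing_labels = sorted(density[density.isna()].index)
print(labels_match, missing_labels)
```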
In [9]:
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A + B
Out[9]:
0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64
In [10]:
A.add(B, fill_value=0)
Out[10]:
0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64
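
One detail worth checking is how `fill_value` treats a label that is missing on both sides. As far as the pandas documentation describes it, the fill is applied only when exactly one operand is missing. The sketch below uses two hypothetical Series, `left` and `right`, to illustrate that behavior.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: label 1 is NaN in both Series.
left = pd.Series([2.0, np.nan], index=[0, 1])
right = pd.Series([np.nan, 5.0], index=[1, 2])

filled = left.add(right, fill_value=0)

# Label 0: only `right` is missing, so it is treated as 0.
# Label 1: missing on both sides, so the result stays NaN.
# Label 2: only `left` is missing, so it is treated as 0.
print(filled)
```

So `fill_value` does not guarantee a NaN-free result; it only covers entries where one of the two operands has a value.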

Index alignment in DataFrame

In [11]:
A = pd.DataFrame(rng.randint(0, 20, (2, 2)),
                 columns=list('AB'))
A
Out[11]:
   A   B
0  1  11
1  5   1
In [12]:
B = pd.DataFrame(rng.randint(0, 10, (3, 3)),
                 columns=list('BAC'))
B
Out[12]:
   B  A  C
0  4  0  9
1  5  8  0
2  9  2  6
In [13]:
A + B
Out[13]:
      A     B   C
0   1.0  15.0 NaN
1  13.0   6.0 NaN
2   NaN   NaN NaN
In [14]:
fill = A.stack().mean()
A.add(B, fill_value=fill)
Out[14]:
      A     B     C
0   1.0  15.0  13.5
1  13.0   6.0   4.5
2   6.5  13.5  10.5

Python Operator    Pandas Method(s)
+                  add()
-                  sub(), subtract()
*                  mul(), multiply()
/                  truediv(), div(), divide()
//                 floordiv()
%                  mod()
**                 pow()
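
To make the operator-to-method mapping concrete, the following sketch compares each operator with one of its method counterparts on two small Series. The Series `x` and `y` and the dictionary `pairs` are hypothetical and exist only for this check.

```python
import pandas as pd

# Hypothetical operands sharing the same index, so no alignment gaps arise.
x = pd.Series([7, 8, 9], index=['a', 'b', 'c'])
y = pd.Series([2, 3, 4], index=['a', 'b', 'c'])

# Each entry pairs an operator result with the matching method result.
pairs = {
    '+': (x + y, x.add(y)),
    '-': (x - y, x.sub(y)),
    '*': (x * y, x.mul(y)),
    '/': (x / y, x.truediv(y)),
    '//': (x // y, x.floordiv(y)),
    '%': (x % y, x.mod(y)),
    '**': (x ** y, x.pow(y)),
}

# Series.equals compares values, index, and dtype.
all_match = all(op.equals(method) for op, method in pairs.values())
print(all_match)
```

The practical reason to prefer the method form is that it accepts extra arguments such as `fill_value` and `axis`, which the bare operators cannot take.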

Ufuncs: Operations Between DataFrame and Series

In [15]:
A = rng.randint(10, size=(3, 4))
A
Out[15]:
array([[3, 8, 2, 4],
       [2, 6, 4, 8],
       [6, 1, 3, 8]])
In [16]:
A - A[0]
Out[16]:
array([[ 0,  0,  0,  0],
       [-1, -2,  2,  4],
       [ 3, -7,  1,  4]])
In [17]:
df = pd.DataFrame(A, columns=list('QRST'))
df - df.iloc[0]
Out[17]:
   Q  R  S  T
0  0  0  0  0
1 -1 -2  2  4
2  3 -7  1  4
In [18]:
df.subtract(df['R'], axis=0)
Out[18]:
   Q  R  S  T
0 -5  0 -6 -4
1 -4  0 -2  2
2  5  0  2  7
In [19]:
halfrow = df.iloc[0, ::2]
halfrow
Out[19]:
Q    3
S    2
Name: 0, dtype: int64
In [20]:
df - halfrow
Out[20]:
     Q   R    S   T
0  0.0 NaN  0.0 NaN
1 -1.0 NaN  2.0 NaN
2  3.0 NaN  1.0 NaN
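
A common use of these DataFrame/Series rules is centering data. The sketch below rebuilds the same values as `df` above and subtracts the column means (default row-wise broadcasting) and the row means (`axis=0`). The names `col_centered` and `row_centered` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same values as the array A shown earlier in this section.
df = pd.DataFrame([[3, 8, 2, 4],
                   [2, 6, 4, 8],
                   [6, 1, 3, 8]], columns=list('QRST'))

# df.mean() is indexed by column label, so by default it is aligned
# with the columns and subtracted from every row.
col_centered = df - df.mean()

# df.mean(axis=1) is indexed by row label, so axis=0 is needed to
# align it with the rows and subtract it from every column.
row_centered = df.sub(df.mean(axis=1), axis=0)

print(col_centered.mean().round(10).tolist())
print(row_centered.mean(axis=1).round(10).tolist())
```

After centering, the column means of `col_centered` and the row means of `row_centered` should both be zero up to floating-point error.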