In [1]:

import numpy as np
import pandas as pd

The Pandas `Series` Object

In [2]:

data = pd.Series([0.25, 0.5, 0.75, 1.0])
data

Out[2]:

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

In [3]:

data.values

Out[3]:

array([ 0.25,  0.5 ,  0.75,  1.  ])

In [4]:

data.index

Out[4]:

RangeIndex(start=0, stop=4, step=1)

In [5]:

data[1]

Out[5]:

0.5

In [6]:

data[1:3]

Out[6]:

1    0.50
2    0.75
dtype: float64

`Series` as a generalized NumPy array

In [7]:

data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data

Out[7]:

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [8]:

data['b']

Out[8]:

0.5

In [9]:

data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=[2, 5, 3, 7])
data

Out[9]:

2    0.25
5    0.50
3    0.75
7    1.00
dtype: float64

In [10]:

data[5]

Out[10]:

0.5
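
With an explicit integer index, `data[5]` is looked up by label, not by position, which is easy to misread. A minimal sketch of making the intent explicit, assuming the standard `.loc` (label-based) and `.iloc` (position-based) accessors; the variable names are only illustrative:

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

# .loc always uses the index labels
by_label = data.loc[5]

# .iloc always uses the integer position (0-based)
by_position = data.iloc[1]

# Label 3 and position 3 refer to different elements here
label_three = data.loc[3]
position_three = data.iloc[3]

print(by_label, by_position, label_three, position_three)
```

Spelling out the accessor avoids the ambiguity whenever the labels are themselves integers.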

`Series` as a specialized dictionary

In [11]:

population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}
population = pd.Series(population_dict)
population

Out[11]:

California    38332521
Florida       19552860
Illinois      12882135
New York      19651127
Texas         26448193
dtype: int64

In [12]:

population['California']

Out[12]:

38332521

In [13]:

population['California':'Illinois']

Out[13]:

California    38332521
Florida       19552860
Illinois      12882135
dtype: int64
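
A label slice such as the one above depends on the order of the index. The output shown here has the keys in sorted order; pandas versions that preserve the dictionary's insertion order may return different rows for the same slice. A minimal sketch, assuming the sorted behavior is what you want regardless of version, sorts the index explicitly before slicing:

```python
import pandas as pd

population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}

# Sort by label so that the slice below does not depend on insertion order
population = pd.Series(population_dict).sort_index()

# Label-based slices include both endpoints
subset = population['California':'Illinois']
print(subset)
```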

Constructing `Series` objects

>>> pd.Series(data, index=index)

In [14]:

pd.Series([2, 4, 6])

Out[14]:

0    2
1    4
2    6
dtype: int64

In [15]:

pd.Series(5, index=[100, 200, 300])

Out[15]:

100    5
200    5
300    5
dtype: int64

In [16]:

pd.Series({2:'a', 1:'b', 3:'c'})

Out[16]:

1    b
2    a
3    c
dtype: object

In [17]:

pd.Series({2:'a', 1:'b', 3:'c'}, index=[3, 2])

Out[17]:

3    c
2    a
dtype: object
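
The explicit `index` acts as a selection over the dictionary keys. As a hedged illustration of the opposite case, the sketch below requests a label (`4`, chosen here only for the example) that is not among the keys; pandas marks the corresponding entry as missing rather than raising an error:

```python
import pandas as pd

# Label 4 is not a key of the dictionary, so its value is missing (NaN)
s = pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[3, 2, 4])

# Boolean mask of the missing entries
missing = s.isna()

print(s)
print(missing)
```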

The Pandas `DataFrame` Object

`DataFrame` as a generalized NumPy array

In [18]:

area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
area

Out[18]:

California    423967
Florida       170312
Illinois      149995
New York      141297
Texas         695662
dtype: int64

In [19]:

states = pd.DataFrame({'population': population,
                       'area': area})
states

Out[19]:

	area	population
California	423967	38332521
Florida	170312	19552860
Illinois	149995	12882135
New York	141297	19651127
Texas	695662	26448193

In [20]:

states.index

Out[20]:

Index(['California', 'Florida', 'Illinois', 'New York', 'Texas'], dtype='object')

In [21]:

states.columns

Out[21]:

Index(['area', 'population'], dtype='object')

`DataFrame` as a specialized dictionary

In [22]:

states['area']

Out[22]:

California    423967
Florida       170312
Illinois      149995
New York      141297
Texas         695662
Name: area, dtype: int64
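
The dictionary analogy applies to columns, not rows. A small sketch, using a reduced two-state version of the table built only for this example, shows that membership tests and `keys()` refer to column names, while a row has to be requested through `.loc`:

```python
import pandas as pd

# Reduced copy of the states table (two rows are enough for the illustration)
states = pd.DataFrame({
    'population': {'California': 38332521, 'Texas': 26448193},
    'area': {'California': 423967, 'Texas': 695662},
})

# Like a dict, `in` and keys() look at the column names
has_area = 'area' in states
has_row = 'California' in states
column_names = list(states.keys())

# Rows are reached by label through .loc
row = states.loc['California']

print(has_area, has_row, column_names)
print(row)
```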

Constructing `DataFrame` objects

In [23]:

pd.DataFrame(population, columns=['population'])

Out[23]:

	population
California	38332521
Florida	19552860
Illinois	12882135
New York	19651127
Texas	26448193

In [24]:

data = [{'a': i, 'b': 2 * i}
        for i in range(3)]
pd.DataFrame(data)

Out[24]:

	a	b
0	0	0
1	1	2
2	2	4

In [25]:

pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

Out[25]:

	a	b	c
0	1.0	2	NaN
1	NaN	3	4.0
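
When some keys are absent from a record, the gaps are filled with `NaN`, which also forces the affected columns to a floating-point dtype. One possible way to inspect and handle this, sketched with the standard `isna` and `fillna` methods (the fill value `0` is an arbitrary choice for the example):

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

# Count the missing entries in each column
missing_per_column = df.isna().sum()

# Replace missing entries with a placeholder; returns a new DataFrame
filled = df.fillna(0)

print(missing_per_column)
print(filled)
```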

In [26]:

pd.DataFrame({'population': population,
              'area': area})

Out[26]:

	area	population
California	423967	38332521
Florida	170312	19552860
Illinois	149995	12882135
New York	141297	19651127
Texas	695662	26448193

In [27]:

pd.DataFrame(np.random.rand(3, 2),
             columns=['foo', 'bar'],
             index=['a', 'b', 'c'])

Out[27]:

	foo	bar
a	0.865257	0.213169
b	0.442759	0.108267
c	0.047110	0.905718

In [28]:

A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
A

Out[28]:

array([(0, 0.0), (0, 0.0), (0, 0.0)], 
      dtype=[('A', '<i8'), ('B', '<f8')])

In [29]:

pd.DataFrame(A)

Out[29]:

	A	B
0	0	0.0
1	0	0.0
2	0	0.0

The Pandas `Index` Object

In [30]:

ind = pd.Index([2, 3, 5, 7, 11])
ind

Out[30]:

Int64Index([2, 3, 5, 7, 11], dtype='int64')

`Index` as an immutable array

In [31]:

ind[1]

Out[31]:

3

In [32]:

ind[::2]

Out[32]:

Int64Index([2, 5, 11], dtype='int64')

In [33]:

print(ind.size, ind.shape, ind.ndim, ind.dtype)

5 (5,) 1 int64

In [34]:

ind[1] = 0

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-40e631c82e8a> in <module>()
----> 1 ind[1] = 0

/Users/jakevdp/anaconda/lib/python3.5/site-packages/pandas/indexes/base.py in __setitem__(self, key, value)
   1243 
   1244     def __setitem__(self, key, value):
-> 1245         raise TypeError("Index does not support mutable operations")
   1246 
   1247     def __getitem__(self, key):

TypeError: Index does not support mutable operations
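
Because an `Index` cannot be modified in place, a changed index has to be built as a new object. A minimal sketch, assuming the `Index.delete` and `Index.insert` methods, which both return new `Index` objects and leave the original untouched:

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

# In-place assignment is rejected
raised = False
try:
    ind[1] = 0
except TypeError:
    raised = True

# Build a new Index with position 1 replaced by 0
new_ind = ind.delete(1).insert(1, 0)

print(raised)
print(list(new_ind))
print(list(ind))
```

This immutability is what makes it safe to share one index between several `Series` and `DataFrame` objects.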

`Index` as an ordered set

In [35]:

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

In [36]:

indA.intersection(indB)  # intersection

Out[36]:

Int64Index([3, 5, 7], dtype='int64')

In [37]:

indA.union(indB)  # union

Out[37]:

Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

In [38]:

indA.symmetric_difference(indB)  # symmetric difference

Out[38]:

Int64Index([1, 2, 9, 11], dtype='int64')
