This is an excerpt from the Python Data Science Handbook by Jake VanderPlas; Jupyter notebooks are available on GitHub.

The text is released under the CC-BY-NC-ND license, and code is released under the MIT license. If you find this content useful, please consider supporting the work by buying the book!

Structured Data in NumPy

In [1]:
import numpy as np
In [2]:
# Three separate lists describing the same four people
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]
In [3]:
x = np.zeros(4, dtype=int)  # a plain array stores a single type for every element
In [4]:
# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)
[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]
In [5]:
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)
[('Alice', 25, 55.0) ('Bob', 45, 85.5) ('Cathy', 37, 68.0)
 ('Doug', 19, 61.5)]
In [6]:
# Get all names
data['name']
Out[6]:
array(['Alice', 'Bob', 'Cathy', 'Doug'], 
      dtype='<U10')
In [7]:
# Get first row of data
data[0]
Out[7]:
('Alice', 25, 55.0)
In [8]:
# Get the name from the last row
data[-1]['name']
Out[8]:
'Doug'
In [9]:
# Get names where age is under 30
data[data['age'] < 30]['name']
Out[9]:
array(['Alice', 'Doug'], 
      dtype='<U10')
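The cells above fill the array one field at a time. The following is a minimal sketch that goes slightly beyond them. It builds the same kind of array directly from a list of tuples, filters on two fields at once, and sorts the records by a named field with np.sort(..., order=...). The names dt, people, mask and by_age are illustrative only and do not come from the original text. Each record must be a tuple, not a list, for NumPy to map its items onto the fields.

```python
import numpy as np

# Same compound type as above, written as a list of (name, format) pairs
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Each tuple becomes one record; item order must match the field order
records = [('Alice', 25, 55.0), ('Bob', 45, 85.5),
           ('Cathy', 37, 68.0), ('Doug', 19, 61.5)]
people = np.array(records, dtype=dt)

# Combine conditions on different fields with & (parentheses are required)
mask = (people['age'] > 20) & (people['weight'] < 70)
print(people[mask]['name'])      # ['Alice' 'Cathy']

# np.sort can order whole records by a named field
by_age = np.sort(people, order='age')
print(by_age['name'])            # ['Doug' 'Alice' 'Cathy' 'Bob']
```

Sorting with order= moves whole records, so names, ages and weights stay aligned. Sorting the three original Python lists separately would not keep them aligned.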

Creating Structured Arrays

In [10]:
np.dtype({'names':('name', 'age', 'weight'),
          'formats':('U10', 'i4', 'f8')})
Out[10]:
dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])
In [11]:
np.dtype({'names':('name', 'age', 'weight'),
          'formats':((np.str_, 10), int, np.float32)})
Out[11]:
dtype([('name', '<U10'), ('age', '<i8'), ('weight', '<f4')])
In [12]:
np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])
Out[12]:
dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])
In [13]:
np.dtype('S10,i4,f8')
Out[13]:
dtype([('f0', 'S10'), ('f1', '<i4'), ('f2', '<f8')])
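The cells above spell a compound type in several ways. This is a small sketch that compares those spellings and inspects the resulting dtype through its names, itemsize and fields attributes. The variable names d1, d2 and d3 are illustrative. The byte counts assume NumPy's 4 bytes per 'U' character and no alignment padding, which is the default when align=True is not requested.

```python
import numpy as np

# Three spellings of (almost) the same compound type
d1 = np.dtype({'names': ('name', 'age', 'weight'),
               'formats': ('U10', 'i4', 'f8')})
d2 = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
d3 = np.dtype('U10,i4,f8')       # fields get default names f0, f1, f2

print(d1 == d2)                  # True: same names, formats and order
print(d1 == d3)                  # False: the field names differ
print(d1.names)                  # ('name', 'age', 'weight')
print(d1.itemsize)               # 52 = 10*4 (text) + 4 (int32) + 8 (float64)
print(d1.fields['weight'])       # (dtype('float64'), 44): type and byte offset
```

Two structured dtypes compare equal only when their field names match too, not just their formats.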
NumPy data type codes:

Character   Description               Example
'b'         Byte                      np.dtype('b')
'i'         Signed integer            np.dtype('i4') == np.int32
'u'         Unsigned integer          np.dtype('u1') == np.uint8
'f'         Floating point            np.dtype('f8') == np.float64
'c'         Complex floating point    np.dtype('c16') == np.complex128
'S', 'a'    Byte string               np.dtype('S5')
'U'         Unicode string            np.dtype('U') == np.str_
'V'         Raw data (void)           np.dtype('V') == np.void

Since NumPy 2.0, 'a' is deprecated in favor of 'S'.
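The character in each type code is what the resulting dtype reports as its kind attribute. For numeric codes, the number that follows is the size in bytes. For 'S' and 'U' it is a length in characters, and each 'U' character occupies 4 bytes. The sketch below checks this for a few codes; the loop over a hand-picked list of codes is purely illustrative.

```python
import numpy as np

# kind is the type character; itemsize is the storage in bytes
for code in ['i4', 'u1', 'f8', 'c16', 'S5', 'U5']:
    dt = np.dtype(code)
    print(code, dt.kind, dt.itemsize)
# i4 i 4 / u1 u 1 / f8 f 8 / c16 c 16 / S5 S 5 / U5 U 20

# 'f8' is an 8-byte float, i.e. float64
print(np.dtype('f8') == np.float64)          # True

# A '<' or '>' prefix fixes the byte order, which is part of the type
print(np.dtype('<i4') == np.dtype('>i4'))    # False
```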

More Advanced Compound Types

In [14]:
tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(1, dtype=tp)
print(X[0])
print(X['mat'][0])
(0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
[[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]
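The cell above creates a single record with a 3x3 subarray field. This is a sketch extending it to several records; the record count of 5 and the fill values are arbitrary choices for illustration. Selecting the subarray field across all records yields an ordinary float array of shape (n, 3, 3), so regular NumPy operations apply to it directly.

```python
import numpy as np

# Each record holds an integer id plus a 3x3 float matrix
tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(5, dtype=tp)
X['id'] = np.arange(5)

# Selecting the subarray field stacks the per-record matrices
print(X['mat'].shape)            # (5, 3, 3)

# Broadcast each id against the identity: record i gets i * I
X['mat'] = X['id'][:, None, None] * np.eye(3)
print(np.trace(X['mat'][2]))     # 6.0

# Storage per record: 8 bytes for the id + 9 * 8 bytes for the matrix
print(tp.itemsize)               # 80
```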

Record Arrays

In [15]:
data['age']
Out[15]:
array([25, 45, 37, 19], dtype=int32)
In [16]:
data_rec = data.view(np.recarray)  # same data, with fields also exposed as attributes
data_rec.age
Out[16]:
array([25, 45, 37, 19], dtype=int32)
In [17]:
%timeit data['age']
%timeit data_rec['age']
%timeit data_rec.age
1000000 loops, best of 3: 241 ns per loop
100000 loops, best of 3: 4.61 µs per loop
100000 loops, best of 3: 7.27 µs per loop
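Because data.view(np.recarray) is a view rather than a copy, writes made through attribute access land in the original structured array. This is a minimal sketch that rebuilds a smaller version of data so it runs on its own; the weight field is left at zero because it plays no part here.

```python
import numpy as np

data = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]

# view() reinterprets the same buffer; nothing is copied
data_rec = data.view(np.recarray)
data_rec.age += 1                # attribute-style write through the view
print(data['age'])               # [26 46 38 20]: the original sees the change

print(np.shares_memory(data, data_rec))   # True
```

Switching between the two forms is cheap, so one might keep a plain structured array for speed-critical access and take a recarray view only where attribute syntax reads better.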

On to Pandas

For day-to-day work with labeled, structured data, the Pandas package is usually a much better choice than NumPy's structured arrays.