Time Series
Dates and Times: Representing Time in Python
Native Python dates and times: datetime and dateutil
In [1]:
from datetime import datetime
datetime(year=2015, month=7, day=4)
In [2]:
from dateutil import parser
date = parser.parse("4th of July, 2015")
date
In [3]:
date.strftime('%A')
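Here is a small sketch of native `datetime` arithmetic and formatting that uses only the standard library; the variable names are illustrative. A `timedelta` shifts a date, `strftime` formats it, and `strptime` parses a string whose layout is known in advance, which removes the need for dateutil.

```python
from datetime import datetime, timedelta

# A single point in time, built from its components
independence_day = datetime(2015, 7, 4)

# Adding a timedelta shifts the date by a fixed duration
one_week_later = independence_day + timedelta(days=7)

# strftime format codes: %A is the weekday name, %Y-%m-%d an ISO-style date
print(independence_day.strftime('%A'))       # Saturday
print(one_week_later.strftime('%Y-%m-%d'))   # 2015-07-11

# strptime parses a string whose format is known in advance
parsed = datetime.strptime('2015-07-04', '%Y-%m-%d')
print(parsed == independence_day)            # True
```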
Typed arrays of times: NumPy's datetime64
In [4]:
import numpy as np
date = np.array('2015-07-04', dtype=np.datetime64)
date
In [5]:
date + np.arange(12)
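All `datetime64` values in an array share a single dtype, so comparisons and differences run over the whole array at once without a Python loop. The sketch below shows this. It also uses `np.busday_count`, which does not appear in the original cells; that function counts weekdays in a half-open date range.

```python
import numpy as np

start = np.datetime64('2015-07-04')
dates = start + np.arange(12)              # 12 consecutive days, dtype datetime64[D]

# Vectorized comparison: a boolean mask over the whole array
before_july_8 = dates < np.datetime64('2015-07-08')
print(dates[before_july_8])                # July 4 through July 7

# Vectorized differences: each gap is a timedelta64 of one day
gaps = np.diff(dates)
print(gaps.astype(int))                    # eleven 1s

# Weekdays in [2015-07-04, 2015-07-16): Saturdays and Sundays are skipped
print(np.busday_count('2015-07-04', '2015-07-16'))   # 8
```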
In [6]:
np.datetime64('2015-07-04')
In [7]:
np.datetime64('2015-07-04 12:00')
In [8]:
np.datetime64('2015-07-04 12:59:59.50', 'ns')
NumPy datetime64 time units. Each value is stored as a signed 64-bit integer count of the chosen unit, so finer units cover shorter spans:
Code | Meaning | Time span (relative) | Time span (absolute) |
---|---|---|---|
Y | Year | ± 9.2e18 years | [9.2e18 BC, 9.2e18 AD] |
M | Month | ± 7.6e17 years | [7.6e17 BC, 7.6e17 AD] |
W | Week | ± 1.7e17 years | [1.7e17 BC, 1.7e17 AD] |
D | Day | ± 2.5e16 years | [2.5e16 BC, 2.5e16 AD] |
h | Hour | ± 1.0e15 years | [1.0e15 BC, 1.0e15 AD] |
m | Minute | ± 1.7e13 years | [1.7e13 BC, 1.7e13 AD] |
s | Second | ± 2.9e11 years | [2.9e11 BC, 2.9e11 AD] |
ms | Millisecond | ± 2.9e8 years | [2.9e8 BC, 2.9e8 AD] |
us | Microsecond | ± 2.9e5 years | [290301 BC, 294241 AD] |
ns | Nanosecond | ± 292 years | [1678 AD, 2262 AD] |
ps | Picosecond | ± 106 days | [1969 AD, 1970 AD] |
fs | Femtosecond | ± 2.6 hours | [1969 AD, 1970 AD] |
as | Attosecond | ± 9.2 seconds | [1969 AD, 1970 AD] |
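NumPy infers the unit from the input string unless you pass one explicitly. The spans in the table follow from storing a signed 64-bit count of that unit relative to 1970-01-01. The short sketch below checks both points, including the roughly ±292-year limit of the `ns` unit.

```python
import numpy as np

# Unit inferred from the string: day vs. minute resolution
day = np.datetime64('2015-07-04')
minute = np.datetime64('2015-07-04 12:00')
print(day.dtype, minute.dtype)             # datetime64[D] datetime64[m]

# Unit forced explicitly with the second argument
ns = np.datetime64('2015-07-04 12:59:59.50', 'ns')
print(ns.dtype)                            # datetime64[ns]

# Mixing units promotes the result to the finer one (hours here)
later = day + np.timedelta64(36, 'h')
print(later.dtype)                         # datetime64[h]

# Largest int64 value expressed in years of nanoseconds
ns_per_year = 1e9 * 60 * 60 * 24 * 365.2425
print(round(np.iinfo(np.int64).max / ns_per_year))   # 292
```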
Dates and times in Pandas: the best of both worlds
In [9]:
import pandas as pd
date = pd.to_datetime("4th of July, 2015")
date
In [10]:
date.strftime('%A')
In [11]:
date + pd.to_timedelta(np.arange(12), 'D')
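A pandas `Timestamp` offers calendar accessors much like `datetime`, and it also combines with NumPy arrays to produce vectorized results. This sketch reuses the date from the cells above and adds `day_name()` and `dayofyear`, which are standard `Timestamp` and `DatetimeIndex` accessors.

```python
import numpy as np
import pandas as pd

date = pd.to_datetime("4th of July, 2015")   # a single pandas Timestamp

# Calendar fields are available as attributes and methods
print(date.day_name(), date.dayofyear)       # Saturday 185

# Adding a TimedeltaIndex yields a vectorized DatetimeIndex
week = date + pd.to_timedelta(np.arange(7), 'D')
print(week.day_name().tolist())              # Saturday through Friday
```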
Pandas Time Series: Indexing by Time
In [12]:
index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
'2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)
data
In [13]:
data['2014-07-04':'2015-07-04']
In [14]:
data['2015']
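The same lookups can be written with the explicit `.loc` label indexer, which pandas recommends for label-based selection. The sketch rebuilds the series from the cells above. It also shows a month-level partial string and a boolean mask built from a datetime component.

```python
import pandas as pd

index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)

# Label slices include both end points
print(data.loc['2014-07-04':'2015-07-04'].tolist())   # [0, 1, 2]

# Partial-date strings select every entry inside that year or month
print(data.loc['2015'].tolist())                      # [2, 3]
print(data.loc['2014-08'].tolist())                   # [1]

# Datetime components can also drive boolean masks
print(data[data.index.month == 7].tolist())           # [0, 2]
```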
Pandas Time Series Data Structures
- For time stamps, Pandas provides the Timestamp type. As mentioned before, it is essentially a replacement for Python's native datetime, but is based on the more efficient numpy.datetime64 data type. The associated index structure is DatetimeIndex.
- For time periods, Pandas provides the Period type. This encodes a fixed-frequency interval based on numpy.datetime64. The associated index structure is PeriodIndex.
- For time deltas or durations, Pandas provides the Timedelta type. Timedelta is a more efficient replacement for Python's native datetime.timedelta type, and is based on numpy.timedelta64. The associated index structure is TimedeltaIndex.
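This sketch shows the three scalar types side by side, with illustrative values. A `Timestamp` is a point in time, a `Period` is an interval with known boundaries, and a `Timedelta` is a duration that can be added to a `Timestamp`.

```python
import pandas as pd

ts = pd.Timestamp('2015-07-04 12:00')     # a point in time
month = pd.Period('2015-07', freq='M')     # a month-long interval
delta = pd.Timedelta(days=1, hours=6)      # a duration

# A Period knows its boundaries, so membership is a simple comparison
print(month.start_time, month.end_time)
print(month.start_time <= ts <= month.end_time)   # True

# Timestamp + Timedelta gives another Timestamp
print(ts + delta)                          # 2015-07-05 18:00:00

# Converting a Timestamp to a Period drops the finer detail
print(ts.to_period('M') == month)          # True
```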
In [15]:
dates = pd.to_datetime([datetime(2015, 7, 3), '4th of July, 2015',
'2015-Jul-6', '07-07-2015', '20150708'])
dates
In [16]:
dates.to_period('D')
In [17]:
dates - dates[0]
Regular sequences: pd.date_range()
In [18]:
pd.date_range('2015-07-03', '2015-07-10')
In [19]:
pd.date_range('2015-07-03', periods=8)
In [20]:
pd.date_range('2015-07-03', periods=8, freq='H')
In [21]:
pd.period_range('2015-07', periods=8, freq='M')
In [22]:
pd.timedelta_range(0, periods=10, freq='H')
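Giving `date_range` an end date and giving it a count of periods are two ways of asking for the same sequence. `period_range`, by contrast, steps through whole intervals rather than points in time. The sketch below checks both claims with the dates used above.

```python
import pandas as pd

# Two equivalent ways to ask for the same eight days
by_end = pd.date_range('2015-07-03', '2015-07-10')
by_count = pd.date_range('2015-07-03', periods=8)
print(by_end.equals(by_count))             # True

# Period ranges step through whole intervals rather than points in time
months = pd.period_range('2015-07', periods=3, freq='M')
print([str(p) for p in months])            # ['2015-07', '2015-08', '2015-09']
print(months[0].end_time.day)              # 31
```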
Frequencies and Offsets
Main Pandas frequency codes:
Code | Description | Code | Description |
---|---|---|---|
D | Calendar day | B | Business day |
W | Weekly | | |
M | Month end | BM | Business month end |
Q | Quarter end | BQ | Business quarter end |
A | Year end | BA | Business year end |
H | Hours | BH | Business hours |
T | Minutes | | |
S | Seconds | | |
L | Milliseconds | | |
U | Microseconds | | |
N | Nanoseconds | | |
These are the classic aliases. Pandas 2.2 deprecated several of them in favor of new spellings (for example, H became h and M became ME).
Codes that mark the start of each period instead of the end:
Code | Description | Code | Description |
---|---|---|---|
MS | Month start | BMS | Business month start |
QS | Quarter start | BQS | Business quarter start |
AS | Year start | BAS | Business year start |
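Quarterly and annual codes accept a three-letter month suffix that sets the month used to anchor the period:
- Q-JAN, BQ-FEB, QS-MAR, BQS-APR, etc.
- A-JAN, BA-FEB, AS-MAR, BAS-APR, etc.
Likewise, a three-letter weekday suffix sets the day on which a weekly frequency splits:
- W-SUN, W-MON, W-TUE, W-WED, etc.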
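Here is a small sketch of the weekday suffix in action. The dates are illustrative: 2015-07-01 fell on a Wednesday. The range starts at the first date on or after the start that matches the chosen anchor.

```python
import pandas as pd

# Each anchor snaps the range to its own weekday
sundays = pd.date_range('2015-07-01', periods=3, freq='W-SUN')
wednesdays = pd.date_range('2015-07-01', periods=3, freq='W-WED')

print(sundays.day_name().tolist())     # ['Sunday', 'Sunday', 'Sunday']
print(sundays[0].date())               # 2015-07-05
print(wednesdays[0].date())            # 2015-07-01 (already a Wednesday)
```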
In [23]:
pd.timedelta_range(0, periods=9, freq="2H30T")
In [24]:
from pandas.tseries.offsets import BDay
pd.date_range('2015-07-01', periods=5, freq=BDay())
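Offsets such as `BDay` are ordinary objects: you can combine them arithmetically and add them straight to timestamps. The sketch below builds the same 2-hour-30-minute step as the `"2H30T"` string above, this time from the `Hour` and `Minute` offset classes. It then adds one business day to a Friday.

```python
import pandas as pd
from pandas.tseries.offsets import BDay, Hour, Minute

# Tick offsets add together: 2 hours + 30 minutes is a 150-minute step
step = Hour(2) + Minute(30)
steps = pd.timedelta_range(0, periods=3, freq=step)
print(steps.tolist())

# Business-day offsets skip the weekend when added to a Timestamp
friday = pd.Timestamp('2015-07-03')
print(friday + BDay(1))                # 2015-07-06 00:00:00, the next Monday
```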
Resampling, Shifting, and Windowing
In [25]:
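from pandas_datareader import data
# Note: the Google Finance source used here has been discontinued, so this call
# is likely to fail with current versions of pandas_datareader; any daily price
# table with a DatetimeIndex and a 'Close' column can be substituted.
goog = data.DataReader('GOOG', start='2004', end='2016',
                       data_source='google')
goog.head()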
In [26]:
goog = goog['Close']
In [27]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()
In [28]:
goog.plot();
Resampling and converting frequencies
In [29]:
goog.plot(alpha=0.5, style='-')
goog.resample('BA').mean().plot(style=':')
goog.asfreq('BA').plot(style='--');
plt.legend(['input', 'resample', 'asfreq'],
loc='upper left');
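The difference between `resample` and `asfreq` is easier to see on a tiny series. This sketch uses hypothetical values 0 to 13 on consecutive days instead of the stock prices. `resample('W').mean()` aggregates every value in each Sunday-ending week. `asfreq('W')` only picks the value that falls on each week-end date inside the data's range.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: 14 days starting Wednesday 2015-07-01
idx = pd.date_range('2015-07-01', periods=14, freq='D')
s = pd.Series(np.arange(14, dtype=float), index=idx)

weekly_mean = s.resample('W').mean()   # mean of all values in each week
weekly_point = s.asfreq('W')           # value on each Sunday only

print(weekly_mean.tolist())            # [2.0, 8.0, 12.5]
print(weekly_point.tolist())           # [4.0, 11.0]
```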
In [30]:
fig, ax = plt.subplots(2, sharex=True)
data = goog.iloc[:10]
data.asfreq('D').plot(ax=ax[0], marker='o')
data.asfreq('D', method='bfill').plot(ax=ax[1], style='-o')
data.asfreq('D', method='ffill').plot(ax=ax[1], style='--o')
ax[1].legend(["back-fill", "forward-fill"]);
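The fill methods matter when upsampling to calendar days creates gaps. This sketch uses hypothetical prices on a Friday and the following Monday. Plain `asfreq('D')` leaves the weekend as NaN, `ffill` carries Friday's value forward, and `bfill` pulls Monday's value back.

```python
import pandas as pd

# Hypothetical prices on a Friday and the following Monday
s = pd.Series([10.0, 12.0], index=pd.to_datetime(['2015-07-03', '2015-07-06']))

raw = s.asfreq('D')                        # weekend days appear as NaN
forward = s.asfreq('D', method='ffill')    # weekend takes Friday's value
backward = s.asfreq('D', method='bfill')   # weekend takes Monday's value

print(raw.tolist())        # [10.0, nan, nan, 12.0]
print(forward.tolist())    # [10.0, 10.0, 10.0, 12.0]
print(backward.tolist())   # [10.0, 12.0, 12.0, 12.0]
```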
Time-shifts
In [31]:
fig, ax = plt.subplots(3, sharey=True)
# apply a frequency to the data
goog = goog.asfreq('D', method='pad')
goog.plot(ax=ax[0])
# shift(900) moves the data 900 rows; the index stays put
goog.shift(900).plot(ax=ax[1])
# shift(900, freq='D') moves the index 900 days instead
# (tshift, used in older versions, was removed in Pandas 2.0)
goog.shift(900, freq='D').plot(ax=ax[2])
# legends and annotations
local_max = pd.to_datetime('2007-11-05')
offset = pd.Timedelta(900, 'D')
ax[0].legend(['input'], loc=2)
ax[0].get_xticklabels()[2].set(weight='heavy', color='red')
ax[0].axvline(local_max, alpha=0.3, color='red')
ax[1].legend(['shift(900)'], loc=2)
ax[1].get_xticklabels()[2].set(weight='heavy', color='red')
ax[1].axvline(local_max + offset, alpha=0.3, color='red')
ax[2].legend(["shift(900, freq='D')"], loc=2)
ax[2].get_xticklabels()[1].set(weight='heavy', color='red')
ax[2].axvline(local_max + offset, alpha=0.3, color='red');
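The two kinds of shift also differ on a three-point series with illustrative values. `shift(n)` keeps the index and slides the values, which leaves NaN at the start. `shift(n, freq=...)` keeps each value attached to its data and moves the timestamps instead; this is the modern replacement for `tshift`.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range('2015-07-01', periods=3, freq='D'))

# shift(n): the index stays put and the values slide
data_shift = s.shift(1)
print(data_shift.tolist())                 # [nan, 1.0, 2.0]

# shift(n, freq=...): the values stay attached and the index moves
index_shift = s.shift(1, freq='D')
print(index_shift.index[0].date())         # 2015-07-02
print(index_shift.tolist())                # [1.0, 2.0, 3.0]
```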
In [32]:
ROI = 100 * (goog.shift(-365, freq='D') / goog - 1)
ROI.plot()
plt.ylabel('% Return on Investment');
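The return-on-investment formula works because shifting the index back 365 days lines up next year's price with today's date, and division aligns the two series on their index. This sketch checks that with hypothetical prices: 100 on 2015-01-01 and 150 exactly one year later. The only overlapping date gives a 50% return.

```python
import pandas as pd

# Hypothetical daily prices: constant 100, then 150 on 2016-01-01
idx = pd.date_range('2015-01-01', periods=366, freq='D')
price = pd.Series(100.0, index=idx)
price.iloc[-1] = 150.0

# Move next year's price back onto today's date, then compare
roi = 100 * (price.shift(-365, freq='D') / price - 1)
print(roi.dropna())   # a single value of 50.0 on 2015-01-01
```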
Rolling windows
In [33]:
rolling = goog.rolling(365, center=True)
data = pd.DataFrame({'input': goog,
'one-year rolling_mean': rolling.mean(),
'one-year rolling_std': rolling.std()})
ax = data.plot(style=['-', '--', ':'])
ax.lines[0].set_alpha(0.3)
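A rolling window can count observations, as in the cell above, or cover a span of time. This sketch uses five illustrative daily values. The 3-observation centered window needs full windows and leaves NaN at both ends. The time-based `'3D'` window, by default, averages whatever falls in the trailing three days.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range('2015-07-01', periods=5, freq='D'))

# Fixed 3-observation window; center=True labels each mean at the middle point
centered = s.rolling(3, center=True).mean()
print(centered.tolist())    # [nan, 2.0, 3.0, 4.0, nan]

# Time-based window: every observation in the trailing 3 days
by_time = s.rolling('3D').mean()
print(by_time.tolist())     # [1.0, 1.5, 2.0, 3.0, 4.0]
```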
Where to Learn More
- See the "Time Series/Date" section of the Pandas documentation.
Example: Visualizing Seattle Bicycle Counts
In [34]:
# !curl -o FremontBridge.csv https://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD
In [35]:
data = pd.read_csv('FremontBridge.csv', index_col='Date', parse_dates=True)
data.head()
In [36]:
data.columns = ['West', 'East']
data['Total'] = data.eval('West + East')
In [37]:
data.dropna().describe()
Visualizing the Data
In [38]:
%matplotlib inline
import seaborn; seaborn.set()
In [39]:
data.plot()
plt.ylabel('Hourly Bicycle Count');
In [40]:
weekly = data.resample('W').sum()
weekly.plot(style=[':', '--', '-'])
plt.ylabel('Weekly bicycle count');
In [41]:
daily = data.resample('D').sum()
daily.rolling(30, center=True).sum().plot(style=[':', '--', '-'])
plt.ylabel('30-day rolling sum of daily counts');
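Resampling hourly counts to days with `.sum()` produces one total per calendar day, which is what the daily plots above are built from. This sketch uses hypothetical hourly counts rather than the Fremont data: 1 per hour on the first day and 2 per hour on the second. It specifies the hourly spacing with the `pd.offsets.Hour` object rather than a string alias.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly counts for two days: 1 per hour, then 2 per hour
hours = pd.date_range('2015-07-01', periods=48, freq=pd.offsets.Hour())
counts = pd.Series(np.repeat([1, 2], 24), index=hours)

daily = counts.resample('D').sum()          # one total per calendar day
print(daily.tolist())                       # [24, 48]
```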
In [42]:
daily.rolling(50, center=True,
win_type='gaussian').sum(std=10).plot(style=[':', '--', '-']);
Digging into the Data
In [43]:
by_time = data.groupby(data.index.time).mean()
hourly_ticks = 4 * 60 * 60 * np.arange(6)
by_time.plot(xticks=hourly_ticks, style=[':', '--', '-']);
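The cell above groups on `datetime.time` objects, so its x axis is measured in seconds since midnight; that is why `hourly_ticks` is multiplied by 4 * 60 * 60. Grouping on `index.hour` instead gives an integer 0 to 23 index, as this sketch shows with a hypothetical repeating daily pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical counts: the same 24-hour pattern (0, 1, ..., 23) for 3 days
idx = pd.date_range('2015-07-01', periods=72, freq=pd.offsets.Hour())
counts = pd.Series(np.tile(np.arange(24), 3), index=idx)

# Group on the integer hour instead of datetime.time objects
by_hour = counts.groupby(counts.index.hour).mean()
print(by_hour.loc[8], by_hour.loc[17])     # 8.0 17.0
```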
In [44]:
by_weekday = data.groupby(data.index.dayofweek).mean()
by_weekday.index = ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun']
by_weekday.plot(style=[':', '--', '-']);
In [45]:
weekend = np.where(data.index.weekday < 5, 'Weekday', 'Weekend')
by_time = data.groupby([weekend, data.index.time]).mean()
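The weekday/weekend label array works because `weekday` runs from 0 (Monday) to 6 (Sunday), so `< 5` selects Monday through Friday. Any array of the same length as the data can serve as a groupby key. This sketch checks the idea on four hypothetical daily counts running from a Friday to a Monday.

```python
import numpy as np
import pandas as pd

# Hypothetical daily counts from Friday 2015-07-03 to Monday 2015-07-06
idx = pd.date_range('2015-07-03', periods=4, freq='D')
counts = pd.Series([10, 2, 4, 12], index=idx)

# weekday < 5 means Monday to Friday
weekend = np.where(idx.weekday < 5, 'Weekday', 'Weekend')
means = counts.groupby(weekend).mean()
print(means)    # Weekday 11.0, Weekend 3.0
```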
In [46]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 2, figsize=(14, 5))
# .ix was removed in Pandas 1.0; .loc selects on the first index level
by_time.loc['Weekday'].plot(ax=ax[0], title='Weekdays',
xticks=hourly_ticks, style=[':', '--', '-'])
by_time.loc['Weekend'].plot(ax=ax[1], title='Weekends',
xticks=hourly_ticks, style=[':', '--', '-']);