Basics of Generating Date Ranges and Resampling in Python


The world is full of data that changes over time, from stock prices to weather patterns. This kind of data is called time series data, and analyzing it requires special techniques. This blog post takes a look at the chapter on time series data in the book “Python for Data Analysis” by Wes McKinney. We’ll explore how to use Python’s Pandas library to parse dates, build date ranges, and resample data.

Parsing Dates and Creating Time Series

The first step is wrangling the dates in your data. Python's standard datetime module parses strings against an explicit format, while the third-party dateutil library (which Pandas itself depends on) can infer most common formats for you.

from datetime import datetime
datetime.strptime('2011-05-08', '%Y-%m-%d')  # Parse a specific format

from dateutil.parser import parse
parse('2011-05-08')                 # Parse without format specifier
parse('6/12/2011', dayfirst=True)   # Day comes first: 6 December 2011
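For a whole column or list of date strings, Pandas offers pd.to_datetime. The snippet below is a minimal sketch; the sample strings are made up for illustration, and the None entry is there only to show how missing values are handled.

```python
import pandas as pd

# Hypothetical sample strings in ISO format, plus one missing value
date_strings = ['2011-07-06', '2011-08-06', None]

# to_datetime parses the whole list at once and returns a DatetimeIndex;
# the missing entry becomes NaT (Not a Time)
idx = pd.to_datetime(date_strings)

print(idx)
print(pd.isna(idx))
```

The resulting DatetimeIndex can be passed directly as the index of a Series.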

Once you have parsed dates, you can create a time series using a Pandas Series. Passing the dates as the index gives the Series a time index, allowing you to easily access data based on specific dates.

import pandas as pd
import numpy as np

# 1,000 random values on consecutive days, starting 1 January 2000
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))

Selecting Data by Date

The time index in Pandas allows for intuitive selection of data based on date ranges or parts of the date.

ts['2002']        # Select data for the year 2002
ts['2002-05']      # Select data for May 2002
ts['2002-05':]     # Select data from May 2002 onwards
ts['2002-05':'2002-08']  # Select data between May and August 2002 (inclusive)
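To make the effect of these selections easier to check, here is a small sketch that swaps the random values for the integers 0 to 999 (an assumption made purely so the results are reproducible) and uses .loc, which accepts the same partial date strings.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random series: values 0..999 on daily dates
ts = pd.Series(np.arange(1000), index=pd.date_range('2000-01-01', periods=1000))

year_2002 = ts.loc['2002']                 # every row dated in 2002
may_2002 = ts.loc['2002-05']               # all days of May 2002
may_to_aug = ts.loc['2002-05':'2002-08']   # both endpoints are included

print(len(year_2002), len(may_2002), len(may_to_aug))
print(may_to_aug.index[0], may_to_aug.index[-1])
```

Because the series ends on 26 September 2002, the 2002 selection returns only part of the year, while the May to August slice runs through the last day of August.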

Creating Advanced Date Ranges

Pandas provides flexibility in creating date ranges. You can specify frequencies such as daily, weekly on a chosen weekday, or a particular week of the month (e.g., the third Friday of each month). A range can be defined by a start and an end date, or by a start date and a number of periods.

# Create advanced date ranges
dt = pd.date_range('1/1/2024', periods=1000, freq='W-WED')  # 1,000 consecutive Wednesdays
dt = pd.date_range(start='1/1/2024', end='4/1/2024', freq='W-FRI')  # Every Friday in the specified interval
dt = pd.date_range(start='1/1/2024', end='4/1/2024', freq='WOM-3FRI')  # Third Friday of each month
dt = pd.date_range(start='1/1/2024', periods=20)  # Start date plus a number of periods (daily by default)
dt = pd.date_range('1/1/2024 03:56:15', periods=15, normalize=True)  # Normalize timestamps to midnight
dt = pd.date_range('1/1/2024', '10/1/2024', freq='4h')  # Every 4 hours (uppercase 'H' is deprecated since pandas 2.2)
dt = pd.date_range('1/1/2024', '10/1/2024', freq='1h30min')  # Every 90 minutes
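A quick way to convince yourself that a frequency alias does what you expect is to inspect the dates it generates. The sketch below checks the week-of-month alias over the same interval; the variable name third_fridays is just an illustrative choice.

```python
import pandas as pd

# Third Friday of each month between 1 January and 1 April 2024
third_fridays = pd.date_range(start='2024-01-01', end='2024-04-01', freq='WOM-3FRI')
print(third_fridays)

# dayofweek counts from Monday=0, so Friday is 4
print((third_fridays.dayofweek == 4).all())

# A third Friday always falls between the 15th and the 21st
print(((third_fridays.day >= 15) & (third_fridays.day <= 21)).all())
```

The same kind of check works for the weekly aliases such as W-WED and W-FRI.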

Resampling: Group Data by Time

The resample method groups data into time intervals, much as groupby groups rows by key. You can then aggregate each interval, for example with mean, or with ohlc (open, high, low, close), which is common for financial data.

# 30 random values, one every 4 days
ts = pd.Series(np.random.randn(30), index=pd.date_range('1/1/2000', periods=30, freq='4D'))
# Calculate monthly mean ('ME' = month end in pandas 2.2+; older versions use 'M')
ts.resample('ME').mean()

# Get OHLC values for each month
ts.resample('ME').ohlc()
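To see the connection between resample and groupby more concretely, the following sketch uses the values 0 to 29 instead of random numbers (an assumption, so the output is predictable) and the month-start alias 'MS', which labels each bin by the first day of its month and behaves the same across recent Pandas versions.

```python
import numpy as np
import pandas as pd

# Deterministic values 0..29, one every 4 days starting 1 January 2000
ts = pd.Series(np.arange(30, dtype=float),
               index=pd.date_range('2000-01-01', periods=30, freq='4D'))

# Monthly bins labelled by the first day of each month
monthly_mean = ts.resample('MS').mean()

# The same grouping expressed with groupby on each timestamp's monthly period
grouped_mean = ts.groupby(ts.index.to_period('M')).mean()

# First, highest, lowest, and last value within each month
ohlc = ts.resample('MS').ohlc()

print(monthly_mean)
print(np.allclose(monthly_mean.to_numpy(), grouped_mean.to_numpy()))
print(ohlc)
```

The two approaches agree here because every month contains data. In general, resample also emits a row for empty intervals, while groupby only returns groups that actually occur.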

This post has offered only a brief glimpse into working with time series data in Pandas. The “Python for Data Analysis” book offers a much deeper dive into this essential data analysis skill.