In pandas, there are two obvious ways to represent timespans: Interval[datetime64] and Period. When should I use which? Is one of them preferred in all cases?

I couldn't find this in the documentation. Either I missed it, or there is a gap in the documentation.

1 Answer

Intervals can be arbitrary within an Index. You can choose an Interval of 1 day, then one of 3 days. They can overlap, leave gaps, etc.

import pandas as pd
dates = ['2025-01-01', '2025-01-02', '2025-01-05', '2025-01-10']
# Consecutive breaks become the edges of right-closed intervals
idx = pd.IntervalIndex.from_breaks(pd.DatetimeIndex(dates))

Output:

IntervalIndex([(2025-01-01 00:00:00, 2025-01-02 00:00:00],
               (2025-01-02 00:00:00, 2025-01-05 00:00:00],
               (2025-01-05 00:00:00, 2025-01-10 00:00:00]],
              dtype='interval[datetime64[ns], right]')
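The from_breaks example above always yields contiguous intervals. As a minimal sketch of the overlapping and gapped cases, the left and right edges can be passed separately with IntervalIndex.from_arrays. The dates below are made up for illustration, and closed='left' is just one possible choice:

```python
import pandas as pd

# Illustrative edges: the first two intervals overlap,
# and there is a gap between 2025-01-05 and 2025-01-10.
left = pd.to_datetime(['2025-01-01', '2025-01-02', '2025-01-10'])
right = pd.to_datetime(['2025-01-04', '2025-01-05', '2025-01-12'])
idx = pd.IntervalIndex.from_arrays(left, right, closed='left')

# True because at least two intervals share more than an endpoint
print(idx.is_overlapping)

# Element-wise membership test for a single timestamp
hits = idx.contains(pd.Timestamp('2025-01-03'))
print(hits)  # [ True  True False]

# A timestamp inside the gap is contained in no interval
in_gap = idx.contains(pd.Timestamp('2025-01-07'))
print(in_gap.any())  # False
```

A single timestamp can therefore belong to several intervals, or to none, which cannot happen with a PeriodIndex of a given frequency.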

Periods are specialized interval-like objects with a meaningful frequency. They are meant to have a fixed frequency within an Index (e.g. a day, a week). Once you define a type of Period (e.g. weeks ending on Sundays) a given timestamp necessarily falls between two well-defined bounds:

dates = ['2025-01-01', '2025-01-02', '2025-01-05', '2025-01-10']

pd.PeriodIndex(dates, freq='D')
# PeriodIndex(['2025-01-01', '2025-01-02', '2025-01-05', '2025-01-10'], dtype='period[D]')

pd.PeriodIndex(dates, freq='W')
# PeriodIndex(['2024-12-30/2025-01-05', '2024-12-30/2025-01-05',
#              '2024-12-30/2025-01-05', '2025-01-06/2025-01-12'],
#             dtype='period[W-SUN]')
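To make the "two well-defined bounds" concrete, here is a small sketch for a single timestamp (the timestamp itself is an arbitrary choice). Timestamp.to_period maps it to the Period that contains it, and start_time / end_time expose the bounds:

```python
import pandas as pd

ts = pd.Timestamp('2025-01-03 15:30')  # arbitrary example timestamp

# Weeks ending on Sunday: the bounds follow from the frequency alone
week = ts.to_period('W-SUN')
print(week)             # 2024-12-30/2025-01-05
print(week.start_time)  # 2024-12-30 00:00:00
print(week.end_time)    # last nanosecond of 2025-01-05

# The same timestamp seen at monthly frequency
month = ts.to_period('M')
print(month)            # 2025-01

# Integer arithmetic moves by whole periods
next_week = week + 1
print(next_week.start_time)  # 2025-01-06 00:00:00
```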

There is actually another type of Interval-like Index that might be useful if you do not care about specific Timestamps, but rather about the difference to an arbitrary starting point: a TimedeltaIndex.

idx = pd.DatetimeIndex(dates)
idx - idx[0]

# TimedeltaIndex(['0 days', '1 days', '4 days', '9 days'], dtype='timedelta64[ns]', freq=None)

In summary:

  • IntervalIndex[datetime64]: arbitrary intervals, not necessarily contiguous nor with a fixed frequency
  • PeriodIndex: defined frequency intervals, with specific/regular start/end dates
  • TimedeltaIndex: arbitrary time delta from an unspecified starting point
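As a rough illustration of how the three relate, a PeriodIndex can be expanded into an equivalent IntervalIndex, whose lengths are in turn a TimedeltaIndex. This is only a sketch: the half-open [start, next start) convention and the example range are my own choices here, not something pandas prescribes.

```python
import pandas as pd

# Three consecutive daily periods (illustrative range)
days = pd.period_range('2025-01-01', periods=3, freq='D')

# Each period as an explicit half-open interval [start, next start)
as_intervals = pd.IntervalIndex.from_arrays(
    days.start_time, (days + 1).start_time, closed='left'
)

# The interval lengths form a TimedeltaIndex (1 day each here)
durations = as_intervals.length
print(durations)

# Position of the interval containing a given timestamp
pos = as_intervals.get_loc(pd.Timestamp('2025-01-02 12:00'))
print(pos)  # 1
```

The conversion goes only one way in general: an arbitrary IntervalIndex usually has no matching Period frequency.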

3 Comments

This should definitely go into the docs
I guess you typoed the last bullet of your summary: isn't that a TimedeltaIndex? Thank you, good explanation! I'd say the TimedeltaIndex is related and it's great to mention it, but it's quite clear that it's not there to represent a Timespan.
@chichak correct, I didn't even notice when proofreading. That's fixed, thanks!
