fix: normalize datetime64 to nanosecond precision in Index for consistent hashing #2120

Open
Ayush10 wants to merge 1 commit into microsoft:main from Ayush10:fix/issue-1806-datetime64-precision

@Ayush10 Ayush10 commented Feb 1, 2026

Summary

  • Normalizes all numpy.datetime64 values to nanosecond (ns) precision in Index.__init__, so that equal timestamps hash consistently in dict lookups and set operations
  • Fixes the KeyError raised when Index objects with different datetime64 precisions interact in concat, sum_by_index, __or__, and _align_indices
  • Adds a test covering cross-precision lookups, arithmetic, concat, sum_by_index, to_dict, and reindex

Root Cause

numpy.datetime64 values with different precisions (e.g. 'ns' vs 's') compare as equal (== returns True) but produce different hashes. Because Index uses a dict for index_map and several operations build a set() of index values, mismatched precisions cause KeyError failures even though the datetime values are logically identical.
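
A minimal sketch of the mismatch described above, using only NumPy scalars rather than the project's Index class. The variable names are illustrative, and whether the two hashes actually differ may depend on the installed NumPy version, so the snippet reports the result instead of assuming it:

```python
import numpy as np

# Same instant, two different precisions.
a = np.datetime64("2024-01-01T00:00:00", "s")
b = np.datetime64("2024-01-01T00:00:00", "ns")

values_equal = bool(a == b)        # equality ignores the unit
hashes_equal = hash(a) == hash(b)  # may be False, depending on NumPy version

# Stand-in for an index_map keyed by the second-precision value.
index_map = {a: 0}
lookup_ok = b in index_map         # can only succeed if the hashes agree

print(f"equal={values_equal} same_hash={hashes_equal} lookup_ok={lookup_ok}")
```

When `same_hash` prints `False`, the dict lookup fails as well, which is the situation that surfaces as a KeyError in the affected operations.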

Fix

A single normalization line in Index.__init__ converts datetime64 arrays to datetime64[ns], the default precision used by pandas. This fixes all downstream operations at the source rather than patching each one individually.
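
The normalization step could look roughly like the sketch below. It is not the code from this PR: `normalize_datetime64` is a hypothetical helper, and the dict stands in for Index's index_map. It assumes the input is a typed datetime64 array; object arrays that merely contain datetime64 scalars would pass through unchanged.

```python
import numpy as np

def normalize_datetime64(values):
    """Hypothetical helper: cast datetime64 arrays to ns, leave others alone."""
    arr = np.asarray(values)
    if np.issubdtype(arr.dtype, np.datetime64):
        arr = arr.astype("datetime64[ns]")
    return arr

# Two arrays holding the same dates at different precisions.
seconds = np.array(["2024-01-01", "2024-01-02"], dtype="datetime64[s]")
nanos = np.array(["2024-01-02", "2024-01-01"], dtype="datetime64[ns]")

# Build the lookup from one array and query it with the other.
index_map = {v: i for i, v in enumerate(normalize_datetime64(seconds))}
positions = [index_map[v] for v in normalize_datetime64(nanos)]

# Non-datetime labels are returned untouched.
labels = normalize_datetime64(["a", "b"])
```

Once both sides share the same unit, equal timestamps are the same kind of scalar and hash identically, so the cross-array lookup succeeds. One trade-off of casting to ns is range: nanosecond precision only covers dates from roughly 1678 to 2262.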

Test plan

  • New test_datetime64_precision_normalization test covering all affected code paths
  • All existing test_index_data.py tests pass
  • CI pipeline passes

Fixes #1806

…tent hashing

numpy.datetime64 values with different precisions (e.g. 'ns' vs 's') compare
as equal but produce different hashes, breaking dict lookups and set operations
in Index, concat, sum_by_index, and _align_indices.

Fixes microsoft#1806