
PERF: floatify #64356

Open
jbrockmendel wants to merge 1 commit into pandas-dev:main from jbrockmendel:perf-floatify
@jbrockmendel
Member

Claude says this avoids an allocation, but I'd like to get @WillAyd's thoughts.

9-10% speedup in pd.to_numeric on floaty strings:

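import numpy as np
import pandas as pd

N = 200_000
rng = np.random.default_rng(0)

# Object-dtype arrays of float-like strings: short fixed-point and long 17-digit forms
shorts = pd.array([f"{v:.4f}" for v in rng.uniform(0, 1e4, N)], dtype=object)
longs = pd.array([f"{v:.17g}" for v in rng.uniform(0, 1e15, N)], dtype=object)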

%timeit pd.to_numeric(shorts)
26 ms ± 203 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)    # <- main
23.8 ms ± 284 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <- PR

%timeit pd.to_numeric(longs)
29.5 ms ± 306 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <- main
26.7 ms ± 527 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <- PR
Member

WillAyd left a comment

Nice find and makes sense

}

- const int status = to_double(data, result, sci, dec, maybe_int);
+ const int status = to_double((char *)data, result, sci, dec, maybe_int);
Member

I think we should just change the function to accept a const char * instead of casting (separate PR is fine)
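
To illustrate the suggestion, here is a minimal sketch of what a const-correct parsing helper could look like. The names `parse_double_const` and `floatify_sketch` are hypothetical, and the body uses the standard `strtod` rather than pandas' own parser, so this is not the actual `to_double` implementation or signature. It only shows that a helper which reads its input can take `const char *`, which lets the caller drop the cast.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a to_double-style helper (not the pandas code).
 * The function only reads the buffer, so the parameter can be const. */
static int parse_double_const(const char *item, double *out) {
    char *end = NULL;
    errno = 0;
    const double value = strtod(item, &end);
    /* Reject empty input, trailing characters, and out-of-range values. */
    if (end == item || *end != '\0' || errno == ERANGE) {
        return 0;
    }
    *out = value;
    return 1;
}

/* Hypothetical caller holding a read-only buffer: it passes the pointer
 * straight through, with no (char *) cast needed. */
static int floatify_sketch(const char *data, double *result) {
    return parse_double_const(data, result);
}
```

With a signature like this, the compiler also flags any accidental write through the pointer inside the helper, which a cast at the call site would hide.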
