Newest 'benchmarking' Questions

1 vote

0 answers

56 views

Golang benchmarks involving goroutines show higher than expected allocations when controlling the timer manually

Using go version go1.25.3 darwin/arm64. The below implementation is a simplified version of the actual implementation. type WaitObject struct{ c chan struct{} } func StartNewTestObject(d time....

Ahmad Sameh

19

asked 2 days ago

0 votes

1 answer

66 views

Why does MSVC AVX2 /FP:strict sometimes generate inferior (slower) code to SSE2?

I was testing various expressions of a sixth order polynomial to find the fastest possible throughput. I have stumbled upon a simple polynomial expression length 6 that provokes poor code generation ...

Martin Brown

3,586

asked Sep 12 at 16:51

1 vote

1 answer

81 views

Browser debugger shows less time taken to download a base64 over a multi-part file despite the larger file size [closed]

Today, I researched Base64 encoding versus other methods and whether to use it in a JSON API, considering the 33-37% size overhead that Base64 introduces and all sorts of related topics. To ...

tfn

67

asked Sep 10 at 9:07

2 votes

0 answers

117 views

Why is Horner's method for evaluation faster than expected when loop unrolled for Polynomial lengths N<90?

I have spent some time trying to speed up code that uses Horner's method for evaluating modest length polynomials (N < 32). I have a solution using loop unrolling that works very well at -O2 or ...

Martin Brown

3,586

asked Aug 29 at 16:28

3 votes

1 answer

141 views

BenchmarkDotNet: OutOfMemoryException when benchmarking parsing a JSON file

I'm trying to benchmark the performance of a library I've written that can parse large JSON files into both an object model and a JsonDocument. So far as I can tell I'm doing everything right, but I'...

Ari Roth

5,582

asked Aug 22 at 3:23

7 votes

1 answer

299 views

How to use plain RDTSC without using asm?

I want to use RDTSC in Rust to benchmark how many ticks my function takes. There's a built-in std::arch::x86_64::_rdtsc; alas, it always translates into: rdtsc; shl rdx, 32; or rax, rdx ...

Daniil Tutubalin

198

asked Aug 17 at 10:50

4 votes

2 answers

196 views

Why is the generic implementation of Vector.Log so much slower than the non-generic implementations for me?

I've run some benchmarks on Math.Log, System.Numerics.Vector.Log, System.Runtime.Intrinsics.Vector128.Log, Vector256.Log and Vector512.Log and the results were pretty surprising to me. I was expecting ...

user31260114

51

asked Aug 12 at 11:34

1 vote

1 answer

50 views

Looking for simple garbage collector load test

I'm looking for some code or some benchmark to roughly assess the pause times or CPU load caused by some GC in order to get a rough estimate of how efficient it is. I just want to see whether some GC ...

OlliP

1,607

asked Aug 4 at 13:08

3 votes

1 answer

191 views

Custom hasher is faster for insert and remove, but when done together is slower, when comparing to std::collections::HashMap

I wish to benchmark various hashmaps for the <K,V> pair <u8, BoxedFnMut> where BoxedFnMut. type BoxedFnMut = Box<dyn FnMut() + Send + 'static>; To do this, I am using divan(0.1.21) ...

Naitik Mundra

502

asked Jul 30 at 11:04

4 votes

1 answer

198 views

Why is a ConcurrentDictionary faster than a Dictionary in benchmark?

I have a really simple benchmark to measure and compare performance of Dictionary<string, int> and ConcurrentDictionary<string, int>: [MemoryDiagnoser] public class ...

Pupkin

1,213

asked Jul 11 at 9:26

0 votes

0 answers

143 views

cargo bench throws "MallocStackLogging: can't turn off malloc stack logging because it was not enabled..." on Apple M4

I'm trying to run cargo bench on my new MacBook (Apple Silicon, macOS [Sequoia version 15.5]), but I get this error: cargo(31826) MallocStackLogging: can't turn off malloc stack logging because it was ...

ajita asthana

1

asked Jul 4 at 11:54

2 votes

0 answers

48 views

Statistical assumptions for Criterion benchmarks

This question is somewhat specific to Rust's Criterion, but I have kept it general so that anybody with knowledge about benchmarking can help. In my Rust codebase, I have a struct Model that is very ...

aleferna

141

asked Jun 27 at 1:11

27 votes

5 answers

9k views

Why is this code 5 times slower in C# compared to Java?

First of all we create a random binary file with 100.000.000 bytes. I used Python for this: import random import os def Main(): length = 100000000 randomArray = random.randbytes(length) ...

Vasilis Kontopoulos

379

asked Jun 23 at 8:08

1 vote

1 answer

91 views

Minimize noise for benchmarking in docker

I am writing a benchmarking framework for compiler-like programs. For benchmarking, I use a Docker container (for reproducibility). However, I still measure quite a bit of noise (up to 5%!). My ...

Frobeniusnorm

93

asked Jun 20 at 10:13

2 votes

1 answer

191 views

What is the reason for this performance discrepancy between NumPy and Numba?

This Python 3.12.7 script with NumPy 2.2.4 and Numba 0.61.2: import numpy as np, timeit as ti, numba as nb def f0(a): p0 = a[:-2] p1 = a[1:-1] p2 = a[2:] return (p0 < p1) & (p1 > p2) ...

Paul Jurczak

8,585

asked Jun 19 at 2:19
