3,620 questions
1 vote, 0 answers, 56 views
Golang benchmarks involving goroutines show higher than expected allocations when controlling the timer manually
Using go version go1.25.3 darwin/arm64.
The below implementation is a simplified version of the actual implementation.
type WaitObject struct{ c chan struct{} }
func StartNewTestObject(d time....
0 votes, 1 answer, 66 views
Why does MSVC AVX2 /FP:strict sometimes generate inferior (slower) code to SSE2?
I was testing various expressions of a sixth-order polynomial to find the fastest possible throughput. I have stumbled upon a simple polynomial expression of length 6 that provokes poor code generation ...
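To make "various expressions of a sixth-order polynomial" concrete, here is a small Python sketch of two algebraically equivalent forms. Python is used only for readability; the question itself concerns MSVC-compiled code. Horner's method builds one long chain of dependent multiply-adds. Estrin's scheme splits the same work into shorter independent chains. The coefficient ordering (`c[i]` multiplies `x**i`), the function names, and the sample coefficients are assumptions for illustration. Under strict floating-point semantics a compiler generally may not reassociate one form into the other, because the rounding differs, so the form written in the source largely determines the dependency structure the compiler has to schedule.

```python
def horner6(c, x):
    """Degree-6 polynomial via Horner's method: one serial chain of multiply-adds."""
    return ((((((c[6] * x + c[5]) * x + c[4]) * x + c[3]) * x + c[2]) * x + c[1]) * x + c[0])


def estrin6(c, x):
    """Same polynomial via Estrin's scheme: independent sub-chains combined with x**2 and x**4."""
    x2 = x * x
    x4 = x2 * x2
    low = (c[0] + c[1] * x) + (c[2] + c[3] * x) * x2  # terms of degree 0..3
    high = (c[4] + c[5] * x) + c[6] * x2              # terms of degree 4..6, divided by x**4
    return low + high * x4


coeffs = [1, 2, 3, 4, 5, 6, 7]  # hypothetical coefficients, c[i] multiplies x**i
print(horner6(coeffs, 3), estrin6(coeffs, 3))  # 7108 7108
```

With integer inputs both forms agree exactly. With floating-point inputs they can differ in the last bits, which is precisely why a strict FP mode keeps them distinct.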
1 vote, 1 answer, 81 views
Browser debugger shows a Base64 download taking less time than a multipart file download, despite the larger file size [closed]
Today, I researched Base64 encoding versus other methods and whether to use it in a JSON API, considering the 33-37% size overhead that Base64 introduces and all sorts of related topics.
To ...
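The roughly 33% figure follows directly from the encoding. Base64 maps every 3 input bytes to 4 output characters and pads the last group. Below is a minimal Python sketch of that arithmetic, checked against the standard library's `base64.b64encode`. The helper name `base64_overhead` and the 1 MB payload size are assumptions for illustration.

```python
import base64
import os


def base64_overhead(n_bytes: int) -> float:
    """Relative size increase of padded standard Base64 for n_bytes of input."""
    encoded_len = 4 * ((n_bytes + 2) // 3)  # 3 bytes -> 4 chars, last group padded with '='
    return encoded_len / n_bytes - 1


payload = os.urandom(1_000_000)
encoded = base64.b64encode(payload)
assert len(encoded) == 4 * ((len(payload) + 2) // 3)
print(f"{base64_overhead(len(payload)):.2%}")  # 33.33%
```

Line-wrapped variants add a few percent more. MIME, for example, inserts a line break every 76 characters, which may account for the upper end of the 33-37% range.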
2 votes, 0 answers, 117 views
Why is Horner's method for polynomial evaluation faster than expected when loop-unrolled for polynomial lengths N < 90?
I have spent some time trying to speed up code that uses Horner's method for evaluating modest-length polynomials (N < 32). I have a solution using loop unrolling that works very well at -O2 or ...
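For reference, here is a Python sketch of the plain Horner loop next to one common restructuring, sometimes called second-order Horner. It splits the coefficients into even and odd halves, so two shorter dependency chains can run in parallel. The excerpt does not say whether the question's unrolled version does anything like this. The split is shown only as an assumed example of how unrolling can beat a strictly serial chain. The function names are hypothetical. In CPython itself this makes no speed difference; the point is the shape of the chains a C compiler would see.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with a single serial chain of multiply-adds."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def horner_two_chains(coeffs, x):
    """Same value, computed as E(x**2) + x * O(x**2) from two independent chains."""
    x2 = x * x
    even = horner(coeffs[0::2], x2)  # coeffs[0] + coeffs[2]*x**2 + coeffs[4]*x**4 + ...
    odd = horner(coeffs[1::2], x2)   # coeffs[1] + coeffs[3]*x**2 + ...
    return even + x * odd


coeffs = [1, 2, 3, 4, 5]
print(horner(coeffs, 2), horner_two_chains(coeffs, 2))  # 129 129
```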
3 votes, 1 answer, 141 views
BenchmarkDotNet: OutOfMemoryException when benchmarking parsing a JSON file
I'm trying to benchmark the performance of a library I've written that can parse large JSON files into both an object model and a JsonDocument. So far as I can tell I'm doing everything right, but I'...
7 votes, 1 answer, 299 views
How to use plain RDTSC without using asm?
I want to use RDTSC in Rust to benchmark how many ticks my function takes.
There's a built-in std::arch::x86_64::_rdtsc, alas it always translates into:
rdtsc
shl rdx, 32
or rax, rdx
...
4 votes, 2 answers, 196 views
Why is the generic implementation of Vector.Log so much slower than the non-generic implementations for me?
I've run some benchmarks on Math.Log, System.Numerics.Vector.Log, System.Runtime.Intrinsics.Vector128.Log, Vector256.Log and Vector512.Log and the results were pretty surprising to me. I was expecting ...
1 vote, 1 answer, 50 views
Looking for simple garbage collector load test
I'm looking for some code or some benchmark to roughly assess the pause times or CPU load caused by some GC, in order to get a rough estimate of how efficient it is. I just want to see whether some GC ...
3 votes, 1 answer, 191 views
Custom hasher is faster than std::collections::HashMap for insert and for remove separately, but slower when both are done together
I wish to benchmark various hashmaps for the <K,V> pair <u8, BoxedFnMut>, where BoxedFnMut is:
type BoxedFnMut = Box<dyn FnMut() + Send + 'static>;
To do this, I am using divan (0.1.21) ...
4 votes, 1 answer, 198 views
Why is a ConcurrentDictionary faster than a Dictionary in benchmark?
I have a really simple benchmark to measure and compare performance of Dictionary<string, int> and ConcurrentDictionary<string, int>:
[MemoryDiagnoser]
public class ...
0 votes, 0 answers, 143 views
cargo bench throws "MallocStackLogging: can't turn off malloc stack logging because it was not enabled..." on Apple M4
I'm trying to run cargo bench on my new MacBook (Apple Silicon, macOS Sequoia 15.5), but I get this error:
cargo(31826) MallocStackLogging: can't turn off malloc stack logging because it was ...
2 votes, 0 answers, 48 views
Statistical assumptions for Criterion benchmarks
This question is somewhat specific to Rust's Criterion, but I have kept it general so that anybody with knowledge about benchmarking can help.
In my Rust codebase, I have a struct Model that is very ...
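Criterion.rs reports confidence intervals obtained by bootstrap resampling. This avoids assuming the timings are normally distributed, although the samples are still treated as independent. Below is a minimal Python sketch of a percentile bootstrap for the mean, assuming a list of per-iteration timings. The sample values and the function name are hypothetical. Criterion's own analysis differs in detail; for example, it also estimates a slope over increasing iteration counts.

```python
import random
import statistics


def bootstrap_mean_ci(samples, n_resamples=10_000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    alpha = (1 - confidence) / 2
    lower = means[int(alpha * n_resamples)]
    upper = means[int((1 - alpha) * n_resamples) - 1]
    return lower, upper


timings_us = [10.2, 9.8, 10.5, 10.1, 9.9, 10.0, 12.7, 10.3, 9.7, 10.4]  # hypothetical
low, high = bootstrap_mean_ci(timings_us)
print(f"mean {statistics.fmean(timings_us):.2f} us, 95% CI [{low:.2f}, {high:.2f}]")
```

A single outlier such as the 12.7 above widens the interval noticeably. How much the result depends on a few slow samples is exactly the kind of assumption worth checking.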
27 votes, 5 answers, 9k views
Why is this code 5 times slower in C# compared to Java?
First of all, we create a random binary file with 100,000,000 bytes. I used Python for this:
import random
import os
def Main():
    length = 100000000
    randomArray = random.randbytes(length)
...
1 vote, 1 answer, 91 views
Minimize noise for benchmarking in docker
I am writing a benchmarking framework for compiler-like programs.
For benchmarking, I use a docker container (for reproducibility).
However, I still measure quite a bit of noise (up to 5%!). My ...
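A noise figure like "up to 5%" is easier to track if it is computed the same way on every run. Below is a small Python sketch using `timeit.repeat` that reports the minimum, the median, and the relative spread across repeats. `workload` and the repeat counts are hypothetical placeholders. The sketch only quantifies noise; it does nothing to reduce it.

```python
import statistics
import timeit


def workload():
    """Hypothetical stand-in for the program under test."""
    return sum(i * i for i in range(10_000))


def measure(func, number=20, repeat=15):
    """Time `func` in `repeat` batches of `number` calls; return per-call statistics in seconds."""
    batches = timeit.repeat(func, number=number, repeat=repeat)
    per_call = [t / number for t in batches]
    median = statistics.median(per_call)
    return {
        "min": min(per_call),
        "median": median,
        # relative spread: (max - min) / median, comparable to an "up to 5%" figure
        "spread": (max(per_call) - min(per_call)) / median,
    }


stats = measure(workload)
print(f"min {stats['min'] * 1e6:.1f} us, median {stats['median'] * 1e6:.1f} us, spread {stats['spread']:.1%}")
```

Reporting the minimum alongside the median helps separate the best achievable time from interference that only shows up in some repeats.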
2 votes, 1 answer, 191 views
What is the reason for this performance discrepancy between NumPy and Numba?
This Python 3.12.7 script with NumPy 2.2.4 and Numba 0.61.2:
import numpy as np, timeit as ti, numba as nb
def f0(a):
    p0 = a[:-2]
    p1 = a[1:-1]
    p2 = a[2:]
    return (p0 < p1) & (p1 > p2)
...
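`f0` flags interior points that are strictly greater than both of their neighbors. The NumPy version evaluates this with whole-array operations. The slices are views, but `p0 < p1`, `p1 > p2`, and the `&` each produce a new boolean array, so the data is traversed several times. A single-pass loop is the form typically handed to `@nb.njit`. It is shown here in plain Python, without the decorator and with a list instead of a NumPy array, so it runs without NumPy or Numba installed. The name `f0_loop` is hypothetical. The excerpt does not establish whether this difference accounts for the observed discrepancy.

```python
def f0_loop(a):
    """Single-pass equivalent of f0: True where a[i + 1] is a strict local maximum."""
    n = len(a)
    out = [False] * max(n - 2, 0)
    for i in range(n - 2):
        out[i] = a[i] < a[i + 1] and a[i + 1] > a[i + 2]
    return out


print(f0_loop([1, 3, 2, 4, 1]))  # [True, False, True]
```

Under Numba, the output would more naturally be allocated with `np.empty(n - 2, dtype=np.bool_)` than built as a Python list.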