1 vote
0 answers
56 views

Golang benchmarks involving goroutines show higher than expected allocations when controlling the timer manually

Using go version go1.25.3 darwin/arm64. The below implementation is a simplified version of the actual implementation. type WaitObject struct{ c chan struct{} } func StartNewTestObject(d time....
Ahmad Sameh
0 votes
1 answer
66 views

Why does MSVC AVX2 /FP:strict sometimes generate inferior (slower) code to SSE2?

I was testing various expressions of a sixth order polynomial to find the fastest possible throughput. I have stumbled upon a simple polynomial expression length 6 that provokes poor code generation ...
Martin Brown
  • 3,586
1 vote
1 answer
81 views

Browser debugger shows less time taken to download a base64 over a multi-part file despite the larger file size [closed]

Today, I researched Base64 encoding versus other methods and whether to use it in a JSON API, considering the 33-37% size overhead that Base64 introduces and all sorts of related topics. To ...
tfn
  • 67
2 votes
0 answers
117 views

Why is Horner's method for evaluation faster than expected when loop unrolled for Polynomial lengths N<90?

I have spent some time trying to speed up code that uses Horner's method for evaluating modest length polynomials (N < 32). I have a solution using loop unrolling that works very well at -O2 or ...
Martin Brown
  • 3,586
3 votes
1 answer
141 views

BenchmarkDotNet: OutOfMemoryException when benchmarking parsing a JSON file

I'm trying to benchmark the performance of a library I've written that can parse large JSON files into both an object model and a JsonDocument. So far as I can tell I'm doing everything right, but I'...
Ari Roth
  • 5,582
7 votes
1 answer
299 views

How to use plain RDTSC without using asm?

I want to use RDTSC in Rust to benchmark how many ticks my function takes. There's a built-in std::arch::x86_64::_rdtsc, alas it always translates into: rdtsc shl rdx, 32 or rax, rdx ...
Daniil Tutubalin
4 votes
2 answers
196 views

Why is the generic implementation of Vector.Log so much slower than the non-generic implementations for me?

I've run some benchmarks on Math.Log, System.Numerics.Vector.Log, System.Runtime.Intrinsics.Vector128.Log, Vector256.Log and Vector512.Log and the results were pretty surprising to me. I was expecting ...
user31260114
1 vote
1 answer
50 views

Looking for simple garbage collector load test

I'm looking for some code or some benchmark to roughly assess the pause times or cpu load caused by some GC in order to get some rough estimate how efficient it is. I just want to see whether some GC ...
OlliP
  • 1,607
3 votes
1 answer
191 views

Custom hasher is faster for insert and remove, but when done together is slower, when comparing to std::collections::HashMap

I wish to benchmark various hashmaps for the <K,V> pair <u8, BoxedFnMut> where BoxedFnMut. type BoxedFnMut = Box<dyn FnMut() + Send + 'static>; To do this, I am using divan(0.1.21) ...
Naitik Mundra
4 votes
1 answer
198 views

Why is a ConcurrentDictionary faster than a Dictionary in benchmark?

I have a really simple benchmark to measure and compare performance of Dictionary<string, int> and ConcurrentDictionary<string, int>: [MemoryDiagnoser] public class ...
Pupkin
  • 1,213
0 votes
0 answers
143 views

cargo bench throws "MallocStackLogging: can't turn off malloc stack logging because it was not enabled..." on Apple M4

I'm trying to run cargo bench on my new MacBook (Apple Silicon, macOS [Sequoia version 15.5]), but I get this error: cargo(31826) MallocStackLogging: can't turn off malloc stack logging because it was ...
ajita asthana
2 votes
0 answers
48 views

Statistical assumptions for Criterion benchmarks

This question is somewhat specific to Rust's Criterion, but I have kept it general so that anybody with knowledge about benchmarking can help. In my Rust codebase, I have a struct Model that is very ...
aleferna
  • 141
27 votes
5 answers
9k views

Why is this code 5 times slower in C# compared to Java?

First of all we create a random binary file with 100.000.000 bytes. I used Python for this: import random import os def Main(): length = 100000000 randomArray = random.randbytes(length) ...
Vasilis Kontopoulos
1 vote
1 answer
91 views

Minimize noise for benchmarking in docker

I am writing a benchmarking framework for compiler-like programs. For benchmarking, I use a docker container (for reproducibility). However, I still measure quite a bit of noise (up to 5%!). My ...
Frobeniusnorm
2 votes
1 answer
191 views

What is the reason for this performance discrepancy between NumPy and Numba?

This Python 3.12.7 script with NumPy 2.2.4 and Numba 0.61.2: import numpy as np, timeit as ti, numba as nb def f0(a): p0 = a[:-2] p1 = a[1:-1] p2 = a[2:] return (p0 < p1) & (p1 > p2) ...
Paul Jurczak
  • 8,585
