@aevyrie aevyrie commented Dec 30, 2025

Objective

  • After a series of optimizations making render and PostUpdate more parallel, write_batched_instance_buffers was regularly one of the largest spans with very low thread use, sitting at 4ms in a 14ms frame. This makes it an ideal target for improving throughput. Note that this screenshot doesn't include some visibility system optimizations:
image

Solution

  • Spawn tasks for writing buffers to the GPU. This is especially helpful for current_input_buffer and previous_input_buffer, which take about the same time and are the longest buffer writes - moving these to tasks effectively halves the time spent in the system.
image
  • In the 250k bevymark_3d stress test, this saves 1.7ms in the system, and 2.8ms in frame time
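
As a rough illustration of why moving the two longest writes into tasks roughly halves the span, here is a minimal sketch that uses std scoped threads as a stand-in for the task pool. It is not the code in this PR: `InputBuffer`, `write_buffer`, and `write_in_parallel` are hypothetical names, and the "upload" is just a copy into a staging `Vec`.

```rust
use std::thread;

/// Hypothetical stand-in for a CPU-side input buffer (not a Bevy type).
struct InputBuffer {
    data: Vec<u32>,
}

/// Hypothetical stand-in for the GPU upload: copies the data into a staging
/// vector and returns the number of bytes written.
fn write_buffer(buffer: &InputBuffer, staging: &mut Vec<u32>) -> usize {
    staging.clear();
    staging.extend_from_slice(&buffer.data);
    staging.len() * std::mem::size_of::<u32>()
}

/// Writes both buffers concurrently, one scoped thread per buffer.
/// The two writes touch disjoint staging areas, so they need no locking.
fn write_in_parallel(
    current: &InputBuffer,
    previous: &InputBuffer,
    current_staging: &mut Vec<u32>,
    previous_staging: &mut Vec<u32>,
) -> (usize, usize) {
    thread::scope(|s| {
        let current_task = s.spawn(move || write_buffer(current, current_staging));
        let previous_task = s.spawn(move || write_buffer(previous, previous_staging));
        // Both threads are joined before the scope returns, so the borrows end here.
        (current_task.join().unwrap(), previous_task.join().unwrap())
    })
}
```

Because the two writes take about the same time and share no state, the wall-clock cost of the pair approaches the cost of the slower one instead of their sum.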

frametime

image

system

image

Testing

  • cargo run --release --example bevymark_3d --features=debug,trace_tracy -- --benchmark --waves 250 --per-wave 1000
@alice-i-cecile alice-i-cecile added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Dec 30, 2025
@james7132 james7132 self-requested a review December 30, 2025 19:37