Description
While debugging a CPU usage issue on an RP2350, I discovered that the standard Executor does not actually sleep in its wfe/poll loop, even when there are no pending tasks (or no tasks spawned at all). Some debug instrumentation quickly showed that the device went through more than a million wfe/poll loop iterations per second even after disabling all tasks. This clearly can't be right, so I started narrowing it down, mostly by pointing dependencies at local copies and commenting things out until the problem went away.
It turns out that the executor is calling AtomicPtr::swap via the following call flow:
- Executor.run(), embassy/embassy-executor/src/arch/cortex_m.rs, Line 100 in 8730a01: pub fn run(&'static mut self, init: impl FnOnce(Spawner)) -> ! {
- Executor.poll(), embassy/embassy-executor/src/raw/mod.rs, Line 574 in abc8e45: self.inner.poll()
- SyncExecutor.poll(), embassy/embassy-executor/src/raw/mod.rs, Line 466 in abc8e45: self.run_queue.dequeue_all(|p| {
- RunQueue.dequeue_all(): let taken = self.stack.take_all();
- TransferStack.take_all(): https://docs.rs/cordyceps/0.3.4/src/cordyceps/stack.rs.html#159
- AtomicPtr.swap()
Disabling this AtomicPtr.swap() call (via a [patch.crates-io] in the Cargo.toml) actually changed the behavior and allowed wfe to sleep.
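To illustrate why an idle executor still reaches the swap, here is a minimal host-side sketch of a lock-free stack whose take_all swaps the head pointer unconditionally. The names Stack, Node, idle_polls and the swaps counter are hypothetical and exist only for this illustration; this is not the cordyceps or embassy implementation, and on a desktop target it shows only the control flow, not the event-register behavior seen on the RP2350.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

/// Hypothetical node type for this sketch.
struct Node {
    value: u32,
    next: *mut Node,
}

/// Hypothetical lock-free LIFO stack (not the cordyceps TransferStack).
struct Stack {
    head: AtomicPtr<Node>,
    /// Counts calls to `swap`, purely for illustration.
    swaps: AtomicUsize,
}

impl Stack {
    fn new() -> Self {
        Stack {
            head: AtomicPtr::new(ptr::null_mut()),
            swaps: AtomicUsize::new(0),
        }
    }

    /// Pushes a value by linking a new node in front of the current head.
    fn push(&self, value: u32) {
        let node = Box::into_raw(Box::new(Node { value, next: ptr::null_mut() }));
        let mut head = self.head.load(Ordering::Relaxed);
        loop {
            // SAFETY: `node` was just allocated and is not yet shared.
            unsafe { (*node).next = head };
            match self.head.compare_exchange_weak(
                head,
                node,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return,
                Err(current) => head = current,
            }
        }
    }

    /// Takes every queued value. The swap happens even when the stack is empty.
    fn take_all(&self) -> Vec<u32> {
        self.swaps.fetch_add(1, Ordering::Relaxed);
        let mut cur = self.head.swap(ptr::null_mut(), Ordering::AcqRel);
        let mut out = Vec::new();
        while !cur.is_null() {
            // SAFETY: every non-null pointer in the list came from Box::into_raw
            // in `push`, and the swap above gave this call exclusive ownership.
            let node = unsafe { Box::from_raw(cur) };
            out.push(node.value);
            cur = node.next;
        }
        out
    }
}

/// Simulates `polls` iterations of an idle poll loop and returns the swap count.
fn idle_polls(stack: &Stack, polls: usize) -> usize {
    for _ in 0..polls {
        let taken = stack.take_all();
        debug_assert!(taken.is_empty());
    }
    stack.swaps.load(Ordering::Relaxed)
}
```

Calling idle_polls on an empty stack performs one swap per iteration, which mirrors how every pass through the wfe/poll loop ends up executing AtomicPtr::swap regardless of whether any task is queued.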
In order to demonstrate the issue, I've written a minimal reproducer to directly run AtomicPtr::swap and wfe in a loop:
#![no_main]
#![no_std]

use core::fmt::Write;
use core::ptr;
use core::sync::atomic::{AtomicPtr, Ordering};

use cortex_m::asm::wfe;
use cortex_m_rt::entry;
use defmt::{println};
use embassy_rp::{uart};
use embassy_rp::uart::{UartTx};
use embassy_time::Instant;
use {defmt_rtt as _, panic_probe as _};

// Thin wrapper so that write!() can be used on the blocking UART.
struct UartWriter {
    uart: UartTx<'static, uart::Blocking>,
}

impl Write for UartWriter
{
    fn write_str(&mut self, s: &str) -> core::fmt::Result {
        self.uart.blocking_write(s.as_bytes()).unwrap();
        Ok(())
    }
}

#[entry]
fn main() -> ! {
    println!("main");
    let p = embassy_rp::init(Default::default());
    let uart: UartTx<uart::Blocking> = UartTx::new(p.UART1, p.PIN_4, p.DMA_CH1, embassy_rp::uart::Config::default());
    let mut uart = UartWriter{uart};
    let _ = write!(uart, "Starting main\r\n");

    let atomic: AtomicPtr<u32> = AtomicPtr::new(ptr::null_mut::<u32>());
    for i in 0u32..1_000_000u32 {
        // Expected to sleep here until an event arrives.
        wfe();
        // If wfe really sleeps, the tick limit is hit long before 1M iterations.
        if Instant::now().as_ticks() > 1_000_000 {
            panic!("i={} after 1_000_000 ticks", i);
        }
        // Commenting out this swap lets wfe sleep again.
        let val = atomic.swap(ptr::null_mut(), Ordering::AcqRel);
    }

    // Only reached if wfe returned immediately on (almost) every iteration.
    write!(uart, "1M events took {} ticks\r\n", Instant::now().as_ticks()).unwrap();
    panic!("1M events took {} ticks", Instant::now().as_ticks());
}
And the corresponding Cargo.toml:
[package]
name = "wfe_issue_reproducer"
version = "0.1.0"
edition = "2021"
[dependencies]
cortex-m-rt = "0.7.5"
defmt = "1.0.1"
defmt-rtt = "1.1.0"
embassy-rp = { version = "0.8.0", features = ["defmt", "unstable-pac", "time-driver", "critical-section-impl", "rp235xa", "binary-info"] }
embassy-time = { version = "0.5.0", features = ["defmt", "defmt-timestamp-uptime"]}
panic-probe = { version = "1.0.0", features = ["print-defmt"] }
cortex-m = "0.7.7"
Running this code on an RP2350 will output something like this:
Starting main
1M events took 286988 ticks
On the other hand, with the atomic.swap line commented out, wfe actually sleeps and the loop never reaches one million iterations in a short amount of time.
Since I initially suspected that the issue could be related to the SWD debug probe being attached, I also implemented UART output, so that the issue can be reproduced with SWD disconnected (just by monitoring the UART output).
As of now I do not understand why AtomicPtr.swap() sets the event register (so that the next wfe instruction returns immediately). The AtomicPtr implementation branches out to atomic_xchg from https://github.com/rust-lang/rust/blob/master/library/core/src/intrinsics/mod.rs#L149 and the resulting disassembly listing uses memory synchronization instructions such as strex/ldrex/dmb, which may be related to this issue. Since I'm not an expert in Arm assembly or the low-level details of the Arm Cortex-M architecture, I haven't figured out the root cause of the issue yet.
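One possible experiment to narrow this down (an untested assumption, not a finding and not a proposed fix) would be to replace the unconditional swap in the reproducer with a variant that only swaps when a plain atomic load sees a non-null pointer. A plain load is not expected to need an ldrex/strex pair, so if wfe sleeps again with this variant, that would point toward the exclusive access sequence. The helper name take_if_nonempty is hypothetical, and the sketch uses std so it can be checked on a host; on the target the same code would use core.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical helper: swaps the slot to null only if it currently holds a
/// non-null pointer, so an empty slot is inspected with a plain load only.
fn take_if_nonempty(slot: &AtomicPtr<u32>) -> Option<*mut u32> {
    // Fast path: nothing stored, so no read-modify-write is performed.
    if slot.load(Ordering::Acquire).is_null() {
        return None;
    }
    // Slow path: something was stored, take ownership of it atomically.
    let taken = slot.swap(ptr::null_mut(), Ordering::AcqRel);
    if taken.is_null() {
        // Another context emptied the slot between the load and the swap.
        None
    } else {
        Some(taken)
    }
}
```

In the reproducer loop, the line calling atomic.swap would become a call to take_if_nonempty(&atomic); since the pointer is always null there, only the load path would run.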
Potential root causes of this issue:
- Bug in embassy, e.g. incorrect usage/combination of synchronization primitives and the wfe instruction
- Bug in rustc (which is implementing the compiler intrinsics used by AtomicPtr), tested version is rustc 1.91.0 (f8297e351 2025-10-28)
- Hardware bug/errata in the RP2350