I think I just did my first putback to the trunk. God help us all!
It's r20578 and feedback (e.g., "you broke everything") is appreciated,
gentle feedback even more so.
At the in-person meeting last week, I claimed that the "single queue"
approach showed no appreciable performance regression in np=2 pingpong
latencies. It now looks like there may be a 1-3% slowdown, due
principally to the lock that must now be taken to write to shared FIFOs.
That is barely out of the noise, though, and it is already more than won
back at np=3, let alone at higher process counts. I think we're fine here.
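
For anyone wondering where the lock comes from, here is a rough sketch of the idea. It is not the code in r20578: the names (shared_fifo_t, fifo_write, fifo_read, FIFO_SLOTS) are made up for illustration, and it assumes threads with a plain pthread mutex, whereas a FIFO living in shared memory between processes would need a process-shared lock or an atomic one. The point is only that once several senders share one FIFO, the write side has to be serialized, while the single reader can stay lock-free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_SLOTS 8  /* hypothetical capacity; power of two */

/* Illustrative only: one receive FIFO shared by all senders. */
typedef struct {
    void *slot[FIFO_SLOTS];
    atomic_size_t head;         /* next slot to write; advanced under write_lock */
    atomic_size_t tail;         /* next slot to read; advanced only by the owner */
    pthread_mutex_t write_lock; /* the new cost: serializes competing writers */
} shared_fifo_t;

static void fifo_init(shared_fifo_t *f)
{
    atomic_init(&f->head, 0);
    atomic_init(&f->tail, 0);
    int rc = pthread_mutex_init(&f->write_lock, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Any sender may call this. Returns false if the FIFO is full. */
static bool fifo_write(shared_fifo_t *f, void *item)
{
    bool ok = false;

    pthread_mutex_lock(&f->write_lock);
    size_t head = atomic_load_explicit(&f->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&f->tail, memory_order_acquire);
    if (head - tail < FIFO_SLOTS) {
        f->slot[head % FIFO_SLOTS] = item;
        /* Publish the slot contents before the reader can see the new head. */
        atomic_store_explicit(&f->head, head + 1, memory_order_release);
        ok = true;
    }
    pthread_mutex_unlock(&f->write_lock);
    return ok;
}

/* Only the owning receiver calls this, so no lock is needed here.
 * Returns NULL if the FIFO is empty. */
static void *fifo_read(shared_fifo_t *f)
{
    size_t tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&f->head, memory_order_acquire);
    if (tail == head) {
        return NULL;
    }
    void *item = f->slot[tail % FIFO_SLOTS];
    /* Hand the slot back to the writers. */
    atomic_store_explicit(&f->tail, tail + 1, memory_order_release);
    return item;
}
```

With one FIFO per sender, fifo_write would need no mutex at all, which is why np=2 pays a little for the shared version. In this sketch the lock/unlock pair is the only extra work on the write path.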