But HPCC RandomAccess also just uses non-blocking receives. So, it's
kind of outside the scope of the original ideas here (bypassing the PML
receive-request data structure).
If you poll only the queue that corresponds to a posted receive, you only optimize micro-benchmarks, until they start using ANY_SOURCE.
Note that the HPCC RandomAccess benchmark only uses MPI_ANY_SOURCE (and