Sorry, more questions to answer:
> On the other hand I am not sure it could even work at all, as whenever
> I tried at run-time to limit the list to just one transport (be it tcp or
> openib, btw), mpi apps would not start.
You need to specify both the transport and self, for example:
mpirun -mca btl self,tcp
The self BTL is a simple loopback component, used when a process sends to
itself, and leaving it out may be the problem.
> Either way, I'm curious if it's even worth trying and if there's other
> cuts that can be made to shave off one us or two (ok, I'll settle for
> 1.5 :-) )
For heroic latencies on IB we would need to use small-message RDMA and
poll each peer's dedicated memory region for completion.
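
To make the polling idea concrete, here is a rough C sketch. It is only a simulation in local memory and makes no InfiniBand verbs calls. The names rdma_write_sim, poll_peers, peer_slot, NUM_PEERS and MAX_PAYLOAD are all made up for illustration and are not taken from Open MPI. The assumption is that each peer owns one dedicated slot and that the sender writes the payload first and a ready flag last, so the receiver only has to watch the flag.

```c
/* Sketch only: simulates "small-message RDMA + polling" in local memory.
 * rdma_write_sim() stands in for a remote peer's RDMA write; nothing here
 * talks to real hardware. All names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define NUM_PEERS   4
#define MAX_PAYLOAD 64

/* One dedicated receive slot per peer. The flag is written last, so
 * observing it set implies the payload before it is complete. */
struct peer_slot {
    char       payload[MAX_PAYLOAD];
    size_t     len;
    atomic_int ready;   /* 0 = empty, 1 = message present */
};

static struct peer_slot slots[NUM_PEERS];

/* Stand-in for the sender side: copy the payload, then publish the flag. */
void rdma_write_sim(int peer, const void *buf, size_t len)
{
    assert(peer >= 0 && peer < NUM_PEERS && len <= MAX_PAYLOAD);
    memcpy(slots[peer].payload, buf, len);
    slots[peer].len = len;
    atomic_store_explicit(&slots[peer].ready, 1, memory_order_release);
}

/* Receiver side: scan every peer's slot once. Returns the index of the
 * first peer with a pending message (copied into out, length in *out_len)
 * and frees that slot, or returns -1 if nothing has arrived. */
int poll_peers(void *out, size_t *out_len)
{
    for (int p = 0; p < NUM_PEERS; p++) {
        if (atomic_load_explicit(&slots[p].ready, memory_order_acquire)) {
            memcpy(out, slots[p].payload, slots[p].len);
            *out_len = slots[p].len;
            atomic_store_explicit(&slots[p].ready, 0, memory_order_release);
            return p;
        }
    }
    return -1;
}
```

In this sketch every call to poll_peers touches one slot per peer, so both the scan time and the memory reserved grow linearly with the number of peers. That is the price of skipping the completion queue.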