We get this question so often that I really need to add it to the FAQ.
Open MPI currently always spins for completion for exactly the reason
that Scott cites: lower latency.
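To make the top output concrete, here is a small sketch of why a spinning wait shows up as 100% CPU while a blocking wait does not. This is not Open MPI code: it uses Python threads as a stand-in, and the helper names (cpu_seconds_while_waiting, spin, block) and the 0.2 second delay are made up for illustration.

```python
import threading
import time


def cpu_seconds_while_waiting(wait, delay=0.2):
    """Return the CPU time this process burns while `wait` waits for an event.

    The event is set by a timer thread after `delay` seconds, standing in
    for a message that arrives some time after the wait begins.
    """
    done = threading.Event()
    timer = threading.Timer(delay, done.set)
    timer.start()
    start = time.process_time()  # CPU time, not wall-clock time
    wait(done)
    used = time.process_time() - start
    timer.join()
    return used


def spin(done):
    # Busy-poll the completion flag: lowest wake-up latency, but the
    # waiting process looks fully busy to tools such as top.
    while not done.is_set():
        pass


def block(done):
    # Sleep until signalled: almost no CPU is used while waiting.
    done.wait()


spin_cpu = cpu_seconds_while_waiting(spin)
block_cpu = cpu_seconds_while_waiting(block)
print(f"spinning: {spin_cpu:.3f} s CPU, blocking: {block_cpu:.3f} s CPU")
```

Both waits take about the same wall-clock time; only the CPU accounting differs, which is why %CPU in top cannot distinguish computing from waiting in a spinning MPI process.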
Arguably, with TCP we could get somewhat better performance by
blocking in the kernel, which would let it make more progress than a
single quick pass through the sockets progress engine does, but that
raises other difficulties, such as making shared memory progress at
the same time. We have ideas for how to make this work, but it
has unfortunately remained at a lower priority: the performance
difference isn't that great, and we've been focusing on the other,
lower latency interconnects (shmem, MX, verbs, etc.).
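As a rough illustration of the two TCP strategies, the sketch below contrasts repeated zero-timeout passes over a socket with a single blocking wait in the kernel. It is only an analogy using Python's select module on a local socket pair, not the actual Open MPI progress engine; the function names and the 0.1 second delay are assumptions for the example.

```python
import select
import socket
import threading


def wait_by_polling(sock):
    """Spin: make quick non-blocking passes until the socket is readable."""
    passes = 0
    while True:
        passes += 1
        # Timeout 0 means select() returns immediately, ready or not.
        readable, _, _ = select.select([sock], [], [], 0)
        if readable:
            return sock.recv(16), passes


def wait_by_blocking(sock):
    """Block: a single call that sleeps in the kernel until data arrives."""
    select.select([sock], [], [], None)  # timeout None means wait forever
    return sock.recv(16), 1


def run(waiter, delay=0.1):
    """Send b"done" after `delay` seconds and wait for it using `waiter`."""
    left, right = socket.socketpair()
    sender = threading.Timer(delay, right.sendall, args=(b"done",))
    sender.start()
    try:
        return waiter(left)
    finally:
        sender.join()
        left.close()
        right.close()


poll_data, poll_passes = run(wait_by_polling)
block_data, block_passes = run(wait_by_blocking)
print(f"polling passes: {poll_passes}, blocking calls: {block_passes}")
```

The catch mentioned above is visible here: while a process sleeps inside the blocking call it cannot also poll something that is not a file descriptor, such as a shared memory queue.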
On Jun 3, 2009, at 8:37 AM, Scott Atchley wrote:
> On Jun 3, 2009, at 6:05 AM, tsilva_at_[hidden] wrote:
> > Top always shows all the parallel processes at 100% in the %CPU
> > field, although some of the time these must be waiting for a
> > communication to complete. How can I see actual processing as
> > opposed to waiting at a barrier?
> > Thanks,
> > Tiago
> Using what interconnect?
> For performance reasons (lower latency), the app and/or OMPI may be
> polling on the completion. Are you using blocking or non-blocking