Jeff Squyres wrote:
> We get this question so much that I really need to add it to the FAQ. :-\
> Open MPI currently always spins for completion for exactly the reason
> that Scott cites: lower latency.
> Arguably, when using TCP, we could probably get a bit better performance
> by blocking and allowing the kernel to make more progress than a single
> quick pass through the sockets progress engine, but that involves some
> other difficulties such as simultaneously allowing shared memory
> progress. We have ideas how to make this work, but it has unfortunately
> remained at a lower priority: the performance difference isn't that
> great, and we've been focusing on the other, lower latency interconnects
> (shmem, MX, verbs, etc.).
Whilst I understand that you have other priorities, and I am grateful for
the leverage I get by using OpenMPI, I would like to offer an
alternative use case, which I believe may become more common.
We're developing parallel software which is designed to be used
*interactively* as well as in batch mode. We want the same SPMD code
running on a user's quad-core workstation as on a 1,000-node cluster.
For the former case (single workstation), it would be *much* more
user-friendly and interactive if the back-end MPI code were not spinning
at 100% CPU while it's just waiting for the next front-end command. The GUI
thread doesn't get a look in.
I can't imagine the difficulties involved, but if the POSIX calls
select() and pthread_cond_wait() can do it for TCP and shared-memory
threads respectively, it can't be impossible!
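To illustrate the kind of blocking wait I mean, here is a minimal sketch in plain POSIX C. It does not use MPI at all and is not a claim about how Open MPI's progress engine works or should be changed. The names wait_for_command() and demo_blocking_wait() are hypothetical, and a pipe stands in for the front-end connection. The back end sleeps inside select() until a command arrives, rather than polling in a loop:

```c
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: block (without spinning) until fd is readable,
 * then read one command. Returns bytes read, or -1 on error. */
ssize_t wait_for_command(int fd, char *buf, size_t len)
{
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    /* NULL timeout: the kernel puts the caller to sleep until data
     * arrives, so no CPU is burned while waiting. */
    if (select(fd + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;

    return read(fd, buf, len);
}

/* Hypothetical demo: a child process plays the front end and sends a
 * command after 100 ms; the parent plays the idle back end.
 * Returns 0 if the command "step" was received, -1 otherwise. */
int demo_blocking_wait(void)
{
    int fds[2];
    char buf[32] = {0};

    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                 /* child: the "front end" */
        const char cmd[] = "step";
        close(fds[0]);
        usleep(100000);             /* the user thinks for 100 ms */
        if (write(fds[1], cmd, strlen(cmd)) < 0)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);                  /* parent: the "back end" */
    ssize_t n = wait_for_command(fds[0], buf, sizeof buf - 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);

    return (n == 4 && strcmp(buf, "step") == 0) ? 0 : -1;
}
```

While the parent sits in select(), it should show close to 0% CPU in top, which is the behaviour I would like from an idle back end. The shared-memory case would need the pthread_cond_wait() equivalent, which this sketch does not attempt.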
Just my .2c,