On Jan 12, 2010, at 16:57, Eugene Loh wrote:
> Jeff Squyres wrote:
>> It would be very strange for nanosleep to cause a problem for Open MPI -- it shouldn't interfere with any of Open MPI's mechanisms. Double check that your my_barrier() function is actually working properly -- removing the nanosleep() shouldn't affect the correctness of your barrier.
> I read Gijsbert's e-mail differently. Apparently, the issue is not MPI/OMPI at all, but a hang inside nanosleep.
>> On Dec 31, 2009, at 1:15 PM, Gijsbert Wiesenekker wrote:
>>> I only recently learned about the OMPI_MCA_mpi_yield_when_idle variable, I still have to test if that is an alternative to my workaround.
> mpi_yield_when_idle does not free the CPU up very much. It still polls fairly aggressively, and the yield() call doesn't really free the CPU up that much. It's a weak and probably ungratifying solution for your problem.
>>> Meanwhile I seem to have found the cause of problem ...
>>> ... rather than OpenMPI being the problem, nanosleep is the culprit because the call to it seems to hang.
> So, "we" (OMPI community) are off the hook? Problem is in nanosleep? "We" are relieved (or confused about what you're reporting)!
Just to confirm: the problem is indeed not with OpenMPI (so the OMPI community is off the hook) but with nanosleep() on Fedora Core 12, and it has not yet been fixed in the current kernel/glibc.
Using MPI_Barrier with OMPI_MCA_mpi_yield_when_idle set helps somewhat, but it is not ideal, because it still uses a lot of CPU.
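
For reference, the general shape of a sleep-between-polls wait, which is the kind of workaround discussed in this thread, might look roughly like the sketch below. This is not the actual my_barrier() code; the names done_fn, sleep_ns, poll_with_sleep and countdown_done are made up for illustration, and in an MPI program the predicate would presumably wrap something like MPI_Test() on a pending request. The poll limit is an assumption added so the loop cannot spin forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical predicate type: returns true once the awaited event has
 * happened. In an MPI program it might wrap MPI_Test() on a request. */
typedef bool (*done_fn)(void *ctx);

/* Sleep for interval_ns nanoseconds (must be below one second),
 * resuming the sleep if a signal interrupts it.
 * Returns 0 on success, -1 on any error other than EINTR. */
static int sleep_ns(long interval_ns)
{
    assert(interval_ns >= 0 && interval_ns < 1000000000L);
    struct timespec req = { .tv_sec = 0, .tv_nsec = interval_ns };
    struct timespec rem;
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;  /* continue with the time that was left */
    }
    return 0;
}

/* Call done(ctx) up to max_polls times, sleeping interval_ns between
 * unsuccessful checks. Returns the number of checks made when done()
 * reported true, -1 if sleeping failed, -2 if the limit was reached. */
static long poll_with_sleep(done_fn done, void *ctx,
                            long interval_ns, long max_polls)
{
    for (long polls = 1; polls <= max_polls; polls++) {
        if (done(ctx))
            return polls;
        if (sleep_ns(interval_ns) != 0)
            return -1;
    }
    return -2;
}

/* Stand-in predicate for demonstration only: reports completion once
 * the counter pointed to by ctx has been counted down to zero. */
static bool countdown_done(void *ctx)
{
    int *remaining = ctx;
    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

Note that the poll limit only bounds the number of checks. If the nanosleep() call itself never returns, as it appears to do here, this loop would hang in exactly the same way, so it illustrates the pattern rather than a fix.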