I'm running OpenMPI 1.6.4 and seeing a problem where mpi_waitall never returns. The case runs fine with MVAPICH. The logic associated with the communications has been extensively debugged in the past; we don't think it has errors. Each process posts non-blocking receives, non-blocking sends, and then does waitall on all the outstanding requests.
The work is broken down into 960 chunks. If I run with 960 processes (60 nodes of 16 cores each), things seem to work. If I use 160 processes (each process handling 6 chunks of work), then each process handles 6 times as much communication, and that is the case that hangs with OpenMPI 1.6.4; again, it seems to work with MVAPICH. Is there an obvious place to start diagnosing this? We're using the openib btl.
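
To put rough numbers on the difference between the two runs, here is a small arithmetic sketch of how many requests one waitall call would cover per process. It is only an illustration: the neighbor count of 4 per chunk is a made-up value, and the sketch assumes every chunk-to-chunk exchange goes through MPI even when both chunks sit on the same process. Neither assumption comes from the actual application.

```python
# Illustrative arithmetic only. NEIGHBORS_PER_CHUNK is a hypothetical value,
# not taken from the actual application.
TOTAL_CHUNKS = 960
NEIGHBORS_PER_CHUNK = 4  # assumption for illustration


def requests_per_rank(num_ranks, total_chunks=TOTAL_CHUNKS,
                      neighbors=NEIGHBORS_PER_CHUNK):
    """Return (chunks per rank, requests covered by one waitall call)."""
    if total_chunks % num_ranks != 0:
        raise ValueError("chunks must divide evenly among ranks")
    chunks = total_chunks // num_ranks
    # One non-blocking receive plus one non-blocking send per neighbor,
    # for every chunk owned by the rank.
    return chunks, 2 * neighbors * chunks


for ranks in (960, 160):
    chunks, reqs = requests_per_rank(ranks)
    print(f"{ranks} ranks: {chunks} chunk(s) per rank, "
          f"{reqs} requests per waitall")
```

Whatever the real neighbor count is, the ratio between the two cases stays at 6 under these assumptions, so the hanging run has six times as many outstanding requests per process as the working one.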