Are there any error messages? Could the nodes have run out of memory? MPI implementations do some buffering under the hood, so even though you're sending arrays of just over 2^26 in size, MPI may need more memory than that to actually send them.
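To put rough numbers on the memory question, here is a back-of-the-envelope sketch. It assumes that "2^26" is the element count sent to each peer and that the elements are 8-byte doubles; neither is stated in the original post. The function name and the rank counts are made up for illustration. It counts only the user's send and receive buffers for an all-to-all exchange, not whatever the MPI library or the interconnect stack allocates internally, so treat the result as a lower bound.

```python
def alltoall_bytes_per_rank(count_per_peer: int, elem_size: int, nranks: int) -> int:
    """Bytes of user buffers (send + receive) one rank holds for an all-to-all.

    Each rank sends count_per_peer elements to every rank and receives the
    same amount from every rank, so each buffer holds count_per_peer * nranks
    elements. Internal MPI buffering is NOT included.
    """
    per_buffer = count_per_peer * elem_size * nranks
    return 2 * per_buffer  # one send buffer plus one receive buffer


# Assumed values, for illustration only (not taken from the original post).
count = 2**26   # elements sent to each peer
elem_size = 8   # bytes per element, e.g. a C double

for nranks in (2, 8, 32):
    gib = alltoall_bytes_per_rank(count, elem_size, nranks) / 2**30
    print(f"{nranks:3d} ranks: {gib:.1f} GiB of user buffers per rank")
```

Under those assumptions the user buffers alone come to 2 GiB per rank with 2 ranks and 32 GiB per rank with 32 ranks, so it is worth checking free memory on the nodes before blaming the drivers.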
Has anyone seen an issue where OpenMPI/Infiniband hangs when sending
messages over 2^26 in size?
For a reason I have not yet determined, the machines on my cluster
(OpenMPI v1.5 and QLogic stack/QDR IB adapters) are failing to send
arrays over 2^26 in size via the AllToAll collective (user code).
Further testing seems to indicate that any MPI message over 2^26 in size
fails (tested with IMB-MPI).
Running the same test on a different, older IB-connected cluster seems
to work, which would seem to indicate a problem of some sort with the
InfiniBand drivers rather than OpenMPI (but I'm not sure).
Any thoughts, directions, or tests?
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users