Maybe this FAQ will help:
Brock Palen wrote:
> We have a code (arts) that locks up only when running over IB; it
> works fine over tcp and sm.
> When we ran it in a debugger, it locked up on an MPI_Comm_split()
> call that, as far as I could tell, was valid.
> Because the split was a hack used to call MPI_File_open() on a
> single CPU, we reworked the code to remove it. The code then locked
> up again.
> This time it locked up on an MPI_Allreduce(), which was really
> strange: when running on 8 CPUs, only rank 4 would get stuck. The
> rest of the ranks were fine and got the right value. (We are using
> DDT as our debugger.)
> It's very strange. Do you have any idea what could cause this to
> happen? We are using openmpi-1.2.3/1.2.6 with the PGI compilers.
> Brock Palen
> Center for Advanced Computing
> users mailing list