On 2009-10-29, at 10:21AM, Vincent Loechner wrote:
>
>>> It seems that the calls to collective communication are not
>>> returning for some MPI processes, when the number of processes is
>>> greater than or equal to 5. It's reproducible on two different
>>> architectures, with two different versions of OpenMPI (1.3.2 and
>>> 1.3.3). It was working correctly with OpenMPI version 1.2.7.
>>
>> Does it work if you turn off the shared memory transport layer;
>> that is,
>>
>> mpirun -n 6 -mca btl ^sm ./testmpi
>
> Yes, it does, on both my configurations (AMD and Intel processors).
> So it seems that the shared memory synchronization process is
> broken.
Presumably that is this bug:
https://svn.open-mpi.org/trac/ompi/ticket/2043
I also found, by trial and error, that increasing the number of FIFOs, e.g.
-mca btl_sm_num_fifos 5
on a 6-processor job, apparently worked around the problem.
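For anyone who wants to try both workarounds side by side, here is a small shell sketch that prints the two mpirun command lines for a given process count. The helper name sm_workaround_cmds is hypothetical, and setting btl_sm_num_fifos to (processes - 1) is my own assumption, generalized from the single "5 FIFOs on a 6-process job" data point above rather than from any documented rule.

```shell
#!/bin/sh
# Hypothetical helper: print mpirun command lines for the two workarounds
# discussed in this thread. It only echoes the commands; it runs nothing.
# ASSUMPTION: btl_sm_num_fifos = np - 1 generalizes the observed
# "5 FIFOs for 6 processes" case; it is not a documented rule.
sm_workaround_cmds() {
    np="$1"      # number of MPI processes (assumed to share one node)
    prog="$2"    # MPI executable, e.g. ./testmpi
    fifos=$((np - 1))
    # Never ask for fewer than one FIFO.
    if [ "$fifos" -lt 1 ]; then
        fifos=1
    fi
    # Workaround 1: disable the shared-memory BTL entirely.
    echo "mpirun -n $np -mca btl ^sm $prog"
    # Workaround 2: keep the sm BTL but give it more FIFOs.
    echo "mpirun -n $np -mca btl_sm_num_fifos $fifos $prog"
}

sm_workaround_cmds 6 ./testmpi
```

Disabling sm avoids the suspect code path entirely but routes on-node traffic through another transport, while raising the FIFO count keeps shared memory in use; which one costs less will depend on the job.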
But yes, something seems broken in the OpenMPI shared memory transport
with gcc 4.4.x.
Jonathan
--
Jonathan Dursi <ljdursi_at_[hidden]>