Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Deadlock with barrier und RMA
From: Barrett, Brian W (bwbarre_at_[hidden])
Date: 2011-06-13 12:11:44


There are no missing calls to MPI_WIN_FENCE, because the code is using
passive target synchronization (lock/unlock), which does not use fences.
The test code looks correct; I think this is a bug in Open MPI. The code
also fails on the development trunk, so upgrading will not fix the bug.
I've filed a bug (#2809). Unfortunately, I'm not sure when I'll have time
to investigate further.

One other note... Even when everything works correctly, Open MPI's
passive target synchronization implementation is pretty poor (this coming
from the guy who wrote the code). Open MPI doesn't offer asynchronous
progress for lock/unlock, so all processes have to be entering the MPI
library for progress to be made. Also, the latency isn't the best.
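
To make the progress point concrete, here is a toy model in plain C++. It
makes no MPI calls and is not how Open MPI is implemented internally;
ToyTarget, receive_lock_request, enter_library, and is_granted are names
invented for this sketch. It only shows the consequence of having no
asynchronous progress: a lock request that has arrived at the target sits
in a queue until the target itself makes some library call.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Toy model only (hypothetical names, NOT Open MPI internals): a target
// process that services lock requests only while it is inside the library.
struct ToyTarget {
    std::deque<int> pending;   // origin ranks whose request has arrived
    std::vector<int> granted;  // origin ranks whose request was serviced

    // A lock request from origin_rank arrives. With no asynchronous
    // progress, it is only queued; nothing else happens yet.
    void receive_lock_request(int origin_rank) {
        pending.push_back(origin_rank);
    }

    // Stand-in for the target making any library call: queued requests
    // are serviced now. Returns how many requests were serviced.
    std::size_t enter_library() {
        const std::size_t serviced = pending.size();
        while (!pending.empty()) {
            granted.push_back(pending.front());
            pending.pop_front();
        }
        return serviced;
    }

    // True once the request from origin_rank has been serviced.
    bool is_granted(int origin_rank) const {
        return std::find(granted.begin(), granted.end(), origin_rank)
               != granted.end();
    }
};
```

In this model, after target.receive_lock_request(3) the call
target.is_granted(3) stays false for as long as the target keeps computing,
and becomes true only after target.enter_library() has run. With
asynchronous progress, the request would be serviced without the target
having to call in.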

Brian

On 6/13/11 6:41 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:

>I think your program has a compile error in the Win_create() line.
>
>But other than that, I think you're missing some calls to MPI_WIN_FENCE.
>The one-sided stuff in MPI-2 is really, really confusing.
>
>Others on this list disagree with me, but I actively discourage people
>from using it. Instead, especially if you're just starting with MPI, you
>might want to use MPI_SEND and MPI_RECV (and friends).
>
>I'd also suggest installing your own version of OMPI; the v1.0 series is
>several years out of date (either get your admin to install a more recent
>version, or install a personal copy, as someone outlined earlier in this
>thread). There have been oodles of bug fixes and new features added
>since the v1.0 series.
>
>
>On Jun 11, 2011, at 10:43 AM, Ole Kliemann wrote:
>
>> Hi everyone!
>>
>> I'm trying to use MPI on a cluster running OpenMPI 1.2.4 and starting
>> processes through PBSPro_11.0.2.110766. I've been running into a couple
>> of performance and deadlock problems and would like to check whether I'm
>> making a mistake.
>>
>> One of the deadlocks I managed to boil down to the attached example. I
>> run it on 8 cores. It usually deadlocks with all except one process
>> showing
>>
>> start barrier
>>
>> as last output.
>>
>> The one process out of order shows:
>>
>> start getting local
>>
>> My question at this point is simply whether this is expected behaviour
>> of OpenMPI.
>>
>> Thanks in advance!
>> Ole
>> <mpi_barrier.cc>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>--
>Jeff Squyres
>jsquyres_at_[hidden]
>For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>

-- 
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories