On Mon, Jun 13, 2011 at 04:11:44PM +0000, Barrett, Brian W wrote:
> There are no missing calls to MPI_WIN_FENCE as the code is using passive
> synchronization (lock/unlock). The test code looks correct, I think this
> is a bug in Open MPI. The code also fails on the development trunk, so
> upgrading will not fix the bug. I've filed a bug (#2809). Unfortunately,
> I'm not sure when I'll have time to investigate further.
Thanks a lot for your help, much appreciated!
> One other note... Even when everything works correctly, Open MPI's
> passive target synchronization implementation is pretty poor (this coming
> from the guy who wrote the code). Open MPI doesn't offer asynchronous
> progress for lock/unlock, so all processes have to be entering in the MPI
> library for progress. Also, the latency isn't the best.
To shift the topic a bit in this direction: When I stumbled across this
deadlock, I was actually trying to write test code to measure the
performance of passive target synchronization.
I am working on the implementation of an evolutionary algorithm. Each
process evolves one individual. At certain points the best individual
across all processes is to be determined.
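For that global reduction step, what I have in mind is roughly the following (a minimal sketch; the fitness value here is just a made-up placeholder, and MPI_MAXLOC returns both the maximum and the rank holding it):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Pretend each process has evaluated one individual. */
    double fitness = (double)rank * 0.5;   /* placeholder value */

    struct { double value; int rank; } local = { fitness, rank }, best;

    /* MPI_MAXLOC reduces to the maximum fitness and the rank that holds it. */
    MPI_Allreduce(&local, &best, 1, MPI_DOUBLE_INT, MPI_MAXLOC, MPI_COMM_WORLD);

    if (rank == 0)
        printf("best fitness %.1f on rank %d\n", best.value, best.rank);

    MPI_Finalize();
    return 0;
}
```

But MPI_Allreduce is collective, so it forces exactly the synchronization I am trying to avoid.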
But the processes run completely asynchronously and at very
different speeds until this point.
Using active target synchronization means all the faster processes have
to wait. I measured idle times of 20-30% of the total running time of
the algorithm. I consider this pretty bad, but to be expected, since
all processes are forced to synchronize.
Using passive target synchronization makes it possible to have all
processes run completely asynchronously throughout the whole execution
of the algorithm. But with the lock/unlock calls I measured idle times
of 40-50%, which I consider catastrophic and totally unexpected.
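For reference, the access pattern I was timing is essentially this (a minimal sketch; variable names are made up, and the evolution loop that updates my_fitness is omitted):

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process exposes its current best fitness in a window. */
    double my_fitness = 0.0;   /* updated by the evolution loop */
    MPI_Win win;
    MPI_Win_create(&my_fitness, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Read a peer's fitness with passive-target synchronization.
     * Without asynchronous progress, the lock may not be granted
     * until the target itself enters the MPI library. */
    int target = (rank + 1) % size;
    double peer_fitness;
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    MPI_Get(&peer_fitness, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);
    MPI_Win_unlock(target, win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```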
I have not debugged this further because -- as I said -- when trying to
develop test code I ran into the deadlock. But from what I saw so far,
it seemed that process A could not get a lock on process B's window,
while process B was doing heavy calculation in a tight for-loop.
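If lack of asynchronous progress really is the cause, one workaround I could imagine is to have the computing process poke the MPI progress engine from inside its loop. A sketch, assuming MPI_Iprobe drives progress in this implementation (heavy_computation_step is a stand-in for my actual per-iteration work):

```c
#include <mpi.h>

void heavy_computation_step(void);   /* stand-in for the real work */

void compute_loop(int iterations)
{
    int flag;
    MPI_Status status;
    for (int i = 0; i < iterations; i++) {
        heavy_computation_step();
        /* Periodically enter the MPI library so that pending lock
         * requests from other processes can be serviced; how often
         * to do this would need tuning. */
        if (i % 1000 == 0)
            MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                       &flag, &status);
    }
}
```

Of course this only helps if the implementation actually makes one-sided progress inside such calls, which I have not verified for Open MPI.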
Taking your words above into account, Brian, I think I can assume that
these 40-50% idle times are not so much out of the ordinary and not the
result of a big mistake on my part.
> On 6/13/11 6:41 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
> >I think your program has a compile error in the Win_create() line.
> >But other than that, I think you're missing some calls to MPI_WIN_FENCE.
> >The one-sided stuff in MPI-2 is really, really confusing.
> >Others on this list disagree with me, but I actively discourage people
> >from using it. Instead, especially if you're just starting with MPI, you
> >might want to use MPI_SEND and MPI_RECV (and friends).
> >I'd also suggest installing your own version of OMPI; the v1.0 series is
> >several years out of date (either get your admin to install a more recent
> >version, or install a personal copy, as someone outlined earlier in this
> >thread). There have been oodles of bug fixes and new features added
> >since the v1.0 series.
> >On Jun 11, 2011, at 10:43 AM, Ole Kliemann wrote:
> >> Hi everyone!
> >> I'm trying to use MPI on a cluster running OpenMPI 1.2.4 and starting
> >> processes through PBSPro_220.127.116.11766. I've been running into a couple
> >> of performance and deadlock problems and like to check whether I'm
> >> making a mistake.
> >> One of the deadlocks I managed to boil down to the attached example. I
> >> run it on 8 cores. It usually deadlocks with all except one process
> >> showing
> >> start barrier
> >> as last output.
> >> The one process out of order shows:
> >> start getting local
> >> My question at this point is simply whether this is expected behaviour
> >> of OpenMPI.
> >> Thanks in advance!
> >> Ole
> >> <mpi_barrier.cc>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >Jeff Squyres
> Brian W. Barrett
> Dept. 1423: Scalable System Software
> Sandia National Laboratories