
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI_Barrier, again
From: Evgeniy Shapiro (shellinux_at_[hidden])
Date: 2012-01-30 07:19:22


I have attached an example.

Compiler:
ifort (IFORT) 11.1 20090630
Copyright (C) 1985-2009 Intel Corporation. All rights reserved.

flags:
mpif90 -O0 -fp-model precise -traceback -r8 -i4 -fpp -check all
-warn all -warn nounused -save-temps -g -check noarg_temp_created -o
testbar ./mpibarriertest.f90

OpenMPI: 1.4.3

It hangs randomly with 15 processes, as described.

Evgeniy
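The attached mpibarriertest.f90 is not reproduced in this archive. For reference, here is a stand-alone sketch of the barrier-serialized write loop described in step 2 of the quoted message below. It is a minimal sketch, not the attached reproducer. It rests on several assumptions that do not come from the post: rank i owns dataset i, the file name out1.txt is hypothetical, each dataset is one text line, and the sleep/poll loop is omitted because each writer closes the file before it enters the next barrier. Every rank runs the same number of iterations, so every rank calls MPI_Barrier the same number of times.

```fortran
! Minimal sketch of a barrier-serialized write to one shared text file.
! Assumptions (not from the original post): rank i owns dataset i,
! the file name 'out1.txt' is hypothetical, each dataset is one line,
! and the sleep/poll loop is omitted (each writer closes the file,
! flushing its buffer, before entering the next barrier).
program ordered_write
  use mpi
  implicit none
  character(len=*), parameter :: fname = 'out1.txt'
  integer, parameter :: u = 10
  integer :: ierr, rank, nprocs, i, ios
  character(len=64) :: line, expected
  logical :: ok

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Step 1: rank 0 creates the file and writes the header.
  if (rank == 0) then
    open(unit=u, file=fname, status='replace', action='write')
    write(u, '(a)') 'header'
    close(u)
  end if

  ! Step 2: every rank executes the same number of iterations, so every
  ! rank calls MPI_Barrier exactly nprocs times.
  do i = 0, nprocs - 1
    call MPI_Barrier(MPI_COMM_WORLD, ierr)
    if (i /= rank) cycle   ! not my dataset: go straight to the next barrier
    open(unit=u, file=fname, status='old', position='append', action='write')
    write(u, '(a,i0)') 'dataset ', i
    close(u)
  end do

  ! Final barrier so the last writer has closed the file before rank 0 reads it.
  call MPI_Barrier(MPI_COMM_WORLD, ierr)

  ! Rank 0 reads the file back and checks that the records are in rank order.
  if (rank == 0) then
    ok = .true.
    open(unit=u, file=fname, status='old', action='read')
    read(u, '(a)') line    ! skip the header
    do i = 0, nprocs - 1
      read(u, '(a)', iostat=ios) line
      write(expected, '(a,i0)') 'dataset ', i
      if (ios /= 0) then
        ok = .false.
      else if (trim(line) /= trim(expected)) then
        ok = .false.
      end if
    end do
    close(u)
    if (ok) then
      print '(a)', 'ordered-write check passed'
    else
      print '(a)', 'ordered-write check FAILED'
    end if
  end if

  call MPI_Finalize(ierr)
end program ordered_write
```

It can be built with the same mpif90 wrapper and run with, for example, mpirun -np 15. Rank 0 then reports whether the records came out in rank order.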

Date: Sat, 28 Jan 2012 08:24:39 -0500
From: Jeff Squyres <jsquyres_at_[hidden]>
Subject: Re: [OMPI users] MPI_Barrier, again
To: Open MPI Users <users_at_[hidden]>

Is there any chance you can make a small-ish reproducer of the issue
that we can run?

On Jan 27, 2012, at 10:45 AM, Evgeniy Shapiro wrote:

> Hi
>
> I have a strange problem with MPI_Barrier occurring when writing to a
> file. The output subroutine (the code is in FORTRAN) is called from
> the main program and there is an MPI_Barrier just before the call.
>
> In the subroutine
>
> 1. Process 0 checks whether the first file exists and, if not,
> creates file 1, writes the file header, and closes the file.
>
> 2. There is a loop over the datasets with an embedded barrier:
>
>    do i = 0, iDatasets
>       call MPI_Barrier
>       if I do not own the data: cycle to the next dataset (and barrier)
>       check whether the file exists; if not, sleep and check again until
>          it does (needed to make sure the buffer has been flushed)
>       write my portion of the file
>    end do
>
>    In theory, the above should result in the datasets being written to
>    the file sequentially.
>
> 3. Process 0 checks whether the second file exists and, if not,
> creates file 2, writes the file header, and closes the file.
>
> 4. There is a loop over the datasets with an embedded barrier:
>
>    do i = 0, iDatasets
>       call MPI_Barrier
>       if I do not own the data: cycle to the next dataset (and barrier)
>       check whether the file exists; if not, sleep and check again until
>          it does (needed to make sure the buffer has been flushed)
>       write my portion of the file, including a link to the 1st file
>    end do
>
> The subroutine is called several times (different files/datasets) with
> a barrier between calls, and the program erratically hangs in one of
> the calls. The likelihood of a hang increases with the number of
> processes. DDT shows that when this happens, some of the processes,
> including 0, are waiting at the barrier inside the first loop, some at
> the second barrier (in the second loop), and one process is in the
> sleep/check-file-status cycle in the second loop. So somehow some of
> the processes get through the first barrier before process 0 does.
> This is a debug build, so there is no loop unrolling etc.
>
> Is there anything I can do to make sure that the first barrier is
> observed by all processes? Any advice greatly appreciated.
>
> Evgeniy
>
>
> OpenMPI: 1.4.3
> (I cannot use parallel MPI-IO in this situation, for various reasons.)
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquyres_at_[hidden]
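With a hang like the one described, one possibility is worth ruling out. It is only a possibility, not a confirmed diagnosis. MPI_Barrier calls on a communicator are matched purely by call order, so every rank must call the barrier the same number of times. Suppose iDatasets, or the number of times the output subroutine is entered, differs between ranks. Then a barrier inside the first loop on one rank can pair with a barrier inside the second loop on another rank. That would look like processes "getting through" the first barrier early. The sketch below is a minimal check under that assumption. The variable nsets is a hypothetical stand-in for iDatasets, and its placeholder value must be replaced with however the real code computes it.

```fortran
! Debugging sketch: check that all ranks agree on the loop count
! before entering the barrier loops. 'nsets' is a hypothetical
! stand-in for iDatasets; replace its assignment with however the
! real code computes it on each rank.
program check_barrier_count
  use mpi
  implicit none
  integer :: ierr, rank, nsets, nmin, nmax

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  nsets = 14   ! hypothetical placeholder value, identical on all ranks here

  ! The smallest and largest loop counts over all ranks must be equal;
  ! otherwise ranks will call MPI_Barrier a different number of times.
  call MPI_Allreduce(nsets, nmin, 1, MPI_INTEGER, MPI_MIN, MPI_COMM_WORLD, ierr)
  call MPI_Allreduce(nsets, nmax, 1, MPI_INTEGER, MPI_MAX, MPI_COMM_WORLD, ierr)

  if (rank == 0) then
    if (nmin == nmax) then
      print '(a)', 'all ranks agree on the loop count'
    else
      print '(a,i0,a,i0)', 'loop count differs across ranks: min ', nmin, ', max ', nmax
    end if
  end if

  call MPI_Finalize(ierr)
end program check_barrier_count
```

The same two MPI_Allreduce calls could also be applied to a counter of how many times the output subroutine has been called on each rank.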