Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] MPI_Barrier, again
From: Evgeniy Shapiro (shellinux_at_[hidden])
Date: 2012-01-30 07:19:22


I have attached an example.

Compiler:
ifort (IFORT) 11.1 20090630
Copyright (C) 1985-2009 Intel Corporation. All rights reserved.

flags:
mpif90 -O0 -fp-model precise -traceback -r8 -i4 -fpp -check all
-warn all -warn nounused -save-temps -g -check noarg_temp_created -o
testbar ./mpibarriertest.f90

OpenMPI: 1.4.3

The program hangs randomly when run with 15 processes, as described.

Evgeniy

Message: 10
Date: Sat, 28 Jan 2012 08:24:39 -0500
From: Jeff Squyres <jsquyres_at_[hidden]>
Subject: Re: [OMPI users] MPI_Barrier, again
To: Open MPI Users <users_at_[hidden]>
Message-ID: <1859C141-813D-46BA-97BC-4B0290FB3291_at_[hidden]>
Content-Type: text/plain; charset=us-ascii

Is there any chance you can make a small-ish reproducer of the issue
that we can run?

On Jan 27, 2012, at 10:45 AM, Evgeniy Shapiro wrote:

> Hi
>
> I have a strange problem with MPI_Barrier occurring when writing to a
> file. The output subroutine (the code is in FORTRAN) is called from
> the main program and there is an MPI_Barrier just before the call.
>
> In the subroutine
>
> 1. Process 0 checks whether the first file exists and, if not,
> creates file 1, writes the file header and closes the file
>
> 2. there is a loop over the data sets with an embedded barrier
> do i=0, iDatasets
> call MPI_Barrier
> if I do not own data - cycle and go to the next dataset (and barrier)
> check if the file exists, if not - sleep and check again until it
> does (needed to make sure the buffer has been flushed)
> write my portion of the file
> end do
> in theory the above should result in a sequential write of datasets
> to the file.
>
> 3. Process 0 checks whether the second file exists and, if not,
> creates file 2, writes the file header and closes the file
>
> 4. there is a loop over the data sets with an embedded barrier
> do i=0, iDatasets
> call MPI_Barrier
> if I do not own data - cycle and go to the next dataset (and barrier)
> check if the file exists, if not - sleep and check again until it
> does (needed to make sure the buffer has been flushed)
> write my portion of the file including a link to the 1st file
> end do
>
> The sub is called several times (different files/datasets) with a
> barrier between calls. Erratically, the program hangs in one of the
> calls. The likelihood of a hang increases with the number of
> processes. DDT shows that when this happens some of the processes,
> including 0, are waiting at the barrier inside the first loop, some
> are waiting at the second barrier, and one process is in the
> sleep/check file status cycle in the second loop. So somehow some of
> the processes go through the 1st barrier before process 0.
> This is a debug version, so no loop unrolling etc.
>
> Is there anything I can do to make sure that the first barrier is
> observed by all processes? Any advice greatly appreciated.
>
> Evgeniy
>
>
> OpenMPI: 1.4.3
> (I cannot use parallel mpi io in this situation for various reasons)
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
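
For readers following the thread, the write pattern described in steps 1 and 2 of the quoted post can be sketched roughly as below. This is not the attached mpibarriertest.f90, which is not part of this archive page. The file name, the dataset count, the round-robin ownership rule and the pause_briefly helper are all hypothetical, chosen only to make the sketch self-contained.

```fortran
program barrier_write_sketch
  use mpi
  implicit none
  ! All three parameters below are assumptions made for illustration.
  integer, parameter :: n_datasets = 8
  integer, parameter :: u = 10
  character(len=*), parameter :: fname = 'sketch_file1.dat'
  integer :: ierr, rank, nprocs, i
  logical :: file_exists

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Step 1: rank 0 creates the file and writes the header if it is missing.
  if (rank == 0) then
     inquire(file=fname, exist=file_exists)
     if (.not. file_exists) then
        open(unit=u, file=fname, status='new', action='write')
        write(u, '(a)') '# header'
        close(u)
     end if
  end if

  ! Step 2: every rank reaches the barrier once per dataset, whether or
  ! not it owns that dataset. Only the owner goes on to write.
  do i = 0, n_datasets - 1
     call MPI_Barrier(MPI_COMM_WORLD, ierr)
     if (mod(i, nprocs) /= rank) cycle   ! hypothetical ownership rule

     ! Poll until the file is visible to this rank.
     do
        inquire(file=fname, exist=file_exists)
        if (file_exists) exit
        call pause_briefly()
     end do

     open(unit=u, file=fname, status='old', position='append', &
          action='write')
     write(u, '(a,i0,a,i0)') 'dataset ', i, ' written by rank ', rank
     close(u)
  end do

  ! Barrier between calls, as in the description of the main program.
  call MPI_Barrier(MPI_COMM_WORLD, ierr)
  call MPI_Finalize(ierr)

contains

  ! Hypothetical helper: spin for roughly 0.1 s using the standard
  ! system_clock intrinsic instead of a compiler-specific sleep.
  subroutine pause_briefly()
    integer :: t0, t1, rate
    call system_clock(t0, rate)
    do
       call system_clock(t1)
       if (t1 < t0 .or. t1 - t0 >= rate / 10) exit
    end do
  end subroutine pause_briefly

end program barrier_write_sketch
```

In this sketch the owner of dataset i reaches the barrier for dataset i+1 only after closing the file, so the writes are serialized by the barriers alone. It does not reproduce the reported hang; it only shows the intended control flow, in which every rank makes the same number of MPI_Barrier calls per pass.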