Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Bug with 1.2.5?
From: Grismer, Matthew J Civ USAF AFMC AFRL/RBAC (Matthew.Grismer_at_[hidden])
Date: 2008-03-17 14:23:14


Excellent catch! Thank you, that was indeed the problem. The original
code had a synchronous send, but one of our developers got over-eager
and changed to the non-blocking send without adjusting the deallocate
logic.

Matt

-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
Behalf Of George Bosilca
Sent: Monday, March 17, 2008 12:52 PM
To: Open MPI Users
Subject: Re: [OMPI users] Bug with 1.2.5?

I'm not very familiar with Fortran 90, but the code looks wrong to me.
Here is a snippet from the code:

      call MPI_ISEND(VTMP,int(NCFACES(I)),MPI_FP,NZINT(I)-1, &
                     int(GLOBINT(I)),AVUS_COMM_WORLD, &
                     REQUEST(MXIPZ+I),IERR)

      deallocate(VTMP)

The problem seems to be that you start a non-blocking send and then
release the buffer, which is completely illegal! The buffer must not be
freed until you know the send has completed, which means after the
MPI_Waitall.

This works with MPICH because they buffer data in some cases. It also
works with Open MPI on small problem sizes, because then the message can
be sent directly without buffering. However, once you're over the eager
limit, a rendezvous message is required, and the MPI_Isend completes
only later, by which time the buffer has already been deallocated.
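
A minimal sketch of the corrected ordering, assuming a simple ring
exchange. The program and the names NBUF, vtmp, vrecv, dest and src are
illustrative only and are not taken from fail_section.F; the message size
is merely assumed to be above the eager limit. The point is just that the
deallocate comes after the MPI_WAITALL:

```fortran
program isend_then_free
  ! Illustrative sketch only: not the code from fail_section.F.
  implicit none
  include 'mpif.h'

  ! Assumption: large enough to be above a typical eager limit.
  integer, parameter :: NBUF = 1000000
  double precision, allocatable :: vtmp(:), vrecv(:)
  integer :: request(2), statuses(MPI_STATUS_SIZE, 2)
  integer :: rank, nprocs, dest, src, ierr

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! Each rank sends to its right neighbour and receives from its left.
  dest = mod(rank + 1, nprocs)
  src  = mod(rank - 1 + nprocs, nprocs)

  allocate(vtmp(NBUF), vrecv(NBUF))
  vtmp = dble(rank)

  call MPI_IRECV(vrecv, NBUF, MPI_DOUBLE_PRECISION, src, 0, &
                 MPI_COMM_WORLD, request(1), ierr)
  call MPI_ISEND(vtmp, NBUF, MPI_DOUBLE_PRECISION, dest, 0, &
                 MPI_COMM_WORLD, request(2), ierr)

  ! vtmp and vrecv still belong to MPI here: do not modify or free them.
  call MPI_WAITALL(2, request, statuses, ierr)

  ! Both requests have completed, so the send buffer can be released.
  deallocate(vtmp)

  print *, 'rank', rank, 'received first value', vrecv(1)
  deallocate(vrecv)

  call MPI_FINALIZE(ierr)
end program isend_then_free
```

If the send buffer is reallocated on every pass through a loop, the same
rule applies to each one: either keep every buffer alive until the
MPI_WAITALL, or call MPI_WAIT on that request before the deallocate.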

   george.

On Mar 17, 2008, at 12:33 PM, Grismer, Matthew J Civ USAF AFMC AFRL/RBAC wrote:

>
> I've attached the requested configuration and ompi_info output, as
> well as the actual error messages that appear (run.out) when the code
> is run.
> I traced it down to the section of code included that is in
> fail_section.F.
>
> The platform is a Mac Pro running Mac OS X 10.5.2, but I also tried it
> on Mac OS X Server 10.4.11 (Xserve Xeon) with the same result. Also
> tried compiling OpenMPI with the Intel C/C++ compilers (version
> 10.1.012), same result.
>
> The code has been run without issue on numerous HPC platforms, and
> runs with OpenMPI on this platform for small problems. Issue shows up
> when running larger problems. Using MPICH2 on this platform with same
> large problem runs fine.
>
> The issue appears to occur when calling the MPI_WAITALL statement at
> the end of the code section; the MPI_IRECV and MPI_ISEND statements
> complete.
>
> Any help is greatly appreciated.
>
> _____________________________________________________
> Matthew Grismer
>
> <run.out> <config.log.gz> <fail_section.F> <mpi_info.txt.gz>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users