Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_IN_PLACE with GATHERV, ALLGATHERV, and SCATTERV
From: Gerlach, Charles A. (charles.gerlach_at_[hidden])
Date: 2013-10-22 17:17:35


The IN_PLACE alterations to my code encompassed GATHERV as well, but as I continued to debug, it appeared more and more as though SCATTERV was the only problem case.
So I do not foresee any GATHER reproducers, but I'll certainly send 'em if I find 'em.

I followed the link to the bug diff, and I can confirm that scatter_f.c and scatterv_f.c are wrong in my 1.6.5 tree. I haven't recompiled yet to verify that everything else works.

-Charles

-----Original Message-----
From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Nathan Hjelm
Sent: Tuesday, October 22, 2013 2:50 PM
To: Open MPI Users
Subject: Re: [OMPI users] MPI_IN_PLACE with GATHERV, ALLGATHERV, and SCATTERV

OK, I think we have this resolved in trunk, and the fix will go into 1.7.4. The check for MPI_IN_PLACE was wrong in the mpif-h bindings. The fix was tested with your reproducer. Both MPI_SCATTER and MPI_SCATTERV had this bug. The bug does not exist in 1.6.x, though, so I don't know why it was failing there.

I don't see a problem with MPI_GATHER or MPI_GATHERV though. Can you send a reproducer for those?
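
For reference, a gather-side test along the lines of the SCATTERV reproducer below would look roughly like this (a rough sketch for illustration only, assuming exactly four ranks and 64-bit default reals; note that for MPI_GATHERV the MPI_IN_PLACE sentinel goes in the send-buffer argument on the root, not the recv-buffer argument):

PROGRAM GATHERV_IN_PLACE
  IMPLICIT NONE
  INCLUDE 'mpif.h'

  REAL, DIMENSION(1200) :: RARR1               ! 64-bit with promoted default reals
  INTEGER, DIMENSION(4) :: RECV_NUM, RECV_OFF  ! per-rank counts and offsets (4 ranks assumed)
  INTEGER :: SEND_NUM, MYPN, NPES, IERR, I

  RECV_NUM = (/ 300, 300, 300, 300 /)
  RECV_OFF = (/ 0, 300, 600, 900 /)
  SEND_NUM = 300

  CALL MPI_INIT(IERR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPES, IERR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, MYPN, IERR)
  IF (NPES.NE.4) CALL MPI_ABORT(MPI_COMM_WORLD, 1, IERR)  ! sketch assumes exactly 4 ranks

  ! Each rank fills only its own 300-element slice of the global array
  RARR1 = 0.0
  DO I = 1, 300
    RARR1(MYPN*300 + I) = 0.001*(MYPN*300 + I)
  ENDDO

  IF (MYPN.EQ.0) THEN
    ! Root gathers in place: MPI_IN_PLACE replaces the send buffer, and the
    ! root's own contribution is taken from its slice of RARR1
    CALL MPI_GATHERV(MPI_IN_PLACE, SEND_NUM, MPI_DOUBLE_PRECISION, &
                     RARR1, RECV_NUM, RECV_OFF, MPI_DOUBLE_PRECISION, &
                     0, MPI_COMM_WORLD, IERR)
  ELSE
    ! Non-root ranks send their slice as usual
    CALL MPI_GATHERV(RARR1(MYPN*300+1), SEND_NUM, MPI_DOUBLE_PRECISION, &
                     RARR1, RECV_NUM, RECV_OFF, MPI_DOUBLE_PRECISION, &
                     0, MPI_COMM_WORLD, IERR)
  ENDIF

  ! On rank 0, RARR1 should now hold 0.001, 0.002, ..., 1.200
  IF (MYPN.EQ.0) WRITE(*,'(3E15.7)') RARR1(1:6)

  CALL MPI_FINALIZE(IERR)

END PROGRAM GATHERV_IN_PLACE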

-Nathan Hjelm
HPC-3, LANL

On Tue, Oct 22, 2013 at 02:28:38PM +0000, Gerlach, Charles A. wrote:
> My reproducer is below (SCATTERV only). It needs to be compiled with 64-bit default reals, and I'm running on four cores of a single Linux x86-64 box running SLED 12.3 (except where noted).
>
> Using Open-MPI with different compilers:
>
> With g95: the non-root procs print the correct values, but the root process seg faults somewhere inside the SCATTERV call.
> With Portland: I get "-1614907703: __hpf_esend: not implemented" (all procs print out the correct values).
> With Intel (on a Mac Pro): it complains about a null communicator in MPI_FINALIZE and crashes. All procs print out the correct values.
>
> With all three of these compilers, if I comment out the entire IF (MYPN.EQ.0) code so that all procs pass RARR1 into both the send and recv buffers, I get no errors.
>
> With gfortran: This works either way (with IN_PLACE or without).
>
> Other MPI implementations:
>
> With MPICH2 (any compiler) and Intel Visual Fortran on Windows, the IN_PLACE code works.
> They specifically prohibit passing RARR1 into both the send and recv buffers on the root proc.
>
> Reproducer:
>
> PROGRAM MAIN
>
>   IMPLICIT NONE
>
>   REAL, DIMENSION(1200) :: RARR1               ! 64-bit when default reals are promoted
>   INTEGER, DIMENSION(4) :: SEND_NUM, SEND_OFF  ! per-rank counts and offsets (4 ranks)
>   INTEGER :: RECV_NUM, MYPN, NPES, IERR
>
>   INTEGER :: I, J
>
>   INCLUDE 'mpif.h'
>
>   SEND_NUM = (/ 300, 300, 300, 300 /)
>   SEND_OFF = (/ 0, 300, 600, 900 /)
>   RECV_NUM = 300
>
>   CALL MPI_INIT(IERR)
>
>   CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPES, IERR)
>   CALL MPI_COMM_RANK(MPI_COMM_WORLD, MYPN, IERR)
>
>   ! Root fills the full 1200-element send buffer; the other ranks start zeroed
>   IF (MYPN.EQ.0) THEN
>     DO I = 1,1200
>       RARR1(I) = 0.001*I
>     ENDDO
>   ELSE
>     RARR1 = 0.0
>   ENDIF
>
>   ! Root scatters in place (MPI_IN_PLACE as the recv buffer); the other ranks
>   ! receive their 300-element piece into RARR1 (their send arguments are ignored)
>   IF (MYPN.EQ.0) THEN
>     CALL MPI_SCATTERV(RARR1,SEND_NUM,SEND_OFF,MPI_DOUBLE_PRECISION, &
>                       MPI_IN_PLACE,RECV_NUM,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,IERR)
>   ELSE
>     CALL MPI_SCATTERV(RARR1,SEND_NUM,SEND_OFF,MPI_DOUBLE_PRECISION, &
>                       RARR1,RECV_NUM,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,IERR)
>   ENDIF
>
>   ! Each rank appends its first 300 values to unit 71+rank
>   OPEN(71+MYPN,FORM='FORMATTED',POSITION='APPEND')
>   WRITE(71+MYPN,'(3E15.7)') RARR1(1:300)
>   CLOSE(71+MYPN)
>
>   CALL MPI_FINALIZE(IERR)
>
> END PROGRAM MAIN
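>
> For completeness, building and running this with 64-bit default reals looks roughly like the following (a sketch only: the flag is -fdefault-real-8 for gfortran and -r8 for g95, PGI, and Intel, the wrapper name assumes Open MPI's mpif90, and the file name is just a placeholder):
>
>   mpif90 -fdefault-real-8 scatterv_inplace.f90 -o scatterv_inplace
>   mpirun -np 4 ./scatterv_inplace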
>
>
> ________________________________________
> From: users [users-bounces_at_[hidden]] on behalf of Nathan Hjelm [hjelmn_at_[hidden]]
> Sent: Wednesday, October 09, 2013 12:37 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] MPI_IN_PLACE with GATHERV, ALLGATHERV, and SCATTERV
>
> These functions are tested nightly and there has been no indication
> any of these functions fail with MPI_IN_PLACE. Can you provide a reproducer?
>
> -Nathan
> HPC-3, LANL
>
> On Tue, Oct 08, 2013 at 07:40:50PM +0000, Gerlach, Charles A. wrote:
> > I have an MPI code that was developed using MPICH1 and OpenMPI before the
> > MPI2 standards became commonplace (before MPI_IN_PLACE was an option).
> >
> >
> >
> > So, my code has many examples of GATHERV, ALLGATHERV, and SCATTERV, where I
> > pass the same array in as the SEND_BUF and the RECV_BUF, and this has
> > worked fine for many years.
> >
> >
> >
> > Intel MPI and MPICH2 explicitly disallow this behavior according to the
> > MPI-2 standard. So, I have gone through and used MPI_IN_PLACE for all the
> > GATHERV/SCATTERVs that used to pass the same array twice. This code now
> > works with MPICH2 and Intel MPI, but fails with OpenMPI-1.6.5 on multiple
> > platforms and compilers.
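> >
> > For example, converting an ALLGATHERV goes from the old aliased call to
> > passing MPI_IN_PLACE as the send buffer on every rank (a rough sketch with
> > made-up names, not the actual code):
> >
> >   ! Old pre-MPI-2 style: the same array as send and recv buffer (aliased)
> >   CALL MPI_ALLGATHERV(ARR(OFF(MYPN+1)+1), CNT(MYPN+1), MPI_DOUBLE_PRECISION, &
> >                       ARR, CNT, OFF, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD, IERR)
> >
> >   ! MPI-2 style: MPI_IN_PLACE replaces the send buffer on every rank;
> >   ! each rank's contribution is taken from its own slice of ARR
> >   CALL MPI_ALLGATHERV(MPI_IN_PLACE, 0, MPI_DOUBLE_PRECISION, &
> >                       ARR, CNT, OFF, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD, IERR)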
> >
> >
> >
> > PLATFORM              COMPILER         SUCCESS? (for at least one simple example)
> > ------------------------------------------------------------
> > SLED 12.3 (x86-64)    Portland Group   fails
> > SLED 12.3 (x86-64)    g95              fails
> > SLED 12.3 (x86-64)    gfortran         works
> > OS X 10.8             Intel            fails
> >
> >
> >
> >
> >
> > In every case where OpenMPI fails with the MPI_IN_PLACE code, I can go
> > back to the original code that passes the same array twice instead of
> > using MPI_IN_PLACE, and it is fine.
> >
> >
> >
> > I have made a test case doing an individual GATHERV with MPI_IN_PLACE, and
> > it works with OpenMPI. So it looks like there is some interaction with my
> > code that is causing the problem. I have no idea how to go about trying to
> > debug it.
> >
> >
> >
> >
> >
> > In summary:
> >
> >
> >
> > OpenMPI-1.6.5 crashes my code when I use GATHERV, ALLGATHERV, and SCATTERV
> > with MPI_IN_PLACE.
> >
> > Intel MPI and MPICH2 work with my code when I use GATHERV, ALLGATHERV, and
> > SCATTERV with MPI_IN_PLACE.
> >
> >
> >
> > OpenMPI-1.6.5 works with my code when I pass the same array to SEND_BUF
> > and RECV_BUF instead of using MPI_IN_PLACE for those same GATHERV,
> > ALLGATHERV, and SCATTERVs.
> >
> >
> >
> >
> >
> > -Charles
>
_______________________________________________
users mailing list
users_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/users