Open MPI User's Mailing List Archives

From: Michael Kluskens (mkluskens_at_[hidden])
Date: 2006-05-03 09:52:48


MPI_Intercomm_merge is broken in OpenMPI 1.1a4r9788 (and likely all
versions)

Details: the second argument, high, of MPI_Intercomm_merge is a
logical in Fortran (pg 216 of Using MPI) and an int in C. This is now
correct with regard to the f90 interfaces in OpenMPI 1.1. The
meaning of "high" is as follows (from pg 313 MPI-The Complete
Reference):

If processes in one group provided the value high = false and
processes in the other group provided the value high = true then the
union orders the "low" group before the "high" group.

In other words, if I have the following:
  MPI process "parent" calls MPI_Intercomm_merge with high = .false.
  (high = 0 in C)
  MPI process "child" calls MPI_Intercomm_merge with high = .true.
  (high = 1 in C)
then in the merged communicator parent has rank 0 and child has
rank 1. This is not happening in my tests on OS X 10.4.6 with g95;
however, my two alternative test systems handle this case as I expect:
Debian Linux with MPICH2 1.0.3 (g95) and SGI MPI Library
(sgi-mpt-1.10.1-sgi301r1) (Intel Fortran 9.x).
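
For anyone who wants the ordering rule spelled out, here is a toy model
in Python. It is only a sketch of the rule quoted above, not MPI code and
not how any MPI library implements the merge. The function name
intercomm_merge_order and the process labels are made up for
illustration.

```python
def intercomm_merge_order(group_a, high_a, group_b, high_b):
    """Toy model of the rank order expected after MPI_Intercomm_merge.

    group_a / group_b: process labels of each side, in local rank order.
    high_a / high_b:   the 'high' value passed by every process of that side.
    Returns the labels in merged order (list index == merged rank).
    This is a hypothetical helper for illustration, not an MPI API.
    """
    if high_a == high_b:
        # When both sides pass the same value the order of the union
        # is left to the implementation, so this model refuses to guess.
        raise ValueError("order is arbitrary when both groups pass the same 'high'")
    # The group that passed high = false is ordered first.
    low, high = (group_a, group_b) if not high_a else (group_b, group_a)
    return list(low) + list(high)


# The case from this report: parent passes .false., child passes .true.
merged = intercomm_merge_order(["parent"], False, ["child"], True)
print("parent rank:", merged.index("parent"))
print("child rank:", merged.index("child"))
```

Under this rule the parent should end up with rank 0 and the child with
rank 1, regardless of which side is listed first.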

The following test code is written to use the Fortran 90 interfaces
but it can be switched to the include file and fixed format source
code (.f) and should compile with both f90 and f77 compilers. I have
not written a C test code.

Michael

mpif90 parent4.f90 -o parent4
mpif90 child4.f90 -o child4

parent startup: 0 of 1
a child starting
parent spawned child process
child 0 of 1
parent merge comm: 1 of 2
ERROR: parent rank incorrect after merge
ERROR: child rank incorrect after merge

-- parent4.f90 --
       program parent4
       USE MPI
       implicit none
       integer ierr,size,rank,child,allmpi
       integer k, subprocesses

       call MPI_INIT(ierr)
       call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
       call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)

       write(6,*) 'parent startup: ', rank, ' of ', size
       subprocesses = 1

       call MPI_Comm_spawn('child4', MPI_ARGV_NULL, subprocesses, &
      & MPI_INFO_NULL, 0, MPI_COMM_WORLD, child, &
      & MPI_ERRCODES_IGNORE, ierr )
       write(6,*) 'parent spawned child process'

       call MPI_Intercomm_merge( child, .false., allmpi, ierr )
       call MPI_COMM_RANK(allmpi,rank,ierr)
       call MPI_COMM_SIZE(allmpi,size,ierr)
       write(*,'(2(A,I3))') 'parent merge comm:',rank, ' of', size

       if ( rank .ne. 0 ) then
         write(6,*) 'ERROR: parent rank incorrect after merge'
       end if
       call MPI_COMM_FREE(allmpi,ierr)
       call MPI_COMM_FREE(child,ierr)

       call MPI_FINALIZE(ierr)
       end
-- child4.f90 --
       program child4
       USE MPI
       implicit none
       integer :: ierr,size,rank,parent,rsize,allmpi

       write(*,*) 'a child starting'
       call MPI_INIT(ierr)
       call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
       call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
       write(*,'(2(A,I3))') 'child',rank,' of', size
       call MPI_Comm_get_parent(parent,ierr)

       call MPI_Intercomm_merge( parent, .true., allmpi, ierr )
       call MPI_COMM_RANK(allmpi,rank,ierr)
       call MPI_COMM_SIZE(allmpi,size,ierr)
       if ( rank .eq. 0 ) then
         write(6,*) 'ERROR: child rank incorrect after merge'
       end if

       call MPI_COMM_FREE(allmpi,ierr)
       call MPI_COMM_FREE(parent,ierr)
       call MPI_FINALIZE(ierr)

       write(*,'(2(A,I3),A)') 'child',rank,' of',size,' exiting'
       end
------------------------------------

On May 2, 2006, at 11:54 PM, Jeff Squyres (jsquyres) wrote:

> Ok -- let me know what you find. I just checked and the code *looks*
> right to me, but that doesn't mean that there isn't some deeper
> implication that I'm missing.
>
>> -----Original Message-----
>> From: users-bounces_at_[hidden]
>> [mailto:users-bounces_at_[hidden]] On Behalf Of Michael Kluskens
>> Sent: Tuesday, May 02, 2006 6:05 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] openmpi-1.0.2 configure problem
>>
>> My test codes compile fine but I'm fairly certain the logical is
>> being handled incorrectly. When I merge two comm's with one having
>> high=.false. and the other high=.true., the latter should go into
>> the higher ranks and the former should contain rank 0.
>>
>> I'll work it over again tomorrow and see if I can create an f77
>> version or use the mpi.h file and see if I can get a clear
>> difference and I'll compare against MPICH2 but someone else should
>> look into this issue.
>>
>> Michael
>>
>> On May 1, 2006, at 11:57 PM, Jeff Squyres (jsquyres) wrote:
>>
>>> I just fixed the INTERCOMM_MERGE/logical issue on the trunk and the
>>> v1.1 branch -- can you give it a whirl there?
>>>
>>> I ask because this issue is a bug that we fixed on the trunk (and
>>> therefore v1.1) and didn't back-port it to v1.0. There's actually
>>> quite a few of these F90 fixes on the trunk/v1.1 branch that we did
>>> not back-port to v1.0 (e.g., most of the other logical fixes) mainly
>>> because we thought you were the main consumer of the F90 MPI API (and
>>> therefore it wasn't worth it to back port :-) ). If you need all
>>> these fixes in v1.0, we can spend the time to do the back-port, but
>>> would prefer not to if possible.
>>>
>>>
>>>> -----Original Message-----
>>>> From: users-bounces_at_[hidden]
>>>> [mailto:users-bounces_at_[hidden]] On Behalf Of Michael Kluskens
>>>> Sent: Monday, May 01, 2006 6:20 PM
>>>> To: Open MPI Users
>>>> Subject: [OMPI users] openmpi-1.0.2 configure problem
>>>>
>>>> checking if FORTRAN compiler supports integer(selected_int_kind(2))... yes
>>>> checking size of FORTRAN integer(selected_int_kind(2))... unknown
>>>> configure: WARNING: *** Problem running configure test!
>>>> configure: WARNING: *** See config.log for details.
>>>> configure: error: *** Cannot continue.
>>>>
>>>> Source code: openmpi-1.0.2 stable
>>>> OS X 10.4.5 with g95 (Apr 27 2006)
>>>> ./configure F77=g95 FC=g95 LDFLAGS=-lSystemStubs
>>>>
>>>> I find this rather surprising given that I have been regularly
>>>> building nightly snapshots of Open MPI 1.1 and 1.2 (the other bug
>>>> is preventing me from using them at the moment till either I change
>>>> my code or the bug gets fixed).
>>>>
>>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>