Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with MPI_Scatter() on inter-communicator...
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-04-10 11:33:33


Edgar --

Can you file a CMR for v1.2?

On Apr 10, 2008, at 8:10 AM, Edgar Gabriel wrote:
> thanks for reporting the bug; it is fixed on the trunk. This time the
> problem was not in the algorithm itself but in the checking of the
> preconditions. If recvcount was zero and the rank was not equal to
> the rank of the root, then we did not even start the scatter,
> assuming that there was nothing to do. For inter-communicators,
> however, the check has to be extended to accept recvcount=0 for
> root=MPI_ROOT. The fix is in the trunk in rev. 18123.
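
For readers following the thread, here is a minimal sketch of the kind
of precondition check Edgar describes, for illustration only. The
function name and structure are hypothetical and are not the actual
Open MPI code changed in rev. 18123.

   #include <mpi.h>

   /* Hypothetical helper, not the actual Open MPI source: decide
    * whether an inter-communicator scatter can return early. A zero
    * recvcount alone must not short-circuit the call, because the
    * root (root == MPI_ROOT) only uses the send arguments. */
   static int inter_scatter_is_noop(int recvcount, int root)
   {
       if (root == MPI_PROC_NULL) {
           return 1;            /* this process does not participate */
       }
       if (root == MPI_ROOT) {
           return 0;            /* the root sends; recvcount is ignored */
       }
       return (recvcount == 0); /* leaf process with nothing to receive */
   }
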
>
> Thanks
> Edgar
>
> Edgar Gabriel wrote:
>> I don't think that anybody has answered your email so far; I'll have
>> a look at it on Thursday...
>>
>> Thanks
>> Edgar
>>
>> Audet, Martin wrote:
>>> Hi,
>>>
>>> I don't know if it is my sample code or a problem with
>>> MPI_Scatter() on an inter-communicator (maybe similar to the
>>> problem we found with MPI_Allgather() on an inter-communicator a
>>> few weeks ago), but a simple program I wrote freezes during the
>>> second iteration of a loop doing an MPI_Scatter() over an
>>> inter-communicator.
>>>
>>> For example if I compile as follows:
>>>
>>> mpicc -Wall scatter_bug.c -o scatter_bug
>>>
>>> I get no error or warning. Then if I start it with np=2 as follows:
>>>
>>> mpiexec -n 2 ./scatter_bug
>>>
>>> it prints:
>>>
>>> beginning Scatter i_root_group=0
>>> ending Scatter i_root_group=0
>>> beginning Scatter i_root_group=1
>>>
>>> and then hangs...
>>>
>>> Note also that if I change the for loop to execute only the
>>> MPI_Scatter() of the second iteration (e.g. replacing
>>> "i_root_group=0;" by "i_root_group=1;"), it prints:
>>>
>>> beginning Scatter i_root_group=1
>>>
>>> and then hangs...
>>>
>>> The problem therefore seems to be related to the second iteration
>>> itself.
>>>
>>> Please note that this program runs fine with mpich2 1.0.7rc2
>>> (ch3:sock device) for many different numbers of processes (np),
>>> whether the executable is run with or without valgrind.
>>>
>>> The Open MPI version I use is 1.2.6rc3 and was configured as follows:
>>>
>>> ./configure --prefix=/usr/local/openmpi-1.2.6rc3 \
>>>     --disable-mpi-f77 --disable-mpi-f90 --disable-mpi-cxx \
>>>     --disable-cxx-exceptions \
>>>     --with-io-romio-flags=--with-file-system=ufs+nfs
>>>
>>> Note also that all processes (whether using Open MPI or mpich2)
>>> were started on the same machine.
>>>
>>> Also, if you look at the source code, you will notice that some
>>> arguments to MPI_Scatter() are NULL or 0. This may look strange
>>> and problematic for a normal intra-communicator. However, according
>>> to the book "MPI - The Complete Reference", vol. 2 (about MPI-2),
>>> for MPI_Scatter() with an inter-communicator:
>>>
>>> "The sendbuf, sendcount and sendtype arguments are significant
>>> only at the root process. The recvbuf, recvcount, and recvtype
>>> arguments are significant only at the processes of the leaf group."
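
As a rough distillation of the quoted rule (an illustrative helper
only; the function and variable names are hypothetical and do not
appear in the reproducer below), the two sides of an inter-communicator
MPI_Scatter() might look like this:

   #include <mpi.h>

   /* Hypothetical sketch: send arguments matter only at the root,
    * receive arguments only in the leaf group. */
   static void scatter_one_int(MPI_Comm intercomm, int in_root_group,
                               int am_root, int *send_buf /* root only */)
   {
       if (in_root_group) {
           /* Root-group side: the root passes MPI_ROOT, its peers pass
            * MPI_PROC_NULL; the receive arguments are ignored. */
           MPI_Scatter(send_buf, 1, MPI_INT, NULL, 0, MPI_INT,
                       am_root ? MPI_ROOT : MPI_PROC_NULL, intercomm);
       } else {
           /* Leaf-group side: the send arguments are ignored; the root
            * argument is the root's rank within the remote group. */
           int an_int;
           MPI_Scatter(NULL, 0, MPI_INT, &an_int, 1, MPI_INT,
                       0 /* root's rank in the remote group */, intercomm);
       }
   }
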
>>>
>>> If anyone else could have a look at this program and try it, it
>>> would be helpful.
>>>
>>> Thanks,
>>>
>>> Martin
>>>
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <mpi.h>
>>>
>>> int main(int argc, char **argv)
>>> {
>>>    int ret_code = 0;
>>>    int comm_size, comm_rank;
>>>
>>>    MPI_Init(&argc, &argv);
>>>
>>>    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
>>>    MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
>>>
>>>    if (comm_size > 1) {
>>>       MPI_Comm subcomm, intercomm;
>>>       const int group_id = comm_rank % 2;
>>>       int i_root_group;
>>>
>>>       /* split processes into two groups: even and odd comm_ranks. */
>>>       MPI_Comm_split(MPI_COMM_WORLD, group_id, 0, &subcomm);
>>>
>>>       /* The remote leader comm_ranks for the even and odd groups
>>>          are respectively 1 and 0. */
>>>       MPI_Intercomm_create(subcomm, 0, MPI_COMM_WORLD, 1-group_id,
>>>                            0, &intercomm);
>>>
>>>       /* for i_root_group==0 the process with comm_rank==0 scatters
>>>          data to all processes with odd comm_rank */
>>>       /* for i_root_group==1 the process with comm_rank==1 scatters
>>>          data to all processes with even comm_rank */
>>>       for (i_root_group=0; i_root_group < 2; i_root_group++) {
>>>          if (comm_rank == 0) {
>>>             printf("beginning Scatter i_root_group=%d\n", i_root_group);
>>>          }
>>>          if (group_id == i_root_group) {
>>>             const int is_root = (comm_rank == i_root_group);
>>>             int *send_buf = NULL;
>>>             if (is_root) {
>>>                const int nbr_other = (comm_size+i_root_group)/2;
>>>                int ii;
>>>                send_buf = malloc(nbr_other*sizeof(*send_buf));
>>>                for (ii=0; ii < nbr_other; ii++) {
>>>                   send_buf[ii] = ii;
>>>                }
>>>             }
>>>             MPI_Scatter(send_buf, 1, MPI_INT,
>>>                         NULL, 0, MPI_INT,
>>>                         (is_root ? MPI_ROOT : MPI_PROC_NULL),
>>>                         intercomm);
>>>
>>>             if (is_root) {
>>>                free(send_buf);
>>>             }
>>>          }
>>>          else {
>>>             int an_int;
>>>             MPI_Scatter(NULL, 0, MPI_INT,
>>>                         &an_int, 1, MPI_INT, 0, intercomm);
>>>          }
>>>          if (comm_rank == 0) {
>>>             printf("ending Scatter i_root_group=%d\n", i_root_group);
>>>          }
>>>       }
>>>
>>>       MPI_Comm_free(&intercomm);
>>>       MPI_Comm_free(&subcomm);
>>>    }
>>>    else {
>>>       fprintf(stderr, "%s: error, this program must be started with "
>>>               "np > 1\n", argv[0]);
>>>       ret_code = 1;
>>>    }
>>>
>>>    MPI_Finalize();
>>>
>>>    return ret_code;
>>> }
>>>
>>
>
> --
> Edgar Gabriel
> Assistant Professor
> Parallel Software Technologies Lab http://pstl.cs.uh.edu
> Department of Computer Science University of Houston
> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335

-- 
Jeff Squyres
Cisco Systems