Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Problem with MPI_Scatter() on inter-communicator...
From: Edgar Gabriel (gabriel_at_[hidden])
Date: 2008-04-10 11:40:43


done...

Jeff Squyres wrote:
> Edgar --
>
> Can you file a CMR for v1.2?
>
> On Apr 10, 2008, at 8:10 AM, Edgar Gabriel wrote:
>> thanks for reporting the bug; it is fixed on the trunk. This time the
>> problem was not in the algorithm but in the checking of the
>> preconditions: if recvcount was zero and the rank was not equal to the
>> rank of the root, we did not even start the scatter, assuming there
>> was nothing to do. For inter-communicators, however, the check has to
>> be extended to accept recvcount=0 for root=MPI_ROOT. The fix is in the
>> trunk in rev. 18123.
>>
>> Thanks
>> Edgar
>>
>> Edgar Gabriel wrote:
>>> I don't think anybody has answered your email so far; I'll have a
>>> look at it on Thursday...
>>>
>>> Thanks
>>> Edgar
>>>
>>> Audet, Martin wrote:
>>>> Hi,
>>>>
>>>> I don't know if it is my sample code or a problem with
>>>> MPI_Scatter() on an inter-communicator (maybe similar to the problem
>>>> we found with MPI_Allgather() on an inter-communicator a few weeks
>>>> ago), but a simple program I wrote freezes during the second
>>>> iteration of a loop doing an MPI_Scatter() over an
>>>> inter-communicator.
>>>>
>>>> For example if I compile as follows:
>>>>
>>>> mpicc -Wall scatter_bug.c -o scatter_bug
>>>>
>>>> I get no error or warning. Then if I start it with np=2 as follows:
>>>>
>>>> mpiexec -n 2 ./scatter_bug
>>>>
>>>> it prints:
>>>>
>>>> beginning Scatter i_root_group=0
>>>> ending Scatter i_root_group=0
>>>> beginning Scatter i_root_group=1
>>>>
>>>> and then hangs...
>>>>
>>>> Note also that if I change the for loop to execute only the
>>>> MPI_Scatter() of the second iteration (e.g. replacing
>>>> "i_root_group=0;" by "i_root_group=1;"), it prints:
>>>>
>>>> beginning Scatter i_root_group=1
>>>>
>>>> and then hangs...
>>>>
>>>> The problem therefore seems to be related to the second
>>>> iteration itself.
>>>>
>>>> Please note that this program runs fine with mpich2 1.0.7rc2
>>>> (ch3:sock device) for many different numbers of processes (np),
>>>> whether the executable is run with or without valgrind.
>>>>
>>>> The OpenMPI version I use is 1.2.6rc3 and was configured as follows:
>>>>
>>>> ./configure --prefix=/usr/local/openmpi-1.2.6rc3 --disable-mpi-
>>>> f77 --disable-mpi-f90 --disable-mpi-cxx --disable-cxx-exceptions --
>>>> with-io-romio-flags=--with-file-system=ufs+nfs
>>>>
>>>> Note also that all processes (whether using Open MPI or mpich2) were
>>>> started on the same machine.
>>>>
>>>> Also, if you look at the source code, you will notice that some
>>>> arguments to MPI_Scatter() are NULL or 0. This may look strange
>>>> and problematic when using a normal intra-communicator. However,
>>>> according to the book "MPI - The Complete Reference", vol. 2 (on
>>>> MPI-2), for MPI_Scatter() with an inter-communicator:
>>>>
>>>> "The sendbuf, sendcount and sendtype arguments are significant
>>>> only at the root process. The recvbuf, recvcount, and recvtype
>>>> arguments are significant only at the processes of the leaf group."
>>>>
>>>> If anyone else could have a look at this program and try it, it
>>>> would be helpful.
>>>>
>>>> Thanks,
>>>>
>>>> Martin
>>>>
>>>>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>> #include <mpi.h>
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>    int ret_code = 0;
>>>>    int comm_size, comm_rank;
>>>>
>>>>    MPI_Init(&argc, &argv);
>>>>
>>>>    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
>>>>    MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
>>>>
>>>>    if (comm_size > 1) {
>>>>       MPI_Comm subcomm, intercomm;
>>>>       const int group_id = comm_rank % 2;
>>>>       int i_root_group;
>>>>
>>>>       /* split processes into two groups: even and odd comm_ranks. */
>>>>       MPI_Comm_split(MPI_COMM_WORLD, group_id, 0, &subcomm);
>>>>
>>>>       /* The remote leader comm_ranks for the even and odd groups
>>>>          are respectively 1 and 0. */
>>>>       MPI_Intercomm_create(subcomm, 0, MPI_COMM_WORLD, 1-group_id,
>>>>                            0, &intercomm);
>>>>
>>>>       /* for i_root_group==0 the process with comm_rank==0 scatters
>>>>          data to all processes with odd comm_rank */
>>>>       /* for i_root_group==1 the process with comm_rank==1 scatters
>>>>          data to all processes with even comm_rank */
>>>>       for (i_root_group=0; i_root_group < 2; i_root_group++) {
>>>>          if (comm_rank == 0) {
>>>>             printf("beginning Scatter i_root_group=%d\n", i_root_group);
>>>>          }
>>>>          if (group_id == i_root_group) {
>>>>             const int is_root = (comm_rank == i_root_group);
>>>>             int *send_buf = NULL;
>>>>             if (is_root) {
>>>>                const int nbr_other = (comm_size+i_root_group)/2;
>>>>                int ii;
>>>>                send_buf = malloc(nbr_other*sizeof(*send_buf));
>>>>                for (ii=0; ii < nbr_other; ii++) {
>>>>                   send_buf[ii] = ii;
>>>>                }
>>>>             }
>>>>             MPI_Scatter(send_buf, 1, MPI_INT,
>>>>                         NULL, 0, MPI_INT,
>>>>                         (is_root ? MPI_ROOT : MPI_PROC_NULL),
>>>>                         intercomm);
>>>>
>>>>             if (is_root) {
>>>>                free(send_buf);
>>>>             }
>>>>          }
>>>>          else {
>>>>             int an_int;
>>>>             MPI_Scatter(NULL, 0, MPI_INT,
>>>>                         &an_int, 1, MPI_INT, 0, intercomm);
>>>>          }
>>>>          if (comm_rank == 0) {
>>>>             printf("ending Scatter i_root_group=%d\n", i_root_group);
>>>>          }
>>>>       }
>>>>
>>>>       MPI_Comm_free(&intercomm);
>>>>       MPI_Comm_free(&subcomm);
>>>>    }
>>>>    else {
>>>>       fprintf(stderr, "%s: error, this program must be started with "
>>>>               "np > 1\n", argv[0]);
>>>>       ret_code = 1;
>>>>    }
>>>>
>>>>    MPI_Finalize();
>>>>
>>>>    return ret_code;
>>>> }
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>

-- 
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335