Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_COMM_split hanging
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2011-12-12 09:45:31

For MPI_Comm_split, all processes in the input communicator (oldcomm,
or MPI_COMM_WORLD in your case) must call the operation, since it is
collective over the input communicator. In your program rank 0 never
calls the operation, so every other process's MPI_Comm_split waits for
it to join, and the call hangs.

If you want rank 0 to be excluded from all of the new communicators,
you can give it a special color that is distinct from every other rank's.
Upon return from MPI_Comm_split, rank 0 will be given a new
communicator containing just one process: itself. If you do not
intend to use that communicator, you can free it immediately.
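
A minimal sketch of that approach (untested; it reuses the group
arithmetic from your code below, assumes the total process count is a
multiple of groups_count, and the value 4 is just illustrative):

  #include <mpi.h>

  int main( int argc, char **argv )
  {
     int total_proc_count, my_rank;

     MPI_Init( &argc, &argv );
     MPI_Comm_size( MPI_COMM_WORLD, &total_proc_count );
     MPI_Comm_rank( MPI_COMM_WORLD, &my_rank );

     int groups_count    = 4;                        /* illustrative */
     int cores_per_group = total_proc_count / groups_count;
     int my_group        = my_rank / cores_per_group;
     int group_rank      = my_rank - my_group * cores_per_group;

     /* Every rank, including the supervisor, must make this call.
        Rank 0 passes a color (groups_count) that no worker group
        uses, so it ends up alone in its own communicator. */
     int color = ( my_rank == 0 ) ? groups_count : my_group;

     MPI_Comm my_communicator;
     MPI_Comm_split( MPI_COMM_WORLD, color, group_rank, &my_communicator );

     if ( my_rank == 0 )
        MPI_Comm_free( &my_communicator );  /* supervisor does not use it */

     MPI_Finalize();
     return 0;
  }

Alternatively, rank 0 can pass MPI_UNDEFINED as its color, in which case
MPI_Comm_split returns MPI_COMM_NULL for it and there is nothing to free.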

Hope that helps,

On Fri, Dec 9, 2011 at 6:52 PM, Gary Gorbet <gegorbet_at_[hidden]> wrote:
> I am attempting to split my application into multiple master+workers
> groups using MPI_COMM_split. My MPI revision is shown as:
> mpirun --tag-output ompi_info -v ompi full --parsable
> [1,0]<stdout>:package:Open MPI root_at_build-x86-64 Distribution
> [1,0]<stdout>:ompi:version:full:1.4.3
> [1,0]<stdout>:ompi:version:svn:r23834
> [1,0]<stdout>:ompi:version:release_date:Oct 05, 2010
> [1,0]<stdout>:orte:version:full:1.4.3
> [1,0]<stdout>:orte:version:svn:r23834
> [1,0]<stdout>:orte:version:release_date:Oct 05, 2010
> [1,0]<stdout>:opal:version:full:1.4.3
> [1,0]<stdout>:opal:version:svn:r23834
> [1,0]<stdout>:opal:version:release_date:Oct 05, 2010
> [1,0]<stdout>:ident:1.4.3
> The basic problem I am having is that none of the processor instances ever
> returns from the MPI_COMM_split call. I am pretty new to MPI and it is
> likely I am not doing things quite correctly. I'd appreciate some guidance.
> I am working with an application that has functioned nicely for a while
> now. It only uses a single MPI_COMM_WORLD communicator. It is standard
> stuff:  a master that hands out tasks to many workers, receives output
> and keeps track of workers that are ready to receive another task. The
> tasks are quite compute-intensive. When running a variation of the
> process that uses Monte Carlo iterations, jobs can exceed the 30 hours
> they are limited to. The MC iterations are independent of each other -
> adding random noise to an input - so I would like to run multiple
> iterations simultaneously, so that 4 times the cores would run in a fourth
> of the time. This would entail a supervisor interacting with multiple
> master+workers groups.
> I had thought that I would just have to declare a communicator for each
> group so that broadcasts and syncs would work within a single group.
>   MPI_Comm_size( MPI_COMM_WORLD, &total_proc_count );
>   MPI_Comm_rank( MPI_COMM_WORLD, &my_rank );
>   ...
>   cores_per_group = total_proc_count / groups_count;
>   my_group = my_rank / cores_per_group;     // e.g., 0, 1, ...
>   group_rank = my_rank - my_group * cores_per_group;  // rank within a group
>   if ( my_rank == 0 )    continue;    // Do not create group for supervisor
>   MPI_Comm oldcomm = MPI_COMM_WORLD;
>   MPI_Comm my_communicator;    // Actually declared as a class variable
>   int sstat = MPI_Comm_split( oldcomm, my_group, group_rank,
>         &my_communicator );
> There is never a return from the above _split() call. Do I need to do
> something else to set this up? I would have expected perhaps a non-zero
> status return, but not that I would get no return at all. I would
> appreciate any comments or guidance.
> - Gary

Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory