
Subject: [OMPI users] MPI_COMM_split hanging
From: Gary Gorbet (gegorbet_at_[hidden])
Date: 2011-12-09 18:52:20


I am attempting to split my application into multiple master+workers
groups using MPI_Comm_split. My Open MPI version is reported as:

mpirun --tag-output ompi_info -v ompi full --parsable
[1,0]<stdout>:package:Open MPI root_at_build-x86-64 Distribution
[1,0]<stdout>:ompi:version:full:1.4.3
[1,0]<stdout>:ompi:version:svn:r23834
[1,0]<stdout>:ompi:version:release_date:Oct 05, 2010
[1,0]<stdout>:orte:version:full:1.4.3
[1,0]<stdout>:orte:version:svn:r23834
[1,0]<stdout>:orte:version:release_date:Oct 05, 2010
[1,0]<stdout>:opal:version:full:1.4.3
[1,0]<stdout>:opal:version:svn:r23834
[1,0]<stdout>:opal:version:release_date:Oct 05, 2010
[1,0]<stdout>:ident:1.4.3

The basic problem is that none of the processes ever returns from the
MPI_Comm_split call. I am fairly new to MPI, so it is likely I am not
doing things quite correctly; I'd appreciate some guidance.

I am working with an application that has functioned nicely for a while
now. It uses only a single communicator, MPI_COMM_WORLD. It is standard
stuff: a master hands out tasks to many workers, receives their output,
and keeps track of which workers are ready for another task. The tasks
are quite compute-intensive. When running a variation of the process
that uses Monte Carlo iterations, jobs can exceed the 30-hour limit they
are given. The MC iterations are independent of each other (each just
adds random noise to an input), so I would like to run multiple
iterations simultaneously, so that using four times as many cores
finishes in roughly a quarter of the time. This would entail a
supervisor interacting with multiple master+workers groups.
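
For context, the existing single-communicator dispatch follows roughly
this shape (heavily simplified; the tags and payloads are placeholders
rather than my real task structures, and worker shutdown is omitted):

#include <mpi.h>

#define TAG_TASK   1
#define TAG_RESULT 2

/* Master: hand the next task to whichever worker reports in */
void run_master( int ntasks )
{
   double task, result;
   MPI_Status status;

   for ( int t = 0; t < ntasks; t++ )
   {
      MPI_Recv( &result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_RESULT,
                MPI_COMM_WORLD, &status );
      task = (double)t;
      MPI_Send( &task, 1, MPI_DOUBLE, status.MPI_SOURCE, TAG_TASK,
                MPI_COMM_WORLD );
   }
}

/* Worker: announce readiness (or return a result), then wait for work */
void run_worker( void )
{
   double task, result = 0.0;
   MPI_Status status;

   while ( 1 )
   {
      MPI_Send( &result, 1, MPI_DOUBLE, 0, TAG_RESULT, MPI_COMM_WORLD );
      MPI_Recv( &task, 1, MPI_DOUBLE, 0, TAG_TASK, MPI_COMM_WORLD, &status );
      result = task * task;   /* stand-in for the real computation */
   }
}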

I had thought that I would just have to declare a communicator for each
group so that broadcasts and syncs would work within a single group:

    MPI_Comm_size( MPI_COMM_WORLD, &total_proc_count );
    MPI_Comm_rank( MPI_COMM_WORLD, &my_rank );
    ...
    cores_per_group = total_proc_count / groups_count;
    my_group = my_rank / cores_per_group; // e.g., 0, 1, ...
    group_rank = my_rank - my_group * cores_per_group; // rank within a group
    if ( my_rank == 0 ) continue; // Do not create group for supervisor
    MPI_Comm oldcomm = MPI_COMM_WORLD;
    MPI_Comm my_communicator; // Actually declared as a class variable
    int sstat = MPI_Comm_split( oldcomm, my_group, group_rank,
          &my_communicator );
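
Once the split returns, my intent is to use my_communicator for
group-local collectives, roughly along these lines (the parameter
buffer, its size, and the root rank here are just placeholders for the
real data):

    double params[ 8 ];

    if ( group_rank == 0 )
    {  // Group master fills in this group's iteration parameters
       for ( int i = 0; i < 8; i++ )
          params[ i ] = 0.0;
    }

    // Broadcast within this group only; other groups are unaffected
    MPI_Bcast( params, 8, MPI_DOUBLE, 0, my_communicator );

    // Sync only the members of this group
    MPI_Barrier( my_communicator );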

The MPI_Comm_split() call never returns in any of the processes. Do I
need to do something else to set this up? I would have expected at worst
a non-zero return status, not that the call would never return at all.
I would appreciate any comments or guidance.

- Gary