Open MPI User's Mailing List Archives

Subject: [OMPI users] MPI_Comm_spawn_multiple
From: Skouson, Gary B (Gary.Skouson_at_[hidden])
Date: 2011-02-21 14:59:44


I'm trying to use MPI_Comm_spawn_multiple, and it doesn't always seem to work the way I'd expect.

The simple test code I have starts a couple of manager processes and then tries to spawn a worker process on each of the nodes running the manager processes.
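
For readers without the attachment, here is a minimal sketch of what a manager along these lines might look like. It is not the attached code. The hostnames and the worker path are copied from the output in this post, and the use of the reserved "host" info key to place one worker per node is an assumption about how the test does it.

```c
#include <mpi.h>
#include <stdio.h>

#define NUM_HOSTS 2

int main(int argc, char **argv)
{
    /* Assumed values: hostnames and worker binary as they appear in the log output. */
    char *hosts[NUM_HOSTS] = { "cu2n29", "cu2n30" };
    char *commands[NUM_HOSTS];
    int maxprocs[NUM_HOSTS];
    MPI_Info infos[NUM_HOSTS];
    int errcodes[NUM_HOSTS];   /* one entry per spawned process (sum of maxprocs) */
    MPI_Comm workers;          /* intercommunicator to the spawned workers */
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < NUM_HOSTS; i++) {
        commands[i] = "./mpi2_worker";
        maxprocs[i] = 1;                           /* one worker per host */
        MPI_Info_create(&infos[i]);
        MPI_Info_set(infos[i], "host", hosts[i]);  /* reserved key: where to start it */
    }

    /* Collective over MPI_COMM_WORLD; the spawn arguments are only read at root 0. */
    MPI_Comm_spawn_multiple(NUM_HOSTS, commands, MPI_ARGVS_NULL, maxprocs,
                            infos, 0, MPI_COMM_WORLD, &workers, errcodes);

    if (rank == 0) {
        for (i = 0; i < NUM_HOSTS; i++) {
            if (errcodes[i] != MPI_SUCCESS)
                fprintf(stderr, "worker %d failed to start\n", i);
        }
    }

    for (i = 0; i < NUM_HOSTS; i++)
        MPI_Info_free(&infos[i]);

    /* Assumes the worker also disconnects from its parent communicator. */
    MPI_Comm_disconnect(&workers);
    MPI_Finalize();
    return 0;
}
```

On the worker side, MPI_Comm_get_parent would return the matching intercommunicator, and MPI_Comm_remote_size on it would give the "number of parents" value printed in the output below.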

I was using 1.5.1, but gave 1.5.2rc2 a try too.
 
If I do:
[skouson_at_cu2n29 mpi2_example]$ mpirun -hostfile hostfile -n 2 -bynode ./mpi2_manager
MPI Initialized=0, Finalized=0
MPI Initialized=0, Finalized=0
I'm manager 0 of 2 on cu2n29 running MPI 2.1
setting up host cu2n29 - ./mpi2_worker
setting up host cu2n30 - ./mpi2_worker
Spawning 2 worker processes running ./mpi2_worker
Sleeping for a bit...
I'm manager 1 of 2 on cu2n30 running MPI 2.1
**** I'm worker 0 of 2 on cu2n29 running MPI 2.1
**** Worker 0: number of parents = 2
**** Worker 0: Success!
**** I'm worker 1 of 2 on cu2n30 running MPI 2.1
**** Worker 1: number of parents = 2
**** Worker 1: Success!
**** Worker 0: Value recd = 25
1: MPI Initialized=1, Finalized=1
0: MPI Initialized=1, Finalized=1

That works as expected. However, if I use -loadbalance or -npernode 1 instead of the -bynode flag, I get an obscure error and the job hangs until I ctrl-c out of it.

[skouson_at_cu2n29 mpi2_example]$ mpirun -hostfile hostfile -n 2 -loadbalance ./mpi2_manager
MPI Initialized=0, Finalized=0
MPI Initialized=0, Finalized=0
I'm manager 1 of 2 on cu2n30 running MPI 2.1
I'm manager 0 of 2 on cu2n29 running MPI 2.1
setting up host cu2n29 - ./mpi2_worker
setting up host cu2n30 - ./mpi2_worker
Spawning 2 worker processes running ./mpi2_worker
Sleeping for a bit...
[cu2n29:03088] [[62875,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 906
mpirun: abort is already in progress...hit ctrl-c again to forcibly terminate

My environment has:
OMPI_MCA_btl_openib_ib_retry_count=7
OMPI_MCA_mpi_keep_peer_hostnames=1
OMPI_MCA_btl_openib_ib_timeout=31

I've included the sample code, along with config.log etc.

If anyone can point out what I'm missing to be able to run with the -loadbalance flag, I'd appreciate it.

-----
Gary Skouson