Open MPI User's Mailing List Archives

Subject: [OMPI users] error performing MPI_Comm_spawn
From: Marcia Cristina Cera (marcia.cristina.cera_at_[hidden])
Date: 2009-12-15 07:51:08


I am developing an application that uses MPI_Comm_spawn to dynamically
create new MPI tasks (processes).
The program is structured like a tree: each node creates 2 new ones
until a predefined number of levels is reached.
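
To get a feel for how many processes such a tree launches, here is a rough
sketch in C. It assumes the root ch_rec is at level 1 and that a node spawns
its 2 children only while its level is below the limit; the post does not
spell out the exact convention, so the function name and the level numbering
are illustrative only.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: counts the ch_rec processes in a full binary tree.
 * Assumes the root sits at level 1 and every node with level < max_level
 * spawns exactly 2 children. The start process is not included. */
unsigned long long tree_process_count(unsigned int max_level)
{
    /* Guard against overflowing the 2^level counter. */
    assert(max_level < sizeof(unsigned long long) * CHAR_BIT);

    unsigned long long total = 0;
    unsigned long long at_level = 1;   /* one root at level 1 */

    for (unsigned int level = 1; level <= max_level; ++level) {
        total += at_level;             /* add the nodes on this level */
        at_level *= 2;                 /* each node creates 2 new ones */
    }
    return total;
}
```

Under these assumptions a limit of 3 levels gives 1 + 2 + 4 = 7 ch_rec
processes, plus the start process, which is worth checking against the number
of slots available to mpirun.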

I wrote a small program to demonstrate the problem, consisting of two files:
-- start.c: launches the root of the tree (a ch_rec program) through
MPI_Comm_spawn, passing the level value in argv. After the spawn, it sends a
message to the child and blocks in an MPI_Recv.
-- ch_rec.c: gets its level value and receives the parent's message; then, if
its level is less than a predefined limit, it creates 2 children:
        - set the level value;
        - spawn 1 child;
        - send a message;
        - call an MPI_Irecv;
        - repeat the 4 previous steps for the second child;
        - call MPI_Waitany to wait for the children's replies.
When the children's messages have been received, the process sends a message
to its parent and calls MPI_Finalize.
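
The steps listed for ch_rec.c can be sketched roughly as follows. This is not
the original source, only a minimal reconstruction from the description:
MAX_LEVEL, TAG, the integer message payload, the use of MPI_COMM_SELF as the
spawning communicator, and calling MPI_Waitany once per child are all
assumptions.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_LEVEL    3   /* hypothetical depth limit */
#define NUM_CHILDREN 2
#define TAG          0   /* hypothetical message tag */

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* The parent passes the level as the first spawn argument. */
    int level = (argc > 1) ? atoi(argv[1]) : 0;

    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);

    /* Receive the parent's message over the parent intercommunicator. */
    if (parent != MPI_COMM_NULL) {
        int from_parent = -1;
        MPI_Recv(&from_parent, 1, MPI_INT, 0, TAG, parent, MPI_STATUS_IGNORE);
        printf("level = %d\nParent sent: level %d\n", level, from_parent);
    }

    if (level < MAX_LEVEL) {
        MPI_Comm    child[NUM_CHILDREN];
        MPI_Request req[NUM_CHILDREN];
        int         reply[NUM_CHILDREN];
        char        level_arg[16];
        char       *child_argv[] = { level_arg, NULL };

        for (int i = 0; i < NUM_CHILDREN; ++i) {
            /* Set the level value for the child. */
            snprintf(level_arg, sizeof level_arg, "%d", level + 1);

            /* Spawn 1 child; "ch_rec" may need a path on some setups. */
            MPI_Comm_spawn("ch_rec", child_argv, 1, MPI_INFO_NULL, 0,
                           MPI_COMM_SELF, &child[i], MPI_ERRCODES_IGNORE);

            /* Send a message, then post a non-blocking receive for the reply. */
            MPI_Send(&level, 1, MPI_INT, 0, TAG, child[i]);
            MPI_Irecv(&reply[i], 1, MPI_INT, 0, TAG, child[i], &req[i]);
        }

        /* Wait for both replies, one MPI_Waitany call per child. */
        for (int done = 0; done < NUM_CHILDREN; ++done) {
            int idx;
            MPI_Waitany(NUM_CHILDREN, req, &idx, MPI_STATUS_IGNORE);
        }
    }

    /* Report back to the parent and finish. */
    if (parent != MPI_COMM_NULL)
        MPI_Send(&level, 1, MPI_INT, 0, TAG, parent);

    MPI_Finalize();
    return 0;
}
```

In this sketch the two MPI_Comm_spawn calls are issued back to back, which is
the pattern described as failing; a sleep between loop iterations would
correspond to the workaround mentioned in the post.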

With openmpi-1.3.3 the program runs as expected, but with openmpi-1.4 I get
the following error:

$ mpirun -np 1 start
level 0
level = 1
Parent sent: level 0 (pid:4279)
level = 2
Parent sent: level 1 (pid:4281)
[] [[42824,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 758

The error happens when my program tries to launch the second child immediately
after the first spawn call.
In my tests, I put a sleep of 2 seconds between the first and the second
spawn, and then the program runs as expected.

Can someone help me with this version 1.4 bug?