Thank you, Ralph
I will use 1.3.3 for now,
while waiting for a future fix release that breaks this race condition.
Looks to me like it is a race condition, and the timing difference between 1.3.3 and 1.4 is just enough to trip it. I can break the race, but it will have to be in a future fix release. Meantime, your best bet is to either stick with 1.3.3 or add the delay.
On Dec 15, 2009, at 5:51 AM, Marcia Cristina Cera wrote:
<spawn-problem.tar.gz>
Hi,
I intend to develop an application that uses MPI_Comm_spawn to dynamically create new MPI tasks (or processes).
The program is structured like a tree: each node creates 2 new ones until a predefined number of levels is reached.
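To see how quickly the number of processes grows, here is a small C sketch that counts the ch_rec processes in such a tree. The function name tree_size is hypothetical. It also assumes the level numbering shown in the output further down: the root ch_rec is level 1, and a node spawns its 2 children only while its level is below the limit. Each node is created by exactly one MPI_Comm_spawn call, so the result is also the number of spawn calls, including the one made by start.

```c
/* Hypothetical helper: number of ch_rec processes in a subtree rooted at
 * `level`, assuming every node whose level is below `max_level` spawns
 * exactly 2 children and nodes at `max_level` spawn none. */
unsigned long tree_size(int level, int max_level)
{
    if (level >= max_level)
        return 1;                                   /* leaf: spawns nothing */
    return 1 + 2 * tree_size(level + 1, max_level); /* this node + 2 subtrees */
}
```

For example, tree_size(1, 3) is 7, so a limit of 3 already means 7 spawn calls. In general the count is 2^limit - 1.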
I wrote a small program that reproduces my problem; it is in the attachment.
-- start.c: launches the root of the tree (a ch_rec program) through MPI_Comm_spawn, passing the level value in argv. After the spawn, it sends a message to the child and blocks in an MPI_Recv.
-- ch_rec.c: gets its level value and receives the parent's message; then, if its level is less than a predefined limit, it creates 2 children:
- set the level value;
- spawn 1 child;
- send a message;
- call an MPI_Irecv;
- repeat the 4 previous steps for the second child;
- call MPI_Waitany to wait for the children's replies.
When the children's messages have been received, the process sends a message to its parent and calls MPI_Finalize.
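A condensed C sketch of the ch_rec.c steps described above may help readers without the attachment. It is a reconstruction, not the attached code. The message layout (parent level and pid as two ints), the MAX_LEVEL value of 3, the executable name "ch_rec" and the SPAWN_DELAY_SECONDS switch are all assumptions. Setting SPAWN_DELAY_SECONDS to 2 adds a 2-second sleep between the first and the second spawn.

```c
/* ch_rec.c sketch (hypothetical reconstruction of the attached program). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* getpid, sleep */

#define MAX_LEVEL 3            /* assumed depth limit of the tree */
#define SPAWN_DELAY_SECONDS 0  /* assumed switch: 2 = sleep between the two spawns */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* The spawning process passes this node's level as argv[1]. */
    int level = (argc > 1) ? atoi(argv[1]) : 1;
    printf("level = %d\n", level);

    /* Intercommunicator to whoever spawned us (start or another ch_rec). */
    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);

    /* Assumed message layout: {parent level, parent pid}. */
    int msg[2];
    MPI_Recv(msg, 2, MPI_INT, 0, 0, parent, MPI_STATUS_IGNORE);
    printf("Parent sent: level %d (pid:%d)\n", msg[0], msg[1]);

    if (level < MAX_LEVEL) {
        MPI_Comm children[2];
        MPI_Request reqs[2];
        int replies[2];
        int out[2] = { level, (int)getpid() };
        char level_arg[16];
        char *child_argv[] = { level_arg, NULL };

        for (int i = 0; i < 2; i++) {
            /* Optional delay before the second spawn. */
            if (i > 0 && SPAWN_DELAY_SECONDS > 0)
                sleep(SPAWN_DELAY_SECONDS);

            /* Set the level value for the child. */
            snprintf(level_arg, sizeof level_arg, "%d", level + 1);
            /* Spawn 1 child; argv is consumed by the call, so level_arg can be reused. */
            MPI_Comm_spawn("ch_rec", child_argv, 1, MPI_INFO_NULL, 0,
                           MPI_COMM_SELF, &children[i], MPI_ERRCODES_IGNORE);
            /* Send a message to the child (rank 0 of the remote group). */
            MPI_Send(out, 2, MPI_INT, 0, 0, children[i]);
            /* Post a non-blocking receive for the child's reply. */
            MPI_Irecv(&replies[i], 1, MPI_INT, 0, 0, children[i], &reqs[i]);
        }

        /* Wait for both replies, in whichever order they arrive. */
        for (int done = 0; done < 2; done++) {
            int idx;
            MPI_Waitany(2, reqs, &idx, MPI_STATUS_IGNORE);
        }
    }

    /* Report back to the parent, then shut down. */
    int ack = level;
    MPI_Send(&ack, 1, MPI_INT, 0, 0, parent);
    MPI_Finalize();
    return 0;
}
```

It would be built with mpicc -o ch_rec ch_rec.c. It is meant to be spawned by start, which under these assumptions sends {0, pid} and then waits in MPI_Recv for the reply. It is not meant to be launched directly.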
With openmpi-1.3.3 the program runs as expected, but with openmpi-1.4 I get the following error:
$ mpirun -np 1 start
level = 1
Parent sent: level 0 (pid:4279)
level = 2
Parent sent: level 1 (pid:4281)
[xiru-8.portoalegre.grenoble.grid5000.fr:04278] [[42824,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 758
The error happens when my program tries to launch the second child immediately after the first spawn call.
In my tests, when I put a 2-second sleep between the first and the second spawn, the program runs as expected.
Can someone help me with this 1.4 bug?
users mailing list