Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Doing a lot of spawns does not work with ompi 1.3 BUT works with ompi 1.2.7
From: Anthony Thevenin (anthony.thevenin_at_[hidden])
Date: 2009-01-27 10:39:21


Thank you!

Yes, I am trying to do over 1000 MPI_Comm_spawn calls on a single node.
But as I mentioned in my previous email, the MPI_Comm_spawn is in a
do-loop, so on this single node I only ever have 2 procs (master and slave).
The next slave is spawned only after the previous slave has exited.
We (my team and I) are developing a coupler which launches the codes
dynamically. Sometimes, depending on the coupling algorithm, we need to
spawn a code (which can be parallel or not) many times (more than 1000).

Anthony

Ralph Castain wrote:
> Just to be clear - you are doing over 1000 MPI_Comm_spawn calls to
> launch all the procs on a single node???
>
> In the 1.2 series, every call to MPI_Comm_spawn would launch another
> daemon on the node, which would then fork/exec the specified app. If
> you look at your process table, you will see a whole lot of "orted"
> processes. Thus, you wouldn't run out of pipes because every orted
> only opened enough for a single process.
>
> In the 1.3 series, there is only one daemon on each node (mpirun fills
> that function on its node). MPI_Comm_spawn simply reuses that daemon
> to launch the new proc(s). Thus, there is a limit to the number of
> procs you can start on any node that is set by the #pipes a process
> can open.
>
> You can adjust that number, of course. You can look it up readily
> enough for your particular system. However, you may find that 1000
> comm_spawns on a single node will lead to poor performance as the
> procs contend for processor attention.
>
> Hope that helps
> Ralph
>
>
> On Jan 27, 2009, at 7:59 AM, Anthony Thevenin wrote:
>
>> Hello,
>>
>> I have two C codes:
>> - master.c : spawns a slave
>> - slave.c : spawned by the master
>>
>> If the spawn is included in a do-loop, I can do only 123 spawns before
>> getting the following errors:
>>
>> ORTE_ERROR_LOG: The system limit on number of pipes a process can
>> open was reached in file base/iof_base_setup.c at line 112
>> ORTE_ERROR_LOG: The system limit on number of pipes a process can
>> open was reached in file odls_default_module.c at line 203
>>
>> This test works perfectly even for a lot of spawns (more than 1000)
>> with Open-MPI 1.2.7.
>>
>> You will find the following files attached:
>> config.log.tgz
>> ompi_info.out.tgz
>> ifconfig.out.tgz
>> master.c.tgz
>> slave.c.tgz
>>
>>
>> command used to run my application :
>> mpirun -n 1 ./master
>>
>> COMPILER:
>> PGI 7.1
>>
>> PATH :
>> /space/thevenin/openmpi-1.3_pgi/bin:/usr/local/tecplot/bin:/usr/local/pgi/linux86-64/7.1/bin:/usr/totalview/bin:/usr/local/matlab71/bin:/usr/bin:/usr/ucb:/usr/sbin:/usr/bsd:/sbin:/bin:/usr/bin/X11:/usr/etc:/usr/local/bin:/usr/bin:/usr/bsd:/sbin:/usr/bin/X11:.
>>
>>
>> LD_LIBRARY_PATH:
>> /space/thevenin/openmpi-1.3_pgi/lib:/usr/local/lib
>>
>>
>> If you have any idea why this occurs, please tell me what to do
>> to make it work.
>> Thank you very much
>>
>>
>> Anthony
>>
>>
>>
>> <config.log.tgz><ifconfig.out.tgz><master.c.tgz><ompi_info.out.tgz><slave.c.tgz>
>> _______________________________________________
>>
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>