Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-10-06 09:26:29


Hi Roberto

My time is somewhat limited, so I couldn't review the code in detail.
However, I think I got the gist of it.

A few observations:

1. The code is rather inefficient if all you want to do is spawn a
pattern of slave processes based on a file. Unless there is some
overriding reason for doing this one comm_spawn at a time, it would be
far faster to issue a single comm_spawn and just provide the hostfile
to us (a sketch of what I mean follows this point). You could use
either the seq or rank_file mapper - both would take the file and
produce the outcome you seek. The only difference is that the child
procs would all be in the same comm_world - I don't know whether that
is an issue for you or not.
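
For illustration, something along these lines is what I have in mind.
This is only a sketch, not a drop-in replacement for your testmaster,
and the "add-hostfile" info key name is my assumption about the Open
MPI extension to use here - check the MPI_Comm_spawn man page of the
version you have installed for the exact key names:

/* Sketch only: spawn all slaves with a single MPI_Comm_spawn call and
 * let Open MPI map them from a hostfile. The "add-hostfile" info key
 * is assumed here - verify it against your MPI_Comm_spawn(3) man page. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm children;
    MPI_Info info;
    int nslaves = 4;        /* one slave per node listed in the hostfile */

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    /* hypothetical file name - use the hostfile you already build from PBS_NODEFILE */
    MPI_Info_set(info, "add-hostfile", "slaves.hostfile");

    /* one spawn: all child procs end up in the same comm_world */
    MPI_Comm_spawn("testslave.sh", MPI_ARGV_NULL, nslaves, info,
                   0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

    /* ... hand work out over the intercommunicator ... */

    MPI_Comm_disconnect(&children);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}

The point is that a single call launches the whole set of slaves at
once, instead of paying the launch overhead once per process.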

2. OMPI definitely cannot handle the threaded version of this code at
this time - not sure when we will get to it.

3. If you serialize the code, we -should- be able to handle it.
However, I'm not entirely sure your current method actually does that.
It looks like you call comm_spawn and then create a new thread, which
then calls comm_spawn itself. I can't quite see how the thread locking
would prevent multiple threads from calling comm_spawn concurrently -
you might want to check it again and make sure it is correct (a sketch
of the kind of serialization I mean follows below). Frankly, I'm not
entirely sure what the thread creation is gaining you - as I said, we
can only call comm_spawn serially, so having multiple threads would
seem to be unnecessary...unless this code is incomplete and you need
the threads for some other purpose.
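
By "serialize" I mean something along these lines - just a sketch, with
a hypothetical spawn_on_host() helper standing in for whatever your
threads call; the one thing that matters is that a single global mutex
is held across the entire MPI_Comm_spawn call, so no two threads are
ever inside it at the same time:

/* Sketch only: serialize every MPI_Comm_spawn behind one global mutex.
 * spawn_on_host() is a hypothetical helper, not code from the attached
 * test program. The "host" info key is an Open MPI extension (assumed). */
#include <mpi.h>
#include <pthread.h>

static pthread_mutex_t spawn_lock = PTHREAD_MUTEX_INITIALIZER;

static MPI_Comm spawn_on_host(const char *host)
{
    MPI_Comm child;
    MPI_Info info;

    MPI_Info_create(&info);
    MPI_Info_set(info, "host", (char *)host);

    pthread_mutex_lock(&spawn_lock);   /* only one spawn in flight at a time */
    MPI_Comm_spawn("testslave.sh", MPI_ARGV_NULL, 1, info,
                   0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
    pthread_mutex_unlock(&spawn_lock);

    MPI_Info_free(&info);
    return child;
}

Every thread that wants a slave goes through spawn_on_host(), so the
spawn calls reach the library strictly one after another.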

Again, you might look at the loop_spawn code I mentioned before to see
a working example. Alternatively, if your code works under HP MPI, you
may want to stick with that for now until we get our threading support
up to the level you require.
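
For reference, the heart of that test is essentially a loop like the
one below. This is a paraphrase of the idea, not the actual
orte/test/mpi source, so look at loop_spawn.c and loop_child.c in your
checkout for the real thing:

/* Paraphrased sketch of the loop_spawn idea: repeatedly spawn a single
 * child, then disconnect from it. The child side (loop_child) calls
 * MPI_Comm_get_parent() and the matching MPI_Comm_disconnect(). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm child;
    int i, iters = 1000;

    MPI_Init(&argc, &argv);

    for (i = 0; i < iters; i++) {
        MPI_Comm_spawn("loop_child", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&child);   /* collective - syncs with the child */
        printf("parent: iteration %d done\n", i);
    }

    MPI_Finalize();
    return 0;
}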

Hope that helps
Ralph

On Oct 3, 2008, at 10:36 AM, Roberto Fichera wrote:

> Ralph Castain wrote:
>> Interesting. I ran a loop calling comm_spawn 1000 times without a
>> problem. I suspect it is the threading that is causing the trouble
>> here.
> I think so! My guess is that at a low level there is some trouble
> when handling *concurrent* orted spawning. Maybe
>> You are welcome to send me the code. You can find my loop code in
>> your
>> code distribution under orte/test/mpi - look for loop_spawn and
>> loop_child.
> In the attached code the spawning logic is currently inside a loop in
> the main of the testmaster, so it's completely unthreaded, at least
> until MPI_Comm_spawn() finishes its work. If you would like to test
> multithreaded spawning, you can comment out the NodeThread_spawnSlave()
> call in the main loop and uncomment the same function in
> NodeThread_threadMain(). Finally, if you want multithreaded spawning
> but serialized against a mutex, then uncomment the
> pthread_mutex_lock/unlock() calls in NodeThread_threadMain().
>
> This code runs *without* any trouble under the HP MPI implementation.
> It does not work as well with the MPICH2 trunk version, due to two
> problems: a limit of ~24.4K context ids, and/or a race in poll() while
> waiting for termination in MPI_Comm_disconnect() running concurrently
> with an MPI_Comm_spawn().
>
>>
>> Ralph
>>
>> On Oct 3, 2008, at 9:11 AM, Roberto Fichera wrote:
>>
>>> Ralph Castain wrote:
>>>>
>>>> On Oct 3, 2008, at 7:14 AM, Roberto Fichera wrote:
>>>>
>>>>> Ralph Castain wrote:
>>>>>> I committed something to the trunk yesterday. Given the
>>>>>> complexity of
>>>>>> the fix, I don't plan to bring it over to the 1.3 branch until
>>>>>> sometime mid-to-end next week so it can be adequately tested.
>>>>> Ok! So that means I can check out the SVN trunk to get your fix,
>>>>> right?
>>>>
>>>> Yes, though note that I don't claim it is fully correct yet. Still
>>>> needs testing. However, I have tested it a fair amount and it seems
>>>> okay.
>>>>
>>>> If you do test it, please let me know how it goes.
>>> I executed my test on the svn/trunk version below:
>>>
>>> Open MPI: 1.4a1r19677
>>> Open MPI SVN revision: r19677
>>> Open MPI release date: Unreleased developer copy
>>> Open RTE: 1.4a1r19677
>>> Open RTE SVN revision: r19677
>>> Open RTE release date: Unreleased developer copy
>>> OPAL: 1.4a1r19677
>>> OPAL SVN revision: r19677
>>> OPAL release date: Unreleased developer copy
>>> Ident string: 1.4a1r19677
>>>
>>> Below is the output, which seems to freeze just after the second
>>> spawn.
>>>
>>> [roberto_at_master TestOpenMPI]$ mpirun --verbose --debug-daemons
>>> --hostfile $PBS_NODEFILE -wdir "`pwd`" -np 1 testmaster 100000
>>> $PBS_NODEFILE
>>> [master.tekno-soft.it:30063] [[19516,0],0] orted_cmd: received
>>> add_local_procs
>>> [master.tekno-soft.it:30063] [[19516,0],0] node[0].name master
>>> daemon 0
>>> arch ffc91200
>>> [master.tekno-soft.it:30063] [[19516,0],0] node[1].name cluster4
>>> daemon
>>> INVALID arch ffc91200
>>> [master.tekno-soft.it:30063] [[19516,0],0] node[2].name cluster3
>>> daemon
>>> INVALID arch ffc91200
>>> [master.tekno-soft.it:30063] [[19516,0],0] node[3].name cluster2
>>> daemon
>>> INVALID arch ffc91200
>>> [master.tekno-soft.it:30063] [[19516,0],0] node[4].name cluster1
>>> daemon
>>> INVALID arch ffc91200
>>> Initializing MPI ...
>>> [master.tekno-soft.it:30063] [[19516,0],0] orted_recv: received
>>> sync+nidmap from local proc [[19516,1],0]
>>> [master.tekno-soft.it:30063] [[19516,0],0] orted_cmd: received
>>> collective data cmd
>>> [master.tekno-soft.it:30063] [[19516,0],0] orted_cmd: received
>>> message_local_procs
>>> [master.tekno-soft.it:30063] [[19516,0],0] orted_cmd: received
>>> collective data cmd
>>> [master.tekno-soft.it:30063] [[19516,0],0] orted_cmd: received
>>> message_local_procs
>>> Loading the node's ring from file
>>> '/var/torque/aux//932.master.tekno-soft.it'
>>> ... adding node #1 host is 'cluster4.tekno-soft.it'
>>> ... adding node #2 host is 'cluster3.tekno-soft.it'
>>> ... adding node #3 host is 'cluster2.tekno-soft.it'
>>> ... adding node #4 host is 'cluster1.tekno-soft.it'
>>> A 4 node's ring has been made
>>> At least one node is available, let's start to distribute 100000 job
>>> across 4 nodes!!!
>>> Setting up the host as 'cluster4.tekno-soft.it'
>>> Setting the work directory as '/data/roberto/MPI/TestOpenMPI'
>>> Spawning a task 'testslave.sh' on node 'cluster4.tekno-soft.it'
>>> Daemon was launched on cluster4.tekno-soft.it - beginning to
>>> initialize
>>> Daemon [[19516,0],1] checking in as pid 25123 on host
>>> cluster4.tekno-soft.it
>>> Daemon [[19516,0],1] not using static ports
>>> [cluster4.tekno-soft.it:25123] [[19516,0],1] orted: up and running -
>>> waiting for commands!
>>> [master.tekno-soft.it:30063] [[19516,0],0] orted_cmd: received
>>> add_local_procs
>>> [master.tekno-soft.it:30063] [[19516,0],0] node[0].name master
>>> daemon 0
>>> arch ffc91200
>>> [master.tekno-soft.it:30063] [[19516,0],0] node[1].name cluster4
>>> daemon
>>> 1 arch ffc91200
>>> [master.tekno-soft.it:30063] [[19516,0],0] node[2].name cluster3
>>> daemon
>>> INVALID arch ffc91200
>>> [master.tekno-soft.it:30063] [[19516,0],0] node[3].name cluster2
>>> daemon
>>> INVALID arch ffc91200
>>> [master.tekno-soft.it:30063] [[19516,0],0] node[4].name cluster1
>>> daemon
>>> INVALID arch ffc91200
>>> [cluster4.tekno-soft.it:25123] [[19516,0],1] orted_cmd: received
>>> add_local_procs
>>> [cluster4.tekno-soft.it:25123] [[19516,0],1] node[0].name master
>>> daemon
>>> 0 arch ffc91200
>>> [cluster4.tekno-soft.it:25123] [[19516,0],1] node[1].name cluster4
>>> daemon 1 arch ffc91200
>>> [cluster4.tekno-soft.it:25123] [[19516,0],1] node[2].name cluster3
>>> daemon INVALID arch ffc91200
>>> [cluster4.tekno-soft.it:25123] [[19516,0],1] node[3].name cluster2
>>> daemon INVALID arch ffc91200
>>> [cluster4.tekno-soft.it:25123] [[19516,0],1] node[4].name cluster1
>>> daemon INVALID arch ffc91200
>>> [cluster4.tekno-soft.it:25123] [[19516,0],1] orted_recv: received
>>> sync+nidmap from local proc [[19516,2],0]
>>> [master.tekno-soft.it:30063] [[19516,0],0] orted_cmd: received
>>> collective data cmd
>>> [master.tekno-soft.it:30063] [[19516,0],0] orted_cmd: received
>>> message_local_procs
>>> [cluster4.tekno-soft.it:25123] [[19516,0],1] orted_cmd: received
>>> collective data cmd
>>> [cluster4.tekno-soft.it:25123] [[19516,0],1] orted_cmd: received
>>> message_local_procs
>>> [cluster4.tekno-soft.it:25123] [[19516,0],1] orted_cmd: received
>>> collective data cmd
>>> [master.tekno-soft.it:30063] [[19516,0],0] orted_cmd: received
>>> collective data cmd
>>> [master.tekno-soft.it:30063] [[19516,0],0] orted_cmd: received
>>> message_local_procs
>>> [cluster4.tekno-soft.it:25123] [[19516,0],1] orted_cmd: received
>>> message_local_procs
>>>
>>> Let me know if you need my test program.
>>>
>>>>
>>>> Thanks
>>>> Ralph
>>>>
>>>>>
>>>>>> Ralph
>>>>>>
>>>>>> On Oct 3, 2008, at 5:02 AM, Roberto Fichera wrote:
>>>>>>
>>>>>>> Ralph Castain wrote:
>>>>>>>> Actually, it just occurred to me that you may be seeing a
>>>>>>>> problem in
>>>>>>>> comm_spawn itself that I am currently chasing down. It is in
>>>>>>>> the
>>>>>>>> 1.3
>>>>>>>> branch and has to do with comm_spawning procs on subsets of
>>>>>>>> nodes
>>>>>>>> (instead of across all nodes). Could be related to this - you
>>>>>>>> might
>>>>>>>> want to give me a chance to complete the fix. I have
>>>>>>>> identified the
>>>>>>>> problem and should have it fixed later today in our trunk -
>>>>>>>> probably
>>>>>>>> won't move to the 1.3 branch for several days.
>>>>>>> Do you have any news about the above fix? Is the fix already
>>>>>>> available for testing?
>>>>>
>
> <testspawn.tar.bz2>