Subject: Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-07-30 12:05:08


Just to be clear: you do not require a daemon on every node. You just
need one daemon - sitting somewhere - that can act as the data server
for MPI_Publish_name/MPI_Lookup_name. You then tell each app where to find it.

Normally, mpirun fills that function. But if you don't have one, you
can kick off a persistent orted (perhaps just have the parent
application fork/exec it) for that purpose.
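
For reference, the publish/lookup pattern itself is just a few calls. A
rough, untested sketch (the service name "my_service" is a placeholder,
and both sides must be pointed at the same data server):

    /* server side (e.g. your parent application), after MPI_Init */
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;
    MPI_Open_port(MPI_INFO_NULL, port);                   /* get a connectable port string */
    MPI_Publish_name("my_service", MPI_INFO_NULL, port);  /* register it with the data server */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);

    /* client side (e.g. each child), after MPI_Init */
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm server;
    MPI_Lookup_name("my_service", MPI_INFO_NULL, port);   /* ask the data server for the port */
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);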

On Jul 30, 2008, at 9:50 AM, Robert Kubrick wrote:

>
> On Jul 30, 2008, at 11:12 AM, Mark Borgerding wrote:
>
>> I appreciate the suggestion about running a daemon on each of the
>> remote nodes, but wouldn't I kind of be reinventing the wheel
>> there? Process management is one of the things I'd like to be able
>> to count on ORTE for.
>> Would the following work to give the parent process an intercomm
>> with each child?
>>
>> parent (i.e. my non-mpirun-started process) calls MPI_Init, then
>> MPI_Open_port
>> parent spawns the mpirun command via system/exec to create the remote
>> children. The name from MPI_Open_port is placed in the environment.
>> parent calls MPI_Comm_accept (once for each child?)
>
> I think you have to create a separate thread to run the accept in
> order to accept multiple client connections. Open MPI should support
> this, since handling multiple client connections was part of the
> original design of the API. There is a multi-threaded MPI_Comm_accept
> example in the book Using MPI-2.
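
A rough sketch of that parent side (untested; NCHILDREN, the SPAWNER_PORT
variable name, "./child", and forwarding the variable with mpirun's -x
option are all assumptions - a separate thread is only needed if the
parent has other work to do while accepting):

    /* parent side, after MPI_Init */
    #define NCHILDREN 4                      /* hypothetical child count */
    char port[MPI_MAX_PORT_NAME];
    char cmd[1024];
    MPI_Comm child[NCHILDREN];
    int i;

    MPI_Open_port(MPI_INFO_NULL, port);
    setenv("SPAWNER_PORT", port, 1);         /* expose the port name to the children */
    snprintf(cmd, sizeof(cmd),
             "mpirun -x SPAWNER_PORT -np %d ./child &", NCHILDREN);
    system(cmd);                             /* launch the children out of band */

    for (i = 0; i < NCHILDREN; i++)          /* one accept per connecting child */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &child[i]);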
>
>> all children call MPI_Comm_connect with that name
>>
>> I think this would give one intercommunicator back to the parent
>> for each remote process (not ideal, but I can worry about broadcast
>> data later).
>> The remote processes can communicate with each other through
>> MPI_COMM_WORLD.
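
The matching child side might look something like this (untested;
SPAWNER_PORT is the same hypothetical variable as in the sketch above):

    /* child side, after MPI_Init */
    char *port = getenv("SPAWNER_PORT");     /* port name exported by the parent */
    MPI_Comm parent;
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &parent);
    /* the children can still talk among themselves over MPI_COMM_WORLD */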
>
> You should be able to merge each child communicator from each accept
> thread into a global comm anyway.
>
>>
>>
>> Actually, when I think through the details, much of this is pretty
>> similar to the daemon-based MPI_Publish_name/MPI_Lookup_name approach.
>> The main difference is which processes come first.
>
> You can run a daemon through system/exec the same way you run
> mpiexec. Just use ssh or rsh in the system/exec call.
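
For example (the host name and daemon path are just placeholders):

    /* start a persistent daemon on a remote node without mpirun */
    system("ssh node01 /path/to/my_daemon &");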
>
>>
>>
>>
>>
>> Mark Borgerding wrote:
>>> I'm afraid I can't dictate to the customer that they must upgrade.
>>> The target platform is RHEL 5.2 (which uses Open MPI 1.2.6).
>>>
>>> I will try to find some sort of workaround. Any suggestions on how
>>> to "fake" the functionality of MPI_Comm_spawn are welcome.
>>>
>>> To reiterate my needs:
>>> I am writing a shared object that plugs into an existing framework.
>>> I do not control how the framework launches its processes (no
>>> mpirun).
>>> I want to start remote processes to crunch the data.
>>> The shared object marshals the I/O between the framework and the
>>> remote processes.
>>>
>>> -- Mark
>>>
>>>
>>> Ralph Castain wrote:
>>>> Singleton comm_spawn works fine on the 1.3 release branch - if
>>>> singleton comm_spawn is critical to your plans, I suggest moving
>>>> to that version. You can get a pre-release version from the www.open-mpi.org
>>>> web site.
>>>>
>>>>
>>>> On Jul 30, 2008, at 6:58 AM, Ralph Castain wrote:
>>>>
>>>>> As your own tests have shown, it works fine if you just "mpirun -
>>>>> n 1 ./spawner". It is only singleton comm_spawn that appears to
>>>>> be having a problem in the latest 1.2 release. So I don't think
>>>>> comm_spawn is "useless". ;-)
>>>>>
>>>>> I'm checking this morning to ensure that singletons properly
>>>>> spawn on other nodes in the 1.3 release. I sincerely doubt we
>>>>> will backport a fix to 1.2.
>>>>>
>>>>>
>>>>> On Jul 30, 2008, at 6:49 AM, Mark Borgerding wrote:
>>>>>
>>>>>> I keep checking my email in hopes that someone will come up
>>>>>> with something that Matt or I might've missed.
>>>>>> I'm just having a hard time accepting that something so
>>>>>> fundamental would be so broken.
>>>>>> The MPI_Comm_spawn command is essentially useless without the
>>>>>> ability to spawn processes on other nodes.
>>>>>>
>>>>>> If this is true, then my personal scorecard reads:
>>>>>> # Days spent using Open MPI: 4 (off and on)
>>>>>> # Identified bugs in Open MPI: 2
>>>>>> # useful programs built: 0
>>>>>>
>>>>>> Please prove me wrong. I'm eager to be shown my ignorance --
>>>>>> to find out where I've been stupid and what documentation I
>>>>>> should've read.
>>>>>>
>>>>>>
>>>>>> Matt Hughes wrote:
>>>>>>> I've found that I always have to use mpirun to start my spawner
>>>>>>> process, due to the exact problem you are having: the need to
>>>>>>> give
>>>>>>> OMPI a hosts file! It seems the singleton functionality is
>>>>>>> lacking
>>>>>>> somehow... it won't allow you to spawn on arbitrary hosts. I
>>>>>>> have not
>>>>>>> tested if this is fixed in the 1.3 series.
>>>>>>>
>>>>>>> Try
>>>>>>> mpiexec -np 1 -H op2-1,op2-2 spawner op2-2
>>>>>>>
>>>>>>> mpiexec should start the first process on op2-1, and the spawn
>>>>>>> call should start the second on op2-2. If you don't use the Info
>>>>>>> object to set the hostname specifically, then on 1.2.x it will
>>>>>>> automatically start on op2-2. With 1.3, the spawn call will place
>>>>>>> processes starting with the first item in the host list.
>>>>>>>
>>>>>>> mch
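
For reference, setting the target host through the Info object looks
roughly like this (untested sketch; the program name "./child" and the
host are placeholders):

    MPI_Info info;
    MPI_Comm intercomm;
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "op2-2");     /* ask for the child to be placed on op2-2 */
    MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, info, 0,
                   MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
    MPI_Info_free(&info);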
>>>>>>
>>>>>> [snip]