Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-07-30 10:36:05


The problem would be finding a way to tell all the MPI apps how to
contact each other, since the intercomm procedure needs that info to
complete. I don't recall if the MPI_Publish_name/MPI_Lookup_name
functions worked in 1.2 - I'm building the code now to see.

If it does, then you could use it to get the required contact info and
wire up the intercomm... that's a lot of what goes on under the
comm_spawn covers anyway. The only difference is the need for the server...
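
Roughly, the handshake could look like the sketch below (untested, and
assuming the name service is functional in your build; "spawn_svc" is
just a placeholder service name):

/* server side: open a port, publish it, and accept a connection */
#include <mpi.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("spawn_svc", MPI_INFO_NULL, port);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
    /* ... exchange data over the intercomm ... */
    MPI_Unpublish_name("spawn_svc", MPI_INFO_NULL, port);
    MPI_Close_port(port);
    MPI_Finalize();
    return 0;
}

The client side then looks up the published port and connects, which
yields the matching intercommunicator on its end:

    char port[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm;
    MPI_Lookup_name("spawn_svc", MPI_INFO_NULL, port);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);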

On Jul 30, 2008, at 8:24 AM, Robert Kubrick wrote:

> Mark, if you can run a server process on the remote machine, you
> could send a request from your local MPI app to your server, then
> use an intercommunicator to link the local process to the new remote
> process.
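>
> Something like this (untested sketch; send_port/recv_port stand in
> for whatever socket transport you already have between the two
> sides, so no name service is needed):
>
> /* remote server: open an MPI port, hand the string to the client */
>     char port[MPI_MAX_PORT_NAME];
>     MPI_Comm inter;
>     MPI_Open_port(MPI_INFO_NULL, port);
>     send_port(port);                     /* your own transport */
>     MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
>
> /* local MPI app: receive the port string and connect */
>     char port[MPI_MAX_PORT_NAME];
>     MPI_Comm inter;
>     recv_port(port);                     /* your own transport */
>     MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);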
>
> On Jul 30, 2008, at 9:55 AM, Mark Borgerding wrote:
>
>> I'm afraid I can't dictate to the customer that they must upgrade.
>> The target platform is RHEL 5.2 ( uses openmpi 1.2.6 )
>>
>> I will try to find some sort of workaround. Any suggestions on how
>> to "fake" the functionality of MPI_Comm_spawn are welcome.
>>
>> To reiterate my needs:
>> I am writing a shared object that plugs into an existing framework.
>> I do not control how the framework launches its processes (no
>> mpirun).
>> I want to start remote processes to crunch the data.
>> The shared object marshals the I/O between the framework and the
>> remote processes.
>>
>> -- Mark
>>
>>
>> Ralph Castain wrote:
>>> Singleton comm_spawn works fine on the 1.3 release branch - if
>>> singleton comm_spawn is critical to your plans, I suggest moving
>>> to that version. You can get a pre-release version from the
>>> www.open-mpi.org web site.
>>>
>>>
>>> On Jul 30, 2008, at 6:58 AM, Ralph Castain wrote:
>>>
>>>> As your own tests have shown, it works fine if you just
>>>> "mpirun -n 1 ./spawner". It is only singleton comm_spawn that
>>>> appears to be having a problem in the latest 1.2 release. So I
>>>> don't think comm_spawn is "useless". ;-)
>>>>
>>>> I'm checking this morning to ensure that singletons properly
>>>> spawn on other nodes in the 1.3 release. I sincerely doubt we
>>>> will backport a fix to 1.2.
>>>>
>>>>
>>>> On Jul 30, 2008, at 6:49 AM, Mark Borgerding wrote:
>>>>
>>>>> I keep checking my email in hopes that someone will come up with
>>>>> something that Matt or I might've missed.
>>>>> I'm just having a hard time accepting that something so
>>>>> fundamental would be so broken.
>>>>> The MPI_Comm_spawn command is essentially useless without the
>>>>> ability to spawn processes on other nodes.
>>>>>
>>>>> If this is true, then my personal scorecard reads:
>>>>> # Days spent using openmpi: 4 (off and on)
>>>>> # identified bugs in openmpi: 2
>>>>> # useful programs built: 0
>>>>>
>>>>> Please prove me wrong. I'm eager to be shown my ignorance -- to
>>>>> find out where I've been stupid and what documentation I
>>>>> should've read.
>>>>>
>>>>>
>>>>> Matt Hughes wrote:
>>>>>> I've found that I always have to use mpirun to start my spawner
>>>>>> process, due to the exact problem you are having: the need to
>>>>>> give OMPI a hosts file! It seems the singleton functionality is
>>>>>> lacking somehow... it won't allow you to spawn on arbitrary
>>>>>> hosts. I have not tested if this is fixed in the 1.3 series.
>>>>>>
>>>>>> Try
>>>>>> mpiexec -np 1 -H op2-1,op2-2 spawner op2-2
>>>>>>
>>>>>> mpiexec should start the first process on op2-1, and the spawn
>>>>>> call should start the second on op2-2. If you don't use the Info
>>>>>> object to set the hostname specifically, then on 1.2.x it will
>>>>>> automatically start on op2-2. With 1.3, the spawn call will
>>>>>> place processes beginning with the first item in the host list.
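>>>>>>
>>>>>> Setting the host through the Info object looks roughly like this
>>>>>> (a sketch; "worker" and the host name are placeholders):
>>>>>>
>>>>>>     MPI_Info info;
>>>>>>     MPI_Comm intercomm;
>>>>>>     MPI_Info_create(&info);
>>>>>>     MPI_Info_set(info, "host", "op2-2");  /* request spawn host */
>>>>>>     MPI_Comm_spawn("worker", MPI_ARGV_NULL, 1, info, 0,
>>>>>>                    MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
>>>>>>     MPI_Info_free(&info);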
>>>>>>
>>>>>> mch
>>>>>
>>>>> [snip]