Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-07-30 10:49:37


Okay, I tested it and MPI_Publish_name and MPI_Lookup_name work on
1.2.6, so this may provide an avenue (albeit cumbersome) for you to
get this to work. It may require a server, though, to make it work -
your first MPI proc may be able to play that role if you pass its
contact info to the others, but I'd have to play with it for a while
to be sure. Haven't really tried that before.
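For concreteness, a rough sketch of what that publish/lookup approach
could look like in code - the service name "mark_data_service" is just
a placeholder, and the role split into "server"/"client" here is only
for illustration:

    /* name_demo.c - minimal sketch of the publish/lookup idea.
     * Run one copy with the argument "server" and one with "client". */
    #include <mpi.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm other;

        MPI_Init(&argc, &argv);

        if (argc > 1 && strcmp(argv[1], "server") == 0) {
            /* Open a port, publish it under a well-known name, and
             * wait for the other side to connect. */
            MPI_Open_port(MPI_INFO_NULL, port);
            MPI_Publish_name("mark_data_service", MPI_INFO_NULL, port);
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &other);
            MPI_Unpublish_name("mark_data_service", MPI_INFO_NULL, port);
            MPI_Close_port(port);
        } else {
            /* Look the port up by name and connect to it. */
            MPI_Lookup_name("mark_data_service", MPI_INFO_NULL, port);
            MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &other);
        }

        /* "other" is now an intercommunicator linking the two sides. */
        MPI_Comm_disconnect(&other);
        MPI_Finalize();
        return 0;
    }

The catch, as noted above, is what backs MPI_Lookup_name when the jobs
are started independently - that is where the separate server process
would come in.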

Otherwise, even if we devised a fix for the singleton comm_spawn in
1.2, it would still require an upgrade by the customer as it wouldn't
be in 1.2.6 - best that could happen is for it to appear in 1.2.7,
assuming we created the fix for that impending release (far from
certain).

So if this doesn't work, and the customer cannot or will not upgrade
from 1.2.6, I fear you probably cannot do this with OMPI under the
constraints you describe.

On Jul 30, 2008, at 8:36 AM, Ralph Castain wrote:

> The problem would be finding a way to tell all the MPI apps how to
> contact each other, as the Intercomm procedure needs that info to
> complete. I don't recall if the MPI_Publish_name/MPI_Lookup_name
> functions worked in 1.2 - I'm building the code now to see.
>
> If it does, then you could use it to get the required contact info
> and wire up the Intercomm...it's a lot of what goes on under the
> comm_spawn covers anyway. Only diff is the necessity for the server...
>
> On Jul 30, 2008, at 8:24 AM, Robert Kubrick wrote:
>
>> Mark, if you can run a server process on the remote machine, you
>> could send a request from your local MPI app to your server, then
>> use an Intercomm to link the local process to the new remote process?
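Once the local process and the new remote process are linked, pushing
work across the resulting intercommunicator is a small amount of code -
a fragment along these lines, where the buffer size and tag are made up
for illustration and rank 0 refers to rank 0 of the remote group:

    #include <mpi.h>

    /* Exchange one batch of work over an intercommunicator obtained
     * from MPI_Comm_connect() or MPI_Comm_spawn().  The destination
     * rank names a rank in the remote group, not the local one. */
    static void run_batch(MPI_Comm remote)
    {
        double work[1024] = {0.0}, result[1024];

        MPI_Send(work,   1024, MPI_DOUBLE, 0, 0, remote);
        MPI_Recv(result, 1024, MPI_DOUBLE, 0, 0, remote, MPI_STATUS_IGNORE);
    }

If a single flat communicator is easier to work with, MPI_Intercomm_merge
can fold both groups into one intracommunicator.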
>>
>> On Jul 30, 2008, at 9:55 AM, Mark Borgerding wrote:
>>
>>> I'm afraid I can't dictate to the customer that they must upgrade.
>>> The target platform is RHEL 5.2 (uses openmpi 1.2.6).
>>>
>>> I will try to find some sort of workaround. Any suggestions on how
>>> to "fake" the functionality of MPI_Comm_spawn are welcome.
>>>
>>> To reiterate my needs:
>>> I am writing a shared object that plugs into an existing framework.
>>> I do not control how the framework launches its processes (no
>>> mpirun).
>>> I want to start remote processes to crunch the data.
>>> The shared object marshals the I/O between the framework and the
>>> remote processes.
>>>
>>> -- Mark
>>>
>>>
>>> Ralph Castain wrote:
>>>> Singleton comm_spawn works fine on the 1.3 release branch - if
>>>> singleton comm_spawn is critical to your plans, I suggest moving
>>>> to that version. You can get a pre-release version off of the www.open-mpi.org
>>>> web site.
>>>>
>>>>
>>>> On Jul 30, 2008, at 6:58 AM, Ralph Castain wrote:
>>>>
>>>>> As your own tests have shown, it works fine if you just "mpirun -
>>>>> n 1 ./spawner". It is only singleton comm_spawn that appears to
>>>>> be having a problem in the latest 1.2 release. So I don't think
>>>>> comm_spawn is "useless". ;-)
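For anyone following the thread, the kind of spawner that works when
launched as "mpirun -n 1 ./spawner" is roughly the following - the
worker binary name "./worker" and the count of 2 are just placeholders:

    /* spawner.c - launch two copies of ./worker and get back an
     * intercommunicator to them. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm workers;

        MPI_Init(&argc, &argv);
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &workers, MPI_ERRCODES_IGNORE);
        /* ... send work to the children over "workers" ... */
        MPI_Comm_disconnect(&workers);
        MPI_Finalize();
        return 0;
    }

    /* worker.c - each spawned process finds its parent with
     * MPI_Comm_get_parent(). */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm parent;

        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&parent);
        /* ... receive work from the spawner over "parent" ... */
        MPI_Comm_disconnect(&parent);
        MPI_Finalize();
        return 0;
    }

Started under mpirun it behaves as described above; it is only running
./spawner directly (the singleton case) that shows the problem on 1.2.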
>>>>>
>>>>> I'm checking this morning to ensure that singletons properly
>>>>> spawn on other nodes in the 1.3 release. I sincerely doubt we
>>>>> will backport a fix to 1.2.
>>>>>
>>>>>
>>>>> On Jul 30, 2008, at 6:49 AM, Mark Borgerding wrote:
>>>>>
>>>>>> I keep checking my email in hopes that someone will come up
>>>>>> with something that Matt or I might've missed.
>>>>>> I'm just having a hard time accepting that something so
>>>>>> fundamental would be so broken.
>>>>>> The MPI_Comm_spawn call is essentially useless without the
>>>>>> ability to spawn processes on other nodes.
>>>>>>
>>>>>> If this is true, then my personal scorecard reads:
>>>>>> # Days spent using openmpi: 4 (off and on)
>>>>>> # identified bugs in openmpi: 2
>>>>>> # useful programs built: 0
>>>>>>
>>>>>> Please prove me wrong. I'm eager to be shown my ignorance --
>>>>>> to find out where I've been stupid and what documentation I
>>>>>> should've read.
>>>>>>
>>>>>>
>>>>>> Matt Hughes wrote:
>>>>>>> I've found that I always have to use mpirun to start my spawner
>>>>>>> process, due to the exact problem you are having: the need to give
>>>>>>> OMPI a hosts file! It seems the singleton functionality is lacking
>>>>>>> somehow... it won't allow you to spawn on arbitrary hosts. I have
>>>>>>> not tested if this is fixed in the 1.3 series.
>>>>>>>
>>>>>>> Try
>>>>>>> mpiexec -np 1 -H op2-1,op2-2 spawner op2-2
>>>>>>>
>>>>>>> mpiexec should start the first process on op2-1, and the spawn call
>>>>>>> should start the second on op2-2. If you don't use the Info object
>>>>>>> to set the hostname specifically, then on 1.2.x it will automatically
>>>>>>> start on op2-2. With 1.3, the spawn call will start processes
>>>>>>> starting with the first item in the host list.
>>>>>>>
>>>>>>> mch
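In code, asking the spawn call to place the child on a specific host
the way Matt describes looks roughly like this - reusing his host name
op2-2, with "./worker" again standing in for the real binary:

    MPI_Info info;
    MPI_Comm workers;

    /* Pin the spawned process to a particular host via the Info object. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "op2-2");
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, info,
                   0, MPI_COMM_SELF, &workers, MPI_ERRCODES_IGNORE);
    MPI_Info_free(&info);

Without the "host" key, per Matt's description, 1.2.x simply picks the
next host from the -H list on its own.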
>>>>>>
>>>>>> [snip]