Open MPI Development Mailing List Archives

From: David Daniel (ddd_at_[hidden])
Date: 2007-03-22 20:33:51


OK. This sounds sensible.

Thanks, David

On Mar 22, 2007, at 10:38 AM, Ralph Castain wrote:

> We had a nice chat about this on the OpenRTE telecon this morning. The
> question of what to do with multiple prefixes has been a long-running
> issue, most recently captured in Trac bug report #497. The problem is
> that prefix is intended to tell us where to find the ORTE/OMPI
> executables, and therefore is associated with a node - not an
> app_context. What we haven't been able to define is an appropriate
> notation that a user can exploit to tell us the association.
>
> This issue has arisen on several occasions where either (a) users have
> heterogeneous clusters with a common file system, so the prefix must be
> adjusted on each *type* of node to point to the correct type of binary;
> or (b) for whatever reason, typically on rsh/ssh clusters, users have
> installed the binaries in different locations on some of the nodes. In
> this latter case, the reports have been from homogeneous clusters, so
> the *type* of binary was never the issue - it just wasn't located where
> we expected.
>
> Sun's solution is (I believe) what most of us would expect - they
> locate their executables in the same relative location on all their
> nodes. The binary in that location is correct for that local
> architecture. This requires, though, that the "prefix" location not be
> on a common file system.
>
> Unfortunately, that isn't the case with LANL's roadrunner, nor can we
> expect that everyone will follow that sensible approach :-). So we need
> a notation to support the "exception" case where someone truly needs to
> specify a prefix for specific node(s).
>
> We discussed a number of options, including auto-detecting the local
> arch and appending it to the specified "prefix", and several others.
> After discussing them, those of us on the call decided that adding a
> field to the hostfile that specifies the prefix to use on that host
> would be the best solution. This could be done on a cluster-level
> basis, so - although it is annoying to create the data file - at least
> it would only have to be done once.
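
The per-host hostfile field proposed above could look roughly like the following. This is only a sketch under stated assumptions: the `prefix=` key, the `parse_hostfile` helper, and the `/opt/ompi` default are hypothetical and invented for illustration, since the thread does not settle on a syntax. The host names and paths are borrowed from the mpirun example further down the thread.

```python
# Hypothetical hostfile syntax: "<hostname> [key=value ...]".
# The "prefix" key is an assumption for illustration only; the thread
# proposes such a field but does not define its notation.
def parse_hostfile(text, default_prefix=None):
    """Return {hostname: prefix} for every non-comment line."""
    prefixes = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        host, *fields = line.split()
        options = dict(f.split("=", 1) for f in fields if "=" in f)
        # Hosts without an explicit prefix fall back to the default.
        prefixes[host] = options.get("prefix", default_prefix)
    return prefixes


HOSTFILE = """\
# one line per node; the prefix field is optional
host-x86_64 slots=4 prefix=/opt/ompi/x86_64
host-ppc64  slots=4 prefix=/opt/ompi/ppc64
host-other  slots=2
"""

prefix_by_host = parse_hostfile(HOSTFILE, default_prefix="/opt/ompi")
for host, prefix in prefix_by_host.items():
    print(host, prefix)
```

Keeping the prefix next to the host name ties it to the node rather than to an app_context, which is the association the message above says is needed.
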
>
> Again, this is the exception case, so requiring a little inconvenience
> seems a reasonable thing to do.
>
> Anyone have heartburn and/or other suggestions? If not, we might start
> to play with this next week. We would have to make some small
> modifications to the RAS, RMAPS, and PLS components to ensure that any
> multi-prefix info gets correctly propagated and used across all
> platforms for consistent behavior.
>
> Ralph
>
>
> On 3/22/07 9:11 AM, "David Daniel" <ddd_at_[hidden]> wrote:
>
>> This is a development system for roadrunner using ssh.
>>
>> David
>>
>> On Mar 22, 2007, at 5:19 AM, Jeff Squyres wrote:
>>
>>> FWIW, I believe that we had intended --prefix to handle simple cases,
>>> which is probably why this doesn't work for you. But as long as the
>>> different prefixes are specified for different nodes, it could
>>> probably be made to work.
>>>
>>> Which launcher are you using this with?
>>>
>>>
>>>
>>> On Mar 21, 2007, at 11:36 PM, Ralph Castain wrote:
>>>
>>>> Yo David
>>>>
>>>> What system are you running this on? RoadRunner? If so, I can take a
>>>> look at "fixing" it for you tomorrow (Thurs).
>>>>
>>>> Ralph
>>>>
>>>>
>>>> On 3/21/07 10:17 AM, "David Daniel" <ddd_at_[hidden]> wrote:
>>>>
>>>>> I'm experimenting with heterogeneous applications (x86_64 <-->
>>>>> ppc64), where the systems share the file system where Open MPI is
>>>>> installed.
>>>>>
>>>>> What I would like to be able to do is something like this:
>>>>>
>>>>> mpirun --np 1 --host host-x86_64 --prefix /opt/ompi/x86_64
>>>>> a.out.x86_64 : --np 1 --host host-ppc64 --prefix /opt/ompi/ppc64
>>>>> a.out.ppc64
>>>>>
>>>>> Unfortunately it looks as if the second --prefix is always ignored.
>>>>> My guess is that orte_app_context_t::prefix_dir is getting set, but
>>>>> only the 0th app context is ever consulted (except in the dynamic
>>>>> process stuff where I do see a loop over the app context array).
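
The guess above can be illustrated with a small model. This is a sketch, not the ORTE code: the `AppContext` class only loosely mirrors `orte_app_context_t` (only `prefix_dir` is named in the thread; the `app` and `host` fields and both helper functions are hypothetical). It contrasts always reading the 0th context with looping over the whole array.

```python
from dataclasses import dataclass


@dataclass
class AppContext:
    # Loosely modeled on orte_app_context_t; "app" and "host" are
    # hypothetical fields added for illustration.
    app: str
    host: str
    prefix_dir: str


def prefix_first_only(contexts):
    """Suspected behavior: every host gets the 0th context's prefix."""
    return {ctx.host: contexts[0].prefix_dir for ctx in contexts}


def prefix_per_context(contexts):
    """Desired behavior: each host gets its own context's prefix."""
    return {ctx.host: ctx.prefix_dir for ctx in contexts}


contexts = [
    AppContext("a.out.x86_64", "host-x86_64", "/opt/ompi/x86_64"),
    AppContext("a.out.ppc64", "host-ppc64", "/opt/ompi/ppc64"),
]

print(prefix_first_only(contexts))
print(prefix_per_context(contexts))
```

With the first function, host-ppc64 is handed the x86_64 prefix, which matches the symptom of the second --prefix being ignored.
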
>>>>>
>>>>> I can of course work around it with startup scripts, but a command
>>>>> line solution would be attractive.
>>>>>
>>>>> This is with openmpi-1.2.
>>>>>
>>>>> Thanks, David
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>>
>>
>> --
>> David Daniel <ddd_at_[hidden]>
>> Computer Science for High-Performance Computing (CCS-1)
>

--
David Daniel <ddd_at_[hidden]>
Computer Science for High-Performance Computing (CCS-1)