Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RTE Issue IV: RTE/MPI relative modex responsibilities
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-12-06 10:00:08

On Dec 5, 2007, at 11:23 AM, Ralph H Castain wrote:

>> Well, I think it is pretty obvious that I am a fan of an attribute
>> system :)
>> For completeness, I will point out that we also exchange architecture
>> and hostname info in the modex.
> True - except we should note that hostname info is only exchanged if
> someone specifically requests it.

Note that I am a fan of *always* exchanging the hostname information.

I say this because multiple Cisco customers have told us that this is
invaluable debugging information: when a BTL fails to send a message,
for example, we specifically put in the error message "hostA tried to
send to hostB and failed" (vs. "communicator X rank Y tried to send to
rank Z"). System administrators want/need the actual hostnames in
order to [greatly] simplify troubleshooting: determining whether there
is a problem in the fabric and, if so, where it is.

This is especially important for very large fabrics.
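
As a rough sketch of the kind of error message described above (not the actual BTL code): the `peer_info` struct and `format_send_error` function below are hypothetical names invented for illustration. The idea is simply to prefer hostnames when both were exchanged and fall back to ranks otherwise.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified peer descriptor; NOT the real ompi_proc_t. */
struct peer_info {
    int rank;             /* rank within the communicator */
    const char *hostname; /* NULL if the hostname was never exchanged */
};

/*
 * Format a send-failure message into buf. Hostnames are used when both
 * peers have one; otherwise only ranks are reported.
 * Returns whatever snprintf returns.
 */
int format_send_error(char *buf, size_t len,
                      const struct peer_info *src,
                      const struct peer_info *dst)
{
    assert(buf != NULL && src != NULL && dst != NULL);

    if (src->hostname != NULL && dst->hostname != NULL) {
        return snprintf(buf, len,
                        "%s (rank %d) tried to send to %s (rank %d) and failed",
                        src->hostname, src->rank,
                        dst->hostname, dst->rank);
    }
    return snprintf(buf, len,
                    "rank %d tried to send to rank %d and failed",
                    src->rank, dst->rank);
}
```

If hostnames are only exchanged on request, every caller of a helper like this has to cope with the rank-only fallback, which is the less useful message for an administrator.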

>> Do we really need a complete node map? As far as I can tell, it looks
>> like the MPI layer only needs a list of local processes. So maybe it
>> would be better to forget about the node ids at the MPI layer and
>> just return the local procs.
> I agree, though I don't think we want a parallel list of procs. We
> just need to set the "local" flag in the existing ompi_proc_t
> structures.

I agree that the desired end result is that we need that "local" flag
set in the relevant ompi_proc_t's.
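
A minimal sketch of that end result, under the assumption that the RTE can hand back the indices of the local procs by whatever mechanism suits the environment. `sketch_proc` is a hypothetical stand-in for ompi_proc_t, and `mark_local_procs` is not an existing Open MPI function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ompi_proc_t; the real structure differs. */
struct sketch_proc {
    int rank;      /* illustrative identifier only */
    bool is_local; /* the "local" flag under discussion */
};

/*
 * Clear the flag on every proc, then set it on each proc whose index
 * appears in local_idx (assumed to come from the RTE). Out-of-range and
 * duplicate indices are ignored. Returns the number of procs marked local.
 */
size_t mark_local_procs(struct sketch_proc *procs, size_t nprocs,
                        const size_t *local_idx, size_t nlocal)
{
    size_t marked = 0;

    assert(procs != NULL || nprocs == 0);
    assert(local_idx != NULL || nlocal == 0);

    for (size_t i = 0; i < nprocs; ++i) {
        procs[i].is_local = false;
    }
    for (size_t j = 0; j < nlocal; ++j) {
        size_t idx = local_idx[j];
        if (idx < nprocs && !procs[idx].is_local) {
            procs[idx].is_local = true;
            ++marked;
        }
    }
    return marked;
}
```

Note that the MPI layer in this sketch never compares hostnames or node ids itself; how "local" is determined stays entirely on the RTE side.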

As previously implied: strcmp'ing hostnames is not always sufficient
(e.g., on the Cray). Hence, sending hostnames around is useful for
the reasons I cited above, but it may not be sufficient for what is

>> So my vote would be to leave the modex alone, but remove the node id,
>> and add a function to get the list of local procs. It doesn't matter
>> to me how the RTE implements that.
> I think we would need to be careful here that we don't create a need
> for more communication. We have two functions currently in the modex:
> 1. how to exchange the info required to populate the ompi_proc_t
>    structures; and
> 2. how to identify which of those procs are "local"
> The problem with leaving the modex as it currently sits is that some
> environments require a different mechanism for exchanging the
> ompi_proc_t info. While most can use the RML, some can't. The same
> division of capabilities applies to getting the "local" info, so it
> makes sense to me to put the modex in a framework.
> Otherwise, we wind up with a bunch of #if's in the code to support
> environments like the Cray. I believe the MCA system was put in place
> precisely to avoid those kinds of practices, so it makes sense to me
> to take advantage of it.

FWIW, I'm very against putting #if's in the code for specific
architectures / RTEs. Such differences are what the MCA is for.
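
To illustrate the general idea of runtime selection instead of #if's, here is a simplified sketch. It is not the real MCA interface: `modex_module`, `select_modex_module`, and the availability callbacks are all hypothetical names. Each environment supplies a module, and the highest-priority module that reports itself usable is picked at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module descriptor; names do not match real Open MPI symbols. */
struct modex_module {
    const char *name;       /* e.g. an RML-based or Cray-specific exchange */
    int priority;           /* higher wins among available modules */
    int (*available)(void); /* nonzero if usable in this environment */
};

/* Trivial availability probes, used only to exercise the sketch. */
int always_available(void) { return 1; }
int never_available(void) { return 0; }

/*
 * Return the highest-priority module whose available() callback reports
 * nonzero, or NULL if none is usable. Ties keep the earliest entry.
 */
const struct modex_module *
select_modex_module(const struct modex_module *mods, size_t n)
{
    const struct modex_module *best = NULL;

    for (size_t i = 0; i < n; ++i) {
        assert(mods[i].available != NULL);
        if (!mods[i].available()) {
            continue;
        }
        if (best == NULL || mods[i].priority > best->priority) {
            best = &mods[i];
        }
    }
    return best;
}
```

The calling code contains no environment-specific conditionals; supporting a new RTE means adding another module to the table rather than another #if.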

>> Alternatively, if we did a process attribute system we could just use
>> predefined attributes, and the runtime can get each process's node id
>> however it wants.
> Same problem as above, isn't it? Probably ignorance on my part, but
> it seems to me that we simply exchange a modex framework for an
> attribute framework (since each environment would have to get the
> attribute values in a different manner) - don't we?
> I have no problem with using attributes instead of the modex, but
> the issue appears to be the same either way - you still need a
> framework to handle the different methods.

I agree -- I don't see the difference. Tim -- can you explain? (I
also didn't quite understand your statement about being a fan of
attribute systems; other than it being an ASCII system with a flat
namespace [why is a flat namespace good, btw?], I don't really see how
it's significantly different than the modex principle...?)

Jeff Squyres
Cisco Systems