Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RTE Issue IV: RTE/MPI relative modex responsibilities
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-12-06 10:00:08


On Dec 5, 2007, at 11:23 AM, Ralph H Castain wrote:

>> Well, I think it is pretty obvious that I am a fan of an attribute
>> system :)
>>
>> For completeness, I will point out that we also exchange architecture
>> and hostname info in the modex.
>
> True - except we should note that hostname info is only exchanged if
> someone specifically requests it.

Note that I am a fan of *always* exchanging the hostname information.

I say this because multiple Cisco customers have told us that this is
invaluable debugging information: when a BTL fails to send a message,
for example, we specifically put in the error message "hostA tried to
send to hostB and failed" (vs. "communicator X rank Y tried to send to
rank Z"). System administrators want/need the actual hostnames in
order to [greatly] simplify the process of determining whether there
is a problem in the fabric and, if so, where it is.

This is especially important for very large fabrics.
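
To illustrate the kind of error message described above, here is a
minimal sketch in C. The peer_info_t type and format_send_error()
function are hypothetical names invented for this example; they are not
the real ompi_proc_t or any actual BTL code. The sketch assumes the
hostname may be missing (e.g., if it was never exchanged) and falls back
to ranks in that case.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical peer description; NOT the real ompi_proc_t. */
typedef struct {
    int rank;              /* rank within the communicator */
    const char *hostname;  /* NULL if the hostname was never exchanged */
} peer_info_t;

/*
 * Format a send-failure message into buf.  Hostnames are used when both
 * are known; otherwise the message falls back to ranks only.
 * Returns whatever snprintf() returns.
 */
int format_send_error(char *buf, size_t len,
                      const peer_info_t *src, const peer_info_t *dst)
{
    assert(buf != NULL && src != NULL && dst != NULL);

    if (src->hostname != NULL && dst->hostname != NULL) {
        return snprintf(buf, len,
                        "%s (rank %d) tried to send to %s (rank %d) and failed",
                        src->hostname, src->rank,
                        dst->hostname, dst->rank);
    }
    return snprintf(buf, len,
                    "rank %d tried to send to rank %d and failed",
                    src->rank, dst->rank);
}
```

If hostnames are always exchanged, the fallback branch is never taken
and every failure report names the two hosts involved.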

>> Do we really need a complete node map? As far as I can tell, it looks
>> like the MPI layer only needs a list of local processes. So maybe it
>> would be better to forget about the node ids at the MPI layer and
>> just return the local procs.
>
> I agree, though I don't think we want a parallel list of procs. We
> just need to set the "local" flag in the existing ompi_proc_t
> structures.

I agree that the desired end result is that we need that "local" flag
set in the relevant ompi_proc_t's.

As previously implied: strcmp'ing hostnames is not always sufficient
(e.g., on the Cray). Hence, sending hostnames around is useful for
the reasons I cited above, but it may not be sufficient for
determining which procs are local.
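
As a rough sketch of setting that "local" flag: the proc_t type below is
a hypothetical stand-in, not the real ompi_proc_t, and the node_id /
has_node_id fields are assumptions made for illustration. The idea is to
prefer an identifier supplied by the RTE when one exists and to fall
back to strcmp on hostnames only when it does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the fields of interest; NOT the real ompi_proc_t. */
typedef struct {
    const char *hostname;   /* as exchanged; may be NULL */
    unsigned long node_id;  /* assumed RTE-provided node identifier */
    bool has_node_id;       /* whether the RTE supplied node_id */
    bool is_local;          /* the "local" flag to be set */
} proc_t;

/*
 * Set is_local on every entry of procs relative to `me`.
 * Returns the number of procs marked local.
 */
size_t mark_local_procs(proc_t *procs, size_t n, const proc_t *me)
{
    size_t count = 0;

    assert(me != NULL && (procs != NULL || n == 0));

    for (size_t i = 0; i < n; ++i) {
        proc_t *p = &procs[i];

        if (p->has_node_id && me->has_node_id) {
            /* Preferred: compare the RTE-provided identifiers. */
            p->is_local = (p->node_id == me->node_id);
        } else if (p->hostname != NULL && me->hostname != NULL) {
            /* Fallback: hostname comparison, which is not always sufficient. */
            p->is_local = (strcmp(p->hostname, me->hostname) == 0);
        } else {
            p->is_local = false;
        }

        if (p->is_local) {
            ++count;
        }
    }
    return count;
}
```

How the RTE obtains such an identifier is environment-specific, which is
the same point made elsewhere in this thread.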

>> So my vote would be to leave the modex alone, but remove the node id,
>> and add a function to get the list of local procs. It doesn't
>> matter to me how the RTE implements that.
>
> I think we would need to be careful here that we don't create a need
> for more communication. We have two functions currently in the modex:
>
> 1. how to exchange the info required to populate the ompi_proc_t
> structures; and
>
> 2. how to identify which of those procs are "local"
>
> The problem with leaving the modex as it currently sits is that some
> environments require a different mechanism for exchanging the
> ompi_proc_t info. While most can use the RML, some can't. The same
> division of capabilities applies to getting the "local" info, so it
> makes sense to me to put the modex in a framework.
>
> Otherwise, we wind up with a bunch of #if's in the code to support
> environments like the Cray. I believe the MCA system was put in place
> precisely to avoid those kinds of practices, so it makes sense to me
> to take advantage of it.

FWIW, I'm very against putting #if's in the code for specific
architectures / RTE's. Such differences are what the MCA is for.
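
To make the contrast with #if's concrete, here is a simplified sketch of
run-time component selection through function pointers. It is only an
illustration of the general idea: modex_component_t, select_component(),
and the "rml" / "native" components are hypothetical names, and the real
MCA interfaces are considerably richer than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component interface; the real MCA interfaces differ. */
typedef struct {
    const char *name;
    int priority;            /* higher wins among usable components */
    int (*available)(void);  /* nonzero if usable in this environment */
    int (*exchange)(void);   /* perform the exchange; 0 on success */
} modex_component_t;

/* Pick the highest-priority component that reports itself available. */
const modex_component_t *
select_component(const modex_component_t *const *list, size_t n)
{
    const modex_component_t *best = NULL;

    assert(list != NULL || n == 0);

    for (size_t i = 0; i < n; ++i) {
        const modex_component_t *c = list[i];
        if (c == NULL || c->available == NULL || !c->available()) {
            continue;
        }
        if (best == NULL || c->priority > best->priority) {
            best = c;
        }
    }
    return best;
}

/* Stub implementations standing in for environment-specific code. */
static int always_available(void) { return 1; }
static int never_available(void)  { return 0; }
static int exchange_stub(void)    { return 0; }

/* Generic component (imagined as RML-based) and an environment-specific one. */
static const modex_component_t rml_component =
    { "rml", 10, always_available, exchange_stub };
static const modex_component_t native_component =
    { "native", 50, never_available, exchange_stub };
```

With both components in the list, select_component() returns the "rml"
entry here because the higher-priority "native" stub reports itself
unavailable. The calling code contains no environment-specific
conditionals; each environment's differences live inside its component.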

>> Alternatively, if we did a process attribute system we could just use
>> predefined attributes, and the runtime can get each process's node id
>> however it wants.
>
> Same problem as above, isn't it? Probably ignorance on my part, but
> it seems to me that we simply exchange a modex framework for an
> attribute framework (since each environment would have to get the
> attribute values in a different manner) - don't we?
>
> I have no problem with using attributes instead of the modex, but
> the issue appears to be the same either way - you still need a
> framework to handle the different methods.

I agree -- I don't see the difference. Tim -- can you explain? (I
also didn't quite understand your statement about being a fan of
attribute systems; other than it being an ASCII system with a flat
namespace [why is a flat namespace good, btw?], I don't really see how
it's significantly different than the modex principle...?)

-- 
Jeff Squyres
Cisco Systems