What specifically do you have in mind?

After talking with Jeff, I withdraw my request to change the approach.  This is a good approach when one wants to send warnings, in addition to errors, to some sort of logging system.  Sending the data upstream as I suggested can’t rely on the error return code, and as such requires a check on every return, which is a bad idea.
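To make the trade-off concrete, here is a minimal sketch in C of the two styles as I understand them. Every name in it (severity_t, notify_fn, counting_sink, send_with_notifier, send_with_upstream_data, the status codes) is hypothetical and made up for illustration; none of it is taken from the actual notifier or BTL code.

```c
#include <assert.h>
#include <stdio.h>

/* All names below are hypothetical; none come from the Open MPI code base. */
typedef enum { SEV_WARNING, SEV_ERROR } severity_t;
enum { STATUS_OK = 0, STATUS_ERROR = -1 };

/* Stand-in for "some sort of logging system". */
typedef void (*notify_fn)(severity_t sev, const char *msg);

static int warnings_seen = 0;
static int errors_seen = 0;

/* Example sink: counts what it receives and prints it to stderr. */
static void counting_sink(severity_t sev, const char *msg)
{
    if (sev == SEV_WARNING) {
        warnings_seen++;
    } else {
        errors_seen++;
    }
    fprintf(stderr, "[%s] %s\n",
            sev == SEV_WARNING ? "warning" : "error", msg);
}

/* Notifier style: warnings and errors are reported at the point of
 * detection; the return code only says whether the call failed. */
static int send_with_notifier(int retries, int failed, notify_fn notify)
{
    assert(notify != NULL);
    if (retries > 0) {
        notify(SEV_WARNING, "send needed retries");
    }
    if (failed) {
        notify(SEV_ERROR, "send failed");
        return STATUS_ERROR;
    }
    return STATUS_OK;
}

/* Upstream style: the warning travels back with the result, so it can
 * accompany a successful status. */
typedef struct {
    int status;   /* STATUS_OK or STATUS_ERROR */
    int warning;  /* nonzero if something worth logging happened */
} send_result_t;

static send_result_t send_with_upstream_data(int retries, int failed)
{
    send_result_t r;
    r.status = failed ? STATUS_ERROR : STATUS_OK;
    r.warning = (retries > 0);
    return r;
}
```

In the first style a caller only tests for STATUS_ERROR and the sink still sees the warning. In the second, a result with status STATUS_OK can carry a warning, so every caller at every level has to inspect every result, which is the cost described above.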

If the call is for a discussion beyond this, that is fine with me, but it would be more useful once we have a concrete idea of how to implement step 4.  If people have specific ideas, an early call would be good; otherwise, I expect we would be better prepared to talk about specifics in early January.

The copy-and-branch approach is not practical – it doubles the maintenance work, and the point is to leverage ongoing work.


On 12/4/08 5:15 PM, "Jeff Squyres" <jsquyres@cisco.com> wrote:

A physical meeting about this in the near future is
unlikely; I think we're all facing travel restrictions and constraints
with the holidays coming up.

How about a teleconf to discuss the following about the notifier:

- what exactly is there today
- why what is there today is the way it is
- discuss proposals on different ways to do it

More specifically, I think we all agree that the idea of an MPI
application notifying a higher-level entity when it detects errors is
a good one (e.g., on the host, or in the network, or ...).  I think
that it is worth discussing in higher bandwidth so that we can avoid
email hell (I agree with Ralph; this could devolve pretty easily).

I propose any of the following times to discuss (I'll set up a phone

- Mon, Dec 8, 2pm, 3pm, or 4pm Eastern
- Tue, Dec 9, 10am, noon, 1pm, 2pm, 3pm, or 4pm Eastern
- Wed, Dec 10, any time
- Thu, Dec 11, 11am, 1pm, 2pm, 3pm, or 4pm Eastern
- Fri, Dec 12, 9am, 10am, 11am, 2pm, 3pm, or 4pm Eastern

On Dec 4, 2008, at 3:16 PM, Ralph Castain wrote:

> I'm beginning to believe that we need a design meeting specifically
> over this question. Too many unknowns exist, with significant
> potential problems lurking behind them. Frankly, this issue could
> have a major impact on how we operate, performance, and a variety of
> other factors going forward - many of which may be difficult to
> predict.
>
> I suspect there may not be "optimal" solutions to many of these
> questions, but there certainly will be strong opinions in multiple
> directions.
>
> As part of that discussion, I propose that we consider alternative
> methods for meeting the same overall objective - namely, reuse of
> the BTL's by another software project. For example, a simple copy-
> and-branch is the dominant method today, with patches used by both
> parties to cherry-pick the changes they want from the other code
> users. Multiple tools have been developed to support this mode of
> operation, yet we haven't discussed any of them in this context. The
> proposed approach contains a number of impacts that may be avoided
> with an alternative approach.
>
> Without such a meeting, I fear we are going to rapidly dissolve into
> email hell again.
>
> Ralph
>
> On Dec 4, 2008, at 1:07 PM, Eugene Loh wrote:
>> Richard Graham wrote:
>>> I expect this will involve some sort of well defined interface
>>> between the btl’s and orte, and I don’t know if this will also
>>> require something like this between the btl’s and the pml – I
>>> think that interface is rigidly enforced, but am not sure.
>> I'm probably missing the scope of what you're saying here, but it
>> raises another question in my mind.  Is there today a well-defined
>> interface between the BTLs and... anything else?  PML or whatever?  
>> Maybe this comes back to a documentation question:  do we (or will
>> we) have anything written down that says what a BTL must do, what
>> it may rely on, etc.?
>> _______________________________________________
>> devel mailing list
>> devel@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

Jeff Squyres
Cisco Systems
