Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Preparations for moving the btl's
From: Richard Graham (rlgraham_at_[hidden])
Date: 2008-12-04 17:38:31


What specifically do you have in mind?

After talking with Jeff I withdraw my request to change the approach. This
is a good approach when one wants to send warnings to some sort of logging
system, in addition to errors. Sending the data upstream like I suggested
can't rely on the error return-code, and as such requires a check on every
return - bad idea.

If the call is for a discussion beyond this, this is fine with me, but would
be more useful once a concrete idea on how to implement step 4 is reached.
If people have specific ideas, an early call would be good, otherwise I
would expect that early Jan we would be better prepared to talk about
specifics.

The copy and branch approach is not practical - it doubles the maintenance
work, and the point is to leverage on-going work.

Rich

On 12/4/08 5:15 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:

> A physical meeting about this in the near future is
> unlikely; I think we're all facing travel restrictions and constraints
> with the holidays coming up.
>
> How about a teleconf to discuss the following about the notifier:
>
> - what exactly is there today
> - why what is there today is the way it is
> - discuss proposals on different ways to do it
>
> More specifically, I think we all agree that the idea of an MPI
> application notifying a higher-level entity when it detects errors is
> a good one (e.g., on the host, or in the network, or ...). I think
> that it is worth discussing in higher bandwidth so that we can avoid
> email hell (I agree with Ralph; this could devolve pretty easily).
>
> I propose any of the following times to discuss (I'll set up a phone
> bridge):
>
> - Mon, Dec 8, 2pm, 3pm, or 4pm Eastern
> - Tue, Dec 9, 10am, noon, 1pm, 2pm, 3pm, or 4pm Eastern
> - Wed, Dec 10, any time
> - Thu, Dec 11, 11am, 1pm, 2pm, 3pm, or 4pm Eastern
> - Fri, Dec 12, 9am, 10am, 11am, 2pm, 3pm, or 4pm Eastern
>
>
>
>
> On Dec 4, 2008, at 3:16 PM, Ralph Castain wrote:
>
>> > I'm beginning to believe that we need a design meeting specifically
>> > over this question. Too many unknowns exist, with significant
>> > potential problems lurking behind them. Frankly, this issue could
>> > have a major impact on how we operate, performance, and a variety of
>> > other factors going forward - many of which may be difficult to
>> > predict.
>> >
>> > I suspect there may not be "optimal" solutions to many of these
>> > questions, but there certainly will be strong opinions in multiple
>> > directions.
>> >
>> > As part of that discussion, I propose that we consider alternative
>> > methods for meeting the same overall objective - namely, reuse of
>> > the BTL's by another software project. For example, a simple copy-
>> > and-branch is the dominant method today, with patches used by both
>> > parties to cherry-pick the changes they want from the other code
>> > users. Multiple tools have been developed to support this mode of
>> > operation, yet we haven't discussed any of them in this context. The
>> > proposed approach contains a number of impacts that may be avoided
>> > with an alternative approach.
>> >
>> > Without such a meeting, I fear we are going to rapidly dissolve into
>> > email hell again.
>> >
>> > Ralph
>> >
>> >
>> >
>> > On Dec 4, 2008, at 1:07 PM, Eugene Loh wrote:
>> >
>>> >> Richard Graham wrote:
>>>> >>>
>>>> >>> I expect this will involve some sort of well defined interface
>>>> >>> between the btl's and orte, and I don't know if this will also
>>>> >>> require something like this between the btl's and the pml - I
>>>> >>> think that interface is rigidly enforced, but am not sure.
>>> >> I'm probably missing the scope of what you're saying here, but it
>>> >> raises another question in my mind. Is there today a well-defined
>>> >> interface between the BTLs and... anything else? PML or whatever?
>>> >> Maybe this comes back to a documentation question: do we (or will
>>> >> we) have anything written down that says what a BTL must do, what
>>> >> it may rely on, etc.?
>>> >> _______________________________________________
>>> >> devel mailing list
>>> >> devel_at_[hidden]
>>> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> >
>
>
> --
> Jeff Squyres
> Cisco Systems