Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] problem in the ORTE notifier framework
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-05-27 11:34:51


What is a generic threshold? And what is a counter? We have a policy
against this style of coding, and to be honest I would like to stick
to it. The reason is that the PML is a very complex piece of code, and
I would like to keep it as easy to understand as possible. If people
start adding #if/#endif all over the code, we are diverging from this
goal.

The only way to make this work is to call the notifier or some other
framework in this "slow path" and let that framework apply its own
logic to determine what and when to print. Of course the cost of this
is a function call plus an atomic operation (which is already not
cheap). It's starting to get expensive, even for a "slow path", which
in this particular context is just one insertion in an atomic FIFO.
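
To make the cost concrete, here is a rough sketch of what such a call
could look like. The PML would contain only the single call; the counter
and the threshold live on the other side. All names here
(slow_path_event_t, slow_path_notify) are hypothetical and not part of
ORTE or the PML, and C11 atomics stand in for the opal atomics.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-event state owned by the notification framework,
 * not by the PML. */
typedef struct {
    atomic_uint count;      /* how many times the slow path was hit */
    unsigned    threshold;  /* report once count exceeds this value */
} slow_path_event_t;

/* The only thing the slow path would call: one function call plus one
 * atomic increment. Returns true when a report was emitted. */
static bool slow_path_notify(slow_path_event_t *ev, const char *what)
{
    /* atomic_fetch_add returns the previous value, so add 1 to get
     * the count that includes this call. */
    unsigned seen = atomic_fetch_add(&ev->count, 1) + 1;

    if (seen <= ev->threshold) {
        return false;       /* below threshold: stay silent */
    }
    fprintf(stderr, "notice: %s (%u occurrences)\n", what, seen);
    return true;
}
```

With a threshold of 2, the first two calls return false and the third
one prints. The policy can change without touching the caller, but the
function call and the atomic are paid on every pass.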

If instead of counting the number of times we try to send the fragment
we switch to a time-based approach, this can be solved with the PERUSE
calls. There is a callback when the request is created, and another
callback when the first fragment is pushed successfully into the
network. Computing the time between these two allows a tool to figure
out how long the request was waiting in some internal queues, and
therefore how much delay this added to the execution time.
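
The bookkeeping on the tool side could be as simple as the sketch below.
It does not use the real PERUSE API: on_request_created and
on_first_fragment_sent are hypothetical stand-ins for the two callbacks,
the fixed-size table is an arbitrary choice, and the timestamp is passed
in explicitly (a real tool would probably take it from MPI_Wtime()). It
is also single-threaded.

```c
#include <stddef.h>

#define MAX_TRACKED 64  /* arbitrary limit for this sketch */

/* One slot per request that was created but not yet on the network. */
typedef struct {
    unsigned long id;          /* hypothetical unique request id */
    double        created_at;  /* timestamp of creation, in seconds */
    int           in_use;
} pending_req_t;

static pending_req_t pending[MAX_TRACKED];
static double total_queue_delay = 0.0;  /* sum of all measured delays */

/* Stand-in for the "request created" callback. */
static void on_request_created(unsigned long id, double now)
{
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (!pending[i].in_use) {
            pending[i].id = id;
            pending[i].created_at = now;
            pending[i].in_use = 1;
            return;
        }
    }
    /* Table full: this request is simply not tracked. */
}

/* Stand-in for the "first fragment pushed" callback. Returns the time
 * the request spent waiting, or -1.0 if the request is unknown. */
static double on_first_fragment_sent(unsigned long id, double now)
{
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (pending[i].in_use && pending[i].id == id) {
            double delay = now - pending[i].created_at;
            pending[i].in_use = 0;
            total_queue_delay += delay;
            return delay;
        }
    }
    return -1.0;
}
```

Nothing in this approach touches the PML itself; all the state and the
policy stay in the tool.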

   george.

On May 27, 2009, at 06:59 , Ralph Castain wrote:

> ORTE_NOTIFIER_VERBOSE(api, counter, threshold,...)
>
> #if WANT_NOTIFIER_VERBOSE
> opal_atomic_increment(counter);
> if (counter > threshold) {
> orte_notifier.api(.....)
> }
> #endif