The code in 1.3 is definitely different from the trunk as it lags quite a bit behind. However, the trunk definitely does include the code I referenced.

Not sure why the hg mirror wouldn't have it. I would have to defer to Jeff on that question - could be a bug in the update macro that maintains the mirror?

I haven't checked the opal-sos branch to see if it has the code in it, but I would have thought those guys were tracking the trunk closely enough to have picked it up - that code was committed in r19209.

Ralph


On Thu, May 28, 2009 at 1:45 AM, Sylvain Jeaugey <sylvain.jeaugey@bull.net> wrote:
To be more complete, we pull Hg from http://www.open-mpi.org/hg/hgwebdir.cgi/ompi-svn-mirror/ ; are we mistaken?

If not, the code in v1.3 seems to be different from the code in the trunk ...

Sylvain


On Thu, 28 May 2009, Nadia Derbey wrote:

On Tue, 2009-05-26 at 17:24 -0600, Ralph Castain wrote:
First, to answer Nadia's question: you will find that the init
function for the module is already called when it is selected - see
the code in orte/mca/base/notifier_base_select.c, lines 72-76 (in the
trunk).

Strange? Our repository is a clone of the trunk?

It's true that if I "hg update" to v1.3 I see that the fix is there.

Regards,
Nadia

It would be a good idea to tie into the sos work to avoid conflicts
when it all gets merged back together, assuming that isn't a big
problem for you.

As for Jeff's suggestion: dealing with the performance hit problem is
why I suggested ORTE_NOTIFIER_VERBOSE, modeled after the
OPAL_OUTPUT_VERBOSE model. The idea was to compile it in -only- when
the system is built for it - maybe using a --with-notifier-verbose
configuration option. Frankly, some organizations would happily pay a
small performance penalty for the benefits.
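
A minimal sketch of how such a compile-time switch could look. All
names here (WANT_NOTIFIER_VERBOSE, NOTIFIER_VERBOSE, notifier_log) are
hypothetical stand-ins, not the actual ORTE symbols, and the flag is
assumed to be defined by a configure option along the lines of
--with-notifier-verbose. The double-parenthesis call style mirrors, as
far as I know, the idiom OPAL_OUTPUT_VERBOSE uses.

```c
#include <stdio.h>

/* Hypothetical build flag: a configure option such as
 * --with-notifier-verbose would define it to 1. Off by default. */
#ifndef WANT_NOTIFIER_VERBOSE
#define WANT_NOTIFIER_VERBOSE 0
#endif

/* Counts calls that actually reach the notifier (illustration only). */
static int notifier_calls = 0;

/* Hypothetical stand-in for the real notifier entry point. */
static void notifier_log(int severity, const char *msg)
{
    ++notifier_calls;
    fprintf(stderr, "[notifier:%d] %s\n", severity, msg);
}

#if WANT_NOTIFIER_VERBOSE
/* Enabled build: the macro expands to a real call. */
#define NOTIFIER_VERBOSE(args) notifier_log args
#else
/* Default build: the macro expands to an empty statement, so the
 * arguments are never evaluated and no branch is emitted. */
#define NOTIFIER_VERBOSE(args) do { } while (0)
#endif
```

A call site would read NOTIFIER_VERBOSE((1, "link flapping")); with the
extra parentheses. In a default build that line compiles to nothing,
which is the point: the cost is paid only by builds that ask for it.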

I would personally recommend that the notifier framework keep the
stats so things can be compact and self-contained. We still get
atomicity by allowing each framework/component/whatever to specify the
threshold. Creating yet another system to do nothing more than track
error/warning frequencies to decide whether or not to notify seems
wasteful.
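
To make the idea concrete, here is a rough sketch of a per-caller
counter that the notifier framework could keep. The type and function
names (notifier_stat_t, notifier_stat_record) are invented for
illustration and are not in the tree; the sketch also assumes a single
thread, so a real version would need atomic updates or a lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-caller record: each framework or component owns one
 * and picks its own threshold. */
typedef struct {
    const char *owner;      /* caller's name, for use in the message */
    unsigned    threshold;  /* notify once this many events are seen */
    unsigned    count;      /* events seen since the last notification */
} notifier_stat_t;

/* Record one event and report whether the caller should notify now.
 * The counter resets after firing, so a notification repeats every
 * `threshold` events. A threshold of 0 disables notification. */
static bool notifier_stat_record(notifier_stat_t *stat)
{
    if (NULL == stat || 0 == stat->threshold) {
        return false;
    }
    if (++stat->count < stat->threshold) {
        return false;
    }
    stat->count = 0;
    return true;
}
```

With a threshold of 3, the first two events are only counted and the
third returns true. The stats stay inside the notifier, while the
threshold remains under the control of whoever owns the record.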

Perhaps worth a phone call to decide path forward?


On Tue, May 26, 2009 at 1:06 PM, Jeff Squyres <jsquyres@cisco.com>
wrote:
       Nadia --

       Sorry I didn't get to jump in on the other thread earlier.

       We have made considerable changes to the notifier framework in
       a branch to better support "SOS" functionality:


        https://www.open-mpi.org/hg/auth/hgwebdir.cgi/jsquyres/opal-sos

       Cisco and Indiana U. have been working on this branch for a
       while.  A description of the SOS stuff is here:

          https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages

       As for setting up an external web server with hg, don't bother
       -- just get an account at bitbucket.org.  They're free and
       allow you to host hg repositories there.  I've used bitbucket
       to collaborate on code before it hits OMPI's SVN trunk with
       both internal and external OMPI developers.

       We can certainly move the opal-sos repo to bitbucket (or
       branch again off opal-sos to bitbucket -- whatever makes more
       sense) to facilitate collaborating with you.

       Back on topic...

       I'd actually suggest a combination of what has been discussed
       in the other thread.  The notifier can be the mechanism that
       actually sends the output message, but it doesn't have to be
       the mechanism that tracks the stats and decides when to output
       a message.  That can be separate logic, and therefore be more
       fine-grained (and potentially even specific to the MPI layer).

       The Big Question will be how to do this with zero performance
       impact when it is not being used. This has always been the
       difficult issue when trying to implement any kind of
       monitoring inside the core OMPI performance-sensitive paths.
       Even adding individual branches has met with resistance (in
       performance-critical code paths)...





       On May 26, 2009, at 10:59 AM, Nadia Derbey wrote:



               Hi,

               While having a look at the notifier framework under
               orte, I noticed that
               the way it is written, the init routine for the
               selected module cannot
               be called.

               Attached is a small patch that fixes this issue.

               Regards,
               Nadia


               <orte_notifier_fix_select.patch><ATT14046023.txt>


       --
       Jeff Squyres
       Cisco Systems

       _______________________________________________
       devel mailing list
       devel@open-mpi.org
       http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Nadia Derbey <Nadia.Derbey@bull.net>
