Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] problem in the ORTE notifier framework
From: Nadia Derbey (Nadia.Derbey_at_[hidden])
Date: 2009-05-28 03:36:38


On Tue, 2009-05-26 at 17:24 -0600, Ralph Castain wrote:
> First, to answer Nadia's question: you will find that the init
> function for the module is already called when it is selected - see
> the code in orte/mca/base/notifier_base_select.c, lines 72-76 (in the
> trunk).

Strange, since our repository is a clone of the trunk.
It's true, though, that if I "hg update" to v1.3, I see that the fix is there.

Regards,
Nadia

> It would be a good idea to tie into the sos work to avoid conflicts
> when it all gets merged back together, assuming that isn't a big
> problem for you.
>
> As for Jeff's suggestion: dealing with the performance hit problem is
> why I suggested ORTE_NOTIFIER_VERBOSE, modeled after the
> OPAL_OUTPUT_VERBOSE model. The idea was to compile it in -only- when
> the system is built for it - maybe using a --with-notifier-verbose
> configuration option. Frankly, some organizations would happily pay a
> small performance penalty for the benefits.
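
A minimal sketch of the compile-time gating described above, assuming a hypothetical WANT_NOTIFIER_VERBOSE symbol that a --with-notifier-verbose configure option would define. The notifier_log() function and its signature are stand-ins for illustration, not the actual ORTE notifier API. When the symbol is 0, the macro expands to nothing, so the call site costs nothing at run time:

```c
#include <stdio.h>

/* Hypothetical build switch: a --with-notifier-verbose configure option
 * would define this to 1. It is not an existing Open MPI symbol. */
#ifndef WANT_NOTIFIER_VERBOSE
#define WANT_NOTIFIER_VERBOSE 0
#endif

/* Counts how many notifications were actually emitted. */
static int notifier_calls = 0;

/* Stand-in for the real notifier entry point (assumed signature). */
void notifier_log(int severity, const char *msg)
{
    ++notifier_calls;
    fprintf(stderr, "[notifier sev=%d] %s\n", severity, msg);
}

/* The argument list is passed in its own parentheses so that the whole
 * call, arguments included, disappears when the feature is compiled out. */
#if WANT_NOTIFIER_VERBOSE
#define ORTE_NOTIFIER_VERBOSE(args) notifier_log args
#else
#define ORTE_NOTIFIER_VERBOSE(args) ((void)0)
#endif

/* Example call site on an error path. */
int send_fragment(int len)
{
    if (len < 0) {
        ORTE_NOTIFIER_VERBOSE((1, "negative fragment length"));
        return -1;
    }
    return len;
}
```

Building with -DWANT_NOTIFIER_VERBOSE=1 turns the notification on. The default build contains no call and no extra branch beyond the error check that was already there.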
>
> I would personally recommend that the notifier framework keep the
> stats so things can be compact and self-contained. We still get
> atomicity by allowing each framework/component/whatever to specify the
> threshold. Creating yet another system to do nothing more than track
> error/warning frequencies to decide whether or not to notify seems
> wasteful.
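
To make the "notifier keeps the stats" idea concrete, here is a small sketch in which each caller owns a counter with its own threshold and the notifier decides when to emit. All names (notifier_stat_t, notifier_record) are hypothetical and do not come from the ORTE code base:

```c
#include <stdbool.h>
#include <stdio.h>

/* Per-caller statistics: each framework/component supplies its own
 * threshold, and the notifier owns the counting. Hypothetical type. */
typedef struct {
    const char *name;       /* who is reporting */
    unsigned    threshold;  /* events needed before a notification */
    unsigned    count;      /* events seen since the last notification */
} notifier_stat_t;

/* Record one event. Returns true when a notification was emitted,
 * i.e. when the caller-specified threshold has been reached. */
bool notifier_record(notifier_stat_t *s)
{
    s->count++;
    if (s->count < s->threshold) {
        return false;
    }
    printf("notify: %s reached %u events\n", s->name, s->count);
    s->count = 0;  /* start a new counting window */
    return true;
}
```

A component would declare one notifier_stat_t per event class it cares about and call notifier_record() on each occurrence. Nothing outside the notifier needs to know how the counting is done.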
>
> Perhaps worth a phone call to decide on a path forward?
>
>
> On Tue, May 26, 2009 at 1:06 PM, Jeff Squyres <jsquyres_at_[hidden]>
> wrote:
> Nadia --
>
> Sorry I didn't get to jump in on the other thread earlier.
>
> We have made considerable changes to the notifier framework in
> a branch to better support "SOS" functionality:
>
>
> https://www.open-mpi.org/hg/auth/hgwebdir.cgi/jsquyres/opal-sos
>
> Cisco and Indiana U. have been working on this branch for a
> while. A description of the SOS stuff is here:
>
> https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
>
> As for setting up an external web server with hg, don't bother
> -- just get an account at bitbucket.org. They're free and
> allow you to host hg repositories there. I've used bitbucket
> to collaborate on code before it hits OMPI's SVN trunk with
> both internal and external OMPI developers.
>
> We can certainly move the opal-sos repo to bitbucket (or
> branch again off opal-sos to bitbucket -- whatever makes more
> sense) to facilitate collaborating with you.
>
> Back on topic...
>
> I'd actually suggest a combination of what has been discussed
> in the other thread. The notifier can be the mechanism that
> actually sends the output message, but it doesn't have to be
> the mechanism that tracks the stats and decides when to output
> a message. That can be separate logic, and therefore be more
> fine-grained (and potentially even specific to the MPI layer).
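
One possible shape for this split, as a sketch only: the tracking logic lives with the caller (for example in the MPI layer) and reaches the notifier through a function pointer, so the notifier is purely the sending mechanism. The names event_tracker_t, tracker_event and count_only_sender are hypothetical:

```c
#include <stddef.h>

/* Signature assumed for "send this message" in the notifier. */
typedef void (*notify_fn)(const char *msg);

/* Tracking state owned by the caller, not by the notifier. */
typedef struct {
    unsigned  errors;     /* events seen since the last message */
    unsigned  threshold;  /* events needed before sending */
    notify_fn send;       /* NULL when no notifier is attached */
} event_tracker_t;

/* Counts messages delivered by the stand-in sender below. */
static unsigned sent = 0;

/* Stand-in for the notifier's output path: only counts messages. */
void count_only_sender(const char *msg)
{
    (void)msg;
    sent++;
}

/* Decide whether this event should produce a message. The decision is
 * made here; the notifier is only asked to deliver the result. */
void tracker_event(event_tracker_t *t, const char *msg)
{
    if (NULL == t->send) {
        return;  /* monitoring not in use */
    }
    if (++t->errors >= t->threshold) {
        t->send(msg);
        t->errors = 0;
    }
}
```

Note that the NULL check is itself a branch, which is the kind of cost in performance-critical paths that this thread is concerned about.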
>
> The Big Question will be how to do this with zero performance
> impact when it is not being used. This has always been the
> difficult issue when trying to implement any kind of
> monitoring inside the core OMPI performance-sensitive paths.
> Even adding individual branches has met with resistance (in
> performance-critical code paths)...
>
> On May 26, 2009, at 10:59 AM, Nadia Derbey wrote:
>
> Hi,
>
> While having a look at the notifier framework under
> orte, I noticed that
> the way it is written, the init routine for the
> selected module cannot
> be called.
>
> Attached is a small patch that fixes this issue.
>
> Regards,
> Nadia
>
>
> <orte_notifier_fix_select.patch><ATT14046023.txt>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Nadia Derbey <Nadia.Derbey_at_[hidden]>