Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] rankfile questions
From: Lenny Verkhovsky (lennyb_at_[hidden])
Date: 2008-03-19 05:36:57


Hi,

> -----Original Message-----
> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]]
On
> Behalf Of Ralph Castain
> Sent: Wednesday, March 19, 2008 3:19 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] rankfile questions
>
> Not trying to pile on here...but I do have a question.
>
> This commit inserted a bunch of affinity-specific code in
ompi_mpi_init.c.
> Was this truly necessary?
>
> It seems to me this violates our code architecture. Affinity-specific
code
> belongs in the opal_p[m]affinity functions. Why aren't we just calling
a
> "opal_paffinity_set_my_processor" function (or whatever name you like)
in
> mpi_init, and doing all this paffinity stuff there?

This is the only place where this code is used. These functions process
the info from ODLS and set paffinity appropriately. Moving this code to
OPAL will cause unnecessary changes in paffinity base API.

>
> It would make mpi_init a lot cleaner, and preserve the code standards
we
> have had since the beginning.
>
> In addition, the code that has been added returns ORTE error and
success
> codes. Given the location, it should be OMPI error and success codes -
if
> we
> move it to where I think it belongs (in OPAL), then those codes should
> obviously be OPAL codes.

Will be cleaned up,
thanks.

>
> If I'm missing some reason why these things can't be done, please
> enlighten
> me. Otherwise, it would be nice if this could be cleaned up.
>
> Thanks
> Ralph
>
> On 3/18/08 8:39 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>
> > On Mar 18, 2008, at 9:32 AM, Jeff Squyres wrote:
> >
> >> I notice that rankfile didn't compile properly on some platforms
and
> >> issued warnings on other platforms. Thanks to Ralph for cleaning
it
> >> up...
> >>
> >> 1. I see a getenv("slot_list") in the MPI side of the code; it
looks
> >> like $slot_list is set by the odls for the MPI process. Why isn't
it
> >> an MCA parameter? That's what all other values passed by the orted
to
> >> the MPI process appear to be.

"slot_list" consist of socket:core pair for the rank to be bind to. This
info changes according to rankfile and different for each node and rank,
therefore it cannot be passed via mca parameter.

> >>
> >> 2. I see that ompi_mpi_params.c is now registering 2 rmaps-level
MCA
> >> parameters. Why? Shouldn't these be in ORTE somewhere?

If you mean paffinity_alone and rank_file_debug, then
1. paffinity_alone was there before.
2. After getting some answers from Ralph about orte_debug in
ompi_mpi_init I intend to introduce ompi_debug mca parameter that will
be used in this library and rank_file_debug will be removed.
 
> >
> >
> > A few more notes:
> >
> > 3. Most of the files in orte/mca/rmaps/rankfile do not obey the
prefix
> > rule. I think that they should be renamed.

Rank_file component was copied from round_robin, I thought it would be
strange if it would look differently.

> >
> > 4. A quick look through rankfile_lex.l seems to show that there are
> > global variables that are not protected by the prefix rule (or
> > static). Ditto in rmaps_rf.c. These should be fixed.

What do you mean?

> >
> > 5. rank_file_done was instantiated in both rankfile_lex.l and
> > ramps_rf.c (causing a duplicate symbol linker error on OS X). I
> > removed it from rmaps_rf.c (it was declared "extern" in
> > rankfile_lex.h, assumedly to indicate that it is "owned" by the
lex.l
> > file...?).
thanks

> >
> > 6. svn:ignore was not set in the new rankfile directory.
Will be fixed.

I guess due to the heavy network traffic nowadays, all these comments
came now and not 2 weeks ago when I sent the code for reviews :) :) :).
 
Best Regards,
Lenny.

>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel