Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] rankfile questions
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-03-18 21:18:43


Not trying to pile on here...but I do have a question.

This commit inserted a bunch of affinity-specific code in ompi_mpi_init.c.
Was this truly necessary?

It seems to me this violates our code architecture. Affinity-specific code
belongs in the opal_p[m]affinity functions. Why aren't we just calling an
"opal_paffinity_set_my_processor" function (or whatever name you like) in
mpi_init, and doing all this paffinity stuff inside that function?

It would make mpi_init a lot cleaner, and preserve the code standards we
have had since the beginning.

In addition, the code that has been added returns ORTE error and success
codes. Given the location, it should be OMPI error and success codes - if we
move it to where I think it belongs (in OPAL), then those codes should
obviously be OPAL codes.
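
To make the layering concrete, here is a minimal sketch of the split described above: one affinity entry point in the lower layer that returns that layer's codes, and an init routine that only calls it and translates the result. Every name here is hypothetical (the sketch_ prefix marks stand-ins; these are not the real OPAL or OMPI symbols or constants), the slot list is assumed to start with a single decimal CPU number, and the "backend" merely records the request instead of binding anything.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the two layers' return codes. */
enum { SKETCH_OPAL_SUCCESS = 0, SKETCH_OPAL_ERR_BAD_PARAM = -1 };
enum { SKETCH_OMPI_SUCCESS = 0, SKETCH_OMPI_ERROR = -1 };

/* Stand-in for the platform-specific binding call; it only records the CPU. */
static int sketch_bound_cpu = -1;

static int sketch_backend_bind(int cpu)
{
    sketch_bound_cpu = cpu;
    return SKETCH_OPAL_SUCCESS;
}

/* Lower layer: all affinity logic lives here and returns lower-layer codes.
 * Assumption: the slot list begins with one decimal CPU number. */
int sketch_opal_paffinity_set_my_processor(const char *slot_list)
{
    char *end = NULL;
    long cpu;

    if (NULL == slot_list || '\0' == *slot_list) {
        return SKETCH_OPAL_ERR_BAD_PARAM;
    }
    cpu = strtol(slot_list, &end, 10);
    if (end == slot_list || cpu < 0 || cpu > 1023) {
        return SKETCH_OPAL_ERR_BAD_PARAM;
    }
    return sketch_backend_bind((int) cpu);
}

/* Upper layer: init makes one call and translates the code for its caller. */
int sketch_mpi_init(const char *slot_list)
{
    int rc = sketch_opal_paffinity_set_my_processor(slot_list);
    return (SKETCH_OPAL_SUCCESS == rc) ? SKETCH_OMPI_SUCCESS
                                       : SKETCH_OMPI_ERROR;
}
```

With this shape, the init routine never touches affinity details, and each layer reports errors only in its own vocabulary.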

If I'm missing some reason why these things can't be done, please enlighten
me. Otherwise, it would be nice if this could be cleaned up.

Thanks
Ralph

On 3/18/08 8:39 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:

> On Mar 18, 2008, at 9:32 AM, Jeff Squyres wrote:
>
>> I notice that rankfile didn't compile properly on some platforms and
>> issued warnings on other platforms. Thanks to Ralph for cleaning it
>> up...
>>
>> 1. I see a getenv("slot_list") in the MPI side of the code; it looks
>> like $slot_list is set by the odls for the MPI process. Why isn't it
>> an MCA parameter? That's what all other values passed by the orted to
>> the MPI process appear to be.
>>
>> 2. I see that ompi_mpi_params.c is now registering 2 rmaps-level MCA
>> parameters. Why? Shouldn't these be in ORTE somewhere?
>
>
> A few more notes:
>
> 3. Most of the files in orte/mca/rmaps/rankfile do not obey the prefix
> rule. I think that they should be renamed.
>
> 4. A quick look through rankfile_lex.l seems to show that there are
> global variables that are not protected by the prefix rule (or
> static). Ditto in rmaps_rf.c. These should be fixed.
>
> 5. rank_file_done was instantiated in both rankfile_lex.l and
> rmaps_rf.c (causing a duplicate symbol linker error on OS X). I
> removed it from rmaps_rf.c (it was declared "extern" in
> rankfile_lex.h, presumably to indicate that it is "owned" by the lex.l
> file...?).
>
> 6. svn:ignore was not set in the new rankfile directory.
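
For points 4 and 5, a minimal single-file sketch of the usual C pattern may help: the header carries only an extern declaration, exactly one source file defines the variable, and state that no other file needs is marked static. The names below are hypothetical (a sketch_ prefix stands in for whatever prefix the project's rule requires), and the three "files" are collapsed into one block, separated by comments, so the example compiles on its own.

```c
/* --- what the shared header would contain: a declaration, no storage --- */
extern int sketch_rmaps_rank_file_done;

/* --- what the lexer source would contain: the single definition --- */
int sketch_rmaps_rank_file_done = 0;

/* File-local state: static keeps it out of the global symbol table. */
static int sketch_lex_lines_seen = 0;

/* Count one input line; mark the rank file finished on the last one. */
void sketch_lex_consume_line(int is_last)
{
    ++sketch_lex_lines_seen;
    if (is_last) {
        sketch_rmaps_rank_file_done = 1;
    }
}

/* Accessor so other files never need the static counter itself. */
int sketch_lex_lines(void)
{
    return sketch_lex_lines_seen;
}

/* --- what the mapper source would contain: it uses the declaration
 *     from the header and does not define the variable again --- */
int sketch_mapper_is_done(void)
{
    return sketch_rmaps_rank_file_done;
}
```

A second initialized definition of the flag in the mapper source is what produces a duplicate symbol at link time; keeping only the header's extern declaration there avoids it.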