Open MPI Development Mailing List Archives

From: David Erukhimovich (daviderukh_at_[hidden])
Date: 2007-10-20 08:58:38


Hi
Thank you for your answer.

First of all, my two questions weren't connected; they belong to different
parts of my project. The subject of the mail should have been: Trying to
get total procs num in rds framework (sorry, my mistake).

Here are the parts, in the order of the last email:

1. I've solved the problem of getting the total number of procs in rds (I had
just called a function incorrectly), so sorry for disturbing you about that.
Now a bit more about what I'm trying to do; maybe there is a better way than
mine:
I have a tool (an external application) that, given a list of machines and a
number n, chooses the n best ones from the list (the least loaded ones). If
the list of machines isn't given, it just returns the n best machines
from the cluster. I would like to include this in ompi, so that, given a
machinefile, it will run the processes only on the best nodes. If a machinefile
isn't given, it will take the best nodes that my application returns.
I think the best place to implement it is in rds, after building the list
of newly discovered nodes: if it is empty, fill it using my tool; otherwise
filter it using my tool. That seems to me the most logical way to do it. Am I
right? I am asking you because I guess you have better knowledge of the ompi
architecture.
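
To make the fill-or-filter step in point 1 concrete, here is a minimal sketch in C. Everything in it is an assumption for illustration: `node_info_t` and `select_best_nodes` are hypothetical names, not ompi types or functions (the real rds node list uses different structures), and it assumes the external tool's ranking can be reduced to one load number per node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node record; NOT the structure rds actually uses. */
typedef struct {
    const char *name;
    double load;    /* lower is better, as reported by the external tool */
} node_info_t;

/* qsort comparator: ascending by load. */
static int compare_by_load(const void *a, const void *b)
{
    const node_info_t *na = (const node_info_t *)a;
    const node_info_t *nb = (const node_info_t *)b;
    if (na->load < nb->load) return -1;
    if (na->load > nb->load) return 1;
    return 0;
}

/*
 * Pick the n least-loaded nodes.
 * If the discovered list (e.g. from a machinefile) is empty, fall back to
 * the whole-cluster list; otherwise filter the discovered list.
 * The chosen array is sorted in place, *chosen points at it, and the
 * return value is how many leading entries of *chosen to use.
 */
size_t select_best_nodes(node_info_t *discovered, size_t num_discovered,
                         node_info_t *cluster, size_t num_cluster,
                         size_t n, node_info_t **chosen)
{
    node_info_t *pool = discovered;
    size_t pool_size = num_discovered;

    assert(chosen != NULL);

    if (num_discovered == 0) {      /* no machinefile: use the cluster */
        pool = cluster;
        pool_size = num_cluster;
    }
    if (pool_size > 0) {
        qsort(pool, pool_size, sizeof(*pool), compare_by_load);
    }
    *chosen = pool;
    return (n < pool_size) ? n : pool_size;
}
```

Sorting the whole pool is the simplest choice for a node list of modest size; a partial selection would also work if the list were very large.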

2. The other thing I am trying to do is make ompi run every process
not directly, but through an external program. E.g., if I want to launch the
program "hostname", I want the following to be launched: "<my-program>
<my-program's-flags> hostname".
I figured that the best way to do it is in the odls framework, because there I
have the exact execution point.
I am currently working on the checkpoint 1.2.3. I don't work on the trunk
because I need the patches to be added to some stable release. Is there a
1.2.* release where the bug is fixed? And if not, when can such a fixed
version be expected to be stable?
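
The command rewriting in point 2 amounts to prepending the wrapper's argv to the application's argv. A minimal sketch in C, assuming a hypothetical helper `wrap_argv` (it is not an odls function, and where exactly it would be called inside an odls component is left open):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Count the entries of a NULL-terminated argv (terminator excluded). */
static size_t count_args(char *const *argv)
{
    size_t n = 0;
    while (argv[n] != NULL) {
        n++;
    }
    return n;
}

/*
 * Hypothetical helper: build a new NULL-terminated argv of the form
 *     wrapper[0] wrapper[1] ... app_argv[0] app_argv[1] ...
 * Both inputs must be NULL-terminated. Only the pointer array is
 * allocated; the strings themselves are shared with the inputs.
 * The caller releases the result with free(). Returns NULL if the
 * allocation fails.
 */
char **wrap_argv(char *const *wrapper, char *const *app_argv)
{
    size_t nw, na;
    char **out;

    assert(wrapper != NULL && app_argv != NULL);

    nw = count_args(wrapper);
    na = count_args(app_argv);

    out = (char **)malloc((nw + na + 1) * sizeof(*out));
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, wrapper, nw * sizeof(*out));
    memcpy(out + nw, app_argv, na * sizeof(*out));
    out[nw + na] = NULL;
    return out;
}
```

With wrapper = {"my-program", "--flag", NULL} and app_argv = {"hostname", NULL}, the result is {"my-program", "--flag", "hostname", NULL}, which could then be handed to a POSIX call such as execvp(out[0], out) in place of the original argv.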

Thank you
--David

---------- Forwarded message ----------
From: Ralph Castain <rhc_at_[hidden]>
Date: Oct 17, 2007 11:22 PM
Subject: Re: [OMPI devel] Trying to get total procs num in odls framework
To: daviderukh_at_[hidden]
Cc: "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]>

Hi David

I could probably answer your questions better if I had a better
understanding of what you are trying to do. For example, looking in the
hostfile rds for the number of procs to be launched seems strange as the
functional role of the framework is to simply learn what nodes are
available.

It would also help to have some idea of what environment you are working in,
and how you configured the beast.

Please see comments below.
Ralph

On 10/17/07 2:47 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:

> Yo Ralph --
>
> Can you answer these questions?
>
> Begin forwarded message:
>
>> From: David Erukhimovich <daviderukh_at_[hidden]>
>> Date: October 14, 2007 5:08:45 PM EDT
>> To: devel_at_[hidden]
>> Subject: [OMPI devel] Trying to get total procs num in odls framework
>> Reply-To: Open MPI Developers <devel_at_[hidden]>
>>
>> Hello,
>> I have 2 questions:
>> 1. I am trying to get the total number of requested processes for the job
>> in 'hostfile' component in rds. I took the job object that was given as a
>> parameter, extracted the application objects and checked how many procs
>> each application has. The result in every run was 0. As I understand, this
>> variable is updated before the rds part. So what am I doing wrong?

Do you mean you took the jobid given to the hostfile RDS (which isn't an
object, but just a number) and did an orte_rmgr.get_app_context to get the
array of app_contexts? Is there some reason why you would want to do that
there?

Depending upon what the command line looks like, it is possible for the
number of procs to be zero - we allow that option and then fill in the
number later. If it was specified, though, we do insert the number in the
app_context object.

Maybe you could tell me what the command line looks like, the function call
you used to get the "application objects", and what field you were looking
at when you found zero?
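
To illustrate why a zero count is not necessarily an error, here is a minimal, self-contained sketch in C. It does not use the ORTE headers: `app_context_sketch_t` and `total_requested_procs` are hypothetical stand-ins, and the only assumption carried over from the discussion above is that a per-application count of zero means "not specified yet, to be filled in later".

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an app_context; only the field discussed here is modeled. */
typedef struct {
    const char *app;
    int num_procs;    /* 0 = not given on the command line (assumption) */
} app_context_sketch_t;

/*
 * Sum the requested process counts over all app contexts.
 * *unspecified receives the number of contexts whose count is still zero,
 * so the caller can tell "nothing requested" apart from "not filled in yet".
 */
int total_requested_procs(const app_context_sketch_t *apps, size_t num_apps,
                          size_t *unspecified)
{
    int total = 0;
    size_t i;

    assert(unspecified != NULL);
    *unspecified = 0;

    for (i = 0; i < num_apps; i++) {
        if (apps[i].num_procs == 0) {
            (*unspecified)++;           /* count will be filled in later */
        } else {
            total += apps[i].num_procs;
        }
    }
    return total;
}
```

A caller that sees a nonzero *unspecified knows the total is only a lower bound at that point in the launch sequence.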

>>
>> 2. I've discovered an undocumented framework - odls.

It wasn't exactly hidden...we haven't documented it because we are lazy and
the existing components cover every known environment (or so we thought).
;-)

Is there some special reason to want to create another one?

>> I've created a new
>> component for it. The problem is that there is no way to switch between
>> the default component and mine (--mca odls <my component> doesn't work).
>> Is there a way to switch between odls components (I saw bprocs there and
>> I guess it is used)?

Are you working on the trunk? What r level?

Reason I ask: I recently fixed a problem where the command line mca params
were not getting passed to the orteds. Your description looks like you
haven't picked up that change. If you have updated recently, and you still
can't get it to work, then we likely have a lingering problem.

If I read your subject line correctly, then I am somewhat puzzled. You can
look at the orte/mca/odls/base/odls_base_default_fns.c file, the
orte_odls_base_default_get_add_procs_data function and see where we get the
total number of procs in a job and how that is passed to the orteds. If you
have some new environment that the existing odls components can't handle,
then I would strongly suggest you at least use the default functions in the
base to provide as much support as possible as this will help you to keep
pace with changes in the system.

I would also welcome feedback on what you encountered that required a new
odls component - perhaps we can modify the base support functions to make it
fit within one of the existing components.

Thanks
Ralph

>>
>> Thank you,
>> --David