Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RTE node allocation component
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-04-13 12:43:28

Looks like you are using an old version; the trunk RAS has changed a bit. I'll shortly be implementing further changes to support dynamic allocation requests, which might be relevant here as well.

Adding job data to the RAS base isn't a good idea. Remember, multiple jobs can be launching at the same time, so a single global in the base would hold whichever job wrote it last!

On Apr 13, 2012, at 10:07 AM, Alex Margolin wrote:

> Hi,
> The next component I'm writing is a component for allocating nodes to
> run the processes of an MPI job.
> Suppose I have a "getbestnode" executable which not only tells me the
> best location for spawning a new process,
> but also reserves the space (for some time), so that every time I run
> it I get different results (as the best cores are already reserved).
> I thought I should write a component under orte/mca/ras, similar to
> loadleveler, but the problem is that inside the module I can't determine
> how many slots to allocate. The module gets a list to fill in as a
> parameter, and I guess it assumes I already know how many processes
> will run, because the allocation was done externally and I'm just
> asking the allocator for the list.
> A related framework, rmaps, has this information (and much more), but
> it doesn't look like a good place for such a module, since it maps
> already-allocated resources and contains a lot of code that is
> irrelevant in this case.
> Maybe the answer is to change the base module a bit to contain this
> information? It could serve as a decent sanity check for other modules,
> making sure the external allocation fits the number of processes we
> intend to run. Maybe orte_ras_base_allocate(orte_job_t *jdata) in
> ras_base_allocate.c could store the relevant information from jdata in
> orte_ras_base? In the long run it could become a parameter passed to the
> ras components, but for backward compatibility a global will do for now.
> Thanks,
> Alex
> P.S. An RDS component is mentioned at length in ras.h, yet it is no
> longer available, right?