Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RTE node allocation component
From: Alex Margolin (alex.margolin_at_[hidden])
Date: 2012-04-14 16:55:02


As to the old version: I'm working in parallel on patches for the 1.6
branch and the trunk, and the two patches are almost identical.
There is one minor difference in my patch for the RAS: in the trunk I used
the preexisting total_slots_alloc, while in 1.6 I added it to
orte_ras_base (exactly where it is located in the trunk). I admit this is
not what the author of the orte_ras_base data struct intended, or maybe
even what the RAS component is meant for in general, but I see no
other way to implement it now...
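
For comparison, here is a minimal sketch of the alternative to a shared field: handing the slot count to the module per job, which would sidestep the concern raised in the quoted reply below about several jobs launching at once. Everything in it (job_request_t, allocate_fn_t, toy_allocate, base_allocate, the pool size of 16) is a hypothetical stand-in made up for illustration, not ORTE code or the RAS API.

```c
#include <stddef.h>

/* Hypothetical per-job request; a stand-in, not ORTE's orte_job_t. */
typedef struct {
    unsigned jobid;
    int slots_wanted;
} job_request_t;

/* Hypothetical module entry point: the slot count arrives as an argument
 * instead of being read from a global shared by all jobs. */
typedef int (*allocate_fn_t)(int slots_wanted, int *slots_granted);

/* Toy module: grants what was asked for, capped by a made-up pool size. */
int toy_allocate(int slots_wanted, int *slots_granted)
{
    const int pool = 16; /* assumed cluster capacity, for illustration only */
    if (slots_wanted <= 0 || slots_granted == NULL) {
        return -1;
    }
    *slots_granted = slots_wanted < pool ? slots_wanted : pool;
    return 0;
}

/* Base-level wrapper: reads the count from this job's own record and
 * forwards it, so two concurrent jobs never overwrite each other's value. */
int base_allocate(const job_request_t *job, allocate_fn_t module,
                  int *slots_granted)
{
    if (job == NULL || module == NULL) {
        return -1;
    }
    return module(job->slots_wanted, slots_granted);
}
```

Because the count travels with the call, each job's request stays independent of any other job being set up at the same moment.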

What I've written for RAS (attached is my current patch for the 1.6
branch, incl. the BTL and ODLS modules previously sent here) is a module
which does two things (for mpirun -n X foo):
1. Waits for X slots to become available somewhere in the cluster (optional)
2. Creates the allocation composed of the X best machines to use
- This requires the RAS module to know the number of slots to allocate
in advance... is there a better way to do it? (in 1.6/trunk?)
I tried to access the orte_job_t struct using my jobid from inside the
RAS module, but that struct isn't populated yet at that time.
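
To make the two steps concrete, here is a rough sketch of the selection logic only. It is not taken from the attached patch: node_info_t, the score field, and both function names are hypothetical, and it assumes "best" means "highest score from some ranking tool".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node record; not an ORTE type. */
typedef struct {
    const char *name;
    int free_slots;  /* slots currently unreserved on this node */
    double score;    /* higher is better (assumed ranking metric) */
} node_info_t;

/* qsort comparator: descending score. */
static int by_score_desc(const void *a, const void *b)
{
    const node_info_t *x = a;
    const node_info_t *y = b;
    if (x->score < y->score) return 1;
    if (x->score > y->score) return -1;
    return 0;
}

/* Step 1 condition: total free slots over all candidate nodes. */
int total_free_slots(const node_info_t *nodes, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += nodes[i].free_slots;
    }
    return total;
}

/* Step 2: sort nodes in place by score and take slots from the best ones
 * until 'wanted' slots are covered. taken[i] receives the slots taken from
 * nodes[i] (in sorted order). Returns the number of nodes used, or -1 if
 * the cluster does not currently have enough free slots. */
int pick_best_nodes(node_info_t *nodes, size_t n, int wanted, int *taken)
{
    assert(wanted > 0);
    if (total_free_slots(nodes, n) < wanted) {
        return -1; /* a caller could sleep and retry here (step 1) */
    }
    qsort(nodes, n, sizeof(nodes[0]), by_score_desc);
    int used = 0;
    for (size_t i = 0; i < n; i++) {
        taken[i] = 0;
    }
    for (size_t i = 0; i < n && wanted > 0; i++) {
        int take = nodes[i].free_slots < wanted ? nodes[i].free_slots : wanted;
        if (take > 0) {
            taken[i] = take;
            wanted -= take;
            used++;
        }
    }
    return used;
}
```

Note that pick_best_nodes cannot even start without 'wanted', which is exactly the value the RAS module has no clean way to obtain.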

Thanks,
Alex

P.S. I'm preparing a patch for both the 1.6 branch and the trunk because I
want to do some benchmarking (not saying trunk is bad for this purpose) and
I want it to be available in the long run. Am I missing something here?
I hope I'll get the contributor paperwork signed so I can commit rather
than working on my laptop...

On 04/13/2012 07:43 PM, Ralph Castain wrote:
> Looks like you are using an old version - the trunk RAS has changed a bit. I'll shortly be implementing further changes to support dynamic allocation requests that might be relevant here as well.
>
> Adding job data to the RAS base isn't a good idea - remember, multiple jobs can be launching at the same time!
>
> On Apr 13, 2012, at 10:07 AM, Alex Margolin wrote:
>
>> Hi,
>>
>> The next component I'm writing is a component for allocating nodes to
>> run the processes of an MPI job.
>> Suppose I have a "getbestnode" executable which not only tells me the
>> best location for spawning a new process,
>> but it also reserves the space (for some time), so that every time I run
>> it I get different results (as the best cores are already reserved).
>>
>> I thought I should write a component under orte/mca/ras, similar to
>> loadleveler, but the problem is that I can't determine inside the module
>> the number of slots it is required to allocate. It gets a list to fill in
>> as a parameter, and I guess it assumes I somehow know how many processes
>> are to be run because the allocation was done externally and now I'm just
>> asking the allocator for the list.
>>
>> A related location, the rmaps, has this information (and much more), but
>> it doesn't look like a good location for such a module since it maps
>> already allocated resources, and has a lot of irrelevant code in this case.
>>
>> Maybe the answer is to change the base module a bit, to contain this
>> information? It could be used as a decent sanity check for other modules
>> - making sure the external allocation fits the number of processes we
>> intend to run. Maybe orte_ras_base_allocate(orte_job_t *jdata) in
>> ras_base_allocate.c can store the relevant information from jdata in
>> orte_ras_base? In the long run it can become a parameter passed to the
>> ras components, but for backward compatibility the global will do for now.
>>
>> Thanks,
>> Alex
>>
>> P.S. An RDS component is discussed at length in ras.h, yet it is no
>> longer available, right?
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel