Hello All,
Over the past few weeks I have implemented ticket 1023 (https://svn.open-mpi.org/trac/ompi/ticket/1023).
In short, ticket 1023 extends the hostfile syntax so that a slot can be specified precisely: by its location in the node (as a virtual CPU ID or in socket:core notation) and/or by its rank in MCW (MPI_COMM_WORLD).
The code is in a temporary branch: https://svn.open-mpi.org/svn/ompi/tmp/sharon/
The changes are:
1. In the RAS base component:
a. Added a new list of orte_ras_proc_t structures.
b. Each orte_ras_proc_t structure contains 3 members: node_name, rank and cpu_list.
c. cpu_list is a string holding the slot list from the hostfile, i.e. if the SLOT token in the hostfile is SLOT=1@2:1,3:1-4, the cpu_list string is 2:1,3:1-4.
2. In the RDS hostfile component:
a. Added a new token, SLOT, to the lex parser.
b. The list of orte_ras_proc_t structures is filled according to the SLOT tokens in the hostfile.
3. In the RMAPS round robin component:
a. Added a new member to the orte_mapped_node_t structure: slot_list (similar to the slot list string in the orte_ras_proc_t structure).
b. orte_rmaps_rr_map now maps the job according to the hostfile ranks before mapping it by slot or by node.
c. orte_rmaps_rr_map arranges the MCW ranks according to the hostfile.
4. In the ODLS default module:
a. Added slot_list to orte_odls_default_get_add_procs_data.
b. Added slot_list to orte_odls_default_launch_local_procs.
c. Added a new member to the child structure: a cpu_set bitmap (for PLPA).
d. Added mapping of the slot_list string to the cpu_set bitmap in the child structure.
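To illustrate items 1 and 2, here is a minimal sketch of how a SLOT token value could be split into a rank and a cpu_list. It assumes the part before '@' is the MCW rank and the part after it is the slot list; the authoritative grammar is the lex file in the branch. ras_proc_sketch_t and parse_slot_token are hypothetical names, and the struct is a plain C stand-in for orte_ras_proc_t without the OPAL object and list machinery.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for orte_ras_proc_t (no OPAL objects). */
typedef struct {
    char node_name[64];
    int  rank;          /* MCW rank requested in the hostfile */
    char cpu_list[64];  /* slot list string, e.g. "2:1,3:1-4" */
} ras_proc_sketch_t;

/* Split a SLOT value of the assumed form "<rank>@<slot list>".
 * Returns 0 on success, -1 on malformed or oversized input. */
static int parse_slot_token(const char *node, const char *value,
                            ras_proc_sketch_t *proc)
{
    const char *at = strchr(value, '@');
    char *end;
    long rank;

    /* Require a non-empty rank before '@' and a non-empty list after it. */
    if (at == NULL || at == value || at[1] == '\0') {
        return -1;
    }

    /* The rank must be a plain non-negative integer ending exactly at '@'. */
    errno = 0;
    rank = strtol(value, &end, 10);
    if (errno != 0 || end != at || rank < 0 || rank > INT_MAX) {
        return -1;
    }

    /* Reject strings that would not fit in the fixed-size buffers. */
    if (strlen(node) >= sizeof(proc->node_name) ||
        strlen(at + 1) >= sizeof(proc->cpu_list)) {
        return -1;
    }

    strcpy(proc->node_name, node);
    proc->rank = (int)rank;
    strcpy(proc->cpu_list, at + 1);
    return 0;
}
```

With the example from item 1c, parsing "1@2:1,3:1-4" for a node yields rank 1 and the cpu_list string "2:1,3:1-4"; a value without '@' is rejected.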
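For item 4d, a sketch of turning a slot list string into a CPU bitmap. Two simplifying assumptions are not taken from the branch: a uint64_t stands in for the PLPA cpu_set (so at most 64 CPUs), and socket:core pairs are translated with a hypothetical uniform numbering, cpu = socket * cores_per_socket + core. A real implementation would query the node's actual topology (e.g. through PLPA) instead. Items without ':' are treated as virtual CPU IDs, and each item may be a single number or an N-M range. parse_range and slot_list_to_mask are hypothetical names.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "N" or "N-M" at *p, advancing *p past it. Returns 0 on success. */
static int parse_range(const char **p, long *lo, long *hi)
{
    char *end;

    if (!isdigit((unsigned char)**p)) return -1;
    *lo = strtol(*p, &end, 10);
    *hi = *lo;
    if (*end == '-') {
        if (!isdigit((unsigned char)end[1])) return -1;
        *hi = strtol(end + 1, &end, 10);
    }
    if (*hi < *lo) return -1;
    *p = end;
    return 0;
}

/* Convert a slot list such as "0,2-3" (virtual CPU IDs) or "2:1,3:1-4"
 * (socket:core) into a 64-bit CPU mask. Returns 0 on success, -1 on error. */
static int slot_list_to_mask(const char *slot_list, long cores_per_socket,
                             uint64_t *mask)
{
    const char *p = slot_list;

    *mask = 0;
    if (*p == '\0' || cores_per_socket <= 0 || cores_per_socket > 64) return -1;

    while (*p != '\0') {
        long lo, hi, base = 0;

        if (parse_range(&p, &lo, &hi) != 0) return -1;
        if (*p == ':') {
            /* socket:core item: the number just parsed is the socket */
            if (lo != hi || lo >= 64) return -1;  /* no socket ranges */
            base = lo * cores_per_socket;         /* assumed uniform numbering */
            p++;
            if (parse_range(&p, &lo, &hi) != 0) return -1;
            if (hi >= cores_per_socket) return -1;
        }
        for (long c = lo; c <= hi; c++) {
            if (base + c >= 64) return -1;        /* beyond the 64-bit mask */
            *mask |= UINT64_C(1) << (base + c);
        }
        if (*p == ',') {
            if (*++p == '\0') return -1;          /* trailing comma */
        } else if (*p != '\0') {
            return -1;                            /* unexpected character */
        }
    }
    return 0;
}
```

Under the 8-cores-per-socket assumption, "2:1,3:1-4" sets CPUs 17 and 25 through 28, while "0,2-3" sets CPUs 0, 2 and 3.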
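For items 3b and 3c, a simplified sketch of rank-first mapping: ranks pinned in the hostfile are placed on their nodes first, then the remaining ranks fill the free slots in by-slot order. rank_request_t and map_ranks are hypothetical, nodes are plain array indices, and unlike a real mapper this sketch simply fails instead of oversubscribing when slots run out. A by-node variant would cycle across nodes in pass 2 instead of filling each node in turn.

```c
#include <stdlib.h>

typedef struct {
    int rank;  /* MCW rank pinned in the hostfile */
    int node;  /* index of the node it is pinned to */
} rank_request_t;

/* Fill node_of[r] with the node index for each rank r.
 * Returns 0 on success, -1 on a bad/duplicate request or too few slots. */
static int map_ranks(int num_procs, const int *slots, int num_nodes,
                     const rank_request_t *req, int num_req, int *node_of)
{
    int *used = calloc((size_t)num_nodes, sizeof *used);
    int n = 0, rc = 0;

    if (used == NULL) return -1;
    for (int r = 0; r < num_procs; r++) node_of[r] = -1;

    /* Pass 1: honour the ranks given explicitly in the hostfile. */
    for (int i = 0; i < num_req && rc == 0; i++) {
        int r = req[i].rank, nd = req[i].node;
        if (r < 0 || r >= num_procs || nd < 0 || nd >= num_nodes ||
            node_of[r] != -1 || used[nd] >= slots[nd]) {
            rc = -1;  /* out of range, duplicate rank, or node already full */
        } else {
            node_of[r] = nd;
            used[nd]++;
        }
    }

    /* Pass 2: give every remaining rank the next free slot (by slot). */
    for (int r = 0; r < num_procs && rc == 0; r++) {
        if (node_of[r] != -1) continue;
        while (n < num_nodes && used[n] >= slots[n]) n++;
        if (n == num_nodes) {
            rc = -1;  /* not enough slots; no oversubscription here */
        } else {
            node_of[r] = n;
            used[n]++;
        }
    }

    free(used);
    return rc;
}
```

With two nodes of two slots each, four processes and rank 3 pinned to node 0, the result is node_of = {0, 1, 1, 0}: rank 3 takes its slot first, and ranks 0 to 2 fill what is left.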
For more details you can browse the code.
I would like to merge these changes into the trunk as soon as possible. As I understand from Ralph Castain's emails, Open RTE will go through a lot of changes in the near future, and since this is a relatively small change, I would like to merge it before the big change.
Any comments?