
Open MPI User's Mailing List Archives


Subject: [OMPI users] new map-by-obj has a problem
From: tmishima_at_[hidden]
Date: 2014-02-27 18:19:08


Hi Ralph, I'm afraid your new "map-by obj" causes another problem.

I get an overload message with this command line, as shown below:

mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2 -display-map ~/mis/openmpi/demos/myprog
--------------------------------------------------------------------------
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to: CORE
   Node: node05
   #processes: 2
   #cpus: 1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------

Then I added "-bind-to core:overload-allowed" to see what happens.

mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2 -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog
 Data for JOB [14398,1] offset 0

 ======================== JOB MAP ========================

 Data for node: node05 Num slots: 1 Max slots: 0 Num procs: 4
        Process OMPI jobid: [14398,1] App: 0 Process rank: 0
        Process OMPI jobid: [14398,1] App: 0 Process rank: 1
        Process OMPI jobid: [14398,1] App: 0 Process rank: 2
        Process OMPI jobid: [14398,1] App: 0 Process rank: 3

 Data for node: node06 Num slots: 1 Max slots: 0 Num procs: 4
        Process OMPI jobid: [14398,1] App: 0 Process rank: 4
        Process OMPI jobid: [14398,1] App: 0 Process rank: 5
        Process OMPI jobid: [14398,1] App: 0 Process rank: 6
        Process OMPI jobid: [14398,1] App: 0 Process rank: 7

 =============================================================
[node06.cluster:18443] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket
0[core 1[hwt 0]]: [B/B/./.][./././.]
[node05.cluster:20901] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket
0[core 1[hwt 0]]: [B/B/./.][./././.]
[node06.cluster:18443] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket
0[core 3[hwt 0]]: [././B/B][./././.]
[node05.cluster:20901] MCW rank 3 bound to socket 0[core 2[hwt 0]], socket
0[core 3[hwt 0]]: [././B/B][./././.]
[node06.cluster:18443] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket
0[core 1[hwt 0]]: [B/B/./.][./././.]
[node05.cluster:20901] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket
0[core 1[hwt 0]]: [B/B/./.][./././.]
[node06.cluster:18443] MCW rank 5 bound to socket 0[core 2[hwt 0]], socket
0[core 3[hwt 0]]: [././B/B][./././.]
[node05.cluster:20901] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket
0[core 3[hwt 0]]: [././B/B][./././.]
Hello world from process 4 of 8
Hello world from process 2 of 8
Hello world from process 6 of 8
Hello world from process 0 of 8
Hello world from process 5 of 8
Hello world from process 1 of 8
Hello world from process 7 of 8
Hello world from process 3 of 8
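
For reference, the demo program producing the "Hello world" lines above is not shown in this message; it is presumably along the lines of a standard MPI hello world such as this sketch (the actual source of ~/mis/openmpi/demos/myprog may differ):

```c
/* Minimal MPI hello world - a sketch of what myprog likely looks like;
   the real ~/mis/openmpi/demos/myprog source is not shown in this post. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    printf("Hello world from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```

The binding itself is decided entirely by mpirun's -map-by/-bind-to options; the program only reports its rank.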

When I add the "span" modifier (-map-by socket:pe=2,span), it works fine:

mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2,span -display-map ~/mis/openmpi/demos/myprog
 Data for JOB [14703,1] offset 0

 ======================== JOB MAP ========================

 Data for node: node05 Num slots: 1 Max slots: 0 Num procs: 4
        Process OMPI jobid: [14703,1] App: 0 Process rank: 0
        Process OMPI jobid: [14703,1] App: 0 Process rank: 2
        Process OMPI jobid: [14703,1] App: 0 Process rank: 1
        Process OMPI jobid: [14703,1] App: 0 Process rank: 3

 Data for node: node06 Num slots: 1 Max slots: 0 Num procs: 4
        Process OMPI jobid: [14703,1] App: 0 Process rank: 4
        Process OMPI jobid: [14703,1] App: 0 Process rank: 6
        Process OMPI jobid: [14703,1] App: 0 Process rank: 5
        Process OMPI jobid: [14703,1] App: 0 Process rank: 7

 =============================================================
[node06.cluster:18491] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket
0[core 3[hwt 0]]: [././B/B][./././.]
[node05.cluster:20949] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket
0[core 3[hwt 0]]: [././B/B][./././.]
[node06.cluster:18491] MCW rank 7 bound to socket 1[core 6[hwt 0]], socket
1[core 7[hwt 0]]: [./././.][././B/B]
[node05.cluster:20949] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket
1[core 7[hwt 0]]: [./././.][././B/B]
[node06.cluster:18491] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket
0[core 1[hwt 0]]: [B/B/./.][./././.]
[node05.cluster:20949] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket
0[core 1[hwt 0]]: [B/B/./.][./././.]
[node06.cluster:18491] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket
1[core 5[hwt 0]]: [./././.][B/B/./.]
[node05.cluster:20949] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket
1[core 5[hwt 0]]: [./././.][B/B/./.]
....

So byobj_span seems to be okay, and of course bynode and byslot should be fine as well. Could you take a look at orte_rmaps_rr_byobj again?

Regards,
Tetsuya Mishima