Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] new map-by-obj has a problem
From: tmishima_at_[hidden]
Date: 2014-02-27 19:10:37


Hi Ralph, this is just for your information.

I tried restoring the previous orte_rmaps_rr_byobj. Then I get the result
below with this command line:

mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2
-display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog
 Data for JOB [31184,1] offset 0

 ======================== JOB MAP ========================

 Data for node: node05 Num slots: 1 Max slots: 0 Num procs: 7
        Process OMPI jobid: [31184,1] App: 0 Process rank: 0
        Process OMPI jobid: [31184,1] App: 0 Process rank: 2
        Process OMPI jobid: [31184,1] App: 0 Process rank: 4
        Process OMPI jobid: [31184,1] App: 0 Process rank: 6
        Process OMPI jobid: [31184,1] App: 0 Process rank: 1
        Process OMPI jobid: [31184,1] App: 0 Process rank: 3
        Process OMPI jobid: [31184,1] App: 0 Process rank: 5

 Data for node: node06 Num slots: 1 Max slots: 0 Num procs: 1
        Process OMPI jobid: [31184,1] App: 0 Process rank: 7

 =============================================================
[node06.cluster:18857] MCW rank 7 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
[node05.cluster:21399] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
[node05.cluster:21399] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
[node05.cluster:21399] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
[node05.cluster:21399] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
[node05.cluster:21399] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
[node05.cluster:21399] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
[node05.cluster:21399] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
....
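
For reference, with "-map-by socket:pe=2" on these 2-socket, 4-core/socket
nodes, each rank is bound to two cores and consecutive ranks alternate
between the sockets: rank 0 -> socket 0, cores 0-1; rank 1 -> socket 1,
cores 4-5; rank 2 -> socket 0, cores 2-3; rank 3 -> socket 1, cores 6-7.
A node can hold only four such ranks without overloading, so packing seven
ranks onto node05 forces ranks 4-6 to share cores with ranks 0-2 (hence
the need for "overload-allowed" here).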

Then I added "-hostfile pbs_hosts", and the result is:

[mishima_at_manage work]$ cat pbs_hosts
node05 slots=8
node06 slots=8
[mishima_at_manage work]$ mpirun -np 8 -hostfile ~/work/pbs_hosts
-report-bindings -map-by socket:pe=2 -display-map
~/mis/openmpi/demos/myprog
 Data for JOB [30254,1] offset 0

 ======================== JOB MAP ========================

 Data for node: node05 Num slots: 8 Max slots: 0 Num procs: 4
        Process OMPI jobid: [30254,1] App: 0 Process rank: 0
        Process OMPI jobid: [30254,1] App: 0 Process rank: 2
        Process OMPI jobid: [30254,1] App: 0 Process rank: 1
        Process OMPI jobid: [30254,1] App: 0 Process rank: 3

 Data for node: node06 Num slots: 8 Max slots: 0 Num procs: 4
        Process OMPI jobid: [30254,1] App: 0 Process rank: 4
        Process OMPI jobid: [30254,1] App: 0 Process rank: 6
        Process OMPI jobid: [30254,1] App: 0 Process rank: 5
        Process OMPI jobid: [30254,1] App: 0 Process rank: 7

 =============================================================
[node05.cluster:21501] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
[node05.cluster:21501] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
[node05.cluster:21501] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
[node05.cluster:21501] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
[node06.cluster:18935] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
[node06.cluster:18935] MCW rank 7 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
[node06.cluster:18935] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
[node06.cluster:18935] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
....
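
With the hostfile supplying slots=8 on each node, the mapper balances the
eight ranks four and four across the two nodes instead of packing seven of
them onto node05 as in the -host run above, and no core ends up overloaded.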

I think the previous version's behavior is closer to what I expect.

Tetsuya

> They have 4 cores/socket and 2 sockets, for a total of 4 x 2 = 8 cores each.
>
> Here is the output of lstopo.
>
> [mishima_at_manage round_robin]$ rsh node05
> Last login: Tue Feb 18 15:10:15 from manage
> [mishima_at_node05 ~]$ lstopo
> Machine (32GB)
>   NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (6144KB)
>     L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
>     L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
>     L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
>     L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
>   NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (6144KB)
>     L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
>     L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
>     L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
>     L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
> ....
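>
> For completeness, the socket/core counts the mapper works from can also be
> read programmatically with the hwloc C API that Open MPI uses internally.
> A minimal sketch, assuming hwloc 1.x headers are installed:
>
>   #include <stdio.h>
>   #include <hwloc.h>
>
>   int main(void)
>   {
>       hwloc_topology_t topo;
>
>       /* sketch: discover the local machine's topology, as lstopo does */
>       hwloc_topology_init(&topo);
>       hwloc_topology_load(topo);
>
>       /* count sockets and cores; on node05 this should print 2 and 8 */
>       printf("sockets: %d\n",
>              hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_SOCKET));
>       printf("cores:   %d\n",
>              hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE));
>
>       hwloc_topology_destroy(topo);
>       return 0;
>   }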
>
> I focused on byobj_span and bynode. I didn't notice that byobj was
> modified; sorry.
>
> Tetsuya
>
> > Hmmm... what does your node look like again (sockets and cores)?
> >
> > On Feb 27, 2014, at 3:19 PM, tmishima_at_[hidden] wrote:
> >
> > >
> > > Hi Ralph, I'm afraid your new "map-by obj" causes another problem.
> > >
> > > I get an overload message with this command line, as shown below:
> > >
> > > mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2
> > > -display-map ~/mis/openmpi/demos/myprog
> > >
> > > --------------------------------------------------------------------------
> > > A request was made to bind to that would result in binding more
> > > processes than cpus on a resource:
> > >
> > >    Bind to:     CORE
> > >    Node:        node05
> > >    #processes:  2
> > >    #cpus:       1
> > >
> > > You can override this protection by adding the "overload-allowed"
> > > option to your binding directive.
> > >
> > > --------------------------------------------------------------------------
> > >
> > > Then, I added "-bind-to core:overload-allowed" to see what happens.
> > >
> > > mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2
> > > -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog
> > > Data for JOB [14398,1] offset 0
> > >
> > > ======================== JOB MAP ========================
> > >
> > > Data for node: node05 Num slots: 1 Max slots: 0 Num procs: 4
> > > Process OMPI jobid: [14398,1] App: 0 Process rank: 0
> > > Process OMPI jobid: [14398,1] App: 0 Process rank: 1
> > > Process OMPI jobid: [14398,1] App: 0 Process rank: 2
> > > Process OMPI jobid: [14398,1] App: 0 Process rank: 3
> > >
> > > Data for node: node06 Num slots: 1 Max slots: 0 Num procs: 4
> > > Process OMPI jobid: [14398,1] App: 0 Process rank: 4
> > > Process OMPI jobid: [14398,1] App: 0 Process rank: 5
> > > Process OMPI jobid: [14398,1] App: 0 Process rank: 6
> > > Process OMPI jobid: [14398,1] App: 0 Process rank: 7
> > >
> > > =============================================================
> > > [node06.cluster:18443] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > > [node05.cluster:20901] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > > [node06.cluster:18443] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> > > [node05.cluster:20901] MCW rank 3 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> > > [node06.cluster:18443] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > > [node05.cluster:20901] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > > [node06.cluster:18443] MCW rank 5 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> > > [node05.cluster:20901] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
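> > >
> > > Note that in this map socket 1 is never touched: ranks 0 and 2 share
> > > cores 0-1 and ranks 1 and 3 share cores 2-3 on node05 (likewise ranks
> > > 4/6 and 5/7 on node06), which is exactly the overload the protection
> > > message complained about.
> > >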
> > > Hello world from process 4 of 8
> > > Hello world from process 2 of 8
> > > Hello world from process 6 of 8
> > > Hello world from process 0 of 8
> > > Hello world from process 5 of 8
> > > Hello world from process 1 of 8
> > > Hello world from process 7 of 8
> > > Hello world from process 3 of 8
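> > >
> > > For context, the myprog source isn't included in this thread; a minimal
> > > MPI hello-world that would produce this output looks like:
> > >
> > >   /* stand-in for myprog; the actual source is not shown in the thread */
> > >   #include <stdio.h>
> > >   #include <mpi.h>
> > >
> > >   int main(int argc, char **argv)
> > >   {
> > >       int rank, size;
> > >
> > >       MPI_Init(&argc, &argv);
> > >       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> > >       MPI_Comm_size(MPI_COMM_WORLD, &size);
> > >       printf("Hello world from process %d of %d\n", rank, size);
> > >       MPI_Finalize();
> > >       return 0;
> > >   }
> > >
> > > The overload is independent of what the launched program does, since the
> > > binding is applied before the program starts.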
> > >
> > > When I add the "span" modifier ("-map-by socket:pe=2,span"), it works fine:
> > >
> > > mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2,span
> > > -display-map ~/mis/openmpi/demos/myprog
> > > Data for JOB [14703,1] offset 0
> > >
> > > ======================== JOB MAP ========================
> > >
> > > Data for node: node05 Num slots: 1 Max slots: 0 Num procs: 4
> > > Process OMPI jobid: [14703,1] App: 0 Process rank: 0
> > > Process OMPI jobid: [14703,1] App: 0 Process rank: 2
> > > Process OMPI jobid: [14703,1] App: 0 Process rank: 1
> > > Process OMPI jobid: [14703,1] App: 0 Process rank: 3
> > >
> > > Data for node: node06 Num slots: 1 Max slots: 0 Num procs: 4
> > > Process OMPI jobid: [14703,1] App: 0 Process rank: 4
> > > Process OMPI jobid: [14703,1] App: 0 Process rank: 6
> > > Process OMPI jobid: [14703,1] App: 0 Process rank: 5
> > > Process OMPI jobid: [14703,1] App: 0 Process rank: 7
> > >
> > > =============================================================
> > > [node06.cluster:18491] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> > > [node05.cluster:20949] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> > > [node06.cluster:18491] MCW rank 7 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> > > [node05.cluster:20949] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> > > [node06.cluster:18491] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > > [node05.cluster:20949] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > > [node06.cluster:18491] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> > > [node05.cluster:20949] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> > > ....
> > >
> > > So, byobj_span would be okay. Of course, bynode and byslot should be okay.
> > > Could you take a look at orte_rmaps_rr_byobj again?
> > >
> > > Regards,
> > > Tetsuya Mishima
> > >