
Subject: Re: [OMPI users] new map-by-obj has a problem
From: tmishima_at_[hidden]
Date: 2014-02-28 18:43:01


Hi Ralph, I'm a little bit late for your release.

I found a minor mistake in byobj_span: an integer casting problem.

--- rmaps_rr_mappers.30892.c 2014-03-01 08:31:50 +0900
+++ rmaps_rr_mappers.c 2014-03-01 08:33:22 +0900
@@ -689,7 +689,7 @@
     }

     /* compute how many objs need an extra proc */
-    if (0 > (nxtra_objs = app->num_procs - (navg * nobjs))) {
+    if (0 > (nxtra_objs = (int)app->num_procs - (navg * (int)nobjs))) {
         nxtra_objs = 0;
     }
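
For reference, here is a small self-contained example of why the casts
matter (the types here are my assumption for illustration, not the actual
ORTE declarations): if num_procs and nobjs are unsigned, the subtraction
wraps around instead of going negative, so the "0 >" guard can never fire.

#include <stdio.h>

int main(void)
{
    unsigned int num_procs = 8;    /* stands in for app->num_procs      */
    unsigned int nobjs     = 2;    /* number of objects, e.g. sockets   */
    int          navg      = 5;    /* average procs per object          */

    /* unsigned arithmetic: 8 - 10 wraps to a huge value, never < 0 */
    int without_casts = (0 > (num_procs - (navg * nobjs)));

    /* signed arithmetic: 8 - 10 = -2, correctly detected as negative */
    int with_casts = (0 > ((int)num_procs - (navg * (int)nobjs)));

    printf("without casts: %d, with casts: %d\n", without_casts, with_casts);
    return 0;
}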

Tetsuya

> Please take a look at https://svn.open-mpi.org/trac/ompi/ticket/4317
>
>
> On Feb 27, 2014, at 8:13 PM, tmishima_at_[hidden] wrote:
>
> >
> >
> > Hi Ralph, I can't operate our cluster for a few days, sorry.
> >
> > But now, I'm narrowing down the cause by browsing the source code.
> >
> > My best guess is the line 529. The opal_hwloc_base_get_obj_by_type will
> > reset the object pointer to the first one when you move on to the next
> > node.
> >
> > 529     if (NULL == (obj = opal_hwloc_base_get_obj_by_type(node->topology,
> >                          target, cache_level, i, OPAL_HWLOC_AVAILABLE))) {
> > 530         ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
> > 531         return ORTE_ERR_NOT_FOUND;
> > 532     }
> >
> > If node->slots = 1, then nprocs is set to 1 in the second pass:
> >
> > 495     nprocs = (node->slots - node->slots_inuse) /
> >             orte_rmaps_base.cpus_per_rank;
> > 496     if (nprocs < 1) {
> > 497         if (second_pass) {
> > 498             /* already checked for oversubscription permission,
> > 499              * so at least put one proc on it
> > 500              */
> > 501             nprocs = 1;
> >
> > Therefore, opal_hwloc_base_get_obj_by_type is called only once per visit
> > to each node, which means the object we get is always the first one.
> >
> > It's not elegant, but I guess you need dummy calls of
> > opal_hwloc_base_get_obj_by_type to move the object pointer to the right
> > place, or you need to modify opal_hwloc_base_get_obj_by_type itself,
> > as in the sketch below.
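> >
> > To illustrate what I mean (this is only a rough sketch, not the actual
> > rmaps code; the per-node counter "nobjs_used" is a field I made up for
> > the example), the index passed to opal_hwloc_base_get_obj_by_type could
> > be remembered per node, so the returned object advances each time the
> > node is revisited:
> >
> >     /* hypothetical per-node counter, starts at 0 */
> >     unsigned int idx = node->nobjs_used;
> >     if (NULL == (obj = opal_hwloc_base_get_obj_by_type(node->topology,
> >                          target, cache_level, idx, OPAL_HWLOC_AVAILABLE))) {
> >         ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
> >         return ORTE_ERR_NOT_FOUND;
> >     }
> >     node->nobjs_used++;   /* next visit gets the next object */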
> >
> > Tetsuya
> >
> >> I'm having trouble seeing why it is failing, so I added some more debug
> >> output. Could you run the failure case again with -mca
> >> rmaps_base_verbose 10?
> >>
> >> Thanks
> >> Ralph
> >>
> >> On Feb 27, 2014, at 6:11 PM, tmishima_at_[hidden] wrote:
> >>
> >>>
> >>>
> >>> Just checking the difference, nothing of particular significance...
> >>>
> >>> Anyway, I guess it's due to the behavior when the slot count is missing
> >>> (regarded as slots=1) and the node is unintentionally oversubscribed.
> >>>
> >>> I'm going out now, so I can't verify it quickly. If I provide the
> >>> correct slot counts, it will work, I guess. What do you think?
> >>>
> >>> Tetsuya
> >>>
> >>>> "restore" in what sense?
> >>>>
> >>>> On Feb 27, 2014, at 4:10 PM, tmishima_at_[hidden] wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>> Hi Ralph, this is just for your information.
> >>>>>
> >>>>> I tried to restore the previous orte_rmaps_rr_byobj. Then I get the
> >>>>> result below with this command line:
> >>>>>
> >>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2 -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog
> >>>>> Data for JOB [31184,1] offset 0
> >>>>>
> >>>>> ======================== JOB MAP ========================
> >>>>>
> >>>>> Data for node: node05 Num slots: 1 Max slots: 0 Num procs: 7
> >>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 0
> >>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 2
> >>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 4
> >>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 6
> >>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 1
> >>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 3
> >>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 5
> >>>>>
> >>>>> Data for node: node06 Num slots: 1 Max slots: 0 Num procs: 1
> >>>>> Process OMPI jobid: [31184,1] App: 0 Process rank: 7
> >>>>>
> >>>>> =============================================================
> >>>>> [node06.cluster:18857] MCW rank 7 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>> [node05.cluster:21399] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> >>>>> [node05.cluster:21399] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>> [node05.cluster:21399] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> >>>>> [node05.cluster:21399] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>> [node05.cluster:21399] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>> [node05.cluster:21399] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> >>>>> [node05.cluster:21399] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>> ....
> >>>>>
> >>>>>
> >>>>> Then I add "-hostfile pbs_hosts" and the result is:
> >>>>>
> >>>>> [mishima_at_manage work]$ cat pbs_hosts
> >>>>> node05 slots=8
> >>>>> node06 slots=8
> >>>>> [mishima_at_manage work]$ mpirun -np 8 -hostfile ~/work/pbs_hosts
> >>>>> -report-bindings -map-by socket:pe=2 -display-map
> >>>>> ~/mis/openmpi/demos/myprog
> >>>>> Data for JOB [30254,1] offset 0
> >>>>>
> >>>>> ======================== JOB MAP ========================
> >>>>>
> >>>>> Data for node: node05 Num slots: 8 Max slots: 0 Num procs: 4
> >>>>> Process OMPI jobid: [30254,1] App: 0 Process rank: 0
> >>>>> Process OMPI jobid: [30254,1] App: 0 Process rank: 2
> >>>>> Process OMPI jobid: [30254,1] App: 0 Process rank: 1
> >>>>> Process OMPI jobid: [30254,1] App: 0 Process rank: 3
> >>>>>
> >>>>> Data for node: node06 Num slots: 8 Max slots: 0 Num procs: 4
> >>>>> Process OMPI jobid: [30254,1] App: 0 Process rank: 4
> >>>>> Process OMPI jobid: [30254,1] App: 0 Process rank: 6
> >>>>> Process OMPI jobid: [30254,1] App: 0 Process rank: 5
> >>>>> Process OMPI jobid: [30254,1] App: 0 Process rank: 7
> >>>>>
> >>>>> =============================================================
> >>>>> [node05.cluster:21501] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>> [node05.cluster:21501] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> >>>>> [node05.cluster:21501] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>> [node05.cluster:21501] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> >>>>> [node06.cluster:18935] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>> [node06.cluster:18935] MCW rank 7 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> >>>>> [node06.cluster:18935] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>> [node06.cluster:18935] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> >>>>> ....
> >>>>>
> >>>>>
> >>>>> I think the previous version's behavior would be closer to what I expect.
> >>>>>
> >>>>> Tetsuya
> >>>>>
> >>>>>> They each have 4 cores/socket and 2 sockets, so 4 x 2 = 8 cores in total.
> >>>>>>
> >>>>>> Here is the output of lstopo.
> >>>>>>
> >>>>>> [mishima_at_manage round_robin]$ rsh node05
> >>>>>> Last login: Tue Feb 18 15:10:15 from manage
> >>>>>> [mishima_at_node05 ~]$ lstopo
> >>>>>> Machine (32GB)
> >>>>>>   NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (6144KB)
> >>>>>>     L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
> >>>>>>     L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
> >>>>>>     L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
> >>>>>>     L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
> >>>>>>   NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (6144KB)
> >>>>>>     L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
> >>>>>>     L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
> >>>>>>     L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
> >>>>>>     L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
> >>>>>> ....
> >>>>>>
> >>>>>> I focused on byobj_span and bynode. I didn't notice that byobj was
> >>>>>> modified, sorry.
> >>>>>>
> >>>>>> Tetsuya
> >>>>>>
> >>>>>>> Hmmm..what does your node look like again (sockets and cores)?
> >>>>>>>
> >>>>>>> On Feb 27, 2014, at 3:19 PM, tmishima_at_[hidden] wrote:
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Hi Ralph, I'm afraid your new "map-by obj" causes another problem.
> >>>>>>>>
> >>>>>>>> I get an overload message with this command line, as shown below:
> >>>>>>>>
> >>>>>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2 -display-map ~/mis/openmpi/demos/myprog
> >>>>>>>>
> >>>>>>>> --------------------------------------------------------------------------
> >>>>>>>> A request was made to bind to that would result in binding more
> >>>>>>>> processes than cpus on a resource:
> >>>>>>>>
> >>>>>>>>    Bind to:     CORE
> >>>>>>>>    Node:        node05
> >>>>>>>>    #processes:  2
> >>>>>>>>    #cpus:       1
> >>>>>>>>
> >>>>>>>> You can override this protection by adding the "overload-allowed"
> >>>>>>>> option to your binding directive.
> >>>>>>>> --------------------------------------------------------------------------
> >>>>>>>>
> >>>>>>>> Then, I add "-bind-to core:overload-allowed" to see what happens.
> >>>>>>>>
> >>>>>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2 -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog
> >>>>>>>> Data for JOB [14398,1] offset 0
> >>>>>>>>
> >>>>>>>> ======================== JOB MAP ========================
> >>>>>>>>
> >>>>>>>> Data for node: node05 Num slots: 1 Max slots: 0 Num procs: 4
> >>>>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 0
> >>>>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 1
> >>>>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 2
> >>>>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 3
> >>>>>>>>
> >>>>>>>> Data for node: node06 Num slots: 1 Max slots: 0 Num procs: 4
> >>>>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 4
> >>>>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 5
> >>>>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 6
> >>>>>>>> Process OMPI jobid: [14398,1] App: 0 Process rank: 7
> >>>>>>>>
> >>>>>>>> =============================================================
> >>>>>>>> [node06.cluster:18443] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>>>>> [node05.cluster:20901] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>>>>> [node06.cluster:18443] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>>>>> [node05.cluster:20901] MCW rank 3 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>>>>> [node06.cluster:18443] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>>>>> [node05.cluster:20901] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>>>>> [node06.cluster:18443] MCW rank 5 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>>>>> [node05.cluster:20901] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>>>>> Hello world from process 4 of 8
> >>>>>>>> Hello world from process 2 of 8
> >>>>>>>> Hello world from process 6 of 8
> >>>>>>>> Hello world from process 0 of 8
> >>>>>>>> Hello world from process 5 of 8
> >>>>>>>> Hello world from process 1 of 8
> >>>>>>>> Hello world from process 7 of 8
> >>>>>>>> Hello world from process 3 of 8
> >>>>>>>>
> >>>>>>>> When I add "map-by obj:span", it works fine:
> >>>>>>>>
> >>>>>>>> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2,span -display-map ~/mis/openmpi/demos/myprog
> >>>>>>>> Data for JOB [14703,1] offset 0
> >>>>>>>>
> >>>>>>>> ======================== JOB MAP ========================
> >>>>>>>>
> >>>>>>>> Data for node: node05 Num slots: 1 Max slots: 0 Num procs: 4
> >>>>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 0
> >>>>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 2
> >>>>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 1
> >>>>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 3
> >>>>>>>>
> >>>>>>>> Data for node: node06 Num slots: 1 Max slots: 0 Num procs: 4
> >>>>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 4
> >>>>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 6
> >>>>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 5
> >>>>>>>> Process OMPI jobid: [14703,1] App: 0 Process rank: 7
> >>>>>>>>
> >>>>>>>> =============================================================
> >>>>>>>> [node06.cluster:18491] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>>>>> [node05.cluster:20949] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B][./././.]
> >>>>>>>> [node06.cluster:18491] MCW rank 7 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> >>>>>>>> [node05.cluster:20949] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
> >>>>>>>> [node06.cluster:18491] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>>>>> [node05.cluster:20949] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
> >>>>>>>> [node06.cluster:18491] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> >>>>>>>> [node05.cluster:20949] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]]: [./././.][B/B/./.]
> >>>>>>>> ....
> >>>>>>>>
> >>>>>>>> So, byobj_span would be okay. Of course, bynode and byslot should
> >>>>>>>> be okay.
> >>>>>>>> Could you take a look at orte_rmaps_rr_byobj again?
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Tetsuya Mishima
> >>>>>>>>