Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] new map-by-obj has a problem
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-02-27 18:24:33


Hmmm... what does your node look like again (sockets and cores)?

On Feb 27, 2014, at 3:19 PM, tmishima_at_[hidden] wrote:

>
> Hi Ralph, I'm afraid your new "map-by obj" implementation causes another problem.
>
> I get an overload message with this command line, as shown below:
>
> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2 -display-map ~/mis/openmpi/demos/myprog
> --------------------------------------------------------------------------
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
>
> Bind to: CORE
> Node: node05
> #processes: 2
> #cpus: 1
>
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
> --------------------------------------------------------------------------
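>
> (For context: pe=2 binds each process to two cores, so the four
> processes mapped onto each of these nodes need all eight of its
> cores; the "#processes: 2 / #cpus: 1" above apparently means two
> processes were assigned the same core instead of being spread over
> both sockets.)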
>
> Then I added "-bind-to core:overload-allowed" to see what happens:
>
> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2 -display-map -bind-to core:overload-allowed ~/mis/openmpi/demos/myprog
> Data for JOB [14398,1] offset 0
>
> ======================== JOB MAP ========================
>
> Data for node: node05 Num slots: 1 Max slots: 0 Num procs: 4
> Process OMPI jobid: [14398,1] App: 0 Process rank: 0
> Process OMPI jobid: [14398,1] App: 0 Process rank: 1
> Process OMPI jobid: [14398,1] App: 0 Process rank: 2
> Process OMPI jobid: [14398,1] App: 0 Process rank: 3
>
> Data for node: node06 Num slots: 1 Max slots: 0 Num procs: 4
> Process OMPI jobid: [14398,1] App: 0 Process rank: 4
> Process OMPI jobid: [14398,1] App: 0 Process rank: 5
> Process OMPI jobid: [14398,1] App: 0 Process rank: 6
> Process OMPI jobid: [14398,1] App: 0 Process rank: 7
>
> =============================================================
> [node06.cluster:18443] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket
> 0[core 1[hwt 0]]: [B/B/./.][./././.]
> [node05.cluster:20901] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket
> 0[core 1[hwt 0]]: [B/B/./.][./././.]
> [node06.cluster:18443] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket
> 0[core 3[hwt 0]]: [././B/B][./././.]
> [node05.cluster:20901] MCW rank 3 bound to socket 0[core 2[hwt 0]], socket
> 0[core 3[hwt 0]]: [././B/B][./././.]
> [node06.cluster:18443] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket
> 0[core 1[hwt 0]]: [B/B/./.][./././.]
> [node05.cluster:20901] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket
> 0[core 1[hwt 0]]: [B/B/./.][./././.]
> [node06.cluster:18443] MCW rank 5 bound to socket 0[core 2[hwt 0]], socket
> 0[core 3[hwt 0]]: [././B/B][./././.]
> [node05.cluster:20901] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket
> 0[core 3[hwt 0]]: [././B/B][./././.]
> Hello world from process 4 of 8
> Hello world from process 2 of 8
> Hello world from process 6 of 8
> Hello world from process 0 of 8
> Hello world from process 5 of 8
> Hello world from process 1 of 8
> Hello world from process 7 of 8
> Hello world from process 3 of 8
>
> Note that without span, ranks 0/2 and 1/3 on each node are bound to
> the same cores on socket 0, while socket 1 stays idle. When I add the
> "span" modifier to the map-by directive, it works fine:
>
> mpirun -np 8 -host node05,node06 -report-bindings -map-by socket:pe=2,span -display-map ~/mis/openmpi/demos/myprog
> Data for JOB [14703,1] offset 0
>
> ======================== JOB MAP ========================
>
> Data for node: node05 Num slots: 1 Max slots: 0 Num procs: 4
> Process OMPI jobid: [14703,1] App: 0 Process rank: 0
> Process OMPI jobid: [14703,1] App: 0 Process rank: 2
> Process OMPI jobid: [14703,1] App: 0 Process rank: 1
> Process OMPI jobid: [14703,1] App: 0 Process rank: 3
>
> Data for node: node06 Num slots: 1 Max slots: 0 Num procs: 4
> Process OMPI jobid: [14703,1] App: 0 Process rank: 4
> Process OMPI jobid: [14703,1] App: 0 Process rank: 6
> Process OMPI jobid: [14703,1] App: 0 Process rank: 5
> Process OMPI jobid: [14703,1] App: 0 Process rank: 7
>
> =============================================================
> [node06.cluster:18491] MCW rank 6 bound to socket 0[core 2[hwt 0]], socket
> 0[core 3[hwt 0]]: [././B/B][./././.]
> [node05.cluster:20949] MCW rank 2 bound to socket 0[core 2[hwt 0]], socket
> 0[core 3[hwt 0]]: [././B/B][./././.]
> [node06.cluster:18491] MCW rank 7 bound to socket 1[core 6[hwt 0]], socket
> 1[core 7[hwt 0]]: [./././.][././B/B]
> [node05.cluster:20949] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket
> 1[core 7[hwt 0]]: [./././.][././B/B]
> [node06.cluster:18491] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket
> 0[core 1[hwt 0]]: [B/B/./.][./././.]
> [node05.cluster:20949] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket
> 0[core 1[hwt 0]]: [B/B/./.][./././.]
> [node06.cluster:18491] MCW rank 5 bound to socket 1[core 4[hwt 0]], socket
> 1[core 5[hwt 0]]: [./././.][B/B/./.]
> [node05.cluster:20949] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket
> 1[core 5[hwt 0]]: [./././.][B/B/./.]
> ....
>
> So byobj_span seems to be okay, and of course bynode and byslot should
> be fine as well. Could you take a look at orte_rmaps_rr_byobj again? A
> minimal sketch of the behavior I see is below.
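>
> Here is a minimal, self-contained sketch of the two placements, with
> the topology (2 sockets x 4 cores per node) assumed from the bindings
> above. This is NOT the actual orte_rmaps_rr_byobj code, just a
> standalone program that reproduces what I observe with and without
> span:
>
> #include <stdio.h>
>
> /* Illustrative only: mimics the two placements reported above for
>  * 2 nodes x 2 sockets x 4 cores, pe=2, np=8 (4 procs per node). */
> static void show(const char *label, int span)
> {
>     printf("%s:\n", label);
>     for (int node = 0; node < 2; node++) {
>         for (int p = 0; p < 4; p++) {
>             int rank = node * 4 + p;
>             /* without span the mapper stays on socket 0 and reuses
>              * its cores; with span it round-robins both sockets */
>             int socket = span ? p % 2 : 0;
>             int core   = span ? socket * 4 + (p / 2) * 2
>                               : (p % 2) * 2;
>             printf("  node0%d rank %d -> socket %d, cores %d-%d\n",
>                    node + 5, rank, socket, core, core + 1);
>         }
>     }
> }
>
> int main(void)
> {
>     show("no span (observed, overloaded)", 0);
>     show("with span (works fine)", 1);
>     return 0;
> }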
>
> Regards,
> Tetsuya Mishima
>