
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] default num_procs of round_robin_mapper with cpus-per-proc option
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-01-23 00:06:37


On Jan 22, 2014, at 8:08 PM, tmishima_at_[hidden] wrote:

>
>
> Thanks, Ralph.
>
> I have one more question. I'm sorry to ask you many things ...

Not a problem

>
> Could you tell me the difference between "map-by slot" and "map-by core"?
> From my understanding, a slot is a synonym for a core.

Not really - see below

> But their behaviors under openmpi-1.7.4rc2 with the cpus-per-proc option
> are quite different, as shown below. I tried to browse the source code,
> but I have not been able to figure it out so far.
>

It is a little subtle, I fear. When you tell us "map-by slot", we assign each process to an allocated slot without associating it to any specific cpu or core. When we then bind to core (as we do by default), we balance the binding across the sockets to improve performance.

When you tell us "map-by core", then we directly associate each process with a specific core. So when we bind, we bind you to that core. This will cause us to fully use all the cores on the first socket before we move to the next.

I'm a little puzzled by your output, as it appears that cpus-per-proc was ignored; that's something I'd have to look at more carefully. My best guess is that we aren't skipping cores to account for the cpus-per-proc setting, and thus the procs are being mapped to consecutive cores - which wouldn't be very good if we then bound them to multiple neighboring cores, as they'd fall on top of each other.

> Regards,
> Tetsuya Mishima
>
> [un-managed environment] (node05 and node06 have 8 cores each)
>
> [mishima_at_manage work]$ cat pbs_hosts
> node05
> node05
> node05
> node05
> node05
> node05
> node05
> node05
> node06
> node06
> node06
> node06
> node06
> node06
> node06
> node06
> [mishima_at_manage work]$ mpirun -np 4 -hostfile pbs_hosts -report-bindings -cpus-per-proc 4 -map-by slot ~/mis/openmpi/demos/myprog
> [node05.cluster:23949] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
> [node05.cluster:23949] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
> [node06.cluster:22139] MCW rank 3 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
> [node06.cluster:22139] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
> Hello world from process 0 of 4
> Hello world from process 1 of 4
> Hello world from process 3 of 4
> Hello world from process 2 of 4
> [mishima_at_manage work]$ mpirun -np 4 -hostfile pbs_hosts -report-bindings -cpus-per-proc 4 -map-by core ~/mis/openmpi/demos/myprog
> [node05.cluster:23985] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.][./././.]
> [node05.cluster:23985] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.][./././.]
> [node06.cluster:22175] MCW rank 3 bound to socket 0[core 1[hwt 0]]: [./B/./.][./././.]
> [node06.cluster:22175] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B/././.][./././.]
> Hello world from process 2 of 4
> Hello world from process 3 of 4
> Hello world from process 0 of 4
> Hello world from process 1 of 4
>
> (note) I see the same behavior in a managed environment under Torque
>
>> Seems like a reasonable, minimal risk request - will do
>>
>> On Jan 22, 2014, at 4:28 PM, tmishima_at_[hidden] wrote:
>>
>>>
>>> Hi Ralph, I want to ask you one more thing about the default setting
>>> of num_procs when we don't specify the -np option and we set
>>> cpus-per-proc > 1.
>>>
>>> In this case, the round_robin_mapper sets num_procs = num_slots as below:
>>>
>>> rmaps_rr.c:
>>> 130        if (0 == app->num_procs) {
>>> 131            /* set the num_procs to equal the number of slots on these mapped nodes */
>>> 132            app->num_procs = num_slots;
>>> 133        }
>>>
>>> However, because cpus_per_rank > 1, this num_procs will be rejected at
>>> line 61 in rmaps_rr_mappers.c, shown below, unless we switch on the
>>> oversubscribe directive.
>>>
>>> rmaps_rr_mappers.c:
>>> 61     if (num_slots < ((int)app->num_procs * orte_rmaps_base.cpus_per_rank)) {
>>> 62         if (ORTE_MAPPING_NO_OVERSUBSCRIBE & ORTE_GET_MAPPING_DIRECTIVE(jdata->map->mapping)) {
>>> 63             orte_show_help("help-orte-rmaps-base.txt", "orte-rmaps-base:alloc-error",
>>> 64                            true, app->num_procs, app->app);
>>> 65             return ORTE_ERR_SILENT;
>>> 66         }
>>> 67     }
>>>
>>> Therefore, I think the default num_procs should be equal to num_slots
>>> divided by cpus_per_rank:
>>>
>>> app->num_procs = num_slots / orte_rmaps_base.cpus_per_rank;
>>>
>>> This would be more convenient for most people who want to use the
>>> -cpus-per-proc option. I have already confirmed it works well. Please
>>> consider applying this fix to 1.7.4.
>>>
>>> Regards,
>>> Tetsuya Mishima
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users