Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] default num_procs of round_robin_mapper with cpus-per-proc option
From: tmishima_at_[hidden]
Date: 2014-01-22 23:08:59


Thanks, Ralph.

I have one more question. I'm sorry to ask you so many things ...

Could you tell me the difference between "map-by slot" and "map-by core"?
From my understanding, slot is a synonym for core, but their behaviors
under openmpi-1.7.4rc2 with the cpus-per-proc option are quite different,
as shown below. I tried to browse the source code but could not figure
it out so far.

Regards,
Tetsuya Mishima

[un-managed environment] (node05 and node06 have 8 cores each)

[mishima_at_manage work]$ cat pbs_hosts
node05
node05
node05
node05
node05
node05
node05
node05
node06
node06
node06
node06
node06
node06
node06
node06
[mishima_at_manage work]$ mpirun -np 4 -hostfile pbs_hosts -report-bindings -cpus-per-proc 4 -map-by slot ~/mis/openmpi/demos/myprog
[node05.cluster:23949] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
[node05.cluster:23949] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
[node06.cluster:22139] MCW rank 3 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
[node06.cluster:22139] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
Hello world from process 0 of 4
Hello world from process 1 of 4
Hello world from process 3 of 4
Hello world from process 2 of 4
[mishima_at_manage work]$ mpirun -np 4 -hostfile pbs_hosts -report-bindings -cpus-per-proc 4 -map-by core ~/mis/openmpi/demos/myprog
[node05.cluster:23985] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.][./././.]
[node05.cluster:23985] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.][./././.]
[node06.cluster:22175] MCW rank 3 bound to socket 0[core 1[hwt 0]]: [./B/./.][./././.]
[node06.cluster:22175] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B/././.][./././.]
Hello world from process 2 of 4
Hello world from process 3 of 4
Hello world from process 0 of 4
Hello world from process 1 of 4

(note) I see the same behavior in a Torque-managed environment.

> Seems like a reasonable, minimal risk request - will do
>
> On Jan 22, 2014, at 4:28 PM, tmishima_at_[hidden] wrote:
>
> >
> > Hi Ralph, I want to ask you one more thing about the default setting
> > of num_procs when we don't specify the -np option and we set
> > cpus-per-proc > 1.
> >
> > In this case, the round_robin_mapper sets num_procs = num_slots as below:
> >
> > rmaps_rr.c:
> > 130    if (0 == app->num_procs) {
> > 131        /* set the num_procs to equal the number of slots on these mapped nodes */
> > 132        app->num_procs = num_slots;
> > 133    }
> >
> > However, because cpus_per_rank > 1, this num_procs will be rejected at
> > line 61 in rmaps_rr_mappers.c, as shown below, unless we switch on the
> > oversubscribe directive.
> >
> > rmaps_rr_mappers.c:
> > 61    if (num_slots < ((int)app->num_procs * orte_rmaps_base.cpus_per_rank)) {
> > 62        if (ORTE_MAPPING_NO_OVERSUBSCRIBE & ORTE_GET_MAPPING_DIRECTIVE(jdata->map->mapping)) {
> > 63            orte_show_help("help-orte-rmaps-base.txt", "orte-rmaps-base:alloc-error",
> > 64                           true, app->num_procs, app->app);
> > 65            return ORTE_ERR_SILENT;
> > 66        }
> > 67    }
> >
> > Therefore, I think the default num_procs should be equal to num_slots
> > divided by cpus_per_rank:
> >
> >     app->num_procs = num_slots / orte_rmaps_base.cpus_per_rank;
> >
> > This would be more convenient for most people who want to use the
> > -cpus-per-proc option. I already confirmed it works well. Please
> > consider applying this fix to 1.7.4.
> >
> > Regards,
> > Tetsuya Mishima
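
For reference, here is a minimal standalone sketch of the arithmetic behind
the num_procs default discussed in the quoted message above. It assumes the
16-slot pbs_hosts file and -cpus-per-proc 4 from my example; the variables
are illustrative only, not the actual ORTE code path.

#include <stdio.h>

int main(void)
{
    int num_slots     = 16;  /* 8 cores each on node05 and node06 */
    int cpus_per_rank = 4;   /* -cpus-per-proc 4 */

    /* current default: num_procs = num_slots */
    int current_default  = num_slots;
    /* proposed default: num_procs = num_slots / cpus_per_rank */
    int proposed_default = num_slots / cpus_per_rank;

    printf("current default : %d procs need %d cpus, but only %d slots exist\n",
           current_default, current_default * cpus_per_rank, num_slots);
    printf("proposed default: %d procs need %d cpus, which fits the %d slots\n",
           proposed_default, proposed_default * cpus_per_rank, num_slots);
    return 0;
}

With the current default, the check at line 61 (num_slots < num_procs *
cpus_per_rank, i.e. 16 < 16 * 4) rejects the launch unless oversubscription
is allowed; with the proposed default, 4 procs of 4 cpus each exactly fill
the 16 slots, which matches the -np 4 runs shown above.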