
Subject: Re: [OMPI users] bind-to-socket across sockets with different core counts
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-08-22 17:56:16


You need to tell mpirun that your system doesn't have homogeneous nodes:

   --hetero-nodes   Nodes in cluster may differ in topology, so send
                    the topology back from each node [Default = false]
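For example (a minimal sketch; the process count and application name here are placeholders, not from your run):

   mpirun --hetero-nodes --bind-to-socket --report-bindings -np 8 ./your_app

With --hetero-nodes set, mpirun collects the actual topology from each node instead of assuming every node matches the first one, so the socket boundaries on your 8-core nodes should be computed correctly.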

On Aug 22, 2013, at 2:48 PM, Noah Knowles <nknowles_at_[hidden]> wrote:

> Hi, I'm new to this, so apologies if this is a basic question, but I haven't found an answer. I am running Open MPI 1.7.2 on a small Rocks 6.1 BladeCenter H cluster. I am using the bind-to-socket option on nodes with different numbers of cores per socket. In the sample output below, compute-0-2 has two 6-core sockets and compute-0-3 has two 8-core sockets.
>
> [1,4]<stderr>:[compute-0-2.local:03268] MCW rank 4 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
> [1,5]<stderr>:[compute-0-2.local:03268] MCW rank 5 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][B/B/B/B/B/B]
> [1,6]<stderr>:[compute-0-3.local:03816] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B/./.][./././././././.]
> [1,7]<stderr>:[compute-0-3.local:03816] MCW rank 7 bound to socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]], socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [././././././B/B][B/B/B/B/./././.]
>
> Is this behavior intended? Is there any way to make bind-to-socket use all the cores on a socket for both the 6-core and the 8-core nodes? Or at least to keep that last binding from spreading across cores on two sockets?
> I've tried a rankfile too, but ran into errors; that should probably be a separate thread though.
>
> Thanks,
> Noah
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users