
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] slowdown with infiniband and latest CentOS kernel
From: Nathan Hjelm (hjelmn_at_[hidden])
Date: 2013-12-17 11:24:08


On Tue, Dec 17, 2013 at 11:16:48AM -0500, Noam Bernstein wrote:
> On Dec 17, 2013, at 11:04 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>
> > Are you binding the procs? We don't bind by default (this will change in 1.7.4), and binding can play a significant role when comparing across kernels.
> >
> > Add "--bind-to-core" to your command line.
>
> I've previously always used mpi_paffinity_alone=1, and the new behavior
> seems to be independent of whether or not I use it. I'll try bind-to-core.

That would be the problem. That variable no longer exists in 1.7.4 and
has been replaced by hwloc_base_binding_policy. --bind-to core is an
alias of -mca hwloc_base_binding_policy core.
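
One way to check whether a binding option actually took effect is to have each rank print the set of CPUs it is allowed to run on. The following is only a minimal sketch, not something from this thread: it assumes Linux (os.sched_getaffinity is not available everywhere), it assumes the launcher exports OMPI_COMM_WORLD_RANK to each process, and the function name describe_affinity and the script name affinity_check.py are made up for illustration.

```python
import os


def describe_affinity():
    """Return (rank, sorted list of CPU ids this process may run on)."""
    # Assumption: Open MPI's launcher sets OMPI_COMM_WORLD_RANK for each rank.
    # Fall back to "?" when the script is run outside mpirun.
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    # Linux-only call; pid 0 means "the calling process".
    cpus = sorted(os.sched_getaffinity(0))
    return rank, cpus


if __name__ == "__main__":
    rank, cpus = describe_affinity()
    # os.cpu_count() can return None, so fall back to the size of the set.
    total = os.cpu_count() or len(cpus)
    print(f"rank {rank}: allowed on {len(cpus)} of {total} CPUs: {cpus}")
```

Launched as, for example, "mpirun -np 4 --bind-to core python affinity_check.py", a bound rank should report a small CPU set (possibly more than one id if a core has several hardware threads), while an unbound rank will typically report every CPU on the node. Comparing the output with and without the binding option, or across the two kernels, shows whether the setting is being honored.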

> One more possible clue. I haven't done a full test, but for one
> particular setup (newer nodes, single node so presumably using
> sm), there are apparently two ways to fix the problem:
> 1. go back to the previous kernel, but stick with openmpi 1.7.3
> 2. stick with the new kernel, but go back to openmpi 1.6.4
>
> So it appears to be some interaction between the new kernel and 1.7.3 that
> isn't present with 1.6.4.
>
> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some
> collective communication), but now I'm wondering whether I should just test
> 1.6.5.
>
> Noam


