Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Mapping ranks to hosts (from MPI error messages)
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-03-28 09:58:40


Good information; thanks.

The short reason for this change in behavior of the affinity options is that when we first created affinity (waaaay back in the 1.0 days), no one really cared about it much, so we just did a first attempt. Gradually over time, affinity has become much more important. As such, we have learned much from what our users want and how they want to use affinity. That has caused a few changes in how we approach affinity -- and because our understanding has grown, sometimes the changes we've made have been revolutionary (vs. evolutionary), meaning that CLI options change, behaviors change, etc.

Sorry about that -- it really reflects how the whole HPC community is evolving its attitude towards affinity over time.

BTW, you should be aware that Open MPI v1.8 -- i.e., the next stable series -- is scheduled to be released on Monday. There are additional changes with regard to affinity in 1.8 (compared to the v1.6 series); much of what has been discussed on this thread has been in the context of v1.7.x (which is being renamed to 1.8 on Monday, per our "feature series eventually turns into stable series" versioning philosophy).
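
For reference, here is a rough sketch of how the options discussed on this thread might be combined in a launch wrapper. It only assembles an mpirun command line; it does not run anything. The helper name build_mpirun_command and the program path ./my_app are made up for illustration, and the option spellings are the ones quoted in this thread (v1.6-era), so check them against the mpirun man page for your installed version before relying on them.

```python
import shlex


def build_mpirun_command(program, num_procs, show_map=True,
                         report_bindings=False, bind_to_core=False):
    """Assemble an mpirun argument list (hypothetical helper, not part of Open MPI).

    Option spellings follow this thread (v1.6-era); they may differ in
    other release series.
    """
    cmd = ["mpirun", "-np", str(num_procs)]
    if show_map:
        # Per the thread: the map is shown even if the MPI tasks fail to start.
        cmd.append("--display-map")
    if report_bindings:
        # Per the thread: reports the binding of each rank to CPU.
        cmd.append("--report-bindings")
    if bind_to_core:
        cmd.append("--bind-to-core")
    cmd.append(program)
    return cmd


if __name__ == "__main__":
    # ./my_app is a placeholder program name.
    args = build_mpirun_command("./my_app", 4,
                                report_bindings=True, bind_to_core=True)
    print(" ".join(shlex.quote(a) for a in args))
```

Requesting both the map and the bindings in one run is a judgment call: going by the report quoted below, the map still appears when startup fails, while the bindings add the per-rank CPU detail when startup succeeds.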

On Mar 28, 2014, at 9:47 AM, "Sasso, John (GE Power & Water, Non-GE)" <John1.Sasso_at_[hidden]> wrote:

> Thanks again! I tried --display-devel-map and I think it provides a bit too much info for our needs. However, it is nice to know.
>
> BTW, some interesting behavior in using "--report-bindings --bind-to-core" vs "--display-map".
>
> * If I use "--report-bindings --bind-to-core" but the MPI tasks on a host fail to start up, then nothing is reported. For example, I had a problem where a job started across 4 hosts but the hosts could not communicate with one another via TCP/IP.
>
> * If I use "--display-map" then the mapping is shown, even in the failure case I mentioned in the last bullet.
>
> * What is nice about "--report-bindings --bind-to-core" over "--display-map" is that it will report the binding of each rank to CPU, whereas the latter will show you what ranks are running on a given host. For our needs, this may be sufficient, though it would be nice to have the CPU bindings shown as well.
>
> * If using "--report-bindings --bind-to-core" with OpenMPI 1.4.1 then the bindings on just the head node are shown. In 1.6.1, full bindings across all hosts are shown. (I'd have to read release notes on this...)
>
> --john
>
>
> -----Original Message-----
> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph Castain
> Sent: Thursday, March 27, 2014 7:01 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Mapping ranks to hosts (from MPI error messages)
>
> Oooooooh...it's Jeff's fault!
>
> Fwiw you can get even more detailed mapping info with --display-devel-map
>
> Sent from my iPhone
>
>> On Mar 27, 2014, at 2:58 PM, "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]> wrote:
>>
>>> On Mar 27, 2014, at 4:06 PM, "Sasso, John (GE Power & Water, Non-GE)" <John1.Sasso_at_[hidden]> wrote:
>>>
>>> Yes, I noticed that I could not find --display-map in any of the man pages. Intentional?
>>
>> Oops; nope. I'll ask Ralph to add it...
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/