Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn] svn:open-mpi r28016 - trunk/ompi/mca/btl/tcp
From: Barrett, Brian W (bwbarre_at_[hidden])
Date: 2013-02-05 10:25:30


I guess I can see that, but I have the opposite use case; I have a device
on some nodes and not others that I want to ignore, so I set
btl_tcp_if_exclude to include that device. It would be totally
counter-intuitive to have a giant warning because of that.

Brian
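
For anyone who wants a warning about a non-existent interface without changing how btl_tcp_if_exclude behaves, a pre-flight check outside Open MPI is one option. The following is only a sketch: missing_ifaces is a hypothetical helper, not part of Open MPI, and it assumes the list holds plain interface names (no CIDR subnets).

```shell
#!/bin/sh
# Hypothetical helper, not part of Open MPI.
# missing_ifaces LIST AVAILABLE
#   LIST:      comma-separated interface names, as passed to btl_tcp_if_exclude
#   AVAILABLE: whitespace-separated interface names present on this node
# Prints each name from LIST that does not appear in AVAILABLE, one per line.
missing_ifaces() {
    for name in $(printf '%s' "$1" | tr ',' ' '); do
        found=no
        for have in $2; do
            if [ "$name" = "$have" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

On Linux the second argument could come from "$(ls /sys/class/net)"; that path is an assumption about the platform. An empty result means every excluded name exists on the node, and a site that excludes a device present on only some nodes can simply skip the check.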

On 2/5/13 6:46 AM, "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]> wrote:

>I had a typo in my btl_tcp_if_exclude such that it was effectively
>
> mpirun --mca btl_tco_if_exclude bogus ...
>
>instead of ignoring the actual interface I wanted to ignore. And since I
>wasn't ignoring the special loopback device that I have on some machines,
>every single MPI job hung because they tried to use those interfaces to
>communicate with processes on other nodes that that interface could not
>reach.
>
>
>
>On Feb 4, 2013, at 5:56 PM, "Barrett, Brian W" <bwbarre_at_[hidden]> wrote:
>
>> I'm confused; why is it disastrous to have an interface in if_exclude
>>that doesn't exist? I can see it being a problem if we don't exclude
>>something in the list, but the other way is (in my opinion) harmless but
>>with a useful use case...
>>
>> Brian
>>
>>
>> -----Original Message-----
>> From: Jeff Squyres (jsquyres) [mailto:jsquyres_at_[hidden]]
>> Sent: Monday, February 04, 2013 06:47 PM Mountain Standard Time
>> To: Open MPI Developers
>> Subject: [EXTERNAL] Re: [OMPI devel] [OMPI svn] svn:open-mpi r28016 -
>>trunk/ompi/mca/btl/tcp
>>
>> On Feb 4, 2013, at 2:03 PM, George Bosilca <bosilca_at_[hidden]> wrote:
>>
>>> The two behaviors you describe for include and exclude do not look
>>>conflicting to me. Inclusion is a strong request: the user enforces the
>>>use of a specific interface. If the interface is not available, then
>>>we have a problem. Exclude, on the other hand, must enforce that a
>>>specific interface is not in use, which is trivially satisfied if the
>>>interface is not available.
>>
>> I still maintain that it's equally disastrous if you don't exclude the
>>correct interfaces (I lost 2 nights of MTT because of this!).
>>
>>> I'm not a fan of the nowarn option. Seems like a lot of code with
>>>limited interest, especially if we only plan to support it in TCP.
>>
>> This is a good point -- I wonder what openib (and any others that
>>support the *_if_include and *_if_exclude notation) do. Do they warn /
>>error if you specify an invalid interface?
>>
>>> If you need specialized arguments for some of your nodes here is what
>>>I do: rename the binaries to .orig, and use the original name to create
>>>a sh script that will change the value of mca_param_files to something
>>>based on the host name (if such a file exists) and then call the .orig
>>>executable. Works like a charm, even when a batch scheduler is used.
>>
>> That will still be quite difficult to do in MTT. Remember: all the
>>tests that are run in MTT are shared across all of us via the ompi-tests
>>SVN repo. Are you suggesting that I alias every test in the ompi-tests
>>SVN with a public script that you should run that should look for some
>>site-specific MCA override param file?
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>>http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
>
>--
>Jeff Squyres
>jsquyres_at_[hidden]
>For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>
>

--
  Brian W. Barrett
  Scalable System Software Group
  Sandia National Laboratories

