Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn] svn:open-mpi r28016 - trunk/ompi/mca/btl/tcp
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-02-05 10:33:08


Yeah, that's the quandary: I can see both use cases.

That's why I proposed the "nowarn:" syntax that George hated. :-)

Got any other suggestion on how to handle both use cases?
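
To make the two use cases concrete, here is a minimal sketch of how an exclude list could be checked against the interfaces actually present on a node. It is not the code from r28016. The function name `apply_if_exclude`, the per-entry placement of the `nowarn:` prefix, and the warning text are all assumptions for illustration, and it only handles plain interface names, ignoring any other forms the real parameter may accept.

```python
def apply_if_exclude(spec, present):
    """Apply a comma-separated exclude list to the interfaces on this node.

    spec    -- e.g. "lo,nowarn:ib0" (the per-entry "nowarn:" prefix is hypothetical)
    present -- list of interface names that exist on this node
    Returns (usable_interfaces, warnings).
    """
    excluded = set()
    warnings = []
    for raw in spec.split(","):
        entry = raw.strip()
        if not entry:
            continue
        quiet = entry.startswith("nowarn:")
        name = entry[len("nowarn:"):] if quiet else entry
        if name in present:
            excluded.add(name)
        elif not quiet:
            # Catches the typo case: the user asked to exclude something
            # that does not exist here, so nothing was actually excluded.
            warnings.append(
                f"if_exclude: interface '{name}' not found on this node"
            )
    usable = [iface for iface in present if iface not in excluded]
    return usable, warnings
```

With this shape, a mistyped entry produces a warning, while an entry marked `nowarn:` for a device that exists only on some nodes stays silent where the device is absent.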

On Feb 5, 2013, at 7:25 AM, "Barrett, Brian W" <bwbarre_at_[hidden]> wrote:

> I guess I can see that, but I have the opposite use case; I have a device
> on some nodes and not others that I want to ignore, so I set
> btl_tcp_if_exclude to include that device. It would be totally
> counter-intuitive to have a giant warning because of that.
>
> Brian
>
> On 2/5/13 6:46 AM, "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]> wrote:
>
>> I had a typo in my btl_tcp_if_exclude such that it was effectively
>>
>> mpirun --mca btl_tcp_if_exclude bogus ...
>>
>> instead of ignoring the actual interface I wanted to ignore. And since I
>> wasn't ignoring the special loopback device that I have on some machines,
>> every single MPI job hung because they tried to use those interfaces to
>> communicate with processes on other nodes that that interface could not
>> reach.
>>
>>
>>
>> On Feb 4, 2013, at 5:56 PM, "Barrett, Brian W" <bwbarre_at_[hidden]> wrote:
>>
>>> I'm confused; why is it disastrous to have an interface in if_exclude
>>> that doesn't exist? I can see it being a problem if we don't exclude
>>> something in the list, but the other way is (in my opinion) harmless but
>>> with a useful use case...
>>>
>>> Brian
>>>
>>>
>>> -----Original Message-----
>>> From: Jeff Squyres (jsquyres) [mailto:jsquyres_at_[hidden]]
>>> Sent: Monday, February 04, 2013 06:47 PM Mountain Standard Time
>>> To: Open MPI Developers
>>> Subject: [EXTERNAL] Re: [OMPI devel] [OMPI svn] svn:open-mpi r28016 -
>>> trunk/ompi/mca/btl/tcp
>>>
>>> On Feb 4, 2013, at 2:03 PM, George Bosilca <bosilca_at_[hidden]> wrote:
>>>
>>>> The two behaviors you describe for include and exclude do not look
>>>> conflicting to me. Inclusion is a strong request: the user enforces the
>>>> use of a specific interface. If the interface is not available, then
>>>> we have a problem. Exclude, on the other hand, must enforce that a
>>>> specific interface is not in use, which is quite simple if the
>>>> interface is not available.
>>>
>>> I still maintain that it's equally disastrous if you don't exclude the
>>> correct interfaces (I lost 2 nights of MTT because of this!).
>>>
>>>> I'm not a fan of the nowarn option. Seems like a lot of code with
>>>> limited interest, especially if we only plan to support it in TCP.
>>>
>>> This is a good point -- I wonder what openib (and any other components
>>> that support the *_if_include and *_if_exclude notation) do. Do they
>>> warn / error if you specify an invalid interface?
>>>
>>>> If you need specialized arguments for some of your nodes here is what
>>>> I do: rename the binaries to .orig, and use the original name to create
>>>> a sh script that will change the value of mca_param_files to something
>>>> based on the host name (if such a file exists) and then call the .orig
>>>> executable. Works like a charm, even when a batch scheduler is used.
>>>
>>> That will still be quite difficult to do in MTT. Remember: all the
>>> tests that are run in MTT are shared across all of us via the ompi-tests
>>> SVN repo. Are you suggesting that I alias every test in the ompi-tests
>>> SVN with a public script that everyone would have to run and that would
>>> look for some site-specific MCA override param file?
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>
>>
>
>
> --
> Brian W. Barrett
> Scalable System Software Group
> Sandia National Laboratories
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
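
For reference, the per-host wrapper George describes in the quoted text above could look roughly like the sketch below. He uses an sh script; this is the same idea in Python. The function name `wrapper_plan`, the `param_dir` location, and the `<hostname>.conf` naming are hypothetical. Passing the parameter through an `OMPI_MCA_`-prefixed environment variable is also an assumption about how the value of mca_param_files would be changed, and doing so replaces the default list of parameter files rather than adding to it.

```python
import os
import socket


def wrapper_plan(wrapper_path, args, environ, hostname=None,
                 param_dir="/etc/openmpi/hosts"):
    """Work out how to launch the renamed '.orig' binary on this host.

    Returns (executable, argv, env). A per-host parameter file is used
    only if one exists; otherwise the environment is left unchanged.
    param_dir and the '<hostname>.conf' naming are hypothetical.
    """
    hostname = hostname or socket.gethostname()
    env = dict(environ)
    host_file = os.path.join(param_dir, hostname + ".conf")
    if os.path.isfile(host_file):
        # Assumption: the MCA parameter is overridden via the environment.
        env["OMPI_MCA_mca_param_files"] = host_file
    real = wrapper_path + ".orig"
    return real, [real] + list(args), env


# Typical use from the script installed under the original binary name:
#   import sys
#   exe, argv, env = wrapper_plan(sys.argv[0], sys.argv[1:], os.environ)
#   os.execve(exe, argv, env)
```

This does not address the MTT objection raised in the thread: every shared test would still have to be launched through such a wrapper.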