Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn] svn:open-mpi r28016 - trunk/ompi/mca/btl/tcp
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-02-05 10:33:08


Yeah, that's the quandary: I can see both use cases.

That's why I proposed the "nowarn:" syntax that George hated. :-)

Got any other suggestions on how to handle both use cases?
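
For concreteness, here is a minimal sketch of how an exclude list could serve both use cases. It is plain Python, not Open MPI code: the function name `apply_if_exclude` and the warning text are made up for illustration, the "nowarn:" prefix is only the proposal mentioned above (the semantics shown are an assumption), and real btl_tcp_if_exclude values can also be CIDR subnets, which this ignores.

```python
def apply_if_exclude(spec, present):
    """Return (usable interfaces, warnings) for an exclude list.

    spec    -- comma-separated names, e.g. "lo,nowarn:ib0"; the "nowarn:"
               prefix is the hypothetical syntax discussed in this thread
    present -- list of interface names that actually exist on this node
    """
    excluded, warnings = set(), []
    for entry in filter(None, (e.strip() for e in spec.split(","))):
        quiet = entry.startswith("nowarn:")
        name = entry[len("nowarn:"):] if quiet else entry
        excluded.add(name)
        # Warn about names that match nothing, unless the user opted out.
        if name not in present and not quiet:
            warnings.append(f"if_exclude: interface '{name}' not found on this node")
    usable = [i for i in present if i not in excluded]
    return usable, warnings


# Typo case: nothing is actually excluded, but the user is told about it.
print(apply_if_exclude("bogus", ["lo", "eth0"]))
# Heterogeneous cluster: ib0 exists only on some nodes, so stay quiet.
print(apply_if_exclude("lo,nowarn:ib0", ["lo", "eth0"]))
```

The point of the sketch is only that the warning becomes opt-out per entry: a mistyped name still gets flagged, while a device that is deliberately listed but absent on some nodes does not.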

On Feb 5, 2013, at 7:25 AM, "Barrett, Brian W" <bwbarre_at_[hidden]> wrote:

> I guess I can see that, but I have the opposite use case; I have a device
> on some nodes and not others that I want to ignore, so I set
> btl_tcp_if_exclude to include that device. It would be totally
> counter-intuitive to have a giant warning because of that.
>
> Brian
>
> On 2/5/13 6:46 AM, "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]> wrote:
>
>> I had a typo in my btl_tcp_if_exclude such that it was effectively
>>
>> mpirun --mca btl_tcp_if_exclude bogus ...
>>
>> instead of ignoring the actual interface I wanted to ignore. And since I
>> wasn't ignoring the special loopback device that I have on some machines,
>> every single MPI job hung because they tried to use those interfaces to
>> communicate with processes on other nodes that those interfaces could not
>> reach.
>>
>>
>>
>> On Feb 4, 2013, at 5:56 PM, "Barrett, Brian W" <bwbarre_at_[hidden]> wrote:
>>
>>> I'm confused; why is it disastrous to have an interface in if_exclude
>>> that doesn't exist? I can see it being a problem if we don't exclude
>>> something in the list, but the other way is (in my opinion) harmless but
>>> with a useful use case...
>>>
>>> Brian
>>>
>>> -----Original Message-----
>>> From: Jeff Squyres (jsquyres) [mailto:jsquyres_at_[hidden]]
>>> Sent: Monday, February 04, 2013 06:47 PM Mountain Standard Time
>>> To: Open MPI Developers
>>> Subject: [EXTERNAL] Re: [OMPI devel] [OMPI svn] svn:open-mpi r28016 -
>>> trunk/ompi/mca/btl/tcp
>>>
>>> On Feb 4, 2013, at 2:03 PM, George Bosilca <bosilca_at_[hidden]> wrote:
>>>
>>>> The two behaviors you describe for include and exclude do not look
>>>> conflicting to me. Inclusion is a strong request: the user enforces the
>>>> use of a specific interface. If the interface is not available, then
>>>> we have a problem. Exclude, on the other hand, must ensure that a
>>>> specific interface is not in use, which is trivially satisfied if the
>>>> interface is not available.
>>>
>>> I still maintain that it's equally disastrous if you don't exclude the
>>> correct interfaces (I lost 2 nights of MTT because of this!).
>>>
>>>> I'm not a fan of the nowarn option. Seems like a lot of code with
>>>> limited interest, especially if we only plan to support it in TCP.
>>>
>>> This is a good point -- I wonder what openib (and any others that
>>> support the *_if_include and *_if_exclude notation) do. Do they warn /
>>> error if you specify an invalid interface?
>>>
>>>> If you need specialized arguments for some of your nodes here is what
>>>> I do: rename the binaries to .orig, and use the original name to create
>>>> a sh script that will change the value of mca_param_files to something
>>>> based on the host name (if such a file exists) and then call the .orig
>>>> executable. Works like a charm, even when a batch scheduler is used.
>>>
>>> That will still be quite difficult to do in MTT. Remember: all the
>>> tests that are run in MTT are shared across all of us via the ompi-tests
>>> SVN repo. Are you suggesting that I alias every test in the ompi-tests
>>> SVN repo with a public script that everyone would run, and that would
>>> look for some site-specific MCA override param file?
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
>
> --
> Brian W. Barrett
> Scalable System Software Group
> Sandia National Laboratories
>
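
For reference, the wrapper approach George describes in the quoted text (rename the binary to .orig, install a script under the original name) could look roughly like the sketch below. George uses a sh script; this is a Python equivalent, and several details are assumptions rather than anything stated in the thread: the per-host file name `mca-params.<hostname>.conf`, the `~/.openmpi` directory, and the helper names. The parameter name mca_param_files is taken from George's description, and it is set through the usual OMPI_MCA_<name> environment variable.

```python
import os
import socket
import sys


def wrapper_env(environ, hostname, conf_dir):
    """Return a copy of environ that points Open MPI at a per-host MCA
    parameter file, if one exists in conf_dir; otherwise an unchanged copy."""
    env = dict(environ)
    # Hypothetical naming scheme for the per-host file.
    host_file = os.path.join(conf_dir, f"mca-params.{hostname}.conf")
    if os.path.isfile(host_file):
        # MCA parameters can be set via OMPI_MCA_<name> environment variables.
        env["OMPI_MCA_mca_param_files"] = host_file
    return env


def main():
    # The real executable is assumed to sit next to this wrapper as <name>.orig.
    real = os.path.abspath(sys.argv[0]) + ".orig"
    conf_dir = os.path.expanduser("~/.openmpi")  # assumed location
    env = wrapper_env(os.environ, socket.gethostname(), conf_dir)
    # Replace this process with the real binary, keeping the original argv.
    os.execve(real, sys.argv, env)
```

An installed wrapper would end with a call to `main()`. Note that setting mca_param_files this way likely replaces the default list of parameter files rather than adding to it, so the per-host file may need to carry the site-wide settings as well.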

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/