Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r28016 - trunk/ompi/mca/btl/tcp
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-02-04 12:02:48


On Feb 1, 2013, at 9:59 PM, "Barrett, Brian W" <bwbarre_at_[hidden]> wrote:

> I don't think this is right either. Excluding a device that doesn't exist has many use cases. Such as disabling a network that only exists on part of the cluster. I'm not sure about what to do with seq; it's more like include than exclude.

Hmm. I've now given this quite a bit of thought. Here's what I think:

1. Just like there might be good reasons to exclude non-existent interfaces (e.g., networks that only exist on part of the cluster), the same argument could be made for *including* non-existent interfaces.

2. It seems odd to me to have different behavior for non-existent interfaces between include, exclude, and/or seq.

3. We have a very strong precedent throughout OMPI that if a human asks for something that OMPI can't deliver, OMPI should error. According to this, and according to the Law of Least Surprise, I would think that if I typo an exclude interface name, OMPI should error and make a human figure it out.

4. If someone wants different includes/excludes in different parts of the cluster, then they should have per-node values for these MCA params.

5. That being said, #4 is not always feasible. Concrete example (which is why this whole thing started, incidentally): in my MTT cluster at Cisco, I have *some* nodes with back-to-back interfaces. I can't think of a good way to have per-node MCA params in an MTT run that is SLURM-queued and may end up on random nodes in my cluster -- that may or may not include nodes with loopback interfaces.

So how about this compromise:

If an invalid include, exclude, or if_seq interface is specified:
- If that interface is prefaced with "nowarn:", silently ignore that token
- Otherwise, display a show_help message and ignore the TCP BTL

For example:

    mpirun --mca btl_tcp_if_include nowarn:eth5,eth6

- If eth5 doesn't exist, the job will continue just as if eth5 wasn't specified
- If eth6 doesn't exist, the TCP BTL will disqualify itself
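
To make the proposed semantics concrete, here is a minimal sketch of the token handling in Python. It is only an illustration of the rule above, not Open MPI code (the TCP BTL is written in C). The function name `resolve_interfaces`, its return shape, and passing the node's interfaces in as a set are all assumptions made for the example, and it ignores any other syntax these MCA parameters may accept.

```python
NOWARN_PREFIX = "nowarn:"


def resolve_interfaces(spec, existing):
    """Apply the proposed "nowarn:" rule to a comma-separated interface list.

    spec     -- value of an include/exclude/if_seq style parameter,
                e.g. "nowarn:eth5,eth6"
    existing -- set of interface names actually present on this node

    Returns (ok, matched, missing):
      ok      -- False means the TCP BTL would show a help message and
                 disqualify itself
      matched -- interfaces from spec that exist on this node
      missing -- the first required interface that was not found, else None
    """
    matched = []
    for raw in spec.split(","):
        token = raw.strip()
        if not token:
            continue  # tolerate empty tokens such as a trailing comma
        optional = token.startswith(NOWARN_PREFIX)
        name = token[len(NOWARN_PREFIX):] if optional else token
        if name in existing:
            matched.append(name)
        elif optional:
            # "nowarn:" token for a missing interface: silently ignore it
            continue
        else:
            # Missing interface without "nowarn:": treat as a user error
            return False, [], name
    return True, matched, None


# Node without eth5: the job continues as if eth5 had not been specified.
print(resolve_interfaces("nowarn:eth5,eth6", {"lo", "eth0", "eth6"}))
# Node without eth6: the TCP BTL would disqualify itself.
print(resolve_interfaces("nowarn:eth5,eth6", {"lo", "eth0", "eth5"}))
```

The sketch stops at the first missing required interface, since one such token is already enough to disqualify the BTL under the proposal.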

(BTW: yes, I'm volunteering to code up whatever we agree on)

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/