Correct -- it doesn't make sense to specify both include *and* exclude: by specifying one, you're implicitly (but exactly/precisely) specifying the other.
My suggestion would be to use positive notation, not negative notation. For example:
mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 ...
That way, you *know* you're only getting the TCP and self BTLs, and you *know* you're only getting eth0. If that works, then spread out from there, e.g.:
mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include eth0,eth1 ...
That is, also include the "sm" BTL (which is used only for shared-memory communication between 2 procs on the same server, and is therefore useless for a 2-proc-across-2-server run of osu_bw, but you get the idea), and use both eth0 and eth1.
And so on.
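The "start narrow, then widen" progression above can be sketched as a small dry-run script. This is only an illustration: mpirun_cmd is a hypothetical helper name, it prints the command instead of running it, and ./osu_bw is simply the benchmark path from the original report.

```shell
# Hypothetical helper: build an mpirun command line from explicit
# (positive) lists only. Nothing is executed; the command is just printed.
# $1: comma-separated BTL list, e.g. tcp,self
# $2: comma-separated interface list for btl_tcp_if_include, e.g. eth0
mpirun_cmd() {
    case "$1$2" in
        *^*) echo "use positive notation, not ^exclusions" >&2; return 1 ;;
    esac
    printf 'mpirun --mca btl %s --mca btl_tcp_if_include %s ./osu_bw\n' "$1" "$2"
}

# Step 1: the narrowest configuration.
mpirun_cmd tcp,self eth0
# Step 2: once that works, widen one thing at a time.
mpirun_cmd tcp,sm,self eth0
mpirun_cmd tcp,sm,self eth0,eth1
```

Changing one list per step means that if a run hangs, the most recently added BTL or interface is the first suspect.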
The problem with using ^openib and/or btl_tcp_if_exclude is that you might end up using some BTLs and/or TCP interfaces that you don't expect, and therefore can run into problems.
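To make that concrete, here is a rough sketch of what an exclude list leaves behind. The helper name remaining_ifaces and the interface inventory are made up for illustration; this is plain string filtering, not how Open MPI itself selects interfaces.

```shell
# Hypothetical helper: print the interfaces left after applying an exclude list.
# $1: space-separated interface names present on the node
# $2: comma-separated exclude list, as passed to btl_tcp_if_exclude
remaining_ifaces() {
    all=$1
    excluded=",$2,"
    out=""
    for ifc in $all; do
        case "$excluded" in
            *",$ifc,"*) ;;                # explicitly excluded: skip
            *) out="${out:+$out }$ifc" ;; # everything else is still eligible
        esac
    done
    printf '%s\n' "$out"
}

# Example node inventory (made up for illustration).
remaining_ifaces "lo eth0 eth1 ib0 virbr0" "ib0"
```

On a Linux node you could feed it the real inventory with something like remaining_ifaces "$(ls /sys/class/net)" ib0. Note that the loopback interface is still in the result here. If I recall correctly, setting btl_tcp_if_exclude replaces the default exclude list, which normally covers loopback, so it is worth checking the documentation for your version before relying on an exclude list that names only ib0.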
On Sep 20, 2013, at 11:17 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> I don't think you are allowed to specify both include and exclude options at the same time as they conflict - you should either exclude ib0 or include eth0 (or whatever).
> My guess is that the various nodes are trying to communicate across disjoint networks. We've seen that before when, for example, eth0 on one node is on one subnet, and eth0 on another node is on a different subnet. You might look for that kind of arrangement.
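A quick way to check for that kind of arrangement is to compare the network part of each node's address. The sketch below assumes IPv4, that both nodes use the same prefix length, and that you have already collected the addresses (e.g. from "ip addr" on each node); the function names and the sample addresses are made up.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The function body runs in a subshell so the IFS change stays local.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Print the network address (as an integer) for address $1 with prefix length $2.
network_of() {
    echo $(( $(ip_to_int "$1") & (0xFFFFFFFF << (32 - $2)) & 0xFFFFFFFF ))
}

# Succeed if addresses $1 and $2 share a network under prefix length $3.
same_subnet() {
    [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]
}

# Sample addresses for eth0 on two nodes (made up for illustration).
if same_subnet 192.168.1.10 192.168.2.10 24; then
    echo "eth0 is on the same subnet on both nodes"
else
    echo "eth0 is on different subnets"
fi
```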
> On Sep 20, 2013, at 8:05 AM, "Elken, Tom" <tom.elken_at_[hidden]> wrote:
>>> The trouble is when I try to add some "--mca" parameters to force it to
>>> use TCP/Ethernet, the program seems to hang. I get the headers of the
>>> "osu_bw" output, but no results, even on the first case (1 byte payload
>>> per packet). This is occurring on both the IB-enabled nodes, and on the
>>> Ethernet-only nodes. The specific syntax I was using was: "mpirun
>>> --mca btl ^openib --mca btl_tcp_if_exclude ib0 ./osu_bw"
>> When we want to run over TCP and IPoIB on an IB/PSM equipped cluster, we use:
>> --mca btl sm --mca btl tcp,self --mca btl_tcp_if_exclude eth0 --mca btl_tcp_if_include ib0 --mca mtl ^psm
>> based on this, it looks like the following might work for you:
>> --mca btl sm,tcp,self --mca btl_tcp_if_exclude ib0 --mca btl_tcp_if_include eth0 --mca btl ^openib
>> If you don't have ib0 ports configured on the IB nodes, you probably don't need the "--mca btl_tcp_if_exclude ib0".
>>> The problem occurs at least with OpenMPI 1.6.3 compiled with GNU 4.4
>>> compilers, with 1.6.3 compiled with Intel 13.0.1 compilers, and with
>>> 1.6.5 compiled with Intel 13.0.1 compilers. I haven't tested any other
>>> combinations yet.
>>> Any ideas here? It's very possible this is a system configuration
>>> problem, but I don't know where to look. At this point, any ideas would
>>> be welcome, either about the specific situation, or general pointers on
>>> mpirun debugging flags to use. I can't find much in the docs yet on
>>> run-time debugging for OpenMPI, as opposed to debugging the application.
>>> Maybe I'm just looking in the wrong place.
>>> Lloyd Brown
>>> Systems Administrator
>>> Fulton Supercomputing Lab
>>> Brigham Young University