
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Debugging Runtime/Ethernet Problems
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-09-20 11:17:43


I don't think you are allowed to specify both the include and exclude options at the same time, as they conflict. You should either exclude ib0 or include eth0 (or whichever interface you want to use).
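Here is a minimal sketch of the two mutually exclusive forms. The hypothetical bash helper below (`tcp_btl_args` is an illustrative name, not part of Open MPI) emits either an include list or an exclude list, never both. It assumes the 1.6-series component names `tcp`, `sm`, and `self`. It also adds `lo` back to the exclude list. This rests on the assumption, per the Open MPI FAQ, that setting `btl_tcp_if_exclude` replaces the default list, which normally keeps the loopback interface out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print MCA flags that restrict the TCP BTL using
# EITHER btl_tcp_if_include OR btl_tcp_if_exclude, never both.
# Usage: tcp_btl_args include|exclude IFACE[,IFACE...]
tcp_btl_args() {
  local mode="$1" ifaces="$2"
  # An explicit component list (tcp,sm,self) already leaves openib out,
  # so no "^openib" exclusion is needed alongside it.
  local btls="--mca btl tcp,sm,self"
  case "$mode" in
    include)
      # Use only the named interface(s), e.g. eth0.
      printf '%s\n' "$btls --mca btl_tcp_if_include $ifaces"
      ;;
    exclude)
      # Skip the named interface(s); lo is re-added because overriding
      # the exclude list is assumed to drop the default loopback entry.
      printf '%s\n' "$btls --mca btl_tcp_if_exclude lo,$ifaces"
      ;;
    *)
      echo "tcp_btl_args: mode must be 'include' or 'exclude'" >&2
      return 1
      ;;
  esac
}
```

Usage: `mpirun $(tcp_btl_args include eth0) ./osu_bw`. The command substitution is deliberately left unquoted so the shell splits it into separate arguments for mpirun.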

My guess is that the various nodes are trying to communicate across disjoint networks. We've seen that before when, for example, eth0 on one node is on one subnet and eth0 on another node is on a different subnet. You might check whether your cluster has that kind of arrangement.
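One way to check for that arrangement is to compare each node's eth0 address and prefix length, for example as reported by `ip -4 addr show eth0`. The bash sketch below uses the hypothetical function names `ip_to_int`, `network_of`, and `same_subnet`. It reduces an IPv4 CIDR address to its network number and reports whether two addresses share a subnet. It assumes dotted-quad input without leading zeros.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo "$(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))"
}

# Print "network/prefix" for an address in CIDR form, e.g. 10.0.0.5/24.
network_of() {
  local addr="${1%/*}" prefix="${1#*/}"
  local ip mask
  ip=$(ip_to_int "$addr")
  # Keep the top $prefix bits (bash arithmetic is 64-bit, so trim to 32).
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  echo "$(( ip & mask ))/$prefix"
}

# Succeed (exit status 0) if both CIDR addresses are on the same subnet
# with the same prefix length.
same_subnet() {
  [ "$(network_of "$1")" = "$(network_of "$2")" ]
}
```

Usage: `same_subnet 10.0.0.5/24 10.0.1.7/24 || echo "eth0 subnets differ"`. Run it over every pair of nodes that mpirun will place ranks on. A mismatch points at the disjoint-network case described above.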

On Sep 20, 2013, at 8:05 AM, "Elken, Tom" <tom.elken_at_[hidden]> wrote:

>> The trouble is when I try to add some "--mca" parameters to force it to
>> use TCP/Ethernet, the program seems to hang. I get the headers of the
>> "osu_bw" output, but no results, even on the first case (1 byte payload
>> per packet). This is occurring on both the IB-enabled nodes, and on the
>> Ethernet-only nodes. The specific syntax I was using was: "mpirun
>> --mca btl ^openib --mca btl_tcp_if_exclude ib0 ./osu_bw"
>
> When we want to run over TCP and IPoIB on an IB/PSM equipped cluster, we use:
> --mca btl sm --mca btl tcp,self --mca btl_tcp_if_exclude eth0 --mca btl_tcp_if_include ib0 --mca mtl ^psm
>
> Based on this, it looks like the following might work for you:
> --mca btl sm,tcp,self --mca btl_tcp_if_exclude ib0 --mca btl_tcp_if_include eth0 --mca btl ^openib
>
> If you don't have ib0 ports configured on the IB nodes, you probably don't need the "--mca btl_tcp_if_exclude ib0".
>
> -Tom
>
>>
>> The problem occurs at least with OpenMPI 1.6.3 compiled with GNU 4.4
>> compilers, with 1.6.3 compiled with Intel 13.0.1 compilers, and with
>> 1.6.5 compiled with Intel 13.0.1 compilers. I haven't tested any other
>> combinations yet.
>>
>> Any ideas here? It's very possible this is a system configuration
>> problem, but I don't know where to look. At this point, any ideas would
>> be welcome, either about the specific situation, or general pointers on
>> mpirun debugging flags to use. I can't find much in the docs yet on
>> run-time debugging for OpenMPI, as opposed to debugging the application.
>> Maybe I'm just looking in the wrong place.
>>
>>
>> Thanks,
>>
>> --
>> Lloyd Brown
>> Systems Administrator
>> Fulton Supercomputing Lab
>> Brigham Young University
>> http://marylou.byu.edu
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users