Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] tcp communication problems with 1.4.3 and 1.4.4 rc2 on FreeBSD
From: Steve Kargl (sgk_at_[hidden])
Date: 2011-07-08 15:09:09


On Fri, Jul 08, 2011 at 02:19:27PM -0400, Jeff Squyres wrote:
>
> The easiest way to fix this is likely to use the btl_tcp_if_include
> or btl_tcp_if_exclude MCA parameters -- i.e., tell OMPI exactly
> which interfaces to use:
>
> http://www.open-mpi.org/faq/?category=tcp#tcp-selection
>

Perhaps I'm misreading the output again, but it appears that
1.4.4rc2 does not even see the second NIC.

hpc:kargl[317] ifconfig bge0
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
    ether 00:e0:81:40:48:92
    inet 10.208.78.111 netmask 0xffffff00 broadcast 10.208.78.255
    inet6 fe80::2e0:81ff:fe40:4892%bge0 prefixlen 64 scopeid 0x3
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
hpc:kargl[318] ifconfig bge1
bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
    ether 00:e0:81:40:48:93
    inet 192.168.0.10 netmask 0xffffff00 broadcast 192.168.0.255
    inet6 fe80::2e0:81ff:fe40:4893%bge1 prefixlen 64 scopeid 0x4
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
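
The ifconfig output above covers only the host it was run on (hpc). Since the same btl_tcp_if_include value is handed to every process in the job, it may help to compare interface names on each host in the machinefile. A minimal sketch, assuming FreeBSD-style ifconfig output like the above; iface_names is a hypothetical helper, not part of Open MPI:

```shell
#!/bin/sh
# iface_names (hypothetical helper): read ifconfig output on stdin and
# print one interface name per line. Assumes the FreeBSD-style layout
# shown above, where a block starts at column 0 with "name:" and the
# detail lines are indented.
iface_names() {
    sed -n 's/^\([^[:space:]:][^[:space:]:]*\):.*/\1/p'
}
```

Possible usage, assuming passwordless ssh to the remote node: `ssh node11 ifconfig | iface_names`, then compare the result with `ifconfig | iface_names` on hpc.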

kargl[319] /usr/local/openmpi-1.4.4/bin/mpiexec --mca btl_base_verbose 30 \
  --mca btl_tcp_if_include bge1 -machinefile mf1 ./z

hpc:kargl[320] /usr/local/openmpi-1.4.4/bin/mpiexec --mca btl_base_verbose 10 --mca btl_tcp_if_include bge1 -machinefile mf1 ./z
[hpc.apl.washington.edu:12295] mca: base: components_open: Looking for btl components
[hpc.apl.washington.edu:12295] mca: base: components_open: opening btl components
[hpc.apl.washington.edu:12295] mca: base: components_open: found loaded component self
[hpc.apl.washington.edu:12295] mca: base: components_open: component self has no register function
[hpc.apl.washington.edu:12295] mca: base: components_open: component self open function successful
[hpc.apl.washington.edu:12295] mca: base: components_open: found loaded component sm
[hpc.apl.washington.edu:12295] mca: base: components_open: component sm has no register function
[hpc.apl.washington.edu:12295] mca: base: components_open: component sm open function successful
[hpc.apl.washington.edu:12295] mca: base: components_open: found loaded component tcp
[hpc.apl.washington.edu:12295] mca: base: components_open: component tcp has no register function
[hpc.apl.washington.edu:12295] mca: base: components_open: component tcp open function successful
[hpc.apl.washington.edu:12295] select: initializing btl component self
[hpc.apl.washington.edu:12295] select: init of component self returned success
[hpc.apl.washington.edu:12295] select: initializing btl component sm
[hpc.apl.washington.edu:12295] select: init of component sm returned success
[hpc.apl.washington.edu:12295] select: initializing btl component tcp
[hpc.apl.washington.edu:12295] select: init of component tcp returned success
[node11.cimu.org:21878] mca: base: components_open: Looking for btl components
[node11.cimu.org:21878] mca: base: components_open: opening btl components
[node11.cimu.org:21878] mca: base: components_open: found loaded component self
[node11.cimu.org:21878] mca: base: components_open: component self has no register function
[node11.cimu.org:21878] mca: base: components_open: component self open function successful
[node11.cimu.org:21878] mca: base: components_open: found loaded component sm
[node11.cimu.org:21878] mca: base: components_open: component sm has no register function
[node11.cimu.org:21878] mca: base: components_open: component sm open function successful
[node11.cimu.org:21878] mca: base: components_open: found loaded component tcp
[node11.cimu.org:21878] mca: base: components_open: component tcp has no register function
[node11.cimu.org:21878] mca: base: components_open: component tcp open function successful
[node11.cimu.org:21878] select: initializing btl component self
[node11.cimu.org:21878] select: init of component self returned success
[node11.cimu.org:21878] select: initializing btl component sm
[node11.cimu.org:21878] select: init of component sm returned success
[node11.cimu.org:21878] select: initializing btl component tcp
[node11.cimu.org][[13916,1],1][btl_tcp_component.c:468:mca_btl_tcp_component_create_instances] invalid interface "bge1"
[node11.cimu.org:21878] select: init of component tcp returned success
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.
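
In the log above, the only "invalid interface" line carries the prefix [node11.cimu.org], while the hpc process reports no such message. With many nodes such lines are easy to miss, so a small filter can pull them out. This is a sketch that assumes the message layout shown above; invalid_ifaces is a hypothetical helper name:

```shell
#!/bin/sh
# invalid_ifaces (hypothetical helper): read Open MPI verbose output on
# stdin and print "host interface" for each 'invalid interface' line.
# Assumes the layout seen above:
#   [host][[jobid,n],rank][file:line:function] invalid interface "name"
invalid_ifaces() {
    sed -n 's/^\[\([^]]*\)]\[.*invalid interface "\([^"]*\)".*$/\1 \2/p'
}
```

Possible usage: append `2>&1 | invalid_ifaces` to the mpiexec command line above. For the log shown here it would print `node11.cimu.org bge1`.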

-- 
Steve