Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [OMPI devel] Open-MPI between Mac and Linux (ubuntu 9.04) over wireless
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-09-22 21:20:14


(only replying to users list)

Some suggestions:

- MPI seems to start up, but the additional TCP connections required for
MPI communication seem to be failing, timing out, or hitting some other error.
- Are you running firewalls between your machines? If so, can you
disable them?
- I see that you're specifying "--mca btl_tcp_port_min_v4 36900", but
one of the debug lines shows a connection attempt to a different port:
> [apex-backpack:31956] btl: tcp: attempting to connect() to address
> 10.11.14.203 on port 9360
- Try using the IP address of the local machine instead of the name
"localhost" (for example, "-H 10.11.14.203,10.11.14.205").
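
As a rough way to try two of these points outside of MPI, here is a small Python sketch. Python itself and the helper names swap16 and can_connect are illustrative assumptions, not part of Open MPI. The first helper shows that the port in the debug line, 9360 (0x2490), is 36900 (0x9024) with its two bytes swapped. That may mean the port is being stored or printed in a different byte order rather than being ignored; nothing in this thread confirms either reading. The second helper attempts a plain TCP connect(), which is one way to see whether something between the machines blocks the connection.

```python
import socket


def swap16(port: int) -> int:
    """Swap the two bytes of a 16-bit port number."""
    return ((port & 0xFF) << 8) | ((port >> 8) & 0xFF)


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connect() to host:port completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered, unreachable, or timed out: all end up here.
        return False


requested = 36900  # value passed to btl_tcp_port_min_v4
reported = 9360    # port shown in the btl_base_verbose output

print(hex(requested), hex(reported))   # 0x9024 0x2490
print(swap16(requested) == reported)   # True
```

From the Linux box, something like can_connect("10.11.14.203", 36900) could be tried while a process is known to be listening on that port on the Mac. A False result only says the connect did not complete; it does not say why.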

On Sep 22, 2009, at 5:27 PM, Pallab Datta wrote:

> The following is the ifconfig output for the Mac and the Linux box,
> respectively:
>
> fuji:openmpi-1.3.3 pallabdatta$ ifconfig
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
> inet 127.0.0.1 netmask 0xff000000
> inet6 ::1 prefixlen 128
> gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
> stf0: flags=0<> mtu 1280
> en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> inet6 fe80::21f:5bff:fe3d:eaac%en0 prefixlen 64 scopeid 0x4
> inet 10.11.14.203 netmask 0xfffff000 broadcast 10.11.15.255
> ether 00:1f:5b:3d:ea:ac
> media: autoselect (100baseTX <full-duplex>) status: active
> supported media: autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 10baseT/UTP <full-duplex,flow-control> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback> 100baseTX <full-duplex,flow-control> 1000baseT <full-duplex> 1000baseT <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control>
> en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> ether 00:1f:5b:3d:ea:ad
> media: autoselect status: inactive
> supported media: autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 10baseT/UTP <full-duplex,flow-control> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback> 100baseTX <full-duplex,flow-control> 1000baseT <full-duplex> 1000baseT <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control>
> fw0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 4078
> lladdr 00:22:41:ff:fe:ed:7d:a8
> media: autoselect <full-duplex> status: inactive
> supported media: autoselect <full-duplex>
>
>
> LINUX:
> ====
> pallabdatta_at_apex-backpack:~/backpack/src$ ifconfig
> lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> inet6 addr: ::1/128 Scope:Host
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:116 errors:0 dropped:0 overruns:0 frame:0
> TX packets:116 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:11788 (11.7 KB) TX bytes:11788 (11.7 KB)
>
> wlan0 Link encap:Ethernet HWaddr 00:21:79:c2:54:c7
> inet addr:10.11.14.205 Bcast:10.11.14.255 Mask:255.255.240.0
> inet6 addr: fe80::221:79ff:fec2:54c7/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:72531 errors:0 dropped:0 overruns:0 frame:0
> TX packets:28894 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:5459312 (5.4 MB) TX bytes:7264193 (7.2 MB)
>
> wmaster0 Link encap:UNSPEC HWaddr 00-21-79-C2-54-C7-34-63-00-00-00-00-00-00-00-00
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
>
> The Mac is a Mac Pro with two 2.26GHz Quad-Core Intel Xeon processors,
> and the Linux box runs Ubuntu Server Edition 9.04. The Mac uses its
> Ethernet interface to connect to the network, and the Linux box
> connects via a wireless adapter (IOGEAR).
>
> Please help me find a way to fix this issue. It really needs to work
> for our project.
> Thanks in advance,
> regards,
> pallab
>
>> My other concern was the following, but I am not sure it applies here.
>> If you have multiple interfaces on the node, and they are on the same
>> subnet, then you cannot actually select which IP address to go out of.
>> You can only select the IP address you want to connect to. In these
>> cases, I have seen a hang because we think we are selecting an IP
>> address to go out of, but the traffic actually goes out the other one.
>> Perhaps you can send the users list the output from "ifconfig" on each
>> of the machines, which would show all the interfaces. You need to use
>> the right arguments for ifconfig depending on the OS you are running on.
>>
>> One thought is to make sure the Ethernet interface is marked down on
>> both boxes, if that is possible.
>>
>> Pallab Datta wrote:
>>> Any suggestions on how to debug this further?
>>> Do you think I need to enable any other option besides heterogeneous
>>> at the configure prompt?
>>>
>>>
>>>> The --enable-heterogeneous option should do the trick. And to answer
>>>> the previous question, yes, put both of the interfaces in the include
>>>> list.
>>>>
>>>> --mca btl_tcp_if_include en0,wlan0
>>>>
>>>> If that does not work, then I have one other thought about why it
>>>> might not work, although perhaps not a solution.
>>>>
>>>> Rolf
>>>>
>>>> Pallab Datta wrote:
>>>>
>>>>> Hi Rolf,
>>>>>
>>>>> Do I need to configure openmpi with some specific options apart
>>>>> from --enable-heterogeneous?
>>>>> I am currently using
>>>>> ./configure --prefix=/usr/local/ --enable-heterogeneous --disable-static --enable-shared --enable-debug
>>>>>
>>>>> on both ends. Is the above correct? Please let me know.
>>>>> thanks and regards,
>>>>> pallab
>>>>>
>>>>>
>>>>>
>>>>>> Hi:
>>>>>> I assume if you wait several minutes then your program will
>>>>>> actually time out, yes? I guess I have two suggestions. First, can
>>>>>> you run a non-MPI job using the wireless? Something like hostname?
>>>>>> Secondly, you may want to specify the specific interfaces you want
>>>>>> it to use on the two machines. You can do that via the
>>>>>> "--mca btl_tcp_if_include" run-time parameter. Just list the ones
>>>>>> that you expect it to use.
>>>>>>
>>>>>> Also, this is not right: "--mca OMPI_mca_mpi_preconnect_all 1". It
>>>>>> should be "--mca mpi_preconnect_mpi 1" if you want to do the
>>>>>> connection during MPI_Init.
>>>>>>
>>>>>> Rolf
>>>>>>
>>>>>> Pallab Datta wrote:
>>>>>>
>>>>>>
>>>>>>> The following is the error dump
>>>>>>>
>>>>>>> fuji:src pallabdatta$ /usr/local/bin/mpirun --mca btl_tcp_port_min_v4 36900 -mca btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca btl tcp,self --mca OMPI_mca_mpi_preconnect_all 1 -np 2 -hetero -H localhost,10.11.14.205 /tmp/hello
>>>>>>> [fuji.local:01316] mca: base: components_open: Looking for btl components
>>>>>>> [fuji.local:01316] mca: base: components_open: opening btl components
>>>>>>> [fuji.local:01316] mca: base: components_open: found loaded component self
>>>>>>> [fuji.local:01316] mca: base: components_open: component self has no register function
>>>>>>> [fuji.local:01316] mca: base: components_open: component self open function successful
>>>>>>> [fuji.local:01316] mca: base: components_open: found loaded component tcp
>>>>>>> [fuji.local:01316] mca: base: components_open: component tcp has no register function
>>>>>>> [fuji.local:01316] mca: base: components_open: component tcp open function successful
>>>>>>> [fuji.local:01316] select: initializing btl component self
>>>>>>> [fuji.local:01316] select: init of component self returned success
>>>>>>> [fuji.local:01316] select: initializing btl component tcp
>>>>>>> [fuji.local:01316] select: init of component tcp returned success
>>>>>>> [apex-backpack:04753] mca: base: components_open: Looking for btl components
>>>>>>> [apex-backpack:04753] mca: base: components_open: opening btl components
>>>>>>> [apex-backpack:04753] mca: base: components_open: found loaded component self
>>>>>>> [apex-backpack:04753] mca: base: components_open: component self has no register function
>>>>>>> [apex-backpack:04753] mca: base: components_open: component self open function successful
>>>>>>> [apex-backpack:04753] mca: base: components_open: found loaded component tcp
>>>>>>> [apex-backpack:04753] mca: base: components_open: component tcp has no register function
>>>>>>> [apex-backpack:04753] mca: base: components_open: component tcp open function successful
>>>>>>> [apex-backpack:04753] select: initializing btl component self
>>>>>>> [apex-backpack:04753] select: init of component self returned success
>>>>>>> [apex-backpack:04753] select: initializing btl component tcp
>>>>>>> [apex-backpack:04753] select: init of component tcp returned success
>>>>>>> Process 0 on fuji.local out of 2
>>>>>>> Process 1 on apex-backpack out of 2
>>>>>>> [apex-backpack:04753] btl: tcp: attempting to connect() to address 10.11.14.203 on port 9360
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> I am trying to run open-mpi 1.3.3 between a Linux box running
>>>>>>>> Ubuntu Server 9.04 and a Macintosh. I have configured openmpi
>>>>>>>> with the following options:
>>>>>>>> ./configure --prefix=/usr/local/ --enable-heterogeneous --disable-shared --enable-static
>>>>>>>>
>>>>>>>> When both machines are connected to the network via Ethernet
>>>>>>>> cables, openmpi works fine.
>>>>>>>>
>>>>>>>> But when I switch the Linux box to a wireless adapter, I can
>>>>>>>> reach (ping) the Macintosh, but openmpi hangs on a hello world
>>>>>>>> program.
>>>>>>>>
>>>>>>>> I ran:
>>>>>>>>
>>>>>>>> /usr/local/bin/mpirun --mca btl_tcp_port_min_v4 36900 -mca btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca OMPI_mca_mpi_preconnect_all 1 -np 2 -hetero -H localhost,10.11.14.205 /tmp/back
>>>>>>>>
>>>>>>>> It hangs on a send/receive call between the two ends. All my
>>>>>>>> firewalls are turned off at the Macintosh end. PLEASE HELP ASAP.
>>>>>>>> regards,
>>>>>>>> pallab
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> users_at_[hidden]
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> --
>>>>>>
>>>>>> =========================
>>>>>> rolf.vandevaart_at_[hidden]
>>>>>> 781-442-3043
>>>>>> =========================
>>>> --
>>>>
>>>> =========================
>>>> rolf.vandevaart_at_[hidden]
>>>> 781-442-3043
>>>> =========================
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>>
>> =========================
>> rolf.vandevaart_at_[hidden]
>> 781-442-3043
>> =========================
>>
>>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
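
To check the same-subnet concern quoted above against the ifconfig listings, a short Python sketch using the standard ipaddress module can compute the network each address belongs to. The helper names parse_netmask and interface are made up for illustration; the addresses and masks are copied from the quoted ifconfig output.

```python
import ipaddress


def parse_netmask(mask: str) -> ipaddress.IPv4Address:
    """Accept a dotted-quad mask (Linux ifconfig) or a 0x-prefixed hex mask (Mac ifconfig)."""
    if mask.lower().startswith("0x"):
        return ipaddress.IPv4Address(int(mask, 16))
    return ipaddress.IPv4Address(mask)


def interface(addr: str, mask: str) -> ipaddress.IPv4Interface:
    """Combine an address and its netmask into an interface object."""
    return ipaddress.ip_interface(f"{addr}/{parse_netmask(mask)}")


mac_en0 = interface("10.11.14.203", "0xfffff000")
linux_wlan0 = interface("10.11.14.205", "255.255.240.0")

print(mac_en0.network)                         # 10.11.0.0/20
print(linux_wlan0.network)                     # 10.11.0.0/20
print(mac_en0.network == linux_wlan0.network)  # True
print(linux_wlan0.network.broadcast_address)   # 10.11.15.255
```

Both addresses fall in 10.11.0.0/20, and each machine shows only one non-loopback interface with an IPv4 address (en0 on the Mac, wlan0 on the Linux box). The mask on wlan0 implies a broadcast address of 10.11.15.255, while ifconfig reports Bcast:10.11.14.255; whether that mismatch has anything to do with the hang is not established in this thread.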

-- 
Jeff Squyres
jsquyres_at_[hidden]