Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [OMPI devel] Open-MPI between Mac and Linux (ubuntu 9.04) over wireless
From: Pallab Datta (datta_at_[hidden])
Date: 2009-09-24 12:54:28


Yes, I had tried that initially. It (apex-backpack) was trying to connect to
the Mac (10.11.14.203) on port number 4, which is too low. That's why I
made the port range higher.
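
A side observation on the port numbers, not something confirmed anywhere in this thread: 9360 is exactly 36900 with its two bytes swapped, and 4 is 1024 with its two bytes swapped (1024 being, as far as I know, the default for btl_tcp_port_min_v4). That pattern would be consistent with a host/network byte-order mix-up in how the port number is handled, rather than with the range simply being too low. A minimal Python sketch of the arithmetic follows; the helper name swap16 is made up for illustration.

```python
import struct

def swap16(port: int) -> int:
    """Return the 16-bit value with its two bytes exchanged."""
    # Pack as a big-endian unsigned short, then read the same
    # two bytes back as a little-endian unsigned short.
    return struct.unpack("<H", struct.pack(">H", port))[0]

# 36900 is the btl_tcp_port_min_v4 value used in this thread;
# 1024 is assumed here to be the default when no range is given.
for requested in (36900, 1024):
    print(requested, "->", swap16(requested))
```

If the same relationship holds for other values of btl_tcp_port_min_v4, that would strengthen the byte-order explanation; this sketch only checks the two values mentioned here.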

> Have you tried running without limiting the port range?
>
> On Sep 24, 2009, at 12:39 PM, Pallab Datta wrote:
>
>> Hi All,
>>
>> Yes I can ping and ssh from apex-backpack to my Mac (fuji.local).
>> I fixed the wireless broadcast to reflect the same on both ends
>> (10.11.14.255) but still the problem persists.
>>
>> I have tried other wireless adapters as well, but no luck so far.
>> Please let me know what can be done...
>> regards, pallab
>>
>>> (putting this back on the list where others can reply as well, and if
>>> we solve it, the solution will be google-ized)
>>>
>>> According to your debug output:
>>>
>>>>> [apex-backpack:31956] btl: tcp: attempting to connect() to address
>>>>> 10.11.14.203 on port 9360
>>>
>>> It *is* trying to connect to the right IP address. Are you able to
>>> ping to .203 from apex-backpack?
>>>
>>> I also notice that your ethernet configuration does not exactly match
>>> between linux and osx:
>>>
>>> en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>> inet 10.11.14.203 netmask 0xfffff000 broadcast 10.11.15.255
>>>
>>> wlan0 Link encap:Ethernet HWaddr 00:21:79:c2:54:c7
>>> inet addr:10.11.14.205 Bcast:10.11.14.255 Mask:255.255.240.0
>>>
>>>
>>> On Sep 22, 2009, at 9:26 PM, Pallab Datta wrote:
>>>
>>>> There is no firewall running between the machines. I tried using the IP
>>>> address instead of localhost but it gave me the same output. MPI is not
>>>> even timing out; it just hangs forever. :(
>>>>
>>>> I have disabled the ethernet interface on the Linux box, keeping only the
>>>> wireless up. On the Mac I only have the ethernet turned on. My Mac is an
>>>> 8-core Mac Pro.
>>>>
>>>> Please help me debug this..
>>>> thanks in advance, regards,
>>>> pallab
>>>>
>>>>
>>>>> (only replying to users list)
>>>>>
>>>>> Some suggestions:
>>>>>
>>>>> - MPI seems to start up, but the additional TCP connections required for
>>>>> MPI connections seem to be failing / timing out / some other error.
>>>>> - Are you running firewalls between your machines? If so, can you
>>>>> disable them?
>>>>> - I see that you're specifying "--mca btl_tcp_port_min_v4 36900", but
>>>>> one of the debug lines reads:
>>>>>> [apex-backpack:31956] btl: tcp: attempting to connect() to address
>>>>>> 10.11.14.203 on port 9360
>>>>> - Try not using the name "localhost", but rather the IP address of the
>>>>> local machine
>>>>>
>>>>>
>>>>> On Sep 22, 2009, at 5:27 PM, Pallab Datta wrote:
>>>>>
>>>>>> The following are the ifconfig for both the Mac and the Linux
>>>>>> respectively:
>>>>>>
>>>>>> fuji:openmpi-1.3.3 pallabdatta$ ifconfig
>>>>>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
>>>>>> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
>>>>>> inet 127.0.0.1 netmask 0xff000000
>>>>>> inet6 ::1 prefixlen 128
>>>>>> gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
>>>>>> stf0: flags=0<> mtu 1280
>>>>>> en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>>>>> inet6 fe80::21f:5bff:fe3d:eaac%en0 prefixlen 64 scopeid 0x4
>>>>>> inet 10.11.14.203 netmask 0xfffff000 broadcast 10.11.15.255
>>>>>> ether 00:1f:5b:3d:ea:ac
>>>>>> media: autoselect (100baseTX <full-duplex>) status: active
>>>>>> supported media: autoselect 10baseT/UTP <half-duplex> 10baseT/UTP
>>>>>> <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 10baseT/UTP
>>>>>> <full-duplex,flow-control> 100baseTX <half-duplex> 100baseTX
>>>>>> <full-duplex> 100baseTX <full-duplex,hw-loopback> 100baseTX
>>>>>> <full-duplex,flow-control> 1000baseT <full-duplex> 1000baseT
>>>>>> <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control>
>>>>>> en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>>>>> ether 00:1f:5b:3d:ea:ad
>>>>>> media: autoselect status: inactive
>>>>>> supported media: autoselect 10baseT/UTP <half-duplex> 10baseT/UTP
>>>>>> <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 10baseT/UTP
>>>>>> <full-duplex,flow-control> 100baseTX <half-duplex> 100baseTX
>>>>>> <full-duplex> 100baseTX <full-duplex,hw-loopback> 100baseTX
>>>>>> <full-duplex,flow-control> 1000baseT <full-duplex> 1000baseT
>>>>>> <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control>
>>>>>> fw0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 4078
>>>>>> lladdr 00:22:41:ff:fe:ed:7d:a8
>>>>>> media: autoselect <full-duplex> status: inactive
>>>>>> supported media: autoselect <full-duplex>
>>>>>>
>>>>>>
>>>>>> LINUX:
>>>>>> ====
>>>>>> pallabdatta_at_apex-backpack:~/backpack/src$ ifconfig
>>>>>> lo Link encap:Local Loopback
>>>>>> inet addr:127.0.0.1 Mask:255.0.0.0
>>>>>> inet6 addr: ::1/128 Scope:Host
>>>>>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>>>>>> RX packets:116 errors:0 dropped:0 overruns:0 frame:0
>>>>>> TX packets:116 errors:0 dropped:0 overruns:0 carrier:0
>>>>>> collisions:0 txqueuelen:0
>>>>>> RX bytes:11788 (11.7 KB) TX bytes:11788 (11.7 KB)
>>>>>>
>>>>>> wlan0 Link encap:Ethernet HWaddr 00:21:79:c2:54:c7
>>>>>> inet addr:10.11.14.205 Bcast:10.11.14.255 Mask:255.255.240.0
>>>>>> inet6 addr: fe80::221:79ff:fec2:54c7/64 Scope:Link
>>>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>>>>> RX packets:72531 errors:0 dropped:0 overruns:0 frame:0
>>>>>> TX packets:28894 errors:0 dropped:0 overruns:0 carrier:0
>>>>>> collisions:0 txqueuelen:1000
>>>>>> RX bytes:5459312 (5.4 MB) TX bytes:7264193 (7.2 MB)
>>>>>>
>>>>>> wmaster0 Link encap:UNSPEC HWaddr 00-21-79-C2-54-C7-34-63-00-00-00-00-00-00-00-00
>>>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>> collisions:0 txqueuelen:1000
>>>>>> RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
>>>>>>
>>>>>> The Mac is a Mac Pro with two 2.26GHz Quad-Core Intel Xeons and the Linux
>>>>>> box is Ubuntu Server Edition 9.04. The Mac uses the ethernet interface to
>>>>>> connect to the network and the Linux box connects via a wireless adapter
>>>>>> (IOGEAR).
>>>>>>
>>>>>> Please help me in any way to fix this issue. It really needs to work for
>>>>>> our project.
>>>>>> thanks in advance,
>>>>>> regards,
>>>>>> pallab
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> My other concern was the following, but I am not sure it applies here.
>>>>>>> If you have multiple interfaces on the node, and they are on the same
>>>>>>> subnet, then you cannot actually select what IP address to go out of.
>>>>>>> You can only select the IP address you want to connect to. In these
>>>>>>> cases, I have seen a hang because we think we are selecting an IP
>>>>>>> address to go out of, but it actually goes out the other one.
>>>>>>> Perhaps you can send the User's list the output from "ifconfig" on each
>>>>>>> of the machines, which would show all the interfaces. You need to get the
>>>>>>> right arguments for ifconfig depending on the OS you are running on.
>>>>>>>
>>>>>>> One thought is to make sure the ethernet interface is marked down on both
>>>>>>> boxes if that is possible.
>>>>>>>
>>>>>>> Pallab Datta wrote:
>>>>>>>> Any suggestions on how to debug this further?
>>>>>>>> Do you think I need to enable any other option besides heterogeneous at
>>>>>>>> the configure prompt?
>>>>>>>>
>>>>>>>>
>>>>>>>>> The --enable-heterogeneous should do the trick. And to answer the
>>>>>>>>> previous question, yes, put both of the interfaces in the include list.
>>>>>>>>>
>>>>>>>>> --mca btl_tcp_if_include en0,wlan0
>>>>>>>>>
>>>>>>>>> If that does not work, then I may have one other thought on why it might
>>>>>>>>> not work, although perhaps not a solution.
>>>>>>>>>
>>>>>>>>> Rolf
>>>>>>>>>
>>>>>>>>> Pallab Datta wrote:
>>>>>>>>>
>>>>>>>>>> Hi Rolf,
>>>>>>>>>>
>>>>>>>>>> Do I need to configure openmpi with some specific options apart from
>>>>>>>>>> --enable-heterogeneous?
>>>>>>>>>> I am currently using
>>>>>>>>>> ./configure --prefix=/usr/local/ --enable-heterogeneous --disable-static --enable-shared --enable-debug
>>>>>>>>>>
>>>>>>>>>> on both ends. Is the above correct? Please let me know.
>>>>>>>>>> thanks and regards,
>>>>>>>>>> pallab
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Hi:
>>>>>>>>>>> I assume if you wait several minutes then your program will actually
>>>>>>>>>>> time out, yes? I guess I have two suggestions. First, can you run a
>>>>>>>>>>> non-MPI job using the wireless? Something like hostname? Secondly, you
>>>>>>>>>>> may want to specify the specific interfaces you want it to use on the
>>>>>>>>>>> two machines. You can do that via the "--mca btl_tcp_if_include"
>>>>>>>>>>> run-time parameter. Just list the ones that you expect it to use.
>>>>>>>>>>>
>>>>>>>>>>> Also, this is not right - "--mca OMPI_mca_mpi_preconnect_all 1". It
>>>>>>>>>>> should be --mca mpi_preconnect_mpi 1 if you want to do the connection
>>>>>>>>>>> during MPI_Init.
>>>>>>>>>>>
>>>>>>>>>>> Rolf
>>>>>>>>>>>
>>>>>>>>>>> Pallab Datta wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> The following is the error dump
>>>>>>>>>>>>
>>>>>>>>>>>> fuji:src pallabdatta$ /usr/local/bin/mpirun --mca btl_tcp_port_min_v4 36900 -mca btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca btl tcp,self --mca OMPI_mca_mpi_preconnect_all 1 -np 2 -hetero -H localhost,10.11.14.205 /tmp/hello
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: Looking for btl components
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: opening btl components
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: found loaded component self
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: component self has no register function
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: component self open function successful
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: found loaded component tcp
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: component tcp has no register function
>>>>>>>>>>>> [fuji.local:01316] mca: base: components_open: component tcp open function successful
>>>>>>>>>>>> [fuji.local:01316] select: initializing btl component self
>>>>>>>>>>>> [fuji.local:01316] select: init of component self returned success
>>>>>>>>>>>> [fuji.local:01316] select: initializing btl component tcp
>>>>>>>>>>>> [fuji.local:01316] select: init of component tcp returned success
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: Looking for btl components
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: opening btl components
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: found loaded component self
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: component self has no register function
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: component self open function successful
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: found loaded component tcp
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: component tcp has no register function
>>>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: component tcp open function successful
>>>>>>>>>>>> [apex-backpack:04753] select: initializing btl component self
>>>>>>>>>>>> [apex-backpack:04753] select: init of component self returned success
>>>>>>>>>>>> [apex-backpack:04753] select: initializing btl component tcp
>>>>>>>>>>>> [apex-backpack:04753] select: init of component tcp returned success
>>>>>>>>>>>> Process 0 on fuji.local out of 2
>>>>>>>>>>>> Process 1 on apex-backpack out of 2
>>>>>>>>>>>> [apex-backpack:04753] btl: tcp: attempting to connect() to address 10.11.14.203 on port 9360
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am trying to run open-mpi 1.3.3 between a Linux box running Ubuntu
>>>>>>>>>>>>> Server 9.04 and a Macintosh. I have configured openmpi with the
>>>>>>>>>>>>> following options:
>>>>>>>>>>>>> ./configure --prefix=/usr/local/ --enable-heterogeneous --disable-shared --enable-static
>>>>>>>>>>>>>
>>>>>>>>>>>>> When both the machines are connected to the network via ethernet cables,
>>>>>>>>>>>>> openmpi works fine.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But when I switch the Linux box to a wireless adapter, I can reach (ping)
>>>>>>>>>>>>> the Macintosh, but openmpi hangs on a hello world program.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I ran:
>>>>>>>>>>>>>
>>>>>>>>>>>>> /usr/local/bin/mpirun --mca btl_tcp_port_min_v4 36900 -mca btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca OMPI_mca_mpi_preconnect_all 1 -np 2 -hetero -H localhost,10.11.14.205 /tmp/back
>>>>>>>>>>>>>
>>>>>>>>>>>>> It hangs on a send/receive function between the two ends. All my
>>>>>>>>>>>>> firewalls are turned off at the Macintosh end. PLEASE HELP ASAP.
>>>>>>>>>>>>> regards,
>>>>>>>>>>>>> pallab
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> users mailing list
>>>>>>>>>>>>> users_at_[hidden]
>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> =========================
>>>>>>>>>>> rolf.vandevaart_at_[hidden]
>>>>>>>>>>> 781-442-3043
>>>>>>>>>>> =========================
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> =========================
>>>>>>>>> rolf.vandevaart_at_[hidden]
>>>>>>>>> 781-442-3043
>>>>>>>>> =========================
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> =========================
>>>>>>> rolf.vandevaart_at_[hidden]
>>>>>>> 781-442-3043
>>>>>>> =========================
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> jsquyres_at_[hidden]
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>>
>>>
>>
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
>
>
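
On the broadcast mismatch pointed out earlier in the thread: both machines report the same netmask (0xfffff000 is 255.255.240.0, a /20), and for a /20 containing 10.11.14.x the broadcast address works out to 10.11.15.255, which is what the Mac originally reported. A quick check with Python's standard ipaddress module, as a sketch only; whether the broadcast setting has anything to do with the hang is not established in this thread.

```python
import ipaddress

# Addresses and netmask as reported by ifconfig in this thread.
mac = ipaddress.ip_interface("10.11.14.203/255.255.240.0")
linux = ipaddress.ip_interface("10.11.14.205/255.255.240.0")

# 0xfffff000 (the Mac's notation) is the same mask as 255.255.240.0.
assert int(mac.netmask) == 0xFFFFF000

print("network:    ", mac.network)
print("broadcast:  ", mac.network.broadcast_address)
print("same subnet:", mac.network == linux.network)
```

By this calculation, setting both ends to 10.11.14.255 does not match the /20 mask; 10.11.15.255 is the value derived from it.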