Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [OMPI devel] Open-MPI between Mac and Linux (ubuntu 9.04) over wireless
From: Pallab Datta (datta_at_[hidden])
Date: 2009-09-24 12:39:29


Hi All,

Yes, I can ping and ssh from apex-backpack to my Mac (fuji.local).
I set the wireless broadcast address to the same value on both ends
(10.11.14.255), but the problem still persists.

I have tried other wireless adapters as well, but no luck so far.
Please let me know what can be done.
regards, pallab

> (putting this back on the list where others can reply as well, and if
> we solve it, the solution will be google-ized)
>
> According to your debug output:
>
>>> [apex-backpack:31956] btl: tcp: attempting to connect() to address 10.11.14.203 on port 9360
>
> It *is* trying to connect to the right IP address. Are you able to
> ping to .203 from apex-backpack?
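
A successful ping only shows that ICMP gets through; MPI's TCP BTL needs an ordinary TCP connection on some port. The sketch below is one way to check that separately. It is an illustration, not part of Open MPI: `can_connect` is a hypothetical helper name, and it only reports success if something is actually listening on the chosen port (for instance a throwaway listener started by hand on the other machine).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


# Example call (address taken from the thread, port chosen by you):
# can_connect("10.11.14.203", 36900, timeout=2.0)
```

If this returns False against a port with a known listener while ping and ssh work, that points at the network path rather than at MPI itself.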
>
> I also notice that your ethernet configuration does not exactly match
> between linux and osx:
>
> en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> inet 10.11.14.203 netmask 0xfffff000 broadcast 10.11.15.255
>
> wlan0 Link encap:Ethernet HWaddr 00:21:79:c2:54:c7
> inet addr:10.11.14.205 Bcast:10.11.14.255 Mask:255.255.240.0
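
The two outputs above use different notations (hex netmask on OS X, dotted netmask on Linux), which makes the comparison harder to do by eye. As a rough sketch using Python's standard `ipaddress` module, the broadcast address implied by each address/netmask pair can be computed directly. The helper names are made up for this illustration, and whether the broadcast mismatch has anything to do with the hang is not established in this thread.

```python
import ipaddress


def normalize_hex_mask(hex_mask: str) -> str:
    """Convert a BSD/OS X style netmask such as 0xfffff000 to dotted form."""
    return str(ipaddress.IPv4Address(int(hex_mask, 16)))


def expected_broadcast(addr: str, netmask: str) -> str:
    """Return the broadcast address implied by an address and its netmask."""
    iface = ipaddress.IPv4Interface(f"{addr}/{netmask}")
    return str(iface.network.broadcast_address)


mac_mask = normalize_hex_mask("0xfffff000")
print("mac netmask:      ", mac_mask)
print("mac broadcast:    ", expected_broadcast("10.11.14.203", mac_mask))
print("linux broadcast:  ", expected_broadcast("10.11.14.205", "255.255.240.0"))
```

Both machines turn out to have the same netmask, and for that netmask the implied broadcast address is 10.11.15.255, which is what the Mac reports; the 10.11.14.255 shown on the Linux side is the value that does not follow from its own netmask.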
>
>
> On Sep 22, 2009, at 9:26 PM, Pallab Datta wrote:
>
>> There is no firewall running between the machines. I tried using the IP
>> address instead of localhost, but it gave me the same output. MPI is not
>> even timing out; it just keeps hanging forever. :(
>>
>> I have disabled the ethernet interface on the linux box, keeping only the
>> wireless up. On the mac I only have the ethernet turned on. My mac is an
>> 8-core mac pro.
>>
>> Please help me debug this.
>> thanks in advance, regards,
>> pallab
>>
>>
>>> (only replying to users list)
>>>
>>> Some suggestions:
>>>
>>> - MPI seems to start up, but the additional TCP connections required for
>>> MPI connections seem to be failing / timing out / hitting some other error.
>>> - Are you running firewalls between your machines? If so, can you
>>> disable them?
>>> - I see that you're specifying "--mca btl_tcp_port_min_v4 36900" but
>>> one of the debug lines reads:
>>>> [apex-backpack:31956] btl: tcp: attempting to connect() to address 10.11.14.203 on port 9360
>>> - Try not using the name "localhost", but rather the IP address of the
>>> local machine
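
One arithmetic observation about the port numbers above, which nobody in this thread confirms: 9360 is exactly what 36900 becomes when its two bytes are swapped. So the debug line might be printing the port in network byte order on a little-endian host, rather than showing that btl_tcp_port_min_v4 was ignored. The check below is only that arithmetic; `swap16` is a name made up for this illustration.

```python
def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit unsigned value."""
    return int.from_bytes(value.to_bytes(2, "big"), "little")


requested_port = 36900   # from --mca btl_tcp_port_min_v4
reported_port = 9360     # from the "attempting to connect()" debug line

print(hex(requested_port), "->", hex(swap16(requested_port)))
print(swap16(requested_port) == reported_port)
```

If that reading is right, the process was connecting to port 36900 as requested, and the mismatch is only in how the debug message formats the number.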
>>>
>>>
>>> On Sep 22, 2009, at 5:27 PM, Pallab Datta wrote:
>>>
>>>> The following are the ifconfig outputs for the Mac and the Linux box, respectively:
>>>>
>>>> fuji:openmpi-1.3.3 pallabdatta$ ifconfig
>>>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
>>>> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
>>>> inet 127.0.0.1 netmask 0xff000000
>>>> inet6 ::1 prefixlen 128
>>>> gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
>>>> stf0: flags=0<> mtu 1280
>>>> en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>>> inet6 fe80::21f:5bff:fe3d:eaac%en0 prefixlen 64 scopeid 0x4
>>>> inet 10.11.14.203 netmask 0xfffff000 broadcast 10.11.15.255
>>>> ether 00:1f:5b:3d:ea:ac
>>>> media: autoselect (100baseTX <full-duplex>) status: active
>>>> supported media: autoselect 10baseT/UTP <half-duplex> 10baseT/UTP
>>>> <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 10baseT/UTP
>>>> <full-duplex,flow-control> 100baseTX <half-duplex> 100baseTX
>>>> <full-duplex> 100baseTX <full-duplex,hw-loopback> 100baseTX
>>>> <full-duplex,flow-control> 1000baseT <full-duplex> 1000baseT
>>>> <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control>
>>>> en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>>> ether 00:1f:5b:3d:ea:ad
>>>> media: autoselect status: inactive
>>>> supported media: autoselect 10baseT/UTP <half-duplex> 10baseT/UTP
>>>> <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 10baseT/UTP
>>>> <full-duplex,flow-control> 100baseTX <half-duplex> 100baseTX
>>>> <full-duplex> 100baseTX <full-duplex,hw-loopback> 100baseTX
>>>> <full-duplex,flow-control> 1000baseT <full-duplex> 1000baseT
>>>> <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control>
>>>> fw0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 4078
>>>> lladdr 00:22:41:ff:fe:ed:7d:a8
>>>> media: autoselect <full-duplex> status: inactive
>>>> supported media: autoselect <full-duplex>
>>>>
>>>>
>>>> LINUX:
>>>> ====
>>>> pallabdatta_at_apex-backpack:~/backpack/src$ ifconfig
>>>> lo Link encap:Local Loopback
>>>> inet addr:127.0.0.1 Mask:255.0.0.0
>>>> inet6 addr: ::1/128 Scope:Host
>>>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>>>> RX packets:116 errors:0 dropped:0 overruns:0 frame:0
>>>> TX packets:116 errors:0 dropped:0 overruns:0 carrier:0
>>>> collisions:0 txqueuelen:0
>>>> RX bytes:11788 (11.7 KB) TX bytes:11788 (11.7 KB)
>>>>
>>>> wlan0 Link encap:Ethernet HWaddr 00:21:79:c2:54:c7
>>>> inet addr:10.11.14.205 Bcast:10.11.14.255 Mask:255.255.240.0
>>>> inet6 addr: fe80::221:79ff:fec2:54c7/64 Scope:Link
>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>>> RX packets:72531 errors:0 dropped:0 overruns:0 frame:0
>>>> TX packets:28894 errors:0 dropped:0 overruns:0 carrier:0
>>>> collisions:0 txqueuelen:1000
>>>> RX bytes:5459312 (5.4 MB) TX bytes:7264193 (7.2 MB)
>>>>
>>>> wmaster0 Link encap:UNSPEC HWaddr 00-21-79-C2-54-C7-34-63-00-00-00-00-00-00-00-00
>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>> collisions:0 txqueuelen:1000
>>>> RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
>>>>
>>>> The Mac is a Mac Pro with two 2.26GHz Quad-Core Intel Xeons, and the Linux
>>>> box runs Ubuntu Server Edition 9.04. The Mac uses its ethernet interface to
>>>> connect to the network, and the linux box connects via a wireless adapter
>>>> (IOGEAR).
>>>>
>>>> Please help me find a way to fix this issue. It really needs to work for
>>>> our project.
>>>> thanks in advance,
>>>> regards,
>>>> pallab
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> My other concern was the following, but I am not sure it applies here.
>>>>> If you have multiple interfaces on the node, and they are on the same
>>>>> subnet, then you cannot actually select which IP address to go out of.
>>>>> You can only select the IP address you want to connect to. In these
>>>>> cases, I have seen a hang because we think we are selecting an IP
>>>>> address to go out of, but the traffic actually goes out the other one.
>>>>> Perhaps you can send the users list the output from "ifconfig" on each
>>>>> of the machines, which would show all the interfaces. You need to use the
>>>>> right arguments for ifconfig depending on the OS you are running on.
>>>>>
>>>>> One thought is to make sure the ethernet interface is marked down on both
>>>>> boxes, if that is possible.
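
The "multiple interfaces on the same subnet" situation described above can be spotted mechanically once the address and netmask of each interface are known. A minimal sketch with the standard `ipaddress` module follows. The `eth0` entry and its address are hypothetical, added only to show what a conflict looks like; the `wlan0` and `lo` values are the ones posted in this thread.

```python
import ipaddress
from collections import defaultdict


def shared_subnets(ifaces):
    """Return {network: [names]} for networks that hold more than one interface.

    ifaces is an iterable of (name, address, netmask) tuples.
    """
    groups = defaultdict(list)
    for name, addr, mask in ifaces:
        network = ipaddress.IPv4Interface(f"{addr}/{mask}").network
        groups[str(network)].append(name)
    return {net: names for net, names in groups.items() if len(names) > 1}


linux_box = [
    ("lo", "127.0.0.1", "255.0.0.0"),
    ("wlan0", "10.11.14.205", "255.255.240.0"),
    ("eth0", "10.11.14.204", "255.255.240.0"),  # hypothetical wired interface
]
print(shared_subnets(linux_box))
```

An empty result means no two interfaces share a subnet, so the outgoing-interface ambiguity described above should not arise on that machine.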
>>>>>
>>>>> Pallab Datta wrote:
>>>>>> Any suggestions on how to debug this further?
>>>>>> Do you think I need to enable any other option besides heterogeneous at
>>>>>> the configure prompt?
>>>>>>
>>>>>>
>>>>>>> The --enable-heterogeneous option should do the trick. And to answer the
>>>>>>> previous question, yes, put both of the interfaces in the include list.
>>>>>>>
>>>>>>> --mca btl_tcp_if_include en0,wlan0
>>>>>>>
>>>>>>> If that does not work, then I have one other thought about why it might
>>>>>>> not work, although it is perhaps not a solution.
>>>>>>>
>>>>>>> Rolf
>>>>>>>
>>>>>>> Pallab Datta wrote:
>>>>>>>
>>>>>>>> Hi Rolf,
>>>>>>>>
>>>>>>>> Do I need to configure openmpi with some specific options apart from
>>>>>>>> --enable-heterogeneous?
>>>>>>>> I am currently using
>>>>>>>> ./configure --prefix=/usr/local/ --enable-heterogeneous --disable-static --enable-shared --enable-debug
>>>>>>>>
>>>>>>>> on both ends. Is the above correct? Please let me know.
>>>>>>>> thanks and regards,
>>>>>>>> pallab
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi:
>>>>>>>>> I assume if you wait several minutes then your program will actually
>>>>>>>>> time out, yes? I guess I have two suggestions. First, can you run a
>>>>>>>>> non-MPI job using the wireless? Something like hostname? Secondly, you
>>>>>>>>> may want to specify the specific interfaces you want it to use on the
>>>>>>>>> two machines. You can do that via the "--mca btl_tcp_if_include"
>>>>>>>>> run-time parameter. Just list the ones that you expect it to use.
>>>>>>>>>
>>>>>>>>> Also, this is not right: "--mca OMPI_mca_mpi_preconnect_all 1". It
>>>>>>>>> should be "--mca mpi_preconnect_mpi 1" if you want to do the connection
>>>>>>>>> during MPI_Init.
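
Putting the two suggestions above together, the command line could be assembled as in the sketch below. It uses only parameters that appear in this thread (`btl`, `btl_tcp_if_include`, `mpi_preconnect_mpi`, `-np`, `-H`); the helper name `build_mpirun_cmd` is made up, and the host addresses, interface names and program path are simply the values posted in the thread, assumed here for illustration. The snippet only builds and prints the command; it does not run it.

```python
import shlex


def build_mpirun_cmd(hosts, program, interfaces, np=2):
    """Build an mpirun argument list restricted to the given TCP interfaces."""
    return [
        "mpirun",
        "--mca", "btl", "tcp,self",
        "--mca", "btl_tcp_if_include", ",".join(interfaces),
        "--mca", "mpi_preconnect_mpi", "1",
        "-np", str(np),
        "-H", ",".join(hosts),
        program,
    ]


cmd = build_mpirun_cmd(
    hosts=["10.11.14.203", "10.11.14.205"],
    program="/tmp/hello",
    interfaces=["en0", "wlan0"],
)
print(shlex.join(cmd))
```

Keeping the arguments in a list makes it easy to vary the interface list between runs while leaving the rest of the command unchanged.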
>>>>>>>>>
>>>>>>>>> Rolf
>>>>>>>>>
>>>>>>>>> Pallab Datta wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> The following is the error dump
>>>>>>>>>>
>>>>>>>>>> fuji:src pallabdatta$ /usr/local/bin/mpirun --mca btl_tcp_port_min_v4 36900 -mca btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca btl tcp,self --mca OMPI_mca_mpi_preconnect_all 1 -np 2 -hetero -H localhost,10.11.14.205 /tmp/hello
>>>>>>>>>> [fuji.local:01316] mca: base: components_open: Looking for btl components
>>>>>>>>>> [fuji.local:01316] mca: base: components_open: opening btl components
>>>>>>>>>> [fuji.local:01316] mca: base: components_open: found loaded component self
>>>>>>>>>> [fuji.local:01316] mca: base: components_open: component self has no register function
>>>>>>>>>> [fuji.local:01316] mca: base: components_open: component self open function successful
>>>>>>>>>> [fuji.local:01316] mca: base: components_open: found loaded component tcp
>>>>>>>>>> [fuji.local:01316] mca: base: components_open: component tcp has no register function
>>>>>>>>>> [fuji.local:01316] mca: base: components_open: component tcp open function successful
>>>>>>>>>> [fuji.local:01316] select: initializing btl component self
>>>>>>>>>> [fuji.local:01316] select: init of component self returned success
>>>>>>>>>> [fuji.local:01316] select: initializing btl component tcp
>>>>>>>>>> [fuji.local:01316] select: init of component tcp returned success
>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: Looking for btl components
>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: opening btl components
>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: found loaded component self
>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: component self has no register function
>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: component self open function successful
>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: found loaded component tcp
>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: component tcp has no register function
>>>>>>>>>> [apex-backpack:04753] mca: base: components_open: component tcp open function successful
>>>>>>>>>> [apex-backpack:04753] select: initializing btl component self
>>>>>>>>>> [apex-backpack:04753] select: init of component self returned success
>>>>>>>>>> [apex-backpack:04753] select: initializing btl component tcp
>>>>>>>>>> [apex-backpack:04753] select: init of component tcp returned success
>>>>>>>>>> Process 0 on fuji.local out of 2
>>>>>>>>>> Process 1 on apex-backpack out of 2
>>>>>>>>>> [apex-backpack:04753] btl: tcp: attempting to connect() to address 10.11.14.203 on port 9360
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Hi
>>>>>>>>>>>
>>>>>>>>>>> I am trying to run open-mpi 1.3.3 between a linux box running ubuntu
>>>>>>>>>>> server v.9.04 and a Macintosh. I have configured openmpi with the
>>>>>>>>>>> following options:
>>>>>>>>>>> ./configure --prefix=/usr/local/ --enable-heterogeneous --disable-shared --enable-static
>>>>>>>>>>>
>>>>>>>>>>> When both machines are connected to the network via ethernet cables,
>>>>>>>>>>> openmpi works fine.
>>>>>>>>>>>
>>>>>>>>>>> But when I switch the linux box to a wireless adapter, I can reach (ping)
>>>>>>>>>>> the macintosh, but openmpi hangs on a hello world program.
>>>>>>>>>>>
>>>>>>>>>>> I ran:
>>>>>>>>>>>
>>>>>>>>>>> /usr/local/bin/mpirun --mca btl_tcp_port_min_v4 36900 -mca btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca OMPI_mca_mpi_preconnect_all 1 -np 2 -hetero -H localhost,10.11.14.205 /tmp/back
>>>>>>>>>>>
>>>>>>>>>>> It hangs on a send/receive call between the two ends. All my firewalls
>>>>>>>>>>> are turned off at the macintosh end. PLEASE HELP ASAP.
>>>>>>>>>>> regards,
>>>>>>>>>>> pallab
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> users mailing list
>>>>>>>>>>> users_at_[hidden]
>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> =========================
>>>>>>>>> rolf.vandevaart_at_[hidden]
>>>>>>>>> 781-442-3043
>>>>>>>>> =========================
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> =========================
>>>>>>> rolf.vandevaart_at_[hidden]
>>>>>>> 781-442-3043
>>>>>>> =========================
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> =========================
>>>>> rolf.vandevaart_at_[hidden]
>>>>> 781-442-3043
>>>>> =========================
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>>
>>>
>>
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
>
>