Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI users] Open-MPI between Mac and Linux (ubuntu 9.04) over wireless
From: Pallab Datta (datta_at_[hidden])
Date: 2009-09-22 16:59:16


Is this a bug in running Open MPI in a heterogeneous environment (between a
Mac and Linux) over wireless links?
Please suggest what needs to be done or what I am missing.
Any clues on how to debug this would be a great help.
Thanks and regards, pallab
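
Since ping succeeds but the job hangs, one way to narrow this down is to test a plain TCP connection to the address the verbose output says it is trying to reach. ping uses ICMP, so it says nothing about whether a TCP port is reachable. The following is a minimal sketch, assuming Python 3 is available on both machines. The helper name can_connect is made up for illustration, and the port to test is whichever one the TCP BTL is actually listening on (36900 is the start of the range configured in this thread).

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False

# Example (run from the Linux box while the MPI job is hanging,
# so that something is listening on the other end):
# print(can_connect("10.11.14.203", 36900))
```

A False result here would point at the network path or a filter between the two machines rather than at Open MPI itself; a True result would suggest looking at interface selection instead.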

> Hi Rolf,
>
> I ran the following:
>
> pallabdatta$ /usr/local/bin/mpirun --mca btl_tcp_port_min_v4 36900 -mca
> btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca
> btl_tcp_if_include en0,wlan0 -np 2 -hetero -H localhost,10.11.14.205
> /tmp/hello
>
> [fuji.local:02267] mca: base: components_open: Looking for btl components
> [fuji.local:02267] mca: base: components_open: opening btl components
> [fuji.local:02267] mca: base: components_open: found loaded component self
> [fuji.local:02267] mca: base: components_open: component self has no
> register function
> [fuji.local:02267] mca: base: components_open: component self open
> function successful
> [fuji.local:02267] mca: base: components_open: found loaded component sm
> [fuji.local:02267] mca: base: components_open: component sm has no
> register function
> [fuji.local:02267] mca: base: components_open: component sm open function
> successful
> [fuji.local:02267] mca: base: components_open: found loaded component tcp
> [fuji.local:02267] mca: base: components_open: component tcp has no
> register function
> [fuji.local:02267] mca: base: components_open: component tcp open function
> successful
> [fuji.local:02267] select: initializing btl component self
> [fuji.local:02267] select: init of component self returned success
> [fuji.local:02267] select: initializing btl component sm
> [fuji.local:02267] select: init of component sm returned success
> [fuji.local:02267] select: initializing btl component tcp
> [fuji.local][[59424,1],0][btl_tcp_component.c:468:mca_btl_tcp_component_create_instances]
> invalid interface "wlan0"
> [fuji.local:02267] select: init of component tcp returned success
> [apex-backpack:31956] mca: base: components_open: Looking for btl
> components
> [apex-backpack:31956] mca: base: components_open: opening btl components
> [apex-backpack:31956] mca: base: components_open: found loaded component
> self
> [apex-backpack:31956] mca: base: components_open: component self has no
> register function
> [apex-backpack:31956] mca: base: components_open: component self open
> function successful
> [apex-backpack:31956] mca: base: components_open: found loaded component
> sm
> [apex-backpack:31956] mca: base: components_open: component sm has no
> register function
> [apex-backpack:31956] mca: base: components_open: component sm open
> function successful
> [apex-backpack:31956] mca: base: components_open: found loaded component
> tcp
> [apex-backpack:31956] mca: base: components_open: component tcp has no
> register function
> [apex-backpack:31956] mca: base: components_open: component tcp open
> function successful
> [apex-backpack:31956] select: initializing btl component self
> [apex-backpack:31956] select: init of component self returned success
> [apex-backpack:31956] select: initializing btl component sm
> [apex-backpack:31956] select: init of component sm returned success
> [apex-backpack:31956] select: initializing btl component tcp
> [apex-backpack][[59424,1],1][btl_tcp_component.c:468:mca_btl_tcp_component_create_instances]
> invalid interface "en0"
> [apex-backpack:31956] select: init of component tcp returned success
> Process 0 on fuji.local out of 2
> Process 1 on apex-backpack out of 2
> [apex-backpack:31956] btl: tcp: attempting to connect() to address
> 10.11.14.203 on port 9360
>
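
A side note on the port in the last log line: the run asks for ports starting at 36900 (btl_tcp_port_min_v4), yet the log reports port 9360. The two numbers are byte-swapped versions of each other (36900 is 0x9024, 9360 is 0x2490), so one possible reading is that the verbose message prints the port in network byte order without converting it. That reading is an assumption and is not confirmed anywhere in this thread; the sketch below only checks the arithmetic.

```python
import struct

def swap16(value):
    """Swap the two bytes of a 16-bit unsigned integer."""
    return struct.unpack("<H", struct.pack(">H", value))[0]

PORT_MIN = 36900    # from --mca btl_tcp_port_min_v4
PORT_RANGE = 32     # from --mca btl_tcp_port_range_v4
LOGGED_PORT = 9360  # from the "attempting to connect()" log line

swapped = swap16(LOGGED_PORT)
# Assumes the usable range is [PORT_MIN, PORT_MIN + PORT_RANGE).
in_range = PORT_MIN <= swapped < PORT_MIN + PORT_RANGE
print(hex(LOGGED_PORT), "->", hex(swapped), swapped, in_range)
```

If this reading holds, the connection attempt is going to the first port of the requested range, and the port restriction itself is being honored.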
>
>
> It launches the processes on both ends and then hangs at the send/receive
> part.
> What is the other thought you mentioned about why it might not work?
> Please advise.
> --regards, pallab
>
>
>
>> The --enable-heterogeneous option should do the trick. And to answer the
>> previous question, yes, put both of the interfaces in the include list.
>>
>> --mca btl_tcp_if_include en0,wlan0
>>
>> If that does not work, then I have one other thought about why it might
>> not work, although it is perhaps not a solution.
>>
>> Rolf
>>
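
The 'invalid interface' messages in the verbose output above fit this include list: presumably en0 exists only on the Mac and wlan0 only on the Linux box, so each host complains about the name it does not have while the tcp component still initializes. To see which of the listed names a given machine actually has, a small sketch like the following can be run on each end. It assumes Python 3 on a Unix-like system; split_known is a hypothetical helper and not part of Open MPI.

```python
import socket

def split_known(requested, local_names):
    """Split requested interface names into (present, missing) on this host."""
    local = set(local_names)
    present = [name for name in requested if name in local]
    missing = [name for name in requested if name not in local]
    return present, missing

if __name__ == "__main__":
    # Same list as passed to --mca btl_tcp_if_include in this thread.
    requested = ["en0", "wlan0"]
    local_names = [name for _index, name in socket.if_nameindex()]
    present, missing = split_known(requested, local_names)
    print("present:", present)
    print("missing:", missing)
```

If each machine reports exactly one of the two names as present, the warnings are expected and are probably not the cause of the hang.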
>> Pallab Datta wrote:
>>> Hi Rolf,
>>>
>>> Do I need to configure Open MPI with any specific options apart from
>>> --enable-heterogeneous?
>>> I am currently using
>>> ./configure --prefix=/usr/local/ --enable-heterogeneous --disable-static --enable-shared --enable-debug
>>>
>>> on both ends. Is the above correct? Please let me know.
>>> thanks and regards,
>>> pallab
>>>
>>>
>>>> Hi:
>>>> I assume that if you wait several minutes then your program will
>>>> actually time out, yes? I guess I have two suggestions. First, can you
>>>> run a non-MPI job using the wireless? Something like hostname? Secondly,
>>>> you may want to specify the specific interfaces you want it to use on
>>>> the two machines. You can do that via the "--mca btl_tcp_if_include"
>>>> run-time parameter. Just list the ones that you expect it to use.
>>>>
>>>> Also, this is not right: "--mca OMPI_mca_mpi_preconnect_all 1". It
>>>> should be "--mca mpi_preconnect_mpi 1" if you want to do the connection
>>>> during MPI_Init.
>>>>
>>>> Rolf
>>>>
>>>> Pallab Datta wrote:
>>>>
>>>>> The following is the error dump
>>>>>
>>>>> fuji:src pallabdatta$ /usr/local/bin/mpirun --mca btl_tcp_port_min_v4
>>>>> 36900 -mca btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca
>>>>> btl
>>>>> tcp,self --mca OMPI_mca_mpi_preconnect_all 1 -np 2 -hetero -H
>>>>> localhost,10.11.14.205 /tmp/hello
>>>>> [fuji.local:01316] mca: base: components_open: Looking for btl
>>>>> components
>>>>> [fuji.local:01316] mca: base: components_open: opening btl components
>>>>> [fuji.local:01316] mca: base: components_open: found loaded component
>>>>> self
>>>>> [fuji.local:01316] mca: base: components_open: component self has no
>>>>> register function
>>>>> [fuji.local:01316] mca: base: components_open: component self open
>>>>> function successful
>>>>> [fuji.local:01316] mca: base: components_open: found loaded component
>>>>> tcp
>>>>> [fuji.local:01316] mca: base: components_open: component tcp has no
>>>>> register function
>>>>> [fuji.local:01316] mca: base: components_open: component tcp open
>>>>> function
>>>>> successful
>>>>> [fuji.local:01316] select: initializing btl component self
>>>>> [fuji.local:01316] select: init of component self returned success
>>>>> [fuji.local:01316] select: initializing btl component tcp
>>>>> [fuji.local:01316] select: init of component tcp returned success
>>>>> [apex-backpack:04753] mca: base: components_open: Looking for btl
>>>>> components
>>>>> [apex-backpack:04753] mca: base: components_open: opening btl
>>>>> components
>>>>> [apex-backpack:04753] mca: base: components_open: found loaded
>>>>> component
>>>>> self
>>>>> [apex-backpack:04753] mca: base: components_open: component self has
>>>>> no
>>>>> register function
>>>>> [apex-backpack:04753] mca: base: components_open: component self open
>>>>> function successful
>>>>> [apex-backpack:04753] mca: base: components_open: found loaded
>>>>> component
>>>>> tcp
>>>>> [apex-backpack:04753] mca: base: components_open: component tcp has
>>>>> no
>>>>> register function
>>>>> [apex-backpack:04753] mca: base: components_open: component tcp open
>>>>> function successful
>>>>> [apex-backpack:04753] select: initializing btl component self
>>>>> [apex-backpack:04753] select: init of component self returned success
>>>>> [apex-backpack:04753] select: initializing btl component tcp
>>>>> [apex-backpack:04753] select: init of component tcp returned success
>>>>> Process 0 on fuji.local out of 2
>>>>> Process 1 on apex-backpack out of 2
>>>>> [apex-backpack:04753] btl: tcp: attempting to connect() to address
>>>>> 10.11.14.203 on port 9360
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I am trying to run Open MPI 1.3.3 between a Linux box running Ubuntu
>>>>>> Server 9.04 and a Macintosh. I have configured Open MPI with the
>>>>>> following options:
>>>>>> ./configure --prefix=/usr/local/ --enable-heterogeneous --disable-shared --enable-static
>>>>>>
>>>>>> When both machines are connected to the network via Ethernet cables,
>>>>>> Open MPI works fine.
>>>>>>
>>>>>> But when I switch the Linux box to a wireless adapter, I can still
>>>>>> reach (ping) the Macintosh, but Open MPI hangs on a hello world program.
>>>>>>
>>>>>> I ran:
>>>>>>
>>>>>> /usr/local/bin/mpirun --mca btl_tcp_port_min_v4 36900 -mca
>>>>>> btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca
>>>>>> OMPI_mca_mpi_preconnect_all 1 -np 2 -hetero -H
>>>>>> localhost,10.11.14.205
>>>>>> /tmp/back
>>>>>>
>>>>>> It hangs on a send/receive call between the two ends. All my
>>>>>> firewalls are turned off at the Macintosh end. Please help as soon as
>>>>>> possible.
>>>>>> Regards,
>>>>>> pallab
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>> --
>>>>
>>>> =========================
>>>> rolf.vandevaart_at_[hidden]
>>>> 781-442-3043
>>>> =========================
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>
>