
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Connection refused with openmpi-1.6.0
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-07-12 07:26:48


Are you setting any MCA parameters, such as btl_tcp_if_include or btl_tcp_if_exclude, perchance? They could be in your environment or in a file, too.

I ask because we should be skipping the loopback device by default (i.e., it should be covered by the default value of btl_tcp_if_exclude).

What is the output from ifconfig?
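In case it helps, one way to see where such a parameter might be coming from (a sketch only; the file locations below are the typical defaults, using the install prefix from your configure line, and may not match your setup):

```shell
# Show the value of btl_tcp_if_exclude as Open MPI currently sees it
ompi_info --param btl tcp | grep if_exclude

# Check the usual places an override could hide (-s suppresses
# errors for files that do not exist)
grep -s btl_tcp_if ~/.openmpi/mca-params.conf \
    /usr/local/openmpi-1.6_32_cc/etc/openmpi-mca-params.conf

# Check for environment-variable overrides
env | grep OMPI_MCA_btl_tcp
```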

On Jul 11, 2012, at 12:12 PM, Siegmar Gross wrote:

> Hello Reuti,
>
> thank you for your reply.
>
>>> I get the following error when I try to run my programs with
>>> openmpi-1.6.0.
>>>
>>> tyr hello_1 52 which mpiexec
>>> /usr/local/openmpi-1.6_32_cc/bin/mpiexec
>>> tyr hello_1 53
>>>
>>> tyr hello_1 51 mpiexec --host tyr,sunpc1 -np 3 hello_1_mpi
>>> Process 0 of 3 running on tyr.informatik.hs-fulda.de
>>> Process 2 of 3 running on tyr.informatik.hs-fulda.de
>>> [[4154,1],0][../../../../../openmpi-1.6/ompi/mca/btl/tcp/btl_tcp_endpoint.c:586:mca_btl_tcp_endpoint_start_connect] from tyr.informatik.hs-fulda.de to: sunpc1
>>> Unable to connect to the peer 127.0.0.1 on port 1024: Connection refused
>>>
>>> Process 1 of 3 running on sunpc1.informatik.hs-fulda.de
>>> [[4154,1],1][../../../../../openmpi-1.6/ompi/mca/btl/tcp/btl_tcp_endpoint.c:586:mca_btl_tcp_endpoint_start_connect] from sunpc1.informatik.hs-fulda.de to: tyr
>>> Unable to connect to the peer 127.0.0.1 on port 516: Connection refused
>>
>> Some distributions also assign the host's name to the loopback interface. Is there an additional line:
>>
>> 127.0.0.1 tyr.informatik.hs-fulda.de
>>
>> in /etc/hosts besides the localhost and interface entry?
>
> No, there isn't.
>
> tyr etc 16 more hosts
> #
> # Internet host table
> #
> ::1 localhost
> 127.0.0.1 localhost
> ...
>
> tyr etc 20 ssh sunpc1 head /etc/hosts
> 127.0.0.1 localhost
> ...
>
>
> Kind regards
>
> Siegmar
>
>
>
>
>
>>> [sunpc1.informatik.hs-fulda.de:24555] *** An error occurred in MPI_Barrier
>>> [sunpc1.informatik.hs-fulda.de:24555] *** on communicator MPI_COMM_WORLD
>>> [sunpc1.informatik.hs-fulda.de:24555] *** MPI_ERR_INTERN: internal error
>>> [sunpc1.informatik.hs-fulda.de:24555] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>>> ...
>>>
>>>
>>> I have no problems with just one host (in this case "127.0.0.1" should
>>> work). Why didn't mpiexec use the IP addresses of the hosts in the
>>> above example?
>>>
>>>
>>> tyr hello_1 53 mpiexec --host tyr -np 2 hello_1_mpi
>>> Process 0 of 2 running on tyr.informatik.hs-fulda.de
>>> Now 1 slave tasks are sending greetings.
>>> Greetings from task 1:
>>> ...
>>>
>>>
>>> tyr hello_1 54 mpiexec --host sunpc1 -np 2 hello_1_mpi
>>> Process 1 of 2 running on sunpc1.informatik.hs-fulda.de
>>> Process 0 of 2 running on sunpc1.informatik.hs-fulda.de
>>> Now 1 slave tasks are sending greetings.
>>> Greetings from task 1:
>>> ...
>>>
>>>
>>> The problem doesn't result from the heterogeneity of the two
>>> hosts because I get the same error with two Sparc-systems or
>>> two PCs. I didn't have any problems with openmpi-1.2.4.
>>>
>>> tyr hello_1 18 mpiexec -mca btl ^udapl --host tyr,sunpc1,linpc1 \
>>> -np 4 hello_1_mpi
>>> Process 0 of 4 running on tyr.informatik.hs-fulda.de
>>> Process 2 of 4 running on linpc1
>>> Process 1 of 4 running on sunpc1.informatik.hs-fulda.de
>>> Process 3 of 4 running on tyr.informatik.hs-fulda.de
>>> Now 3 slave tasks are sending greetings.
>>> Greetings from task 2:
>>> ...
>>>
>>> tyr hello_1 19 which mpiexec
>>> /usr/local/openmpi-1.2.4/bin/mpiexec
>>>
>>> Do you have any ideas why it doesn't work with openmpi-1.6.0?
>>> I configured the package with
>>>
>>> ../openmpi-1.6/configure --prefix=/usr/local/openmpi-1.6_32_cc \
>>> LDFLAGS="-m32" \
>>> CC="cc" CXX="CC" F77="f77" FC="f95" \
>>> CFLAGS="-m32" CXXFLAGS="-m32 -library=stlport4" FFLAGS="-m32" \
>>> FCFLAGS="-m32" \
>>> CPP="cpp" CXXCPP="cpp" \
>>> CPPFLAGS="" CXXCPPFLAGS="" \
>>> C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
>>> OBJC_INCLUDE_PATH="" MPIHOME="" \
>>> --without-udapl --without-openib \
>>> --enable-mpi-f90 --with-mpi-f90-size=small \
>>> --enable-heterogeneous --enable-cxx-exceptions \
>>> --enable-orterun-prefix-by-default \
>>> --with-threads=posix --enable-mpi-thread-multiple \
>>> --enable-opal-multi-threads \
>>> --with-hwloc=internal --with-ft=LAM --enable-sparse-groups \
>>> |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.32_cc
>>>
>>> Thank you very much for any help in advance.
>>>
>>>
>>> Kind regards
>>>
>>> Siegmar
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/