Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Connection refused with openmpi-1.6.0
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-07-12 07:26:48


Are you setting any MCA parameters, such as btl_tcp_if_include or btl_tcp_if_exclude, perchance? They could be in your environment or in a file, too.

I ask because we should be skipping the loopback device by default (i.e., it should be covered by the default value of btl_tcp_if_exclude).

What is the output from ifconfig?
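If it helps, here's a quick way to hunt for stray settings (a sketch assuming a Bourne-style shell; the installation-wide path is derived from your configure prefix, so adjust it if yours differs):

```shell
# Look for btl_tcp MCA overrides in the environment (prints a note if none).
env | grep '^OMPI_MCA_btl_tcp' || echo "no btl_tcp MCA variables in the environment"

# Look in the per-user and installation-wide parameter files, if present.
for f in "$HOME/.openmpi/mca-params.conf" \
         /usr/local/openmpi-1.6_32_cc/etc/openmpi-mca-params.conf; do
    [ -f "$f" ] && grep 'btl_tcp_if' "$f"
done
true   # a missing file or no match is not a failure here
```

Running "ompi_info --param btl tcp" should also show you the values of btl_tcp_if_include/exclude that Open MPI is actually using.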

On Jul 11, 2012, at 12:12 PM, Siegmar Gross wrote:

> Hello Reuti,
>
> thank you for your reply.
>
>>> I get the following error when I try to run my programs with
>>> openmpi-1.6.0.
>>>
>>> tyr hello_1 52 which mpiexec
>>> /usr/local/openmpi-1.6_32_cc/bin/mpiexec
>>> tyr hello_1 53
>>>
>>> tyr hello_1 51 mpiexec --host tyr,sunpc1 -np 3 hello_1_mpi
>>> Process 0 of 3 running on tyr.informatik.hs-fulda.de
>>> Process 2 of 3 running on tyr.informatik.hs-fulda.de
>>> [[4154,1],0][../../../../../openmpi-1.6/ompi/mca/btl/tcp/btl_tcp_endpoint.c:586:mca_btl_tcp_endpoint_start_connect] from tyr.informatik.hs-fulda.de to: sunpc1
>>> Unable to connect to the peer 127.0.0.1 on port 1024: Connection refused
>>>
>>> Process 1 of 3 running on sunpc1.informatik.hs-fulda.de
>>> [[4154,1],1][../../../../../openmpi-1.6/ompi/mca/btl/tcp/btl_tcp_endpoint.c:586:mca_btl_tcp_endpoint_start_connect] from sunpc1.informatik.hs-fulda.de to: tyr
>>> Unable to connect to the peer 127.0.0.1 on port 516: Connection refused
>>
>> Some distributions also give the loopback interface the name of the host. Is there an additional line:
>>
>> 127.0.0.1 tyr.informatik.hs-fulda.de
>>
>> in /etc/hosts besides the localhost and interface entry?
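(For reference, the pattern Reuti describes would make the node's own hostname resolve to the loopback address. A hypothetical illustration, not your actual files; 192.0.2.10 is a placeholder address:)

```
# problematic: the node's own name resolves to loopback
127.0.0.1   localhost tyr.informatik.hs-fulda.de tyr

# what you want: loopback and the real interface kept separate
127.0.0.1   localhost
192.0.2.10  tyr.informatik.hs-fulda.de tyr
```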
>
> No, there isn't.
>
> tyr etc 16 more hosts
> #
> # Internet host table
> #
> ::1 localhost
> 127.0.0.1 localhost
> ...
>
> tyr etc 20 ssh sunpc1 head /etc/hosts
> 127.0.0.1 localhost
> ...
>
>
> Kind regards
>
> Siegmar
>
>>> [sunpc1.informatik.hs-fulda.de:24555] *** An error occurred in MPI_Barrier
>>> [sunpc1.informatik.hs-fulda.de:24555] *** on communicator MPI_COMM_WORLD
>>> [sunpc1.informatik.hs-fulda.de:24555] *** MPI_ERR_INTERN: internal error
>>> [sunpc1.informatik.hs-fulda.de:24555] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>>> ...
>>>
>>>
>>> I have no problems with just one host (in this case "127.0.0.1" should
>>> work). Why didn't mpiexec use the IP addresses of the hosts in the
>>> above example?
>>>
>>>
>>> tyr hello_1 53 mpiexec --host tyr -np 2 hello_1_mpi
>>> Process 0 of 2 running on tyr.informatik.hs-fulda.de
>>> Now 1 slave tasks are sending greetings.
>>> Greetings from task 1:
>>> ...
>>>
>>>
>>> tyr hello_1 54 mpiexec --host sunpc1 -np 2 hello_1_mpi
>>> Process 1 of 2 running on sunpc1.informatik.hs-fulda.de
>>> Process 0 of 2 running on sunpc1.informatik.hs-fulda.de
>>> Now 1 slave tasks are sending greetings.
>>> Greetings from task 1:
>>> ...
>>>
>>>
>>> The problem doesn't result from the heterogeneity of the two
>>> hosts, because I get the same error with two SPARC systems or
>>> two PCs. I didn't have any problems with openmpi-1.2.4.
>>>
>>> tyr hello_1 18 mpiexec -mca btl ^udapl --host tyr,sunpc1,linpc1 \
>>> -np 4 hello_1_mpi
>>> Process 0 of 4 running on tyr.informatik.hs-fulda.de
>>> Process 2 of 4 running on linpc1
>>> Process 1 of 4 running on sunpc1.informatik.hs-fulda.de
>>> Process 3 of 4 running on tyr.informatik.hs-fulda.de
>>> Now 3 slave tasks are sending greetings.
>>> Greetings from task 2:
>>> ...
>>>
>>> tyr hello_1 19 which mpiexec
>>> /usr/local/openmpi-1.2.4/bin/mpiexec
>>>
>>> Do you have any ideas why it doesn't work with openmpi-1.6.0?
>>> I configured the package with
>>>
>>> ../openmpi-1.6/configure --prefix=/usr/local/openmpi-1.6_32_cc \
>>> LDFLAGS="-m32" \
>>> CC="cc" CXX="CC" F77="f77" FC="f95" \
>>> CFLAGS="-m32" CXXFLAGS="-m32 -library=stlport4" FFLAGS="-m32" \
>>> FCFLAGS="-m32" \
>>> CPP="cpp" CXXCPP="cpp" \
>>> CPPFLAGS="" CXXCPPFLAGS="" \
>>> C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
>>> OBJC_INCLUDE_PATH="" MPIHOME="" \
>>> --without-udapl --without-openib \
>>> --enable-mpi-f90 --with-mpi-f90-size=small \
>>> --enable-heterogeneous --enable-cxx-exceptions \
>>> --enable-orterun-prefix-by-default \
>>> --with-threads=posix --enable-mpi-thread-multiple \
>>> --enable-opal-multi-threads \
>>> --with-hwloc=internal --with-ft=LAM --enable-sparse-groups \
>>> |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.32_cc
>>>
>>> Thank you very much for any help in advance.
>>>
>>>
>>> Kind regards
>>>
>>> Siegmar
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/