Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-10-20 16:44:18


Two questions:

1. Have you tried the just-released 1.1.2?
2. Are you closing stdin/out/err?

On Oct 19, 2006, at 3:31 PM, Jeffrey B. Layton wrote:

> A small update. I was looking through the error file a bit more
> (it was 159MB). I found the following error message sequence:
>
> [o1:22805] mca_oob_tcp_accept: accept() failed with errno 9.
> [o4:11242] [0,1,4]-[0,0,0] mca_oob_tcp_peer_recv_blocking: recv() failed with errno=104
> [o1:22805] mca_oob_tcp_accept: accept() failed with errno 9.
> ...
> [o1:22805] mca_oob_tcp_accept: accept() failed with errno 9.
> [o3:32205] [0,1,2]-[0,0,0] mca_oob_tcp_peer_complete_connect: connection failed (errno=111) - retrying (pid=32205)
> [o1:22805] mca_oob_tcp_accept: accept() failed with errno 9.
> [o3:32206] [0,1,3]-[0,0,0] mca_oob_tcp_peer_complete_connect: connection failed (errno=111) - retrying (pid=32206)
> [o1:22805] mca_oob_tcp_accept: accept() failed with errno 9.
> ...
>
> I don't know if this changes things (my google attempts didn't
> really give me much information).
>
> Jeff
>
>
>> Good afternoon,
>>
>> I really hate to post asking for help with a problem, but
>> my own efforts have not worked out well (probably operator
>> error).
>> Anyway, I'm trying to run a code that was built with PGI 6.1
>> and OpenMPI-1.1.1. The mpirun command looks like:
>>
>> mpirun --hostfile machines.${PBS_JOBID} --np ${NP} -mca btl self,sm,tcp ./${EXE} ${CASEPROJ} >> OUTPUT
>>
>> I get the following error in the PBS error file:
>>
>> [o1:22559] mca_oob_tcp_accept: accept() failed with errno 9.
>> ...
>>
>> and it keeps repeating (for a long time).
>>
>> ompi_info gives the following output:
>>
>>> ompi_info
>> Open MPI: 1.1.1
>> Open MPI SVN revision: r11473
>> Open RTE: 1.1.1
>> Open RTE SVN revision: r11473
>> OPAL: 1.1.1
>> OPAL SVN revision: r11473
>> Prefix: /usr/x86_64-pgi-6.1/openmpi-1.1.1
>> Configured architecture: x86_64-suse-linux-gnu
>> Configured by: root
>> Configured on: Mon Oct 16 20:51:34 MDT 2006
>> Configure host: lo248
>> Built by: root
>> Built on: Mon Oct 16 21:02:00 MDT 2006
>> Built host: lo248
>> C bindings: yes
>> C++ bindings: yes
>> Fortran77 bindings: yes (all)
>> Fortran90 bindings: yes
>> Fortran90 bindings size: small
>> C compiler: pgcc
>> C compiler absolute: /opt/pgi/linux86-64/6.1/bin/pgcc
>> C++ compiler: pgCC
>> C++ compiler absolute: /opt/pgi/linux86-64/6.1/bin/pgCC
>> Fortran77 compiler: pgf77
>> Fortran77 compiler abs: /opt/pgi/linux86-64/6.1/bin/pgf77
>> Fortran90 compiler: pgf90
>> Fortran90 compiler abs: /opt/pgi/linux86-64/6.1/bin/pgf90
>> C profiling: yes
>> C++ profiling: yes
>> Fortran77 profiling: yes
>> Fortran90 profiling: yes
>> C++ exceptions: yes
>> Thread support: posix (mpi: no, progress: no)
>> Internal debug support: no
>> MPI parameter check: runtime
>> Memory profiling support: no
>> Memory debugging support: no
>> libltdl support: yes
>> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>> MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA btl: self (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
>> MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
>> MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA ns: replica (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>> MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA pls: fork (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA sds: env (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA sds: seed (MCA v1.0, API v1.0, Component v1.1.1)
>> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1.1)
>>
>>
>>
>> I found this link via google:
>>
>> http://www.open-mpi.org/community/lists/users/2006/06/1486.php
>>
>> But to be honest I'm not sure how to apply this to fix my problem.
>>
>> Thanks!
>>
>> Jeff
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems