Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] And anyone know what limits connections?
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-07-11 14:33:48


I'm afraid I don't know anything about petsc4py, so I can't speak to
it. However, I can say there is nothing in OMPI that would limit the
number of connections on a machine.

There are, of course, system limits on that value. Have you checked
that ulimit isn't set to something absurdly low? Do you have any idea
how many connections petsc4py is trying to open, beyond those OMPI
opens itself?
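
For example, a quick way to check the current per-process open-file limit (the same value "ulimit -n" reports) from Python is something like:

import resource
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("fd limit: soft=%d, hard=%d" % (soft, hard))

If the soft limit is unusually low, that would be worth ruling out first.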

OMPI will use 1 socket/process for its out-of-band system, and 1
socket/process for its MPI interconnect (assuming TCP). In addition,
since sockets count as file descriptors, we open 3 fds/process for I/O
forwarding, plus another 1 fd on the process designated to receive
stdin (defaults to rank=0).

So for your 8-process job, OMPI is going to require ~41 file
descriptors when run on a single node.
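
As a rough sketch of that arithmetic in Python (just the per-process counts described above; treat it as an estimate, not an exact accounting):

nprocs = 8
oob = 1 * nprocs       # one out-of-band socket per process
mpi_tcp = 1 * nprocs   # one TCP socket per process for the MPI interconnect
iof = 3 * nprocs       # three fds per process for I/O forwarding
stdin_fd = 1           # one extra fd for the process receiving stdin (rank 0)
print(oob + mpi_tcp + iof + stdin_fd)   # -> 41 fds on a single node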

On your two-machine system, OMPI will consume roughly 16 fds/node
since the daemons on those nodes need their own sockets for
communication. If mpirun is on a separate machine (i.e., has no local
procs), then it will consume 2 fds, one for each connection to the
daemons on the other nodes.

HTH
Ralph

On Jul 11, 2009, at 8:44 AM, John R. Cary wrote:

> Thanks for your answer, below. Just so my other question does not
> get lost, I will post it again.
>
> I cannot get an 8-proc job to run on an 8-core cluster with openmpi
> and petsc. I loaded mpi4py and petsc4py, and then
> I try to run the python script:
>
> from mpi4py import MPI
> from petsc4py import PETSc
>
> using
>
> mpirun -n 8 -x PYTHONPATH python test-mpi4py.py
>
> This hangs on my 8-core FC11 box. Either of the following
> allows it to work:
>
> Remove the petsc4py import statement
>
> Run not on localhost, but on two machines in the cluster:
> mpirun -n 8 -host 10.0.0.14,10.0.0.15 -x PYTHONPATH python test-mpi4py.py
>
>
> It seems as though something (openmpi? rsh?) is limiting the
> number of connections per machine, and then that petsc is
> requiring additional connections which cause that limit to be
> exceeded.
>
> What could be doing this limiting?
>
> Thanks...John Cary
>
>
>
> Ralph Castain wrote:
>> In the 1.3 series and beyond, you have to specifically tell us the
>> name of any hostfile, including the default one for your system.
>> So, in this example, you would want to set:
>>
>> OMPI_MCA_orte_default_hostfile=absolute-path-to-openmpi-default-hostfile
>>
>> in your environment, or just add:
>>
>> -mca orte_default_hostfile path-to-openmpi-default-hostfile
>>
>> on your cmd line. Check out "man orte_hosts" for a full explanation
>> of how these are used as it has changed from 1.2.
>>
>> Ralph
>>
>>
>> On Jul 11, 2009, at 7:21 AM, John R. Cary wrote:
>>
>>> The original problem was that I could not get an 8-proc job to
>>> run on an 8-core cluster. I loaded mpi4py and petsc4py, and then
>>> I try to run the python script:
>>>
>>> from mpi4py import MPI
>>> from petsc4py import PETSc
>>>
>>> using
>>>
>>> mpirun -n 8 -x PYTHONPATH python test-mpi4py.py
>>>
>>> This hangs on my 8-core FC11 box. Either of the following
>>> allows it to work:
>>>
>>> Remove the petsc4py import statement
>>>
>>> Run not on localhost, but on two machines in the cluster:
>>> mpirun -n 8 -host 10.0.0.14,10.0.0.15 -x PYTHONPATH python test-mpi4py.py
>>>
>>>
>>> Curiously, putting
>>>
>>> 10.0.0.12 slots=4
>>> 10.0.0.13 slots=4
>>> 10.0.0.14 slots=4
>>> 10.0.0.15 slots=4
>>>
>>>
>>> in openmpi-default-hostfile does not seem to affect anything.
>>>
>>> Any idea why?
>>>
>>> FYI, I am running over rsh. The output of ompi_info is appended.
>>>
>>> It seems as though something (openmpi? rsh?) is limiting the
>>> number of connections per machine, and then that petsc is
>>> requiring additional connections which exceed that limit.
>>> What could be doing this limiting?
>>>
>>> Thanks....John Cary
>>>
>>>
>>> $ ompi_info
>>> Package: Open MPI cary_at_[hidden] Distribution
>>> Open MPI: 1.3.2
>>> Open MPI SVN revision: r21054
>>> Open MPI release date: Apr 21, 2009
>>> Open RTE: 1.3.2
>>> Open RTE SVN revision: r21054
>>> Open RTE release date: Apr 21, 2009
>>> OPAL: 1.3.2
>>> OPAL SVN revision: r21054
>>> OPAL release date: Apr 21, 2009
>>> Ident string: 1.3.2
>>> Prefix: /usr/local/openmpi-1.3.2-nodlopen
>>> Configured architecture: x86_64-unknown-linux-gnu
>>> Configure host: iter.txcorp.com
>>> Configured by: cary
>>> Configured on: Fri Jul 10 07:12:06 MDT 2009
>>> Configure host: iter.txcorp.com
>>> Built by: cary
>>> Built on: Fri Jul 10 07:42:03 MDT 2009
>>> Built host: iter.txcorp.com
>>> C bindings: yes
>>> C++ bindings: yes
>>> Fortran77 bindings: yes (all)
>>> Fortran90 bindings: yes
>>> Fortran90 bindings size: small
>>> C compiler: gcc
>>> C compiler absolute: /usr/lib64/ccache/gcc
>>> C++ compiler: g++
>>> C++ compiler absolute: /usr/lib64/ccache/g++
>>> Fortran77 compiler: gfortran
>>> Fortran77 compiler abs: /usr/bin/gfortran
>>> Fortran90 compiler: gfortran
>>> Fortran90 compiler abs: /usr/bin/gfortran
>>> C profiling: yes
>>> C++ profiling: yes
>>> Fortran77 profiling: yes
>>> Fortran90 profiling: yes
>>> C++ exceptions: no
>>> Thread support: posix (mpi: no, progress: no)
>>> Sparse Groups: no
>>> Internal debug support: no
>>> MPI parameter check: runtime
>>> Memory profiling support: no
>>> Memory debugging support: no
>>> libltdl support: no
>>> Heterogeneous support: no
>>> mpirun default --prefix: no
>>> MPI I/O support: yes
>>> MPI_WTIME support: gettimeofday
>>> Symbol visibility support: yes
>>> FT Checkpoint support: no (checkpoint thread: no)
>>> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA carto: file (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA timer: linux (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA coll: self (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA io: romio (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA pml: csum (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA pml: v (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA btl: self (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA iof: orted (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA iof: tool (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA odls: default (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA ras: tm (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA rml: oob (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA routed: direct (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA routed: linear (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA plm: tm (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA ess: env (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA ess: tool (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3.2)
>>> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3.2)