Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] users Digest, Vol 1212, Issue 3, Message: 2
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-04-28 08:42:43


On Apr 27, 2009, at 10:22 PM, jan wrote:

> Thank you, Jeff Squyres.
>
> I have checked the web page
> http://www.open-mpi.org/community/lists/announce/2009/03/0029.php and
> then the page https://svn.open-mpi.org/trac/ompi/ticket/1853, but the
> web page at svn.open-mpi.org seems to have crashed.
>

Try that ticket again; sometimes Trac does weird things. :-( A
reload of the page usually fixes the problem.

> Then I tried Open MPI v1.3.2 again with many different configurations,
> but found that the problem still occurred periodically, i.e. twice
> success, then twice failed, twice success, then twice failed, and so
> on. Do you have any suggestions for this issue?
>

Can you send us a small example that reproduces the problem?
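
(For reference, the reproducer does not need to be elaborate. The sketch
below is a hypothetical stand-in for a cpi-style test -- it is not the
poster's actual $HOME/test/cpi -- that prints the per-rank "Process N on
host" lines seen in the logs, allocates its result buffer with
MPI_Alloc_mem so that the mpi_show_mpi_alloc_mem_leaks accounting has
something to track, and frees it before MPI_FINALIZE.)

#include <mpi.h>
#include <stdio.h>

/* Hypothetical minimal reproducer (not the original cpi program):
 * computes pi with MPI_Reduce and exercises MPI_Alloc_mem /
 * MPI_Free_mem, which is what mpi_show_mpi_alloc_mem_leaks reports on. */
int main(int argc, char *argv[])
{
    int rank, size, i, n = 1000000, len;
    double h, x, sum = 0.0, local;
    double *pi;                      /* result buffer from MPI_Alloc_mem */
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);
    printf("Process %d on %s\n", rank, host);

    /* Allocate the result buffer through MPI so the leak accounting
     * sees it; with the openib BTL this memory may be registered. */
    MPI_Alloc_mem((MPI_Aint)sizeof(double), MPI_INFO_NULL, &pi);

    /* Midpoint-rule approximation of pi, work split round-robin by rank. */
    h = 1.0 / (double)n;
    for (i = rank; i < n; i += size) {
        x = h * ((double)i + 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    local = h * sum;

    MPI_Reduce(&local, pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi is approximately %.16f\n", *pi);

    MPI_Free_mem(pi);                /* pair the alloc to avoid the leak warning */
    MPI_Finalize();
    return 0;
}

Built with mpicc and launched with the same mpirun command line quoted
below, something of roughly this size is usually enough for the
developers to attempt a reproduction.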

>
> Thank you again.
>
> Best Regards,
>
> Gloria Jan
> Wavelink Technology Inc.
>
>
> >
> > Per http://www.open-mpi.org/community/lists/announce/2009/03/0029.php,
> > can you try upgrading to Open MPI v1.3.2?
> >
> >
> > On Apr 24, 2009, at 5:21 AM, jan wrote:
> >
> >> Dear Sir,
> >>
> >> I'm running a cluster with Open MPI.
> >>
> >> $mpirun --mca mpi_show_mpi_alloc_mem_leaks 8 --mca
> >> mpi_show_handle_leaks 1 $HOME/test/cpi
> >>
> >> I got the following error message when the job failed:
> >>
> >> Process 15 on node2
> >> Process 6 on node1
> >> Process 14 on node2
> >> ...
> >> Process 0 on node1
> >> Process 10 on node2
> >> [node2][[9340,1],13][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
> >> [node2][[9340,1],9][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
> >> [node2][[9340,1],10][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
> >> [node2][[9340,1],11][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
> >> [node2][[9340,1],8][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
> >> [node2][[9340,1],15][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
> >> [node2][[9340,1],12][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
> >> [node2][[9340,1],14][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
> >> mpirun: killing job...
> >>
> >>
> --------------------------------------------------------------------------
> >> mpirun noticed that process rank 0 with PID 28438 on node node1
> >> exited on signal
> >> 0 (Unknown signal 0).
> >>
> --------------------------------------------------------------------------
> >> mpirun: clean termination accomplished
> >>
> >> and got the following message when the job succeeded:
> >>
> >> Process 1 on node1
> >> Process 2 on node1
> >> ...
> >> Process 13 on node2
> >> Process 14 on node2
> >>
> --------------------------------------------------------------------------
> >> The following memory locations were allocated via MPI_ALLOC_MEM but
> >> not freed via MPI_FREE_MEM before invoking MPI_FINALIZE:
> >>
> >> Process ID: [[13692,1],12]
> >> Hostname: node2
> >> PID: 30183
> >>
> >> (null)
> >>
> --------------------------------------------------------------------------
> >> [node1:32276] 15 more processes have sent help message
> >> help-mpool-base.txt / all mem leaks
> >> [node1:32276] Set MCA parameter "orte_base_help_aggregate" to 0 to
> >> see all help / error messages
> >>
> >>
> >> It occurred periodically, i.e. twice success, then twice failed,
> >> twice success, then twice failed, and so on. I downloaded
> >> OFED-1.4.1-rc3 from the OpenFabrics Alliance and installed it on a
> >> Dell PowerEdge M600 blade server. The InfiniBand mezzanine cards are
> >> Mellanox ConnectX QDR & DDR, and the InfiniBand switch module is a
> >> Mellanox M2401G. The OS is CentOS 5.3, kernel 2.6.18-128.1.6.el5,
> >> with the PGI 7.2-5 compiler. It is running the OpenSM subnet manager.
> >>
> >> Best Regards,
> >>
> >> Gloria Jan
> >>
> >> Wavelink Technology Inc.
> >>
> >> The output of the "ompi_info --all" command is as follows:
> >>
> >> Package: Open MPI root_at_vortex Distribution
> >> Open MPI: 1.3.1
> >> Open MPI SVN revision: r20826
> >> Open MPI release date: Mar 18, 2009
> >> Open RTE: 1.3.1
> >> Open RTE SVN revision: r20826
> >> Open RTE release date: Mar 18, 2009
> >> OPAL: 1.3.1
> >> OPAL SVN revision: r20826
> >> OPAL release date: Mar 18, 2009
> >> Ident string: 1.3.1
> >> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component
> >> v1.3.1)
> >> MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component
> >> v1.3.1)
> >> MCA paffinity: linux (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA carto: auto_detect (MCA v2.0, API v2.0,
> Component
> >> v1.3.1)
> >> MCA carto: file (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA maffinity: first_use (MCA v2.0, API v2.0, Component
> >> v1.3.1)
> >> MCA timer: linux (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA installdirs: env (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA installdirs: config (MCA v2.0, API v2.0, Component
> >> v1.3.1)
> >> MCA dpm: orte (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA pubsub: orte (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA allocator: basic (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA allocator: bucket (MCA v2.0, API v2.0, Component
> >> v1.3.1)
> >> MCA coll: basic (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA coll: hierarch (MCA v2.0, API v2.0, Component
> >> v1.3.1)
> >> MCA coll: inter (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA coll: self (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.1)
> >> MCA coll: sync (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA coll: tuned (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA io: romio (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA mpool: fake (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA mpool: rdma (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.1)
> >> MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.1)
> >> MCA pml: ob1 (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA pml: v (MCA v2.0, API v2.0, Component v1.3.1)
> >> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.1)
> >> MCA rcache: vma (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA btl: ofud (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA btl: openib (MCA v2.0, API v2.0, Component
> >> v1.3.1)
> >> MCA btl: self (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.1)
> >> MCA btl: tcp (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA topo: unity (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA osc: pt2pt (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA osc: rdma (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA iof: hnp (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA iof: orted (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA iof: tool (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA oob: tcp (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA odls: default (MCA v2.0, API v2.0, Component
> >> v1.3.1)
> >> MCA ras: slurm (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA ras: tm (MCA v2.0, API v2.0, Component v1.3.1)
> >> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component
> >> v1.3.1)
> >> MCA rmaps: round_robin (MCA v2.0, API v2.0,
> Component
> >> v1.3.1)
> >> MCA rmaps: seq (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA rml: oob (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA routed: binomial (MCA v2.0, API v2.0, Component
> >> v1.3.1)
> >> MCA routed: direct (MCA v2.0, API v2.0, Component
> >> v1.3.1)
> >> MCA routed: linear (MCA v2.0, API v2.0, Component
> >> v1.3.1)
> >> MCA plm: rsh (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA plm: slurm (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA plm: tm (MCA v2.0, API v2.0, Component v1.3.1)
> >> MCA filem: rsh (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA errmgr: default (MCA v2.0, API v2.0, Component
> >> v1.3.1)
> >> MCA ess: env (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA ess: hnp (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA ess: singleton (MCA v2.0, API v2.0, Component
> >> v1.3.1)
> >> MCA ess: slurm (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA ess: tool (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA grpcomm: bad (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> MCA grpcomm: basic (MCA v2.0, API v2.0, Component
> v1.3.1)
> >> Prefix: /usr/mpi/pgi/openmpi-1.3.1
> >> Exec_prefix: /usr/mpi/pgi/openmpi-1.3.1
> >> Bindir: /usr/mpi/pgi/openmpi-1.3.1/bin
> >> Sbindir: /usr/mpi/pgi/openmpi-1.3.1/sbin
> >> Libdir: /usr/mpi/pgi/openmpi-1.3.1/lib64
> >> Incdir: /usr/mpi/pgi/openmpi-1.3.1/include
> >> Mandir: /usr/mpi/pgi/openmpi-1.3.1/share/man
> >> Pkglibdir: /usr/mpi/pgi/openmpi-1.3.1/lib64/openmpi
> >> Libexecdir: /usr/mpi/pgi/openmpi-1.3.1/libexec
> >> Datarootdir: /usr/mpi/pgi/openmpi-1.3.1/share
> >> Datadir: /usr/mpi/pgi/openmpi-1.3.1/share
> >> Sysconfdir: /usr/mpi/pgi/openmpi-1.3.1/etc
> >> Sharedstatedir: /usr/mpi/pgi/openmpi-1.3.1/com
> >> Localstatedir: /var
> >> Infodir: /usr/share/info
> >> Pkgdatadir: /usr/mpi/pgi/openmpi-1.3.1/share/openmpi
> >> Pkglibdir: /usr/mpi/pgi/openmpi-1.3.1/lib64/openmpi
> >> Pkgincludedir: /usr/mpi/pgi/openmpi-1.3.1/include/
> openmpi
> >> Configured architecture: x86_64-redhat-linux-gnu
> >> Configure host: vortex
> >> Configured by: root
> >> Configured on: Sun Apr 12 23:23:14 CST 2009
> >> Configure host: vortex
> >> Built by: root
> >> Built on: Sun Apr 12 23:28:52 CST 2009
> >> Built host: vortex
> >> C bindings: yes
> >> C++ bindings: yes
> >> Fortran77 bindings: yes (all)
> >> Fortran90 bindings: yes
> >> Fortran90 bindings size: small
> >> C compiler: pgcc
> >> C compiler absolute: /opt/pgi/linux86-64/7.2-5/bin/pgcc
> >> C char size: 1
> >> C bool size: 1
> >> C short size: 2
> >> C int size: 4
> >> C long size: 8
> >> C float size: 4
> >> C double size: 8
> >> C pointer size: 8
> >> C char align: 1
> >> C bool align: 1
> >> C int align: 4
> >> C float align: 4
> >> C double align: 8
> >> C++ compiler: pgCC
> >> C++ compiler absolute: /opt/pgi/linux86-64/7.2-5/bin/pgCC
> >> Fortran77 compiler: pgf77
> >> Fortran77 compiler abs: /opt/pgi/linux86-64/7.2-5/bin/pgf77
> >> Fortran90 compiler: pgf90
> >> Fortran90 compiler abs: /opt/pgi/linux86-64/7.2-5/bin/pgf90
> >> Fort integer size: 4
> >> Fort logical size: 4
> >> Fort logical value true: -1
> >> Fort have integer1: yes
> >> Fort have integer2: yes
> >> Fort have integer4: yes
> >> Fort have integer8: yes
> >> Fort have integer16: no
> >> Fort have real4: yes
> >> Fort have real8: yes
> >> Fort have real16: no
> >> Fort have complex8: yes
> >> Fort have complex16: yes
> >> Fort have complex32: no
> >> Fort integer1 size: 1
> >> Fort integer2 size: 2
> >> Fort integer4 size: 4
> >> Fort integer8 size: 8
> >> Fort integer16 size: -1
> >> Fort real size: 4
> >> Fort real4 size: 4
> >> Fort real8 size: 8
> >> Fort real16 size: -1
> >> Fort dbl prec size: 4
> >> Fort cplx size: 4
> >> Fort dbl cplx size: 4
> >> Fort cplx8 size: 8
> >> Fort cplx16 size: 16
> >> Fort cplx32 size: -1
> >> Fort integer align: 4
> >> Fort integer1 align: 1
> >> Fort integer2 align: 2
> >> Fort integer4 align: 4
> >> Fort integer8 align: 8
> >> Fort integer16 align: -1
> >> Fort real align: 4
> >> Fort real4 align: 4
> >> Fort real8 align: 8
> >> Fort real16 align: -1
> >> Fort dbl prec align: 4
> >> Fort cplx align: 4
> >> Fort dbl cplx align: 4
> >> Fort cplx8 align: 4
> >> Fort cplx16 align: 8
> >> Fort cplx32 align: -1
> >> C profiling: yes
> >> C++ profiling: yes
> >> Thread support: posix (mpi: no, progress: no)
> >> Sparse Groups: no
> >> Build CFLAGS: -O -DNDEBUG
> >> Build CXXFLAGS: -O -DNDEBUG
> >> Build FFLAGS:
> >> Build FCFLAGS: -O2
> >> Build LDFLAGS: -export-dynamic
> >> Build LIBS: -lnsl -lutil -lpthread
> >> Wrapper extra CFLAGS:
> >> Wrapper extra CXXFLAGS: -fpic
> >> Wrapper extra FFLAGS:
> >> Wrapper extra FCFLAGS:
> >> Wrapper extra LDFLAGS:
> >> Wrapper extra LIBS: -ldl -Wl,--export-dynamic -lnsl -
> lutil
> >> -lpthread -ldl
> >> Internal debug support: no
> >> MPI parameter check: runtime
> >> Memory profiling support: no
> >> Memory debugging support: no
> >> libltdl support: yes
> >> Heterogeneous support: no
> >> mpirun default --prefix: yes
> >> MPI I/O support: yes
> >> MPI_WTIME support: gettimeofday
> >> Symbol visibility support: no
> >> FT Checkpoint support: no (checkpoint thread: no)
> >> MCA mca: parameter "mca_param_files" (current
> >> value: "/home/alpha/.openmpi/mca-params.conf:/usr/mpi/pgi/open
> >> mpi-1.3.1/etc/openmpi-mca-params.conf", data source: default value)
> >> Path for MCA configuration files
> >> containing default parameter values
> >> MCA mca: parameter
> >> "mca_base_param_file_prefix" (current value: <none>, data source:
> >> default value)
> >> Aggregate MCA parameter file sets
> >> MCA mca: parameter "mca_base_param_file_path" (current value:
> >> "/usr/mpi/pgi/openmpi-1.3.1/share/openmpi/amca-param-sets:/home/alpha",
> >> data source: default value)
> >> Aggregate MCA parameter Search path
> >> MCA mca: parameter
> >> "mca_base_param_file_path_force" (current value: <none>, data
> >> source: default value)
> >> Forced Aggregate MCA parameter Search
> path
> >> MCA mca: parameter "mca_component_path" (current
> >> value: "/usr/mpi/pgi/openmpi-1.3.1/lib64/openmpi:/home/alph
> >> a/.openmpi/components", data source: default value)
> >> Path where to look for Open MPI and ORTE
> >> components
> >> MCA mca: parameter "mca_verbose" (current value:
> >> <none>, data source: default value)
> >> Top-level verbosity parameter
> >> MCA mca: parameter
> >> "mca_component_show_load_errors" (current value: "1", data source:
> >> default value)
> >> Whether to show errors for components
> that
> >> failed to load or not
> >> MCA mca: parameter
> >> "mca_component_disable_dlopen" (current value: "0", data source:
> >> default value)
> >> Whether to attempt to disable opening
> >> dynamic components or not
> >> MCA mpi: parameter "mpi_param_check" (current
> >> value: "1", data source: default value)
> >> Whether you want MPI API parameters
> >> checked at run-time or not. Possible values are 0 (no checking
> >> ) and 1 (perform checking at run-time)
> >> MCA mpi: parameter "mpi_yield_when_idle" (current
> >> value: "-1", data source: default value)
> >> Yield the processor when waiting for MPI
> >> communication (for MPI processes, will default to 1 when
> >> oversubscribing nodes)
> >> MCA mpi: parameter "mpi_event_tick_rate" (current
> >> value: "-1", data source: default value)
> >> How often to progress TCP communications
> >> (0 = never, otherwise specified in microseconds)
> >> MCA mpi: parameter
> "mpi_show_handle_leaks" (current
> >> value: "1", data source: environment)
> >> Whether MPI_FINALIZE shows all MPI
> handles
> >> that were not freed or not
> >> MCA mpi: parameter "mpi_no_free_handles" (current
> >> value: "0", data source: environment)
> >> Whether to actually free MPI objects when
> >> their handles are freed
> >> MCA mpi: parameter
> >> "mpi_show_mpi_alloc_mem_leaks" (current value: "8", data source:
> >> environment)
> >> If >0, MPI_FINALIZE will show up to this
> >> many instances of memory allocated by MPI_ALLOC_MEM that was
> >> not freed by MPI_FREE_MEM
> >> MCA mpi: parameter "mpi_show_mca_params" (current
> >> value: <none>, data source: default value)
> >> Whether to show all MCA parameter values
> >> during MPI_INIT or not (good for reproducability of MPI jobs
> >> for debug purposes). Accepted values are all, default, file,
> >> api, and enviro - or a comma delimited combination of them
> >> MCA mpi: parameter
> >> "mpi_show_mca_params_file" (current value: <none>, data source:
> >> default value)
> >> If mpi_show_mca_params is true, setting
> >> this string to a valid filename tells Open MPI to dump all
> >> the MCA parameter values into a file suitable for reading via the
> >> mca_param_files parameter (good for reproducability of MPI
> >> jobs)
> >> MCA mpi: parameter
> >> "mpi_keep_peer_hostnames" (current value: "1", data source: default
> >> value)
> >> If nonzero, save the string hostnames of
> >> all MPI peer processes (mostly for error / debugging output
> >> messages). This can add quite a bit of memory usage to each MPI
> >> process.
> >> MCA mpi: parameter "mpi_abort_delay" (current
> >> value: "0", data source: default value)
> >> If nonzero, print out an identifying
> >> message when MPI_ABORT is invoked (hostname, PID of the process
> >> that called MPI_ABORT) and delay for that many seconds before
> >> exiting (a negative delay value means to never abort). This
> >> allows attaching of a debugger before quitting the job.
> >> MCA mpi: parameter
> "mpi_abort_print_stack" (current
> >> value: "0", data source: default value)
> >> If nonzero, print out a stack trace when
> >> MPI_ABORT is invoked
> >> MCA mpi: parameter "mpi_preconnect_mpi" (current
> >> value: "0", data source: default value, synonyms: mpi_preco
> >> nnect_all)
> >> Whether to force MPI processes to fully
> >> wire-up the MPI connections between MPI processes during MP
> >> I_INIT (vs. making connections lazily -- upon the first MPI traffic
> >> between each process peer pair)
> >> MCA mpi: parameter "mpi_preconnect_all" (current
> >> value: "0", data source: default value, deprecated, synonym
> >> of: mpi_preconnect_mpi)
> >> Whether to force MPI processes to fully
> >> wire-up the MPI connections between MPI processes during MP
> >> I_INIT (vs. making connections lazily -- upon the first MPI traffic
> >> between each process peer pair)
> >> MCA mpi: parameter "mpi_leave_pinned" (current
> >> value: "0", data source: environment)
> >> Whether to use the "leave pinned"
> protocol
> >> or not. Enabling this setting can help bandwidth perfor
> >> mance when repeatedly sending and receiving large messages with the
> >> same buffers over RDMA-based networks (0 = do not use "le
> >> ave pinned" protocol, 1 = use "leave pinned" protocol, -1 = allow
> >> network to choose at runtime).
> >> MCA mpi: parameter
> >> "mpi_leave_pinned_pipeline" (current value: "0", data source:
> >> default value)
> >> Whether to use the "leave pinned
> pipeline"
> >> protocol or not.
> >> MCA mpi: parameter "mpi_paffinity_alone" (current
> >> value: "0", data source: default value)
> >> If nonzero, assume that this job is the
> >> only (set of) process(es) running on each node and bind
> >> processes to processors, starting with processor ID 0
> >> MCA mpi: parameter "mpi_warn_on_fork" (current
> >> value: "1", data source: default value)
> >> If nonzero, issue a warning if program
> >> forks under conditions that could cause system errors
> >> MCA mpi: information
> >> "mpi_have_sparse_group_storage" (value: "0", data source: default
> >> value)
> >> Whether this Open MPI installation
> >> supports storing of data in MPI groups in "sparse" formats (good
> >> for extremely large process count MPI jobs that create many
> >> communicators/groups)
> >> MCA mpi: parameter
> >> "mpi_use_sparse_group_storage" (current value: "0", data source:
> >> default value)
> >> Whether to use "sparse" storage formats
> >> for MPI groups (only relevant if mpi_have_sparse_group_storage is
> 1)
> >> MCA orte: parameter
> >> "orte_base_help_aggregate" (current value: "1", data source:
> default
> >> value)
> >> If orte_base_help_aggregate is true,
> >> duplicate help messages will be aggregated rather than
> >> displayed individually. This can be helpful for parallel jobs
> >> that experience multiple identical failures; rather than print
> >> out the same help/failure message N times, display it once with
> >> a count of how many processes sent the same message.
> >> MCA orte: parameter "orte_tmpdir_base" (current
> >> value: <none>, data source: default value)
> >> Base of the session directory tree
> >> MCA orte: parameter "orte_no_session_dirs" (current
> >> value: <none>, data source: default value)
> >> Prohibited locations for session
> >> directories (multiple locations separated by ',', default=NULL)
> >> MCA orte: parameter "orte_debug" (current value:
> >> "0", data source: default value)
> >> Top-level ORTE debug switch (default
> >> verbosity: 1)
> >> MCA orte: parameter "orte_debug_verbose" (current
> >> value: "-1", data source: default value)
> >> Verbosity level for ORTE debug messages
> >> (default: 1)
> >> MCA orte: parameter "orte_debug_daemons" (current
> >> value: "0", data source: default value)
> >> Whether to debug the ORTE daemons or not
> >> MCA orte: parameter
> >> "orte_debug_daemons_file" (current value: "0", data source: default
> >> value)
> >> Whether want stdout/stderr of daemons to
> >> go to a file or not
> >> MCA orte: parameter
> >> "orte_leave_session_attached" (current value: "0", data source:
> >> default value)
> >> Whether applications and/or daemons
> should
> >> leave their sessions attached so that any output can be
> >> received - this allows X forwarding without all the attendant
> >> debugging output
> >> MCA orte: parameter "orte_do_not_launch" (current
> >> value: "0", data source: default value)
> >> Perform all necessary operations to
> >> prepare to launch the application, but do not actually launch it
> >> MCA orte: parameter "orte_daemon_spin" (current
> >> value: "0", data source: default value)
> >> Have any orteds spin until we can connect
> >> a debugger to them
> >> MCA orte: parameter "orte_daemon_fail" (current
> >> value: "-1", data source: default value)
> >> Have the specified orted fail after init
> >> for debugging purposes
> >> MCA orte: parameter
> >> "orte_daemon_fail_delay" (current value: "0", data source: default
> >> value)
> >> Have the specified orted fail after
> >> specified number of seconds (default: 0 => no delay)
> >> MCA orte: parameter "orte_heartbeat_rate" (current
> >> value: "0", data source: default value)
> >> Seconds between checks for daemon
> >> state-of-health (default: 0 => do not check)
> >> MCA orte: parameter "orte_startup_timeout" (current
> >> value: "0", data source: default value)
> >> Milliseconds/daemon to wait for startup
> >> before declaring failed_to_start (default: 0 => do not check)
> >> Fortran77 profiling: yes
> >> Fortran90 profiling: yes
> >> C++ exceptions: no
> >> MCA orte: parameter "orte_timing" (current value:
> >> "0", data source: default value)
> >> Request that critical timing loops be
> >> measured
> >> MCA orte: parameter
> >> "orte_base_user_debugger" (current value: "totalview @mpirun@ -a
> >> @mpirun_args@ : ddt -n @np@ -start
> >> @executable@ @executable_argv@ @single_app@ : fxp
> >> @mpirun@ -a @mpirun_args@", data source: default value)
> >> Sequence of user-level debuggers to
> search
> >> for in orterun
> >> MCA orte: parameter "orte_abort_timeout" (current
> >> value: "1", data source: default value)
> >> Max time to wait [in secs] before
> aborting
> >> an ORTE operation (default: 1sec)
> >> MCA orte: parameter "orte_timeout_step" (current
> >> value: "1000", data source: default value)
> >> Time to wait [in usecs/proc] before
> >> aborting an ORTE operation (default: 1000 usec/proc)
> >> MCA orte: parameter
> "orte_default_hostfile" (current
> >> value: <none>, data source: default value)
> >> Name of the default hostfile (relative or
> >> absolute path)
> >> MCA orte: parameter
> >> "orte_keep_fqdn_hostnames" (current value: "0", data source:
> default
> >> value)
> >> Whether or not to keep FQDN hostnames
> >> [default: no]
> >> MCA orte: parameter
> "orte_contiguous_nodes" (current
> >> value: "2147483647", data source: default value)
> >> Number of nodes after which contiguous
> >> nodename encoding will automatically be used [default: INT_MAX]
> >> MCA orte: parameter "orte_tag_output" (current
> >> value: "0", data source: default value)
> >> Tag all output with [job,rank] (default:
> >> false)
> >> MCA orte: parameter "orte_xml_output" (current
> >> value: "0", data source: default value)
> >> Display all output in XML format
> (default:
> >> false)
> >> MCA orte: parameter
> "orte_timestamp_output" (current
> >> value: "0", data source: default value)
> >> Timestamp all application process output
> >> (default: false)
> >> MCA orte: parameter "orte_output_filename" (current
> >> value: <none>, data source: default value)
> >> Redirect output from application
> processes
> >> into filename.rank [default: NULL]
> >> MCA orte: parameter
> >> "orte_show_resolved_nodenames" (current value: "0", data source:
> >> default value)
> >> Display any node names that are resolved
> >> to a different name (default: false)
> >> MCA orte: parameter "orte_hetero_apps" (current
> >> value: "0", data source: default value)
> >> Indicates that multiple app_contexts are
> >> being provided that are a mix of 32/64 bit binaries (default:
> false)
> >> MCA orte: parameter "orte_launch_agent" (current
> >> value: "orted", data source: default value)
> >> Command used to start processes on remote
> >> nodes (default: orted)
> >> MCA orte: parameter
> >> "orte_allocation_required" (current value: "0", data source:
> default
> >> value)
> >> Whether or not an allocation by a
> resource
> >> manager is required [default: no]
> >> MCA orte: parameter "orte_xterm" (current value:
> >> <none>, data source: default value)
> >> Create a new xterm window and display
> >> output from the specified ranks there [default: none]
> >> MCA orte: parameter
> >> "orte_forward_job_control" (current value: "0", data source:
> default
> >> value)
> >> Forward SIGTSTP (after converting to
> >> SIGSTOP) and SIGCONT signals to the application procs [default: no]
> >> MCA opal: parameter "opal_signal" (current value:
> >> "6,7,8,11", data source: default value)
> >> If a signal is received, display the
> stack
> >> trace frame
> >> MCA opal: parameter
> >> "opal_set_max_sys_limits" (current value: "0", data source: default
> >> value)
> >> Set to non-zero to automatically set any
> >> system-imposed limits to the maximum allowed
> >> MCA opal: parameter "opal_event_include" (current
> >> value: "poll", data source: default value)
> >> ... ... ...
> >>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > Cisco Systems
> >
>

-- 
Jeff Squyres
Cisco Systems