Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] users Digest, Vol 1212, Issue 3
From: jan (jan_at_[hidden])
Date: 2009-04-26 22:24:48


Dear Jeff,

Thank you for your help. I tried Open MPI v1.3.2 on Sunday, but the
problems still occurred.

Regards, Gloria
Wavelink Technology Inc.

> Per http://www.open-mpi.org/community/lists/announce/2009/03/0029.php,
> can you try upgrading to Open MPI v1.3.2?
>
>
> On Apr 24, 2009, at 5:21 AM, jan wrote:
>
>> Dear Sir,
>>
>> I'm running a cluster with OpenMPI.
>>
>> $mpirun --mca mpi_show_mpi_alloc_mem_leaks 8 --mca
>> mpi_show_handle_leaks 1 $HOME/test/cpi
>>
>> When the job failed, I got this error message:
>>
>> Process 15 on node2
>> Process 6 on node1
>> Process 14 on node2
>> ...
>> Process 0 on node1
>> Process 10 on node2
>> [node2][[9340,1],13][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> [node2][[9340,1],9][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> [node2][[9340,1],10][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> [node2][[9340,1],11][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> [node2][[9340,1],8][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> [node2][[9340,1],15][btl_openib_component.c:3002:poll_device] [node2][[9340,1],12][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> error polling HP CQ with -2 errno says Success
>> [node2][[9340,1],14][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> mpirun: killing job...
>>
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 28438 on node node1
>> exited on signal 0 (Unknown signal 0).
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>> and this message when the job succeeded:
>>
>> Process 1 on node1
>> Process 2 on node1
>> ...
>> Process 13 on node2
>> Process 14 on node2
>> --------------------------------------------------------------------------
>> The following memory locations were allocated via MPI_ALLOC_MEM but
>> not freed via MPI_FREE_MEM before invoking MPI_FINALIZE:
>>
>> Process ID: [[13692,1],12]
>> Hostname: node2
>> PID: 30183
>>
>> (null)
>> --------------------------------------------------------------------------
>> [node1:32276] 15 more processes have sent help message
>> help-mpool-base.txt / all mem leaks
>> [node1:32276] Set MCA parameter "orte_base_help_aggregate" to 0 to
>> see all help / error messages
>>
>>
>> The behavior is periodic: two runs succeed, then two fail, then two
>> succeed, then two fail, and so on. I downloaded OFED-1.4.1-rc3 from
>> the OpenFabrics Alliance and installed it on a Dell PowerEdge M600 Blade
>> Server. The InfiniBand mezzanine cards are Mellanox ConnectX QDR &
>> DDR, and the InfiniBand switch module is a Mellanox M2401G. The OS is
>> CentOS 5.3, kernel 2.6.18-128.1.6.el5, with the PGI V7.2-5 compiler.
>> It's running the OpenSM subnet manager.
>>
>> Best Regards,
>>
>> Gloria Jan
>>
>> Wavelink Technology Inc.
>>
>> The output of the "ompi_info --all" command is:
>>
>> Package: Open MPI root_at_vortex Distribution
>> Open MPI: 1.3.1
>> Open MPI SVN revision: r20826
>> Open MPI release date: Mar 18, 2009
>> Open RTE: 1.3.1
>> Open RTE SVN revision: r20826
>> Open RTE release date: Mar 18, 2009
>> OPAL: 1.3.1
>> OPAL SVN revision: r20826
>> OPAL release date: Mar 18, 2009
>> Ident string: 1.3.1
>> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA carto: auto_detect (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA carto: file (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA maffinity: first_use (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA timer: linux (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA installdirs: config (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA allocator: bucket (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA coll: hierarch (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA coll: self (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA io: romio (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA pml: v (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA btl: ofud (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA btl: openib (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA btl: self (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA iof: orted (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA iof: tool (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA odls: default (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA ras: tm (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA rml: oob (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA routed: binomial (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA routed: direct (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA routed: linear (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA plm: tm (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA errmgr: default (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA ess: env (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA ess: singleton (MCA v2.0, API v2.0, Component
>> v1.3.1)
>> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA ess: tool (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3.1)
>> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3.1)
>> Prefix: /usr/mpi/pgi/openmpi-1.3.1
>> Exec_prefix: /usr/mpi/pgi/openmpi-1.3.1
>> Bindir: /usr/mpi/pgi/openmpi-1.3.1/bin
>> Sbindir: /usr/mpi/pgi/openmpi-1.3.1/sbin
>> Libdir: /usr/mpi/pgi/openmpi-1.3.1/lib64
>> Incdir: /usr/mpi/pgi/openmpi-1.3.1/include
>> Mandir: /usr/mpi/pgi/openmpi-1.3.1/share/man
>> Pkglibdir: /usr/mpi/pgi/openmpi-1.3.1/lib64/openmpi
>> Libexecdir: /usr/mpi/pgi/openmpi-1.3.1/libexec
>> Datarootdir: /usr/mpi/pgi/openmpi-1.3.1/share
>> Datadir: /usr/mpi/pgi/openmpi-1.3.1/share
>> Sysconfdir: /usr/mpi/pgi/openmpi-1.3.1/etc
>> Sharedstatedir: /usr/mpi/pgi/openmpi-1.3.1/com
>> Localstatedir: /var
>> Infodir: /usr/share/info
>> Pkgdatadir: /usr/mpi/pgi/openmpi-1.3.1/share/openmpi
>> Pkglibdir: /usr/mpi/pgi/openmpi-1.3.1/lib64/openmpi
>> Pkgincludedir: /usr/mpi/pgi/openmpi-1.3.1/include/openmpi
>> Configured architecture: x86_64-redhat-linux-gnu
>> Configure host: vortex
>> Configured by: root
>> Configured on: Sun Apr 12 23:23:14 CST 2009
>> Configure host: vortex
>> Built by: root
>> Built on: Sun Apr 12 23:28:52 CST 2009
>> Built host: vortex
>> C bindings: yes
>> C++ bindings: yes
>> Fortran77 bindings: yes (all)
>> Fortran90 bindings: yes
>> Fortran90 bindings size: small
>> C compiler: pgcc
>> C compiler absolute: /opt/pgi/linux86-64/7.2-5/bin/pgcc
>> C char size: 1
>> C bool size: 1
>> C short size: 2
>> C int size: 4
>> C long size: 8
>> C float size: 4
>> C double size: 8
>> C pointer size: 8
>> C char align: 1
>> C bool align: 1
>> C int align: 4
>> C float align: 4
>> C double align: 8
>> C++ compiler: pgCC
>> C++ compiler absolute: /opt/pgi/linux86-64/7.2-5/bin/pgCC
>> Fortran77 compiler: pgf77
>> Fortran77 compiler abs: /opt/pgi/linux86-64/7.2-5/bin/pgf77
>> Fortran90 compiler: pgf90
>> Fortran90 compiler abs: /opt/pgi/linux86-64/7.2-5/bin/pgf90
>> Fort integer size: 4
>> Fort logical size: 4
>> Fort logical value true: -1
>> Fort have integer1: yes
>> Fort have integer2: yes
>> Fort have integer4: yes
>> Fort have integer8: yes
>> Fort have integer16: no
>> Fort have real4: yes
>> Fort have real8: yes
>> Fort have real16: no
>> Fort have complex8: yes
>> Fort have complex16: yes
>> Fort have complex32: no
>> Fort integer1 size: 1
>> Fort integer2 size: 2
>> Fort integer4 size: 4
>> Fort integer8 size: 8
>> Fort integer16 size: -1
>> Fort real size: 4
>> Fort real4 size: 4
>> Fort real8 size: 8
>> Fort real16 size: -1
>> Fort dbl prec size: 4
>> Fort cplx size: 4
>> Fort dbl cplx size: 4
>> Fort cplx8 size: 8
>> Fort cplx16 size: 16
>> Fort cplx32 size: -1
>> Fort integer align: 4
>> Fort integer1 align: 1
>> Fort integer2 align: 2
>> Fort integer4 align: 4
>> Fort integer8 align: 8
>> Fort integer16 align: -1
>> Fort real align: 4
>> Fort real4 align: 4
>> Fort real8 align: 8
>> Fort real16 align: -1
>> Fort dbl prec align: 4
>> Fort cplx align: 4
>> Fort dbl cplx align: 4
>> Fort cplx8 align: 4
>> Fort cplx16 align: 8
>> Fort cplx32 align: -1
>> C profiling: yes
>> C++ profiling: yes
>> Thread support: posix (mpi: no, progress: no)
>> Sparse Groups: no
>> Build CFLAGS: -O -DNDEBUG
>> Build CXXFLAGS: -O -DNDEBUG
>> Build FFLAGS:
>> Build FCFLAGS: -O2
>> Build LDFLAGS: -export-dynamic
>> Build LIBS: -lnsl -lutil -lpthread
>> Wrapper extra CFLAGS:
>> Wrapper extra CXXFLAGS: -fpic
>> Wrapper extra FFLAGS:
>> Wrapper extra FCFLAGS:
>> Wrapper extra LDFLAGS:
>> Wrapper extra LIBS: -ldl -Wl,--export-dynamic -lnsl -lutil
>> -lpthread -ldl
>> Internal debug support: no
>> MPI parameter check: runtime
>> Memory profiling support: no
>> Memory debugging support: no
>> libltdl support: yes
>> Heterogeneous support: no
>> mpirun default --prefix: yes
>> MPI I/O support: yes
>> MPI_WTIME support: gettimeofday
>> Symbol visibility support: no
>> FT Checkpoint support: no (checkpoint thread: no)
>> MCA mca: parameter "mca_param_files" (current
>> value: "/home/alpha/.openmpi/mca-params.conf:/usr/mpi/pgi/openmpi-1.3.1/etc/openmpi-mca-params.conf",
>> data source: default value)
>> Path for MCA configuration files
>> containing default parameter values
>> MCA mca: parameter
>> "mca_base_param_file_prefix" (current value: <none>, data source:
>> default value)
>> Aggregate MCA parameter file sets
>> MCA mca: parameter
>> "mca_base_param_file_path" (current value:
>> "/usr/mpi/pgi/openmpi-1.3.1/share/openmpi/amca-param-sets:/home/alpha",
>> data source: default value)
>> Aggregate MCA parameter Search path
>> MCA mca: parameter
>> "mca_base_param_file_path_force" (current value: <none>, data
>> source: default value)
>> Forced Aggregate MCA parameter Search path
>> MCA mca: parameter "mca_component_path" (current
>> value: "/usr/mpi/pgi/openmpi-1.3.1/lib64/openmpi:/home/alpha/.openmpi/components",
>> data source: default value)
>> Path where to look for Open MPI and ORTE
>> components
>> MCA mca: parameter "mca_verbose" (current value:
>> <none>, data source: default value)
>> Top-level verbosity parameter
>> MCA mca: parameter
>> "mca_component_show_load_errors" (current value: "1", data source:
>> default value)
>> Whether to show errors for components that
>> failed to load or not
>> MCA mca: parameter
>> "mca_component_disable_dlopen" (current value: "0", data source:
>> default value)
>> Whether to attempt to disable opening
>> dynamic components or not
>> MCA mpi: parameter "mpi_param_check" (current
>> value: "1", data source: default value)
>> Whether you want MPI API parameters
>> checked at run-time or not. Possible values are 0 (no checking)
>> and 1 (perform checking at run-time)
>> MCA mpi: parameter "mpi_yield_when_idle" (current
>> value: "-1", data source: default value)
>> Yield the processor when waiting for MPI
>> communication (for MPI processes, will default to 1 when
>> oversubscribing nodes)
>> MCA mpi: parameter "mpi_event_tick_rate" (current
>> value: "-1", data source: default value)
>> How often to progress TCP communications
>> (0 = never, otherwise specified in microseconds)
>> MCA mpi: parameter "mpi_show_handle_leaks" (current
>> value: "1", data source: environment)
>> Whether MPI_FINALIZE shows all MPI handles
>> that were not freed or not
>> MCA mpi: parameter "mpi_no_free_handles" (current
>> value: "0", data source: environment)
>> Whether to actually free MPI objects when
>> their handles are freed
>> MCA mpi: parameter
>> "mpi_show_mpi_alloc_mem_leaks" (current value: "8", data source:
>> environment)
>> If >0, MPI_FINALIZE will show up to this
>> many instances of memory allocated by MPI_ALLOC_MEM that
>> was not freed by MPI_FREE_MEM
>> MCA mpi: parameter "mpi_show_mca_params" (current
>> value: <none>, data source: default value)
>> Whether to show all MCA parameter values
>> during MPI_INIT or not (good for reproducability of MPI
>> jobs for debug purposes). Accepted values are all, default, file, api,
>> and enviro - or a comma delimited combination of them
>> MCA mpi: parameter
>> "mpi_show_mca_params_file" (current value: <none>, data source:
>> default value)
>> If mpi_show_mca_params is true, setting
>> this string to a valid filename tells Open MPI to dump all
>> the MCA parameter values into a file suitable for reading via the
>> mca_param_files parameter (good for reproducability of MPI
>> jobs)
>> MCA mpi: parameter
>> "mpi_keep_peer_hostnames" (current value: "1", data source: default
>> value)
>> If nonzero, save the string hostnames of
>> all MPI peer processes (mostly for error / debugging output
>> messages). This can add quite a bit of memory usage to each MPI
>> process.
>> MCA mpi: parameter "mpi_abort_delay" (current
>> value: "0", data source: default value)
>> If nonzero, print out an identifying
>> message when MPI_ABORT is invoked (hostname, PID of the process
>> that called MPI_ABORT) and delay for that many seconds before
>> exiting (a negative delay value means to never abort). This
>> allows attaching of a debugger before quitting the job.
>> MCA mpi: parameter "mpi_abort_print_stack" (current
>> value: "0", data source: default value)
>> If nonzero, print out a stack trace when
>> MPI_ABORT is invoked
>> MCA mpi: parameter "mpi_preconnect_mpi" (current
>> value: "0", data source: default value, synonyms:
>> mpi_preconnect_all)
>> Whether to force MPI processes to fully
>> wire-up the MPI connections between MPI processes during
>> MPI_INIT (vs. making connections lazily -- upon the first MPI traffic
>> between each process peer pair)
>> MCA mpi: parameter "mpi_preconnect_all" (current
>> value: "0", data source: default value, deprecated, synonym
>> of: mpi_preconnect_mpi)
>> Whether to force MPI processes to fully
>> wire-up the MPI connections between MPI processes during
>> MPI_INIT (vs. making connections lazily -- upon the first MPI traffic
>> between each process peer pair)
>> MCA mpi: parameter "mpi_leave_pinned" (current
>> value: "0", data source: environment)
>> Whether to use the "leave pinned" protocol
>> or not. Enabling this setting can help bandwidth
>> performance when repeatedly sending and receiving large messages with the
>> same buffers over RDMA-based networks (0 = do not use
>> "leave pinned" protocol, 1 = use "leave pinned" protocol, -1 = allow
>> network to choose at runtime).
>> MCA mpi: parameter
>> "mpi_leave_pinned_pipeline" (current value: "0", data source:
>> default value)
>> Whether to use the "leave pinned pipeline"
>> protocol or not.
>> MCA mpi: parameter "mpi_paffinity_alone" (current
>> value: "0", data source: default value)
>> If nonzero, assume that this job is the
>> only (set of) process(es) running on each node and bind
>> processes to processors, starting with processor ID 0
>> MCA mpi: parameter "mpi_warn_on_fork" (current
>> value: "1", data source: default value)
>> If nonzero, issue a warning if program
>> forks under conditions that could cause system errors
>> MCA mpi: information
>> "mpi_have_sparse_group_storage" (value: "0", data source: default
>> value)
>> Whether this Open MPI installation
>> supports storing of data in MPI groups in "sparse" formats (good
>> for extremely large process count MPI jobs that create many
>> communicators/groups)
>> MCA mpi: parameter
>> "mpi_use_sparse_group_storage" (current value: "0", data source:
>> default value)
>> Whether to use "sparse" storage formats
>> for MPI groups (only relevant if mpi_have_sparse_group_storage is 1)
>> MCA orte: parameter
>> "orte_base_help_aggregate" (current value: "1", data source: default
>> value)
>> If orte_base_help_aggregate is true,
>> duplicate help messages will be aggregated rather than
>> displayed individually. This can be helpful for parallel jobs that
>> experience multiple identical failures; rather than print out
>> the same help/failure message N times, display it once with a count of
>> how many processes sent the same message.
>> MCA orte: parameter "orte_tmpdir_base" (current
>> value: <none>, data source: default value)
>> Base of the session directory tree
>> MCA orte: parameter "orte_no_session_dirs" (current
>> value: <none>, data source: default value)
>> Prohibited locations for session
>> directories (multiple locations separated by ',', default=NULL)
>> MCA orte: parameter "orte_debug" (current value:
>> "0", data source: default value)
>> Top-level ORTE debug switch (default
>> verbosity: 1)
>> MCA orte: parameter "orte_debug_verbose" (current
>> value: "-1", data source: default value)
>> Verbosity level for ORTE debug messages
>> (default: 1)
>> MCA orte: parameter "orte_debug_daemons" (current
>> value: "0", data source: default value)
>> Whether to debug the ORTE daemons or not
>> MCA orte: parameter
>> "orte_debug_daemons_file" (current value: "0", data source: default
>> value)
>> Whether want stdout/stderr of daemons to
>> go to a file or not
>> MCA orte: parameter
>> "orte_leave_session_attached" (current value: "0", data source:
>> default value)
>> Whether applications and/or daemons should
>> leave their sessions attached so that any output can be
>> received - this allows X forwarding without all the attendant
>> debugging output
>> MCA orte: parameter "orte_do_not_launch" (current
>> value: "0", data source: default value)
>> Perform all necessary operations to
>> prepare to launch the application, but do not actually launch it
>> MCA orte: parameter "orte_daemon_spin" (current
>> value: "0", data source: default value)
>> Have any orteds spin until we can connect
>> a debugger to them
>> MCA orte: parameter "orte_daemon_fail" (current
>> value: "-1", data source: default value)
>> Have the specified orted fail after init
>> for debugging purposes
>> MCA orte: parameter
>> "orte_daemon_fail_delay" (current value: "0", data source: default
>> value)
>> Have the specified orted fail after
>> specified number of seconds (default: 0 => no delay)
>> MCA orte: parameter "orte_heartbeat_rate" (current
>> value: "0", data source: default value)
>> Seconds between checks for daemon state-of-
>> health (default: 0 => do not check)
>> MCA orte: parameter "orte_startup_timeout" (current
>> value: "0", data source: default value)
>> Milliseconds/daemon to wait for startup
>> before declaring failed_to_start (default: 0 => do not check)
>> Fortran77 profiling: yes
>> Fortran90 profiling: yes
>> C++ exceptions: no
>> MCA orte: parameter "orte_timing" (current value:
>> "0", data source: default value)
>> Request that critical timing loops be
>> measured
>> MCA orte: parameter
>> "orte_base_user_debugger" (current value: "totalview @mpirun@ -a
>> @mpirun_args@ : ddt -n @np@
>> -start @executable@ @executable_argv@ @single_app@ : fxp
>> @mpirun@ -a @mpirun_args@", data source: default value)
>> Sequence of user-level debuggers to search
>> for in orterun
>> MCA orte: parameter "orte_abort_timeout" (current
>> value: "1", data source: default value)
>> Max time to wait [in secs] before aborting
>> an ORTE operation (default: 1sec)
>> MCA orte: parameter "orte_timeout_step" (current
>> value: "1000", data source: default value)
>> Time to wait [in usecs/proc] before
>> aborting an ORTE operation (default: 1000 usec/proc)
>> MCA orte: parameter "orte_default_hostfile" (current
>> value: <none>, data source: default value)
>> Name of the default hostfile (relative or
>> absolute path)
>> MCA orte: parameter
>> "orte_keep_fqdn_hostnames" (current value: "0", data source: default
>> value)
>> Whether or not to keep FQDN hostnames
>> [default: no]
>> MCA orte: parameter "orte_contiguous_nodes" (current
>> value: "2147483647", data source: default value)
>> Number of nodes after which contiguous
>> nodename encoding will automatically be used [default: INT_MAX]
>> MCA orte: parameter "orte_tag_output" (current
>> value: "0", data source: default value)
>> Tag all output with [job,rank] (default:
>> false)
>> MCA orte: parameter "orte_xml_output" (current
>> value: "0", data source: default value)
>> Display all output in XML format (default:
>> false)
>> MCA orte: parameter "orte_timestamp_output" (current
>> value: "0", data source: default value)
>> Timestamp all application process output
>> (default: false)
>> MCA orte: parameter "orte_output_filename" (current
>> value: <none>, data source: default value)
>> Redirect output from application processes
>> into filename.rank [default: NULL]
>> MCA orte: parameter
>> "orte_show_resolved_nodenames" (current value: "0", data source:
>> default value)
>> Display any node names that are resolved
>> to a different name (default: false)
>> MCA orte: parameter "orte_hetero_apps" (current
>> value: "0", data source: default value)
>> Indicates that multiple app_contexts are
>> being provided that are a mix of 32/64 bit binaries (default: false)
>> MCA orte: parameter "orte_launch_agent" (current
>> value: "orted", data source: default value)
>> Command used to start processes on remote
>> nodes (default: orted)
>> MCA orte: parameter
>> "orte_allocation_required" (current value: "0", data source: default
>> value)
>> Whether or not an allocation by a resource
>> manager is required [default: no]
>> MCA orte: parameter "orte_xterm" (current value:
>> <none>, data source: default value)
>> Create a new xterm window and display
>> output from the specified ranks there [default: none]
>> MCA orte: parameter
>> "orte_forward_job_control" (current value: "0", data source: default
>> value)
>> Forward SIGTSTP (after converting to
>> SIGSTOP) and SIGCONT signals to the application procs [default: no]
>> MCA opal: parameter "opal_signal" (current value:
>> "6,7,8,11", data source: default value)
>> If a signal is received, display the stack
>> trace frame
>> MCA opal: parameter
>> "opal_set_max_sys_limits" (current value: "0", data source: default
>> value)
>> Set to non-zero to automatically set any
>> system-imposed limits to the maximum allowed
>> MCA opal: parameter "opal_event_include" (current
>> value: "poll", data source: default value)
>> ... ... ...
>>
> --
> Jeff Squyres
> Cisco Systems
>
>
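
For anyone comparing failed runs, the poll_device errors quoted above can be tallied per host and rank with a short script. This is only a sketch: the function name tally_prefixes and the regular expression are illustrative and not part of Open MPI, and it assumes that the last number in the bracketed process name (e.g. the 13 in [[9340,1],13]) is the MPI rank. It searches the whole text rather than going line by line, so it also picks up prefixes that were interleaved on one line.

```python
import re
from collections import Counter

# Matches a "[host][[a,b],rank][file:line:function]" prefix as seen in the
# quoted errors. Treating the last number as the MPI rank is an assumption.
PREFIX = re.compile(
    r"\[(\w+)\]\[\[(\d+),(\d+)\],(\d+)\]\[([^\]:]+):(\d+):(\w+)\]"
)


def tally_prefixes(log_text):
    """Count error prefixes per (host, rank) in captured mpirun output."""
    counts = Counter()
    for host, _a, _b, rank, _file, _line, _func in PREFIX.findall(log_text):
        counts[(host, int(rank))] += 1
    return counts


# Two lines taken from the failed run; the second has interleaved prefixes.
sample = (
    "[node2][[9340,1],13][btl_openib_component.c:3002:poll_device] "
    "error polling HP CQ with -2 errno says Success\n"
    "[node2][[9340,1],15][btl_openib_component.c:3002:poll_device] "
    "[node2][[9340,1],12][btl_openib_component.c:3002:poll_device] "
    "error polling HP CQ with -2 errno says Success\n"
)

counts = tally_prefixes(sample)
for (host, rank), n in sorted(counts.items()):
    print(host, rank, n)
```

Saving the output of each run to a file and passing its contents to tally_prefixes would show whether the same ranks on node2 report the error every time the job fails.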