Open MPI User's Mailing List Archives

From: Ralph Castain (rhc_at_[hidden])
Date: 2007-07-10 18:38:58


On 7/10/07 3:56 PM, "Glenn Carver" <Glenn.Carver_at_[hidden]> wrote:

> Brian, Ralph,
>
> I neglected to mention in my first email that the application hasn't
> completed when I see the "HNP lost" messages. All processes of the
> application are still running on the nodes (well, consuming CPU cycles
> really). I should check to see if mpirun is still there.

Ah - yes, that does make a large difference. In that case my explanation is
invalid, because it applies to the scenario where your procs have completed,
and that message does mean that you are losing communication. My apologies.

If mpirun is still there, then I would be very suspicious of your fabric.
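Whether mpirun is still alive can be checked from a script as well as by eye with ps. The following is a minimal sketch, assuming Python 3 on a POSIX host and that you already know mpirun's PID. The function name process_alive is hypothetical. Signal 0 makes kill() perform its existence and permission checks without delivering anything to the target.

```python
import os
import sys


def process_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only).

    Signal 0 is never delivered; kill() only checks that the target exists
    and that we would be allowed to signal it.
    """
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # exists, but owned by another user
    return True


if __name__ == "__main__":
    # Hypothetical usage: python check_pid.py <mpirun PID> [more PIDs...]
    for arg in sys.argv[1:]:
        pid = int(arg)
        print(pid, "alive" if process_alive(pid) else "gone")
```

If mpirun is reported as gone while the application processes keep spinning, that matches the "HNP lost" scenario Ralph describes; if it is still alive, suspicion shifts to the fabric.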

>
> Further investigation has revealed that the IB switch is logging a
> fault on both power supplies (even though all status lights are
> green). So I suspect I do have a flaky IB problem. I'll start there
> anyway.
>
> One quick question - what would be a good TCP stress test to do? It
> might help me tie down the problem to the switch or the adapter cards.

Have to let someone else help you here.
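As one possible starting point for the TCP stress test Glenn asks about, here is a minimal sketch, assuming Python 3.8+ is available on both nodes. It pushes random data through a single TCP connection and compares a SHA-256 digest computed at each end, so a connection reset, a stall (30 s socket timeout), or a corrupted payload surfaces as an exception or a mismatch. The function names, chunk size and transfer size are hypothetical choices; dedicated tools such as iperf or netperf are the more usual way to load a link.

```python
import hashlib
import os
import socket
import threading

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes


def sink_server(listener: socket.socket, result: dict) -> None:
    """Accept one connection, hash everything until EOF, reply with the digest."""
    conn, _ = listener.accept()
    digest = hashlib.sha256()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:  # sender called shutdown(SHUT_WR)
                break
            digest.update(chunk)
            total += len(chunk)
        conn.sendall(digest.digest())
    result["bytes"] = total


def stress_once(host: str, port: int, total_bytes: int, chunk: int = 65536) -> bool:
    """Send total_bytes of random data; True if the receiver's SHA-256 matches."""
    digest = hashlib.sha256()
    with socket.create_connection((host, port), timeout=30) as s:
        sent = 0
        while sent < total_bytes:
            block = os.urandom(min(chunk, total_bytes - sent))
            s.sendall(block)
            digest.update(block)
            sent += len(block)
        s.shutdown(socket.SHUT_WR)  # signal EOF but keep the read side open
        reply = b""
        while len(reply) < DIGEST_LEN:
            part = s.recv(DIGEST_LEN - len(reply))
            if not part:
                break
            reply += part
    return reply == digest.digest()


if __name__ == "__main__":
    # Loopback self-test. Across nodes, run sink_server on one host with the
    # listener bound to its IPoIB address and call stress_once from the other.
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    result = {}
    worker = threading.Thread(target=sink_server, args=(listener, result))
    worker.start()
    ok = stress_once("127.0.0.1", port, 8 * 1024 * 1024)
    worker.join()
    listener.close()
    print("ok" if ok else "MISMATCH", result["bytes"], "bytes")
```

Running stress_once in a loop between each pair of nodes, and then between nodes on different switch ports, is one way to narrow a fault down to the switch or to a particular adapter.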

>
> Thanks again,
> Glenn
>
>
>> What Ralph said is generally true. If your application completed,
>> this is nothing to worry about. It means that an error occurred on
>> the socket between mpirun and some other process. However, combined
>> with the tavor0 errors in the log files, it could mean that your
>> IPoIB network is acting flaky. That would have me slightly
>> concerned. Enough that I'd consider running some TCP stress tests on
>> the network to make sure it's acting normally.
>>
>> Hope this helps,
>>
>> Brian
>>
>> On Jul 10, 2007, at 11:32 AM, Ralph H Castain wrote:
>>
>>>
>>>
>>>
>>> On 7/10/07 11:08 AM, "Glenn Carver" <Glenn.Carver_at_[hidden]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'd be grateful if someone could explain the meaning of this error
>>>> message to me and whether it indicates a hardware problem or
>>>> application software issue:
>>>>
>>>> [node2:11881] OOB: Connection to HNP lost
>>>> [node1:09876] OOB: Connection to HNP lost
>>>
>>> This message is nothing to be concerned about - all it indicates is
>>> that mpirun exited before our daemon on your backend nodes did. It's
>>> relatively harmless and probably should be eliminated in some future
>>> version (except when developers are running in debug mode).
>>>
>>> The message can appear when the timing changes between front and
>>> backend nodes. What happens is:
>>>
>>> 1. mpirun detects that your processes have all completed. It then
>>> orders the shutdown of the daemons on your backend nodes.
>>>
>>> 2. Each daemon does an orderly shutdown. Just before it terminates,
>>> it tells mpirun that it is done cleaning up and is about to exit.
>>>
>>> 3. When mpirun hears that all daemons are done cleaning up, it exits
>>> itself. This is where the timing issue comes into play - if mpirun
>>> exits before the daemon, then you get that error message as the
>>> daemon is terminating.
>>>
>>> So it's all a question of whether mpirun completes the last few
>>> steps to exit before the daemons do. In most cases, the daemons
>>> complete first as they have less to do. Sometimes, mpirun manages to
>>> get out first, and you get the message.
>>>
>>> I doubt it has anything to do with your hardware issues.
>>> Personally, I would just ignore the message - I'll see it gets
>>> removed in later releases to avoid unnecessary confusion.
>>>
>>> Hope that helps
>>> Ralph
>>>
>>>
>>>>
>>>> I have a small cluster which until last week was just fine.
>>>> Unfortunately we were hit by a sudden power dip which brought the
>>>> cluster down and did significant damage to other servers (blew power
>>>> supplies and disk). Although the cluster machines and the Infiniband
>>>> link is up and running jobs I am now getting these errors in user
>>>> applications which we've never had before.
>>>>
>>>> The system messages file reports (for node2):
>>>> Jul 5 12:08:28 node1 genunix: [ID 408789 kern.notice] NOTICE:
>>>> tavor0: fault cleared external to device; service available
>>>> Jul 5 12:0:28 node1 genunix: [ID 451854 kern.notice] NOTICE:
>>>> tavor0: port 1 up
>>>> Jul 7 16:18:32 node1 genunix: [ID 408114 kern.info]
>>>> /pci@1,0/pci1022,7450@2/pci15b3,5a46@1/pci15b3,5a44@0 (tavor0) online
>>>> Jul 7 16:18:32 node1 ib: [ID 842868 kern.info] IB device:
>>>> daplt@0, daplt0
>>>> Jul 7 1:18:32 node1 genunix: [ID 936769 kern.info] daplt0 is
>>>> /ib/daplt@0
>>>> Jul 7 16:18:32 node1 genunix: [ID 408114 kern.info] /ib/daplt@0
>>>> (daplt0) online
>>>> Jul 7 16:18:3 node1 genunix: [ID 834635 kern.info] /ib/daplt@0
>>>> (daplt0) multipath status: degraded, path
>>>> /pci@1,0/pci1022,7450@2/pci15b3,5a46@1/pci15b3,5a44@0 (tavor0) to target address: daplt,0 is
>>>> online Load balancing: round-robin
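On a busy node the system messages file can be long, so a small filter helps to pull out only the InfiniBand entries that look worrying, such as the "fault" and "degraded" lines above. This is a minimal sketch, assuming Python 3 and one syslog entry per line in the file (the entries quoted here were wrapped by the mail client). The keyword list and device-name pattern are hypothetical, derived only from the messages in this thread, and /var/adm/messages as the file location is an assumption for Solaris 10.

```python
import re
import sys

# Hypothetical patterns based only on the messages quoted in this thread.
IB_DEVICE = re.compile(r"\b(tavor\d+|daplt\d+)\b")
SUSPECT = re.compile(r"\b(fault|degraded|down)\b", re.IGNORECASE)


def ib_warnings(lines):
    """Yield (device, line) for IB-related syslog lines that look suspicious."""
    for line in lines:
        dev = IB_DEVICE.search(line)
        if dev and SUSPECT.search(line):
            yield dev.group(1), line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is an assumption): python ib_scan.py /var/adm/messages
    with open(sys.argv[1]) as fh:
        for device, line in ib_warnings(fh):
            print(f"{device}: {line}")
```

Counting how often these lines appear per node, before and after the power dip, may show whether one adapter or the switch side is involved.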
>>>
>>>> I wonder if these messages are indicative of a hardware problem,
>>>> possibly on the Infiniband switch or the host adapters on the cluster
>>>> machines. The cluster software has not been altered but there have
>>>> been small changes to the application codes. But I want to rule out
>>>> hardware issues because of the power dip first.
>>>>
>>>> Anyone seen this message before and know whether to investigate
>>>> hardware first? I did check the archives but it didn't help. More
>>>> info provided below.
>>>>
>>>> Any help appreciated, thanks.
>>>>
>>>> Glenn
>>>>
>>>> --
>>>> Details:
>>>> Cluster uses mix of Sun's X4100/X4200 machines linked with Sun
>>>> supplied Infiniband and host adapters. All machines are running
>>>> Solaris 10_x86 (11/06) with latest kernel patches
>>>> Software is Sun Clustertools 7.
>>>>
>>>> Node2 $ ifconfig ibd1
>>>> ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044
>>>> index 3
>>>> inet 192.168.50.202 netmask ffffff00 broadcast
>>>> 192.168.50.255
>>>>
>>>> Node1 $ ifconfig ibd1
>>>> ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044
>>>> index 3
>>>> inet 192.168.50.201 netmask ffffff00 broadcast
>>>> 192.168.50.255
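When comparing ifconfig output across nodes, a netmask that differs or is mistyped is easy to miss, because Solaris ifconfig prints the mask in hex. A minimal sketch, assuming Python 3 and using only the standard ipaddress module: it converts each address and hex mask into a network so the nodes can be compared directly. The function name ipoib_network is hypothetical.

```python
import ipaddress


def ipoib_network(addr: str, hex_mask: str) -> ipaddress.IPv4Network:
    """Network for an ifconfig-style address plus hex netmask (e.g. 'ffffff00').

    Raises ValueError if the mask is not a valid contiguous IPv4 netmask.
    """
    mask = ipaddress.IPv4Address(int(hex_mask, 16))
    return ipaddress.ip_interface(f"{addr}/{mask}").network


if __name__ == "__main__":
    node1 = ipoib_network("192.168.50.201", "ffffff00")
    node2 = ipoib_network("192.168.50.202", "ffffff00")
    print(node1, node1 == node2)  # prints: 192.168.50.0/24 True
```

Both ibd1 interfaces should land on the same network with broadcast address 192.168.50.255, matching the ifconfig output; a mask that is not a valid contiguous netmask raises ValueError instead.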
>>>>
>>>>
>>>> ompi_info -a
>>>> Open MPI: 1.2.1r14096-ct7b030r1838
>>>> Open MPI SVN revision: 0
>>>> Open RTE: 1.2.1r14096-ct7b030r1838
>>>> Open RTE SVN revision: 0
>>>> OPAL: 1.2.1r14096-ct7b030r1838
>>>> OPAL SVN revision: 0
>>>> MCA backtrace: printstack (MCA v1.0, API v1.0,
>>>> Component v1.2.1)
>>>> MCA paffinity: solaris (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA maffinity: first_use (MCA v1.0, API v1.0,
>>>> Component v1.2.1)
>>>> MCA timer: solaris (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>>>> MCA allocator: bucket (MCA v1.0, API v1.0, Component
>>>> v1.0)
>>>> MCA coll: basic (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA coll: self (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA coll: tuned (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA io: romio (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA mpool: udapl (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA btl: self (MCA v1.0, API v1.0.1, Component
>>>> v1.2.1)
>>>> MCA btl: sm (MCA v1.0, API v1.0.1, Component
>>>> v1.2.1)
>>>> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
>>>> MCA btl: udapl (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA topo: unity (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA osc: pt2pt (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA errmgr: orted (MCA v1.0, API v1.3, Component
>>>> v1.2.1)
>>>> MCA errmgr: proxy (MCA v1.0, API v1.3, Component
>>>> v1.2.1)
>>>> MCA gpr: null (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA gpr: proxy (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA gpr: replica (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA iof: proxy (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA ns: proxy (MCA v1.0, API v2.0, Component
>>>> v1.2.1)
>>>> MCA ns: replica (MCA v1.0, API v2.0, Component
>>>> v1.2.1)
>>>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>>>> MCA ras: dash_host (MCA v1.0, API v1.3,
>>>> Component v1.2.1)
>>>> MCA ras: gridengine (MCA v1.0, API v1.3,
>>>> Component v1.2.1)
>>>> MCA ras: localhost (MCA v1.0, API v1.3,
>>>> Component v1.2.1)
>>>> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA rds: hostfile (MCA v1.0, API v1.3, Component
>>>> v1.2.1)
>>>> MCA rds: proxy (MCA v1.0, API v1.3, Component
>>>> v1.2.1)
>>>> MCA rds: resfile (MCA v1.0, API v1.3, Component
>>>> v1.2.1)
>>>> MCA rmaps: round_robin (MCA v1.0, API v1.3,
>>>> Component v1.2.1)
>>>> MCA rmgr: proxy (MCA v1.0, API v2.0, Component
>>>> v1.2.1)
>>>> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
>>>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA pls: gridengine (MCA v1.0, API v1.3,
>>>> Component v1.2.1)
>>>> MCA pls: proxy (MCA v1.0, API v1.3, Component
>>>> v1.2.1)
>>>> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA sds: pipe (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA sds: seed (MCA v1.0, API v1.0, Component
>>>> v1.2.1)
>>>> MCA sds: singleton (MCA v1.0, API v1.0,
>>>> Component v1.2.1)
>>>> Prefix: /opt/SUNWhpc/HPC7.0
>>>> Bindir: /opt/SUNWhpc/HPC7.0/bin
>>>> Libdir: /opt/SUNWhpc/HPC7.0/lib
>>>> Incdir: /opt/SUNWhpc/HPC7.0/include
>>>> Pkglibdir: /opt/SUNWhpc/HPC7.0/lib/openmpi
>>>> Sysconfdir: /opt/SUNWhpc/HPC7.0/etc
>>>> Configured architecture: i386-pc-solaris2.10
>>>> Configured by: root
>>>> Configured on: Fri Mar 30 13:40:12 EDT 2007
>>>> Configure host: burpen-csx10-0
>>>> Built by: root
>>>> Built on: Fri Mar 30 13:57:25 EDT 2007
>>>> Built host: burpen-csx10-0
>>>> C bindings: yes
>>>> C++ bindings: yes
>>>> Fortran77 bindings: yes (all)
>>>> Fortran90 bindings: yes
>>>> Fortran90 bindings size: trivial
>>>> C compiler: cc
>>>> C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
>>>> C char size: 1
>>>> C bool size: 1
>>>> C short size: 2
>>>> C int size: 4
>>>> C long size: 4
>>>> C float size: 4
>>>> C double size: 8
>>>> C pointer size: 4
>>>> C char align: 1
>>>> C bool align: 1
>>>> C int align: 4
>>>> C float align: 4
>>>> C double align: 4
>>>> C++ compiler: CC
>>>> C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
>>>> Fortran77 compiler: f77
>>>> Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
>>>> Fortran90 compiler: f95
>>>> Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
>>>> Fort integer size: 4
>>>> Fort logical size: 4
>>>> Fort logical value true: 1
>>>> Fort have integer1: yes
>>>> Fort have integer2: yes
>>>> Fort have integer4: yes
>>>> Fort have integer8: yes
>>>> Fort have integer16: no
>>>> Fort have real4: yes
>>>> Fort have real8: yes
>>>> Fort have real16: no
>>>> Fort have complex8: yes
>>>> Fort have complex16: yes
>>>> Fort have complex32: no
>>>> Fort integer1 size: 1
>>>> Fort integer2 size: 2
>>>> Fort integer4 size: 4
>>>> Fort integer8 size: 8
>>>> Fort integer16 size: -1
>>>> Fort real size: 4
>>>> Fort real4 size: 4
>>>> Fort real8 size: 8
>>>> Fort real16 size: -1
>>>> Fort dbl prec size: 4
>>>> Fort cplx size: 4
>>>> Fort dbl cplx size: 4
>>>> Fort cplx8 size: 8
>>>> Fort cplx16 size: 16
>>>> Fort cplx32 size: -1
>>>> Fort integer align: 4
>>>> Fort integer1 align: 1
>>>> Fort integer2 align: 2
>>>> Fort integer4 align: 4
>>>> Fort integer8 align: 4
>>>> Fort integer16 align: -1
>>>> Fort real align: 4
>>>> Fort real4 align: 4
>>>> Fort real8 align: 4
>>>> Fort real16 align: -1
>>>> Fort dbl prec align: 4
>>>> Fort cplx align: 4
>>>> Fort dbl cplx align: 4
>>>> Fort cplx8 align: 4
>>>> Fort cplx16 align: 4
>>>> Fort cplx32 align: -1
>>>> C profiling: yes
>>>> C++ profiling: yes
>>>> Fortran77 profiling: yes
>>>> Fortran90 profiling: yes
>>>> C++ exceptions: yes
>>>> Thread support: no
>>>> Build CFLAGS: -DNDEBUG -xtarget=opteron -xarch=sse2
>>>> -xprefetch
>>>> -xprefetch_level=2 -xvector=simd
>>>> -xdepend=yes -xbuiltin=%all
>>>> -xO5
>>>> Build CXXFLAGS: -DNDEBUG -xtarget=opteron -xarch=sse2
>>>> -xprefetch
>>>> -xprefetch_level=2 -xvector=simd
>>>> -xdepend=yes -xbuiltin=%all
>>>> -xO5
>>>> Build FFLAGS: -xtarget=opteron -xarch=sse2 -xprefetch
>>>> -xprefetch_level=2
>>>> -xvector=simd -stackvar -xO5
>>>> Build FCFLAGS: -xtarget=opteron -xarch=sse2 -xprefetch
>>>> -xprefetch_level=2
>>>> -xvector=simd -stackvar -xO5
>>>> Build LDFLAGS: -export-dynamic -R/opt/mx/lib
>>>> -R/opt/SUNWhpc/HPC7.0/lib
>>>> -R/opt/mx/lib/amd64
>>>> -R/opt/SUNWhpc/HPC7.0/lib/amd64
>>>> -R/opt/mx/lib -R/opt/SUNWhpc/HPC7.0/lib
>>>> -R/opt/mx/lib/amd64
>>>> -R/opt/SUNWhpc/HPC7.0/lib/amd64
>>>> -R/opt/mx/lib
>>>> -R/opt/SUNWhpc/HPC7.0/lib
>>>> -R/opt/mx/lib/amd64
>>>> -R/opt/SUNWhpc/HPC7.0/lib/amd64
>>>> Build LIBS: -lsocket -lnsl -lrt -lm
>>>> Wrapper extra CFLAGS:
>>>> Wrapper extra CXXFLAGS:
>>>> Wrapper extra FFLAGS:
>>>> Wrapper extra FCFLAGS:
>>>> Wrapper extra LDFLAGS: -R/opt/mx/lib
>>>> -R/opt/SUNWhpc/HPC7.0/lib
>>>> -R/opt/mx/lib/amd64
>>>> -R/opt/SUNWhpc/HPC7.0/lib/amd64
>>>> Wrapper extra LIBS: -lsocket -lnsl -lrt -lm -ldl
>>>> Internal debug support: no
>>>> MPI parameter check: runtime
>>>> Memory profiling support: no
>>>> Memory debugging support: no
>>>> libltdl support: yes
>>>> Heterogeneous support: yes
>>>> mpirun default --prefix: yes
>>>> MCA mca: parameter "mca_param_files" (current
>>>> value:
>>>>
>>>> "/home/tomcat/.openmpi/mca-params.conf:/opt/SUNWhpc/HPC7.0/etc/openmpi-mca-params.conf")
>>>> Path for MCA configuration files
>>>> containing
>>>> default parameter
>>>> values
>>>> MCA mca: parameter "mca_component_path" (current
>>>> value:
>>>>
>>>> "/opt/SUNWhpc/HPC7.0/lib/openmpi:/home/tomcat/.openmpi/components")
>>>> Path where to look for Open MPI and
>>>> ORTE components
>>>> MCA mca: parameter "mca_verbose" (current value:
>>>> <none>)
>>>> Top-level verbosity parameter
>>>> MCA mca: parameter "mca_component_show_load_errors"
>>>> (current value: "0")
>>>> Whether to show errors for components that
>>>> failed to load or
>>>> not
>>>> MCA mca: parameter "mca_component_disable_dlopen"
>>>> (current value: "0")
>>>> Whether to attempt to disable opening
>>>> dynamic components or not
>>>> MCA mpi: parameter "mpi_param_check" (current
>>>> value: "1")
>>>> Whether you want MPI API parameters
>>>> checked
>>>> at run-time or not.
>>>> Possible values are 0 (no checking) and 1
>>>> (perform checking at
>>>> run-time)
>>>> MCA mpi: parameter
>>>> "mpi_yield_when_idle" (current value:
>>>> "0")
>>>> Yield the processor when waiting for MPI
>>>> communication (for MPI
>>>> processes, will default to 1 when
>>>> oversubscribing nodes)
>>>> MCA mpi: parameter
>>>> "mpi_event_tick_rate" (current value:
>>>> "-1")
>>>> How often to progress TCP
>>>> communications (0
>>>> = never, otherwise
>>>> specified in microseconds)
>>>> MCA mpi: parameter "mpi_show_handle_leaks" (current
>>>> value: "0")
>>>> Whether MPI_FINALIZE shows all MPI handles
>>>> that were not freed
>>>> or not
>>>> MCA mpi: parameter
>>>> "mpi_no_free_handles" (current value:
>>>> "0")
>>>> Whether to actually free MPI objects when
>>>> their handles are
>>>> freed
>>>> MCA mpi: parameter
>>>> "mpi_show_mca_params" (current value:
>>>> "0")
>>>> Whether to show all MCA parameter value
>>>> during MPI_INIT or not
>>>> (good for reproducability of MPI jobs)
>>>> MCA mpi: parameter "mpi_show_mca_params_file"
>>>> (current value: <none>)
>>>> If mpi_show_mca_params is true, setting
>>>> this string to a valid
>>>> filename tells Open MPI to dump all the
>>>> MCA
>>>> parameter values
>>>> into a file suitable for reading via the
>>>> mca_param_files
>>>> parameter (good for reproducability of
>>>> MPI jobs)
>>>> MCA mpi: parameter
>>>> "mpi_paffinity_alone" (current value:
>>>> "0")
>>>> If nonzero, assume that this job is the
>>>> only (set
>>>> of)
>>>> process(es) running on each node and bind
>>>> processes to
>>>> processors, starting with processor ID 0
>>>> MCA mpi: parameter "mpi_keep_peer_hostnames"
>>>> (current value: "1")
>>>> If nonzero, save the string hostnames of
>>>> all MPI peer processes
>>>> (mostly for error / debugging output
>>>> messages). This can add
>>>> quite a bit of memory usage to each MPI
>>>> process.
>>>> MCA mpi: parameter "mpi_abort_delay" (current
>>>> value: "0")
>>>> If nonzero, print out an identifying
>>>> message when MPI_ABORT is
>>>> invoked (hostname, PID of the process that
>>>> called MPI_ABORT) and
>>>> delay for that many seconds before exiting
>>>> (a negative delay
>>>> value means to never abort). This allows
>>>> attaching of a
>>>> debugger before quitting the job.
>>>> MCA mpi: information
>>>> "mpi_abort_print_stack" (value: "0")
>>>> If nonzero, print out a stack trace when
>>>> MPI_ABORT is invoked
>>>> MCA mpi: parameter "mpi_preconnect_all" (current
>>>> value: "0")
>>>> Whether to force MPI processes to create
>>>> connections / warmup
>>>> with *all* peers during MPI_INIT (vs.
>>>> making connections lazily
>>>> -- upon the first MPI traffic between each
>>>> process peer pair)
>>>> MCA mpi: parameter "mpi_preconnect_oob" (current
>>>> value: "0")
>>>> Whether to force MPI processes to fully
>>>> wire-up the OOB system
>>>> between MPI processes.
>>>> MCA mpi: parameter "mpi_leave_pinned" (current
>>>> value: "0")
>>>> Whether to use the "leave pinned" protocol
>>>> or not. Enabling
>>>> this setting can help bandwidth
>>>> performance
>>>> when repeatedly
>>>> sending and receiving large messages with
>>>> the same buffers over
>>>> RDMA-based networks.
>>>> MCA mpi: parameter "mpi_leave_pinned_pipeline"
>>>> (current value: "0")
>>>> Whether to use the "leave pinned pipeline"
>>>> protocol or not.
>>>> MCA orte: parameter "orte_debug" (current value:
>>>> "0")
>>>> Top-level ORTE debug switch
>>>> MCA orte: parameter "orte_no_daemonize" (current
>>>> value: "0")
>>>> Whether to properly daemonize the ORTE
>>>> daemons or
>>>> not
>>>> MCA orte: parameter "orte_base_user_debugger"
>>>> (current value: "totalview
>>>> @mpirun@ -a @mpirun_args@ : fxp
>>>> @mpirun@ -a
>>>> @mpirun_args@")
>>>> Sequence of user-level debuggers to search
>>>> for in orterun
>>>> MCA orte: parameter "orte_abort_timeout" (current
>>>> value:
>>>> "10")
>>>> Time to wait [in seconds] before giving up
>>>> on aborting an ORTE
>>>> operation
>>>> MCA orte: parameter "orte_timing" (current value:
>>>> "0")
>>>> Request that critical timing loops be
>>>> measured
>>>> MCA opal: parameter "opal_signal" (current value:
>>>> "6,10,8,11")
>>>> If a signal is received, display the stack
>>>> trace frame
>>>> MCA backtrace: parameter "backtrace" (current value:
>>>> <none>)
>>>> Default selection set of components for
>>>> the
>>>> backtrace framework
>>>> (<none> means "use all components that
>>>> can be
>>>> found")
>>>> MCA backtrace: parameter
>>>> "backtrace_base_verbose" (current
>>>> value: "0")
>>>> Verbosity level for the backtrace
>>>> framework
>>>> (0 = no verbosity)
>>>> MCA backtrace: parameter "backtrace_printstack_priority"
>>>> (current value: "0")
>>>> MCA memory: parameter "memory" (current value: <none>)
>>>> Default selection set of components for
>>>> the
>>>> memory framework
>>>> (<none> means "use all components that
>>>> can be
>>>> found")
>>>> MCA memory: parameter
>>>> "memory_base_verbose" (current value:
>>>> "0")
>>>> Verbosity level for the memory
>>>> framework (0
>>>> = no verbosity)
>>>> MCA paffinity: parameter "paffinity" (current value:
>>>> <none>)
>>>> Default selection set of components for
>>>> the
>>>> paffinity framework
>>>> (<none> means "use all components that
>>>> can be
>>>> found")
>>>> MCA paffinity: parameter "paffinity_solaris_priority"
>>>> (current value: "10")
>>>> Priority of the solaris paffinity
>>>> component
>>>> MCA maffinity: parameter "maffinity" (current value:
>>>> <none>)
>>>> Default selection set of components for
>>>> the
>>>> maffinity framework
>>>> (<none> means "use all components that
>>>> can be
>>>> found")
>>>> MCA maffinity: parameter "maffinity_first_use_priority"
>>>> (current value: "10")
>>>> Priority of the first_use maffinity
>>>> component
>>>> MCA timer: parameter "timer" (current value: <none>)
>>>> Default selection set of components for
>>>> the
>>>> timer framework
>>>> (<none> means "use all components that
>>>> can be
>>>> found")
>>>> MCA timer: parameter "timer_base_verbose" (current
>>>> value: "0")
>>>> Verbosity level for the timer framework (0
>>>> = no verbosity)
>>>> MCA timer: parameter
>>>> "timer_solaris_priority" (current
>>>> value: "0")
>>>> MCA allocator: parameter "allocator" (current value:
>>>> <none>)
>>>> Default selection set of components for
>>>> the
>>>> allocator framework
>>>> (<none> means "use all components that
>>>> can be
>>>> found")
>>>> MCA allocator: parameter
>>>> "allocator_base_verbose" (current
>>>> value: "0")
>>>> Verbosity level for the allocator
>>>> framework
>>>> (0 = no verbosity)
>>>> MCA allocator: parameter "allocator_basic_priority"
>>>> (current value: "0")
>>>> MCA allocator: parameter "allocator_bucket_num_buckets"
>>>> (current value: "30")
>>>> MCA allocator: parameter "allocator_bucket_priority"
>>>> (current value: "0")
>>>> MCA coll: parameter "coll" (current value: <none>)
>>>> Default selection set of components for
>>>> the
>>>> coll framework
>>>> (<none> means "use all components that
>>>> can be
>>>> found")
>>>> MCA coll: parameter "coll_base_verbose" (current
>>>> value: "0")
>>>> Verbosity level for the coll framework
>>>> (0 =
>>>> no verbosity)
>>>> MCA coll: parameter
>>>> "coll_basic_priority" (current value:
>>>> "10")
>>>> Priority of the basic coll component
>>>> MCA coll: parameter
>>>> "coll_basic_crossover" (current value:
>>>> "4")
>>>> Minimum number of processes in a
>>>> communicator before using the
>>>> logarithmic algorithms
>>>> MCA coll: parameter "coll_self_priority" (current
>>>> value:
>>>> "75")
>>>> MCA coll: parameter "coll_sm_priority" (current
>>>> value: "0")
>>>> Priority of the sm coll component
>>>> MCA coll: parameter "coll_sm_control_size" (current
>>>> value: "4096")
>>>> Length of the control data -- should
>>>> usually be either the
>>>> length of a cache line on most SMPs, or
>>>> the
>>>> size of a page on
>>>> machines that support direct memory
>>>> affinity page placement (in
>>>> bytes)
>>>> MCA coll: parameter "coll_sm_bootstrap_filename"
>>>> (current value:
>>>> "shared_mem_sm_bootstrap")
>>>> Filename (in the Open MPI session
>>>> directory) of the coll sm
>>>> component bootstrap rendezvous mmap file
>>>> MCA coll: parameter "coll_sm_bootstrap_num_segments"
>>>> (current value: "8")
>>>> Number of segments in the bootstrap file
>>>> MCA coll: parameter "coll_sm_fragment_size" (current
>>>> value: "8192")
>>>> Fragment size (in bytes) used for passing
>>>> data through shared
>>>> memory (will be rounded up to the nearest
>>>> control_size size)
>>>> MCA coll: parameter "coll_sm_mpool" (current
>>>> value: "sm")
>>>> Name of the mpool component to use
>>>> MCA coll: parameter "coll_sm_comm_in_use_flags"
>>>> (current value: "2")
>>>> Number of "in use" flags, used to mark a
>>>> message passing area
>>>> segment as currently being used or not
>>>> (must be >= 2 and <=
>>>> comm_num_segments)
>>>> MCA coll: parameter "coll_sm_comm_num_segments"
>>>> (current value: "8")
>>>> Number of segments in each communicator's
>>>> shared memory message
>>>> passing area (must be >= 2, and must be
>>>> a multiple
>>>> of
>>>> comm_in_use_flags)
>>>> MCA coll: parameter
>>>> "coll_sm_tree_degree" (current value:
>>>> "4")
>>>> Degree of the tree for tree-based
>>>> operations (must be => 1 and
>>>> <= min(control_size, 255))
>>>> MCA coll: information
>>>> "coll_sm_shared_mem_used_bootstrap" (value: "160")
>>>> Amount of shared memory used in the shared
>>>> memory bootstrap area
>>>> (in bytes)
>>>> MCA coll: parameter
>>>> "coll_sm_info_num_procs" (current
>>>> value: "4")
>>>> Number of processes to use for the
>>>> calculation of
>>>> the
>>>> shared_mem_size MCA information parameter
>>>> (must be => 2)
>>>> MCA coll: information "coll_sm_shared_mem_used_data"
>>>> (value: "548864")
>>>> Amount of shared memory used in the shared
>>>> memory data area for
>>>> info_num_procs processes (in bytes)
>>>> MCA coll: parameter
>>>> "coll_tuned_priority" (current value:
>>>> "30")
>>>> Priority of the tuned coll component
>>>> MCA coll: parameter
>>>> "coll_tuned_pre_allocate_memory_comm_size_limit"
>>>> (current value: "32768")
>>>> Size of communicator were we stop
>>>> pre-allocating memory for the
>>>> fixed internal buffer used for message
>>>> requests etc that is hung
>>>> off the communicator data segment. I.e. if
>>>> you have a 100'000
>>>> nodes you might not want to pre-allocate
>>>> 200'000 request handle
>>>> slots per communicator instance!
>>>> MCA coll: parameter "coll_tuned_init_tree_fanout"
>>>> (current value: "4")
>>>> Inital fanout used in the tree topologies
>>>> for each communicator.
>>>> This is only an initial guess, if a tuned
>>>> collective needs a
>>>> different fanout for an operation, it
>>>> build
>>>> it dynamically. This
>>>> parameter is only for the first guess and
>>>> might save a little
>>>> time
>>>> MCA coll: parameter "coll_tuned_init_chain_fanout"
>>>> (current value: "4")
>>>> Inital fanout used in the chain (fanout
>>>> followed by pipeline)
>>>> topologies for each communicator. This is
>>>> only an initial guess,
>>>> if a tuned collective needs a different
>>>> fanout for an operation,
>>>> it build it dynamically. This parameter is
>>>> only for the first
>>>> guess and might save a little time
>>>> MCA coll: parameter "coll_tuned_use_dynamic_rules"
>>>> (current value: "0")
>>>> Switch used to decide if we use static
>>>> (compiled/if statements)
>>>> or dynamic (built at runtime) decision
>>>> function
>>>> rules
>>>> MCA io: parameter "io_base_freelist_initial_size"
>>>> (current value: "16")
>>>> Initial MPI-2 IO request freelist size
>>>> MCA io: parameter "io_base_freelist_max_size"
>>>> (current value: "64")
>>>> Max size of the MPI-2 IO request freelist
>>>> MCA io: parameter "io_base_freelist_increment"
>>>> (current value: "16")
>>>> Increment size of the MPI-2 IO request
>>>> freelist
>>>> MCA io: parameter "io" (current value: <none>)
>>>> Default selection set of components for
>>>> the
>>>> io framework (<none>
>>>> means "use all components that can be
>>>> found")
>>>> MCA io: parameter "io_base_verbose" (current
>>>> value: "0")
>>>> Verbosity level for the io framework (0 =
>>>> no verbosity)
>>>> MCA io: parameter "io_romio_priority" (current
>>>> value: "10")
>>>> Priority of the io romio component
>>>> MCA io: parameter "io_romio_delete_priority"
>>>> (current value: "10")
>>>> Delete priority of the io romio component
>>>> MCA io: parameter
>>>> "io_romio_enable_parallel_optimizations" (current
>>>> value: "0")
>>>> Enable set of Open MPI-added options to
>>>> improve collective file
>>>> i/o performance
>>>> MCA mpool: parameter "mpool" (current value: <none>)
>>>> Default selection set of components for
>>>> the
>>>> mpool framework
>>>> (<none> means "use all components that
>>>> can be
>>>> found")
>>>> MCA mpool: parameter "mpool_base_verbose" (current
>>>> value: "0")
>>>> Verbosity level for the mpool framework (0
>>>> = no verbosity)
>>>> MCA mpool: parameter "mpool_sm_allocator" (current
>>>> value: "bucket")
>>>> Name of allocator component to use with
>>>> sm mpool
>>>> MCA mpool: parameter "mpool_sm_max_size" (current
>>>> value: "536870912")
>>>> Maximum size of the sm mpool shared
>>>> memory file
>>>> MCA mpool: parameter "mpool_sm_min_size" (current
>>>> value: "134217728")
>>>> Minimum size of the sm mpool shared
>>>> memory file
>>>> MCA mpool: parameter
>>>> "mpool_sm_per_peer_size" (current
>>>> value: "33554432")
>>>> Size (in bytes) to allocate per local peer
>>>> in the sm mpool
>>>> shared memory file, bounded by min_size
>>>> and
>>>> max_size
>>>> MCA mpool: parameter "mpool_sm_priority" (current
>>>> value: "0")
>>>> MCA mpool: parameter
>>>> "mpool_udapl_priority" (current value:
>>>> "0")
>>>> MCA mpool: parameter "mpool_base_use_mem_hooks"
>>>> (current value: "0")
>>>> use memory hooks for deregistering
>>>> freed memory
>>>> MCA mpool: parameter
>>>> "mpool_use_mem_hooks" (current value:
>>>> "0")
>>>> (deprecated, use mpool_base_use_mem_hooks)
>>>> MCA pml: parameter "pml" (current value: <none>)
>>>> Default selection set of components for
>>>> the
>>>> pml framework
>>>> (<none> means "use all components that
>>>> can be
>>>> found")
>>>> MCA pml: parameter "pml_base_verbose" (current
>>>> value: "0")
>>>> Verbosity level for the pml framework (0 =
>>>> no verbosity)
>>>> MCA pml: parameter
>>>> "pml_cm_free_list_num" (current value:
>>>> "4")
>>>> Initial size of request free lists
>>>> MCA pml: parameter "pml_cm_free_list_max" (current
>>>> value: "-1")
>>>> Maximum size of request free lists
>>>> MCA pml: parameter "pml_cm_free_list_inc" (current
>>>> value: "64")
>>>> Number of elements to add when growing
>>>> request free lists
>>>> MCA pml: parameter "pml_cm_priority" (current
>>>> value: "30")
>>>> CM PML selection priority
>>>> MCA pml: parameter "pml_ob1_free_list_num" (current
>>>> value: "4")
>>>> MCA pml: parameter "pml_ob1_free_list_max" (current
>>>> value: "-1")
>>>> MCA pml: parameter "pml_ob1_free_list_inc" (current
>>>> value: "64")
>>>> MCA pml: parameter "pml_ob1_priority" (current
>>>> value: "20")
>>>> MCA pml: parameter "pml_ob1_eager_limit" (current
>>>> value: "131072")
>>>> MCA pml: parameter "pml_ob1_send_pipeline_depth"
>>>> (current value: "3")
>>>> MCA pml: parameter "pml_ob1_recv_pipeline_depth"
>>>> (current value: "4")
>>>> MCA bml: parameter "bml" (current value: <none>)
>>>> Default selection set of components for
>>>> the
>>>> bml framework
>>>> (<none> means "use all components that
>>>> can be
>>>> found")
>>>> MCA bml: parameter "bml_base_verbose" (current
>>>> value: "0")
>>>> Verbosity level for the bml framework (0 =
>>>> no verbosity)
>>>> MCA bml: parameter "bml_r2_show_unreach_errors"
>>>> (current value: "1")
>>>> Show error message when procs are
>>>> unreachable
>>>> MCA bml: parameter "bml_r2_priority" (current
>>>> value: "0")
>>>> MCA rcache: parameter "rcache" (current value: <none>)
>>>> Default selection set of components for
>>>> the
>>>> rcache framework
>>>> (<none> means "use all components that
>>>> can be
>>>> found")
>>>> MCA rcache: parameter
>>>> "rcache_base_verbose" (current value:
>>>> "0")
>>>> Verbosity level for the rcache
>>>> framework (0
>>>> = no verbosity)
>>>> MCA rcache: parameter "rcache_rb_priority" (current
>>>> value: "0")
>>>> MCA rcache: parameter "rcache_vma_mru_len" (current
>>>> value:
>>>> "256")
>>>> The maximum size IN ENTRIES of the MRU
>>>> (most recently used)
>>>> rcache list
>>>> MCA rcache: parameter "rcache_vma_mru_size" (current
>>>> value: "1073741824")
>>>> The maximum size IN BYTES of the MRU (most
>>>> recently used) rcache
>>>> list
>>>> MCA rcache: parameter
>>>> "rcache_vma_priority" (current value:
>>>> "0")
>>>> MCA btl: parameter "btl_base_debug" (current
>>>> value: "0")
>>>> If btl_base_debug is 1 standard debug is
>>>> output, if > 1 verbose
>>>> debug is output
>>>> MCA btl: parameter "btl" (current value: <none>)
>>>> Default selection set of components for the btl framework
>>>> (<none> means "use all components that can be found")
>>>> MCA btl: parameter "btl_base_verbose" (current
>>>> value: "0")
>>>> Verbosity level for the btl framework (0 =
>>>> no verbosity)
>>>> MCA btl: parameter "btl_self_free_list_num" (current value: "0")
>>>> Number of fragments by default
>>>> MCA btl: parameter "btl_self_free_list_max" (current value: "-1")
>>>> Maximum number of fragments
>>>> MCA btl: parameter "btl_self_free_list_inc" (current value: "32")
>>>> Increment by this number of fragments
>>>> MCA btl: parameter "btl_self_eager_limit" (current value: "131072")
>>>> Eager size fragmeng (before the rendez-vous ptotocol)
>>>> MCA btl: parameter "btl_self_min_send_size" (current value: "262144")
>>>> Minimum fragment size after the rendez-vous
>>>> MCA btl: parameter "btl_self_max_send_size" (current value: "262144")
>>>> Maximum fragment size after the rendez-vous
>>>> MCA btl: parameter "btl_self_min_rdma_size" (current value: "2147483647")
>>>> Maximum fragment size for the RDMA transfer
>>>> MCA btl: parameter "btl_self_max_rdma_size" (current value: "2147483647")
>>>> Maximum fragment size for the RDMA transfer
>>>> MCA btl: parameter "btl_self_exclusivity" (current
>>>> value: "65536")
>>>> Device exclusivity
>>>> MCA btl: parameter "btl_self_flags" (current
>>>> value: "10")
>>>> Active behavior flags
>>>> MCA btl: parameter "btl_self_priority" (current
>>>> value: "0")
>>>> MCA btl: parameter "btl_sm_free_list_num" (current value: "8")
>>>> MCA btl: parameter "btl_sm_free_list_max" (current
>>>> value: "-1")
>>>> MCA btl: parameter "btl_sm_free_list_inc" (current
>>>> value: "64")
>>>> MCA btl: parameter "btl_sm_exclusivity" (current
>>>> value: "65535")
>>>> MCA btl: parameter "btl_sm_latency" (current
>>>> value: "100")
>>>> MCA btl: parameter "btl_sm_max_procs" (current
>>>> value: "-1")
>>>> MCA btl: parameter "btl_sm_sm_extra_procs" (current
>>>> value: "2")
>>>> MCA btl: parameter "btl_sm_mpool" (current
>>>> value: "sm")
>>>> MCA btl: parameter "btl_sm_eager_limit" (current
>>>> value: "4096")
>>>> MCA btl: parameter "btl_sm_max_frag_size" (current
>>>> value: "32768")
>>>> MCA btl: parameter "btl_sm_size_of_cb_queue"
>>>> (current value: "128")
>>>> MCA btl: parameter "btl_sm_cb_lazy_free_freq"
>>>> (current value: "120")
>>>> MCA btl: parameter "btl_sm_priority" (current
>>>> value: "0")
>>>> MCA btl: parameter "btl_tcp_if_include" (current
>>>> value: <none>)
>>>> MCA btl: parameter "btl_tcp_if_exclude" (current value: "lo")
>>>> MCA btl: parameter "btl_tcp_free_list_num" (current
>>>> value: "8")
>>>> MCA btl: parameter "btl_tcp_free_list_max" (current
>>>> value: "-1")
>>>> MCA btl: parameter "btl_tcp_free_list_inc" (current
>>>> value: "32")
>>>> MCA btl: parameter "btl_tcp_sndbuf" (current value:
>>>> "131072")
>>>> MCA btl: parameter "btl_tcp_rcvbuf" (current value:
>>>> "131072")
>>>> MCA btl: parameter "btl_tcp_endpoint_cache" (current value: "30720")
>>>> MCA btl: parameter "btl_tcp_exclusivity" (current value: "0")
>>>> MCA btl: parameter "btl_tcp_eager_limit" (current
>>>> value: "65536")
>>>> MCA btl: parameter "btl_tcp_min_send_size" (current
>>>> value: "65536")
>>>> MCA btl: parameter "btl_tcp_max_send_size" (current
>>>> value: "131072")
>>>> MCA btl: parameter "btl_tcp_min_rdma_size" (current
>>>> value: "131072")
>>>> MCA btl: parameter "btl_tcp_max_rdma_size" (current
>>>> value: "2147483647")
>>>> MCA btl: parameter "btl_tcp_flags" (current
>>>> value: "122")
>>>> MCA btl: parameter "btl_tcp_priority" (current
>>>> value: "0")
>>>> MCA btl: parameter "btl_udapl_free_list_num"
>>>> (current value: "8")
>>>> Initial size of free lists (must be >= 1).
>>>> MCA btl: parameter "btl_udapl_free_list_max" (current value: "-1")
>>>> Maximum size of free lists (-1 = infinite, otherwise must be >= 1).
>>>> MCA btl: parameter "btl_udapl_free_list_inc" (current value: "8")
>>>> Increment size of free lists (must be >= 1).
>>>> MCA btl: parameter "btl_udapl_mpool" (current value: "udapl")
>>>> Name of the memory pool to be used.
>>>> Name of the memory pool to be used.
>>>> MCA btl: parameter "btl_udapl_max_modules" (current
>>>> value: "8")
>>>> Maximum number of supported HCAs.
>>>> MCA btl: parameter "btl_udapl_num_recvs" (current value: "8")
>>>> Total number of receive buffers to keep posted per endpoint
>>>> (must be >= 1).
>>>> MCA btl: parameter "btl_udapl_num_sends" (current value: "7")
>>>> Maximum number of sends to post on an endpoint (must be >= 1).
>>>> MCA btl: parameter "btl_udapl_sr_win" (current
>>>> value: "4")
>>>> Window size at which point an explicit
>>>> credit message will be
>>>> generated (must be >= 1).
>>>> MCA btl: parameter "btl_udapl_eager_rdma_num" (current value: "32")
>>>> Number of RDMA buffers to allocate for small messages (must be >= 1).
>>>> MCA btl: parameter "btl_udapl_max_eager_rdma_peers" (current value: "16")
>>>> Maximum number of peers allowed to use RDMA for short messages
>>>> (independently RDMA will still be used for large messages, (must
>>>> be >= 0; if zero then RDMA will not be used for short messages).
>>>> MCA btl: parameter "btl_udapl_eager_rdma_win"
>>>> (current value: "28")
>>>> Window size at which point an explicit
>>>> credit message will be
>>>> generated (must be >= 1).
>>>> MCA btl: parameter "btl_udapl_timeout" (current
>>>> value: "10000000")
>>>> Connection timeout, in microseconds.
>>>> MCA btl: parameter "btl_udapl_conn_priv_data"
>>>> (current value: "1")
>>>> Use connect private data to establish
>>>> connections (not supported
>>>> by all uDAPL implementations).
>>>> MCA btl: parameter "btl_udapl_async_events" (current value: "100000000")
>>>> The asynchronous event queue will only be checked after entering
>>>> progress this number of times.
>>>> MCA btl: parameter "btl_udapl_buffer_alignment" (current value: "256")
>>>> Preferred communication buffer alignment, in bytes (must be >= 1).
>>>> MCA btl: parameter "btl_udapl_async_evd_qlen" (current value: "256")
>>>> The asynchronous event dispatcher queue length.
>>>> MCA btl: parameter "btl_udapl_conn_evd_qlen" (current value: "256")
>>>> The connection event dispatcher queue length is a function of
>>>> the number of connections expected.
>>>> MCA btl: parameter "btl_udapl_dto_evd_qlen" (current value: "256")
>>>> The data transfer operation event dispatcher queue length is a
>>>> function of the number of connections as well as the maximum
>>>> number of outstanding data transfer operations.
>>>> MCA btl: parameter "btl_udapl_max_request_dtos" (current value: "76")
>>>> Maximum number of outstanding submitted sends and rdma operations
>>>> per endpoint, (see Section 6.6.6 of uDAPL Spec.).
>>>> MCA btl: parameter "btl_udapl_max_recv_dtos" (current value: "8")
>>>> Maximum number of outstanding submitted receive operations per
>>>> endpoint, (see Section 6.6.6 of uDAPL Spec.).
>>>> MCA btl: parameter "btl_udapl_exclusivity" (current
>>>> value: "1014")
>>>> uDAPL BTL exclusivity (must be >= 0).
>>>> MCA btl: parameter "btl_udapl_eager_limit" (current
>>>> value: "8192")
>>>> Eager send limit, in bytes (must be >= 1).
>>>> MCA btl: parameter "btl_udapl_min_send_size" (current value: "16384")
>>>> Minimum send size, in bytes (must be >= 1).
>>>> MCA btl: parameter "btl_udapl_max_send_size" (current value: "65536")
>>>> Maximum send size, in bytes (must be >= 1).
>>>> MCA btl: parameter "btl_udapl_min_rdma_size"
>>>> (curr