Hi,
Graham E Fagg wrote:
> I am not sure which alltoall you're using in 1.1, so can you please run
> the ompi_info utility which is normally built and put into the same
> directory as mpirun?
>
> i.e. host% ompi_info
>
> This provides lots of really useful info on everything before we dig
> deeper into your issue.
>
>
> and then more specifically run
> host% ompi_info --param coll all
Please find attached ~/notes, produced by:
$ ( ompi_info; echo '====================='; ompi_info --param coll all ) >~/notes
Thanks in advance and kind regards,
--
Frank Gruellich
HPC-Techniker
Tel.: +49 3722 528 42
Fax: +49 3722 528 15
E-Mail: frank.gruellich_at_[hidden]
MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/
Open MPI: 1.1b1
Open MPI SVN revision: r10217
Open RTE: 1.1b1
Open RTE SVN revision: r10217
OPAL: 1.1b1
OPAL SVN revision: r10217
Prefix: /usr/ofed/mpi/intel/openmpi-1.1b1-1
Configured architecture: x86_64-suse-linux-gnu
Configured by: root
Configured on: Wed Jul 19 20:51:46 CEST 2006
Configure host: frontend
Built by: root
Built on: Wed Jul 19 21:04:47 CEST 2006
Built host: frontend
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: icc
C compiler absolute: /software/intel/cce/9.1.038/bin/icc
C++ compiler: icpc
C++ compiler absolute: /software/intel/cce/9.1.038/bin/icpc
Fortran77 compiler: ifort
Fortran77 compiler abs: /software/intel/fce/9.1.032/bin/ifort
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/bin/gfortran
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: no
Thread support: posix (mpi: no, progress: no)
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1)
MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1)
MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
MCA mpool: openib (MCA v1.0, API v1.0, Component v1.1)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
MCA pml: dr (MCA v1.0, API v1.0, Component v1.1)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
MCA btl: openib (MCA v1.0, API v1.0, Component v1.1)
MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1)
MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)
MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)
MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)
MCA pls: fork (MCA v1.0, API v1.0, Component v1.1)
MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1)
MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1)
MCA sds: env (MCA v1.0, API v1.0, Component v1.1)
MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1)
MCA sds: seed (MCA v1.0, API v1.0, Component v1.1)
MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1)
MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1)
=====================
MCA coll: parameter "coll" (current value: <none>)
Default selection set of components for the coll framework (<none> means "use all components that can be found")
MCA coll: parameter "coll_base_verbose" (current value: "0")
Verbosity level for the coll framework (0 = no verbosity)
MCA coll: parameter "coll_basic_priority" (current value: "10")
Priority of the basic coll component
MCA coll: parameter "coll_basic_crossover" (current value: "4")
Minimum number of processes in a communicator before using the logarithmic algorithms
MCA coll: parameter "coll_hierarch_priority" (current value: "0")
Priority of the hierarchical coll component
MCA coll: parameter "coll_hierarch_verbose" (current value: "0")
Turn verbose message of the hierarchical coll component on/off
MCA coll: parameter "coll_hierarch_use_rdma" (current value: "0")
Switch from the send btl list used to detect hierarchies to the rdma btl list
MCA coll: parameter "coll_hierarch_ignore_sm" (current value: "0")
Ignore sm protocol when detecting hierarchies. Required to enable the usage of protocol specific collective operations
MCA coll: parameter "coll_hierarch_symmetric" (current value: "0")
Assume symmetric configuration
MCA coll: parameter "coll_self_priority" (current value: "75")
MCA coll: parameter "coll_sm_priority" (current value: "0")
Priority of the sm coll component
MCA coll: parameter "coll_sm_control_size" (current value: "4096")
Length of the control data -- should usually be either the length of a cache line on most SMPs, or the size of a page on machines that support direct memory affinity page placement (in bytes)
MCA coll: parameter "coll_sm_bootstrap_filename" (current value: "coll-sm-bootstrap")
Filename (in the Open MPI session directory) of the coll sm component bootstrap rendezvous mmap file
MCA coll: parameter "coll_sm_bootstrap_num_segments" (current value: "8")
Number of segments in the bootstrap file
MCA coll: parameter "coll_sm_fragment_size" (current value: "8192")
Fragment size (in bytes) used for passing data through shared memory (will be rounded up to the nearest control_size size)
MCA coll: parameter "coll_sm_mpool" (current value: "sm")
Name of the mpool component to use
MCA coll: parameter "coll_sm_comm_in_use_flags" (current value: "2")
Number of "in use" flags, used to mark a message passing area segment as currently being used or not (must be >= 2 and <= comm_num_segments)
MCA coll: parameter "coll_sm_comm_num_segments" (current value: "8")
Number of segments in each communicator's shared memory message passing area (must be >= 2, and must be a multiple of comm_in_use_flags)
MCA coll: parameter "coll_sm_tree_degree" (current value: "4")
Degree of the tree for tree-based operations (must be => 1 and <= min(control_size, 255))
MCA coll: information "coll_sm_shared_mem_used_bootstrap" (value: "216")
Amount of shared memory used in the shared memory bootstrap area (in bytes)
MCA coll: parameter "coll_sm_info_num_procs" (current value: "4")
Number of processes to use for the calculation of the shared_mem_size MCA information parameter (must be => 2)
MCA coll: information "coll_sm_shared_mem_used_data" (value: "548864")
Amount of shared memory used in the shared memory data area for info_num_procs processes (in bytes)
MCA coll: parameter "coll_tuned_priority" (current value: "30")
Priority of the tuned coll component
MCA coll: parameter "coll_tuned_pre_allocate_memory_comm_size_limit" (current value: "32768")
Size of communicator were we stop pre-allocating memory for the fixed internal buffer used for message requests etc that is hung off the communicator data segment. I.e. if you have a 100'000 nodes you might not want to pre-allocate 200'000 request handle slots per communicator instance!
MCA coll: parameter "coll_tuned_use_dynamic_rules" (current value: "0")
Switch used to decide if we use static (if statements) or dynamic (built at runtime) decision function rules
MCA coll: parameter "coll_tuned_init_tree_fanout" (current value: "4")
Inital fanout used in the tree topologies for each communicator. This is only an initial guess, if a tuned collective needs a different fanout for an operation, it build it dynamically. This parameter is only for the first guess and might save a little time
MCA coll: parameter "coll_tuned_init_chain_fanout" (current value: "4")
Inital fanout used in the chain (fanout followed by pipeline) topologies for each communicator. This is only an initial guess, if a tuned collective needs a different fanout for an operation, it build it dynamically. This parameter is only for the first guess and might save a little time
MCA coll: parameter "coll_tuned_allreduce_algorithm" (current value: "0")
Which allreduce algorithm is used. Can be locked down to choice of: 0 ignore, 1 basic linear, 2 nonoverlapping (tuned reduce + tuned bcast)
MCA coll: parameter "coll_tuned_allreduce_algorithm_segmentsize" (current value: "0")
Segment size in bytes used by default for allreduce algorithms. Only has meaning if algorithm is forced and supports segmenting. 0 bytes means no segmentation.
MCA coll: parameter "coll_tuned_allreduce_algorithm_tree_fanout" (current value: "4")
Fanout for n-tree used for allreduce algorithms. Only has meaning if algorithm is forced and supports n-tree topo based operation.
MCA coll: parameter "coll_tuned_allreduce_algorithm_chain_fanout" (current value: "4")
Fanout for chains used for allreduce algorithms. Only has meaning if algorithm is forced and supports chain topo based operation.
MCA coll: parameter "coll_tuned_alltoall_algorithm" (current value: "0")
Which alltoall algorithm is used. Can be locked down to choice of: 0 ignore, 1 basic linear, 2 pairwise, 3: modified bruck, 4: two proc only.
MCA coll: parameter "coll_tuned_alltoall_algorithm_segmentsize" (current value: "0")
Segment size in bytes used by default for alltoall algorithms. Only has meaning if algorithm is forced and supports segmenting. 0 bytes means no segmentation.
MCA coll: parameter "coll_tuned_alltoall_algorithm_tree_fanout" (current value: "4")
Fanout for n-tree used for alltoall algorithms. Only has meaning if algorithm is forced and supports n-tree topo based operation.
MCA coll: parameter "coll_tuned_alltoall_algorithm_chain_fanout" (current value: "4")
Fanout for chains used for alltoall algorithms. Only has meaning if algorithm is forced and supports chain topo based operation.
MCA coll: parameter "coll_tuned_barrier_algorithm" (current value: "0")
Which barrier algorithm is used. Can be locked down to choice of: 0 ignore, 1 linear, 2 double ring, 3: recursive doubling 4: bruck, 5: two proc only, 6: step based bmtree
MCA coll: parameter "coll_tuned_bcast_algorithm" (current value: "0")
Which bcast algorithm is used. Can be locked down to choice of: 0 ignore, 1 basic linear, 2 chain, 3: pipeline, 4: split binary tree, 5: binary tree, 6: BM tree.
MCA coll: parameter "coll_tuned_bcast_algorithm_segmentsize" (current value: "0")
Segment size in bytes used by default for bcast algorithms. Only has meaning if algorithm is forced and supports segmenting. 0 bytes means no segmentation.
MCA coll: parameter "coll_tuned_bcast_algorithm_tree_fanout" (current value: "4")
Fanout for n-tree used for bcast algorithms. Only has meaning if algorithm is forced and supports n-tree topo based operation.
MCA coll: parameter "coll_tuned_bcast_algorithm_chain_fanout" (current value: "4")
Fanout for chains used for bcast algorithms. Only has meaning if algorithm is forced and supports chain topo based operation.
MCA coll: parameter "coll_tuned_reduce_algorithm" (current value: "0")
Which reduce algorithm is used. Can be locked down to choice of: 0 ignore, 1 linear, 2 chain, 3 pipeline
MCA coll: parameter "coll_tuned_reduce_algorithm_segmentsize" (current value: "0")
Segment size in bytes used by default for reduce algorithms. Only has meaning if algorithm is forced and supports segmenting. 0 bytes means no segmentation.
MCA coll: parameter "coll_tuned_reduce_algorithm_tree_fanout" (current value: "4")
Fanout for n-tree used for reduce algorithms. Only has meaning if algorithm is forced and supports n-tree topo based operation.
MCA coll: parameter "coll_tuned_reduce_algorithm_chain_fanout" (current value: "4")
Fanout for chains used for reduce algorithms. Only has meaning if algorithm is forced and supports chain topo based operation.
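
For anyone reading a dump like the one above, the following is a minimal sketch of how the `ompi_info --param coll all` section could be turned into name/value pairs, so that settings such as `coll_tuned_alltoall_algorithm` are easy to look up. It assumes the header-line layout shown in this post (an `MCA <framework>: parameter|information "<name>" (... value: ...)` line followed by optional description lines); other Open MPI versions may print this differently. The names `HEADER`, `parse_mca_params`, and `SAMPLE` are made up for this illustration and are not part of Open MPI.

```python
import re

# Header lines in the dump look like (layout assumed from the output above):
#   MCA coll: parameter "coll_tuned_priority" (current value: "30")
#   MCA coll: information "coll_sm_shared_mem_used_data" (value: "548864")
#   MCA coll: parameter "coll" (current value: <none>)
HEADER = re.compile(
    r'^\s*MCA (\w+): (parameter|information) "([^"]+)" '
    r'\((?:current )?value: (?:"([^"]*)"|<none>)\)\s*$'
)


def parse_mca_params(text):
    """Return {name: (value, description)}; value is None for <none>."""
    params = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            # Start of a new parameter entry.
            current = match.group(3)
            params[current] = (match.group(4), "")
        elif current is not None and line.strip():
            # Any other non-empty line is treated as description text
            # belonging to the most recent parameter.
            value, desc = params[current]
            params[current] = (value, (desc + " " + line.strip()).strip())
    return params


# A few lines copied from the dump above, used as hypothetical test input.
SAMPLE = """\
MCA coll: parameter "coll" (current value: <none>)
Default selection set of components for the coll framework (<none> means "use all components that can be found")
MCA coll: parameter "coll_self_priority" (current value: "75")
MCA coll: parameter "coll_tuned_use_dynamic_rules" (current value: "0")
Switch used to decide if we use static (if statements) or dynamic (built at runtime) decision function rules
MCA coll: parameter "coll_tuned_alltoall_algorithm" (current value: "0")
Which alltoall algorithm is used. Can be locked down to choice of: 0 ignore, 1 basic linear, 2 pairwise, 3: modified bruck, 4: two proc only.
"""

if __name__ == "__main__":
    params = parse_mca_params(SAMPLE)
    for name in ("coll_tuned_use_dynamic_rules", "coll_tuned_alltoall_algorithm"):
        print(name, "=", params[name][0])
```

In practice the text would come from the saved file (for example `open(os.path.expanduser("~/notes")).read()`) rather than from `SAMPLE`. Lines before the first matching header, such as the plain `ompi_info` component list and the `=====` separator, are skipped because no parameter entry has been opened yet.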