Open MPI User's Mailing List Archives

Subject: [OMPI users] Bug: coll_tuned_dynamic_rules_filename and duplicate communicators
From: Jumper, John (John.Jumper_at_[hidden])
Date: 2009-07-07 13:04:47


I am attempting to use coll_tuned_dynamic_rules_filename to tune Open
MPI 1.3.2. Based on my testing, the dynamic rules file appears to
influence algorithm selection *only* for MPI_COMM_WORLD. Duplicate
communicators use only the fixed or forced rules, which may perform
much worse than the custom-tuned collectives in the dynamic rules
file. The following code demonstrates the difference between
MPI_COMM_WORLD and a duplicate communicator.

test.c:
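#include <mpi.h>

int main( int argc, char** argv ) {
  float u = 0.0, v = 0.0;
  MPI_Comm world_dup;

  MPI_Init( &argc, &argv );
  MPI_Comm_dup( MPI_COMM_WORLD, &world_dup );

  /* Allreduce on the duplicate: expected to follow allreduce.ompi */
  MPI_Allreduce( &u, &v, 1, MPI_FLOAT, MPI_SUM, world_dup );
  /* The barrier only separates the two allreduce calls in the trace */
  MPI_Barrier( MPI_COMM_WORLD );
  /* Allreduce on MPI_COMM_WORLD: does follow allreduce.ompi */
  MPI_Allreduce( &u, &v, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD );

  MPI_Finalize();
  return 0;
}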

allreduce.ompi:
1
2
1
9
1
0 1 0 0
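The numbers in allreduce.ompi are easier to read next to the trace below, which reports them as "Collectives with rules : 1", "collective ID 2", "communicator count 1", "comm size 9", "message count 1" and "msg_size 0 -> algorithm 1 topo in/out 0 segsize 0". The following is a minimal sketch of that mapping, assuming a file with exactly one collective, one communicator size and one message size. The type msg_rule_t and the function parse_single_rule are hypothetical names for illustration; this is not Open MPI's own parser.

```c
#include <stdio.h>

/* Hypothetical rule record; field names follow the trace line
 * "msg_size 0 -> algorithm 1 topo in/out 0 segsize 0". */
typedef struct {
    int msg_size;       /* message sizes from this value upward */
    int algorithm;      /* 1 = basic linear, per the trace */
    int topo_faninout;  /* topology fan in/out */
    int segsize;        /* segment size, 0 = no segmentation */
} msg_rule_t;

/* Hypothetical parser for the single-rule layout of allreduce.ompi:
 *   number of collectives, collective ID,
 *   number of communicator sizes, communicator size,
 *   number of message sizes, then "msg_size alg topo segsize".
 * Returns 0 on success, -1 on malformed or multi-rule input. */
static int parse_single_rule(FILE *f, int *coll_id, int *comm_size,
                             msg_rule_t *rule)
{
    int n_colls, n_comms, n_msgs;

    if (fscanf(f, "%d", &n_colls) != 1 || n_colls != 1) return -1;
    if (fscanf(f, "%d", coll_id) != 1) return -1;
    if (fscanf(f, "%d", &n_comms) != 1 || n_comms != 1) return -1;
    if (fscanf(f, "%d", comm_size) != 1) return -1;
    if (fscanf(f, "%d", &n_msgs) != 1 || n_msgs != 1) return -1;
    if (fscanf(f, "%d %d %d %d", &rule->msg_size, &rule->algorithm,
               &rule->topo_faninout, &rule->segsize) != 4) return -1;
    return 0;
}
```

Read this way, the file says: for collective 2 (allreduce), on communicators of size 9, use algorithm 1 (basic linear) for all message sizes from 0 upward, with no special topology or segmentation.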

invocation:
orterun -np 9 \
        -mca btl self,sm,openib,tcp \
        -mca coll_tuned_use_dynamic_rules 1 \
        -mca coll_tuned_dynamic_rules_filename allreduce.ompi \
        -mca coll_base_verbose 1000 \
        -- test

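The program is run with coll_base_verbose tracing enabled; the barrier
exists only to separate the two allreduce calls in the trace. The trace
for one process is at the end of this message, and the relevant part is
the algorithm choice for the two allreduce calls. The allreduce.ompi
file specifies that every communicator of size 9 should use the basic
linear allreduce algorithm (algorithm 1). MPI_COMM_WORLD uses
basic_linear, but the world_dup communicator falls back to the fixed
decision function (for this message size, the fixed decision selects
recursive doubling). The trace also shows that the rules file is opened
only when the tuned module is initialized on MPI_COMM_WORLD
("module_init MCW & Dynamic"); module_init for the duplicate reports no
rule, and its allreduce goes from
ompi_coll_tuned_allreduce_intra_dec_dynamic straight to
ompi_coll_tuned_allreduce_intra_dec_fixed.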
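The behaviour visible in the trace can be summarized with a small model. This is a sketch under stated assumptions, not the Open MPI source: all names (comm_model_t, dyn_rule_t, init_module, choose_allreduce) are hypothetical, and the value 0 is used here only to mean "defer to the fixed decision function". The model attaches the dynamic rule only when the module is initialized on MPI_COMM_WORLD, so a duplicate of the same size never sees it.

```c
#include <stddef.h>

/* Hypothetical algorithm IDs for this model only. */
typedef enum { ALG_FIXED_DEFAULT = 0, ALG_BASIC_LINEAR = 1 } allreduce_alg_t;

/* Hypothetical dynamic rule: communicator size -> algorithm. */
typedef struct {
    int comm_size;
    int algorithm;
} dyn_rule_t;

/* Hypothetical per-communicator state. */
typedef struct {
    const char *name;
    int size;
    const dyn_rule_t *rule; /* NULL when no dynamic rule is attached */
} comm_model_t;

/* Models module_init as observed in the trace: the rules are opened and
 * attached only for MPI_COMM_WORLD ("module_init MCW & Dynamic"). */
static void init_module(comm_model_t *comm, int is_world,
                        const dyn_rule_t *rules)
{
    comm->rule = is_world ? rules : NULL;
}

/* Models the dec_dynamic -> dec_fixed path: an attached rule matching the
 * communicator size wins; otherwise the fixed decision is used. */
static int choose_allreduce(const comm_model_t *comm)
{
    if (comm->rule != NULL && comm->rule->comm_size == comm->size) {
        return comm->rule->algorithm;
    }
    return ALG_FIXED_DEFAULT;
}
```

Under this model, both communicators have size 9, yet only MPI_COMM_WORLD gets ALG_BASIC_LINEAR, which matches the two allreduce decisions in the trace.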

Thank you.

John Jumper

Trace of one process for the above program:
mca: base: components_open: opening coll components
mca: base: components_open: found loaded component basic
mca: base: components_open: component basic register function successful
mca: base: components_open: component basic has no open function
mca: base: components_open: found loaded component hierarch
mca: base: components_open: component hierarch has no register function
mca: base: components_open: component hierarch open function successful
mca: base: components_open: found loaded component inter
mca: base: components_open: component inter has no register function
mca: base: components_open: component inter open function successful
mca: base: components_open: found loaded component self
mca: base: components_open: component self has no register function
mca: base: components_open: component self open function successful
mca: base: components_open: found loaded component sm
mca: base: components_open: component sm has no register function
mca: base: components_open: component sm open function successful
mca: base: components_open: found loaded component sync
mca: base: components_open: component sync register function successful
mca: base: components_open: component sync has no open function
mca: base: components_open: found loaded component tuned
mca: base: components_open: component tuned has no register function
coll:tuned:component_open: done!
mca: base: components_open: component tuned open function successful
coll:find_available: querying coll component basic
coll:find_available: coll component basic is available
coll:find_available: querying coll component hierarch
coll:find_available: coll component hierarch is available
coll:find_available: querying coll component inter
coll:find_available: coll component inter is available
coll:find_available: querying coll component self
coll:find_available: coll component self is available
coll:find_available: querying coll component sm
coll:find_available: coll component sm is available
coll:find_available: querying coll component sync
coll:find_available: coll component sync is available
coll:find_available: querying coll component tuned
coll:find_available: coll component tuned is available
coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
coll:base:comm_select: Checking all available modules
coll:base:comm_select: component available: basic, priority: 10
coll:base:comm_select: component not available: hierarch
coll:base:comm_select: component not available: inter
coll:base:comm_select: component not available: self
coll:base:comm_select: component not available: sm
coll:base:comm_select: component not available: sync
coll:tuned:module_tuned query called
coll:tuned:module_query using intra_dynamic
coll:base:comm_select: component available: tuned, priority: 30
coll:tuned:module_init called.
coll:tuned:module_init MCW & Dynamic
coll:tuned:module_init Opening [allreduce.ompi]
Reading dynamic rule for collective ID 2
Read communicator count 1 for dynamic rule for collective ID 2
Read message count 1 for dynamic rule for collective ID 2 and comm size 9
Done reading dynamic rule for collective ID 2

Collectives with rules : 1
Communicator sizes with rules : 1
Message sizes with rules : 1
Lines in configuration file read : 0
coll:tuned:module_init Read 1 valid rules
Selected the following com rule id 0
alg_id 2 com_id 0 com_size 9
number of message sizes 1
alg_id 2 com_id 0 com_size 9 msg_id 0
msg_size 0 -> algorithm 1 topo in/out 0 segsize 0
max_requests 0
coll:tuned:topo_build_tree Building fo 4 rt 0
coll:tuned:topo_build_tree Building fo 2 rt 0
coll:tuned:topo:build_bmtree rt 0
coll:tuned:topo:build_in_order_bmtree rt 0
coll:tuned:topo:build_chain fo 4 rt 0
coll:tuned:topo:build_chain fo 1 rt 0
coll:tuned:topo_build_in_order_tree Building fo 2 rt 8
coll:tuned:module_init Tuned is in use
coll:base:comm_select: new communicator: MPI_COMM_SELF (cid 1)
coll:base:comm_select: Checking all available modules
coll:base:comm_select: component available: basic, priority: 10
coll:base:comm_select: component not available: hierarch
coll:base:comm_select: component not available: inter
coll:base:comm_select: component available: self, priority: 75
coll:base:comm_select: component not available: sm
coll:base:comm_select: component not available: sync
coll:tuned:module_tuned query called
coll:base:comm_select: component not available: tuned
coll:base:comm_select: new communicator: MPI COMMUNICATOR 4 DUP FROM 0 (cid 4)
coll:base:comm_select: Checking all available modules
coll:base:comm_select: component available: basic, priority: 10
coll:base:comm_select: component not available: hierarch
coll:base:comm_select: component not available: inter
coll:base:comm_select: component not available: self
coll:base:comm_select: component not available: sm
coll:base:comm_select: component not available: sync
coll:tuned:module_tuned query called
coll:tuned:module_query using intra_dynamic
coll:base:comm_select: component available: tuned, priority: 30
coll:tuned:module_init called.
coll:tuned:topo_build_tree Building fo 4 rt 0
coll:tuned:topo_build_tree Building fo 2 rt 0
coll:tuned:topo:build_bmtree rt 0
coll:tuned:topo:build_in_order_bmtree rt 0
coll:tuned:topo:build_chain fo 4 rt 0
coll:tuned:topo:build_chain fo 1 rt 0
coll:tuned:topo_build_in_order_tree Building fo 2 rt 8
coll:tuned:module_init Tuned is in use
ompi_coll_tuned_allreduce_intra_dec_dynamic
ompi_coll_tuned_allreduce_intra_dec_fixed
coll:tuned:allreduce_intra_recursivedoubling rank 8
ompi_coll_tuned_barrier_intra_dec_dynamic
ompi_coll_tuned_barrier_intra_dec_fixed com_size 9
ompi_coll_tuned_barrier_intra_bruck rank 8
ompi_coll_tuned_allreduce_intra_dec_dynamic
Selected the following msg rule id 0
alg_id 2 com_id 0 com_size 9 msg_id 0
msg_size 0 -> algorithm 1 topo in/out 0 segsize 0
max_requests 0
coll:tuned:allreduce_intra_do_this algorithm 1 topo fan in/out 0 segsize 0
coll:tuned:allreduce_intra_basic_linear rank 8
coll:tuned:reduce_intra_basic_linear rank 8
ompi_coll_tuned_bcast_intra_basic_linear rank 8 root 0
mca: base: close: unloading component basic
mca: base: close: unloading component hierarch
mca: base: close: unloading component inter
mca: base: close: unloading component self
mca: base: close: component sm closed
mca: base: close: unloading component sm
mca: base: close: unloading component sync
coll:tuned:component_close: called
coll:tuned:component_close: done!
mca: base: close: component tuned closed
mca: base: close: unloading component tuned