Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Bug: coll_tuned_dynamic_rules_filename and duplicate communicators
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-07-11 08:29:35


Thanks for the bug report!

I've filed https://svn.open-mpi.org/trac/ompi/ticket/1974 about this.
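
For anyone reading along, the allreduce.ompi file quoted below can be read field by field against the trace: one collective with rules, collective ID 2, one communicator size (9), one message-size rule, and then a line giving msg_size 0, algorithm 1, topo 0, segsize 0. Here is a minimal C sketch of that reading. It is not the parser used by the tuned component; the names tuned_rule and parse_single_rule are made up for illustration, and it only handles the single-rule layout shown in this thread.

```c
#include <stdio.h>

/* One parsed rule. Field names follow the labels printed in the trace;
   the struct itself is hypothetical and not an Open MPI type. */
typedef struct {
    int  coll_id;    /* collective ID (2 in the quoted file) */
    int  comm_size;  /* communicator size the rule applies to */
    long msg_size;   /* message size for the rule */
    int  algorithm;  /* algorithm number selected by the rule */
    int  topo;       /* "topo in/out" value in the trace */
    long segsize;    /* segment size */
} tuned_rule;

/* Parse the simplest layout only: 1 collective, 1 communicator size,
   1 message-size rule. Returns 0 on success, -1 if the text does not
   match that layout. */
int parse_single_rule(const char *text, tuned_rule *r)
{
    int n_coll, n_comm, n_msg;

    /* Whitespace in the format matches spaces and newlines alike, so the
       one-value-per-line file and the final four-value line both parse. */
    int got = sscanf(text, "%d %d %d %d %d %ld %d %d %ld",
                     &n_coll, &r->coll_id,
                     &n_comm, &r->comm_size,
                     &n_msg, &r->msg_size,
                     &r->algorithm, &r->topo, &r->segsize);

    if (got != 9 || n_coll != 1 || n_comm != 1 || n_msg != 1)
        return -1;
    return 0;
}
```

Fed the six lines of allreduce.ompi, parse_single_rule should return 0 and fill in coll_id 2, comm_size 9, and algorithm 1, which lines up with the "alg_id 2 com_id 0 com_size 9" and "msg_size 0 -> algorithm 1" lines in the trace.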

On Jul 7, 2009, at 1:04 PM, Jumper, John wrote:

> I am attempting to use coll_tuned_dynamic_rules_filename to tune Open
> MPI 1.3.2. Based on my testing, it appears that the dynamic rules file
> *only* influences the algorithm selection for MPI_COMM_WORLD. Any
> duplicate communicators will only use fixed or forced rules, which may
> have much worse performance than the custom-tuned collectives in the
> dynamic rules file. The following code demonstrates the difference
> between MPI_COMM_WORLD and a duplicate communicator.
>
> test.c:
> #include <mpi.h>
>
> int main( int argc, char** argv ) {
>     float u = 0.0, v = 0.0;
>     MPI_Comm world_dup;
>
>     MPI_Init( &argc, &argv );
>     MPI_Comm_dup( MPI_COMM_WORLD, &world_dup );
>
>     MPI_Allreduce( &u, &v, 1, MPI_FLOAT, MPI_SUM, world_dup );
>     MPI_Barrier( MPI_COMM_WORLD );
>     MPI_Allreduce( &u, &v, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD );
>
>     MPI_Finalize();
>     return 0;
> }
>
> allreduce.ompi:
> 1
> 2
> 1
> 9
> 1
> 0 1 0 0
>
> invocation:
> orterun -np 9 \
> -mca btl self,sm,openib,tcp \
> -mca coll_tuned_use_dynamic_rules 1 \
> -mca coll_tuned_dynamic_rules_filename allreduce.ompi \
> -mca coll_base_verbose 1000 \
> -- test
>
> This program is run with tracing, and the barrier is only used to
> separate the allreduce calls in the trace. The trace for one node is at
> the end of the message, and the relevant section is the choice of
> algorithms for the two allreduce calls. The allreduce.ompi file
> indicates that all size 9 communicators should use the basic linear
> allreduce algorithm. MPI_COMM_WORLD uses basic_linear, but the
> world_dup communicator uses the fixed algorithm (for this message size,
> the fixed algorithm is recursive doubling).
>
> Thank you.
>
> John Jumper
>
>
>
> Trace of one process for the above program:
> mca: base: components_open: opening coll components
> mca: base: components_open: found loaded component basic
> mca: base: components_open: component basic register function successful
> mca: base: components_open: component basic has no open function
> mca: base: components_open: found loaded component hierarch
> mca: base: components_open: component hierarch has no register function
> mca: base: components_open: component hierarch open function successful
> mca: base: components_open: found loaded component inter
> mca: base: components_open: component inter has no register function
> mca: base: components_open: component inter open function successful
> mca: base: components_open: found loaded component self
> mca: base: components_open: component self has no register function
> mca: base: components_open: component self open function successful
> mca: base: components_open: found loaded component sm
> mca: base: components_open: component sm has no register function
> mca: base: components_open: component sm open function successful
> mca: base: components_open: found loaded component sync
> mca: base: components_open: component sync register function successful
> mca: base: components_open: component sync has no open function
> mca: base: components_open: found loaded component tuned
> mca: base: components_open: component tuned has no register function
> coll:tuned:component_open: done!
> mca: base: components_open: component tuned open function successful
> coll:find_available: querying coll component basic
> coll:find_available: coll component basic is available
> coll:find_available: querying coll component hierarch
> coll:find_available: coll component hierarch is available
> coll:find_available: querying coll component inter
> coll:find_available: coll component inter is available
> coll:find_available: querying coll component self
> coll:find_available: coll component self is available
> coll:find_available: querying coll component sm
> coll:find_available: coll component sm is available
> coll:find_available: querying coll component sync
> coll:find_available: coll component sync is available
> coll:find_available: querying coll component tuned
> coll:find_available: coll component tuned is available
> coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
> coll:base:comm_select: Checking all available modules
> coll:base:comm_select: component available: basic, priority: 10
> coll:base:comm_select: component not available: hierarch
> coll:base:comm_select: component not available: inter
> coll:base:comm_select: component not available: self
> coll:base:comm_select: component not available: sm
> coll:base:comm_select: component not available: sync
> coll:tuned:module_tuned query called
> coll:tuned:module_query using intra_dynamic
> coll:base:comm_select: component available: tuned, priority: 30
> coll:tuned:module_init called.
> coll:tuned:module_init MCW & Dynamic
> coll:tuned:module_init Opening [allreduce.ompi]
> Reading dynamic rule for collective ID 2
> Read communicator count 1 for dynamic rule for collective ID 2
> Read message count 1 for dynamic rule for collective ID 2 and comm size 9
> Done reading dynamic rule for collective ID 2
>
> Collectives with rules : 1
> Communicator sizes with rules : 1
> Message sizes with rules : 1
> Lines in configuration file read : 0
> coll:tuned:module_init Read 1 valid rules
> Selected the following com rule id 0
> alg_id 2 com_id 0 com_size 9
> number of message sizes 1
> alg_id 2 com_id 0 com_size 9 msg_id 0
> msg_size 0 -> algorithm 1 topo in/out 0 segsize 0
> max_requests 0
> coll:tuned:topo_build_tree Building fo 4 rt 0
> coll:tuned:topo_build_tree Building fo 2 rt 0
> coll:tuned:topo:build_bmtree rt 0
> coll:tuned:topo:build_in_order_bmtree rt 0
> coll:tuned:topo:build_chain fo 4 rt 0
> coll:tuned:topo:build_chain fo 1 rt 0
> coll:tuned:topo_build_in_order_tree Building fo 2 rt 8
> coll:tuned:module_init Tuned is in use
> coll:base:comm_select: new communicator: MPI_COMM_SELF (cid 1)
> coll:base:comm_select: Checking all available modules
> coll:base:comm_select: component available: basic, priority: 10
> coll:base:comm_select: component not available: hierarch
> coll:base:comm_select: component not available: inter
> coll:base:comm_select: component available: self, priority: 75
> coll:base:comm_select: component not available: sm
> coll:base:comm_select: component not available: sync
> coll:tuned:module_tuned query called
> coll:base:comm_select: component not available: tuned
> coll:base:comm_select: new communicator: MPI COMMUNICATOR 4 DUP FROM 0 (cid 4)
> coll:base:comm_select: Checking all available modules
> coll:base:comm_select: component available: basic, priority: 10
> coll:base:comm_select: component not available: hierarch
> coll:base:comm_select: component not available: inter
> coll:base:comm_select: component not available: self
> coll:base:comm_select: component not available: sm
> coll:base:comm_select: component not available: sync
> coll:tuned:module_tuned query called
> coll:tuned:module_query using intra_dynamic
> coll:base:comm_select: component available: tuned, priority: 30
> coll:tuned:module_init called.
> coll:tuned:topo_build_tree Building fo 4 rt 0
> coll:tuned:topo_build_tree Building fo 2 rt 0
> coll:tuned:topo:build_bmtree rt 0
> coll:tuned:topo:build_in_order_bmtree rt 0
> coll:tuned:topo:build_chain fo 4 rt 0
> coll:tuned:topo:build_chain fo 1 rt 0
> coll:tuned:topo_build_in_order_tree Building fo 2 rt 8
> coll:tuned:module_init Tuned is in use
> ompi_coll_tuned_allreduce_intra_dec_dynamic
> ompi_coll_tuned_allreduce_intra_dec_fixed
> coll:tuned:allreduce_intra_recursivedoubling rank 8
> ompi_coll_tuned_barrier_intra_dec_dynamic
> ompi_coll_tuned_barrier_intra_dec_fixed com_size 9
> ompi_coll_tuned_barrier_intra_bruck rank 8
> ompi_coll_tuned_allreduce_intra_dec_dynamic
> Selected the following msg rule id 0
> alg_id 2 com_id 0 com_size 9 msg_id 0
> msg_size 0 -> algorithm 1 topo in/out 0 segsize 0
> max_requests 0
> coll:tuned:allreduce_intra_do_this algorithm 1 topo fan in/out 0 segsize 0
> coll:tuned:allreduce_intra_basic_linear rank 8
> coll:tuned:reduce_intra_basic_linear rank 8
> ompi_coll_tuned_bcast_intra_basic_linear rank 8 root 0
> mca: base: close: unloading component basic
> mca: base: close: unloading component hierarch
> mca: base: close: unloading component inter
> mca: base: close: unloading component self
> mca: base: close: component sm closed
> mca: base: close: unloading component sm
> mca: base: close: unloading component sync
> coll:tuned:component_close: called
> coll:tuned:component_close: done!
> mca: base: close: component tuned closed
> mca: base: close: unloading component tuned

-- 
Jeff Squyres
Cisco Systems