Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Bug: coll_tuned_dynamic_rules_filename and duplicate communicators
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-08-14 17:07:53


John,

Thanks for your bug report. This issue has been fixed in commit
r21825. I hope to be able to push it into our next release.

   Thanks,
     george.

On Jul 7, 2009, at 13:04 , Jumper, John wrote:

> I am attempting to use coll_tuned_dynamic_rules_filename to tune Open
> MPI 1.3.2. Based on my testing, it appears that the dynamic rules file
> *only* influences the algorithm selection for MPI_COMM_WORLD. Any
> duplicate communicators will only use fixed or forced rules, which may
> have much worse performance than the custom-tuned collectives in the
> dynamic rules file. The following code demonstrates the difference
> between MPI_COMM_WORLD and a duplicate communicator.
>
> test.c:
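> #include <mpi.h>
>
> int main( int argc, char** argv ) {
>     float u = 0.0, v = 0.0;
>     MPI_Comm world_dup;
>
>     MPI_Init( &argc, &argv );
>     MPI_Comm_dup( MPI_COMM_WORLD, &world_dup );
>
>     /* Allreduce on the duplicate communicator. */
>     MPI_Allreduce( &u, &v, 1, MPI_FLOAT, MPI_SUM, world_dup );
>     /* The barrier only separates the two allreduce calls in the trace. */
>     MPI_Barrier( MPI_COMM_WORLD );
>     /* Same allreduce on MPI_COMM_WORLD. */
>     MPI_Allreduce( &u, &v, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD );
>
>     MPI_Comm_free( &world_dup );
>     MPI_Finalize();
>     return 0;
> }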
>
> allreduce.ompi:
> 1
> 2
> 1
> 9
> 1
> 0 1 0 0
>
> invocation:
> orterun -np 9 \
> -mca btl self,sm,openib,tcp \
> -mca coll_tuned_use_dynamic_rules 1 \
> -mca coll_tuned_dynamic_rules_filename allreduce.ompi \
> -mca coll_base_verbose 1000 \
> -- test
>
> This program is run with tracing, and the barrier is only used to
> separate the allreduce calls in the trace. The trace for one node is at
> the end of the message, and the relevant section is the choice of
> algorithms for the two allreduce calls. The allreduce.ompi file
> indicates that all size 9 communicators should use the basic linear
> allreduce algorithm. MPI_COMM_WORLD uses basic_linear, but the
> world_dup communicator uses the fixed algorithm (for this message size,
> the fixed algorithm is recursive doubling).
>
> Thank you.
>
> John Jumper
>
>
>
> Trace of one process for the above program:
> mca: base: components_open: opening coll components
> mca: base: components_open: found loaded component basic
> mca: base: components_open: component basic register function successful
> mca: base: components_open: component basic has no open function
> mca: base: components_open: found loaded component hierarch
> mca: base: components_open: component hierarch has no register function
> mca: base: components_open: component hierarch open function successful
> mca: base: components_open: found loaded component inter
> mca: base: components_open: component inter has no register function
> mca: base: components_open: component inter open function successful
> mca: base: components_open: found loaded component self
> mca: base: components_open: component self has no register function
> mca: base: components_open: component self open function successful
> mca: base: components_open: found loaded component sm
> mca: base: components_open: component sm has no register function
> mca: base: components_open: component sm open function successful
> mca: base: components_open: found loaded component sync
> mca: base: components_open: component sync register function successful
> mca: base: components_open: component sync has no open function
> mca: base: components_open: found loaded component tuned
> mca: base: components_open: component tuned has no register function
> coll:tuned:component_open: done!
> mca: base: components_open: component tuned open function successful
> coll:find_available: querying coll component basic
> coll:find_available: coll component basic is available
> coll:find_available: querying coll component hierarch
> coll:find_available: coll component hierarch is available
> coll:find_available: querying coll component inter
> coll:find_available: coll component inter is available
> coll:find_available: querying coll component self
> coll:find_available: coll component self is available
> coll:find_available: querying coll component sm
> coll:find_available: coll component sm is available
> coll:find_available: querying coll component sync
> coll:find_available: coll component sync is available
> coll:find_available: querying coll component tuned
> coll:find_available: coll component tuned is available
> coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
> coll:base:comm_select: Checking all available modules
> coll:base:comm_select: component available: basic, priority: 10
> coll:base:comm_select: component not available: hierarch
> coll:base:comm_select: component not available: inter
> coll:base:comm_select: component not available: self
> coll:base:comm_select: component not available: sm
> coll:base:comm_select: component not available: sync
> coll:tuned:module_tuned query called
> coll:tuned:module_query using intra_dynamic
> coll:base:comm_select: component available: tuned, priority: 30
> coll:tuned:module_init called.
> coll:tuned:module_init MCW & Dynamic
> coll:tuned:module_init Opening [allreduce.ompi]
> Reading dynamic rule for collective ID 2
> Read communicator count 1 for dynamic rule for collective ID 2
> Read message count 1 for dynamic rule for collective ID 2 and comm size 9
> Done reading dynamic rule for collective ID 2
>
> Collectives with rules : 1
> Communicator sizes with rules : 1
> Message sizes with rules : 1
> Lines in configuration file read : 0
> coll:tuned:module_init Read 1 valid rules
> Selected the following com rule id 0
> alg_id 2 com_id 0 com_size 9
> number of message sizes 1
> alg_id 2 com_id 0 com_size 9 msg_id 0
> msg_size 0 -> algorithm 1 topo in/out 0 segsize 0
> max_requests 0
> coll:tuned:topo_build_tree Building fo 4 rt 0
> coll:tuned:topo_build_tree Building fo 2 rt 0
> coll:tuned:topo:build_bmtree rt 0
> coll:tuned:topo:build_in_order_bmtree rt 0
> coll:tuned:topo:build_chain fo 4 rt 0
> coll:tuned:topo:build_chain fo 1 rt 0
> coll:tuned:topo_build_in_order_tree Building fo 2 rt 8
> coll:tuned:module_init Tuned is in use
> coll:base:comm_select: new communicator: MPI_COMM_SELF (cid 1)
> coll:base:comm_select: Checking all available modules
> coll:base:comm_select: component available: basic, priority: 10
> coll:base:comm_select: component not available: hierarch
> coll:base:comm_select: component not available: inter
> coll:base:comm_select: component available: self, priority: 75
> coll:base:comm_select: component not available: sm
> coll:base:comm_select: component not available: sync
> coll:tuned:module_tuned query called
> coll:base:comm_select: component not available: tuned
> coll:base:comm_select: new communicator: MPI COMMUNICATOR 4 DUP FROM 0 (cid 4)
> coll:base:comm_select: Checking all available modules
> coll:base:comm_select: component available: basic, priority: 10
> coll:base:comm_select: component not available: hierarch
> coll:base:comm_select: component not available: inter
> coll:base:comm_select: component not available: self
> coll:base:comm_select: component not available: sm
> coll:base:comm_select: component not available: sync
> coll:tuned:module_tuned query called
> coll:tuned:module_query using intra_dynamic
> coll:base:comm_select: component available: tuned, priority: 30
> coll:tuned:module_init called.
> coll:tuned:topo_build_tree Building fo 4 rt 0
> coll:tuned:topo_build_tree Building fo 2 rt 0
> coll:tuned:topo:build_bmtree rt 0
> coll:tuned:topo:build_in_order_bmtree rt 0
> coll:tuned:topo:build_chain fo 4 rt 0
> coll:tuned:topo:build_chain fo 1 rt 0
> coll:tuned:topo_build_in_order_tree Building fo 2 rt 8
> coll:tuned:module_init Tuned is in use
> ompi_coll_tuned_allreduce_intra_dec_dynamic
> ompi_coll_tuned_allreduce_intra_dec_fixed
> coll:tuned:allreduce_intra_recursivedoubling rank 8
> ompi_coll_tuned_barrier_intra_dec_dynamic
> ompi_coll_tuned_barrier_intra_dec_fixed com_size 9
> ompi_coll_tuned_barrier_intra_bruck rank 8
> ompi_coll_tuned_allreduce_intra_dec_dynamic
> Selected the following msg rule id 0
> alg_id 2 com_id 0 com_size 9 msg_id 0
> msg_size 0 -> algorithm 1 topo in/out 0 segsize 0
> max_requests 0
> coll:tuned:allreduce_intra_do_this algorithm 1 topo fan in/out 0 segsize 0
> coll:tuned:allreduce_intra_basic_linear rank 8
> coll:tuned:reduce_intra_basic_linear rank 8
> ompi_coll_tuned_bcast_intra_basic_linear rank 8 root 0
> mca: base: close: unloading component basic
> mca: base: close: unloading component hierarch
> mca: base: close: unloading component inter
> mca: base: close: unloading component self
> mca: base: close: component sm closed
> mca: base: close: unloading component sm
> mca: base: close: unloading component sync
> coll:tuned:component_close: called
> coll:tuned:component_close: done!
> mca: base: close: component tuned closed
> mca: base: close: unloading component tuned
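
For readers of the archive: the layout of the allreduce.ompi file quoted above can be read off the trace, which reports collective ID 2, one communicator size (9), one message size, and the rule "msg_size 0 -> algorithm 1 topo in/out 0 segsize 0". The following is only a minimal sketch in C of a reader for that layout, assuming the file is a plain sequence of whitespace-separated integers in exactly that order. The names rule_t and parse_rules are hypothetical and are not part of Open MPI, and this is not the parser the tuned component itself uses.

```c
#include <stdio.h>

/* One message-size rule. Field names follow the trace output; the exact
 * meaning of each field is an assumption inferred from that trace. */
typedef struct {
    int coll_id;    /* collective ID (2 in the posted file) */
    int comm_size;  /* communicator size the rule applies to */
    int msg_size;   /* message size at which the rule starts to apply */
    int algorithm;  /* algorithm index (1 was reported as basic linear) */
    int topo;       /* "topo in/out" value in the trace */
    int segsize;    /* segment size */
} rule_t;

/* Hypothetical helper: parse rules from a string holding the file contents.
 * Assumed layout: <number of collectives>, then per collective
 * <collective ID> <number of comm sizes>, then per comm size
 * <comm size> <number of message sizes>, then per message size
 * <msg_size> <algorithm> <topo> <segsize>.
 * Returns the number of rules stored, or -1 on malformed input or overflow. */
int parse_rules(const char *text, rule_t *out, int max_rules)
{
    int n = 0, used = 0, ncoll;

    if (sscanf(text, "%d%n", &ncoll, &used) != 1) return -1;
    text += used;

    for (int c = 0; c < ncoll; c++) {
        int coll_id, ncomm;
        if (sscanf(text, "%d %d%n", &coll_id, &ncomm, &used) != 2) return -1;
        text += used;

        for (int s = 0; s < ncomm; s++) {
            int comm_size, nmsg;
            if (sscanf(text, "%d %d%n", &comm_size, &nmsg, &used) != 2) return -1;
            text += used;

            for (int m = 0; m < nmsg; m++) {
                rule_t r = { coll_id, comm_size, 0, 0, 0, 0 };
                if (sscanf(text, "%d %d %d %d%n", &r.msg_size, &r.algorithm,
                           &r.topo, &r.segsize, &used) != 4) return -1;
                text += used;
                if (n >= max_rules) return -1;
                out[n++] = r;
            }
        }
    }
    return n;
}
```

Given the six lines of the posted file, this sketch yields a single rule: collective 2, communicator size 9, message size 0, algorithm 1. That matches the rule the trace shows being applied to MPI_COMM_WORLD but not to the duplicate communicator.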