Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25270
From: Tim Mattox (timattox_at_[hidden])
Date: 2011-10-12 11:26:52


All,
If you want to speed up these routines for builds without __builtin_clz,
there are several ways to implement clz efficiently in plain C.
See Hacker's Delight nlz (number of leading zeros):
http://www.hackersdelight.org/HDcode/nlz.c.txt

Or from my Ph.D. advisor's Magic Algorithms page:
http://aggregate.org/MAGIC/#Leading%20Zero%20Count
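For illustration, here is a minimal sketch of a portable leading-zero count
in the binary-search style that Hacker's Delight describes. The name nlz32 is
hypothetical (it is not taken from bit_ops.h), and the sketch assumes a 32-bit
unsigned argument. Unlike __builtin_clz, whose result is undefined for 0, this
version returns 32 for a zero input.

```c
#include <stdint.h>

/* Hypothetical helper: number of leading zero bits in a 32-bit value.
 * Returns 32 when x == 0. */
static int nlz32(uint32_t x)
{
    int n = 0;

    if (x == 0) {
        return 32;
    }
    /* Each test checks whether the top 16/8/4/2/1 bits are all zero;
     * if so, count them and shift the remaining bits to the top. */
    if (x <= 0x0000FFFFu) { n += 16; x <<= 16; }
    if (x <= 0x00FFFFFFu) { n += 8;  x <<= 8;  }
    if (x <= 0x0FFFFFFFu) { n += 4;  x <<= 4;  }
    if (x <= 0x3FFFFFFFu) { n += 2;  x <<= 2;  }
    if (x <= 0x7FFFFFFFu) { n += 1; }
    return n;
}
```

A fallback like this could sit behind the same configure check that selects
__builtin_clz, so callers would not need to know which version was compiled in.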

And you can directly implement opal_next_poweroftwo()
with this:
http://aggregate.org/MAGIC/#Next%20Largest%20Power%20of%202
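As a rough sketch of that bit-folding approach: OR the value with shifted
copies of itself until every bit below the highest set bit is 1, then add 1.
The function names below are hypothetical, 32-bit unsigned input is assumed,
and whether the strict or the inclusive variant matches what
opal_next_poweroftwo() is expected to return should be checked against
bit_ops.h.

```c
#include <stdint.h>

/* Hypothetical helper: smallest power of two strictly greater than x.
 * Wraps around to 0 when x >= 0x80000000, since 2^32 does not fit. */
static uint32_t next_pow2_strict(uint32_t x)
{
    /* Fold the highest set bit into all lower bit positions. */
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    /* x is now of the form 2^k - 1 (or 0), so x + 1 is a power of two. */
    return x + 1u;
}

/* Hypothetical helper: smallest power of two greater than or equal to x.
 * Treats 0 as mapping to 1. */
static uint32_t next_pow2_inclusive(uint32_t x)
{
    return (x == 0) ? 1u : next_pow2_strict(x - 1u);
}
```

There are no branches or loops in the strict version, so its cost does not
depend on the input value.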

The Hacker's Delight webpage (and book) is fun to read for a
certain kind of person. :-)
http://www.hackersdelight.org/

On Tue, Oct 11, 2011 at 6:49 PM, <rusraink_at_[hidden]> wrote:
> Author: rusraink
> Date: 2011-10-11 18:49:01 EDT (Tue, 11 Oct 2011)
> New Revision: 25270
> URL: https://svn.open-mpi.org/trac/ompi/changeset/25270
>
> Log:
>  - Check, whether the compiler supports __builtin_clz (count leading
>   zeroes);
>   if so, use it for bit-operations like opal_cube_dim and opal_hibit.
>   Implement two versions of power-of-two.
>   In case of opal_next_poweroftwo, this reduces the average execution
>   time from 83 cycles to 4 cycles (Intel Nehalem, icc, -O2, inlining,
>   measured rdtsc, with loop over 2^27 values).
>   Numbers for other functions are similar (but of course heavily depend
>   on the usage, e.g. opal_hibit() with a start of 4 does not save
>   much).  The bsr instruction on AMD Opteron is also not as fast.
>
>  - Replace various places where the next power-of-two is computed.
>
>   Tested on Intel Nehalem Cluster with openib, compilers GNU-4.6.1 and
>   Intel-12.0.4 using mpi_testsuite -t "Collective" with 128 processes.
>
>
> Added:
>   trunk/test/util/opal_bit_ops.c
> Text files modified:
>   trunk/ompi/mca/btl/openib/btl_openib_mca.c            |    13 +---
>   trunk/ompi/mca/btl/sm/btl_sm.h                        |     5 -
>   trunk/ompi/mca/btl/sm/btl_sm_component.c              |     9 +--
>   trunk/ompi/mca/btl/wv/btl_wv_mca.c                    |    13 +---
>   trunk/ompi/mca/coll/basic/coll_basic_reduce_scatter.c |     5 +
>   trunk/ompi/mca/coll/tuned/coll_tuned_allgather.c      |     3
>   trunk/ompi/mca/coll/tuned/coll_tuned_allreduce.c      |     4 +
>   trunk/ompi/mca/coll/tuned/coll_tuned_barrier.c        |     5 +
>   trunk/ompi/mca/coll/tuned/coll_tuned_decision_fixed.c |     5 +
>   trunk/ompi/mca/coll/tuned/coll_tuned_reduce_scatter.c |     5 +
>   trunk/ompi/mca/coll/tuned/coll_tuned_topo.c           |     3
>   trunk/opal/class/opal_hash_table.c                    |     8 --
>   trunk/opal/config/opal_setup_cc.m4                    |    20 ++++++
>   trunk/opal/util/bit_ops.h                             |   106 +++++++++++++++++++++++++++++++++++----
>   trunk/test/util/Makefile.am                           |    14 ++++-
>   15 files changed, 158 insertions(+), 60 deletions(-)
>
[snip]

-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 timattox_at_[hidden] || tmattox_at_[hidden]
    I'm a bright... http://www.the-brights.net/