Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25270
From: Rainer Keller (keller_at_[hidden])
Date: 2011-10-14 17:07:33


Hi Tim,
in fact, I did try the OR-based alternative. However, it is only a win on older
AMD Opterons (16 cycles vs. 20); it cannot beat the __builtin_clz version
on Intel.
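Assuming "OR-alternative" means the OR-shift leading zero count along the lines of the Aggregate MAGIC recipe (smear the highest set bit into every lower position, then count the ones), a minimal C sketch next to the __builtin_clz form might look like this. The function names are hypothetical and not Open MPI's, and the sketch assumes 32-bit unsigned int.

```c
#include <stdint.h>

/* Portable population count (SWAR bit-count), no compiler builtins. */
static int popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit counts */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit counts */
    x = (x + (x >> 4)) & 0x0f0f0f0fu;                 /* 8-bit counts */
    return (int)((x * 0x01010101u) >> 24);            /* sum of bytes */
}

/* Hypothetical OR-based clz: after smearing, every bit at or below the
 * highest set bit is 1, so 32 minus the number of ones is the count of
 * leading zeros. Well defined for x == 0 (returns 32). */
static int clz32_or(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return 32 - popcount32(x);
}

/* Hypothetical builtin-based clz. __builtin_clz(0) is undefined, so zero
 * is handled explicitly; assumes unsigned int is 32 bits wide. */
static int clz32_builtin(uint32_t x)
{
#if defined(__GNUC__)
    return x == 0 ? 32 : __builtin_clz((unsigned int)x);
#else
    return clz32_or(x); /* fall back to the portable version */
#endif
}
```

For example, both functions return 31 for an input of 1 and 0 for 0x80000000. The OR version is branch-free, but it costs five shift/OR steps plus a bit count, whereas __builtin_clz can compile to a single instruction on targets that have one.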

Best regards,
Rainer

On Wednesday 12 October 2011 11:26:52 Tim Mattox wrote:
> All,
> If you want to speed up these routines on processors without
> __builtin_clz, there are several ways to implement clz efficiently in
> plain C. See the Hacker's Delight nlz (number of leading zeros) code:
> http://www.hackersdelight.org/HDcode/nlz.c.txt
>
> Or see my Ph.D. advisor's Magic Algorithms page:
> http://aggregate.org/MAGIC/#Leading%20Zero%20Count
>
> And you can directly implement opal_next_poweroftwo()
> with this:
> http://aggregate.org/MAGIC/#Next%20Largest%20Power%20of%202
>
> The Hacker's Delight webpage (and book) are fun to read for that
> certain kind of person. :-)
> http://www.hackersdelight.org/
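As a sketch of the "Next Largest Power of 2" suggestion, the helpers below use the same OR-shift smearing followed by +1, next to an equivalent __builtin_clz form. They are hypothetical helpers, not Open MPI's actual opal_next_poweroftwo(). They assume the desired result is the smallest power of two strictly greater than the argument (0 gives 1, 4 gives 8), that unsigned int is 32 bits, and that the input is below 2^31.

```c
#include <stdint.h>

/* Hypothetical OR-shift version: returns the smallest power of two
 * strictly greater than x. Assumes x < 2^31; for larger x the unsigned
 * addition wraps and the result is 0. */
static uint32_t next_poweroftwo_or(uint32_t x)
{
    /* Smear the highest set bit downward, e.g. 0b0101 -> 0b0111. */
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    /* All bits up to the top bit are now set; +1 carries into the next power. */
    return x + 1;
}

/* Hypothetical builtin version with the same contract (x < 2^31). */
static uint32_t next_poweroftwo_clz(uint32_t x)
{
#if defined(__GNUC__)
    if (x == 0) {
        return 1; /* __builtin_clz(0) is undefined */
    }
    /* 32 - clz(x) is the bit width of x; 1 shifted by it is the next power. */
    return (uint32_t)1 << (32 - __builtin_clz((unsigned int)x));
#else
    return next_poweroftwo_or(x); /* portable fallback */
#endif
}
```

With these assumptions, an input of 5 gives 8 from either helper. An input that is already a power of two also moves up to the next one, so a caller that wants 4 to map to 4 would need to pass x - 1 instead.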