The answer is "it depends"; there are a lot of factors involved.
- What is the topology of your network?
- Where do processes land within the topology of the network?
- What interconnect are you using? (e.g., the openib BTL will
usually use short message RDMA to a limited set of peers as an
- How long are your messages?
OMPI does not have any special optimizations for point-to-point
communications for MPI_COMM_WORLD ranks that happen to be powers of
two. Other factors may contribute to make that true for your runs,
but there's nothing hard-coded in Open MPI for that.
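One way to check whether the effect really follows the receiver's rank is to tabulate your own measurements by receiver and by direction. The following is only a minimal Python sketch, not anything that ships with Open MPI: the function names and the input layout (a dict mapping (sender, receiver) to a latency in microseconds) are assumptions made for illustration, and the dict would have to be filled from your own benchmark output.

```python
from statistics import mean


def is_power_of_two(rank):
    """True for 1, 2, 4, 8, ...; rank 0 is not counted as a power of two."""
    return rank > 0 and (rank & (rank - 1)) == 0


def summarize_by_receiver(latencies):
    """Average latency for power-of-two receivers versus all other receivers.

    latencies: dict mapping (sender, receiver) -> latency in microseconds
               (hypothetical layout, filled from your own measurements).
    Returns (mean_pow2, mean_rest); an entry is None if its group is empty.
    """
    pow2 = [t for (_, dst), t in latencies.items() if is_power_of_two(dst)]
    rest = [t for (_, dst), t in latencies.items() if not is_power_of_two(dst)]
    return (mean(pow2) if pow2 else None, mean(rest) if rest else None)


def asymmetric_pairs(latencies, rel_tol=0.10):
    """Pairs (a, b), a < b, whose two directions differ by more than rel_tol.

    Only pairs measured in both directions are considered.
    """
    found = []
    for (a, b), t_ab in latencies.items():
        if a < b and (b, a) in latencies:
            t_ba = latencies[(b, a)]
            if abs(t_ab - t_ba) > rel_tol * max(t_ab, t_ba):
                found.append((a, b))
    return sorted(found)
```

If summarize_by_receiver() shows a clear gap, comparing it against where each rank landed (which node, which socket) may tell you whether placement, rather than the rank number itself, explains it.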
On Jun 5, 2007, at 1:10 PM, Andy Georgi wrote:
> hi everybody,
> i'm new on this list and started using OpenMPI for my parallel
> jobs. first step was to measure the latency for blocking
> communication functions. now my first question: is it possible that
> ordained communication pairs will be optimized?
> latency for special processnumbers is nearly 25% smaller, e.g. for
> process 1,2,4,8,16,32,64... (every computer scientist should see
> the pattern ;-)). it doesn't matter from which process i send the
> message if the receiver is one of these processes i have top
> latency values. it's not possible that this effect comes through
> the network because communication from proc5 to proc32 e.g. is
> faster than communication from proc32 to proc5. i've tried it with
> OpenMPI for Intel 1.1.4 and 1.2.2 and OpenMPI for PGI 1.2.2. always
> the same results. now i think it must be a kind of optimization. if
> it's so i would like to know it because then i have an
> explanation ;-).
> thx and regards,