
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] RE : RE : Latency of 250 microseconds with Open-MPI 1.4.3, Mellanox Infiniband and 256 MPI ranks
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-09-21 17:09:41


On Sep 21, 2011, at 4:24 PM, Sébastien Boisvert wrote:

>> What happens if you run 2 ibv_rc_pingpong's on each node? Or N ibv_rc_pingpongs?
>
> With 11 ibv_rc_pingpong's
>
> http://pastebin.com/85sPcA47
>
> Code to do that => https://gist.github.com/1233173
>
> Latencies are around 20 microseconds.

This seems to imply that the network is to blame for the higher latency...?

I.e., if you run the same pattern with MPI processes and also get about 20us latency, that would tend to imply that the network itself is not performing well with that I/O pattern.
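A minimal sketch of the "N concurrent ibv_rc_pingpong pairs" pattern, for anyone wanting to reproduce it. It is not the code from the gist linked above. It assumes each pair uses its own out-of-band TCP port via ibv_rc_pingpong's -p option (18515 is the tool's default), and it parses the "N iters in X seconds = Y usec/iter" summary line the tool prints. The helper names (server_cmds, client_cmds, parse_usec_per_iter, run_all) are hypothetical.

```python
import re
import subprocess
from typing import List

# Assumption: pair i uses TCP port BASE_PORT + i for connection setup (-p option).
BASE_PORT = 18515

# Matches ibv_rc_pingpong's summary line, e.g.
# "1000 iters in 0.02 seconds = 20.00 usec/iter"
ITER_RE = re.compile(r"(\d+) iters in ([\d.]+) seconds = ([\d.]+) usec/iter")


def server_cmds(n: int) -> List[List[str]]:
    """Command lines for the n server-side processes (run on the first node)."""
    return [["ibv_rc_pingpong", "-p", str(BASE_PORT + i)] for i in range(n)]


def client_cmds(n: int, server_host: str) -> List[List[str]]:
    """Command lines for the n client-side processes (run on the second node)."""
    return [["ibv_rc_pingpong", "-p", str(BASE_PORT + i), server_host]
            for i in range(n)]


def parse_usec_per_iter(output: str) -> float:
    """Extract the usec/iter figure from one process's captured output."""
    m = ITER_RE.search(output)
    if m is None:
        raise ValueError("no 'usec/iter' summary line found")
    return float(m.group(3))


def run_all(cmds: List[List[str]]) -> List[float]:
    """Start every command at once so the pairs overlap, then collect usec/iter."""
    procs = [subprocess.Popen(c, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True)
             for c in cmds]
    return [parse_usec_per_iter(p.communicate()[0]) for p in procs]
```

Usage: call run_all(server_cmds(11)) on one node, then run_all(client_cmds(11, "<server hostname>")) on the other once the servers are listening. Running the same pair layout with MPI ranks gives the apples-to-apples comparison described above.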

> My job seems to do well so far with ofud !
>
> [sboisver12@colosse2 ray]$ qstat
> job-ID  prior   name       user       state submit/start at     queue        slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
> 3047460 0.55384 fish-Assem sboisver12 r     09/21/2011 15:02:25 med@r104-n58 256

I would still be suspicious -- ofud is not well tested, and it can definitely hang if there are network drops.

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/