(still trolling through the history in my INBOX...)
On Jul 9, 2010, at 8:56 AM, Andreas Schäfer wrote:
> On 14:39 Fri 09 Jul , Peter Kjellstrom wrote:
> > 8x pci-express gen2 5GT/s should show figures like mine. If it's pci-express
> > gen1 or gen2 2.5GT/s or 4x or if the IB only came up with two lanes then 1500
> > is expected.
> lspci and ibv_devinfo tell me it's PCIe 2.0 x8 and InfiniBand 4x QDR
> (active_width 4X, active_speed 10.0 Gbps), so I /should/ be able to
> get about twice the throughput of what I'm currently seeing.
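For reference, here is a rough sketch of the link-rate arithmetic behind those figures. It assumes only the 8b/10b encoding used by PCIe gen1/gen2 and by InfiniBand SDR/DDR/QDR; the function name and the list of links are mine, purely for illustration. These are theoretical per-direction ceilings, and real throughput will be lower once protocol overhead is included.

```python
# Theoretical per-direction data rate of a link that uses 8b/10b encoding.
# Illustrative only: the helper name and the table of links are assumptions,
# not anything taken from lspci or ibv_devinfo output.

def link_data_rate_gbytes(signal_rate_gbps: float, lanes: int) -> float:
    """Return the per-direction data rate in GB/s (10^9 bytes per second)."""
    encoded_bits = signal_rate_gbps * lanes  # raw signalling rate, Gbit/s
    data_bits = encoded_bits * 8 / 10        # 8b/10b: 8 data bits per 10 line bits
    return data_bits / 8                     # bits -> bytes


links = {
    "PCIe gen2 x8 (5 GT/s)": (5.0, 8),
    "PCIe gen1 x8 (2.5 GT/s)": (2.5, 8),
    "PCIe gen2 x4 (5 GT/s)": (5.0, 4),
    "IB QDR 4x (10 Gbps per lane)": (10.0, 4),
    "IB QDR, only 2 lanes up": (10.0, 2),
}

for name, (rate, lanes) in links.items():
    print(f"{name:30s} {link_data_rate_gbytes(rate, lanes):.1f} GB/s")
```

The full-width cases (PCIe gen2 x8, IB QDR 4x) come out at 4.0 GB/s and each degraded case at 2.0 GB/s, which is the factor of two being discussed.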
Shared memory performance will differ depending on whether you bind both local procs to a single socket or to two different sockets. I don't know much about AMDs, so I can't say offhand exactly what it'll do.
As for the IB performance, make sure that your MPI process is bound to a core that is "near" the HCA, for minimum latency and maximum bandwidth. Also check that your IB fabric is clean, etc. I believe that OFED comes with a bunch of verbs-level latency and bandwidth tests that can measure what you're getting across your fabric (i.e., raw network performance without MPI). It's been a while since I've worked deeply with OFED stuff, and I don't remember the command names offhand -- perhaps ibv_rc_pingpong, or somesuch?
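One way to find out which cores are "near" the HCA on Linux is to ask sysfs, which lists the CPUs local to a PCI device in its local_cpulist file. The following is only a sketch under assumptions: the device name mlx4_0 is hypothetical (use whatever ibv_devinfo reports), the helper names are mine, and in practice you would normally let your MPI's own binding options do the pinning rather than doing it by hand.

```python
import os
from pathlib import Path
from typing import List


def parse_cpulist(text: str) -> List[int]:
    """Expand a Linux cpulist string such as "0-3,8-11" into CPU numbers."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty string or stray comma
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def hca_local_cpus(hca: str) -> List[int]:
    """CPUs the kernel reports as local to the HCA's PCI device.

    Returns an empty list if the sysfs file cannot be read
    (no such device, or not a Linux system).
    """
    path = Path("/sys/class/infiniband") / hca / "device" / "local_cpulist"
    try:
        return parse_cpulist(path.read_text())
    except OSError:
        return []


if __name__ == "__main__":
    cpus = hca_local_cpus("mlx4_0")  # hypothetical device name
    if cpus:
        # Pin the current process to the first CPU local to the HCA.
        os.sched_setaffinity(0, {cpus[0]})
        print("HCA-local CPUs:", cpus, "-> bound to CPU", cpus[0])
    else:
        print("No locality information found for this device")
```

Running a bandwidth test once bound to one of these CPUs and once bound to a CPU on the other socket should show whether locality explains any of the missing throughput.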