
Subject: Re: [OMPI users] Advices for parameter tuning for CUDA-aware MPI
From: Maxime Boissonneault (maxime.boissonneault_at_[hidden])
Date: 2014-05-27 16:06:55


Answers inline too.
>> 2) Is the absence of btl_openib_have_driver_gdr an indicator of something
>> missing ?
> Yes, that means that somehow the GPU Direct RDMA is not installed correctly. All that check does is make sure that the file /sys/kernel/mm/memory_peers/nv_mem/version exists. Does that exist?
>
It does not. There is no /sys/kernel/mm/memory_peers/ directory on this node.
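
For reference, this is the minimal check I plan to redo once the install is fixed, assuming (my assumption) that this sysfs entry is supposed to be created by the nv_peer_mem kernel module:

   # does the sysfs entry your check looks for exist?
   ls /sys/kernel/mm/memory_peers/nv_mem/version
   # is the peer-memory module loaded at all? (assuming it is named nv_peer_mem)
   lsmod | grep nv_peer_mem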

>> 3) Are the default parameters, especially the rdma limits and such, optimal for
>> our configuration ?
> That is hard to say. GPU Direct RDMA does not work well when the GPU and IB card are not "close" on the system. Can you run "nvidia-smi topo -m" on your system?
nvidia-smi topo -m gives me the following error:

[mboisson_at_login-gpu01 ~]$ nvidia-smi topo -m
Invalid combination of input arguments. Please run 'nvidia-smi -h' for help.

I could not find anything related to topology in the help output. However, I
can tell you the following, which I believe to be true:
- GPU0 and GPU1 are on PCIe bus 0, socket 0
- GPU2 and GPU3 are on PCIe bus 1, socket 0
- GPU4 and GPU5 are on PCIe bus 2, socket 1
- GPU6 and GPU7 are on PCIe bus 3, socket 1

There is one IB card which I believe is on socket 0.
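
Since nvidia-smi topo -m does not work here, this is roughly how I am inferring the layout; the PCI bus ID and the device name mlx4_0 below are only examples, not verified on this machine:

   # find the PCI bus IDs of the GPUs and of the IB HCA
   lspci | grep -iE 'nvidia|mellanox'
   # NUMA node (socket) of a device, looked up by its bus ID (example ID, replace with a real one)
   cat /sys/bus/pci/devices/0000:02:00.0/numa_node
   # same thing for the IB card, via its verbs device name (assuming it is mlx4_0)
   cat /sys/class/infiniband/mlx4_0/device/numa_node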

I know that we do not have Mellanox OFED; we use the Linux RDMA stack from
CentOS 6.5. However, should that completely disable GDR within a single
node? That is, does GDR _have_ to go through IB? I would assume that our
lack of Mellanox OFED would mean no GDR inter-node, but still GDR intra-node.
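
For completeness, here is what I intend to check and run on our side; the parameter names below are only my understanding of the CUDA-related MCA parameters, so please correct me if they are wrong:

   # what the Open MPI build itself reports about GDR and CUDA support
   ompi_info --all | grep btl_openib_have_driver_gdr
   ompi_info --all | grep -i cuda
   # my understanding: intra-node GPU-to-GPU traffic should go through CUDA IPC
   # in the smcuda BTL, independently of the IB/GDR path (./my_app is a placeholder)
   mpiexec -np 2 --mca btl_smcuda_use_cuda_ipc 1 ./my_app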

Thanks

-- 
---------------------------------
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics