Open MPI User's Mailing List Archives

Subject: [OMPI users] CUDA RDMA not selected by default
From: Jens Glaser (jglaser_at_[hidden])
Date: 2012-03-19 16:10:32


Hello,

I am using the latest trunk version of Open MPI in order to take advantage of the new CUDA RDMA features (smcuda BTL). RDMA support is superb. However, I have to pass the following parameter manually

mpirun --mca pml ob1 ...

to have the ob1 upper layer (PML) selected and, consequently, to get smcuda activated. Otherwise, mpirun chooses the cm upper layer, and smcuda is not activated. The hardware is a

InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02).

This is the output of
mpirun --mca pml_base_verbose 100

[cas002:05518] select: component cm selected
[cas002:05518] mca: base: close: component v closed
[cas002:05518] mca: base: close: unloading component v
[cas002:05518] mca: base: close: component bfo closed
[cas002:05518] mca: base: close: unloading component bfo
[cas002:05518] mca: base: close: component csum closed
[cas002:05518] mca: base: close: unloading component csum
[cas002:05518] mca: base: close: component dr closed
[cas002:05518] mca: base: close: unloading component dr
[cas002:05518] mca: base: close: component ob1 closed
[cas002:05518] mca: base: close: unloading component ob1
[cas002:05520] mca: base: components_open: component cm open function successful
[cas002:05520] mca: base: components_open: found loaded component csum
[cas002:05520] mca: base: components_open: component csum has no register function
[cas002:05520] mca: base: components_open: component csum open function successful
[cas002:05520] mca: base: components_open: found loaded component dr
[cas002:05520] mca: base: components_open: component dr has no register function
[cas002:05520] mca: base: components_open: component dr open function successful
[cas002:05520] mca: base: components_open: found loaded component ob1
[cas002:05520] mca: base: components_open: component ob1 has no register function
[cas002:05520] mca: base: components_open: component ob1 open function successful
[cas002:05520] select: component v not in the include list
[cas002:05520] select: component bfo not in the include list
[cas002:05520] select: initializing pml component cm
[cas002:05520] select: init returned priority 30
[cas002:05520] select: component csum not in the include list
[cas002:05520] select: component dr not in the include list
[cas002:05520] select: initializing pml component ob1
[cas002:05520] select: init returned failure for component ob1
[cas002:05520] selected cm best priority 30
[cas002:05520] select: component cm selected
[cas002:05520] mca: base: close: component v closed
[cas002:05520] mca: base: close: unloading component v
[cas002:05520] mca: base: close: component bfo closed
[cas002:05520] mca: base: close: unloading component bfo
[cas002:05520] mca: base: close: component csum closed
[cas002:05520] mca: base: close: unloading component csum
[cas002:05520] mca: base: close: component dr closed
[cas002:05520] mca: base: close: unloading component dr
[cas002:05520] mca: base: close: component ob1 closed
[cas002:05520] mca: base: close: unloading component ob1
[cas002:05518] check:select: checking my pml cm against rank=0 pml cm
[cas002:05517] check:select: rank=0
[cas002:05520] check:select: checking my pml cm against rank=0 pml cm
[cas002:05519] check:select: checking my pml cm against rank=0 pml cm
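
Because the output of several processes is interleaved, the lines that matter (which component returned a priority, which one failed, which one was selected) are easy to miss. A small script along the following lines might help pull them out per process. It is only a sketch: the regular expressions are assumptions based on the lines quoted above, not on a documented Open MPI log format, and `summarize` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Patterns inferred from the pml_base_verbose lines quoted above (assumption:
# other Open MPI versions may word these messages differently).
PREFIX = re.compile(r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+(?P<msg>.*)$")
INIT = re.compile(r"select: initializing pml component (\w+)")
PRIORITY = re.compile(r"select: init returned priority (\d+)")
FAILURE = re.compile(r"select: init returned failure for component (\w+)")
SELECTED = re.compile(r"select: component (\w+) selected")


def summarize(log_text):
    """Return {pid: {"results": {component: priority or "failure"}, "selected": name}}."""
    summary = defaultdict(lambda: {"results": {}, "selected": None})
    pending = {}  # pid -> component whose init result has not been seen yet
    for line in log_text.splitlines():
        match = PREFIX.match(line.strip())
        if not match:
            continue
        pid, msg = match.group("pid"), match.group("msg")

        hit = INIT.search(msg)
        if hit:
            pending[pid] = hit.group(1)
            continue
        hit = PRIORITY.search(msg)
        if hit and pid in pending:
            # The priority line does not name the component, so pair it
            # with the most recent "initializing" line of the same process.
            summary[pid]["results"][pending.pop(pid)] = int(hit.group(1))
            continue
        hit = FAILURE.search(msg)
        if hit:
            summary[pid]["results"][hit.group(1)] = "failure"
            pending.pop(pid, None)
            continue
        hit = SELECTED.search(msg)
        if hit:
            summary[pid]["selected"] = hit.group(1)
    return dict(summary)


# A few lines copied from the output above, for process 05520 only.
sample = """\
[cas002:05520] select: initializing pml component cm
[cas002:05520] select: init returned priority 30
[cas002:05520] select: initializing pml component ob1
[cas002:05520] select: init returned failure for component ob1
[cas002:05520] select: component cm selected
"""

if __name__ == "__main__":
    for pid, info in summarize(sample).items():
        print(pid, info["results"], "selected:", info["selected"])
```

For process 05520 this reports that cm returned priority 30, that the init of ob1 failed, and that cm was selected. In other words, ob1 is not losing on priority here; its init function is returning failure.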

Configure options:
./configure --with-openib --with-cuda --prefix=/home/it1/glaser/local --with-tm=/opt/torque --enable-shared

Does anyone have any idea what causes Open MPI to select cm by default?

Thanks,
Jens.