Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] CUDA RDMA not selected by default
From: Jeffrey Squyres (jsquyres_at_[hidden])
Date: 2012-03-19 16:22:02


+1

See the description of cm vs. ob1 in the OMPI README. Here's the latest description (I think we recently added a little more description here):

    https://svn.open-mpi.org/trac/ompi/browser/trunk/README#L421

The PSM MTL does not have CUDA support; the smcuda BTL is for ob1 only.
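
If you want ob1 (and therefore smcuda) without retyping the flag on every run, here is a minimal sketch. It assumes a build that honors OMPI_MCA_* environment variables, which is the usual way to set MCA parameters outside the mpirun command line. The program name ./my_app and the process count are placeholders, not anything from this thread.

```shell
# Sketch only: assumes an Open MPI build that reads OMPI_MCA_* environment
# variables for MCA parameters.

# Equivalent to passing "--mca pml ob1" on every mpirun invocation.
export OMPI_MCA_pml=ob1

# Optional: keep the PML selection output shown in the original post,
# so you can confirm that ob1 (and not cm) was selected.
export OMPI_MCA_pml_base_verbose=100

# ./my_app is a placeholder for your own CUDA-aware MPI program:
# mpirun -np 4 ./my_app
```

The same setting can also go in $HOME/.openmpi/mca-params.conf as a line reading "pml = ob1". Keep in mind that forcing ob1 means giving up the PSM MTL, which is the better-performing path on QLogic hardware, so it is a trade-off rather than a fix.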

On Mar 19, 2012, at 4:15 PM, Nathan Hjelm wrote:

> The selection of cm is not wrong per se. You will find that the psm mtl is much better than the openib btl for QLogic hardware.
>
> -Nathan
>
> On Mon, 19 Mar 2012, Jens Glaser wrote:
>
>> Hello,
>>
>> I am using the latest trunk version of OMPI in order to take advantage of the new CUDA RDMA features (smcuda BTL). RDMA support is superb; however, I have to pass a parameter manually,
>>
>> mpirun --mca pml ob1 ...
>>
>> to have the OB1 upper layer selected and, consequently, to get smcuda activated. Otherwise mpirun chooses the cm upper layer, which is wrong. The hardware is a
>>
>> InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02).
>>
>> This is the output of
>> mpirun --mca pml_base_verbose 100
>>
>> [cas002:05518] select: component cm selected
>> [cas002:05518] mca: base: close: component v closed
>> [cas002:05518] mca: base: close: unloading component v
>> [cas002:05518] mca: base: close: component bfo closed
>> [cas002:05518] mca: base: close: unloading component bfo
>> [cas002:05518] mca: base: close: component csum closed
>> [cas002:05518] mca: base: close: unloading component csum
>> [cas002:05518] mca: base: close: component dr closed
>> [cas002:05518] mca: base: close: unloading component dr
>> [cas002:05518] mca: base: close: component ob1 closed
>> [cas002:05518] mca: base: close: unloading component ob1
>> [cas002:05520] mca: base: components_open: component cm open function successful
>> [cas002:05520] mca: base: components_open: found loaded component csum
>> [cas002:05520] mca: base: components_open: component csum has no register function
>> [cas002:05520] mca: base: components_open: component csum open function successful
>> [cas002:05520] mca: base: components_open: found loaded component dr
>> [cas002:05520] mca: base: components_open: component dr has no register function
>> [cas002:05520] mca: base: components_open: component dr open function successful
>> [cas002:05520] mca: base: components_open: found loaded component ob1
>> [cas002:05520] mca: base: components_open: component ob1 has no register function
>> [cas002:05520] mca: base: components_open: component ob1 open function successful
>> [cas002:05520] select: component v not in the include list
>> [cas002:05520] select: component bfo not in the include list
>> [cas002:05520] select: initializing pml component cm
>> [cas002:05520] select: init returned priority 30
>> [cas002:05520] select: component csum not in the include list
>> [cas002:05520] select: component dr not in the include list
>> [cas002:05520] select: initializing pml component ob1
>> [cas002:05520] select: init returned failure for component ob1
>> [cas002:05520] selected cm best priority 30
>> [cas002:05520] select: component cm selected
>> [cas002:05520] mca: base: close: component v closed
>> [cas002:05520] mca: base: close: unloading component v
>> [cas002:05520] mca: base: close: component bfo closed
>> [cas002:05520] mca: base: close: unloading component bfo
>> [cas002:05520] mca: base: close: component csum closed
>> [cas002:05520] mca: base: close: unloading component csum
>> [cas002:05520] mca: base: close: component dr closed
>> [cas002:05520] mca: base: close: unloading component dr
>> [cas002:05520] mca: base: close: component ob1 closed
>> [cas002:05520] mca: base: close: unloading component ob1
>> [cas002:05518] check:select: checking my pml cm against rank=0 pml cm
>> [cas002:05517] check:select: rank=0
>> [cas002:05520] check:select: checking my pml cm against rank=0 pml cm
>> [cas002:05519] check:select: checking my pml cm against rank=0 pml cm
>>
>> Configure options:
>> ./configure --with-openib --with-cuda --prefix=/home/it1/glaser/local --with-tm=/opt/torque --enable-shared
>>
>> Does anyone have any idea what causes Open MPI to select cm by default?
>>
>> Thanks,
>> Jens.
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/