A few weeks ago, I posted to the list about difficulties I was having getting openib to work with Torque (see "openib segfaults with Torque", June 6, 2014). The issues were related to Torque imposing restrictive limits on locked memory, and have since been resolved.
However, now that I've had some time to test the applications, I'm seeing abysmal performance over the openib layer. The same applications run about 10x faster with the tcp btl than with the openib btl. Clearly something still isn't quite right.
I tried running with "-mca btl_openib_verbose 1", but didn't see anything resembling a smoking gun. How should I go about determining the source of the problem? (This is the same Open MPI 1.8.1 / SLES11 SP3 / GCC 4.8.3 setup discussed previously.)
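
For anyone wanting to rule out a recurrence of the locked-memory limit mentioned at the top, here is a minimal sketch of a check that can be run from inside a Torque job. It assumes Python is available on the compute nodes. The function name describe_memlock_limit is made up for illustration. It only reports the limit the process sees and does not diagnose openib performance itself.

```python
import resource


def describe_memlock_limit():
    """Return the soft and hard locked-memory limits as readable strings."""
    # RLIMIT_MEMLOCK is the per-process cap on locked (pinned) memory,
    # the same value that "ulimit -l" reports (ulimit shows it in KiB).
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def fmt(value):
        # RLIM_INFINITY means no limit is imposed.
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return "{} bytes".format(value)

    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = describe_memlock_limit()
    print("locked memory: soft={}, hard={}".format(soft, hard))
```

Launching this through mpirun inside the job script, rather than from a login shell, shows the limit as the MPI processes actually inherit it. A finite value there would suggest the earlier Torque limit is still being applied.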