On Wed, Aug 25, 2010 at 12:14 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> It would simplify testing if you could get all the eth0's to be of one type and on the same subnet, and the same for eth1.
> Once you do that, try using just one of the networks by telling OMPI to use only one of the devices, something like this:
> mpirun --mca btl_tcp_if_include eth0 ...
Thanks for all the suggestions, guys! We finally got this figured out.
It was the result of two different (hardware-specific) bugs in the
RDMA driver. The 10GigE card was advertising the wrong size for the CQ
stack (as far as I understand!).
In case anyone wants to know more, the bugfixes are posted here: