Open MPI User's Mailing List Archives

Subject: [OMPI users] infiniband problem
From: Michael Oevermann (michael.oevermann_at_[hidden])
Date: 2008-11-20 14:38:10


Hi all,

I have "inherited" a small cluster with a head node and four compute
nodes which I have to administer. The nodes are connected via InfiniBand (OFED), but the head node is not.
I am a complete novice with InfiniBand, and here is my problem:

The InfiniBand configuration seems to be OK. The usual tests suggested in the OFED install guide give
the expected output, e.g.

ibv_devinfo on the nodes:

************************* oscar_cluster *************************
--------- n01---------
hca_id: mthca0
fw_ver: 1.2.0
node_guid: 0002:c902:0025:930c
sys_image_guid: 0002:c902:0025:930f
vendor_id: 0x02c9
vendor_part_id: 25204
hw_ver: 0xA0
board_id: MT_03B0140001
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 2
port_lid: 1
port_lmc: 0x00

etc. for the other nodes.

sminfo on the nodes:

************************* oscar_cluster *************************
--------- n01---------
sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6881 priority 0
state 3 SMINFO_MASTER
--------- n02---------
sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6882 priority 0
state 3 SMINFO_MASTER
--------- n03---------
sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6883 priority 0
state 3 SMINFO_MASTER
--------- n04---------
sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6884 priority 0
state 3 SMINFO_MASTER

However, when I start an MPI job directly (without using a scheduler) via:

/usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile /home/sysgen/infiniband-mpi-test/machine \
    /usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1

I get the error message:

[0,1,0]: uDAPL on host n01 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,2]: uDAPL on host n01 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,3]: uDAPL on host n02 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,1]: uDAPL on host n02 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
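For what it's worth, I have not yet tried forcing the native InfiniBand transport. If I understand the Open MPI 1.2 MCA parameters correctly, something like the following should either use the openib BTL or fail loudly instead of silently falling back, and the verbose flag should show why the native transport is rejected (a sketch using the same paths as above, not something I have run yet):

```shell
# Restrict Open MPI to the openib (native InfiniBand), shared-memory,
# and self BTLs; if openib cannot be used, the job should abort rather
# than quietly fall back to TCP. btl_base_verbose prints component
# selection details.
/usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 \
    -hostfile /home/sysgen/infiniband-mpi-test/machine \
    --mca btl openib,sm,self \
    --mca btl_base_verbose 30 \
    /usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
```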

MPI over plain Gigabit Ethernet and IP networking works just fine, but
InfiniBand doesn't. The MPI libraries I am using for the test are
definitely compiled with IB support, and the tests have been run
successfully on the cluster before.
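I assume the build really does have IB support, but to double-check which transport components this particular Open MPI installation ships, I could list its BTLs with ompi_info (the grep just filters the component lines):

```shell
# List the byte-transfer-layer (BTL) components of this build;
# "openib" should appear for native InfiniBand support and "udapl"
# for uDAPL support.
/usr/mpi/gcc4/openmpi-1.2.2-1/bin/ompi_info | grep -i btl
```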

Any suggestions what is going wrong here?

Best regards and thanks for any help!

Michael