
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] infiniband problem
From: Michael Oevermann (michael.oevermann_at_[hidden])
Date: 2008-11-20 16:16:08


Hi Ralph,

that was indeed a typo; the command is of course

/usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile
/home/sysgen/infiniband-mpi-test/machine
/usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1

with a space after /machine. Anyway, your suggested option -mca btl
openib,sm,self did help! I cannot check the performance numbers right now,
as the cluster is busy with other jobs, so I have no comparison with the
old benchmark results yet.
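
With the option added, the complete command now looks something like this
(same paths as above; the exact placement of the option should not matter):

/usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 \
    -mca btl openib,sm,self \
    -hostfile /home/sysgen/infiniband-mpi-test/machine \
    /usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1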

Thanks for the help!

Michael

Ralph Castain wrote:
> Your command line may have just come across with a typo, but something
> isn't right:
>
> -hostfile
> /home/sysgen/infiniband-mpi-test/machine/usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
>
> That looks more like a path to a binary than a path to a hostfile. Is
> there a missing space or filename somewhere?
>
> If not, then I would have expected this to error out since the
> argument would be taken as the hostfile, leaving no executable specified.
>
> If you get that straightened out, then try adding -mca btl
> openib,sm,self to the cmd line. This will direct mpirun to use only
> the OpenIB, shared memory, and loopback transports, so you shouldn't
> pick up uDAPL any more.
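>
> For example, something along the lines of:
>
>   mpirun -np 4 -mca btl openib,sm,self -hostfile <your hostfile> <your binary>
>
> If you want to make that the default, you can also put "btl = openib,sm,self"
> in $HOME/.openmpi/mca-params.conf or set OMPI_MCA_btl=openib,sm,self in your
> environment instead of adding it to every command line.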
>
> Ralph
>
>
> On Nov 20, 2008, at 12:38 PM, Michael Oevermann wrote:
>
>> Hi all,
>>
>> I have "inherited" a small cluster with a head node and four compute
>> nodes which I have to administer. The nodes are connected via InfiniBand (OFED), but the head node is not.
>> I am a complete novice to InfiniBand, and here is my problem:
>>
>> The InfiniBand configuration seems to be OK. The usual tests suggested in the OFED install guide give
>> the expected output, e.g.
>>
>>
>>
>> ibv_devinfo on the nodes:
>>
>>
>> ************************* oscar_cluster *************************
>> --------- n01---------
>> hca_id: mthca0
>> fw_ver: 1.2.0
>> node_guid: 0002:c902:0025:930c
>> sys_image_guid: 0002:c902:0025:930f
>> vendor_id: 0x02c9
>> vendor_part_id: 25204
>> hw_ver: 0xA0
>> board_id: MT_03B0140001
>> phys_port_cnt: 1
>> port: 1
>> state: PORT_ACTIVE (4)
>> max_mtu: 2048 (4)
>> active_mtu: 2048 (4)
>> sm_lid: 2
>> port_lid: 1
>> port_lmc: 0x00
>>
>> etc. for the other nodes.
>>
>> sminfo on the nodes:
>>
>> ************************* oscar_cluster *************************
>> --------- n01---------
>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6881
>> priority 0 state 3 SMINFO_MASTER
>> --------- n02---------
>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6882
>> priority 0 state 3 SMINFO_MASTER
>> --------- n03---------
>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6883
>> priority 0 state 3 SMINFO_MASTER
>> --------- n04---------
>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6884
>> priority 0 state 3 SMINFO_MASTER
>>
>>
>>
>> However, when I directly start an MPI job (without using a scheduler) via:
>>
>> /usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile
>> /home/sysgen/infiniband-mpi-test/machine/usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
>>
>> I get the error message:
>>
>> [0,1,0]: uDAPL on host n01 was unable to find any NICs.
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> [0,1,2]: uDAPL on host n01 was unable to find any NICs.
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> [0,1,3]: uDAPL on host n02 was unable to find any NICs.
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> [0,1,1]: uDAPL on host n02 was unable to find any NICs.
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --------------------------------------------------------------------------
>> MPI over normal Gigabit Ethernet and IP networking works just fine, but
>> InfiniBand doesn't. The MPI libs I am using
>> for the test are definitely compiled with IB support, and the tests
>> have been run successfully on
>> the cluster before.
>>
>> Any suggestions as to what is going wrong here?
>>
>> Best regards and thanks for any help!
>>
>> Michael