Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] infiniband problem
From: Tim Mattox (timattox_at_[hidden])
Date: 2008-11-20 16:35:49


BTW - after you get more comfortable with your new-to-you cluster, I
recommend you upgrade your Open MPI installation. v1.2.8 has
a lot of bugfixes relative to v1.2.2. Also, Open MPI 1.3 should be
available "next month"... so watch for an announcement on that front.
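For reference, Ralph's suggestion further down the thread, combined with Michael's corrected command line, amounts to the invocation below. The paths are the ones quoted in this thread; adjust them for your own installation.

```shell
# Restrict Open MPI 1.2.x to the OpenIB, shared-memory, and self (loopback)
# BTL transports so the uDAPL component is never probed.
# Hostfile and install paths are taken from the messages in this thread.
/usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 \
    -hostfile /home/sysgen/infiniband-mpi-test/machine \
    -mca btl openib,sm,self \
    /usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
```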

On Thu, Nov 20, 2008 at 3:16 PM, Michael Oevermann
<michael.oevermann_at_[hidden]> wrote:
> Hi Ralph,
>
> that was indeed a typo, the command is of course
>
> /usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile
> /home/sysgen/infiniband-mpi-test/machine
> /usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
>
> with a space after /machine. Anyway, your suggested option -mca btl
> openib,sm,self did help!
> Right now I am not able to check the performance results, as the cluster
> is busy with jobs, so I cannot compare with the old benchmark results.
>
> Thanks for the help!
>
> Michael
>
>
> Ralph Castain schrieb:
>>
>> Your command line may have just come across with a typo, but something
>> isn't right:
>>
>> -hostfile
>> /home/sysgen/infiniband-mpi-test/machine/usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
>>
>> That looks more like a path to a binary than a path to a hostfile. Is
>> there a missing space or filename somewhere?
>>
>> If not, then I would have expected this to error out since the argument
>> would be taken as the hostfile, leaving no executable specified.
>>
>> If you get that straightened out, then try adding -mca btl openib,sm,self
>> to the cmd line. This will direct mpirun to use only the OpenIB, shared
>> memory, and loopback transports, so you shouldn't pick up uDAPL any more.
>>
>> Ralph
>>
>>
>> On Nov 20, 2008, at 12:38 PM, Michael Oevermann wrote:
>>
>>> Hi all,
>>>
>>> I have "inherited" a small cluster with a head node and four compute
>>> nodes which I have to administer. The nodes are connected via infiniband
>>> (OFED), but the head is not. I am a complete novice to the infiniband stuff
>>> and here is my problem:
>>>
>>> The infiniband configuration seems to be OK. The usual tests suggested in
>>> the OFED install guide give the expected output, e.g.
>>>
>>>
>>> ibv_devinfo on the nodes:
>>>
>>>
>>> ************************* oscar_cluster *************************
>>> --------- n01---------
>>> hca_id: mthca0
>>> fw_ver: 1.2.0
>>> node_guid: 0002:c902:0025:930c
>>> sys_image_guid: 0002:c902:0025:930f
>>> vendor_id: 0x02c9
>>> vendor_part_id: 25204
>>> hw_ver: 0xA0
>>> board_id: MT_03B0140001
>>> phys_port_cnt: 1
>>> port: 1
>>> state: PORT_ACTIVE (4)
>>> max_mtu: 2048 (4)
>>> active_mtu: 2048 (4)
>>> sm_lid: 2
>>> port_lid: 1
>>> port_lmc: 0x00
>>>
>>> etc. for the other nodes.
>>>
>>> sminfo on the nodes:
>>>
>>> ************************* oscar_cluster *************************
>>> --------- n01---------
>>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6881 priority 0
>>> state 3 SMINFO_MASTER
>>> --------- n02---------
>>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6882 priority 0
>>> state 3 SMINFO_MASTER
>>> --------- n03---------
>>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6883 priority 0
>>> state 3 SMINFO_MASTER
>>> --------- n04---------
>>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6884 priority 0
>>> state 3 SMINFO_MASTER
>>>
>>>
>>>
>>> However, when I directly start an MPI job (without using a scheduler) via:
>>>
>>> /usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile
>>> /home/sysgen/infiniband-mpi-test/machine/usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
>>>
>>> I get the error message:
>>>
>>> [0,1,0]: uDAPL on host n01 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------
>>> [0,1,2]: uDAPL on host n01 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------
>>> [0,1,3]: uDAPL on host n02 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------
>>> [0,1,1]: uDAPL on host n02 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --------------------------------------------------------------------------
>>> MPI over the normal Gb Ethernet with IP networking works just fine, but
>>> infiniband does not. The MPI libs I am using for the test are definitely
>>> compiled with IB support, and the tests have been run successfully on
>>> the cluster before.
>>>
>>> Any suggestions what is going wrong here?
>>>
>>> Best regards and thanks for any help!
>>>
>>> Michael
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden] <mailto:users_at_[hidden]>
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmattox_at_[hidden] || timattox_at_[hidden]
    I'm a bright... http://www.the-brights.net/