
Subject: Re: [OMPI users] infiniband problem
From: Tim Mattox (timattox_at_[hidden])
Date: 2008-11-20 16:35:49


BTW - after you get more comfortable with your new-to-you cluster, I
recommend you upgrade your Open MPI installation. v1.2.8 has
a lot of bugfixes relative to v1.2.2. Also, Open MPI 1.3 should be
available "next month"... so watch for an announcement on that front.
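
(A quick way to check which version a given install actually is, and whether
the openib BTL is compiled into it, is ompi_info. Assuming the same install
prefix as in the thread below, something like:

/usr/mpi/gcc4/openmpi-1.2.2-1/bin/ompi_info | grep "Open MPI:"
/usr/mpi/gcc4/openmpi-1.2.2-1/bin/ompi_info | grep btl

should print the version string and the BTL components that were built in
(openib should appear among them); adjust the prefix to wherever the new
build ends up.)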

On Thu, Nov 20, 2008 at 3:16 PM, Michael Oevermann
<michael.oevermann_at_[hidden]> wrote:
> Hi Ralph,
>
> that was indeed a typo; the command is of course
>
> /usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile
> /home/sysgen/infiniband-mpi-test/machine
> /usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
>
> with a space after /machine. Anyway, your suggested option -mca btl
> openib,sm,self did help!!! Right now I am not able to check the performance
> results, as the cluster is busy with jobs, so I cannot compare with the old
> benchmark results.
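>
> For reference, the full command line that worked here (the corrected
> hostfile path plus your suggested BTL selection) was roughly:
>
> /usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 \
>     -hostfile /home/sysgen/infiniband-mpi-test/machine \
>     -mca btl openib,sm,self \
>     /usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1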
>
> Thanks for the help!
>
> Michael
>
>
> Ralph Castain wrote:
>>
>> Your command line may have just come across with a typo, but something
>> isn't right:
>>
>> -hostfile
>> /home/sysgen/infiniband-mpi-test/machine/usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
>>
>> That looks more like a path to a binary than a path to a hostfile. Is
>> there a missing space or filename somewhere?
>>
>> If not, then I would have expected this to error out since the argument
>> would be taken as the hostfile, leaving no executable specified.
>>
>> If you get that straightened out, then try adding -mca btl openib,sm,self
>> to the cmd line. This will direct mpirun to use only the OpenIB, shared
>> memory, and loopback transports, so you shouldn't pick up uDAPL any more.
>>
>> Ralph
>>
>>
>> On Nov 20, 2008, at 12:38 PM, Michael Oevermann wrote:
>>
>>> Hi all,
>>>
>>> I have "inherited" a small cluster with a head node and four compute
>>> nodes which I have to administer. The nodes are connected via InfiniBand
>>> (OFED), but the head node is not. I am a complete novice to InfiniBand,
>>> and here is my problem:
>>>
>>> The infiniband configuration seems to be OK. The usual tests suggested in
>>> the OFED install guide give the expected output, e.g.
>>>
>>>
>>> ibv_devinfo on the nodes:
>>>
>>>
>>> ************************* oscar_cluster *************************
>>> --------- n01---------
>>> hca_id: mthca0
>>> fw_ver: 1.2.0
>>> node_guid: 0002:c902:0025:930c
>>> sys_image_guid: 0002:c902:0025:930f
>>> vendor_id: 0x02c9
>>> vendor_part_id: 25204
>>> hw_ver: 0xA0
>>> board_id: MT_03B0140001
>>> phys_port_cnt: 1
>>> port: 1
>>> state: PORT_ACTIVE (4)
>>> max_mtu: 2048 (4)
>>> active_mtu: 2048 (4)
>>> sm_lid: 2
>>> port_lid: 1
>>> port_lmc: 0x00
>>>
>>> etc. for the other nodes.
>>>
>>> sminfo on the nodes:
>>>
>>> ************************* oscar_cluster *************************
>>> --------- n01---------
>>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6881 priority 0
>>> state 3 SMINFO_MASTER
>>> --------- n02---------
>>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6882 priority 0
>>> state 3 SMINFO_MASTER
>>> --------- n03---------
>>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6883 priority 0
>>> state 3 SMINFO_MASTER
>>> --------- n04---------
>>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6884 priority 0
>>> state 3 SMINFO_MASTER
>>>
>>>
>>>
>>> However, when I directly start an MPI job (without using a scheduler) via:
>>>
>>> /usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile
>>> /home/sysgen/infiniband-mpi-test/machine/usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
>>>
>>> I get the following error messages:
>>>
>>> [0,1,0]: uDAPL on host n01 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------
>>> [0,1,2]: uDAPL on host n01 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------
>>> [0,1,3]: uDAPL on host n02 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------
>>> [0,1,1]: uDAPL on host n02 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --------------------------------------------------------------------------
>>> MPI over normal Gigabit Ethernet and IP networking works just fine, but
>>> InfiniBand does not. The MPI libraries I am using for the test are
>>> definitely compiled with IB support, and the tests have been run
>>> successfully on this cluster before.
>>>
>>> Any suggestions as to what is going wrong here?
>>>
>>> Best regards and thanks for any help!
>>>
>>> Michael
>>>
>>>
>>>
>>>

-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmattox_at_[hidden] || timattox_at_[hidden]
    I'm a bright... http://www.the-brights.net/