Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi/openib problems
From: jessie puls (pulsj_at_[hidden])
Date: 2008-02-20 12:53:46


Specifically, jobs are never handed off to other nodes. Running

mpirun -mca btl openib,self -np 20 /bin/hostname

will return the same hostname 20 times, even if I specify -bynode as an
argument. The nodes are Debian systems with two dual-core processors each,
and have the most recent OpenFabrics user and kernel packages from
openfabrics.org installed. I'm running a 2.6.18 kernel. My subnet
manager is on my switch, which is a Cisco SFS 7000. Also, as I
mentioned earlier, everything is OK when I am using IPoIB, but switching
to verbs is giving me a lot of problems.
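
To make the symptom above easier to read at a glance, the hostnames can be tallied per node. This is only a sketch: count_hosts is a hypothetical helper name, and the hostfile ./hosts in the usage line is an assumption, since the post does not say whether a hostfile or a resource manager allocation was in use. Without either one, mpirun starts every process on the local machine, which would produce the same output.

```shell
# Hypothetical helper: reads one hostname per line on stdin and
# prints a count per distinct host, busiest host first.
count_hosts() {
    sort | uniq -c | sort -rn
}

# Usage sketch (./hosts is an assumed hostfile, not from the original post):
# mpirun --hostfile ./hosts -mca btl openib,self -np 20 /bin/hostname | count_hosts
```

A single output line with a count of 20 means every rank landed on one node; several lines mean the ranks were spread out.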

Output from ibv_devinfo:

hca_id: mthca0
         fw_ver: 1.2.0
         node_guid: 0030:487c:a278:0000
         sys_image_guid: 0030:487c:a278:0003
         vendor_id: 0x02c9
         vendor_part_id: 25204
         hw_ver: 0xA0
         board_id: SM_0000000003
         phys_port_cnt: 1
                 port: 1
                         state: PORT_ACTIVE (4)
                         max_mtu: 2048 (4)
                         active_mtu: 2048 (4)
                         sm_lid: 2
                         port_lid: 9
                         port_lmc: 0x00

With the obvious exception of node_guid and sys_image_guid, this is
the same across all of the nodes. I'm also attaching config.log and the
output from ompi_info --all.
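
A comparison like the one described above can be sketched as follows. strip_guids is a hypothetical helper, node01 and node02 are placeholder hostnames, and passwordless ssh between the nodes is assumed. Any other per-port value that differs between nodes, such as port_lid, will also show up in the diff.

```shell
# Hypothetical helper: drop the two GUID fields that are expected to
# differ per node, so the rest of the ibv_devinfo output can be compared.
strip_guids() {
    grep -v -E 'node_guid|sys_image_guid'
}

# Usage sketch (hostnames and temp paths are placeholders):
# ssh node01 ibv_devinfo | strip_guids > /tmp/node01.txt
# ssh node02 ibv_devinfo | strip_guids > /tmp/node02.txt
# diff /tmp/node01.txt /tmp/node02.txt
```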

ulimit -l reports unlimited

Jeff Squyres wrote:
> Can you be more specific about what problems you're seeing?
>
> http://www.open-mpi.org/community/help/
>
> Note that the rdma mpool is the plugin that is used by the openib btl;
> there is no openib mpool (there used to be, but its functionality got
> generalized and put into the "rdma" plugin).
>
>
>
> On Feb 19, 2008, at 12:35 PM, jessie puls wrote:
>
>> jessie puls wrote:
>>> Hi all,
>>>
>>> I'm having problems getting openmpi to work correctly using verbs on
>>> some systems. It's been working using openib for quite some time, but
>>> I need to get it working using verbs for some research I'm doing.
>>
>> This would make a whole lot more sense if I'd typed it correctly.
>> It's been working using ipoib.
>>
>>
>>> Anyway, all seems to be good on the openib side of things. ibv_devinfo
>>> and ibv_devices return device information, and the devices are listed
>>> as active on each node. Also, all hosts are visible to each other
>>> (ibhosts shows a full list).
>>>
>>> The problem I see with openmpi is I have the openib btl, but not the
>>> openib mpool, and when looking at the contents of ompi/mca/mpool/ I
>>> don't see openib there (sm and rdma are both listed and ompi_info
>>> shows they've been included in the build). Any help would be
>>> appreciated.
>>>
>>> Thanks,
>>>
>>> Jessie
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>