Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with openmpi and infiniband
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-12-25 08:04:23


Another thing to try is a change that we made late in the Open MPI
v1.2 series with regard to InfiniBand:

     http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion
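For reference, that FAQ entry concerns the "early completion" optimization added to the openib transport late in the v1.2 series, which can hang applications that do not drive MPI progress frequently. A sketch of the run-time workaround, assuming the MCA parameter name is as that FAQ entry describes (the application name is a placeholder):

```shell
# Disable the ob1 PML's early-completion optimization for this run.
# Parameter name assumed from the FAQ entry above; ./my_mpi_app is
# a placeholder for the actual application binary.
mpirun --mca pml_ob1_use_early_completion 0 -np 6 ./my_mpi_app
```

The same setting can also be placed in an MCA parameter file or exported via the environment rather than given on every command line.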

On Dec 24, 2008, at 10:07 PM, Tim Mattox wrote:

> For your runs with Open MPI over InfiniBand, try using openib,sm,self
> for the BTL setting, so that shared memory communications are used
> within a node. It would give us another datapoint to help diagnose
> the problem. As for other things we would need to help diagnose the
> problem, please follow the advice on this FAQ entry, and the help
> page:
> http://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot
> http://www.open-mpi.org/community/help/
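[The BTL list Tim suggests is passed through mpirun's MCA option; a minimal sketch, with the application name and process count as placeholders:]

```shell
# Use InfiniBand (openib) between nodes, shared memory (sm) within
# a node, and loopback (self) for a rank talking to itself.
# ./my_mpi_app and -np 6 are placeholders for the real job.
mpirun --mca btl openib,sm,self -np 6 ./my_mpi_app
```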
>
> On Wed, Dec 24, 2008 at 5:55 AM, Biagio Lucini
> <B.Lucini_at_[hidden]> wrote:
>> Pavel Shamis (Pasha) wrote:
>>>
>>> Biagio Lucini wrote:
>>>>
>>>> Hello,
>>>>
>>>> I am new to this list, where I hope to find a solution for a
>>>> problem
>>>> that I have been having for quite a long time.
>>>>
>>>> I run various versions of openmpi (from 1.1.2 to 1.2.8) on a
>>>> cluster
>>>> with InfiniBand interconnects that I use and administer at the same
>>>> time. The OpenFabrics stack is OFED-1.2.5, the compilers are gcc 4.2 and
>>>> Intel. The queue manager is SGE 6.0u8.
>>>
>>> Do you use the Open MPI version that is included in OFED? Were you
>>> able
>>> to run basic OFED/OMPI tests/benchmarks between two nodes?
>>>
>>
>> Hi,
>>
>> yes to both questions: the OMPI version is the one that comes with
>> OFED
>> (1.1.2-1) and the basic tests run fine. For instance, IMB-MPI1
>> (which is
>> more than basic, as far as I can see) reports for the last test:
>>
>> #---------------------------------------------------
>> # Benchmarking Barrier
>> # #processes = 6
>> #---------------------------------------------------
>>  #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>>          1000        22.93        22.95        22.94
>>
>>
>> for the openib,self btl (6 processes, all processes on different
>> nodes)
>> and
>>
>> #---------------------------------------------------
>> # Benchmarking Barrier
>> # #processes = 6
>> #---------------------------------------------------
>>  #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>>          1000       191.30       191.42       191.34
>>
>> for the tcp,self btl (same test)
>>
>> No anomalies for other tests (ping-pong, all-to-all etc.)
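[The Barrier figures above are standard Intel MPI Benchmarks (IMB-MPI1) output; a run like the one quoted is typically launched along these lines, with the hostfile name and binary path as placeholders:]

```shell
# 6 processes spread across nodes (hostfile name is a placeholder),
# InfiniBand transport, running only the Barrier benchmark:
mpirun --mca btl openib,self -np 6 --hostfile hosts ./IMB-MPI1 Barrier

# Same test over TCP for comparison:
mpirun --mca btl tcp,self -np 6 --hostfile hosts ./IMB-MPI1 Barrier
</imports>
```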
>>
>> Thanks,
>> Biagio
>>
>>
>> --
>> =========================================================
>>
>> Dr. Biagio Lucini
>> Department of Physics, Swansea University
>> Singleton Park, SA2 8PP Swansea (UK)
>> Tel. +44 (0)1792 602284
>>
>> =========================================================
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
>
> --
> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
> tmattox_at_[hidden] || timattox_at_[hidden]
> I'm a bright... http://www.the-brights.net/

-- 
Jeff Squyres
Cisco Systems