
Subject: Re: [OMPI users] Problem with openmpi and infiniband
From: doriankrause (doriankrause_at_[hidden])
Date: 2008-12-23 17:31:53


Hi

Biagio Lucini wrote:
> Hello,
>
> I am new to this list, where I hope to find a solution for a problem
> that I have been having for quite a long time.
>
> I run various versions of openmpi (from 1.1.2 to 1.2.8) on a cluster
> with Infiniband interconnects that I use and administer at the same
> time. The OpenFabrics stack is OFED-1.2.5, the compilers are gcc 4.2 and
> Intel. The queue manager is SGE 6.0u8.
>
> The trouble is with an MPI code that runs fine with an openmpi 1.1.2
> library compiled without infiniband support (I have tested the
> scalability of the code up to 64 cores, the nodes are 4 or 8 cores,
> the results are exactly what I expect), but if I try to use a version
> compiled for infiniband, then only a subset of communications (the ones
> connecting cores in the same node) are enabled, and because of this
> the program fails (gets stuck in a perpetual waiting phase, in
> particular). This happens with any combination of compilers/library
> releases (1.1.2, 1.2.7, 1.2.8) I have tried. On other codes, and in
> particular on benchmarks downloaded from the net, openmpi over
> infiniband seems to work (I compared the latency with the tcp btl, so
> I am pretty sure that infiniband works). The two variables I kept
> fixed are SGE and the OFED module stack. I would prefer not to touch
> them, if possible, because the cluster seems to run fine for other
> purposes.
>
> My question is: does anyone have a suggestion on what I could try next?
> I'm pretty sure that to get an answer I need to provide more details,
> which I am willing to do, but in more than two months of
> testing/trying/hoping/praying I have accumulated so much material and
> information that if I post everything in this e-mail I am likely to
> confuse a potential helper rather than help him understand the
> problem.

Does the problem only show up with Open MPI? Did you try MVAPICH
(http://mvapich.cse.ohio-state.edu/) to test whether it is a hardware or a
software problem? (I don't know of any other open-source MPI implementation
that supports InfiniBand.)
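
Independent of the MPI library, a minimal two-rank ping-pong is usually
enough to see whether inter-node messages get through at all. Something
along these lines (just a quick sketch typed here, not taken from your
code; file and host names below are placeholders):

/* pingpong.c - hypothetical minimal inter-node test.
 * Rank 0 sends an integer to rank 1 and waits for the echo.
 * If this completes within a node but hangs across nodes, the
 * problem is in the inter-node transport rather than in the code. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, token = 42;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
            printf("rank 0: ping-pong completed, token = %d\n", token);
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}

You can compile it with the mpicc of either stack. With Open MPI you can
also force the InfiniBand path explicitly, e.g.

  mpicc pingpong.c -o pingpong
  mpirun -np 2 -hostfile two_nodes --mca btl openib,self ./pingpong

so that a hang or an error message points at the openib BTL directly,
instead of a silent fallback to tcp.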

Dorian

>
> Thank you in advance,
> Biagio Lucini
>