Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI locking up only on IB
From: Pavel Shamis (Pasha) (pasha_at_[hidden])
Date: 2008-07-03 11:40:12

Brock Palen wrote:
> Ok it looks like a bigger problem. The segfault is not related to
> OMPI because when I go and rebuild 1.2 or another version we use with
> IB all the time, it will now fail with a segfault when forcing IB.
> The old libs of the same version still work. They of course do not
> have the flag to turn off early completion.
> Was there an older version of OpenMPI that did not suffer from the
> early completion problem?
The issue was fixed in the 1.3 branch; all versions before 1.3 have this
issue.
> We have many installed and for a quick test latest and greatest would
> not be of much concern while we track down the problem on our end.
> We are on RHEL4 using OFED provided by redhat. The error is "address
> not mapped to object"
I think the best option for you is to install the Mellanox OFED
distribution, which already includes pre-built versions of Open MPI 1.2.6
for the Intel and PGI compilers.

> Brock Palen
> Center for Advanced Computing
> brockp_at_[hidden]
> (734)936-1985
> On Jul 3, 2008, at 8:38 AM, Jeff Squyres wrote:
>> On Jul 2, 2008, at 11:51 PM, Pavel Shamis (Pasha) wrote:
>>>> In trying to build 1.2.6 with the pgi compilers it makes an MPI
>>>> library that works with tcp, sm. But it segfaults on openib.
>>>> Both our intel compiler version and pgi version of 1.2.6 blow up
>>>> like this when we force IB. So this is a new issue.
>>> I have ompi 1.2.6 installed on my machines with the Intel compiler
>>> (version 10.1) and the PGI compiler (version 7.1-5); both of them work
>>> with IB without any problem. BTW, Mellanox provides a Mellanox OFED
>>> binary distribution that includes Intel and PGI builds of Open MPI 1.2.6.
>>> You can download it from here
>>>> Is there a way to shut off early completion in 1.2.3?
>>> Sure, just add "--mca pml_ob1_use_early_completion 0" to your
>>> command line.
>> Note that this flag was not added until v1.2.6; it has no effect in
>> v1.2.3.
>>>> Or is the above a known issue, and should I use 1.2.7-pre or grab
>>>> a 1.3 snapshot?
>>> 1.2.6 should be ok.
>> The upcoming v1.3 series works a little differently; there's no need
>> to use this flag in the v1.3 series (i.e., this flag only exists in
>> the v1.2 series starting with v1.2.6).
>> --
>> Jeff Squyres
>> Cisco Systems
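
For reference, a minimal sketch of the command line discussed in this thread. The application name (./my_mpi_app) and process count are hypothetical placeholders. The thread does not say how IB was forced, so the "--mca btl openib,self" selection is an assumption about that step. Per the thread, the early-completion parameter only exists in the v1.2 series starting with v1.2.6.

```shell
#!/bin/sh
# Hypothetical application and process count; substitute your own.
APP=./my_mpi_app
NP=4

# Assumption: IB is forced by restricting the BTL list to openib plus self.
BTL_ARGS="--mca btl openib,self"

# From the thread: disable early completion (v1.2.6 and later v1.2.x only).
EARLY_ARGS="--mca pml_ob1_use_early_completion 0"

# Assemble the full command and print it for inspection before running it.
CMD="mpirun -np $NP $BTL_ARGS $EARLY_ARGS $APP"
echo "$CMD"

# To launch for real, uncomment the next line:
# $CMD
```

Before relying on the flag, it may be worth checking that the installed build actually knows the parameter, for example by searching the output of "ompi_info --param pml ob1" for "early_completion".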