Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ScaLapack and BLACS on Leopard
From: Gregory John Orris (gregory.orris_at_[hidden])
Date: 2008-03-06 12:20:06


Sorry for the long delay in response.

Let's get back to the beginning:
My original compiler configuration was gcc from the standard Leopard
Developer Tools supplied off the installation DVD. This version was
4.0.1. However, it has been significantly modified by Apple to work
with Leopard. If you haven't used Apple's Developer Environment,
you're missing out on something. It's pretty sweet. But the price you
pay for it is no Fortran support (not usually a problem for me, but it
is relevant here) and usually a somewhat time-lagged compiler. I'm not
as plugged into Apple as perhaps I should be, but I can only imagine
that their philosophy is to really over-test their compiler. Gratis,
Apple throws into its "frameworks" a shared library called vecLib,
which includes machine-optimized BLAS and CLAPACK routines. Also, with
Leopard, Apple has integrated Open MPI (yea!). But they have once
again not included Fortran support (boo!).

Now, to get Fortran on a Mac you have several options (most of which
cannot really survive the cost-benefit analysis of a competent
manager), but a perfectly fine freeware option is to get it off of
hpc.sourceforge.net. This version is based on gcc 4.3.0. There are a
few legitimate reasons to stick with Apple's older gcc, since it's not
really a good idea to try to mix libraries from one compiler version
with another. That matters especially here, because (without knowing
precisely what Apple has done) there is a tremendous difference in
execution speed between code compiled with gcc 4.0 and 4.1 and code
compiled with 4.2 and later. (This has been well documented on many
systems.) Also, out of a bit of laziness, I really didn't want to go
to the trouble of rewriting (or finding) all of the compiler scripts
in the Developer Environment to use the new gcc.

So, I compiled open-mpi-1.2.5 with gcc/g++ 4.0.1 and gfortran 4.3.
Then, I compiled BLACS and ScaLAPACK using the configuration from the
Open MPI FAQ page. Everything compiles perfectly OK, independent of
whether you choose 32- or 64-bit addressing. The first problem was
that I was still calling mpicc from the Apple-supplied Open MPI and
mpif77 from the newly installed distribution. Once again, I've not a
clue what Apple has done, but while the two would compile things
together, they DO NOT COMMUNICATE properly in 64-bit mode. Even
MPI_COMM_WORLD in the Open MPI test routines would fail! This is the
point at which I originated the message asking if anyone had gotten a
64-bit version to actually work. The errors were in libSystem and were
not what I'd expect from a simple Open MPI error. I believe this
problem is caused by a difference in how pointers were/are treated
within gcc from version to version. Thus mixing versions essentially
caused failures between the Apple-supplied Open MPI distribution and
the new one I installed.
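
(For reference, here is a minimal sanity check of the sort that was
failing, written in plain C against MPI_COMM_WORLD. The file name and
the -m64 flag are only illustrative; the point is that it has to be
built and run with one consistent set of wrappers, end to end.)

/* mpi_sanity.c -- minimal MPI_COMM_WORLD check, a sketch of the kind
 * of test that failed when the Apple mpicc and the hpc.sourceforge.net
 * mpif77 were mixed.  Build with one consistent wrapper set, e.g.:
 *   mpicc -m64 mpi_sanity.c -o mpi_sanity && mpirun -np 2 ./mpi_sanity
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size > 1) {
        if (rank == 0) {
            token = 42;
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
    }

    printf("rank %d of %d, token = %d\n", rank, size, token);
    MPI_Finalize();
    return 0;
}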

How to get over this hurdle? Install the complete gcc 4.3.0 from the
hpc.sourceforge.net site and recompile EVERYTHING!

You might think you were done here, but there is one (or actually
four) additional problem(s). Now NONE of the complex routines worked.
All of the test routines returned failure. I tracked it down to the
fact that pzdotc, pzdotu, pcdotc, and pcdotu inside of the PBLAS
routines were failing. Potentially this was a much more difficult
problem, since rewriting these codes is really not what I'm paid to
do. Tracing down these errors further, I found that the actual problem
is with the zdotc, zdotu, cdotc, and cdotu BLAS routines inside of
Apple's vecLib. So it seemed as though a faulty, manufacturer-supplied,
optimized library was not functioning properly. Well, as it turns out,
there is a peculiar difference (again) between versions of the gcc
suite in how they handle return values from complex Fortran functions
(I'm only assuming this, since the workaround was successful). This
problem has been known for some time now (perhaps 4 years or more).
See http://developer.apple.com/hardware/ve/errata.html#fortran_conventions
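
(To make the convention issue concrete: from C, the safest way to call
these dot products is through the CBLAS interface, where the result
comes back through a pointer argument rather than as a complex return
value, so no Fortran return convention is involved at all. The sketch
below assumes a cblas.h from the netlib/ATLAS CBLAS build; the link
line is only illustrative, and this is an illustration of the
convention point, not the fix described below.)

/* zdotc_check.c -- sketch: computing conj(x).y via CBLAS so the result
 * is delivered through the last argument instead of a complex return
 * value (the part that differs between gcc/gfortran versions).
 *   gcc -m64 zdotc_check.c -lcblas -latlas -lm   (illustrative)
 */
#include <complex.h>
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    double complex x[2] = { 1.0 + 1.0 * I, 2.0 - 1.0 * I };
    double complex y[2] = { 0.5 + 0.0 * I, 1.0 + 2.0 * I };
    double complex dot  = 0.0;

    /* Result is written through &dot; nothing is "returned" as a
     * complex value, so the calling convention mismatch never bites. */
    cblas_zdotc_sub(2, x, 1, y, 1, &dot);

    printf("zdotc = %g + %gi\n", creal(dot), cimag(dot));
    return 0;
}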

How to get over this hurdle? Install ATLAS, CLAPACK, and CBLAS off the
netlib.org web site, and compile them with the gcc 4.3.0 suite.

So, where am I now? BLACS, ScaLAPACK, and PBLAS work in 64-bit mode
with CLAPACK-3.1.1, ATLAS 3.8.1, Open MPI 1.2.5, and gcc 4.3.0, and
they link against ATLAS and CLAPACK and NOT vecLib!

Long way of saying that the problem appears to be solved, but not well
documented (until now)!

Regards,
Greg

On Mar 6, 2008, at 8:25 AM, Terry Dontje wrote:

> Ok, I think I found the cause of the SPARC segv when trying to use a
> 64-bit compiled Open MPI library. If one does not set the WHATMPI
> variable in the Bmake.inc, it defaults to UseF77Mpi, which assumes all
> handles are ints. This is a correct assumption if you are using the
> F77 interfaces, but the way BLACS seems to compile for Open MPI, it
> uses the C versions. So the handles are stored as 32 bits in BLACS and
> passed to the C Open MPI interfaces, which expect 64 bits. In cases
> where your addresses need more than 32 bits, this will cause MPI to
> segv when passed an invalid address due to this coercion.
>
> So by setting "WHATMPI= -DUseCMpi" I've gotten the SPARC version of
> BLACS compiled for 64 bits to pass its tests without segv'ing. I do
> believe this issue actually exists for other platforms (i.e., AMD64
> and IA64) with other OSes and compilers. It's just that we've been
> lucky that MPI_COMM_WORLD is allocated such that it has an address
> that fits in 32 bits. I am amazed still that we haven't seen this
> fail in user codes.
> Note, I have not confirmed this failure with a test case, but the
> code stack in dbx looks the same on X64 platforms as the code on
> SPARC, except that the address is smaller on the former.
>
> Greg, I would be interested in knowing if you are still seeing the
> problem on Leopard and whether the above setting helps any.
>
> --td
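
(To make the coercion Terry describes concrete: in Open MPI's C
bindings, MPI_Comm is a pointer type, so storing the handle in a
32-bit int and handing it back to the C interface silently truncates
the address in a 64-bit build. A minimal sketch, assuming those
bindings:)

/* handle_coercion.c -- sketch of the 32-bit handle round trip,
 * assuming Open MPI's C bindings where MPI_Comm is a pointer.
 * It only "works" while the communicator happens to sit at an
 * address that fits in 32 bits; otherwise the recovered handle is
 * an invalid address and MPI segfaults when it is used. */
#include <mpi.h>
#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm real = MPI_COMM_WORLD;

    int squeezed  = (int)(intptr_t)real;           /* upper 32 bits dropped */
    MPI_Comm back = (MPI_Comm)(intptr_t)squeezed;  /* possibly bad address  */

    printf("original %p, after 32-bit round trip %p (%s)\n",
           (void *)real, (void *)back,
           real == back ? "lucky this time" : "would segv if used");

    MPI_Finalize();
    return 0;
}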
>
>> Subject: Re: [OMPI users] ScaLapack and BLACS on Leopard
>> From: Terry Dontje (Terry.Dontje_at_[hidden])
>> Date: 2008-03-03 07:34:17
>>
>> What kind of system lib errors are you seeing, and do you have a
>> stack trace? Note, I was trying something similar with Solaris and
>> 64-bit on a SPARC machine and was seeing segv's inside the MPI
>> library due to a pointer being passed through an integer (thus
>> dropping the upper 32 bits). Funny thing is, it all works under
>> Solaris on AMD64 or IA-64 platforms.
>>
>> --td
>>
>>> Date: Thu, 28 Feb 2008 17:50:28 -0500
>>> From: Gregory John Orris <gregory.orris_at_[hidden]>
>>> Subject: [OMPI users] ScaLapack and BLACS on Leopard
>>> To: Open MPI Users <users_at_[hidden]>
>>> Message-ID: <528FD4C0-6157-49CB-80E6-1C62684E4545_at_[hidden]>
>>> Content-Type: text/plain; charset="us-ascii"
>>>
>>> Hey Folks,
>>>
>>> Anyone got ScaLapack and BLACS working and not just compiled under
>>> OSX10.5 in 64-bit mode?
>>> The FAQ site directions were followed and everything compiles just
>>> fine. But ALL of the single precision routines and many of the
>>> double precision routines in the TESTING directory fail with system
>>> lib errors.
>>>
>>> I've gotten some interesting errors and am wondering what the magic
>>> touch is.
>>>
>>> Regards,
>>> Greg
>>>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>