OK, I think I found the cause of the SPARC segv when trying to use a
64-bit compiled Open MPI library. If you do not set the WHATMPI
variable in Bmake.inc, it defaults to UseF77Mpi, which assumes all
handles are ints. That assumption is correct if you are using the F77
interfaces, but the way BLACS seems to compile for Open MPI, it uses the C
versions. So the handles are stored as 32 bits in BLACS and passed to
the C Open MPI interfaces, which expect 64 bits. When a handle's
address needs more than 32 bits, this coercion drops the upper bits, and
MPI segvs when it is passed the resulting invalid address.
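To make the truncation concrete, here is a minimal C sketch of what I believe is happening. It does not use MPI at all: the function names and the two address values are made up for illustration, and it only models a 64-bit handle value being stored in a 32-bit slot and widened again.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: model a 64-bit handle (an address) being stored
 * in a 32-bit integer slot and then handed back to a 64-bit interface. */
uint64_t through_32_bit_slot(uint64_t handle_bits)
{
    uint32_t slot = (uint32_t)handle_bits;  /* upper 32 bits are dropped here */
    return (uint64_t)slot;                  /* widened again; lost bits stay zero */
}

/* Returns 1 if the handle comes back unchanged, 0 if it was corrupted. */
int handle_survives(uint64_t handle_bits)
{
    return through_32_bit_slot(handle_bits) == handle_bits;
}

/* Both addresses below are invented examples, not values taken from dbx. */
void self_check(void)
{
    /* A low address fits in 32 bits, so the round trip is harmless. */
    assert(handle_survives(UINT64_C(0x0000000000601040)));

    /* A high address loses its upper half and now points somewhere else. */
    assert(!handle_survives(UINT64_C(0xffffffff7f001040)));
}
```

Calling self_check() from a small main() should run without tripping either assert. A handle that happens to sit below 4 GB survives the round trip, so the same code can appear to work on one platform and segv on another.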
By setting "WHATMPI= -DUseCMpi" I've gotten the SPARC version of
BLACS compiled for 64 bits to pass its tests without segv'ing. I
believe this issue also exists on other platforms (i.e. AMD64 and
IA64) with other OSes and compilers; we've just been lucky that
MPI_COMM_WORLD is allocated at an address that fits in 32
bits. I am still amazed that we haven't seen this fail in user codes.
Note that I have not confirmed this failure with a test case, but the
stack in dbx looks the same on X64 platforms as on SPARC, except that
the address is smaller on the former.
Greg, I would be interested in knowing if you are still seeing the
problem on Leopard and whether the above setting helps any.
> *Subject:* Re: [OMPI users] ScaLapack and BLACS on Leopard
> *From:* Terry Dontje (/Terry.Dontje_at_[hidden]/)
> *Date:* 2008-03-03 07:34:17
> What kind of system lib errors are you seeing and do you have a stack
> trace? Note, I was trying something similar with Solaris and 64-bit on
> a SPARC machine and was seeing segv's inside the MPI Library due to a
> pointer being passed through an integer (thus dropping the upper 32
> bits). Funny thing is it all works under Solaris on AMD64 or IA-64.
> > Date: Thu, 28 Feb 2008 17:50:28 -0500
> > From: Gregory John Orris <gregory.orris_at_[hidden]>
> > Subject: [OMPI users] ScaLapack and BLACS on Leopard
> > To: Open MPI Users <users_at_[hidden]>
> > Message-ID: <528FD4C0-6157-49CB-80E6-1C62684E4545_at_[hidden]>
> > Content-Type: text/plain; charset="us-ascii"
> > Hey Folks,
> > Anyone got ScaLapack and BLACS working and not just compiled under
> > OSX10.5 in 64-bit mode?
> > The FAQ site directions were followed and everything compiles just
> > fine. But ALL of the single precision routines and many of the double
> > precision routines in the TESTING directory fail with system lib
> > errors.
> > I've gotten some interesting errors and am wondering what the magic
> > touch is.
> > Regards,
> > Greg