Open MPI User's Mailing List Archives

From: Michael Kluskens (mklus_at_[hidden])
Date: 2006-10-03 10:53:58


Summary:

OpenMPI 1.1.1 and 1.3a1r11943 have different bugs with regard to
BLACS 1.1p3.

1.3 fails where 1.1.1 passes and vice versa.

(1.1.1): The integer, real, and double precision SDRV tests fail cases
1 & 51, followed by lots of errors until the Integer SUM test, after
which all tests pass.

(1.3): No errors until it crashes on the Complex AMX test (which is
after the Integer Sum test).

System configuration: Debian 3.1r3 on a dual Opteron, gcc 3.3.5, Intel
ifort 9.1.032.

On Oct 3, 2006, at 2:44 AM, Åke Sandgren wrote:

> On Mon, 2006-10-02 at 18:39 -0400, Michael Kluskens wrote:
>> OpenMPI, BLACS, and blacstester built just fine. Tester reports
>> errors for integer and real cases #1 and #51 and more for the other
>> types..
>>
>> <http://svn.open-mpi.org/trac/ompi/ticket/356> is an open ticket
>> related to this.
>
> Finally someone else with the same problem!!!
>
> I tried the suggested fix from ticket 356 but it didn't help.
> I still get lots of errors in the blacstest.
>
> I'm running on a dual-cpu opteron with Ubuntu dapper and gcc-4.0.
> The tests also failed on our i386 Ubuntu breezy system with gcc-3.4

More details of my two tests:
--------------------------------
OpenMPI 1.1.1
./configure --prefix=/opt/intel9.1/openmpi/1.1.1 F77=ifort FC=ifort \
    --with-mpi-f90-size=medium

BLACS 1.1 patch 3, Bmake.inc based on Bmake.MPI-LINUX with the
following changes:

BTOPdir = /opt/intel9.1/openmpi/1.1.1/BLACS
BLACSDBGLVL = 1
MPIdir = /opt/intel9.1/openmpi/1.1.1
MPILIB =
INTFACE = -DAdd_
F77 = $(MPIdir)/bin/mpif77
CC = $(MPIdir)/bin/mpicc
CCFLAGS = -O3

--------------------------------
OpenMPI 1.3a1r11943
./configure --prefix=/opt/intel9.1/openmpi/1.3 F77=ifort FC=ifort \
    --with-mpi-f90-size=medium

Similar changes were made to Bmake.inc in BLACS.

The test was launched in BLACS/TESTING/EXE using:

mpirun --prefix /opt/intel9.1/openmpi/1.3 -np 4 xCbtest_MPI-LINUX-1

This works much better, with no errors at first, but it eventually fails with:

COMPLEX AMX TESTS: BEGIN.
Signal:11 info.si_errno:0(Success) si_code:128()
Failing at addr:(nil)
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0xb80000
[0] func:/opt/intel9.1/openmpi/1.3/lib/libopal.so.0
(opal_backtrace_print+0x1f) [0x2a95aa5c1f]
*** End of error message ***
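
For anyone comparing runs across the two OpenMPI builds, one rough way to check a saved tester log for this kind of crash is to count the "Signal:11" lines. The sketch below assumes the log looks like the output quoted above; the helper name count_segv and the file name blacs.log are made up for illustration and are not part of OpenMPI or BLACS.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenMPI or BLACS): print how many
# lines of a saved tester log contain "Signal:11".
# Assumption: the log format matches the crash output quoted above.
count_segv() {
    # grep -c prints the number of matching lines. It exits non-zero
    # when that number is 0, so "|| true" keeps the helper usable in
    # scripts that run under "set -e".
    grep -c 'Signal:11' "$1" || true
}

# Example use (blacs.log is an assumed file name):
#   mpirun --prefix /opt/intel9.1/openmpi/1.3 -np 4 xCbtest_MPI-LINUX-1 > blacs.log 2>&1
#   count_segv blacs.log
```

A count of zero only says that no segfault message was logged; it says nothing about the SDRV failures seen with 1.1.1, which the tester reports separately.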

Michael