Open MPI User's Mailing List Archives

Subject: [OMPI users] SCALAPACK: Segmentation Fault (11) and Signal code: Address not mapped (1)
From: Backlund, Daniel (daniel.backlund_at_[hidden])
Date: 2008-01-22 16:06:24


Hello all,

I am using OMPI 1.2.4 on a Linux cluster (Rocks 4.2). OMPI was configured to use the PathScale
Compiler Suite, which is installed in /home/PROGRAMS/pathscale (NFS-mounted on the nodes). I am
trying to compile and run the example1.f that comes with the ACML package from AMD, but I am
unable to get it to run. All nodes have the same Opteron processors and 2 GB of RAM per core.
OMPI was configured as follows:

export CC=pathcc
export CXX=pathCC
export FC=pathf90
export F77=pathf90

./configure --prefix=/opt/openmpi/1.2.4 --enable-static --without-threads --without-memory-manager \
  --without-libnuma --disable-mpi-threads

The configuration and the install were both successful, and I can even run a sample mpihello.f90
program. I would eventually like to link the ACML SCALAPACK and BLACS libraries to our code, but I
need some help. The ACML version is 3.1.0 for pathscale64. I went into the scalapack_examples
directory, set the correct values in GNUmakefile, and compiled successfully. I built OMPI into an
RPM and pushed it to the nodes, modified LD_LIBRARY_PATH and PATH, and made sure the installation
is visible on all nodes. When I try to run the generated example1.exe using
/opt/openmpi/1.2.4/bin/mpirun -np 6 example1.exe
I get the following output:

<<<< example1.res >>>>

[XXXXXXX:31295] *** Process received signal ***
[XXXXXXX:31295] Signal: Segmentation fault (11)
[XXXXXXX:31295] Signal code: Address not mapped (1)
[XXXXXXX:31295] Failing at address: 0x44000070
[XXXXXXX:31295] *** End of error message ***
[XXXXXXX:31298] *** Process received signal ***
[XXXXXXX:31298] Signal: Segmentation fault (11)
[XXXXXXX:31298] Signal code: Address not mapped (1)
[XXXXXXX:31298] Failing at address: 0x44000070
[XXXXXXX:31298] *** End of error message ***
[XXXXXXX:31299] *** Process received signal ***
[XXXXXXX:31299] Signal: Segmentation fault (11)
[XXXXXXX:31299] Signal code: Address not mapped (1)
[XXXXXXX:31299] Failing at address: 0x44000070
[XXXXXXX:31299] *** End of error message ***
[XXXXXXX:31300] *** Process received signal ***
[XXXXXXX:31300] Signal: Segmentation fault (11)
[XXXXXXX:31300] Signal code: Address not mapped (1)
[XXXXXXX:31300] Failing at address: 0x44000070
[XXXXXXX:31300] *** End of error message ***
[XXXXXXX:31296] *** Process received signal ***
[XXXXXXX:31296] Signal: Segmentation fault (11)
[XXXXXXX:31296] Signal code: Address not mapped (1)
[XXXXXXX:31296] Failing at address: 0x44000070
[XXXXXXX:31296] *** End of error message ***
[XXXXXXX:31297] *** Process received signal ***
[XXXXXXX:31297] Signal: Segmentation fault (11)
[XXXXXXX:31297] Signal code: Address not mapped (1)
[XXXXXXX:31297] Failing at address: 0x44000070
[XXXXXXX:31297] *** End of error message ***
mpirun noticed that job rank 0 with PID 31295 on node XXXXXXX.ourdomain.com exited on signal 11 (Segmentation fault).
5 additional processes aborted (not shown)

<<<< end example1.res >>>>
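
To make the pattern in the log above easier to see, here is a minimal sketch that counts how many
ranks failed at each address. The helper name count_fault_addresses is made up for illustration,
and the sketch assumes the output was saved to a file named example1.res.

```shell
#!/bin/sh
# Sketch: summarize an Open MPI signal report by failing address.
# count_fault_addresses is a hypothetical helper name, not an Open MPI tool.
count_fault_addresses() {
    # Read the log on stdin, keep only the hex address from each
    # "Failing at address:" line, then print "<count> <address>"
    # for every distinct address.
    sed -n 's/.*Failing at address: \(0x[0-9a-fA-F]*\).*/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage (assumes the run output was saved as example1.res):
#   count_fault_addresses < example1.res
```

For the log above this reports a single address, 0x44000070, for all six ranks, so every process
seems to fail in the same way rather than one rank taking the others down.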

Here is the result of ldd example1.exe

<<<< ldd example1.exe >>>>
        libmpi_f90.so.0 => /opt/openmpi/1.2.4/lib/libmpi_f90.so.0 (0x0000002a9557d000)
        libmpi_f77.so.0 => /opt/openmpi/1.2.4/lib/libmpi_f77.so.0 (0x0000002a95681000)
        libmpi.so.0 => /opt/openmpi/1.2.4/lib/libmpi.so.0 (0x0000002a957b3000)
        libopen-rte.so.0 => /opt/openmpi/1.2.4/lib/libopen-rte.so.0 (0x0000002a959fb000)
        libopen-pal.so.0 => /opt/openmpi/1.2.4/lib/libopen-pal.so.0 (0x0000002a95be7000)
        librt.so.1 => /lib64/tls/librt.so.1 (0x0000003e7cd00000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003e7c200000)
        libutil.so.1 => /lib64/libutil.so.1 (0x0000003e79e00000)
        libmv.so.1 => /home/PROGRAMS/pathscale/lib/3.0/libmv.so.1 (0x0000002a95d4d000)
        libmpath.so.1 => /home/PROGRAMS/pathscale/lib/3.0/libmpath.so.1 (0x0000002a95e76000)
        libm.so.6 => /lib64/tls/libm.so.6 (0x0000003e77a00000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003e77c00000)
        libpathfortran.so.1 => /home/PROGRAMS/pathscale/lib/3.0/libpathfortran.so.1 (0x0000002a95f97000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e77700000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003e78200000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003e76800000)
<<<< end ldd >>>>
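
The ldd listing can also be checked mechanically. The sketch below flags any library that is
reported as "not found" and any Open MPI library that resolves outside the expected install
prefix. The helper name check_ldd is hypothetical; the prefix /opt/openmpi/1.2.4 is the one used
in the configure line above.

```shell
#!/bin/sh
# Sketch: scan `ldd` output for unresolved libraries and for Open MPI
# libraries that resolve outside the expected prefix.
# check_ldd is a hypothetical helper name; pass the install prefix as $1.
check_ldd() {
    awk -v prefix="$1" '
        # A dependency the dynamic loader could not resolve.
        /not found/ { print "MISSING: " $1; bad = 1; next }
        # An Open MPI library picked up from somewhere other than prefix.
        $1 ~ /^lib(mpi|open-rte|open-pal)/ && $2 == "=>" &&
            index($3, prefix "/") != 1 {
            print "UNEXPECTED: " $1 " -> " $3; bad = 1
        }
        # Exit non-zero if anything was flagged.
        END { exit bad + 0 }
    '
}

# Usage:
#   ldd example1.exe | check_ldd /opt/openmpi/1.2.4
```

Run against the listing above, this prints nothing: all five Open MPI libraries resolve under
/opt/openmpi/1.2.4 and nothing is missing. Note also that no ACML, SCALAPACK, or BLACS shared
library appears in the listing, so those were presumably linked statically.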

As I said, the example program compiles without errors; it just will not run.
Does anybody have any suggestions? Am I doing something wrong?