Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] SCALAPACK: Segmentation Fault (11) and Signal code: Address not mapped (1)
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-01-28 14:51:22


Sorry for not replying earlier.

I'm not a SCALAPACK expert, but a common mistake I've seen users make
is to use the mpif.h from a different MPI implementation when
compiling their Fortran programs. Can you verify that you're getting
the Open MPI mpif.h?
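
One rough way to check: the Open MPI wrapper compilers can print the underlying compile command (for example `mpif90 --showme:compile`), and the `-I` directories in that command show where mpif.h would be picked up. The helper below is a hypothetical sketch, not part of Open MPI. It scans a command line for `-Idir` flags and prints the first mpif.h it finds. It only handles the attached `-Idir` form, not `-I dir` with a space.

```shell
# Hypothetical helper (not part of Open MPI): given the words of a compile
# command line, print the first mpif.h reachable through its -I flags.
# Returns non-zero if none of the -I directories contains an mpif.h.
first_mpif_h() {
    for arg in "$@"; do
        case "$arg" in
            -I?*)
                # Strip the leading -I to get the include directory.
                dir=${arg#-I}
                if [ -f "$dir/mpif.h" ]; then
                    printf '%s\n' "$dir/mpif.h"
                    return 0
                fi
                ;;
        esac
    done
    return 1
}
```

Calling it as `first_mpif_h $(mpif90 --showme:compile)` should print a path under the Open MPI install prefix, assuming the default include layout. A path that belongs to another MPI installation would point to the mix-up described above. A stray copy of mpif.h in the source directory may also be found first by the compiler, so that is worth checking separately.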

Also, there is a known problem with the Pathscale compiler that
they have stubbornly refused to comment on for about a year now
(meaning: a problem was identified many moons ago, and it has not been
tracked down to be either a Pathscale compiler problem or an Open MPI
problem -- we did as much as we could and handed off to Pathscale, but
with no forward progress since then). So you *may* be running into
that issue...? FWIW, we only saw the pathscale problem when running
on InfiniBand hardware, so YMMV.

Can you run any other MPI programs with Open MPI?

On Jan 22, 2008, at 4:06 PM, Backlund, Daniel wrote:

>
> Hello all, I am using OMPI 1.2.4 on a Linux cluster (Rocks 4.2).
> OMPI was configured to use the Pathscale Compiler Suite installed in the
> (NFS mounted on nodes) /home/PROGRAMS/pathscale. I am
> trying to compile and run the example1.f that comes with the ACML
> package from AMD, and I am
> unable to get it to run. All nodes have the same Opteron processors
> and 2GB ram per core. OMPI
> was configured as below.
>
> export CC=pathcc
> export CXX=pathCC
> export FC=pathf90
> export F77=pathf90
>
> ./configure --prefix=/opt/openmpi/1.2.4 --enable-static \
>     --without-threads --without-memory-manager \
>     --without-libnuma --disable-mpi-threads
>
> The configuration was successful, the install was successful, I can
> even run a sample mpihello.f90
> program. I would eventually like to link the ACML SCALAPACK and
> BLACS libraries to our code, but I
> need some help. The ACML version is 3.1.0 for pathscale64. I go into
> the scalapack_examples directory,
> modify GNUmakefile to the correct values, and compile successfully.
> I have made openmpi into an rpm and
> pushed it to the nodes, modified LD_LIBRARY_PATH and PATH, and made
> sure I can see it on all nodes.
> When I try to run the example1.exe which is generated, using
> /opt/openmpi/1.2.4/bin/mpirun -np 6 example1.exe
> I get the following output:
>
> <<<< example1.res >>>>
>
> [XXXXXXX:31295] *** Process received signal ***
> [XXXXXXX:31295] Signal: Segmentation fault (11)
> [XXXXXXX:31295] Signal code: Address not mapped (1)
> [XXXXXXX:31295] Failing at address: 0x44000070
> [XXXXXXX:31295] *** End of error message ***
> [XXXXXXX:31298] *** Process received signal ***
> [XXXXXXX:31298] Signal: Segmentation fault (11)
> [XXXXXXX:31298] Signal code: Address not mapped (1)
> [XXXXXXX:31298] Failing at address: 0x44000070
> [XXXXXXX:31298] *** End of error message ***
> [XXXXXXX:31299] *** Process received signal ***
> [XXXXXXX:31299] Signal: Segmentation fault (11)
> [XXXXXXX:31299] Signal code: Address not mapped (1)
> [XXXXXXX:31299] Failing at address: 0x44000070
> [XXXXXXX:31299] *** End of error message ***
> [XXXXXXX:31300] *** Process received signal ***
> [XXXXXXX:31300] Signal: Segmentation fault (11)
> [XXXXXXX:31300] Signal code: Address not mapped (1)
> [XXXXXXX:31300] Failing at address: 0x44000070
> [XXXXXXX:31300] *** End of error message ***
> [XXXXXXX:31296] *** Process received signal ***
> [XXXXXXX:31296] Signal: Segmentation fault (11)
> [XXXXXXX:31296] Signal code: Address not mapped (1)
> [XXXXXXX:31296] Failing at address: 0x44000070
> [XXXXXXX:31296] *** End of error message ***
> [XXXXXXX:31297] *** Process received signal ***
> [XXXXXXX:31297] Signal: Segmentation fault (11)
> [XXXXXXX:31297] Signal code: Address not mapped (1)
> [XXXXXXX:31297] Failing at address: 0x44000070
> [XXXXXXX:31297] *** End of error message ***
> mpirun noticed that job rank 0 with PID 31295 on node XXXXXXX.ourdomain.com exited on signal 11 (Segmentation fault).
> 5 additional processes aborted (not shown)
>
> <<<< end example1.res >>>>
>
> Here is the result of ldd example1.exe
>
> <<<< ldd example1.exe >>>>
> libmpi_f90.so.0 => /opt/openmpi/1.2.4/lib/libmpi_f90.so.0 (0x0000002a9557d000)
> libmpi_f77.so.0 => /opt/openmpi/1.2.4/lib/libmpi_f77.so.0 (0x0000002a95681000)
> libmpi.so.0 => /opt/openmpi/1.2.4/lib/libmpi.so.0 (0x0000002a957b3000)
> libopen-rte.so.0 => /opt/openmpi/1.2.4/lib/libopen-rte.so.0 (0x0000002a959fb000)
> libopen-pal.so.0 => /opt/openmpi/1.2.4/lib/libopen-pal.so.0 (0x0000002a95be7000)
> librt.so.1 => /lib64/tls/librt.so.1 (0x0000003e7cd00000)
> libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003e7c200000)
> libutil.so.1 => /lib64/libutil.so.1 (0x0000003e79e00000)
> libmv.so.1 => /home/PROGRAMS/pathscale/lib/3.0/libmv.so.1 (0x0000002a95d4d000)
> libmpath.so.1 => /home/PROGRAMS/pathscale/lib/3.0/libmpath.so.1 (0x0000002a95e76000)
> libm.so.6 => /lib64/tls/libm.so.6 (0x0000003e77a00000)
> libdl.so.2 => /lib64/libdl.so.2 (0x0000003e77c00000)
> libpathfortran.so.1 => /home/PROGRAMS/pathscale/lib/3.0/libpathfortran.so.1 (0x0000002a95f97000)
> libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e77700000)
> libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003e78200000)
> /lib64/ld-linux-x86-64.so.2 (0x0000003e76800000)
> <<<< end ldd >>>>
>
> Like I said, the compilation of the example program yields no
> errors, it just will not run.
> Does anybody have any suggestions? Am I doing something wrong?
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems