Open MPI Development Mailing List Archives

Subject: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)
From: Larry Baker (baker_at_[hidden])
Date: 2011-10-07 14:08:36


I ran into a problem this past week trying to upgrade our OpenMPI
1.4.3 for the latest Intel 2011 compiler, 2011.6.233.

make check fails with Segmentation Fault errors:

> [root_at_hydra openmpi-1.4.3]# tail -20 ../openmpi-1.4.3-check-intel.6.233.log
> /bin/sh ../../libtool --tag=CC --mode=link icc -DNDEBUG -g -O3 -finline-functions -fno-strict-aliasing -restrict -pthread -fvisibility=hidden -shared-intel -export-dynamic -shared-intel -o ddt_pack ddt_pack.o ../../ompi/libmpi.la -lnsl -lutil
> libtool: link: icc -DNDEBUG -g -O3 -finline-functions -fno-strict-aliasing -restrict -pthread -fvisibility=hidden -shared-intel -shared-intel -o .libs/ddt_pack ddt_pack.o -Wl,--export-dynamic ../../ompi/.libs/libmpi.so /usr/local/src/openmpi-1.4.3/orte/.libs/libopen-rte.so /usr/local/src/openmpi-1.4.3/opal/.libs/libopen-pal.so -ldl -lnsl -lutil -pthread -Wl,-rpath -Wl,/usr/local/lib
> make[3]: Leaving directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
> make check-TESTS
> make[3]: Entering directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
> /bin/sh: line 4: 6322 Segmentation fault ${dir}$tst
> FAIL: checksum
> /bin/sh: line 4: 6355 Segmentation fault ${dir}$tst
> FAIL: position
> ========================================================
> 2 of 2 tests failed
> Please report to http://www.open-mpi.org/community/help/
> ========================================================
> make[3]: *** [check-TESTS] Error 1
> make[3]: Leaving directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
> make[2]: *** [check-am] Error 2
> make[2]: Leaving directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
> make[1]: *** [check-recursive] Error 1
> make[1]: Leaving directory `/state/partition1/root/src/openmpi-1.4.3/test'
> make: *** [check-recursive] Error 1

Before trying to track down the problem, I thought I'd describe what I
see here in case someone recognizes what might be happening.

We have been using OpenMPI 1.4.3 compiled with the Intel 2011.3.174
compiler. I've updated the Intel 2011 compilers as they have come out
with new versions: 2011.4.191, 2011.5.220, and now 2011.6.233.
However, I've not recompiled OpenMPI 1.4.3 until now.

Since the original compilation of OpenMPI 1.4.3 with the Intel
2011.3.174 compilers, I have installed libnuma and libnuma-devel RPMs
on our cluster front end. I noticed that this changed the OpenMPI 1.4.3
./configure output. To test that this was not the cause of the problem,
I recompiled OpenMPI 1.4.3 using both the CentOS/Rocks GNU compilers
and the Intel 2011.3.174 compilers. They both passed all the make
check tests.

To find out when this problem first occurs, I systematically
configured, compiled, and checked OpenMPI 1.4.3 with all four versions
of the Intel 2011 compilers we have. We use the modules package to
load the compiler environment:

> [root_at_hydra openmpi-1.4.3]# env | grep /opt/intel
> LD_LIBRARY_PATH=/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64:/opt/intel/composer_xe_2011_sp1.6.233/mkl/lib/intel64
> PATH=/opt/intel/composer_xe_2011_sp1.6.233/bin/intel64:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/java/latest/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/eclipse:/opt/ganglia/bin:/opt/ganglia/sbin:/opt/maui/bin:/opt/torque/bin:/opt/torque/sbin:/opt/rocks/bin:/opt/rocks/sbin:/root/bin

Here are the steps I use to make and test OpenMPI 1.4.3 (I use a patched
version to accommodate the six compilers we have; I've submitted those
patches here in the past):

> # cd /usr/local/src
> # tar -xjf openmpi-1.4.3-patched.tar.bz2
> # cd openmpi-1.4.3
> # module load compilers/intel/2011.6.233
> # ./configure >../openmpi-1.4.3-configure-intel.6.233.log 2>&1 \
>     --with-tm --with-openib --without-valgrind --without-udapl \
>     --enable-contrib-no-build=vt --with-wrapper-ldflags="-shared-intel" \
>     CC="icc" CFLAGS="-g -O3" CXX="icpc" CXXFLAGS="-g -O3" \
>     FC="ifort" FCFLAGS="-g -O3" F77="ifort" FFLAGS="-g -O3" \
>     LDFLAGS="-shared-intel"
> # make >../openmpi-1.4.3-make-intel.6.233.log 2>&1
> # make check >../openmpi-1.4.3-check-intel.6.233.log 2>&1

(When I generate the OpenMPI 1.4.3 library we actually use, I also add
a --prefix. That complicates diffs of the stdout files for these
steps, so it is not used here. Thus, I do NOT proceed to make
install any of these libraries.)
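
The steps above, repeated for each compiler release, could be scripted roughly as follows. This is only a sketch: the function names (log_suffix, log_name, build_and_check) are made up for illustration, and it assumes that every release has a module named compilers/intel/<version>, that the "module" command is already initialized in the shell, and that the current directory is a freshly unpacked source tree.

```shell
#!/bin/sh
# Sketch only: configure, build, and check OpenMPI 1.4.3 with one Intel release.

# Map a compiler version to the log-file suffix used in this post
# (2011.6.233 -> 6.233).
log_suffix() {
    echo "${1#2011.}"
}

# Build the log-file name for one step (configure, make, or check).
log_name() {
    echo "../openmpi-1.4.3-$1-intel.$(log_suffix "$2").log"
}

# Run the three steps for one compiler version, stopping at the first failure.
# ASSUMPTION: the module name follows the pattern compilers/intel/<version>.
build_and_check() {
    ver=$1
    module load "compilers/intel/$ver" || return 1
    ./configure >"$(log_name configure "$ver")" 2>&1 \
        --with-tm --with-openib --without-valgrind --without-udapl \
        --enable-contrib-no-build=vt --with-wrapper-ldflags="-shared-intel" \
        CC="icc" CFLAGS="-g -O3" CXX="icpc" CXXFLAGS="-g -O3" \
        FC="ifort" FCFLAGS="-g -O3" F77="ifort" FFLAGS="-g -O3" \
        LDFLAGS="-shared-intel" || return 1
    make >"$(log_name make "$ver")" 2>&1 || return 1
    make check >"$(log_name check "$ver")" 2>&1
}

# Example use, from a fresh shell and a fresh source tree for each release:
#   build_and_check 2011.5.220
```

Starting from a fresh shell and a fresh source tree for each release keeps one compiler's environment and build products from leaking into the next run.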

The three earlier versions of the Intel 2011 compilers all pass the
make check tests. When I compare the ./configure stdout files, they
are all identical. However, the ./configure stdout file for the Intel
2011.6.233 compilers has one difference:

> [root_at_hydra openmpi-1.4.3]# diff ../openmpi-1.4.3-configure-intel.{5.220,6.233}.log
> 178c178
> < checking for __attribute__(may_alias)... no
> ---
> > checking for __attribute__(may_alias)... yes

That is obviously where I will start looking for the source of the
problem.
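
As a starting point, the result of that one check can be pulled out of each saved configure log. The sketch below assumes the logs contain the "checking for __attribute__(may_alias)..." line exactly as shown in the diff above; the function names are hypothetical.

```shell
# Print the result ("yes" or "no") of the may_alias check from one
# configure stdout log.
may_alias_result() {
    sed -n 's/^checking for __attribute__(may_alias)\.\.\. //p' "$1"
}

# Print one "file: result" line for each log given on the command line.
summarize_may_alias() {
    for log in "$@"; do
        echo "$log: $(may_alias_result "$log")"
    done
}

# Example use:
#   summarize_may_alias ../openmpi-1.4.3-configure-intel.*.log
```

Autoconf-generated configure scripts also write a config.log file in the build directory, recording the test programs and compiler messages for each check. Searching it for may_alias (for example, grep -n may_alias config.log) should show what the compiler reported for this test.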

Maybe someone reading this list knows what the purpose of that test
is, whether the Intel 2011 compilers are expected to have this feature
enabled, and whether the code this enables can cause this problem if
the Intel 2011.6.233 compilers do not fully support whatever this test
is intended to discern.
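
For reference, GCC documents may_alias as a type attribute: accesses through a type declared with it are exempt from type-based alias analysis, much as accesses through a char type are. The sketch below writes a small probe that uses the attribute and compiles it with a chosen compiler. It is NOT Open MPI's actual configure test; the probe source and the function names are assumptions made for illustration.

```shell
# Write a small C file that uses the GCC-style may_alias type attribute.
write_may_alias_probe() {
    cat >"$1" <<'EOF'
/* int_alias may alias objects of other types, much as char may. */
typedef int __attribute__((__may_alias__)) int_alias;

int first_word(float *f)
{
    int_alias *p = (int_alias *) f;
    return *p;
}
EOF
}

# Compile the probe with the given compiler and return its exit status.
# $1 = compiler command (for example icc or gcc).
probe_may_alias() {
    dir=$(mktemp -d) || return 1
    write_may_alias_probe "$dir/probe.c"
    "$1" -c "$dir/probe.c" -o "$dir/probe.o"
    status=$?
    rm -rf "$dir"
    return $status
}

# Example use:
#   probe_may_alias icc
```

A compiler may only warn about an attribute it ignores rather than fail, so the diagnostics printed during the compile matter as much as the exit status.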

Larry Baker
US Geological Survey
650-329-5608
baker_at_[hidden]