Subject: [OMPI users] openmpi 1.4 broken -mca coll_tuned_use_dynamic_rules 1
From: Daniel Spångberg (daniels_at_[hidden])
Date: 2009-12-30 05:13:25


Dear OpenMPI list,

I have been using the dynamic rules for collectives to select one
specific algorithm. With the latest versions of Open MPI this seems to be
broken: just enabling coll_tuned_use_dynamic_rules causes the code to
segfault. Note that I do not provide a rules file, since I only want to
modify the behaviour of one routine.
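
For reference, the kind of invocation I am after looks roughly like the
following (coll_tuned_alltoall_algorithm and the value 2 are only an
illustration of forcing one specific algorithm, if I remember the
parameter name correctly, and ./my_program is just a placeholder):

$ mpirun -mca coll_tuned_use_dynamic_rules 1 \
         -mca coll_tuned_alltoall_algorithm 2 \
         -np 8 ./my_program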

I have tried the example code below on Open MPI 1.3.2, 1.3.3, 1.3.4, and
1.4. It *works* on 1.3.2 and 1.3.3, but segfaults on 1.3.4 and 1.4. I have
confirmed this on Scientific Linux 5.2 and 5.4, and have also reproduced
the crash with version 1.4 running on Debian etch. All runs are on amd64,
compiled from source with no configure options other than --prefix. The
crash occurs whether I use the Intel 11.1 compiler (via env CC) or gcc,
and no matter whether the btl is set to openib,self, tcp,self, sm,self,
or a combination of those. See below for ompi_info and other details. I
have tried MPI_Alltoall, MPI_Alltoallv, and MPI_Allreduce, which all
behave the same.

#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
   int rank,size;
   char *buffer, *buffer2;

   MPI_Init(&argc,&argv);

   MPI_Comm_size(MPI_COMM_WORLD,&size);
   MPI_Comm_rank(MPI_COMM_WORLD,&rank);

   /* 100 bytes per peer rank for both the send and the receive buffer */
   buffer=calloc(100*size,1);
   buffer2=calloc(100*size,1);

   /* This call segfaults when coll_tuned_use_dynamic_rules is enabled */
   MPI_Alltoall(buffer,100,MPI_BYTE,buffer2,100,MPI_BYTE,MPI_COMM_WORLD);

   free(buffer);
   free(buffer2);

   MPI_Finalize();
   return 0;
}

Demonstrated behaviour:

$ ompi_info
                  Package: Open MPI daniels_at_arthur Distribution
                 Open MPI: 1.4
    Open MPI SVN revision: r22285
    Open MPI release date: Dec 08, 2009
                 Open RTE: 1.4
    Open RTE SVN revision: r22285
    Open RTE release date: Dec 08, 2009
                     OPAL: 1.4
        OPAL SVN revision: r22285
        OPAL release date: Dec 08, 2009
             Ident string: 1.4
                   Prefix:
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install
  Configured architecture: x86_64-unknown-linux-gnu
           Configure host: arthur
            Configured by: daniels
            Configured on: Tue Dec 29 16:54:37 CET 2009
           Configure host: arthur
                 Built by: daniels
                 Built on: Tue Dec 29 17:04:36 CET 2009
               Built host: arthur
               C bindings: yes
             C++ bindings: yes
       Fortran77 bindings: yes (all)
       Fortran90 bindings: yes
  Fortran90 bindings size: small
               C compiler: gcc
      C compiler absolute: /usr/bin/gcc
             C++ compiler: g++
    C++ compiler absolute: /usr/bin/g++
       Fortran77 compiler: gfortran
   Fortran77 compiler abs: /usr/bin/gfortran
       Fortran90 compiler: gfortran
   Fortran90 compiler abs: /usr/bin/gfortran
              C profiling: yes
            C++ profiling: yes
      Fortran77 profiling: yes
      Fortran90 profiling: yes
           C++ exceptions: no
           Thread support: posix (mpi: no, progress: no)
            Sparse Groups: no
   Internal debug support: no
      MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
          libltdl support: yes
    Heterogeneous support: no
  mpirun default --prefix: no
          MPI I/O support: yes
        MPI_WTIME support: gettimeofday
Symbol visibility support: yes
    FT Checkpoint support: no (checkpoint thread: no)
            MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4)
               MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4)
            MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4)

                MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4)
                MCA carto: file (MCA v2.0, API v2.0, Component v1.4)
            MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4)
                MCA timer: linux (MCA v2.0, API v2.0, Component v1.4)
          MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4)
          MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4)
                  MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4)
               MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4)
            MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4)
            MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4)
                 MCA coll: basic (MCA v2.0, API v2.0, Component v1.4)
                 MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4)
                 MCA coll: inter (MCA v2.0, API v2.0, Component v1.4)
                 MCA coll: self (MCA v2.0, API v2.0, Component v1.4)
                 MCA coll: sm (MCA v2.0, API v2.0, Component v1.4)
                 MCA coll: sync (MCA v2.0, API v2.0, Component v1.4)
                 MCA coll: tuned (MCA v2.0, API v2.0, Component v1.4)
                   MCA io: romio (MCA v2.0, API v2.0, Component v1.4)
                MCA mpool: fake (MCA v2.0, API v2.0, Component v1.4)
                MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4)
                MCA mpool: sm (MCA v2.0, API v2.0, Component v1.4)
                  MCA pml: cm (MCA v2.0, API v2.0, Component v1.4)
                  MCA pml: csum (MCA v2.0, API v2.0, Component v1.4)
                  MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.4)
                  MCA pml: v (MCA v2.0, API v2.0, Component v1.4)
                  MCA bml: r2 (MCA v2.0, API v2.0, Component v1.4)
               MCA rcache: vma (MCA v2.0, API v2.0, Component v1.4)
                  MCA btl: self (MCA v2.0, API v2.0, Component v1.4)
                  MCA btl: sm (MCA v2.0, API v2.0, Component v1.4)
                  MCA btl: tcp (MCA v2.0, API v2.0, Component v1.4)
                 MCA topo: unity (MCA v2.0, API v2.0, Component v1.4)
                  MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.4)
                  MCA osc: rdma (MCA v2.0, API v2.0, Component v1.4)
                  MCA iof: hnp (MCA v2.0, API v2.0, Component v1.4)
                  MCA iof: orted (MCA v2.0, API v2.0, Component v1.4)
                  MCA iof: tool (MCA v2.0, API v2.0, Component v1.4)
                  MCA oob: tcp (MCA v2.0, API v2.0, Component v1.4)
                 MCA odls: default (MCA v2.0, API v2.0, Component v1.4)
                  MCA ras: slurm (MCA v2.0, API v2.0, Component v1.4)
                MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.4)
                MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.4)
                MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.4)
                MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.4)
                  MCA rml: oob (MCA v2.0, API v2.0, Component v1.4)
               MCA routed: binomial (MCA v2.0, API v2.0, Component v1.4)
               MCA routed: direct (MCA v2.0, API v2.0, Component v1.4)
               MCA routed: linear (MCA v2.0, API v2.0, Component v1.4)
                  MCA plm: rsh (MCA v2.0, API v2.0, Component v1.4)
                  MCA plm: slurm (MCA v2.0, API v2.0, Component v1.4)
                MCA filem: rsh (MCA v2.0, API v2.0, Component v1.4)
               MCA errmgr: default (MCA v2.0, API v2.0, Component v1.4)
                  MCA ess: env (MCA v2.0, API v2.0, Component v1.4)
                  MCA ess: hnp (MCA v2.0, API v2.0, Component v1.4)
                  MCA ess: singleton (MCA v2.0, API v2.0, Component v1.4)
                  MCA ess: slurm (MCA v2.0, API v2.0, Component v1.4)
                  MCA ess: tool (MCA v2.0, API v2.0, Component v1.4)
              MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.4)
              MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.4)

$ mpicc -O2 -o bug_openmpi_1.4_test bug_openmpi_1.4_test.c
$ ldd ./bug_openmpi_1.4_test
         libmpi.so.0 =>
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0
(0x00002b33fa57e000)
         libopen-rte.so.0 =>
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libopen-rte.so.0
(0x00002b33fa821000)
         libopen-pal.so.0 =>
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libopen-pal.so.0
(0x00002b33faa6b000)
         libdl.so.2 => /lib64/libdl.so.2 (0x00000032c7400000)
         libnsl.so.1 => /lib64/libnsl.so.1 (0x00000032cfe00000)
         libutil.so.1 => /lib64/libutil.so.1 (0x00000032d4a00000)
         libm.so.6 => /lib64/libm.so.6 (0x00000032c7000000)
         libpthread.so.0 => /lib64/libpthread.so.0 (0x00000032c7800000)
         libc.so.6 => /lib64/libc.so.6 (0x00000032c6c00000)
         /lib64/ld-linux-x86-64.so.2 (0x00000032c5c00000)
$ mpirun -mca btl tcp,self -mca coll_tuned_use_dynamic_rules 0 -np 8
./bug_openmpi_1.4_test
$ mpirun -mca btl tcp,self -mca coll_tuned_use_dynamic_rules 1 -np 8
./bug_openmpi_1.4_test
[girasole:27510] *** Process received signal ***
[girasole:27510] Signal: Segmentation fault (11)
[girasole:27510] Signal code: (128)
[girasole:27510] Failing at address: (nil)
[girasole:27503] *** Process received signal ***
[girasole:27503] Signal: Segmentation fault (11)
[girasole:27503] Signal code: (128)
[girasole:27503] Failing at address: (nil)
[girasole:27510] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
[girasole:27510] [ 1]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2ae2b29fbeb5]
[girasole:27510] [ 2]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2ae2b29fa8ca]
[girasole:27510] [ 3]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
[0x2ae2ae76bbff]
[girasole:27510] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
[girasole:27510] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x32c6c1d8b4]
[girasole:27510] [ 6] ./bug_openmpi_1.4_test [0x400869]
[girasole:27510] *** End of error message ***
[girasole:27503] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
[girasole:27503] [ 1]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2b534b1b6eb5]
[girasole:27503] [ 2]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2b534b1b58ca]
[girasole:27503] [ 3]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
[0x2b5346f26bff]
[girasole:27503] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
[girasole:27503] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x32c6c1d8b4]
[girasole:27503] [ 6] ./bug_openmpi_1.4_test [0x400869]
[girasole:27503] *** End of error message ***
[girasole:27505] *** Process received signal ***
[girasole:27505] Signal: Segmentation fault (11)
[girasole:27505] Signal code: (128)
[girasole:27505] Failing at address: (nil)
[girasole:27509] *** Process received signal ***
[girasole:27509] Signal: Segmentation fault (11)
[girasole:27509] Signal code: (128)
[girasole:27509] Failing at address: (nil)
[girasole:27505] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
[girasole:27505] [ 1]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2ab662aa0eb5]
[girasole:27505] [ 2]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2ab662a9f8ca]
[girasole:27505] [ 3]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
[0x2ab65e810bff]
[girasole:27505] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
[girasole:27505] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x32c6c1d8b4]
[girasole:27505] [ 6] ./bug_openmpi_1.4_test [0x400869]
[girasole:27505] *** End of error message ***
[girasole:27507] *** Process received signal ***
[girasole:27507] Signal: Segmentation fault (11)
[girasole:27507] Signal code: (128)
[girasole:27507] Failing at address: (nil)
[girasole:27509] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
[girasole:27509] [ 1]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2b7dc1863eb5]
[girasole:27509] [ 2]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2b7dc18628ca]
[girasole:27509] [ 3]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
[0x2b7dbd5d3bff]
[girasole:27509] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
[girasole:27509] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x32c6c1d8b4]
[girasole:27509] [ 6] ./bug_openmpi_1.4_test [0x400869]
[girasole:27509] *** End of error message ***
[girasole:27507] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
[girasole:27507] [ 1]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2b09eb873eb5]
[girasole:27507] [ 2]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2b09eb8728ca]
[girasole:27507] [ 3]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
[0x2b09e75e3bff]
[girasole:27507] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
[girasole:27507] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x32c6c1d8b4]
[girasole:27507] [ 6] ./bug_openmpi_1.4_test [0x400869]
[girasole:27507] *** End of error message ***
[girasole:27504] *** Process received signal ***
[girasole:27504] Signal: Segmentation fault (11)
[girasole:27504] Signal code: (128)
[girasole:27504] Failing at address: (nil)
[girasole:27506] *** Process received signal ***
[girasole:27506] Signal: Segmentation fault (11)
[girasole:27506] Signal code: (128)
[girasole:27506] Failing at address: (nil)
[girasole:27504] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
[girasole:27504] [ 1]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2b6fde1afeb5]
[girasole:27504] [ 2]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2b6fde1ae8ca]
[girasole:27504] [ 3]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
[0x2b6fd9f1fbff]
[girasole:27504] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
[girasole:27504] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x32c6c1d8b4]
[girasole:27504] [ 6] ./bug_openmpi_1.4_test [0x400869]
[girasole:27504] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 7 with PID 27510 on node girasole exited
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[girasole:27506] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
[girasole:27506] [ 1]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2b66f2908eb5]
[girasole:27506] [ 2]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2b66f29078ca]
[girasole:27506] [ 3]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
[0x2b66ee678bff]
[girasole:27506] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
[girasole:27506] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x32c6c1d8b4]
[girasole:27506] [ 6] ./bug_openmpi_1.4_test [0x400869]
[girasole:27506] *** End of error message ***
[girasole:27508] *** Process received signal ***
[girasole:27508] Signal: Segmentation fault (11)
[girasole:27508] Signal code: (128)
[girasole:27508] Failing at address: (nil)
[girasole:27508] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
[girasole:27508] [ 1]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2b89b09a1eb5]
[girasole:27508] [ 2]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
[0x2b89b09a08ca]
[girasole:27508] [ 3]
/home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
[0x2b89ac711bff]
[girasole:27508] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
[girasole:27508] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x32c6c1d8b4]
[girasole:27508] [ 6] ./bug_openmpi_1.4_test [0x400869]
[girasole:27508] *** End of error message ***

Best regards,

-- 
Daniel Spångberg
Materialkemi
Uppsala Universitet