Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] [OMPI users] openmpi 1.4 broken -mca coll_tuned_use_dynamic_rules 1
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2009-12-30 09:17:17


This is a known issue:
        https://svn.open-mpi.org/trac/ompi/ticket/2087
Maybe its priority should be raised.
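
A possible workaround until it is fixed might be to point the tuned
component at a real rules file via the coll_tuned_dynamic_rules_filename
MCA parameter, so that enabling dynamic rules does not leave the filename
unset; for example:

    mpirun -mca coll_tuned_use_dynamic_rules 1 \
           -mca coll_tuned_dynamic_rules_filename /path/to/rules.conf \
           -np 8 ./app

(/path/to/rules.conf and ./app are placeholders; I have not verified that
this avoids the crash.)
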
Lenny.

On Wed, Dec 30, 2009 at 12:13 PM, Daniel Spångberg <daniels_at_[hidden]> wrote:

> Dear Open MPI list,
>
> I have been using the dynamic rules for collectives to select one
> specific algorithm (see the sketch below). With the latest versions of
> Open MPI this seems to be broken: merely enabling
> coll_tuned_use_dynamic_rules causes the code to segfault. Note that I do
> not provide a rules file, since I only want to modify the behavior of one
> routine.
>
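> For concreteness, the selection I am after is forcing a single algorithm
> for one collective, along these lines (the parameter name is the one I
> understand the tuned component to use; the algorithm number 3 is just an
> illustrative value):
>
>   mpirun -mca coll_tuned_use_dynamic_rules 1 \
>          -mca coll_tuned_alltoall_algorithm 3 \
>          -np 8 ./app
>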
> I have tried the example code below on Open MPI 1.3.2, 1.3.3, 1.3.4, and
> 1.4. It *works* on 1.3.2 and 1.3.3, but segfaults on 1.3.4 and 1.4. I
> have confirmed this on Scientific Linux 5.2 and 5.4, and I have also
> reproduced the crash with version 1.4 on Debian etch. All runs were on
> amd64, compiled from source with no configure options other than
> --prefix. The crash occurs whether I use the Intel 11.1 compiler (via env
> CC) or gcc, and regardless of whether the btl is set to openib,self,
> tcp,self, sm,self, or combinations of those. See below for ompi_info and
> other details. I have tried MPI_Alltoall, MPI_Alltoallv, and
> MPI_Allreduce, which all behave the same.
>
> #include <stdlib.h>
> #include <mpi.h>
>
> int main(int argc, char **argv)
> {
>     int rank, size;
>     char *buffer, *buffer2;
>
>     MPI_Init(&argc, &argv);
>
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     /* 100 bytes exchanged with every rank */
>     buffer  = calloc(100 * size, 1);
>     buffer2 = calloc(100 * size, 1);
>
>     /* Segfaults on 1.3.4/1.4 when coll_tuned_use_dynamic_rules is enabled */
>     MPI_Alltoall(buffer, 100, MPI_BYTE, buffer2, 100, MPI_BYTE, MPI_COMM_WORLD);
>
>     free(buffer);
>     free(buffer2);
>     MPI_Finalize();
>     return 0;
> }
>
> Demonstrated behaviour:
>
> $ ompi_info
> Package: Open MPI daniels_at_arthur Distribution
> Open MPI: 1.4
> Open MPI SVN revision: r22285
> Open MPI release date: Dec 08, 2009
> Open RTE: 1.4
> Open RTE SVN revision: r22285
> Open RTE release date: Dec 08, 2009
> OPAL: 1.4
> OPAL SVN revision: r22285
> OPAL release date: Dec 08, 2009
> Ident string: 1.4
> Prefix:
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install
> Configured architecture: x86_64-unknown-linux-gnu
> Configure host: arthur
> Configured by: daniels
> Configured on: Tue Dec 29 16:54:37 CET 2009
> Configure host: arthur
> Built by: daniels
> Built on: Tue Dec 29 17:04:36 CET 2009
> Built host: arthur
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: gcc
> C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
> C++ compiler absolute: /usr/bin/g++
> Fortran77 compiler: gfortran
> Fortran77 compiler abs: /usr/bin/gfortran
> Fortran90 compiler: gfortran
> Fortran90 compiler abs: /usr/bin/gfortran
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Sparse Groups: no
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: no
> mpirun default --prefix: no
> MPI I/O support: yes
> MPI_WTIME support: gettimeofday
> Symbol visibility support: yes
> FT Checkpoint support: no (checkpoint thread: no)
> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4)
> MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4)
> MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4)
> MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4)
> MCA carto: file (MCA v2.0, API v2.0, Component v1.4)
> MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4)
> MCA timer: linux (MCA v2.0, API v2.0, Component v1.4)
> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4)
> MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4)
> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4)
> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4)
> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4)
> MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: basic (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: inter (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: self (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: sm (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: sync (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.4)
> MCA io: romio (MCA v2.0, API v2.0, Component v1.4)
> MCA mpool: fake (MCA v2.0, API v2.0, Component v1.4)
> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4)
> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.4)
> MCA pml: cm (MCA v2.0, API v2.0, Component v1.4)
> MCA pml: csum (MCA v2.0, API v2.0, Component v1.4)
> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.4)
> MCA pml: v (MCA v2.0, API v2.0, Component v1.4)
> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.4)
> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.4)
> MCA btl: self (MCA v2.0, API v2.0, Component v1.4)
> MCA btl: sm (MCA v2.0, API v2.0, Component v1.4)
> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.4)
> MCA topo: unity (MCA v2.0, API v2.0, Component v1.4)
> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.4)
> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.4)
> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.4)
> MCA iof: orted (MCA v2.0, API v2.0, Component v1.4)
> MCA iof: tool (MCA v2.0, API v2.0, Component v1.4)
> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.4)
> MCA odls: default (MCA v2.0, API v2.0, Component v1.4)
> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.4)
> MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.4)
> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.4)
> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.4)
> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.4)
> MCA rml: oob (MCA v2.0, API v2.0, Component v1.4)
> MCA routed: binomial (MCA v2.0, API v2.0, Component v1.4)
> MCA routed: direct (MCA v2.0, API v2.0, Component v1.4)
> MCA routed: linear (MCA v2.0, API v2.0, Component v1.4)
> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.4)
> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.4)
> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.4)
> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: env (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: singleton (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: tool (MCA v2.0, API v2.0, Component v1.4)
> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.4)
> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.4)
>
> $ mpicc -O2 -o bug_openmpi_1.4_test bug_openmpi_1.4_test.c
> $ ldd ./bug_openmpi_1.4_test
> libmpi.so.0 =>
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0
> (0x00002b33fa57e000)
> libopen-rte.so.0 =>
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libopen-rte.so.0
> (0x00002b33fa821000)
> libopen-pal.so.0 =>
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libopen-pal.so.0
> (0x00002b33faa6b000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00000032c7400000)
> libnsl.so.1 => /lib64/libnsl.so.1 (0x00000032cfe00000)
> libutil.so.1 => /lib64/libutil.so.1 (0x00000032d4a00000)
> libm.so.6 => /lib64/libm.so.6 (0x00000032c7000000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00000032c7800000)
> libc.so.6 => /lib64/libc.so.6 (0x00000032c6c00000)
> /lib64/ld-linux-x86-64.so.2 (0x00000032c5c00000)
> $ mpirun -mca btl tcp,self -mca coll_tuned_use_dynamic_rules 0 -np 8
> ./bug_openmpi_1.4_test
> $ mpirun -mca btl tcp,self -mca coll_tuned_use_dynamic_rules 1 -np 8
> ./bug_openmpi_1.4_test
> [girasole:27510] *** Process received signal ***
> [girasole:27510] Signal: Segmentation fault (11)
> [girasole:27510] Signal code: (128)
> [girasole:27510] Failing at address: (nil)
> [girasole:27503] *** Process received signal ***
> [girasole:27503] Signal: Segmentation fault (11)
> [girasole:27503] Signal code: (128)
> [girasole:27503] Failing at address: (nil)
> [girasole:27510] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27510] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2ae2b29fbeb5]
> [girasole:27510] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2ae2b29fa8ca]
> [girasole:27510] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2ae2ae76bbff]
> [girasole:27510] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27510] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27510] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27510] *** End of error message ***
> [girasole:27503] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27503] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b534b1b6eb5]
> [girasole:27503] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b534b1b58ca]
> [girasole:27503] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2b5346f26bff]
> [girasole:27503] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27503] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27503] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27503] *** End of error message ***
> [girasole:27505] *** Process received signal ***
> [girasole:27505] Signal: Segmentation fault (11)
> [girasole:27505] Signal code: (128)
> [girasole:27505] Failing at address: (nil)
> [girasole:27509] *** Process received signal ***
> [girasole:27509] Signal: Segmentation fault (11)
> [girasole:27509] Signal code: (128)
> [girasole:27509] Failing at address: (nil)
> [girasole:27505] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27505] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2ab662aa0eb5]
> [girasole:27505] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2ab662a9f8ca]
> [girasole:27505] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2ab65e810bff]
> [girasole:27505] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27505] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27505] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27505] *** End of error message ***
> [girasole:27507] *** Process received signal ***
> [girasole:27507] Signal: Segmentation fault (11)
> [girasole:27507] Signal code: (128)
> [girasole:27507] Failing at address: (nil)
> [girasole:27509] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27509] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b7dc1863eb5]
> [girasole:27509] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b7dc18628ca]
> [girasole:27509] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2b7dbd5d3bff]
> [girasole:27509] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27509] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27509] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27509] *** End of error message ***
> [girasole:27507] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27507] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b09eb873eb5]
> [girasole:27507] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b09eb8728ca]
> [girasole:27507] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2b09e75e3bff]
> [girasole:27507] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27507] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27507] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27507] *** End of error message ***
> [girasole:27504] *** Process received signal ***
> [girasole:27504] Signal: Segmentation fault (11)
> [girasole:27504] Signal code: (128)
> [girasole:27504] Failing at address: (nil)
> [girasole:27506] *** Process received signal ***
> [girasole:27506] Signal: Segmentation fault (11)
> [girasole:27506] Signal code: (128)
> [girasole:27506] Failing at address: (nil)
> [girasole:27504] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27504] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b6fde1afeb5]
> [girasole:27504] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b6fde1ae8ca]
> [girasole:27504] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2b6fd9f1fbff]
> [girasole:27504] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27504] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27504] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27504] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 7 with PID 27510 on node girasole exited
> on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> [girasole:27506] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27506] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b66f2908eb5]
> [girasole:27506] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b66f29078ca]
> [girasole:27506] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2b66ee678bff]
> [girasole:27506] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27506] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27506] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27506] *** End of error message ***
> [girasole:27508] *** Process received signal ***
> [girasole:27508] Signal: Segmentation fault (11)
> [girasole:27508] Signal code: (128)
> [girasole:27508] Failing at address: (nil)
> [girasole:27508] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27508] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b89b09a1eb5]
> [girasole:27508] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b89b09a08ca]
> [girasole:27508] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2b89ac711bff]
> [girasole:27508] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27508] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27508] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27508] *** End of error message ***
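>
> If a symbolized backtrace would help, I can rebuild with --enable-debug
> and run the ranks under gdb, along the lines the Open MPI FAQ suggests
> for parallel debugging (the prefix below is just a placeholder):
>
>   ./configure --prefix=$HOME/ompi-debug --enable-debug
>   mpirun -mca btl tcp,self -mca coll_tuned_use_dynamic_rules 1 \
>          -np 8 xterm -e gdb ./bug_openmpi_1.4_test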
>
>
> Best regards,
>
> --
> Daniel Spångberg
> Materialkemi
> Uppsala Universitet
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>