Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI users] openmpi 1.4 broken -mca coll_tuned_use_dynamic_rules 1
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2009-12-30 09:17:17


This is a known issue:
        https://svn.open-mpi.org/trac/ompi/ticket/2087
Maybe its priority should be raised.
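
Until that ticket is fixed, a possible workaround is to supply an actual
rules file rather than none. A minimal sketch, assuming the rules-file
format documented for coll_tuned_dynamic_rules_filename (collective ID 3 =
alltoall as in coll_tuned.h, and algorithm 2 = pairwise; both values are
worth double-checking against your build):

$ cat alltoall.rules
1          # number of collectives described in this file
3          # collective ID 3 = alltoall
1          # number of communicator-size rules that follow
8          # this rule applies from communicator size 8 upward
1          # number of message-size rules that follow
0 2 0 0    # from msg size 0: algorithm 2 (pairwise), topo 0, segsize 0
$ mpirun -mca coll_tuned_use_dynamic_rules 1 \
         -mca coll_tuned_dynamic_rules_filename ./alltoall.rules \
         -np 8 ./bug_openmpi_1.4_test
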
Lenny.

On Wed, Dec 30, 2009 at 12:13 PM, Daniel Spångberg <daniels_at_[hidden]> wrote:

> Dear OpenMPI list,
>
> I have been using the dynamic rules for collectives to select one specific
> algorithm (see the sketch below). With the latest versions of Open MPI this
> seems to be broken: just enabling coll_tuned_use_dynamic_rules causes the
> code to segfault. Note that I do not provide a rules file, since I only
> want to modify the behavior of one routine.
>
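> For reference, the kind of selection I am after looks roughly like this
> (a sketch: the coll_tuned_alltoall_algorithm parameter and its value
> 2 = pairwise are assumptions here; ompi_info --param coll tuned lists
> the exact names and values):
>
> $ mpirun -mca btl tcp,self \
>          -mca coll_tuned_use_dynamic_rules 1 \
>          -mca coll_tuned_alltoall_algorithm 2 \
>          -np 8 ./bug_openmpi_1.4_test
>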
> I have tried the example code below on Open MPI 1.3.2, 1.3.3, 1.3.4, and
> 1.4. It *works* on 1.3.2 and 1.3.3, but segfaults on 1.3.4 and 1.4. I have
> confirmed this on Scientific Linux 5.2 and 5.4, and I have also reproduced
> the crash with version 1.4 on Debian etch. All runs were on amd64, compiled
> from source with no configure options other than --prefix. The crash occurs
> whether I use the Intel 11.1 compiler (via env CC) or gcc. It also occurs
> regardless of whether the btl is set to openib,self; tcp,self; sm,self; or
> combinations of those. See below for ompi_info and other details. I have
> tried MPI_Alltoall, MPI_Alltoallv, and MPI_Allreduce, which all behave the
> same.
>
> #include <stdlib.h>
> #include <mpi.h>
>
> int main(int argc, char **argv)
> {
>     int rank, size;
>     char *buffer, *buffer2;
>
>     MPI_Init(&argc, &argv);
>
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     /* 100 bytes to and from every rank, zero-initialized */
>     buffer  = calloc(100 * size, 1);
>     buffer2 = calloc(100 * size, 1);
>
>     /* This call segfaults when coll_tuned_use_dynamic_rules is 1 */
>     MPI_Alltoall(buffer, 100, MPI_BYTE, buffer2, 100, MPI_BYTE,
>                  MPI_COMM_WORLD);
>
>     MPI_Finalize();
>     return 0;
> }
>
> Demonstrated behaviour:
>
> $ ompi_info
> Package: Open MPI daniels_at_arthur Distribution
> Open MPI: 1.4
> Open MPI SVN revision: r22285
> Open MPI release date: Dec 08, 2009
> Open RTE: 1.4
> Open RTE SVN revision: r22285
> Open RTE release date: Dec 08, 2009
> OPAL: 1.4
> OPAL SVN revision: r22285
> OPAL release date: Dec 08, 2009
> Ident string: 1.4
> Prefix:
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install
> Configured architecture: x86_64-unknown-linux-gnu
> Configure host: arthur
> Configured by: daniels
> Configured on: Tue Dec 29 16:54:37 CET 2009
> Configure host: arthur
> Built by: daniels
> Built on: Tue Dec 29 17:04:36 CET 2009
> Built host: arthur
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: gcc
> C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
> C++ compiler absolute: /usr/bin/g++
> Fortran77 compiler: gfortran
> Fortran77 compiler abs: /usr/bin/gfortran
> Fortran90 compiler: gfortran
> Fortran90 compiler abs: /usr/bin/gfortran
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Sparse Groups: no
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: no
> mpirun default --prefix: no
> MPI I/O support: yes
> MPI_WTIME support: gettimeofday
> Symbol visibility support: yes
> FT Checkpoint support: no (checkpoint thread: no)
> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4)
> MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4)
> MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4)
> MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4)
> MCA carto: file (MCA v2.0, API v2.0, Component v1.4)
> MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4)
> MCA timer: linux (MCA v2.0, API v2.0, Component v1.4)
> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4)
> MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4)
> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4)
> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4)
> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4)
> MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: basic (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: inter (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: self (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: sm (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: sync (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.4)
> MCA io: romio (MCA v2.0, API v2.0, Component v1.4)
> MCA mpool: fake (MCA v2.0, API v2.0, Component v1.4)
> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4)
> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.4)
> MCA pml: cm (MCA v2.0, API v2.0, Component v1.4)
> MCA pml: csum (MCA v2.0, API v2.0, Component v1.4)
> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.4)
> MCA pml: v (MCA v2.0, API v2.0, Component v1.4)
> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.4)
> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.4)
> MCA btl: self (MCA v2.0, API v2.0, Component v1.4)
> MCA btl: sm (MCA v2.0, API v2.0, Component v1.4)
> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.4)
> MCA topo: unity (MCA v2.0, API v2.0, Component v1.4)
> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.4)
> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.4)
> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.4)
> MCA iof: orted (MCA v2.0, API v2.0, Component v1.4)
> MCA iof: tool (MCA v2.0, API v2.0, Component v1.4)
> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.4)
> MCA odls: default (MCA v2.0, API v2.0, Component v1.4)
> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.4)
> MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.4)
> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.4)
> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.4)
> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.4)
> MCA rml: oob (MCA v2.0, API v2.0, Component v1.4)
> MCA routed: binomial (MCA v2.0, API v2.0, Component v1.4)
> MCA routed: direct (MCA v2.0, API v2.0, Component v1.4)
> MCA routed: linear (MCA v2.0, API v2.0, Component v1.4)
> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.4)
> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.4)
> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.4)
> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: env (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: singleton (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: tool (MCA v2.0, API v2.0, Component v1.4)
> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.4)
> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.4)
>
> $ mpicc -O2 -o bug_openmpi_1.4_test bug_openmpi_1.4_test.c
> $ ldd ./bug_openmpi_1.4_test
> libmpi.so.0 =>
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0
> (0x00002b33fa57e000)
> libopen-rte.so.0 =>
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libopen-rte.so.0
> (0x00002b33fa821000)
> libopen-pal.so.0 =>
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libopen-pal.so.0
> (0x00002b33faa6b000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00000032c7400000)
> libnsl.so.1 => /lib64/libnsl.so.1 (0x00000032cfe00000)
> libutil.so.1 => /lib64/libutil.so.1 (0x00000032d4a00000)
> libm.so.6 => /lib64/libm.so.6 (0x00000032c7000000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00000032c7800000)
> libc.so.6 => /lib64/libc.so.6 (0x00000032c6c00000)
> /lib64/ld-linux-x86-64.so.2 (0x00000032c5c00000)
> $ mpirun -mca btl tcp,self -mca coll_tuned_use_dynamic_rules 0 -np 8
> ./bug_openmpi_1.4_test
> $ mpirun -mca btl tcp,self -mca coll_tuned_use_dynamic_rules 1 -np 8
> ./bug_openmpi_1.4_test
> [girasole:27510] *** Process received signal ***
> [girasole:27510] Signal: Segmentation fault (11)
> [girasole:27510] Signal code: (128)
> [girasole:27510] Failing at address: (nil)
> [girasole:27503] *** Process received signal ***
> [girasole:27503] Signal: Segmentation fault (11)
> [girasole:27503] Signal code: (128)
> [girasole:27503] Failing at address: (nil)
> [girasole:27510] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27510] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2ae2b29fbeb5]
> [girasole:27510] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2ae2b29fa8ca]
> [girasole:27510] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2ae2ae76bbff]
> [girasole:27510] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27510] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27510] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27510] *** End of error message ***
> [girasole:27503] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27503] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b534b1b6eb5]
> [girasole:27503] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b534b1b58ca]
> [girasole:27503] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2b5346f26bff]
> [girasole:27503] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27503] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27503] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27503] *** End of error message ***
> [girasole:27505] *** Process received signal ***
> [girasole:27505] Signal: Segmentation fault (11)
> [girasole:27505] Signal code: (128)
> [girasole:27505] Failing at address: (nil)
> [girasole:27509] *** Process received signal ***
> [girasole:27509] Signal: Segmentation fault (11)
> [girasole:27509] Signal code: (128)
> [girasole:27509] Failing at address: (nil)
> [girasole:27505] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27505] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2ab662aa0eb5]
> [girasole:27505] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2ab662a9f8ca]
> [girasole:27505] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2ab65e810bff]
> [girasole:27505] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27505] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27505] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27505] *** End of error message ***
> [girasole:27507] *** Process received signal ***
> [girasole:27507] Signal: Segmentation fault (11)
> [girasole:27507] Signal code: (128)
> [girasole:27507] Failing at address: (nil)
> [girasole:27509] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27509] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b7dc1863eb5]
> [girasole:27509] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b7dc18628ca]
> [girasole:27509] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2b7dbd5d3bff]
> [girasole:27509] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27509] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27509] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27509] *** End of error message ***
> [girasole:27507] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27507] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b09eb873eb5]
> [girasole:27507] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b09eb8728ca]
> [girasole:27507] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2b09e75e3bff]
> [girasole:27507] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27507] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27507] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27507] *** End of error message ***
> [girasole:27504] *** Process received signal ***
> [girasole:27504] Signal: Segmentation fault (11)
> [girasole:27504] Signal code: (128)
> [girasole:27504] Failing at address: (nil)
> [girasole:27506] *** Process received signal ***
> [girasole:27506] Signal: Segmentation fault (11)
> [girasole:27506] Signal code: (128)
> [girasole:27506] Failing at address: (nil)
> [girasole:27504] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27504] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b6fde1afeb5]
> [girasole:27504] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b6fde1ae8ca]
> [girasole:27504] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2b6fd9f1fbff]
> [girasole:27504] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27504] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27504] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27504] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 7 with PID 27510 on node girasole exited
> on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> [girasole:27506] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27506] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b66f2908eb5]
> [girasole:27506] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b66f29078ca]
> [girasole:27506] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2b66ee678bff]
> [girasole:27506] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27506] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27506] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27506] *** End of error message ***
> [girasole:27508] *** Process received signal ***
> [girasole:27508] Signal: Segmentation fault (11)
> [girasole:27508] Signal code: (128)
> [girasole:27508] Failing at address: (nil)
> [girasole:27508] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27508] [ 1]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b89b09a1eb5]
> [girasole:27508] [ 2]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so
> [0x2b89b09a08ca]
> [girasole:27508] [ 3]
> /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f)
> [0x2b89ac711bff]
> [girasole:27508] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27508] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x32c6c1d8b4]
> [girasole:27508] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27508] *** End of error message ***
>
>
> Best regards,
>
> --
> Daniel Spångberg
> Materialkemi
> Uppsala Universitet
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>