
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] 1.7.5 fails on simple test
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-02-10 16:20:41


Done - thanks Rolf!!

On Feb 10, 2014, at 1:13 PM, Rolf vandeVaart <rvandevaart_at_[hidden]> wrote:

> I have tracked this down. There is a missing commit affecting ompi_mpi_init.c that causes the BML to be initialized twice.
> Ralph, can you apply r30310 to 1.7?
>
> Thanks,
> Rolf
>
> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Rolf vandeVaart
> Sent: Monday, February 10, 2014 12:29 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] 1.7.5 fails on simple test
>
> I have seen this same issue, although my core dump is a little different. I am running with tcp,self. The first entry in the list of BTLs is garbage, but tcp and self do appear later in the list. Strange. This is my core dump; line 208 in bml_r2.c is where I get the SEGV.
>
> Program terminated with signal 11, Segmentation fault.
> #0 0x00007fb6dec981d0 in ?? ()
> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.5.x86_64
> (gdb) where
> #0 0x00007fb6dec981d0 in ?? ()
> #1 <signal handler called>
> #2 0x00007fb6e82fff38 in main_arena () from /lib64/libc.so.6
> #3 0x00007fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, reachable=0x7fff80487b40)
> at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
> #4 0x00007fb6df50a751 in mca_pml_ob1_add_procs (procs=0x2060bc0, nprocs=2)
> at ../../../../../ompi/mca/pml/ob1/pml_ob1.c:332
> #5 0x00007fb6e8570dca in ompi_mpi_init (argc=1, argv=0x7fff80488158, requested=0, provided=0x7fff80487cc8)
> at ../../ompi/runtime/ompi_mpi_init.c:776
> #6 0x00007fb6e85a3606 in PMPI_Init (argc=0x7fff80487d8c, argv=0x7fff80487d80) at pinit.c:84
> #7 0x0000000000401c56 in main (argc=1, argv=0x7fff80488158) at MPI_Isend_ator_c.c:143
> (gdb)
> #3 0x00007fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, reachable=0x7fff80487b40)
> at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
> 208 rc = btl->btl_add_procs(btl, n_new_procs, new_procs, btl_endpoints, reachable);
> (gdb) print *btl
> $1 = {btl_component = 0x7fb6e82ffee8, btl_eager_limit = 140423556234984, btl_rndv_eager_limit = 140423556235000,
> btl_max_send_size = 140423556235000, btl_rdma_pipeline_send_length = 140423556235016,
> btl_rdma_pipeline_frag_size = 140423556235016, btl_min_rdma_pipeline_size = 140423556235032,
> btl_exclusivity = 3895459608, btl_latency = 32694, btl_bandwidth = 3895459624, btl_flags = 32694,
> btl_seg_size = 140423556235048, btl_add_procs = 0x7fb6e82fff38 <main_arena+184>,
> btl_del_procs = 0x7fb6e82fff38 <main_arena+184>, btl_register = 0x7fb6e82fff48 <main_arena+200>,
> btl_finalize = 0x7fb6e82fff48 <main_arena+200>, btl_alloc = 0x7fb6e82fff58 <main_arena+216>,
> btl_free = 0x7fb6e82fff58 <main_arena+216>, btl_prepare_src = 0x7fb6e82fff68 <main_arena+232>,
> btl_prepare_dst = 0x7fb6e82fff68 <main_arena+232>, btl_send = 0x7fb6e82fff78 <main_arena+248>,
> btl_sendi = 0x7fb6e82fff78 <main_arena+248>, btl_put = 0x7fb6e82fff88 <main_arena+264>,
> btl_get = 0x7fb6e82fff88 <main_arena+264>, btl_dump = 0x7fb6e82fff98 <main_arena+280>,
> btl_mpool = 0x7fb6e82fff98, btl_register_error = 0x7fb6e82fffa8 <main_arena+296>,
> btl_ft_event = 0x7fb6e82fffa8 <main_arena+296>}
> (gdb)
>
>
> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Mike Dubman
> Sent: Monday, February 10, 2014 4:23 AM
> To: Open MPI Developers
> Subject: [OMPI devel] 1.7.5 fails on simple test
>
>
>
> $/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun -np 8 -mca pml ob1 -mca btl self,tcp /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
> [vegas12:12724] *** Process received signal ***
> [vegas12:12724] Signal: Segmentation fault (11)
> [vegas12:12724] Signal code: (128)
> [vegas12:12724] Failing at address: (nil)
> [vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
> [vegas12:12724] [ 1] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7ffff395f813]
> [vegas12:12724] [ 2] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x7ffff78e14a7]
> [vegas12:12724] [ 3] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x7ffff3ded6f2]
> [vegas12:12724] [ 4] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x7ffff78e0cc9]
> [vegas12:12724] [ 5] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x7ffff37481d8]
> [vegas12:12724] [ 6] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x7ffff78f31e0]
> [vegas12:12724] [ 7] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x7ffff78bffdb]
> [vegas12:12724] [ 8] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(MPI_Init+0x170)[0x7ffff78d4210]
> [vegas12:12724] [ 9] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi_mpifh.so.2(PMPI_Init_f08+0x25)[0x7ffff7b71c25]
> [vegas12:12724] [10] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400c0b]
> [vegas12:12724] [11] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400d4a]
> [vegas12:12724] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
> [vegas12:12724] [13] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400b29]
> [vegas12:12724] *** End of error message ***
> [vegas12:12731] *** Process received signal ***
> [vegas12:12731] Signal: Segmentation fault (11)
> [vegas12:12731] Signal code: (128)
> [vegas12:12731] Failing at address: (nil)
> [vegas12:12731] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
> [vegas12:12731] [ 1] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7ffff395f813]
> [vegas12:12731] [ 2] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x7ffff78e14a7]
> [vegas12:12731] [ 3] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x7ffff3ded6f2]
> [vegas12:12731] [ 4] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x7ffff78e0cc9]
> [vegas12:12731] [ 5] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x7ffff37481d8]
> [vegas12:12731] [ 6] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x7ffff78f31e0]
> [vegas12:12731] [ 7] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x7ffff78bffdb]
> [vegas12:12731] [ 8] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(MPI_Init+0x170)[0x7ffff78d4210]
> [vegas12:12731] [ 9] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi_mpifh.so.2(PMPI_Init_f08+0x25)[0x7ffff7b71c25]
> [vegas12:12731] [10] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400c0b]
> [vegas12:12731] [11] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400d4a]
> [vegas12:12731] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
> [vegas12:12731] [13] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400b29]
> [vegas12:12731] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 12724 on node vegas12 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> jenkins_at_vegas12 ~
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel