Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] 1.7.5 fails on simple test
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2014-02-10 21:54:17


All the platforms that failed over the weekend have passed today.

-Paul

On Mon, Feb 10, 2014 at 2:34 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:

> The fastest of my systems that failed over the weekend (a ppc64) has
> completed tests successfully.
> I will report on the ppc32 and SPARC results when they have all passed or
> failed.
>
> -Paul
>
>
> On Mon, Feb 10, 2014 at 1:52 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> The tarball is now posted.
>>
>> On Feb 10, 2014, at 1:31 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>> Generating it now. Sorry for my lack of response; my OMPI email was down
>> for some reason. I can now receive mail, but I still haven't gotten the
>> backlog from the period when it was down.
>>
>>
>> On Feb 10, 2014, at 1:23 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>>
>> Ralph,
>>
>> If you give me a heads-up when this makes it into a tarball, I will
>> retest my failing ppc and sparc platforms.
>>
>> -Paul
>>
>>
>> On Mon, Feb 10, 2014 at 1:13 PM, Rolf vandeVaart <rvandevaart_at_[hidden]> wrote:
>>
>>> I have tracked this down. A commit missing from the 1.7 branch affects
>>> ompi_mpi_init.c, causing it to initialize the BML twice.
>>>
>>> Ralph, can you apply r30310 to 1.7?
>>>
>>> Thanks,
>>> Rolf
>>>
>>> *From:* devel [mailto:devel-bounces_at_[hidden]] *On Behalf Of* Rolf vandeVaart
>>> *Sent:* Monday, February 10, 2014 12:29 PM
>>> *To:* Open MPI Developers
>>> *Subject:* Re: [OMPI devel] 1.7.5 fails on simple test
>>>
>>> I have seen this same issue, although my core dump is a little
>>> different. I am running with tcp,self. The first entry in the list of
>>> BTLs is garbage, but tcp and self do follow it in the list. Strange.
>>> This is my core dump. Line 208 in bml_r2.c is where I get the SEGV.
>>>
>>> Program terminated with signal 11, Segmentation fault.
>>> #0 0x00007fb6dec981d0 in ?? ()
>>> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.5.x86_64
>>> (gdb) where
>>> #0 0x00007fb6dec981d0 in ?? ()
>>> #1 <signal handler called>
>>> #2 0x00007fb6e82fff38 in main_arena () from /lib64/libc.so.6
>>> #3 0x00007fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, reachable=0x7fff80487b40)
>>>     at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
>>> #4 0x00007fb6df50a751 in mca_pml_ob1_add_procs (procs=0x2060bc0, nprocs=2)
>>>     at ../../../../../ompi/mca/pml/ob1/pml_ob1.c:332
>>> #5 0x00007fb6e8570dca in ompi_mpi_init (argc=1, argv=0x7fff80488158, requested=0, provided=0x7fff80487cc8)
>>>     at ../../ompi/runtime/ompi_mpi_init.c:776
>>> #6 0x00007fb6e85a3606 in PMPI_Init (argc=0x7fff80487d8c, argv=0x7fff80487d80) at pinit.c:84
>>> #7 0x0000000000401c56 in main (argc=1, argv=0x7fff80488158) at MPI_Isend_ator_c.c:143
>>> (gdb)
>>> #3 0x00007fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, reachable=0x7fff80487b40)
>>>     at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
>>> 208 rc = btl->btl_add_procs(btl, n_new_procs, new_procs, btl_endpoints, reachable);
>>> (gdb) print *btl
>>> $1 = {btl_component = 0x7fb6e82ffee8, btl_eager_limit = 140423556234984, btl_rndv_eager_limit = 140423556235000,
>>>   btl_max_send_size = 140423556235000, btl_rdma_pipeline_send_length = 140423556235016,
>>>   btl_rdma_pipeline_frag_size = 140423556235016, btl_min_rdma_pipeline_size = 140423556235032,
>>>   btl_exclusivity = 3895459608, btl_latency = 32694, btl_bandwidth = 3895459624, btl_flags = 32694,
>>>   btl_seg_size = 140423556235048, btl_add_procs = 0x7fb6e82fff38 <main_arena+184>,
>>>   btl_del_procs = 0x7fb6e82fff38 <main_arena+184>, btl_register = 0x7fb6e82fff48 <main_arena+200>,
>>>   btl_finalize = 0x7fb6e82fff48 <main_arena+200>, btl_alloc = 0x7fb6e82fff58 <main_arena+216>,
>>>   btl_free = 0x7fb6e82fff58 <main_arena+216>, btl_prepare_src = 0x7fb6e82fff68 <main_arena+232>,
>>>   btl_prepare_dst = 0x7fb6e82fff68 <main_arena+232>, btl_send = 0x7fb6e82fff78 <main_arena+248>,
>>>   btl_sendi = 0x7fb6e82fff78 <main_arena+248>, btl_put = 0x7fb6e82fff88 <main_arena+264>,
>>>   btl_get = 0x7fb6e82fff88 <main_arena+264>, btl_dump = 0x7fb6e82fff98 <main_arena+280>,
>>>   btl_mpool = 0x7fb6e82fff98, btl_register_error = 0x7fb6e82fffa8 <main_arena+296>,
>>>   btl_ft_event = 0x7fb6e82fffa8 <main_arena+296>}
>>> (gdb)
>>>
>>> *From:* devel [mailto:devel-bounces_at_[hidden]] *On Behalf Of* Mike Dubman
>>> *Sent:* Monday, February 10, 2014 4:23 AM
>>> *To:* Open MPI Developers
>>> *Subject:* [OMPI devel] 1.7.5 fails on simple test
>>>
>>> $ /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun -np 8 -mca pml ob1 -mca btl self,tcp /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
>>> [vegas12:12724] *** Process received signal ***
>>> [vegas12:12724] Signal: Segmentation fault (11)
>>> [vegas12:12724] Signal code: (128)
>>> [vegas12:12724] Failing at address: (nil)
>>> [vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
>>> [vegas12:12724] [ 1] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7ffff395f813]
>>> [vegas12:12724] [ 2] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x7ffff78e14a7]
>>> [vegas12:12724] [ 3] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x7ffff3ded6f2]
>>> [vegas12:12724] [ 4] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x7ffff78e0cc9]
>>> [vegas12:12724] [ 5] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x7ffff37481d8]
>>> [vegas12:12724] [ 6] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x7ffff78f31e0]
>>> [vegas12:12724] [ 7] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x7ffff78bffdb]
>>> [vegas12:12724] [ 8] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(MPI_Init+0x170)[0x7ffff78d4210]
>>> [vegas12:12724] [ 9] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi_mpifh.so.2(PMPI_Init_f08+0x25)[0x7ffff7b71c25]
>>> [vegas12:12724] [10] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400c0b]
>>> [vegas12:12724] [11] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400d4a]
>>> [vegas12:12724] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
>>> [vegas12:12724] [13] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400b29]
>>> [vegas12:12724] *** End of error message ***
>>> [vegas12:12731] *** Process received signal ***
>>> [vegas12:12731] Signal: Segmentation fault (11)
>>> [vegas12:12731] Signal code: (128)
>>> [vegas12:12731] Failing at address: (nil)
>>> [vegas12:12731] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
>>> [vegas12:12731] [ 1] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7ffff395f813]
>>> [vegas12:12731] [ 2] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x7ffff78e14a7]
>>> [vegas12:12731] [ 3] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x7ffff3ded6f2]
>>> [vegas12:12731] [ 4] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x7ffff78e0cc9]
>>> [vegas12:12731] [ 5] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x7ffff37481d8]
>>> [vegas12:12731] [ 6] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x7ffff78f31e0]
>>> [vegas12:12731] [ 7] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x7ffff78bffdb]
>>> [vegas12:12731] [ 8] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(MPI_Init+0x170)[0x7ffff78d4210]
>>> [vegas12:12731] [ 9] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi_mpifh.so.2(PMPI_Init_f08+0x25)[0x7ffff7b71c25]
>>> [vegas12:12731] [10] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400c0b]
>>> [vegas12:12731] [11] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400d4a]
>>> [vegas12:12731] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
>>> [vegas12:12731] [13] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400b29]
>>> [vegas12:12731] *** End of error message ***
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 0 with PID 12724 on node vegas12 exited on signal 11 (Segmentation fault).
>>> --------------------------------------------------------------------------
>>> jenkins_at_vegas12 ~
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>
>>
>>
>> --
>> Paul H. Hargrove PHHargrove_at_[hidden]
>> Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>
>
>
