Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [SPAM:## 71%] checkpoint --term core dump
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-01-04 22:01:59


I'm afraid we have lost the person who supported our checkpoint/restart code, so we probably won't be able to address this unless he happens to look in at some point. The only suggestion I can make is to not enable the thread options, as thread support is weak at best.
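
As a rough sketch of that suggestion, and assuming the thread options meant here are the --enable-ft-thread and --enable-opal-multi-threads flags in the configure line quoted below, a rebuild would drop only those two flags and keep everything else from the original report unchanged:

```shell
# Same options as in the quoted report, minus the two thread-related flags
# (--enable-ft-thread and --enable-opal-multi-threads). Treating these two as
# "the thread options" is an assumption, not something confirmed in this thread.
./configure --enable-heterogeneous --enable-cxx-exceptions --enable-shared \
    --enable-orterun-prefix-by-default --enable-mpi-f90 --with-mpi-f90-size=small \
    --with-ft=cr --with-blcr=/opt/blcr --with-blcr-libdir=/opt/blcr/lib
```

Open MPI would then need to be rebuilt and reinstalled, and the application relinked against it. Whether this avoids the segfault in mca_crcp_bkmrk has not been verified.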

On Jan 4, 2013, at 4:34 PM, William Au <au_wai_chung_at_[hidden]> wrote:

> Hi,
>
> I encountered a core dump when using ompi-checkpoint --term pid.
>
> Here is the trace:
>
> [genova:01808] *** Process received signal ***
> [genova:01808] Signal: Segmentation fault (11)
> [genova:01808] Signal code: Address not mapped (1)
> [genova:01808] Failing at address: 0x90
> [genova:01808] [ 0] /lib64/libpthread.so.0 [0x3a78a0ebe0]
> [genova:01808] [ 1] /import/cad-capex2/wa156553/openmpi-1.6_x86_64_i4/lib/openmpi/mca_crcp_bkmrk.so [0x2aaaaefe110b]
> [genova:01808] [ 2] /import/cad-capex2/wa156553/openmpi-1.6_x86_64_i4/lib/openmpi/mca_crcp_bkmrk.so [0x2aaaaefe4952]
> [genova:01808] [ 3] /import/cad-capex2/wa156553/openmpi-1.6_x86_64_i4/lib/openmpi/mca_crcp_bkmrk.so(ompi_crcp_bkmrk_pml_ft_event+0x74e) [0x2aaaaefe5b9e]
> [genova:01808] [ 4] /import/cad-capex2/wa156553/openmpi-1.6_x86_64_i4/lib/openmpi/mca_pml_crcpw.so(mca_pml_crcpw_ft_event+0x59) [0x2aaaacc1eea9]
> [genova:01808] [ 5] /import/cad-capex2/wa156553/openmpi-1.6_x86_64_i4/lib/libmpi.so.1(ompi_cr_coord+0xe0) [0x2b95b29a5690]
> [genova:01808] [ 6] /import/cad-capex2/wa156553/openmpi-1.6_x86_64_i4/lib/libmpi.so.1(opal_cr_inc_core_prep+0xc) [0x2b95b2a6017c]
> [genova:01808] [ 7] /import/cad-capex2/wa156553/openmpi-1.6_x86_64_i4/lib/openmpi/mca_snapc_full.so [0x2aaaab7d9d15]
> [genova:01808] [ 8] /import/cad-capex2/wa156553/openmpi-1.6_x86_64_i4/lib/libmpi.so.1(opal_cr_test_if_checkpoint_ready+0x52) [0x2b95b2a60282]
> [genova:01808] [ 9] /import/cad-capex2/wa156553/openmpi-1.6_x86_64_i4/lib/libmpi.so.1 [0x2b95b2a60ec1]
> [genova:01808] [10] /lib64/libpthread.so.0 [0x3a78a0677d]
> [genova:01808] [11] /lib64/libc.so.6(clone+0x6d) [0x3a77ad3c1d]
> [genova:01808] *** End of error message ***
> [genova:01807] local) Error: Unable to read state from named pipe (/tmp/opal_cr_prog_write.1808). 0
> [genova:01807] [[8178,0],0] ORTE_ERROR_LOG: Error in file snapc_full_local.c at line 1602
> [genova:01807] local) Error: Unable to read state from named pipe (/tmp/opal_cr_prog_write.1810). 0
> [genova:01807] [[8178,0],0] ORTE_ERROR_LOG: Error in file snapc_full_local.c at line 1602
> [genova:01807] local) Error: Unable to read state from named pipe (/tmp/opal_cr_prog_write.1809). 0
> [genova:01807] [[8178,0],0] ORTE_ERROR_LOG: Error in file snapc_full_local.c at line 1602
>
> I configured with the following options:
>
> ./configure --enable-heterogeneous --enable-cxx-exceptions --enable-shared --enable-orterun-prefix-by-default --enable-mpi-f90 --with-mpi-f90-size=small --with-ft=cr --with-blcr=/opt/blcr --with-blcr-libdir=/opt/blcr/lib --enable-ft-thread --enable-opal-multi-threads
>
> I am using openmpi 1.6.
>
> Any idea where I should look?
>
> Thanks.
>
> Regards,
>
> William
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users