Open MPI Development Mailing List Archives

Subject: [OMPI devel] MPI_T SEGV on DSO
From: KAWASHIMA Takahiro (rivis.kawashima_at_[hidden])
Date: 2014-07-29 11:04:55


Hi,

I encountered the same SEGV that was reported on the users list when
running the varList program.

  http://www.open-mpi.org/community/lists/users/2014/07/24792.php

mpiexec -n 1 ./varList:
----------------------------------------------------------------
... snip ...
event U/D-2 CHAR n/a ALL
event_base_verbose D/D-8 INT n/a LOCAL 0
event_libevent2021_event_include U/A-3 CHAR n/a LOCAL poll
opal_event_include U/A-3 CHAR n/a LOCAL poll
event_libevent2021_major_version D/A-9 INT n/a UNKNOWN 1
event_libevent2021_minor_version D/A-9 INT n/a UNKNOWN 9
event_libevent2021_release_version D/A-9 INT n/a UNKNOWN 0
shmem U/D-2 CHAR n/a ALL
shmem_base_verbose D/D-8 INT n/a LOCAL 0
shmem_base_RUNTIME_QUERY_hint D/A-9 CHAR n/a ALL-EQ
shmem_mmap_priority U/A-3 INT n/a ALL 50
shmem_mmap_enable_nfs_warning D/A-9 INT n/a LOCAL true
shmem_mmap_relocate_backing_file D/A-9 INT n/a ALL 0
shmem_mmap_backing_file_base_dir D/A-9 CHAR n/a ALL /dev/shm
shmem_mmap_major_version D/A-9 INT n/a UNKNOWN 1
shmem_mmap_minor_version D/A-9 INT n/a UNKNOWN 9
shmem_mmap_release_version D/A-9 INT n/a UNKNOWN 0
shmem_posix_major_version D/A-9 INT n/a UNKNOWN 1201644720
shmem_posix_minor_version D/A-9 INT n/a UNKNOWN 32756
shmem_posix_release_version D/A-9 INT n/a UNKNOWN 6
[ppc:12688] *** Process received signal ***
[ppc:12688] Signal: Segmentation fault (11)
[ppc:12688] Signal code: Invalid permissions (2)
[ppc:12688] Failing at address: 0x7ff4479f83d8
[ppc:12688] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x325c0)[0x7ff4493015c0]
[ppc:12688] [ 1] /home/rivis/opt/openmpi-trunk-debug/lib/libmpi.so.0(PMPI_T_cvar_read+0xbc)[0x7ff44970abb7]
[ppc:12688] [ 2] ./varlist(list_cvars+0x56a)[0x4029bc]
[ppc:12688] [ 3] ./varlist(main+0x42b)[0x403598]
[ppc:12688] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7ff4492edeed]
[ppc:12688] [ 5] ./varlist[0x4016c9]
[ppc:12688] *** End of error message ***
----------------------------------------------------------------

I tracked down this error and found that it seems to be related to DSO builds.

The error occurs when accessing value->intval for the
control variable shmem_sysv_major_version in MPI_T_cvar_read.

  https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/tool/cvar_read.c

The 'value' was obtained from mca_base_var_get_value, and it points to
mca_shmem_sysv_component.super.base_version.mca_component_major_version,
which lives in a component that was dlclose'd in MPI_INIT for DSO builds.
(The mmap component is selected in my environment.)

The abnormal shmem_posix_{major,minor,release}_version values in
my output above have the same cause. The SEGV occurs if the memory
has already been returned to the kernel; abnormal values are printed
if it has not.
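
To illustrate the failure mode, here is a minimal C sketch. It is not
Open MPI code: component_t, var_entry_t, component_load, var_register,
component_unload and var_read are hypothetical names, and malloc/free
stand in for dlopen/dlclose. It shows one possible way to avoid the
dangling pointer (invalidating a variable's storage pointer when its
owning component is unloaded), which is not necessarily the fix that
Open MPI should adopt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the version fields of a component. */
typedef struct {
    int major_version;
    int minor_version;
    int release_version;
} component_t;

/* Hypothetical registry entry: the variable's value is not copied;
 * the entry only points into storage owned by the component. */
typedef struct {
    const char *name;
    const void *owner;   /* component that owns the storage */
    const int  *storage; /* NULL once the owner has been unloaded */
} var_entry_t;

enum { VAR_OK = 0, VAR_ERR_INVALID = -1 };

/* Stand-in for dlopen: allocate the component's storage. */
component_t *component_load(int major, int minor, int release)
{
    component_t *c = malloc(sizeof(*c));
    if (c != NULL) {
        c->major_version = major;
        c->minor_version = minor;
        c->release_version = release;
    }
    return c;
}

void var_register(var_entry_t *var, const char *name,
                  const void *owner, const int *storage)
{
    var->name = name;
    var->owner = owner;
    var->storage = storage;
}

/* Stand-in for dlclose: invalidate every variable that points into
 * the component BEFORE releasing its memory. Skipping this loop
 * leaves dangling pointers like the one described in this mail. */
void component_unload(component_t *c, var_entry_t *vars, size_t nvars)
{
    for (size_t i = 0; i < nvars; i++) {
        if (vars[i].owner == c) {
            vars[i].owner = NULL;
            vars[i].storage = NULL;
        }
    }
    free(c);
}

/* Read a variable; report an error instead of touching freed memory. */
int var_read(const var_entry_t *var, int *out)
{
    assert(var != NULL && out != NULL);
    if (var->storage == NULL) {
        return VAR_ERR_INVALID;
    }
    *out = *var->storage;
    return VAR_OK;
}
```

With this arrangement, reading a variable of an unloaded component
returns VAR_ERR_INVALID, while variables of a component that is still
loaded remain readable.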

Accordingly, this SEGV doesn't occur if I configure Open MPI with
the --disable-dlopen option. I think that is why Nathan
doesn't see this error.

Regards,
KAWASHIMA Takahiro