Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] MPI_T SEGV on DSO
From: Nathan Hjelm (hjelmn_at_[hidden])
Date: 2014-07-29 11:38:43


The problem is that the code in question does not check the return code of
MPI_T_cvar_handle_alloc. We return an error, and the code still tries
to use the handle (which is stale). Uncomment this section of the code:

                //if (MPI_T_ERR_INVALID_INDEX == err)// { NOTE TZI: This variable is not recognized by Mvapich. It is OpenMPI specific.
                // continue;

Note that MPI_T_ERR_INVALID_INDEX is in the MPI-3 standard but mvapich
must not have implemented it (and thus should not claim to be MPI 3.0).

-Nathan

On Wed, Jul 30, 2014 at 12:04:55AM +0900, KAWASHIMA Takahiro wrote:
> Hi,
>
> I encountered the same SEGV reported on the users list when
> running the varList program.
>
> http://www.open-mpi.org/community/lists/users/2014/07/24792.php
>
> mpiexec -n 1 ./varList:
> ----------------------------------------------------------------
> ... snip ...
> event U/D-2 CHAR n/a ALL
> event_base_verbose D/D-8 INT n/a LOCAL 0
> event_libevent2021_event_include U/A-3 CHAR n/a LOCAL poll
> opal_event_include U/A-3 CHAR n/a LOCAL poll
> event_libevent2021_major_version D/A-9 INT n/a UNKNOWN 1
> event_libevent2021_minor_version D/A-9 INT n/a UNKNOWN 9
> event_libevent2021_release_version D/A-9 INT n/a UNKNOWN 0
> shmem U/D-2 CHAR n/a ALL
> shmem_base_verbose D/D-8 INT n/a LOCAL 0
> shmem_base_RUNTIME_QUERY_hint D/A-9 CHAR n/a ALL-EQ
> shmem_mmap_priority U/A-3 INT n/a ALL 50
> shmem_mmap_enable_nfs_warning D/A-9 INT n/a LOCAL true
> shmem_mmap_relocate_backing_file D/A-9 INT n/a ALL 0
> shmem_mmap_backing_file_base_dir D/A-9 CHAR n/a ALL /dev/shm
> shmem_mmap_major_version D/A-9 INT n/a UNKNOWN 1
> shmem_mmap_minor_version D/A-9 INT n/a UNKNOWN 9
> shmem_mmap_release_version D/A-9 INT n/a UNKNOWN 0
> shmem_posix_major_version D/A-9 INT n/a UNKNOWN 1201644720
> shmem_posix_minor_version D/A-9 INT n/a UNKNOWN 32756
> shmem_posix_release_version D/A-9 INT n/a UNKNOWN 6
> [ppc:12688] *** Process received signal ***
> [ppc:12688] Signal: Segmentation fault (11)
> [ppc:12688] Signal code: Invalid permissions (2)
> [ppc:12688] Failing at address: 0x7ff4479f83d8
> [ppc:12688] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x325c0)[0x7ff4493015c0]
> [ppc:12688] [ 1] /home/rivis/opt/openmpi-trunk-debug/lib/libmpi.so.0(PMPI_T_cvar_read+0xbc)[0x7ff44970abb7]
> [ppc:12688] [ 2] ./varlist(list_cvars+0x56a)[0x4029bc]
> [ppc:12688] [ 3] ./varlist(main+0x42b)[0x403598]
> [ppc:12688] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7ff4492edeed]
> [ppc:12688] [ 5] ./varlist[0x4016c9]
> [ppc:12688] *** End of error message ***
> ----------------------------------------------------------------
>
> I tracked down this error and found that it seems to be related to DSO.
>
> The error occurs when accessing value->intval for the
> control variable shmem_sysv_major_version in MPI_T_cvar_read.
>
> https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/tool/cvar_read.c
>
> The 'value' pointer is obtained from mca_base_var_get_value and points to
> mca_shmem_sysv_component.super.base_version.mca_component_major_version,
> which belongs to a component that was dlclose'd in MPI_INIT for DSO builds.
> (component mmap is selected in my environment)
>
> The abnormal shmem_posix_{major,minor,release}_version values in
> my output above have the same cause. The SEGV occurs if the memory
> has already been returned to the kernel; abnormal values are printed
> if it has not.
>
> So this SEGV doesn't occur if I configure Open MPI with the
> --disable-dlopen option. I think that is why Nathan
> doesn't see this error.
>
> Regards,
> KAWASHIMA Takahiro
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15304.php
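
The DSO failure mode described in the quoted message, where a registered
variable still points into a component that has since been unloaded, can be
sketched in plain C. This is a hypothetical miniature registry, not Open
MPI's internal code: var_entry, component_load, component_unload and
var_read are names invented for the example, and malloc/free stand in for
dlopen/dlclose.

```c
#include <stddef.h>
#include <stdlib.h>

enum { VAR_OK = 0, VAR_ERR_INVALID = -1 };

/* Stand-in for a dynamically loaded component that owns its version field. */
typedef struct {
    int major_version;
} component;

/* One registered variable: a pointer into component-owned memory. */
typedef struct {
    const char *name;
    const int  *storage;
    int         valid;    /* cleared when the owning component is unloaded */
} var_entry;

/* Stand-in for dlopen: the component's data lives on the heap. */
component *component_load(int major)
{
    component *c = malloc(sizeof(*c));
    if (c != NULL)
        c->major_version = major;
    return c;
}

void var_register(var_entry *e, const char *name, const int *storage)
{
    e->name = name;
    e->storage = storage;
    e->valid = 1;
}

/* Stand-in for dlclose: invalidate the entry before releasing the memory,
 * so later reads fail cleanly instead of following a dangling pointer. */
void component_unload(component **c, var_entry *e)
{
    e->valid = 0;
    e->storage = NULL;
    free(*c);
    *c = NULL;
}

/* Reads the variable, or reports an error if its storage is gone. */
int var_read(const var_entry *e, int *out)
{
    if (!e->valid || e->storage == NULL)
        return VAR_ERR_INVALID;
    *out = *e->storage;
    return VAR_OK;
}
```

Without the invalidation step in component_unload, var_read would
dereference freed memory, which matches the two symptoms reported above:
garbage values while the pages are still mapped, and a SEGV once they have
been returned to the kernel.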


