Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Possible bug in finalize, OpenMPI v1.5, head revision
From: Andrew Senin (andrew.senin_at_[hidden])
Date: 2012-01-18 17:19:09


No, nothing specific. Only basic settings (--mca btl openib,self
--npernode 1, etc).

Actually, I was confused by this error because today it just
disappeared. I had 2 separate folders where it was reproduced in 100%
of test runs. Today I recompiled the source and it is gone in both
folders. But yesterday I tried recompiling multiple times with no
effect. So I believe this must be related to some settings in the
lab that were changed without my knowledge. Trying to reproduce the
crash now...
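
For reference, a rough way to rule out the two hwloc parameters asked
about below is to look for them in the places Open MPI reads MCA
settings from: OMPI_MCA_<name> environment variables and the per-user
$HOME/.openmpi/mca-params.conf file. The sketch below is only an
illustration; the function name list_hwloc_mca_settings is made up
here, and it does not look at the system-wide
<prefix>/etc/openmpi-mca-params.conf file, which would need the same
check on each node.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): print any hwloc-related
# MCA settings visible to the current shell.
list_hwloc_mca_settings() {
    # MCA parameters can be passed as OMPI_MCA_<param_name> env vars.
    env | grep '^OMPI_MCA_hwloc_' || true

    # Per-user MCA parameter file, if it exists.
    conf="${HOME}/.openmpi/mca-params.conf"
    if [ -f "$conf" ]; then
        grep -n '^[[:space:]]*hwloc_' "$conf" || true
    fi
}

list_hwloc_mca_settings
```

If this prints nothing on mir1 and mir2, neither parameter is coming
from the environment or the per-user file on those nodes.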

Regards,
Andrew Senin.

On Thu, Jan 19, 2012 at 12:05 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> Jumping in pretty late in this thread here...
>
> I see that it's failing in opal_hwloc_base_close().  That's a little worrisome.
>
> I do see an odd path through the hwloc initialization that *could* cause an error during finalization -- but it would involve you setting an invalid value for an MCA parameter.  Are you setting hwloc_base_mem_bind_failure_action or
> hwloc_base_mem_alloc_policy, perchance?
>
>
> On Jan 16, 2012, at 1:56 PM, Andrew Senin wrote:
>
>> Hi,
>>
>> I think I've found a bug in the head revision of the Open MPI 1.5
>> branch. If it is configured with --disable-debug, it crashes in
>> finalize on the hello_c.c example. Did I miss something?
>>
>> Configure options:
>> ./configure --with-pmi=/usr/ --with-slurm=/usr/ --without-psm
>> --disable-debug --enable-mpirun-prefix-by-default
>> --prefix=/hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install
>>
>> Runtime command and output:
>> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:../lib ./mpirun --mca btl openib,self
>> --npernode 1 --host mir1,mir2 ./hello
>>
>> Hello, world, I am 0 of 2
>> Hello, world, I am 1 of 2
>> [mir1:05542] *** Process received signal ***
>> [mir1:05542] Signal: Segmentation fault (11)
>> [mir1:05542] Signal code: Address not mapped (1)
>> [mir1:05542] Failing at address: 0xe8
>> [mir2:10218] *** Process received signal ***
>> [mir2:10218] Signal: Segmentation fault (11)
>> [mir2:10218] Signal code: Address not mapped (1)
>> [mir2:10218] Failing at address: 0xe8
>> [mir1:05542] [ 0] /lib64/libpthread.so.0() [0x390d20f4c0]
>> [mir1:05542] [ 1]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(+0x1346a8)
>> [0x7f4588cee6a8]
>> [mir1:05542] [ 2]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_hwloc_base_close+0x32)
>> [0x7f4588cee700]
>> [mir1:05542] [ 3]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_finalize+0x73)
>> [0x7f4588d1beb2]
>> [mir1:05542] [ 4]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(orte_finalize+0xfe)
>> [0x7f4588c81eb5]
>> [mir1:05542] [ 5]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(ompi_mpi_finalize+0x67a)
>> [0x7f4588c217c3]
>> [mir1:05542] [ 6]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(PMPI_Finalize+0x59)
>> [0x7f4588c39959]
>> [mir1:05542] [ 7] ./hello(main+0x69) [0x4008fd]
>> [mir1:05542] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd) [0x390ca1ec5d]
>> [mir1:05542] [ 9] ./hello() [0x4007d9]
>> [mir1:05542] *** End of error message ***
>> [mir2:10218] [ 0] /lib64/libpthread.so.0() [0x3a6dc0f4c0]
>> [mir2:10218] [ 1]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(+0x1346a8)
>> [0x7f409f31d6a8]
>> [mir2:10218] [ 2]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_hwloc_base_close+0x32)
>> [0x7f409f31d700]
>> [mir2:10218] [ 3]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_finalize+0x73)
>> [0x7f409f34aeb2]
>> [mir2:10218] [ 4]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(orte_finalize+0xfe)
>> [0x7f409f2b0eb5]
>> [mir2:10218] [ 5]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(ompi_mpi_finalize+0x67a)
>> [0x7f409f2507c3]
>> [mir2:10218] [ 6]
>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(PMPI_Finalize+0x59)
>> [0x7f409f268959]
>> [mir2:10218] [ 7] ./hello(main+0x69) [0x4008fd]
>> [mir2:10218] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3a6d41ec5d]
>> [mir2:10218] [ 9] ./hello() [0x4007d9]
>> [mir2:10218] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 5542 on node mir1 exited
>> on signal 11 (Segmentation fault).
>> ---------------------------------------------------------------------
>>
>> Thanks,
>> Andrew Senin
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>