Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Possible bug in finalize, OpenMPI v1.5, head revision
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-01-19 10:44:49


I didn't commit anything to the v1.5 branch yesterday - just the trunk.

As I told Mike off-list, I think it may have been that the binary was compiled against a different OMPI version by mistake. It looks very much like what I'd expect to have happen in that scenario.
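
One way to check for that kind of mismatch is to look at which libmpi the binary actually resolves at run time and compare it with the install prefix you meant to use. The snippet below is only a rough sketch: the helper names `resolved_libmpi` and `libmpi_matches_prefix` are made up for illustration, and it assumes the usual `name => /path (address)` layout of `ldd` output on Linux.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Open MPI.

# Read `ldd` output on stdin and print the path of every libmpi it resolved.
# Typical line: "libmpi.so.1 => /some/prefix/lib/libmpi.so.1 (0x...)"
resolved_libmpi() {
    awk '$1 ~ /^libmpi\.so/ && $2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Read `ldd` output on stdin; succeed only if at least one libmpi was
# resolved and every resolved libmpi lives under the prefix given as $1.
libmpi_matches_prefix() {
    prefix=$1
    paths=$(resolved_libmpi)
    [ -n "$paths" ] || return 1
    for p in $paths; do
        case "$p" in
            "$prefix"/*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}

# Usage, on a node where the job crashes (prefix is the one passed to configure):
#   ldd ./hello | resolved_libmpi
#   ldd ./hello | libmpi_matches_prefix /path/to/install && echo "libmpi OK"
```

This only covers the run-time side. For the compile side, the wrapper compiler's `--showme` option (for example `mpicc --showme`) prints the include and library paths it would use, which can be compared against the same prefix.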

On Jan 19, 2012, at 7:52 AM, Jeff Squyres wrote:

> Did you "svn up"? I ask because Ralph committed some stuff yesterday that may have fixed this.
>
>
> On Jan 18, 2012, at 5:19 PM, Andrew Senin wrote:
>
>> No, nothing specific. Only basic settings (--mca btl openib,self
>> --npernode 1, etc).
>>
>> Actually I was confused by this error because today it just
>> disappeared. I had 2 separate folders where it was reproduced in 100%
>> of test runs. Today I recompiled the source and it is gone in both
>> folders. But yesterday I tried recompiling multiple times with no
>> effect. So I believe this must be somehow related to some unknown
>> settings in the lab which have been changed. Trying to reproduce the
>> crash now...
>>
>> Regards,
>> Andrew Senin.
>>
>> On Thu, Jan 19, 2012 at 12:05 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>>> Jumping in pretty late in this thread here...
>>>
>>> I see that it's failing in opal_hwloc_base_close(). That's a little worrisome.
>>>
>>> I do see an odd path through the hwloc initialization that *could* cause an error during finalization -- but it would involve you setting an invalid value for an MCA parameter. Are you setting hwloc_base_mem_bind_failure_action or
>>> hwloc_base_mem_alloc_policy, perchance?
>>>
>>>
>>> On Jan 16, 2012, at 1:56 PM, Andrew Senin wrote:
>>>
>>>> Hi,
>>>>
>>>> I think I've found a bug in the head revision of the OpenMPI 1.5
>>>> branch. If it is configured with --disable-debug it crashes in
>>>> finalize on the hello_c.c example. Did I miss something?
>>>>
>>>> Configure options:
>>>> ./configure --with-pmi=/usr/ --with-slurm=/usr/ --without-psm
>>>> --disable-debug --enable-mpirun-prefix-by-default
>>>> --prefix=/hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install
>>>>
>>>> Runtime command and output:
>>>> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:../lib ./mpirun --mca btl openib,self
>>>> --npernode 1 --host mir1,mir2 ./hello
>>>>
>>>> Hello, world, I am 0 of 2
>>>> Hello, world, I am 1 of 2
>>>> [mir1:05542] *** Process received signal ***
>>>> [mir1:05542] Signal: Segmentation fault (11)
>>>> [mir1:05542] Signal code: Address not mapped (1)
>>>> [mir1:05542] Failing at address: 0xe8
>>>> [mir2:10218] *** Process received signal ***
>>>> [mir2:10218] Signal: Segmentation fault (11)
>>>> [mir2:10218] Signal code: Address not mapped (1)
>>>> [mir2:10218] Failing at address: 0xe8
>>>> [mir1:05542] [ 0] /lib64/libpthread.so.0() [0x390d20f4c0]
>>>> [mir1:05542] [ 1]
>>>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(+0x1346a8)
>>>> [0x7f4588cee6a8]
>>>> [mir1:05542] [ 2]
>>>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_hwloc_base_close+0x32)
>>>> [0x7f4588cee700]
>>>> [mir1:05542] [ 3]
>>>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_finalize+0x73)
>>>> [0x7f4588d1beb2]
>>>> [mir1:05542] [ 4]
>>>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(orte_finalize+0xfe)
>>>> [0x7f4588c81eb5]
>>>> [mir1:05542] [ 5]
>>>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(ompi_mpi_finalize+0x67a)
>>>> [0x7f4588c217c3]
>>>> [mir1:05542] [ 6]
>>>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(PMPI_Finalize+0x59)
>>>> [0x7f4588c39959]
>>>> [mir1:05542] [ 7] ./hello(main+0x69) [0x4008fd]
>>>> [mir1:05542] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd) [0x390ca1ec5d]
>>>> [mir1:05542] [ 9] ./hello() [0x4007d9]
>>>> [mir1:05542] *** End of error message ***
>>>> [mir2:10218] [ 0] /lib64/libpthread.so.0() [0x3a6dc0f4c0]
>>>> [mir2:10218] [ 1]
>>>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(+0x1346a8)
>>>> [0x7f409f31d6a8]
>>>> [mir2:10218] [ 2]
>>>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_hwloc_base_close+0x32)
>>>> [0x7f409f31d700]
>>>> [mir2:10218] [ 3]
>>>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_finalize+0x73)
>>>> [0x7f409f34aeb2]
>>>> [mir2:10218] [ 4]
>>>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(orte_finalize+0xfe)
>>>> [0x7f409f2b0eb5]
>>>> [mir2:10218] [ 5]
>>>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(ompi_mpi_finalize+0x67a)
>>>> [0x7f409f2507c3]
>>>> [mir2:10218] [ 6]
>>>> /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(PMPI_Finalize+0x59)
>>>> [0x7f409f268959]
>>>> [mir2:10218] [ 7] ./hello(main+0x69) [0x4008fd]
>>>> [mir2:10218] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3a6d41ec5d]
>>>> [mir2:10218] [ 9] ./hello() [0x4007d9]
>>>> [mir2:10218] *** End of error message ***
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that process rank 0 with PID 5542 on node mir1 exited
>>>> on signal 11 (Segmentation fault).
>>>> ---------------------------------------------------------------------
>>>>
>>>> Thanks,
>>>> Andrew Senin
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>>
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>