Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Possible bug in finalize, OpenMPI v1.5, head revision
From: Mike Dubman (mike.ompi_at_[hidden])
Date: 2012-01-22 15:40:49


It was compiled with the same OMPI.
We see it occasionally on different clusters with different OMPI folders
(all v1.5).

On Thu, Jan 19, 2012 at 5:44 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> I didn't commit anything to the v1.5 branch yesterday - just the trunk.
>
> As I told Mike off-list, I think it may have been that the binary was
> compiled against a different OMPI version by mistake. It looks very much
> like what I'd expect to have happen in that scenario.
>
> On Jan 19, 2012, at 7:52 AM, Jeff Squyres wrote:
>
> > Did you "svn up"? I ask because Ralph committed some stuff yesterday
> > that may have fixed this.
> >
> >
> > On Jan 18, 2012, at 5:19 PM, Andrew Senin wrote:
> >
> >> No, nothing specific. Only basic settings (--mca btl openib,self
> >> --npernode 1, etc).
> >>
> >> Actually I was confused by this error because today it just
> >> disappeared. I had 2 separate folders where it was reproduced in 100%
> >> of test runs. Today I recompiled the source and it is gone in both
> >> folders. But yesterday I tried recompiling multiple times with no
> >> effect. So I believe this must be somehow related to some unknown
> >> settings in the lab which have been changed. Trying to reproduce the
> >> crash now...
> >>
> >> Regards,
> >> Andrew Senin.
> >>
> >> On Thu, Jan 19, 2012 at 12:05 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> >>> Jumping in pretty late in this thread here...
> >>>
> >>> I see that it's failing in opal_hwloc_base_close(). That's a little
> >>> worrisome.
> >>>
> >>> I do see an odd path through the hwloc initialization that *could*
> >>> cause an error during finalization -- but it would involve you setting an
> >>> invalid value for an MCA parameter. Are you setting
> >>> hwloc_base_mem_bind_failure_action or
> >>> hwloc_base_mem_alloc_policy, perchance?
> >>>
> >>>
> >>> On Jan 16, 2012, at 1:56 PM, Andrew Senin wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I think I've found a bug in the head revision of the OpenMPI 1.5
> >>>> branch. If it is configured with --disable-debug it crashes in
> >>>> finalize on the hello_c.c example. Did I miss something?
> >>>>
> >>>> Configure options:
> >>>> ./configure --with-pmi=/usr/ --with-slurm=/usr/ --without-psm
> >>>> --disable-debug --enable-mpirun-prefix-by-default
> >>>> --prefix=/hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install
> >>>>
> >>>> Runtime command and output:
> >>>> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:../lib ./mpirun --mca btl openib,self
> >>>> --npernode 1 --host mir1,mir2 ./hello
> >>>>
> >>>> Hello, world, I am 0 of 2
> >>>> Hello, world, I am 1 of 2
> >>>> [mir1:05542] *** Process received signal ***
> >>>> [mir1:05542] Signal: Segmentation fault (11)
> >>>> [mir1:05542] Signal code: Address not mapped (1)
> >>>> [mir1:05542] Failing at address: 0xe8
> >>>> [mir2:10218] *** Process received signal ***
> >>>> [mir2:10218] Signal: Segmentation fault (11)
> >>>> [mir2:10218] Signal code: Address not mapped (1)
> >>>> [mir2:10218] Failing at address: 0xe8
> >>>> [mir1:05542] [ 0] /lib64/libpthread.so.0() [0x390d20f4c0]
> >>>> [mir1:05542] [ 1] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(+0x1346a8) [0x7f4588cee6a8]
> >>>> [mir1:05542] [ 2] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_hwloc_base_close+0x32) [0x7f4588cee700]
> >>>> [mir1:05542] [ 3] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_finalize+0x73) [0x7f4588d1beb2]
> >>>> [mir1:05542] [ 4] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(orte_finalize+0xfe) [0x7f4588c81eb5]
> >>>> [mir1:05542] [ 5] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(ompi_mpi_finalize+0x67a) [0x7f4588c217c3]
> >>>> [mir1:05542] [ 6] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(PMPI_Finalize+0x59) [0x7f4588c39959]
> >>>> [mir1:05542] [ 7] ./hello(main+0x69) [0x4008fd]
> >>>> [mir1:05542] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd) [0x390ca1ec5d]
> >>>> [mir1:05542] [ 9] ./hello() [0x4007d9]
> >>>> [mir1:05542] *** End of error message ***
> >>>> [mir2:10218] [ 0] /lib64/libpthread.so.0() [0x3a6dc0f4c0]
> >>>> [mir2:10218] [ 1] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(+0x1346a8) [0x7f409f31d6a8]
> >>>> [mir2:10218] [ 2] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_hwloc_base_close+0x32) [0x7f409f31d700]
> >>>> [mir2:10218] [ 3] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(opal_finalize+0x73) [0x7f409f34aeb2]
> >>>> [mir2:10218] [ 4] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(orte_finalize+0xfe) [0x7f409f2b0eb5]
> >>>> [mir2:10218] [ 5] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(ompi_mpi_finalize+0x67a) [0x7f409f2507c3]
> >>>> [mir2:10218] [ 6] /hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install/lib/libmpi.so.1(PMPI_Finalize+0x59) [0x7f409f268959]
> >>>> [mir2:10218] [ 7] ./hello(main+0x69) [0x4008fd]
> >>>> [mir2:10218] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3a6d41ec5d]
> >>>> [mir2:10218] [ 9] ./hello() [0x4007d9]
> >>>> [mir2:10218] *** End of error message ***
> >>>> --------------------------------------------------------------------------
> >>>> mpirun noticed that process rank 0 with PID 5542 on node mir1 exited
> >>>> on signal 11 (Segmentation fault).
> >>>> --------------------------------------------------------------------------
> >>>>
> >>>> Thanks,
> >>>> Andrew Senin
> >>>> _______________________________________________
> >>>> users mailing list
> >>>> users_at_[hidden]
> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>
> >>>
> >>> --
> >>> Jeff Squyres
> >>> jsquyres_at_[hidden]
> >>> For corporate legal information go to:
> >>> http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
>
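
Ralph's hypothesis in this thread is that the binary picked up a different OMPI build than the one intended. One rough way to check that is to look at which libmpi the dynamic loader resolves for the binary. Below is a minimal sketch in Python, assuming a Linux system where `ldd` is available; the helper names (`resolved_libs`, `outside_prefix`, `check_binary`) are made up for illustration and are not part of Open MPI. Note that `ldd` honors the current LD_LIBRARY_PATH, so it should be run in the same environment as mpirun (and on each node, since paths may differ per host). It only shows what the loader would resolve at run time, not which headers the binary was compiled against, and it will not list MCA components that are loaded later via dlopen.

```python
import re
import subprocess

# Matches ldd lines such as:
#   libmpi.so.1 => /some/prefix/lib/libmpi.so.1 (0x00007f4588bba000)
#   libmpi.so.1 => not found
_LDD_LINE = re.compile(
    r"^\s*(\S+)\s+=>\s+(/\S+|not found)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$"
)


def resolved_libs(ldd_output, name_prefix="libmpi"):
    """Return {soname: resolved path} for libraries whose name starts with name_prefix."""
    found = {}
    for line in ldd_output.splitlines():
        m = _LDD_LINE.match(line)
        if m and m.group(1).startswith(name_prefix):
            found[m.group(1)] = m.group(2)
    return found


def outside_prefix(libs, install_prefix):
    """Return the entries of libs whose path is not under install_prefix.

    Unresolved libraries ("not found") are reported as outside the prefix.
    """
    prefix = install_prefix.rstrip("/") + "/"
    return {name: path for name, path in libs.items() if not path.startswith(prefix)}


def check_binary(binary, install_prefix):
    """Run ldd on binary and return the libmpi entries resolved outside install_prefix."""
    out = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    ).stdout
    return outside_prefix(resolved_libs(out), install_prefix)
```

For the setup described above, a call such as `check_binary("./hello", "/hpc/home/USERS/senina/projects/distribs/openmpi-svn_v1.5/install")` would return an empty dict if libmpi resolves inside that install tree, and a non-empty dict if the loader picks it up from somewhere else.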