Open MPI Development Mailing List Archives

From: Ferris McCormick (fmccor_at_[hidden])
Date: 2005-09-26 15:09:06


On Mon, 2005-09-26 at 14:59 +0000, Ferris McCormick wrote:
> On Fri, 2005-09-16 at 11:35 -0500, Brian Barrett wrote:
> > On Sep 16, 2005, at 8:44 AM, Ferris McCormick wrote:
> >
> > > ==========================================
> > > fmccor_at_polylepis util [235]% ./opal_timer
> > > --> frequency: 900000000
> > > --> cycle count
> > > Slept approximately 903151189 cycles, or 1003501 us
> > > --> usecs
> > > Slept approximately 18446744073289684648 us
> > > ==========================================
> >
> > That last value means that I'm munging the upper 32 bits of the tick
> > register (it's 64 bits long). So we're not quite there yet, but
> > getting closer. I should be able to get to that today.
> >
> > The other problem is very odd. Since you're compiling in 32bit mode,
> > I'd expect us to see it on our PowerPC machines, but I haven't run into
> > that one yet. I'll try to compile without debugging and see what I can
> > see.
> >
> >
> > Brian
> >
> Here's a little more information on the segfault when trying
> OBJ_DESTRUCT(&verbose); in opal/util/output.c:
> First of all, verbose is of type opal_output_stream_t, and this is not
> an opal_object_t, so OBJ_DESTRUCT is calling opal_obj_run_destructors
> with an object of the wrong type (although ompi might lay out the
> structure so that this call still works; I haven't worked it out).
>
> Second, on my system at least, when OBJ_DESTRUCT(&verbose) gets called,
> verbose looks like this. (I added a debug fprintf to look at part of the
> verbose structure; the corresponding fprintf I put after
> OBJ_CONSTRUCT(&verbose, opal_output_stream_t); works fine.)
> ====================================
> Program received signal SIGSEGV, Segmentation fault.
> 0x7014f7d4 in opal_output_close (output_id=1883966264) at output.c:287
> 287 fprintf(stderr,"Destroying verbose, depth=%d\n",(/*(opal_object_t*)&*/verbose.super).obj_class->cls_depth);
> Current language: auto; currently c
> (gdb) print verbose
> $1 = {super = {obj_class = 0x0, obj_reference_count = 1},
> lds_is_debugging = false,
> lds_verbose_level = 0, lds_want_syslog = false, lds_syslog_priority = 0,
> lds_syslog_ident = 0x0, lds_prefix = 0x0, lds_want_stdout = false,
> lds_want_stderr = true,
> lds_want_file = false, lds_want_file_append = false, lds_file_suffix = 0x0}
> =====================================
> so verbose.super.obj_class has been set to NULL, and no matter how
> it is supposed to work, the opal_obj_run_destructors loop:
> cls = object->obj_class;
> for(i=0; i < cls->cls_depth;i++) { ...
> is going to dereference a NULL class pointer, because verbose no longer
> has a valid obj_class.
>

I've looked at the structures: opal_output_stream_t has an opal_object_t
as its first member (super), so (opal_object_t*)(&verbose) resolves
correctly, and my first concern is gone.

Now, for the second: if built with --enable-debug, the obj_class pointer
is correct when the program finally reaches OBJ_DESTRUCT(&verbose).
Without --enable-debug, it is NULL. I'll keep looking, but so far I
don't see what is going wrong.

Regards,

-- 
Ferris McCormick (P44646, MI) <fmccor_at_[hidden]>
Developer, Gentoo Linux (Sparc, Devrel)