Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI bug?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-06-17 12:34:15


Thanks for digging into this!

The assembly portion of OMPI is quite squirrelly and dangerous to mess
with. We'll need to check into this carefully to make sure that it
works properly on all supported architectures...

As for other bounds checking, would you mind checking the OMPI
development SVN trunk instead of the v1.2 series? We're working on
releasing the new version (v1.3 series) and there have been many, many
changes since the v1.2 series. There's a little instability on the
trunk right now with some recent PML changes that went in, but
hopefully we'll have those solved soon.

On Jun 13, 2008, at 5:13 AM, Gabriele Fatigati wrote:

> I'm sorry.
> The error refers to the 32-bit code, not the 64-bit code block I
> reported previously. So, the right code block is:
>
> static inline int opal_atomic_cmpset_32( volatile int32_t *addr,
>                                          int32_t oldval, int32_t newval)
> {
>     unsigned char ret;
>     __asm__ __volatile (
>         SMPLOCK "cmpxchgl %1,%2 \n\t"
>                 "sete %0 \n\t"
>         : "=qm" (ret)
>         : "q"(newval), "m"(*(volatile long*)addr), "a"(oldval) //<<<<< HERE
>         : "memory");
>
>     return (int)ret;
> }
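
For illustration only, here is a minimal sketch of the same compare-and-set with the memory operand typed as `*addr` rather than `*(volatile long*)addr`, so the access is 4 bytes wide. It assumes the overrun report comes from `long` being 8 bytes on amd64 while the lock object is 4 bytes. The names `sketch_cmpset_32` and `sketch_demo` are hypothetical and not part of Open MPI, the constraint list is rewritten rather than copied from the v1.2 source, and a literal `lock;` prefix stands in for the `SMPLOCK` macro. On non-x86 targets it falls back to the GCC `__sync_bool_compare_and_swap` builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name, not the Open MPI function.  Returns 1 if *addr equalled
 * oldval and was replaced by newval, and 0 otherwise. */
static inline int sketch_cmpset_32(volatile int32_t *addr,
                                   int32_t oldval, int32_t newval)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned char ret;
    __asm__ __volatile__ (
        "lock; cmpxchgl %3,%1 \n\t"   /* compare EAX with *addr, swap if equal */
        "sete %0 \n\t"                /* ret = 1 when the swap happened */
        : "=q" (ret), "+m" (*addr), "+a" (oldval)  /* 4-byte operand: no long cast */
        : "r" (newval)
        : "memory", "cc");
    return (int) ret;
#else
    /* Assumption: a GCC-compatible compiler providing the __sync builtins. */
    return __sync_bool_compare_and_swap(addr, oldval, newval) ? 1 : 0;
#endif
}

/* Small self-check of the compare-and-set semantics. */
static void sketch_demo(void)
{
    volatile int32_t lock = 0;

    assert(sketch_cmpset_32(&lock, 0, 1) == 1);  /* 0 matches: lock becomes 1 */
    assert(lock == 1);
    assert(sketch_cmpset_32(&lock, 0, 2) == 0);  /* 0 no longer matches */
    assert(lock == 1);                           /* value left unchanged */
}
```

Calling `sketch_demo()` from a `main` function exercises both the successful and the failing path. Whether such a change is safe on every supported architecture would still need checking against the real source.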
>
> 2008/6/13 Gabriele Fatigati <g.fatigati_at_[hidden]>:
> I may have solved this bug by deleting the long cast.
> Compilation now works, but at runtime there are other
> problems, like this:
>
> ../../../opal/class/opal_object.h:428:Bounds error: pointer arithmetic would overrun the end of the object.
> ../../../opal/class/opal_object.h:428: Pointer value: 0x8, Size: 8
> ../../../opal/class/opal_object.h:428: Object `orte_system_info':
> ../../../opal/class/opal_object.h:428: Address in memory: 0x0 .. 0xf
> ../../../opal/class/opal_object.h:428: Size: 64 bytes
> ../../../opal/class/opal_object.h:428: Element size: 1 bytes
> ../../../opal/class/opal_object.h:428: Number of elements: 64
> ../../../opal/class/opal_object.h:428: Created at: util/sys_info.c, line 43
> ../../../opal/class/opal_object.h:428: Storage class: static
>
> There are many errors of this type, differing only in the line
> number reported in /opal/class/opal_object.h. All of them refer to
> an object created at the same line of code:
>
> util/sys_info.c, line 43
>
> The final status of the MPI job is always "Undefined".
>
> Another bug?
>
>
> 2008/6/12 Gabriele Fatigati <g.fatigati_at_[hidden]>:
> I found that the error originates at this line of code:
>
> static opal_atomic_lock_t class_lock = { { OPAL_ATOMIC_UNLOCKED } };
>
> in class/opal_object.c, line 52
>
> and generates the bounds error in this code block:
>
> static inline int opal_atomic_cmpset_64( volatile int64_t *addr,
>                                          int64_t oldval, int64_t newval)
> {
>     unsigned char ret;
>     __asm__ __volatile (
>         SMPLOCK "cmpxchgq %1,%2 \n\t"
>                 "sete %0 \n\t"
>         : "=qm" (ret)
>         : "q"(newval), "m"(*((volatile long*)addr)), "a"(oldval) //<<<<< HERE
>         : "memory");
>
>     return (int)ret;
> }
>
> in /opal/include/opal/sys/amd64/atomic.h, at line 89
>
> The environment variable I mentioned previously is GCC_BOUNDS_OPTS.
>
> Thanks in advance.
>
>
> 2008/6/12 Gabriele Fatigati <g.fatigati_at_[hidden]>:
> Hi,
>
> I have installed Open MPI 1.2.6, built using gcc with bounds checking.
> But when I compile an MPI program, I get the same error many times:
>
> ../opal/include/opal/sys/amd64/atomic.h:89: Address in memory: 0x8 .. 0xb
> ../opal/include/opal/sys/amd64/atomic.h:89: Size: 4 bytes
> ../opal/include/opal/sys/amd64/atomic.h:89: Element size: 1 bytes
> ../opal/include/opal/sys/amd64/atomic.h:89: Number of elements: 4
> ../opal/include/opal/sys/amd64/atomic.h:89: Created at: class/opal_object.c, line 52
> ../opal/include/opal/sys/amd64/atomic.h:89: Storage class: static
> ../opal/include/opal/sys/amd64/atomic.h:89:Bounds error: attempt to reference memory overrunning the end of an object.
> ../opal/include/opal/sys/amd64/atomic.h:89: Pointer value: 0x8, Size: 8
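
One way to read the report above is as simple interval arithmetic: the object occupies 0x8 .. 0xb (4 bytes), while the access starts at 0x8 and is 8 bytes wide. The sketch below reproduces that arithmetic. The helper `sketch_overrun_bytes` is hypothetical and only illustrates the numbers in the log. It is not how the bounds-checking gcc is implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes by which an access of access_size
 * bytes starting at ptr runs past the end of an object that starts at
 * obj_start and is obj_size bytes long.  Returns 0 when the access ends
 * at or before the end of the object. */
static size_t sketch_overrun_bytes(uintptr_t obj_start, size_t obj_size,
                                   uintptr_t ptr, size_t access_size)
{
    uintptr_t obj_end = obj_start + obj_size;     /* one past the last byte */
    uintptr_t access_end = ptr + access_size;     /* one past the last byte read */

    return access_end > obj_end ? (size_t) (access_end - obj_end) : 0;
}
```

With the values from the log, `sketch_overrun_bytes(0x8, 4, 0x8, 8)` gives 4, meaning the 8-byte access extends 4 bytes past the 4-byte object. A 4-byte access at the same address gives 0.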
>
> After setting the environment variable to "-never-fatal", the compile
> phase ends successfully. But at runtime I always get the error
> above, many times, and the program fails with "undefined status".
>
> Is this an OpenMPI bug?
>
>
>
>
>
> --
> Gabriele Fatigati
>
> CINECA Systems & Tecnologies Department
>
> Supercomputing Group
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it Tel: +39 051 6171722
>
> g.fatigati_at_[hidden]
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems