Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |  

This web mail archive is frozen.

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Click here to be taken to the new web archives of this list; it includes all the mails that are in this frozen archive plus all new mails that have been sent to the list since it was migrated to the new archives.

Subject: Re: [OMPI devel] 1.7.4rc: mpirun hangs on ia64
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2014-01-23 16:16:26


Some progress:

I fixed IA64.asm but still saw failures.
I realized I'd not checked the ia64/atomic.h file.

Lo and behold the origin of the bogus "sxt4" is a pair of improper casts,
removed by the following:
--- opal/include/opal/sys/ia64/atomic.h~ 2014-01-23
13:04:03.000000000 -0800
+++ opal/include/opal/sys/ia64/atomic.h 2014-01-23 13:04:42.000000000 -0800
@@ -119,7 +119,7 @@
     __asm__ __volatile__ ("cmpxchg8.acq %0=[%1],%2,ar.ccv":
                   "=r"(ret) : "r"(addr), "r"(newval) : "memory");

- return ((int32_t)ret == oldval);
+ return (ret == oldval);
 }

@@ -132,7 +132,7 @@
     __asm__ __volatile__ ("cmpxchg8.rel %0=[%1],%2,ar.ccv":
                   "=r"(ret) : "r"(addr), "r"(newval) : "memory");

- return ((int32_t)ret == oldval);
+ return (ret == oldval);
 }

 #endif /* OMPI_GCC_INLINE_ASSEMBLY */

I will retest ASAP and report with, I hope, an attachment to fix both
IA64.asm and ia64/atomic.h

-Paul

On Wed, Jan 22, 2014 at 4:24 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:

>
> On Wed, Jan 22, 2014 at 2:22 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>
>> My ia64 asm is a bit rusty, but I'll give a quick look if/when I can.
>
>
> I had a look (in v1.7) and this is what I see:
>
> $cat -n IA64.asm | grep -A14 opal_atomic_cmpset_acq_64:
> 70 opal_atomic_cmpset_acq_64:
> 71 .prologue
> 72 .body
> 73 mov ar.ccv=r33;;
> 74 cmpxchg8.acq r32=[r32],r34,ar.ccv
> 75 ;;
> 76 sxt4 r32 = r32
> 77 ;;
> 78 cmp.eq p6, p7 = r33, r32
> 79 ;;
> 80 (p6) addl r8 = 1, r0
> 81 (p7) mov r8 = r0
> 82 br.ret.sptk.many b0
> 83 ;;
> 84 .endp opal_atomic_cmpset_acq_64#
>
> The (approximate and non-atomic) C equivalent is:
>
> // r32 = address
> // r33 = oldvalue
> // r34 = newvalue
> int opal_atomic_cmpset_acq_64(int64_t r32, int64_t r33, int64 r34) {
> int64_t ccv = r33; // L73
> if (*(int64_t *)r32 == ccv) *(int64_t *)r32 = r34; // L74
>
> r32 = (int64_t)(int32_t)r32; // L76 = sign-extend 32->64
>
> bool p6, p7;
> p7 = !(p6 = (r33 == r32)); // L78
>
> const int r0 = 0;
> int r8;
> if (p6) r8 = 1 + r0; // L80
> if (p7) r8 = r0; // L81
> return r8; // L82
> }
>
> Which is fine except that line 76 is totally wrong!!
> The "sxt4" instruction is "sign-extend from 4 bytes to 8 bytes".
> Thus the upper 32-bits of the value read from memory are lost!
> Unless the upper 33 bits off r33 (oldvalue) are all 0s or all 1s, the
> comparison on line 78 MUST fail.
> This explains the hang, as the lifo push will loop indefinitely waiting
> for the success of this cmpset.
>
> Note the same erroneous instruction is also present in the _rel variant
> (at line 94).
> The trunk has the same issue.
> This code has not changed at all since IA64.asm was added way back in
> r4471.
>
> I won't have access to the IA64 platform again until tomorrow AM.
> So, testing my hypothesis will need to wait.
>
> BTW:
> IFF I am right about the source of this problem, then it would be
> beneficial to have (and I may contribute) a stronger test (for "make
> check") that would detect this sort of bug in the atomics (specifically
> look for both false-positive and false-negative return value from 64-bit
> cmpset operations with values satisfying a range of "corner cases"). I
> think I have single-bit and double-bit "marching tests" for cmpset in my
> own arsenal of tests for GASNet's atomics. If I don't have time to
> contribute a complete test, I can at least contribute that logic for
> somebody else to port to the OPAL atomics.
>
> -Paul
>
> P.S.:
> The cmpxchgN for N in 1,2,4 are documented as ZERO-extending their loads
> to 64-bits.
> So, there is a slim chance that the sxt4 actually was intended for the
> 32-bit cmpset code.
> However, since the comparison used there is a "cmp4.eq" the "sxt4" would
> still not be needed.
>
>
> --
> Paul H. Hargrove PHHargrove_at_[hidden]
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900