Which is fine, except that line 76 is totally wrong!
The "sxt4" instruction is "sign-extend from 4 bytes to 8 bytes".
Thus the upper 32 bits of the value read from memory are lost!
Unless the upper 33 bits of r33 (oldvalue) are all 0s or all 1s, the comparison on line 78 MUST fail.
This explains the hang, as the lifo push will loop indefinitely waiting for the success of this cmpset.
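To make the failure condition concrete, here is a small C model of it. This is only a sketch of my reading of the assembly, not the actual code: sxt4() and buggy_reports_success() are names made up for illustration, and the model assumes the cmpxchg8 itself succeeded, so the value read from memory equals oldvalue.

```c
#include <stdint.h>

/* Model of the IA64 "sxt4" instruction: keep the low 4 bytes and
 * sign-extend them to 8 bytes. Written with masks so it does not rely
 * on implementation-defined integer conversions. */
static uint64_t sxt4(uint64_t v)
{
    uint64_t low = v & UINT64_C(0x00000000ffffffff);
    if (low & UINT64_C(0x80000000)) {
        return low | UINT64_C(0xffffffff00000000);
    }
    return low;
}

/* Hypothetical model of the success check: the value read from memory
 * is sign-extended and then compared, as a full 64-bit value, against
 * oldvalue. ASSUMPTION: the swap itself succeeded, so the value read
 * back is exactly oldvalue. Returns nonzero if success is reported. */
static int buggy_reports_success(uint64_t oldvalue)
{
    uint64_t read_back = oldvalue;
    return sxt4(read_back) == oldvalue;
}
```

Under this model, an oldvalue such as 0x7fffffff or 0xffffffff80000000 is reported correctly, while 0x0000000100000000 or 0x0000000080000000 is reported as a failure even though the swap happened.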
Note the same erroneous instruction is also present in the _rel variant (at line 94).
The trunk has the same issue.
This code has not changed at all since IA64.asm was added way back in r4471.
I won't have access to the IA64 platform again until tomorrow AM.
So, testing my hypothesis will need to wait.
IFF I am right about the source of this problem, then it would be beneficial to have (and I may contribute) a stronger test for "make check" that would detect this sort of bug in the atomics. Specifically, it should look for both false-positive and false-negative return values from 64-bit cmpset operations, using values that cover a range of "corner cases". I think I have single-bit and double-bit "marching tests" for cmpset in my own arsenal of tests for GASNet's atomics. If I don't have time to contribute a complete test, I can at least contribute that logic for somebody else to port to the OPAL atomics.
The cmpxchgN for N in 1,2,4 are documented as ZERO-extending their loads to 64 bits.
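As a rough idea of what such a test could look like, here is a sketch in C. It is NOT the GASNet test and NOT the OPAL API: cmpset64() is a stand-in built on C11 atomics (assumed to return nonzero on success), and check_pattern() and marching_test() are hypothetical names. A real test would call the OPAL 64-bit cmpset in place of the stand-in.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the 64-bit cmpset under test. ASSUMPTION: nonzero
 * return means the swap was performed. */
static int cmpset64(_Atomic uint64_t *addr, uint64_t oldval, uint64_t newval)
{
    return atomic_compare_exchange_strong(addr, &oldval, newval);
}

/* Check one stored value v. Returns the number of failures seen. */
static int check_pattern(uint64_t v)
{
    int failures = 0;
    _Atomic uint64_t target;

    /* False positives: an oldval differing from v in any single bit
     * must be rejected, and the stored value must be left untouched. */
    for (int bit = 0; bit < 64; bit++) {
        uint64_t wrong = v ^ (UINT64_C(1) << bit);
        atomic_store(&target, v);
        if (cmpset64(&target, wrong, ~v) || atomic_load(&target) != v) {
            failures++;
        }
    }

    /* False negative: the matching oldval must succeed and store ~v. */
    atomic_store(&target, v);
    if (!cmpset64(&target, v, ~v) || atomic_load(&target) != ~v) {
        failures++;
    }
    return failures;
}

/* March one set bit, then two set bits, across all 64 positions,
 * testing each pattern and its complement. Returns total failures. */
static int marching_test(void)
{
    int failures = 0;
    for (int i = 0; i < 64; i++) {
        uint64_t one = UINT64_C(1) << i;
        failures += check_pattern(one);
        failures += check_pattern(~one);
        for (int j = i + 1; j < 64; j++) {
            uint64_t two = one | (UINT64_C(1) << j);
            failures += check_pattern(two);
            failures += check_pattern(~two);
        }
    }
    return failures;
}
```

A cmpset that drops or sign-extends the upper 32 bits would show up here as a false negative for any pattern whose upper 33 bits are not uniform, for example a single set bit at position 40.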
So, there is a slim chance that the sxt4 actually was intended for the 32-bit cmpset code.
However, since the comparison used there is a "cmp4.eq" the "sxt4" would still not be needed.
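The reasoning about the 32-bit case can be modelled the same way. This is only a sketch under my reading of the instructions: cmp4_eq() models "cmp4.eq" as a comparison of the low 32 bits only, sxt4() models the sign extension, and extension_is_irrelevant() is a made-up helper name.

```c
#include <stdint.h>

/* Model of "sxt4": sign-extend the low 4 bytes to 8 bytes. */
static uint64_t sxt4(uint64_t v)
{
    uint64_t low = v & UINT64_C(0x00000000ffffffff);
    return (low & UINT64_C(0x80000000))
               ? (low | UINT64_C(0xffffffff00000000))
               : low;
}

/* Model of "cmp4.eq": only the low 32 bits of each operand take part
 * in the comparison. */
static int cmp4_eq(uint64_t a, uint64_t b)
{
    return (uint32_t)a == (uint32_t)b;
}

/* The value loaded by cmpxchg4 is zero-extended to 64 bits. Returns
 * nonzero if the 4-byte comparison gives the same answer with and
 * without an sxt4 applied to that loaded value. */
static int extension_is_irrelevant(uint32_t loaded, uint32_t oldvalue)
{
    uint64_t zero_extended = loaded;
    return cmp4_eq(zero_extended, oldvalue) ==
           cmp4_eq(sxt4(zero_extended), oldvalue);
}
```

Since sxt4 only changes bits 32 and up, and the 4-byte comparison ignores those bits, the helper returns nonzero for every input, which is why the sxt4 adds nothing in the 32-bit path either.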