On 11/23/2011 9:57 AM, Lukas Razik wrote:
TERRY DONTJE <terry.dontje@oracle.com> wrote:
On 11/22/2011 6:59 PM, Lukas Razik wrote:
Roland Dreier<roland@purestorage.com>  wrote:

On Tue, Nov 22, 2011 at 3:05 PM, Lukas Razik <linux@razik.name> wrote:
   #0  0xfffff8010229ba9c in mca_pml_ob1_send_request_start_copy (sendreq=0xb23200, bml_btl=0xb29050, size=0) at pml_ob1_sendreq.c:551
   551        hdr->hdr_match.hdr_ctx = sendreq->req_send.req_base.req_comm->c_contextid;
   (gdb) backtrace
If you can get into gdb here, I guess it would be useful to print the
address of hdr->hdr_match.hdr_ctx and
sendreq->req_send.req_base.req_comm->c_contextid to see which one is
misaligned.

Not sure of the gdb syntax... does it work to just do

p &hdr->hdr_match.hdr_ctx
p &sendreq->req_send.req_base.req_comm->c_contextid

Oh, sorry that I didn't do that before...
The values are:
&hdr->hdr_match.hdr_ctx = (uint16_t *) 0xad7393
&sendreq->req_send.req_base.req_comm->c_contextid = (uint32_t *) 0x201c20
So hdr_ctx is the bad one...
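
To spell out why that address is the suspicious one: on strict-alignment architectures (SPARC being a common example), a 16-bit access generally needs an address that is a multiple of 2, and a 32-bit access a multiple of 4. Below is a minimal C sketch of that check. It assumes the required alignment of each type equals its size, and the helper name is_aligned is made up for illustration; it is not part of Open MPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an Open MPI function): returns 1 if addr is a
 * multiple of alignment, 0 otherwise. The caller passes the alignment the
 * type needs; here we assume that is sizeof(type).
 */
int is_aligned(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0);          /* a zero alignment makes no sense */
    return (addr % alignment) == 0;
}
```

With the addresses printed above, is_aligned(0xad7393, sizeof(uint16_t)) is 0 because the address is odd, while is_aligned(0x201c20, sizeof(uint32_t)) is 1.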


PS:
I can never remember the gdb syntax, hence I use the nice kdbg. *g*
http://net.razik.de/linux/T5120/kdbg-openmpi-1.4.4-osu_latency-02.png
Can you get me the value of hdr too?  I bet it is an odd value too.

You're right! :)
You can see the value of hdr in the first screenshot I sent you:
http://net.razik.de/linux/T5120/kdbg-openmpi-1.4.4-osu_latency.png

It's

hdr = (mca_pml_ob1_hdr_t*) 0xad7391

Which now leads me to wonder whether this is due to the coalescing code.  If you can run with coalescing off (as described in my last email), that might be telling.

--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje@oracle.com