On 11/22/2011 6:59 PM, Lukas Razik wrote:
Roland Dreier <roland@purestorage.com> wrote:

On Tue, Nov 22, 2011 at 3:05 PM, Lukas Razik <linux@razik.name> wrote:
 #0  0xfffff8010229ba9c in mca_pml_ob1_send_request_start_copy (sendreq=0xb23200, bml_btl=0xb29050, size=0) at pml_ob1_sendreq.c:551
 551         hdr->hdr_match.hdr_ctx = sendreq->req_send.req_base.req_comm->c_contextid;
 (gdb) backtrace
If you can get into gdb here, I guess it would be useful to print the
address of hdr->hdr_match.hdr_ctx and
sendreq->req_send.req_base.req_comm->c_contextid to see which one is
misaligned.

Not sure of the gdb syntax... does it work to just do

p &hdr->hdr_match.hdr_ctx
p &sendreq->req_send.req_base.req_comm->c_contextid

Oh, sorry that I didn't do that before...
The values are:
&hdr->hdr_match.hdr_ctx  =  (uint16_t *) 0xad7393
&sendreq->req_send.req_base.req_comm->c_contextid  =  (uint32_t *) 0x201c20

So hdr_ctx is the bad one...

Regards,
Lukas


PS:
I can never remember the gdb syntax, hence I use the nice kdbg. *g*
http://net.razik.de/linux/T5120/kdbg-openmpi-1.4.4-osu_latency-02.png

Lukas,

Can you try running the benchmark with coalescing off?  To do that, add the following option to your mpirun line: "-mca btl_openib_use_message_coalescing 0".

thanks,
--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje@oracle.com