Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] [BUG?] OpenMPI with openib on SPARC64: Signal: Bus error (10)
From: TERRY DONTJE (terry.dontje_at_[hidden])
Date: 2011-11-23 10:03:33

On 11/23/2011 9:57 AM, Lukas Razik wrote:
> TERRY DONTJE<terry.dontje_at_[hidden]> wrote:
>> On 11/22/2011 6:59 PM, Lukas Razik wrote:
>>> Roland Dreier<roland_at_[hidden]> wrote:
>>>> On Tue, Nov 22, 2011 at 3:05 PM, Lukas Razik<linux_at_[hidden]> wrote:
>>>>> #0 0xfffff8010229ba9c in mca_pml_ob1_send_request_start_copy (sendreq=0xb23200, bml_btl=0xb29050, size=0) at pml_ob1_sendreq.c:551
>>>>> 551 hdr->hdr_match.hdr_ctx = sendreq->req_send.req_base.req_comm->c_contextid;
>>>>> (gdb) backtrace
>>>> If you can get into gdb here, I guess it would be useful to print the
>>>> address of hdr->hdr_match.hdr_ctx and
>>>> sendreq->req_send.req_base.req_comm->c_contextid to see which one is
>>>> misaligned.
>>>> Not sure of the gdb syntax... does it work to just do
>>>> p&hdr->hdr_match.hdr_ctx and sendreq->req_send.req_base.req
>>>> p&sendreq->req_send.req_base.req_comm->c_contextid
>>> Oh, sorry that I didn't do that before...
>>> The values are:
>>> &hdr->hdr_match.hdr_ctx and sendreq->req_send.req_base.req = (uint16_t *) 0xad7393
>>> &sendreq->req_send.req_base.req_comm->c_contextid = (uint32_t *) 0x201c20
>>> So hdr_ctx is the bad one...
>>> PS:
>>> I can never remember the syntax of gdb, hence I use the nice kdbg. *g*
>> Can you get me the value of hdr too? I bet it is an odd value as well.
> You're right! :)
> The value of hdr you can see in the first screenshot I've sent you:
> It's
> hdr = (mca_pml_ob1_hdr_t*) 0xad7391
Which now leads me to wonder whether this is due to the coalescing code.
If you can run with coalescing off (as described in my last email), that
might be telling.

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]