Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [BUG?] OpenMPI with openib on SPARC64: Signal: Bus error (10)
From: TERRY DONTJE (terry.dontje_at_[hidden])
Date: 2011-11-23 10:03:33


On 11/23/2011 9:57 AM, Lukas Razik wrote:
> TERRY DONTJE<terry.dontje_at_[hidden]> wrote:
>> On 11/22/2011 6:59 PM, Lukas Razik wrote:
>>> Roland Dreier<roland_at_[hidden]> wrote:
>>>
>>>> On Tue, Nov 22, 2011 at 3:05 PM, Lukas Razik<linux_at_[hidden]> wrote:
>>>>> #0 0xfffff8010229ba9c in mca_pml_ob1_send_request_start_copy
>>>> (sendreq=0xb23200, bml_btl=0xb29050, size=0) at pml_ob1_sendreq.c:551
>>>>> 551 hdr->hdr_match.hdr_ctx =
>>>> sendreq->req_send.req_base.req_comm->c_contextid;
>>>>> (gdb) backtrace
>>>> If you can get into gdb here, I guess it would be useful to print the
>>>> address of hdr->hdr_match.hdr_ctx and
>>>> sendreq->req_send.req_base.req_comm->c_contextid to see which one is
>>>> misaligned.
>>>>
>>>> Not sure of the gdb syntax... does it work to just do
>>>>
>>>> p &hdr->hdr_match.hdr_ctx
>>>> p &sendreq->req_send.req_base.req_comm->c_contextid
>>>>
>>> Oh, sorry that I didn't do that before...
>>> The values are:
>>> &hdr->hdr_match.hdr_ctx = (uint16_t *) 0xad7393
>>> &sendreq->req_send.req_base.req_comm->c_contextid = (uint32_t *) 0x201c20
>>> So hdr_ctx is the bad one...
>>>
>>>
>>> PS:
>>> I can never remember the gdb syntax, hence I use the nice kdbg. *g*
>>> http://net.razik.de/linux/T5120/kdbg-openmpi-1.4.4-osu_latency-02.png
>> Can you get me the value of hdr too? I bet it is an odd value too.
>
> You're right! :)
> You can see the value of hdr in the first screenshot I sent you:
> http://net.razik.de/linux/T5120/kdbg-openmpi-1.4.4-osu_latency.png
>
> It's
>
> hdr = (mca_pml_ob1_hdr_t*) 0xad7391
>
Which now leads me to wonder whether this is due to the coalescing code.
If you can run with coalescing off (as described in my last email), that
might be telling.
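
For anyone following along, here is a minimal sketch of why those addresses
matter. SPARC requires a uint16_t to sit on a 2-byte boundary, so a direct
store through a pointer such as 0xad7393 raises a bus error. The helper names
below (is_aligned, store_u16_unaligned, load_u16_unaligned) are hypothetical
and are not taken from the Open MPI source; they only illustrate the alignment
check and one common way to touch an odd address safely (a byte-wise memcpy).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if addr is a multiple of align.
 * align must be a nonzero power of two. */
static int is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (uintptr_t)(align - 1)) == 0;
}

/* Hypothetical helper: store a 16-bit value at an arbitrary (possibly odd)
 * address. memcpy copies byte by byte, so no aligned access is required. */
static void store_u16_unaligned(void *dst, uint16_t value)
{
    memcpy(dst, &value, sizeof value);
}

/* Hypothetical helper: read a 16-bit value back from an arbitrary address. */
static uint16_t load_u16_unaligned(const void *src)
{
    uint16_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

With the values from this thread, is_aligned(0xad7391, 2) and
is_aligned(0xad7393, 2) are both false (hdr and &hdr->hdr_match.hdr_ctx are
odd), while is_aligned(0x201c20, 4) is true, which matches hdr_ctx being the
bad side of the assignment.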

-- 
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden] <mailto:terry.dontje_at_[hidden]>


