Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [BUG?] OpenMPI with openib on SPARC64: Signal: Bus error (10)
From: TERRY DONTJE (terry.dontje_at_[hidden])
Date: 2011-11-23 07:13:18


On 11/22/2011 6:59 PM, Lukas Razik wrote:
> Roland Dreier<roland_at_[hidden]> wrote:
>
>> On Tue, Nov 22, 2011 at 3:05 PM, Lukas Razik<linux_at_[hidden]> wrote:
>>> #0 0xfffff8010229ba9c in mca_pml_ob1_send_request_start_copy (sendreq=0xb23200, bml_btl=0xb29050, size=0) at pml_ob1_sendreq.c:551
>>> 551 hdr->hdr_match.hdr_ctx = sendreq->req_send.req_base.req_comm->c_contextid;
>>> (gdb) backtrace
>> If you can get into gdb here, I guess it would be useful to print the
>> address of hdr->hdr_match.hdr_ctx and
>> sendreq->req_send.req_base.req_comm->c_contextid to see which one is
>> misaligned.
>>
>> Not sure of the gdb syntax... does it work to just do
>>
>> p&hdr->hdr_match.hdr_ctx
>> p&sendreq->req_send.req_base.req_comm->c_contextid
>>
> Oh, sorry that I didn't do that before...
> The values are:
> &hdr->hdr_match.hdr_ctx = (uint16_t *) 0xad7393
> &sendreq->req_send.req_base.req_comm->c_contextid = (uint32_t *) 0x201c20
>
> So hdr_ctx is the bad one...
>
> Regards,
> Lukas
>
>
> PS:
> I can never remember the syntax of gdb, hence I use the nice kdbg. *g*
> http://net.razik.de/linux/T5120/kdbg-openmpi-1.4.4-osu_latency-02.png
Lukas,

Can you try running the benchmark with message coalescing turned off? To do
that, add the option "-mca btl_openib_use_message_coalescing 0" to your
mpirun command line.
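
For reference, the two addresses quoted above can be checked with a few lines of C. This is only a minimal sketch and not code from Open MPI: the helper names is_aligned, store_u16_unaligned and load_u16_unaligned are hypothetical. It shows why 0xad7393 is a problem for a uint16_t store on a strict-alignment CPU such as SPARC (the address is odd), and one common way to touch a possibly misaligned field, namely copying it with memcpy. Whether that is the right fix for the openib header is a separate question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if addr is a multiple of align.
 * align must be a non-zero power of two. */
static inline int is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (uintptr_t)(align - 1)) == 0;
}

/* Hypothetical helper: store a 16-bit value at a possibly misaligned
 * address. memcpy has no alignment requirement on its arguments, so this
 * avoids a direct uint16_t store through a misaligned pointer. */
static inline void store_u16_unaligned(void *dst, uint16_t value)
{
    memcpy(dst, &value, sizeof value);
}

/* Hypothetical helper: the matching load for store_u16_unaligned. */
static inline uint16_t load_u16_unaligned(const void *src)
{
    uint16_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

With these helpers, is_aligned((uintptr_t)0xad7393, sizeof(uint16_t)) is false because the address is odd, while is_aligned((uintptr_t)0x201c20, sizeof(uint32_t)) is true, which matches the conclusion that hdr_ctx is the misaligned one.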

thanks,

-- 
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email: terry.dontje_at_[hidden]
