Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [BUG?] OpenMPI with openib on SPARC64: Signal: Bus error (10)
From: TERRY DONTJE (terry.dontje_at_[hidden])
Date: 2011-11-23 07:13:18


On 11/22/2011 6:59 PM, Lukas Razik wrote:
> Roland Dreier<roland_at_[hidden]> wrote:
>
>> On Tue, Nov 22, 2011 at 3:05 PM, Lukas Razik<linux_at_[hidden]> wrote:
>>> #0 0xfffff8010229ba9c in mca_pml_ob1_send_request_start_copy (sendreq=0xb23200, bml_btl=0xb29050, size=0) at pml_ob1_sendreq.c:551
>>> 551 hdr->hdr_match.hdr_ctx = sendreq->req_send.req_base.req_comm->c_contextid;
>>> (gdb) backtrace
>> If you can get into gdb here, I guess it would be useful to print the
>> address of hdr->hdr_match.hdr_ctx and
>> sendreq->req_send.req_base.req_comm->c_contextid to see which one is
>> misaligned.
>>
>> Not sure of the gdb syntax... does it work to just do
>>
>> p &hdr->hdr_match.hdr_ctx
>> p &sendreq->req_send.req_base.req_comm->c_contextid
>>
> Oh, sorry that I didn't do that before...
> The values are:
> &hdr->hdr_match.hdr_ctx = (uint16_t *) 0xad7393
> &sendreq->req_send.req_base.req_comm->c_contextid = (uint32_t *) 0x201c20
>
> So hdr_ctx is the bad one...
>
> Regards,
> Lukas
>
>
> PS:
> I can never remember the gdb syntax - hence I use the nice kdbg. *g*
> http://net.razik.de/linux/T5120/kdbg-openmpi-1.4.4-osu_latency-02.png
Lukas,

Can you try running the benchmark with coalescing off? To do that, add
the following option to your mpirun line:
"-mca btl_openib_use_message_coalescing 0".

thanks,

-- 
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]


