Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [openib] segfault when using openib btl
From: Eloi Gaudry (eg_at_[hidden])
Date: 2010-09-29 14:33:58


Pasha,
Thanks for your help.

I'm not aware of any such memory configuration on our customer's new
cluster (each compute node runs the Red Hat 5.x operating system on
Intel X5570 processors).
Anyway, I have already tried deactivating eager_rdma, but that did not
solve the hdr->tag=0 issue (in
share/openmpi/mca-btl-openib-device-params.ini, eager_rdma is on for
[vendor_part_id=26428]).
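
For reference, a minimal sketch of how one might check what a device-params
file sets for a given vendor_part_id. The sample text below is a hypothetical
excerpt, not the real file: the section name, the ID list, and the key name
`use_eager_rdma` are assumptions made for illustration.

```python
import configparser

# Hypothetical excerpt; the real mca-btl-openib-device-params.ini may use
# different section names, ID lists, and keys.
SAMPLE_INI = """
[Example HCA]
vendor_id = 0x2c9
vendor_part_id = 25408,26428
use_eager_rdma = 1
mtu = 2048
"""


def eager_rdma_setting(ini_text, part_id):
    """Return use_eager_rdma for the section listing part_id, else None."""
    # Interpolation is disabled so a literal '%' in a value is not an error.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    for section in parser.sections():
        ids = parser.get(section, "vendor_part_id", fallback="")
        # IDs may be decimal or hex (0x...); int(tok, 0) accepts both.
        parsed = {int(tok.strip(), 0) for tok in ids.split(",") if tok.strip()}
        if part_id in parsed:
            value = parser.get(section, "use_eager_rdma", fallback=None)
            return None if value is None else int(value)
    return None


print(eager_rdma_setting(SAMPLE_INI, 26428))
```

To test without editing the file, the setting can usually be overridden per
run, assuming the openib BTL exposes the `btl_openib_use_eager_rdma` MCA
parameter in the installed version (`ompi_info` lists the available ones).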

Ishai,
If you need any more information, please feel free to ask.

Regards,
Eloi

On 29/09/2010 19:49, Shamis, Pavel wrote:
> Terry,
> Ishai Rabinovitz is the HPC team manager (I added him to CC).
>
> Eloi,
>
> Back to the issue. I saw a very similar issue a long time ago on some hardware platforms that support relaxed-ordering memory operations. If I remember correctly, it was an IBM platform.
> Do you know if relaxed memory ordering is enabled on your platform? If it is enabled, you have to disable eager RDMA.
>
> Regards,
> Pasha
>
> On Sep 29, 2010, at 1:04 PM, Terry Dontje wrote:
>
> Pasha, do you by any chance know who at Mellanox might be responsible for keeping OMPI working?
>
> --td
>
> Eloi Gaudry wrote:
> Hi Nysal, Terry,
> Thanks for your input on this issue.
> I'll follow your advice. Do you know any Mellanox developer I could discuss this with, preferably someone who has spent some time inside the openib btl?
>
> Regards,
> Eloi
>
> On 29/09/2010 06:01, Nysal Jan wrote:
> Hi Eloi,
> We discussed this issue during the weekly developer meeting, and there were no further suggestions apart from checking the driver and firmware levels. The consensus was that it would be better if you could take this up directly with your IB vendor.
>
> Regards
> --Nysal
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.dontje_at_[hidden]
>