Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Openib with > 32 cores per node
From: Robert Horton (r.horton_at_[hidden])
Date: 2011-05-20 10:19:55


Thanks for getting back to me (and thanks to Jeff for the explanation).

On Thu, 2011-05-19 at 09:59 -0600, Samuel K. Gutierrez wrote:
> Hi,
> On May 19, 2011, at 9:37 AM, Robert Horton wrote:
> > On Thu, 2011-05-19 at 08:27 -0600, Samuel K. Gutierrez wrote:
> >> Hi,
> >>
> >> Try the following QP parameters that only use shared receive queues.
> >>
> >> -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32
> >>
> >
> > Thanks for that. If I run the job over 2 x 48 cores it now works and the
> > performance seems reasonable (I need to do some more tuning) but when I
> > go up to 4 x 48 cores I'm getting the same problem:
> >
> > [compute-1-7.local][[14383,1],86][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one] error creating qp errno says Cannot allocate memory
> > [compute-1-7.local:18106] *** An error occurred in MPI_Isend
> > [compute-1-7.local:18106] *** on communicator MPI_COMM_WORLD
> > [compute-1-7.local:18106] *** MPI_ERR_OTHER: known error not in list
> > [compute-1-7.local:18106] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> >
> > Any thoughts?
> How much memory does each node have? Does this happen at startup?

Each node has 64GB of RAM. The error happens fairly soon after the job
starts.

> Try adding:
> -mca btl_openib_cpc_include rdmacm

Ah - that looks much better. I can now run hpcc over all 15 x 48 cores. I
need to look at the performance in a bit more detail but it seems to be
"reasonable" at least :)
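For anyone following along, the full command line I ended up with looks
something like the following (the hostfile and hpcc paths are just
placeholders for our local setup):

```shell
# Sketch of the working invocation: shared receive queues only, plus
# the rdmacm connection manager instead of the default oob CPC.
# 720 processes = 15 nodes x 48 cores; paths are site-specific.
mpirun -np 720 --hostfile hosts \
    --mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32 \
    --mca btl_openib_cpc_include rdmacm \
    ./hpcc
```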

One thing is puzzling me - when I compile Open MPI myself it seems to
lack rdmacm support, whereas the one built by the OFED install
process does include it. I'm configuring with:

'--prefix=/share/apps/openmpi/1.4.3/gcc' '--with-sge' '--with-openib' '--enable-openib-rdmacm'

Any idea what might be going on there?
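For reference, this is roughly how I've been checking whether a given
installation has rdmacm support (assuming ompi_info is on the PATH of
the build in question):

```shell
# List the openib BTL's connection-manager (cpc) parameters; a build
# configured without the rdmacm libraries/headers available won't
# offer rdmacm as a cpc option here.
ompi_info --param btl openib | grep cpc
```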

> I'm not sure if your version of OFED supports this feature, but maybe using XRC may help. I **think** other tweaks are needed to get this going, but I'm not familiar with the details.

I'm using QLogic (QLE7340) cards rather than Mellanox, so XRC doesn't
seem to be an option for me. It would be interesting to know how much
difference it would make, though...

Thanks again for your help and have a good weekend.


Robert Horton
System Administrator (Research Support) - School of Mathematical Sciences
Queen Mary, University of London
r.horton_at_[hidden]  -  +44 (0) 20 7882 7345