
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Large IMB test problems?
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-07-24 17:22:56


Ralph,

In our quest for better shared memory collectives, we did some runs
on a 16-core Intel machine. The SM worked as expected, as far as I
can tell. Unfortunately, we have only one such node, so we never tried
more than 16 processes.

   george.

On Jul 24, 2008, at 11:13 PM, Ralph Castain wrote:

> Yo folks
>
> We are trying to run some tests on a new cluster and are having a
> problem telling hardware, system software, and OMPI failures apart.
> This is a 16-ppn Opteron system running SLURM under RHEL (I forget the
> precise version), with IB and OMPI 1.2.6.
>
> Everything launches just fine and seems to work okay. However, on
> large jobs (e.g., >450 procs), the IMB tests fail and crash a bunch
> of the nodes on which they are running.
>
> Has anyone else been able to test in 16+ ppn configurations? I'm
> wondering if we have an SM problem - perhaps inadequate backing file
> space or something?
>
> Any suggestions on how to debug this, or on config options for higher
> ppn systems, would be appreciated. We don't see this problem on
> anything with lower ppn. I'm going to give it a try with 1.3 and
> see what happens there.
>
> Thanks
> Ralph


