
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Hang in collectives involving shared memory
From: Bogdan Costescu (Bogdan.Costescu_at_[hidden])
Date: 2009-06-10 13:29:44

On Wed, 10 Jun 2009, Ralph Castain wrote:

> Meantime, I have filed a bunch of data on this in ticket #1944, so perhaps
> you might take a glance at that and offer some thoughts?

I wasn't able to reproduce this. I ran with the following setup:
- OS is Scientific Linux 5.1 with a custom-compiled kernel based on …, but (due to circumstances that I can't control) Open MPI's libnuma memory-affinity component could not be built:

checking if MCA component maffinity:libnuma can compile... no

- Intel compiler 10.1
- Open MPI 1.3.2
- nodes have two quad-core Intel Xeon E5440 CPUs, 16GB RAM and a ConnectX InfiniBand HCA

I used the platform file you provided, but removed the references to
PanFS and fixed the paths. I also used the MCA parameter file you
provided.

I ran with nodes=1:ppn=8 and nodes=2:ppn=8, and the test completed
successfully with m=50 several times. Together with the earlier post,
which also failed to reproduce the hang, this points to a problem
specific to your setup...

Bogdan Costescu
IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.costescu_at_[hidden]