Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times
From: Rainer Keller (keller_at_[hidden])
Date: 2010-04-22 12:08:30


Hello Oliver,
thanks for the update.

Just my $0.02: the upcoming Open MPI v1.5 will warn users if their session
directory is on NFS (or Lustre).
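
Until such a warning is available, a check along these lines can be scripted
by hand. The following is only a rough sketch and not how Open MPI implements
its warning: it assumes a Linux-style mount table in /proc/mounts format, and
the names fs_type_for, is_on_network_fs and NETWORK_FS are made up for
illustration.

```python
import os

# Filesystem types treated as "network" here (an assumption for this sketch).
NETWORK_FS = {"nfs", "nfs4", "lustre"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is expected in /proc/mounts format, one mount per line:
    device mountpoint fstype options dump pass
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fstype = os.path.normpath(fields[1]), fields[2]
        # `path` lies under this mount if it equals the mount point or
        # starts with the mount point followed by a path separator.
        if mount_point.endswith(os.sep):
            prefix = mount_point
        else:
            prefix = mount_point + os.sep
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on a tie the later entry wins,
            # since a later mount shadows an earlier one at the same place.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fstype
    return best_type


def is_on_network_fs(path, mounts_text):
    """True if `path` sits on one of the filesystem types in NETWORK_FS."""
    return fs_type_for(path, mounts_text) in NETWORK_FS
```

On a Linux node, the contents of /proc/mounts can be read and passed in as
mounts_text together with the candidate session directory.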

Best regards,
Rainer

On Thursday 22 April 2010 11:37:48 am Oliver Geisler wrote:
> To sum up and give an update:
>
> The long communication times seen with shared memory communication
> between Open MPI processes are caused by the Open MPI session directory
> lying on the network via NFS.
>
> The problem is resolved by setting up a ramdisk or mounting a tmpfs on
> each diskless node. Setting the MCA parameter orte_tmpdir_base to point
> to the corresponding mount point keeps shared memory communication and
> its files local, which reduces the communication times by orders of
> magnitude.
>
> The relation of the problem to the kernel version is not really
> resolved, but the kernel may not be "the problem" in this respect.
> My benchmark is now running fine on a single node with 4 CPUs, kernel
> 2.6.33.1 and openmpi 1.4.1.
> Running on multiple nodes, I still see higher (TCP) communication
> times than I would expect. But that requires some deeper investigation
> on my part (e.g. collisions on the network) and should probably be
> posted to a new thread.
>
> Thank you guys for your help.
>
> oli
>
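
For readers of the archive, the orte_tmpdir_base setting described in the
quoted message can be passed either on the mpirun command line or through the
environment. The sketch below only builds the command and environment; the
mount point /mnt/ramdisk, the program name ./benchmark and the helper names
are placeholders, not taken from Oliver's setup.

```python
import os


def mpirun_command(tmpdir_base, nprocs, program):
    """Build an mpirun argument list that sets orte_tmpdir_base.

    MCA parameters are given to mpirun as `--mca <name> <value>`.
    """
    return [
        "mpirun",
        "--mca", "orte_tmpdir_base", tmpdir_base,
        "-np", str(nprocs),
        program,
    ]


def mpirun_environment(tmpdir_base, base_env=None):
    """Alternative: set the same MCA parameter through the environment.

    Open MPI reads MCA parameters from variables named OMPI_MCA_<name>.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMPI_MCA_orte_tmpdir_base"] = tmpdir_base
    return env


# Hypothetical values: a tmpfs mounted at /mnt/ramdisk, 4 processes.
cmd = mpirun_command("/mnt/ramdisk", 4, "./benchmark")
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run, or the environment from
mpirun_environment can be used with a plain mpirun call instead.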

-- 
------------------------------------------------------------------------
Rainer Keller, PhD                  Tel: +1 (865) 241-6293
Oak Ridge National Lab          Fax: +1 (865) 241-4811
PO Box 2008 MS 6164           Email: keller_at_[hidden]
Oak Ridge, TN 37831-2008    AIM/Skype: rusraink