Hello Oliver,
thanks for the update.
Just my $0.02: the upcoming Open MPI v1.5 will warn users if their session
directory is on NFS (or Lustre).
Best regards,
Rainer
On Thursday 22 April 2010 11:37:48 am Oliver Geisler wrote:
> To sum up and give an update:
>
> The extended communication times seen with shared memory communication
> between openmpi processes are caused by the openmpi session directory
> residing on the network via NFS.
>
> The problem is resolved by setting up a ramdisk or mounting a tmpfs on
> each diskless node. Setting the MCA parameter orte_tmpdir_base to the
> corresponding mount point keeps the shared memory communication and its
> files local, which decreases the communication times by orders of
> magnitude.
>
> How the problem relates to the kernel version is not really resolved,
> but the kernel may not be "the problem" in this respect.
> My benchmark is now running fine on a single node with 4 CPUs, kernel
> 2.6.33.1 and openmpi 1.4.1.
> Running on multiple nodes, I still see higher (TCP) communication
> times than I would expect. But that requires some deeper research into
> the issue (e.g. collisions on the network) and should probably be
> posted to a new thread.
>
> Thank you guys for your help.
>
> oli
>
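
For readers who want to try the fix summarized above, here is a minimal Python sketch, not an official Open MPI tool. It looks up the filesystem type of a candidate session directory in /proc/mounts-style text and builds an mpirun command line that sets orte_tmpdir_base. The helper names, the sample mount table, the /dev/shm path and the ./bench program are all made up for illustration; the sketch assumes Linux mount-table formatting and does not resolve symlinks or escaped spaces in mount points.

```python
import os

# Filesystem types treated as "network" here; the set is an assumption.
NETWORK_FS = {"nfs", "nfs4", "lustre"}


def parse_mounts(mounts_text):
    """Return (mountpoint, fstype) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fstype_of(path, mounts):
    """Return the fstype of the longest mount point that contains path."""
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for mountpoint, fstype in mounts:
        mp = mountpoint.rstrip("/") or "/"
        inside = path == mp or path.startswith(mp.rstrip("/") + "/")
        # ">=" lets a later entry for the same mount point win (overmount).
        if inside and len(mp) >= len(best_mount):
            best_mount, best_type = mp, fstype
    return best_type


def is_network_fs(path, mounts):
    """True if path lives on one of the NETWORK_FS filesystem types."""
    return fstype_of(path, mounts) in NETWORK_FS


def mpirun_command(tmpdir, nprocs, program):
    """Build an mpirun argument list that sets orte_tmpdir_base."""
    return ["mpirun", "--mca", "orte_tmpdir_base", tmpdir,
            "-np", str(nprocs), program]


if __name__ == "__main__":
    # Hypothetical mount table of a diskless node: root on NFS, tmpfs local.
    sample = "server:/export/root / nfs rw 0 0\ntmpfs /dev/shm tmpfs rw 0 0\n"
    mounts = parse_mounts(sample)
    for candidate in ("/tmp", "/dev/shm"):
        print(candidate, "on network filesystem:",
              is_network_fs(candidate, mounts))
    print(" ".join(mpirun_command("/dev/shm", 4, "./bench")))
```

On a real node, the text of /proc/mounts would be passed to parse_mounts() in place of the sample string, and the printed command would point at whatever local ramdisk or tmpfs mount point the node actually provides.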
--
------------------------------------------------------------------------
Rainer Keller, PhD Tel: +1 (865) 241-6293
Oak Ridge National Lab Fax: +1 (865) 241-4811
PO Box 2008 MS 6164 Email: keller_at_[hidden]
Oak Ridge, TN 37831-2008 AIM/Skype: rusraink