Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-11-08 11:56:19

We talked about this issue on the weekly OMPI engineering teleconf today.

It seems like it would be a good idea to bring over the new shared memory revamp to the v1.5 series before it transitions to v1.6 so that it can avoid network-mounted /tmp filesystem issues. LANL will be evaluating this; the gut feeling was that it would not be a lot of work to bring this over to the v1.5 branch.

I've created a ticket to track the issue.

On Nov 8, 2011, at 8:21 AM, Jeff Squyres wrote:

> On Nov 7, 2011, at 12:12 PM, Blosch, Edwin L wrote:
>> Thanks for the valuable input. I'll change to a wait-and-watch approach.
>> The FAQ on tuning sm says "If the session directory is located on a network filesystem, the shared memory BTL latency will be extremely high." And the title is 'Why am I seeing incredibly poor performance...'. So I made the leap that this configuration must be avoided at all costs...
> (sorry for jumping in late; it's the week before SC, and lots of deadlines are approaching!)
> This is definitely true: if OMPI's mmap files are located on a network filesystem (such as if /tmp is NFS-mounted), your latencies will be higher. I don't claim to know all the exact reasons why, but I have personally seen enough empirical evidence to believe it. Perhaps newer versions of Linux/NFS/whatever have made the issue better. But I'm quite sure that it was happening; that's why we put in that warning.
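One way to check whether a directory such as /tmp sits on a network filesystem is to look it up in the Linux mount table. The following is a minimal sketch, not part of Open MPI: the function names and the list of network filesystem types are assumptions made for illustration, and the list is not exhaustive. It takes text in the /proc/mounts format and reports the filesystem type of the most specific mount point that contains the path.

```python
import posixpath

# Illustrative, non-exhaustive set of network filesystem type names.
# This list is an assumption; adjust it for the filesystems at your site.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "lustre", "gpfs", "panfs", "afs"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is text in the Linux /proc/mounts format:
        device mountpoint fstype options dump pass
    Returns None if no mount point matches.
    """
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so "/tmp" does not match "/tmpfoo".
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_network_fs(path, mounts_text):
    """True if path lives on one of the filesystem types listed above."""
    return fs_type_for(path, mounts_text) in NETWORK_FS_TYPES
```

On a Linux node, passing the contents of /proc/mounts as `mounts_text` and "/tmp" as `path` would indicate whether the default location is network-mounted. The sketch does not decode escaped characters (such as `\040` for a space) in mount point names.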
> Here are a few points to add to this discussion, in no particular order:
> 1. Keep in mind the difference between the session directory and the shared memory backing files: the session directory contains some metadata that OMPI processes need. In general, most of that data is not performance-critical, such that if it's on a networked filesystem, general MPI performance will not be affected. In 1.4.x and 1.5.x, the shared memory mmap files are also located in the session directory, and as described above, we have definitely seen a negative MPI latency performance impact when these files are on a networked filesystem.
> 2. In the upcoming OMPI v1.7, we revamped the shared memory backing system such that mmap does not have to be used, and therefore will not care if /tmp is on a networked filesystem.
> 3. I don't know whether /tmp on a networked filesystem is 100% "proper" or not. I know that some people do it, but there are uniqueness requirements that can definitely be violated in various other tools in this case. OMPI may not be the only software package that can run into problems here, even if the problems are rare and difficult to track down (e.g., because two processes with the same PID on different machines tried to use the same filename in /tmp, or attempts to use file locking, etc.).
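The PID collision mentioned in point 3 can be illustrated with a small sketch. This is a hypothetical naming scheme written for this example, not how Open MPI or any particular tool names its files: `scratch_name`, `colliding_names`, the host names, and the PIDs are all made up. It shows that a name built from the PID alone can repeat across machines sharing one /tmp, while a name that also includes the hostname does not.

```python
from collections import Counter


def scratch_name(pid, hostname=None):
    """Build a hypothetical scratch file name under /tmp.

    With hostname=None the name depends on the PID only, which is unique
    on one machine but not across machines that share the same /tmp.
    """
    if hostname is None:
        return f"/tmp/scratch.{pid}"
    return f"/tmp/scratch.{hostname}.{pid}"


def colliding_names(names):
    """Return the sorted list of names that occur more than once."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Made-up processes: two different nodes happen to use the same PID.
procs = [("node01", 4242), ("node02", 4242), ("node02", 4243)]

pid_only = [scratch_name(pid) for _, pid in procs]
host_qualified = [scratch_name(pid, host) for host, pid in procs]

print(colliding_names(pid_only))        # names shared by more than one process
print(colliding_names(host_qualified))  # empty when every name is unique
```

With a node-local /tmp the PID-only scheme is safe, because each machine has its own namespace; the collision appears only once /tmp is shared.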
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:

Jeff Squyres