
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Q: OpenMPI's use of /tmp and hanging apps via FS problems?
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-08-17 17:49:29


Some of these files are used for startup, while others are used during
application execution (such as the back-end for shared memory files).
Over the years we had a lot of discussions about this topic, and so
far we have two ways to help people deal with such situations.
However, in my personal experience, mounting /tmp on any kind of
shared filesystem is not a good idea. Anyway, here are two MCA
parameters that might help you:

                 MCA orte: parameter "orte_tmpdir_base"
                           (current value: <none>, data source: default value)
                           Base of the session directory tree
                 MCA orte: parameter "orte_no_session_dirs"
                           (current value: <none>, data source: default value)
                           Prohibited locations for session directories
                           (multiple locations separated by ',', default=NULL)
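The listing above is ompi_info output; you can dump these parameters
yourself (command syntax assumed for Open MPI releases of this era):

```shell
# List all MCA parameters registered by the orte framework,
# including orte_tmpdir_base and orte_no_session_dirs:
ompi_info --param orte all
```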

I suggest using the first one as a starting point.
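As a sketch of how you might apply these, set the parameter on the
mpirun command line or through the environment (the OMPI_MCA_ prefix
is Open MPI's standard environment-variable form for MCA parameters;
the paths below are placeholder assumptions, not your actual mounts):

```shell
# Point the session directory tree at node-local storage instead of
# the shared TMPDIR (replace /local/scratch with a real local disk):
mpirun --mca orte_tmpdir_base /local/scratch -np 4 ./my_app

# Equivalent via the environment:
export OMPI_MCA_orte_tmpdir_base=/local/scratch

# Or forbid session directories on the flaky shared mount entirely
# (multiple locations are separated by commas):
mpirun --mca orte_no_session_dirs /shared/tmp -np 4 ./my_app
```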


On Aug 16, 2008, at 9:40 PM, Brian Dobbins wrote:

> Hi guys,
> I was hoping someone here could shed some light on OpenMPI's use
> of /tmp (or, I guess, TMPDIR) and save me from diving into the
> source.. ;)
> The background is that I'm trying to run some applications on a
> system which has a flaky parallel file system which TMPDIR is mapped
> to - so, on startup, OpenMPI creates its 'openmpi-sessions-<user>'
> directory there, and under that, a few files. Sometimes I see 1
> subdirectory (usually a 0), sometimes a 0 and a 1, etc. In one of
> these, sometimes I see files such as 'shared_memory_pool.<host>',
> and 'shared_memory_module.<host>'.
> My questions are, one, what are the various numbers / files for?
> (If there's a write-up on this somewhere, just point me towards it!)
> And two, the real question, are these 'files' used during runtime,
> or only upon startup / shutdown? I'm having issues with various
> codes, especially those heavy on messages and I/O, failing to
> complete a run, and haven't resorted to sifting through strace's
> output yet. This doesn't happen all the time, but I've seen it
> happen reliably now with one particular code - its success rate (it
> DOES succeed sometimes) is about 25% right now. My best guess is
> that this is because the file system is overloaded, thus not
> allowing timely I/O or access to OpenMPI's files, but I wanted to
> get a quick understanding of how these files are used by OpenMPI and
> whether the FS does indeed seem a likely culprit before going with
> that theory for sure.
> Thanks very much,
> - Brian
> Brian Dobbins
> Yale Engineering HPC
> _______________________________________________
> users mailing list
> users_at_[hidden]
