Open MPI User's Mailing List Archives

Subject: [OMPI users] Q: OpenMPI's use of /tmp and hanging apps via FS problems?
From: Brian Dobbins (bdobbins_at_[hidden])
Date: 2008-08-16 15:40:58

Hi guys,

  I was hoping someone here could shed some light on OpenMPI's use of /tmp
(or, I guess, TMPDIR) and save me from diving into the source.. ;)

  The background is that I'm trying to run some applications on a system
whose TMPDIR is mapped to a flaky parallel file system. On start up,
OpenMPI creates its 'openmpi-sessions-<user>' directory there, and under
that, a few files. Sometimes I see one subdirectory (usually a 0),
sometimes a 0 and a 1, etc. In one of these, I sometimes see files such as
'shared_memory_pool.<host>' and 'shared_memory_module.<host>'.
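
  For reference, here is a rough sketch of how the contents of that
directory can be listed on a node. It assumes the session directory name
starts with 'openmpi-sessions-<user>' directly under TMPDIR (falling back to
/tmp), which is only what I observe on this system; the function name
list_session_files is made up for illustration.

```python
import getpass
import os


def list_session_files(base=None, user=None):
    """Return files under any 'openmpi-sessions-<user>*' directory in base.

    Assumption: the directory name prefix matches what is seen on this
    system; other Open MPI versions may lay things out differently.
    """
    base = base or os.environ.get("TMPDIR", "/tmp")
    user = user or getpass.getuser()
    prefix = "openmpi-sessions-" + user
    found = []
    if not os.path.isdir(base):
        return found
    for entry in sorted(os.listdir(base)):
        if not entry.startswith(prefix):
            continue
        root = os.path.join(base, entry)
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # walk subdirectories (0, 1, ...) in a stable order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                # Report paths relative to base so output is easy to compare
                found.append(os.path.relpath(full, base))
    return found
```

Calling list_session_files() on a compute node while a job is running and
printing the result shows which numbered subdirectories and files exist at
that moment.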

  My questions are, one, what are the various numbers / files for? (If
there's a write-up on this somewhere, just point me towards it!)

  And two, the real question: are these 'files' used during runtime, or only
upon startup / shutdown? I'm having issues with various codes, especially
those heavy on messages and I/O, failing to complete a run, and I haven't
resorted to sifting through strace's output yet. This doesn't happen all
the time, but I've seen it happen reliably now with one particular code:
its success rate (it DOES succeed sometimes) is about 25% right now. My
best guess is that this is because the file system is overloaded, thus not
allowing timely I/O or access to OpenMPI's files, but I wanted to get a
quick understanding of how these files are used by OpenMPI and whether the
FS does indeed seem a likely culprit before going with that theory for sure.
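
  One way to sanity-check the "overloaded file system" half of that theory,
independent of OpenMPI, is to time small synchronous writes in TMPDIR. The
sketch below is only a probe under my own assumptions (the helper name
time_small_writes, the file count and the 4 KiB size are arbitrary choices);
it says nothing about how OpenMPI itself touches its files.

```python
import os
import tempfile
import time


def time_small_writes(directory=None, count=20, size=4096):
    """Time create + write + fsync + unlink of small files in directory.

    Returns a list of per-file durations in seconds. The count and size
    defaults are arbitrary values chosen for illustration.
    """
    directory = directory or os.environ.get("TMPDIR", "/tmp")
    payload = b"\0" * size
    timings = []
    for _ in range(count):
        start = time.perf_counter()
        fd, path = tempfile.mkstemp(dir=directory, prefix="latency-probe-")
        try:
            os.write(fd, payload)
            os.fsync(fd)  # force the data out to the file system
        finally:
            os.close(fd)
            os.unlink(path)  # leave nothing behind in TMPDIR
        timings.append(time.perf_counter() - start)
    return timings
```

Comparing max(time_small_writes()) on the parallel file system against a
node-local directory, e.g. time_small_writes("/tmp"), should show whether
the FS stalls for long periods under load.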

  Thanks very much,
  - Brian

Brian Dobbins
Yale Engineering HPC