I just wanted to record the behind-the-scenes resolution to this particular issue. For more info, take a look at: https://svn.open-mpi.org/trac/ompi/ticket/3076
It seems as if the problem stems from /tmp being mounted as an NFS space that is shared between the compute nodes.
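To check whether a node is affected, one option is to look at the filesystem type behind /tmp on each compute node. The snippet below is only a sketch: it assumes a Linux system with GNU coreutils `stat`, and `fs_type` is a helper name made up for illustration.

```shell
#!/bin/sh
# Print the filesystem type backing a directory.
# Assumes GNU coreutils stat (Linux); BSD/macOS stat uses different flags.
fs_type() {
    stat -f -c %T "$1"
}

# Report the type for /tmp and /dev/shm, skipping any that do not exist.
for dir in /tmp /dev/shm; do
    if [ -d "$dir" ]; then
        echo "$dir: $(fs_type "$dir")"
    fi
done
```

If /tmp reports an NFS type while /dev/shm reports tmpfs, the node matches the situation described here.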
This problem can be resolved in a variety of ways. Below are a few avenues that can help get around the "globally mounted /tmp space" issue, but others are welcome to add to the list.
o Change the place where ORTE stores its session information:
    -mca orte_tmpdir_base /path/to/some/local/store
  For example:
    -mca orte_tmpdir_base /dev/shm
**Note: the following options are only available in Open MPI v1.5.5+**
o Change where shmem mmap places its files:
    -mca shmem_mmap_relocate_backing_file -1 -mca shmem_mmap_backing_file_base_dir /dev/shm
o Change the backing facility used by the sm mpool and sm BTL to posix or sysv:
    -mca shmem posix
    -mca shmem sysv
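For reference, here is roughly how each option above looks as a complete mpirun command line. This is a sketch, not a tested recipe: `./my_mpi_app` and the process count of 4 are placeholders, `workaround_cmd` is a made-up helper, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical placeholders: substitute your own binary and process count.
APP=./my_mpi_app
NP=4

# Print (do not run) the mpirun command line for a named workaround.
workaround_cmd() {
    case "$1" in
        tmpdir) echo "mpirun -np $NP -mca orte_tmpdir_base /dev/shm $APP" ;;
        mmap)   echo "mpirun -np $NP -mca shmem_mmap_relocate_backing_file -1 -mca shmem_mmap_backing_file_base_dir /dev/shm $APP" ;;
        posix)  echo "mpirun -np $NP -mca shmem posix $APP" ;;
        sysv)   echo "mpirun -np $NP -mca shmem sysv $APP" ;;
        *)      echo "unknown workaround: $1" >&2; return 1 ;;
    esac
}

# Show all four variants; mmap, posix and sysv need Open MPI v1.5.5+.
for name in tmpdir mmap posix sysv; do
    workaround_cmd "$name"
done
```

Only one of these should be needed at a time; which one works may depend on the system, as the thread below shows.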
On Apr 24, 2012, at 12:34 PM, Seyyed Mohtadin Hashemi wrote:
I ran those commands and have posted the outputs on: https://svn.open-mpi.org/trac/ompi/ticket/3076
-mca shmem posix worked for every -np value (even when oversubscribing); sysv, however, did not work for any -np value.
On Tue, Apr 24, 2012 at 5:36 PM, Gutierrez, Samuel K <email@example.com> wrote:
Just out of curiosity, what happens when you add
-mca shmem posix
to your mpirun command line using 1.5.5?
Can you also please try:
-mca shmem sysv
I'm shooting in the dark here, but I want to make sure that the failure isn't due to a small backing store.
On Apr 16, 2012, at 8:57 AM, Gutierrez, Samuel K wrote:
De venligste hilsner/I am, yours most sincerely
Seyyed Mohtadin Hashemi