Open MPI User's Mailing List Archives

From: eddie168 (eddie168+ompi_user_at_[hidden])
Date: 2007-01-17 20:56:23


Hi Ralph and Brian,

Thanks for the advice. I have checked the permissions on /tmp:

drwxrwxrwt 19 root root 4096 Jan 18 11:38 tmp

so I don't think there should be any problem creating files there, yet
option (a) still does not work for me.
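
For reference, a quick way to double-check write access from the normal
user account (the test file name below is arbitrary):

  # run as the normal user, not root
  touch /tmp/ompi_write_test && echo "write to /tmp OK"
  rm -f /tmp/ompi_write_test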

I tried option (b), setting --tmpdir on the command line and running as a
normal user. It works for -np 1, but it gives the same error for -np 2.

I also tested option (c) by setting "OMPI_MCA_tmpdir_base =
/home2/mpi_tut/tmp" in "~/.openmpi/mca-params.conf", but the error still
occurred.
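
For reference, the file currently looks like the sketch below. I am not
certain about the parameter-file syntax, but if the OMPI_MCA_ prefix is only
meant for environment variables, the bare parameter name may be what the
file expects:

  # ~/.openmpi/mca-params.conf as tested
  OMPI_MCA_tmpdir_base = /home2/mpi_tut/tmp

  # possible alternative, assuming the prefix is only for environment variables
  tmpdir_base = /home2/mpi_tut/tmp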

I have included the debug output of the run below (with the IP addresses
masked). I noticed that the alternative tmp directory is used at the
beginning of the process, but it changes back to "/tmp" once orted is
executed. Could the error be related to my SSH settings?
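
In case it is the ssh-launched orted that falls back to /tmp, two more
things that look worth trying (only a sketch, assuming the MCA parameter is
forwarded to the daemon):

  # pass the tmp directory as an MCA parameter on the command line
  mpirun --mca tmpdir_base /home2/mpi_tut/tmp -np 2 tut01

  # or export it in the login environment (e.g. ~/.bashrc) so a
  # non-interactive ssh shell also sees it
  export OMPI_MCA_tmpdir_base=/home2/mpi_tut/tmp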

Many thanks,

Eddie.

[eddie_at_oceanus:~/home2/mpi_tut]$ mpirun -d --tmpdir /home2/mpi_tut/tmp -np 2
tut01
[oceanus:129119] [0,0,0] setting up session dir with
[oceanus:129119] tmpdir /home2/mpi_tut/tmp
[oceanus:129119] universe default-universe
[oceanus:129119] user eddie
[oceanus:129119] host oceanus
[oceanus:129119] jobid 0
[oceanus:129119] procid 0
[oceanus:129119] procdir:
/home2/mpi_tut/tmp/openmpi-sessions-eddie_at_oceanus_0/default-universe/0/0
[oceanus:129119] jobdir:
/home2/mpi_tut/tmp/openmpi-sessions-eddie_at_oceanus_0/default-universe/0
[oceanus:129119] unidir:
/home2/mpi_tut/tmp/openmpi-sessions-eddie_at_oceanus_0/default-universe
[oceanus:129119] top: openmpi-sessions-eddie_at_oceanus_0
[oceanus:129119] tmp: /home2/mpi_tut/tmp
[oceanus:129119] [0,0,0] contact_file
/home2/mpi_tut/tmp/openmpi-sessions-eddie_at_oceanus_0/default-universe/universe-setup.txt
[oceanus:129119] [0,0,0] wrote setup file
[oceanus:129119] pls:rsh: local csh: 0, local bash: 1
[oceanus:129119] pls:rsh: assuming same remote shell as local shell
[oceanus:129119] pls:rsh: remote csh: 0, remote bash: 1
[oceanus:129119] pls:rsh: final template argv:
[oceanus:129119] pls:rsh: /usr/bin/ssh <template> orted --debug
--bootproxy 1 --name <template> --num_procs 2 --vpid_start 0 --nodename
<template> --universe eddie_at_oceanus:default-universe --nsreplica
"0.0.0;tcp://xxx.xxx.xxx.xxx:52428"
--gprreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --mpi-call-yield 0
[oceanus:129119] pls:rsh: launching on node localhost
[oceanus:129119] pls:rsh: oversubscribed -- setting mpi_yield_when_idle to 1
(1 2)
[oceanus:129119] pls:rsh: localhost is a LOCAL node
[oceanus:129119] pls:rsh: changing to directory /home/eddie
[oceanus:129119] pls:rsh: executing: orted --debug --bootproxy 1 --name
0.0.1 --num_procs 2 --vpid_start 0 --nodename localhost --universe
eddie_at_oceanus:default-universe --nsreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428"
--gprreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --mpi-call-yield 1
[oceanus:129120] [0,0,1] setting up session dir with
[oceanus:129120] universe default-universe
[oceanus:129120] user eddie
[oceanus:129120] host localhost
[oceanus:129120] jobid 0
[oceanus:129120] procid 1
[oceanus:129120] procdir:
/tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/0/1
[oceanus:129120] jobdir:
/tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/0
[oceanus:129120] unidir:
/tmp/openmpi-sessions-eddie_at_localhost_0/default-universe
[oceanus:129120] top: openmpi-sessions-eddie_at_localhost_0
[oceanus:129120] tmp: /tmp
[oceanus:129121] [0,1,0] setting up session dir with
[oceanus:129121] universe default-universe
[oceanus:129121] user eddie
[oceanus:129121] host localhost
[oceanus:129121] jobid 1
[oceanus:129121] procid 0
[oceanus:129121] procdir:
/tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/1/0
[oceanus:129121] jobdir:
/tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/1
[oceanus:129121] unidir:
/tmp/openmpi-sessions-eddie_at_localhost_0/default-universe
[oceanus:129121] top: openmpi-sessions-eddie_at_localhost_0
[oceanus:129121] tmp: /tmp
[oceanus:129122] [0,1,1] setting up session dir with
[oceanus:129122] universe default-universe
[oceanus:129122] user eddie
[oceanus:129122] host localhost
[oceanus:129122] jobid 1
[oceanus:129122] procid 1
[oceanus:129122] procdir:
/tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/1/1
[oceanus:129122] jobdir:
/tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/1
[oceanus:129122] unidir:
/tmp/openmpi-sessions-eddie_at_localhost_0/default-universe
[oceanus:129122] top: openmpi-sessions-eddie_at_localhost_0
[oceanus:129122] tmp: /tmp
[oceanus:129119] spawn: in job_state_callback(jobid = 1, state = 0x4)
[oceanus:129119] Info: Setting up debugger process table for applications
  MPIR_being_debugged = 0
  MPIR_debug_gate = 0
  MPIR_debug_state = 1
  MPIR_acquired_pre_main = 0
  MPIR_i_am_starter = 0
  MPIR_proctable_size = 2
  MPIR_proctable:
    (i, host, exe, pid) = (0, localhost, tut01, 129121)
    (i, host, exe, pid) = (1, localhost, tut01, 129122)
[oceanus:129121] mca_common_sm_mmap_init: ftruncate failed with errno=13
[oceanus:129121] mca_mpool_sm_init: unable to create shared memory mapping (
/tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/1/shared_mem_pool.localhost
)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Out of resource" (-2) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
[oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting
[oceanus:129120] sess_dir_finalize: job session dir not empty - leaving
[oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting
[oceanus:129120] sess_dir_finalize: found job session dir empty - deleting
[oceanus:129120] sess_dir_finalize: univ session dir not empty - leaving
[oceanus:129120] orted: job_state_callback(jobid = 1, state =
ORTE_PROC_STATE_TERMINATED)
[oceanus:129120] sess_dir_finalize: job session dir not empty - leaving
[oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting
[oceanus:129120] sess_dir_finalize: found job session dir empty - deleting
[oceanus:129120] sess_dir_finalize: found univ session dir empty - deleting
[oceanus:129120] sess_dir_finalize: found top session dir empty - deleting
[eddie_at_oceanus:~/home2/mpi_tut]$

On 1/18/07, Ralph H Castain <rhc_at_[hidden]> wrote:
>
> Hi Eddie
>
> Open MPI needs to create a temporary file system (what we call our
> "session directory") where it stores things like the shared memory file.
> From this output, it appears that your /tmp directory is "locked" to root
> access only.
>
> You have three options for resolving this problem:
>
> (a) you could make /tmp accessible to general users;
>
> (b) you could use the --tmpdir xxx command line option to point Open MPI at
> another directory that is accessible to the user (for example, you could use
> a "tmp" directory under the user's home directory); or
>
> (c) you could set an MCA parameter OMPI_MCA_tmpdir_base to identify a
> directory we can use instead of /tmp.
>
> If you select options (b) or (c), the only requirement is that this
> location must be accessible on every node being used. Let me be clear on
> this: the tmp directory *must not* be NFS mounted and therefore shared
> across all nodes. However, each node must be able to access a location of
> the given name; that location should be strictly local to each node. A
> quick sketch of (b) and (c) follows below.
>
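> For concreteness, a rough sketch of how (b) and (c) might look from the
> shell; the directory name here is only an example and must exist (and be
> writable) on every node:
>
>   # option (b): per-run, on the command line
>   mpirun --tmpdir /home/eddie/tmp -np 2 tut01
>
>   # option (c): via the MCA parameter in the environment
>   export OMPI_MCA_tmpdir_base=/home/eddie/tmp
>   mpirun -np 2 tut01
>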
> Hope that helps
> Ralph
>
>
>
> On 1/17/07 12:25 AM, "eddie168" <eddie168+ompi_user_at_[hidden]> wrote:
>
> Dear all,
>
> I have recently installed Open MPI 1.1.2 on an OpenSSI cluster running
> Fedora Core 3. I tested a simple hello-world MPI program (attached) and it
> runs fine as root. However, if I run the same program as a normal user, it
> gives the following error:
>
> [eddie_at_oceanus:~/home2/mpi_tut]$ mpirun -np 2 tut01
> [oceanus:125089] mca_common_sm_mmap_init: ftruncate failed with errno=13
> [oceanus:125089] mca_mpool_sm_init: unable to create shared memory mapping
> (/tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/1/shared_mem_pool.localhost)
> --------------------------------------------------------------------------
>
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or
> environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> PML add procs failed
> --> Returned "Out of resource" (-2) instead of "Success" (0)
> --------------------------------------------------------------------------
>
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> [eddie_at_oceanus:~/home2/mpi_tut]$
>
> Do I need to give the user certain permissions in order to oversubscribe
> processes?
>
> Thanks in advance,
>
> Eddie.
>
>
>
> ------------------------------
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users