
Open MPI User's Mailing List Archives


From: eddie168 (eddie168+ompi_user_at_[hidden])
Date: 2007-01-18 22:30:01


Just to answer my own question: after I explicitly specify the "--mca btl
tcp" parameter, the program works. So I need to issue a command like
this:

$ mpirun --mca btl tcp -np 2 tut01
oceanus:Hello world from 0
oceanus:Hello world from 1
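
This workaround most likely succeeds because restricting the btl parameter to
tcp keeps Open MPI from loading its shared-memory (sm) transport, so the
shared_mem_pool file under the session directory is never created. The value
is commonly written as tcp,self so a process can still send messages to
itself (a sketch of the usual form, not a quote from this thread):

```shell
# Select only the TCP and loopback ("self") BTLs; the shared-memory (sm)
# transport, and the shared_mem_pool file it mmaps under the session
# directory, is then never used:
mpirun --mca btl tcp,self -np 2 tut01
```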

Regards,

Eddie.

On 1/18/07, eddie168 <eddie168+ompi_user_at_[hidden]> wrote:
>
> Hi Ralph and Brian,
>
> Thanks for the advice, I have checked the permission to /tmp
>
> drwxrwxrwt 19 root root 4096 Jan 18 11:38 tmp
>
> which suggests there shouldn't be any problem creating files there, so
> option (a) still doesn't work for me.
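>
> A quick way to double-check what the ls output shows is to test both the
> mode bits and actual writability from the unprivileged account (a sketch;
> any directory path could be substituted for /tmp):

```python
import os
import stat
import tempfile

# drwxrwxrwt corresponds to mode 1777: world-writable with the sticky bit.
mode = stat.S_IMODE(os.stat("/tmp").st_mode)
print(oct(mode))

# The mode bits alone don't prove this user can create files there (ACLs,
# a read-only mount, or cluster filesystem quirks can still get in the
# way), so actually try it:
with tempfile.NamedTemporaryFile(dir="/tmp") as f:
    print("created", f.name)
```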
>
> I tried option (b), setting --tmpdir on the command line and running as a
> normal user. It works for -np 1, but gives the same error for -np 2.
>
> I also tested option (c) by setting "OMPI_MCA_tmpdir_base =
> /home2/mpi_tut/tmp" in "~/.openmpi/mca-params.conf", but the error still
> occurred.
>
> I have included the debug output of what I ran (with IPs masked). I
> noticed that the alternative tmp directory is set at the beginning of the
> process, but it changes back to "/tmp" after orted is executed. Could the
> error be related to my SSH settings?
>
> Many thanks,
>
> Eddie.
>
>
> [eddie_at_oceanus:~/home2/mpi_tut]$ mpirun -d --tmpdir /home2/mpi_tut/tmp -np
> 2 tut01
> [oceanus:129119] [0,0,0] setting up session dir with
> [oceanus:129119] tmpdir /home2/mpi_tut/tmp
> [oceanus:129119] universe default-universe
> [oceanus:129119] user eddie
> [oceanus:129119] host oceanus
> [oceanus:129119] jobid 0
> [oceanus:129119] procid 0
> [oceanus:129119] procdir: /home2/mpi_tut/tmp/openmpi-sessions-eddie_at_oceanus_0/default-universe/0/0
>
> [oceanus:129119] jobdir:
> /home2/mpi_tut/tmp/openmpi-sessions-eddie_at_oceanus_0/default-universe/0
> [oceanus:129119] unidir:
> /home2/mpi_tut/tmp/openmpi-sessions-eddie_at_oceanus_0/default-universe
> [oceanus:129119] top: openmpi-sessions-eddie_at_oceanus_0
> [oceanus:129119] tmp: /home2/mpi_tut/tmp
> [oceanus:129119] [0,0,0] contact_file /home2/mpi_tut/tmp/openmpi-sessions-eddie_at_oceanus_0/default-universe/universe-setup.txt
>
> [oceanus:129119] [0,0,0] wrote setup file
> [oceanus:129119] pls:rsh: local csh: 0, local bash: 1
> [oceanus:129119] pls:rsh: assuming same remote shell as local shell
> [oceanus:129119] pls:rsh: remote csh: 0, remote bash: 1
> [oceanus:129119] pls:rsh: final template argv:
> [oceanus:129119] pls:rsh: /usr/bin/ssh <template> orted --debug
> --bootproxy 1 --name <template> --num_procs 2 --vpid_start 0 --nodename
> <template> --universe eddie_at_oceanus:default-universe --nsreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428"
> --gprreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --mpi-call-yield 0
> [oceanus:129119] pls:rsh: launching on node localhost
> [oceanus:129119] pls:rsh: oversubscribed -- setting mpi_yield_when_idle to
> 1 (1 2)
> [oceanus:129119] pls:rsh: localhost is a LOCAL node
> [oceanus:129119] pls:rsh: changing to directory /home/eddie
> [oceanus:129119] pls:rsh: executing: orted --debug --bootproxy 1 --name
> 0.0.1 --num_procs 2 --vpid_start 0 --nodename localhost --universe eddie_at_oceanus:default-universe
> --nsreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --gprreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428"
> --mpi-call-yield 1
> [oceanus:129120] [0,0,1] setting up session dir with
> [oceanus:129120] universe default-universe
> [oceanus:129120] user eddie
> [oceanus:129120] host localhost
> [oceanus:129120] jobid 0
> [oceanus:129120] procid 1
> [oceanus:129120] procdir: /tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/0/1
>
> [oceanus:129120] jobdir:
> /tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/0
> [oceanus:129120] unidir:
> /tmp/openmpi-sessions-eddie_at_localhost_0/default-universe
> [oceanus:129120] top: openmpi-sessions-eddie_at_localhost_0
> [oceanus:129120] tmp: /tmp
> [oceanus:129121] [0,1,0] setting up session dir with
> [oceanus:129121] universe default-universe
> [oceanus:129121] user eddie
> [oceanus:129121] host localhost
> [oceanus:129121] jobid 1
> [oceanus:129121] procid 0
> [oceanus:129121] procdir:
> /tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/1/0
> [oceanus:129121] jobdir: /tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/1
>
> [oceanus:129121] unidir:
> /tmp/openmpi-sessions-eddie_at_localhost_0/default-universe
> [oceanus:129121] top: openmpi-sessions-eddie_at_localhost_0
> [oceanus:129121] tmp: /tmp
> [oceanus:129122] [0,1,1] setting up session dir with
> [oceanus:129122] universe default-universe
> [oceanus:129122] user eddie
> [oceanus:129122] host localhost
> [oceanus:129122] jobid 1
> [oceanus:129122] procid 1
> [oceanus:129122] procdir:
> /tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/1/1
> [oceanus:129122] jobdir: /tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/1
>
> [oceanus:129122] unidir:
> /tmp/openmpi-sessions-eddie_at_localhost_0/default-universe
> [oceanus:129122] top: openmpi-sessions-eddie_at_localhost_0
> [oceanus:129122] tmp: /tmp
> [oceanus:129119] spawn: in job_state_callback(jobid = 1, state = 0x4)
> [oceanus:129119] Info: Setting up debugger process table for applications
> MPIR_being_debugged = 0
> MPIR_debug_gate = 0
> MPIR_debug_state = 1
> MPIR_acquired_pre_main = 0
> MPIR_i_am_starter = 0
> MPIR_proctable_size = 2
> MPIR_proctable:
> (i, host, exe, pid) = (0, localhost, tut01, 129121)
> (i, host, exe, pid) = (1, localhost, tut01, 129122)
> [oceanus:129121] mca_common_sm_mmap_init: ftruncate failed with errno=13
> [oceanus:129121] mca_mpool_sm_init: unable to create shared memory mapping
> (/tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/1/shared_mem_pool.localhost
> )
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or
> environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Out of resource" (-2) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> [oceanus:129120] sess_dir_finalize: found proc session dir empty -
> deleting
> [oceanus:129120] sess_dir_finalize: job session dir not empty - leaving
> [oceanus:129120] sess_dir_finalize: found proc session dir empty -
> deleting
> [oceanus:129120] sess_dir_finalize: found job session dir empty - deleting
> [oceanus:129120] sess_dir_finalize: univ session dir not empty - leaving
> [oceanus:129120] orted: job_state_callback(jobid = 1, state =
> ORTE_PROC_STATE_TERMINATED)
> [oceanus:129120] sess_dir_finalize: job session dir not empty - leaving
> [oceanus:129120] sess_dir_finalize: found proc session dir empty -
> deleting
> [oceanus:129120] sess_dir_finalize: found job session dir empty - deleting
> [oceanus:129120] sess_dir_finalize: found univ session dir empty -
> deleting
> [oceanus:129120] sess_dir_finalize: found top session dir empty - deleting
>
> [eddie_at_oceanus:~/home2/mpi_tut]$
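>
> For reference, the errno=13 that ftruncate reports above is EACCES
> ("Permission denied"), which is consistent with the shared-memory file
> being created somewhere the user cannot write:

```python
import errno
import os

# errno 13 is EACCES, i.e. "Permission denied"
print(errno.errorcode[13])
print(os.strerror(errno.EACCES))
```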
>
>
> On 1/18/07, Ralph H Castain <rhc_at_[hidden]> wrote:
> >
> > Hi Eddie
> >
> > Open MPI needs to create a temporary file system - what we call our
> > "session directory" - where it stores things like the shared memory file.
> > From this output, it appears that your /tmp directory is "locked" to root
> > access only.
> >
> > You have three options for resolving this problem:
> >
> > (a) you could make /tmp accessible to general users;
> >
> > (b) you could use the --tmpdir xxx command-line option to point Open MPI
> > at another directory that is accessible to the user (for example, you could
> > use a "tmp" directory under the user's home directory); or
> >
> > (c) you could set an MCA parameter OMPI_MCA_tmpdir_base to identify a
> > directory we can use instead of /tmp.
> >
> > If you select option (b) or (c), the only requirement is that this
> > location must be accessible on every node being used. Let me be clear on
> > this: the tmp directory *must not* be NFS-mounted and thereby shared
> > across all nodes. However, each node must be able to access a location of
> > the given name; that location should be strictly local to each node.
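> >
> > The three options can be sketched roughly as follows (paths are
> > illustrative; note that in mca-params.conf the OMPI_MCA_ prefix is
> > dropped, while the environment-variable form keeps it):

```shell
# (a) make /tmp world-writable with the sticky bit (as root)
chmod 1777 /tmp

# (b) point Open MPI at a user-writable directory per run
mpirun --tmpdir /home2/mpi_tut/tmp -np 2 tut01

# (c) set the MCA parameter instead, either as an environment variable ...
export OMPI_MCA_tmpdir_base=/home2/mpi_tut/tmp
# ... or in ~/.openmpi/mca-params.conf, where the OMPI_MCA_ prefix is dropped:
#     tmpdir_base = /home2/mpi_tut/tmp
```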
> >
> > Hope that helps
> > Ralph
> >
> >
> >
> > On 1/17/07 12:25 AM, "eddie168" < eddie168+ompi_user_at_[hidden]> wrote:
> >
> > Dear all,
> >
> > I have recently installed Open MPI 1.1.2 on an OpenSSI cluster running
> > Fedora Core 3. I tested a simple hello-world MPI program (attached) and it
> > runs OK as root. However, if I run the same program as a normal user, it
> > gives the following error:
> >
> > [eddie_at_oceanus:~/home2/mpi_tut]$ mpirun -np 2 tut01
> > [oceanus:125089] mca_common_sm_mmap_init: ftruncate failed with errno=13
> > [oceanus:125089] mca_mpool_sm_init: unable to create shared memory
> > mapping (/tmp/openmpi-sessions-eddie_at_localhost_0/default-universe/1/shared_mem_pool.localhost)
> > --------------------------------------------------------------------------
> >
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or
> > environment
> > problems. This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> > PML add procs failed
> > --> Returned "Out of resource" (-2) instead of "Success" (0)
> > --------------------------------------------------------------------------
> >
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (goodbye)
> > [eddie_at_oceanus:~/home2/mpi_tut]$
> >
> > Do I need to give the user certain permissions in order to
> > oversubscribe processes?
> >
> > Thanks in advance,
> >
> > Eddie.
> >
> >
> >
> > ------------------------------
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >
>