Open MPI User's Mailing List Archives

From: Adam C Powell IV (hazelsct_at_[hidden])
Date: 2007-07-18 14:27:19


On Wed, 2007-07-18 at 13:44 -0400, Tim Prins wrote:
> Adam C Powell IV wrote:
> > As mentioned, I'm running in a chroot environment, so rsh and ssh won't
> > work: "rsh localhost" will rsh into the primary local host environment,
> > not the chroot, which will fail.
> >
> > [The purpose is to be able to build and test MPI programs in the Debian
> > unstable distribution, without upgrading the whole machine to unstable.
> > Though most machines I use for this purpose run Debian stable or
> > testing, the machine I'm currently using runs a very old Fedora, for
> > which I don't think OpenMPI is available.]
>
> Alright, I understand what you are trying to do now. To be honest, I
> don't think we have ever really thought about this use case. We always
> figured that to test Open MPI people would simply install it in a
> different directory and use it from there.
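For reference, the usual way to try out a separate Open MPI build without
touching the system installation is an install into a private prefix; a
minimal sketch, using a hypothetical $HOME/ompi-test directory:

$ ./configure --prefix=$HOME/ompi-test
$ make all install
$ export PATH=$HOME/ompi-test/bin:$PATH
$ export LD_LIBRARY_PATH=$HOME/ompi-test/lib:$LD_LIBRARY_PATH

That does not cover the chroot case being discussed here, which is why the
question remains interesting.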
>
> > With MPICH, mpirun -np 1 just runs the new process in the current
> > context, without rsh/ssh, so it works in a chroot. Does OpenMPI not
> > support this functionality?
>
> Open MPI does support this functionality. First, a bit of explanation:
>
> We use 'pls' (process launching system) components to handle the
> launching of processes. There are components for slurm, gridengine, rsh,
> and others. At runtime we open each of these components and query them
> as to whether they can be used. The original error you posted says that
> none of the 'pls' components can be used, because each of them detected
> that it could not run in your setup. The slurm one excluded itself because
> there were no environment variables set indicating it was running under
> SLURM. Similarly, the gridengine pls excluded itself. The 'rsh' pls said
> it could not run because neither 'ssh' nor 'rsh' is available (I assume
> this is the case, though you did not explicitly say so).
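As an aside, the set of pls components compiled into an installation, and
the parameters they accept, can be inspected with ompi_info; a quick
sketch, assuming the Open MPI 1.2-era 'pls' framework name (output omitted,
since it varies with the build):

$ ompi_info | grep " pls:"      # list the pls components that are present
$ ompi_info --param pls rsh     # show the rsh pls parameters, e.g. pls_rsh_agent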
>
> But in this case, you do want the 'rsh' pls to be used. It will
> automatically fork any local processes, and will use rsh/ssh to launch
> any remote processes. Again, I don't think we ever imagined the use case
> of a UNIX-like system where no launcher like SLURM is available and
> rsh/ssh are not available either (Open MPI is, after all,
> primarily concerned with multi-node operation).
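If the selection logic still picks nothing, the rsh launcher can also be
requested explicitly through the usual MCA component-selection syntax; a
sketch, assuming the component is compiled in:

$ mpirun -np 1 -mca pls rsh hostname

Forcing the choice of launcher does not by itself remove the need for an
rsh/ssh agent, which is what the workarounds below address.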
>
> So, there are several ways around this:
>
> 1. Make rsh or ssh available, even though they will not be used.
>
> 2. Tell the 'rsh' pls component to use a dummy program such as
> /bin/false by adding the following to the command line:
> -mca pls_rsh_agent /bin/false
>
> 3. Create a dummy 'rsh' executable that is available in your path.
>
> For instance:
>
> [tprins_at_odin ~]$ which ssh
> /usr/bin/which: no ssh in
> (/u/tprins/usr/ompia/bin:/u/tprins/usr/bin:/usr/local/bin:/bin:/usr/X11R6/bin)
> [tprins_at_odin ~]$ which rsh
> /usr/bin/which: no rsh in
> (/u/tprins/usr/ompia/bin:/u/tprins/usr/bin:/usr/local/bin:/bin:/usr/X11R6/bin)
> [tprins_at_odin ~]$ mpirun -np 1 hostname
> [odin.cs.indiana.edu:18913] [0,0,0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init_stage1.c at line 317
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_pls_base_select failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
>
> --------------------------------------------------------------------------
> [odin.cs.indiana.edu:18913] [0,0,0] ORTE_ERROR_LOG: Error in file
> runtime/orte_system_init.c at line 46
> [odin.cs.indiana.edu:18913] [0,0,0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 52
> [odin.cs.indiana.edu:18913] [0,0,0] ORTE_ERROR_LOG: Error in file
> orterun.c at line 399
>
> [tprins_at_odin ~]$ mpirun -np 1 -mca pls_rsh_agent /bin/false hostname
> odin.cs.indiana.edu
>
> [tprins_at_odin ~]$ touch usr/bin/rsh
> [tprins_at_odin ~]$ chmod +x usr/bin/rsh
> [tprins_at_odin ~]$ mpirun -np 1 hostname
> odin.cs.indiana.edu
> [tprins_at_odin ~]$
>
>
> I hope this helps,
>
> Tim
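A slightly more defensive variant of option 3 than the empty executable
shown above is a stub script that fails loudly if it is ever actually
invoked; a sketch, assuming ~/usr/bin is on the PATH as in the transcript:

$ cat > ~/usr/bin/rsh <<'EOF'
#!/bin/sh
# Dummy rsh: present only so the rsh pls passes its availability check.
# Local processes are forked directly, so this should never be executed.
echo "dummy rsh invoked unexpectedly: $*" >&2
exit 1
EOF
$ chmod +x ~/usr/bin/rsh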

Yes, this helps tremendously. I installed rsh, and now it pretty much
works.

The one missing detail is that I can't seem to get the stdout/stderr
output. For example:

$ orterun -np 1 uptime
$ uptime
18:24:27 up 13 days, 3:03, 0 users, load average: 0.00, 0.03, 0.00

The man page indicates that stdout/stderr is supposed to come back to
the stdout/stderr of the orterun process. Any ideas on why this isn't
working?
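
One way to narrow this down is to redirect inside the launched command
itself, which separates "the process never ran" from "the output is not
being forwarded back"; a sketch using only standard shell:

$ orterun -np 1 sh -c 'uptime > /tmp/orterun-test.out 2>&1'
$ cat /tmp/orterun-test.out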

Thank you again!

-Adam

-- 
GPG fingerprint: D54D 1AEE B11C CE9B A02B  C5DD 526F 01E8 564E E4B6
Welcome to the best software in the world today cafe!
http://www.take6.com/albums/greatesthits.html