Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs with OpenMPI 1.6.5 rsh
From: Blosch, Edwin L (edwin.l.blosch_at_[hidden])
Date: 2014-04-07 17:35:09


That worked!

But still a mystery.

I tried printing the environment immediately before mpirun. Inside the Python wrapper, I do os.system('env') immediately before the subprocess.Popen(['mpirun', ...], shell=False) command. This returns SHELL=/bin/csh, and I can confirm that getpwuid, if it works, would also have returned /bin/csh, as that is my default shell.
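
For anyone who wants to repeat this check, it can be sketched in Python roughly as follows. This is a minimal sketch and not the actual wrapper: the helper name report_shells is hypothetical. It compares the SHELL envar with the login shell recorded in the password database, which are the two sources mentioned in this thread.

```python
import os
import pwd


def report_shells():
    """Return (SHELL envar, login shell from the passwd database)."""
    env_shell = os.environ.get("SHELL")  # None if SHELL is not set
    try:
        login_shell = pwd.getpwuid(os.getuid()).pw_shell
    except KeyError:
        # No passwd entry for this uid: the "if it works" case above.
        login_shell = None
    return env_shell, login_shell


if __name__ == "__main__":
    env_shell, login_shell = report_shells()
    print("SHELL envar :", env_shell)
    print("login shell :", login_shell)
```

If the two values differ, or the passwd lookup fails, that would be one place to look for a mismatch between the assumed shell and the real one.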

It is also interesting that it does not matter if the job-submission script is #!/bin/bash or #!/bin/tcsh (properly re-written, of course) -- I get the same errors either way.

So why did the launcher use a bash syntax on the remote host? It does not seem to be behaving exactly as you described.

But telling it to check the remote shell did the trick.

Thanks

-----Original Message-----
From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph Castain
Sent: Monday, April 07, 2014 4:12 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs with OpenMPI 1.6.5 rsh

I doubt that the rsh launcher is getting confused by the cmd you show below. However, if that command is embedded in a script that changes the shell away from your default shell, then yes - it might get confused. When the rsh launcher spawns your remote orted, it attempts to set some envars to ensure things are set up correctly (e.g., that the path is right). Thus, it needs to know what the remote shell is going to be.
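
To illustrate why the shell matters here, the sketch below builds the same variable settings in Bourne-style and csh-style syntax. It is an illustration only and not Open MPI's code: the function env_prefix is hypothetical, and values are not quoted or escaped. The "export: Command not found." lines in the output quoted further down are what csh prints when it is handed the Bourne-style form.

```python
def env_prefix(shell, variables):
    """Build a command prefix that sets variables in the given shell's syntax.

    Hypothetical helper for illustration; not taken from Open MPI.
    """
    name = shell.rsplit("/", 1)[-1]
    if name in ("csh", "tcsh"):
        # csh family: setenv NAME VALUE
        parts = ["setenv {} {}".format(k, v) for k, v in variables.items()]
    else:
        # Bourne family (sh, bash, ksh): NAME=VALUE ; export NAME
        parts = ["{}={} ; export {}".format(k, v, k)
                 for k, v in variables.items()]
    return " ; ".join(parts)


settings = {"OPAL_PREFIX": "/apps/local/test/openmpi"}
print(env_prefix("/bin/bash", settings))
print(env_prefix("/bin/csh", settings))
```

The first line printed is valid only in a Bourne-type shell and the second only in csh or tcsh, so guessing the wrong remote shell yields errors rather than a usable environment.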

If given no other direction, it assumes that both the remote shell and your current shell are your default shell as reported by getpwuid (if available - otherwise, it falls back to the SHELL envar). If the remote shell can be something different, then you need to set the "plm_rsh_assume_same_shell=0" MCA param so it will check the remote shell.
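
As a rough sketch of how that parameter could be passed from a Python wrapper like the one described in this thread: the helper names build_mpirun_cmd and launch are hypothetical, and the remaining arguments are copied from the mpirun command quoted below. The "--mca name value" form is the same one that command already uses for orte_rsh_agent.

```python
import subprocess


def build_mpirun_cmd(base_args, check_remote_shell=True):
    """Return an mpirun argument list, optionally asking the rsh launcher
    to check the remote shell instead of assuming it matches the local one."""
    cmd = ["mpirun"]
    if check_remote_shell:
        cmd += ["--mca", "plm_rsh_assume_same_shell", "0"]
    return cmd + list(base_args)


def launch(cmd, output_path):
    """Run cmd without a shell, sending stdout and stderr to output_path.
    Not called here; it needs a working Open MPI installation."""
    with open(output_path, "w") as out:
        return subprocess.call(cmd, stdout=out, stderr=subprocess.STDOUT)


cmd = build_mpirun_cmd([
    "--machinefile", "mpihosts.914", "-np", "48",
    "-x", "LD_LIBRARY_PATH",
    "--mca", "orte_rsh_agent", "/usr/bin/rsh",
    "solver_openmpi", "-i", "flow.inp",
])
print(" ".join(cmd))
```

MCA params can generally also be set through envars prefixed with OMPI_MCA_, which may be more convenient when the command line is built elsewhere.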

On Apr 7, 2014, at 1:53 PM, Blosch, Edwin L <edwin.l.blosch_at_[hidden]> wrote:

> Thanks Noam, that makes sense.
>
> Yes, I did mean to do ". hello" (with space in between). That was an attempt to replicate whatever OpenMPI is doing.
>
> In the first post I mentioned that my mpirun command actually gets executed from within a Python script using the subprocess module. I don't know the details of the rsh launcher, but there are 3 remote hosts in the hosts file, and 3 sets of the error messages below. Maybe the rsh launcher is getting confused, doing something that is only valid under bash even though my default login environment is /bin/csh.
>
> mpirun --machinefile mpihosts.914 -np 48 -x LD_LIBRARY_PATH --mca orte_rsh_agent /usr/bin/rsh solver_openmpi -i flow.inp >& output
>
> % cat output
>
> /bin/.: Permission denied.
> OPAL_PREFIX=/apps/local/test/openmpi: Command not found.
> export: Command not found.
> PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd: Command not found.
> export: Command not found.
> LD_LIBRARY_PATH: Undefined variable.
> /bin/.: Permission denied.
> OPAL_PREFIX=/apps/local/test/openmpi: Command not found.
> export: Command not found.
> PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd: Command not found.
> export: Command not found.
> LD_LIBRARY_PATH: Undefined variable.
> /bin/.: Permission denied.
> OPAL_PREFIX=/apps/local/test/openmpi: Command not found.
> export: Command not found.
> PATH=/apps/local/test/openmpi/bin:/bin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/usr/openwin/bin:/usr/local/etc:/home/bloscel/bin:/usr/ucb:/usr/bsd: Command not found.
> export: Command not found.
> LD_LIBRARY_PATH: Undefined variable.
>
> -----Original Message-----
> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Noam Bernstein
> Sent: Monday, April 07, 2014 3:41 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] EXTERNAL: Re: Problem with shell when launching jobs with OpenMPI 1.6.5 rsh
>
>
> On Apr 7, 2014, at 4:36 PM, Blosch, Edwin L <edwin.l.blosch_at_[hidden]> wrote:
>
>> I guess this is not OpenMPI related anymore. I can repeat the essential problem interactively:
>>
>> % echo $SHELL
>> /bin/csh
>>
>> % echo $SHLVL
>> 1
>>
>> % cat hello
>> echo Hello
>>
>> % /bin/bash hello
>> Hello
>>
>> % /bin/csh hello
>> Hello
>>
>> % . hello
>> /bin/.: Permission denied
>
> . is a bash internal which evaluates the contents of the file in the current shell. Since you're running csh, it's just looking for an executable named ., which does not exist (the csh analog of bash's . is source). /bin/. _is_ in your path, but it's a directory (namely /bin itself), which cannot be executed, hence the error. Perhaps you meant to do
> ./hello
> which means (both in bash and csh) run the script hello in the current working directory (.), rather than looking for it in the list of directories in $PATH
>
> Noam
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
