Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpirun does not propagate environment from master node to slave nodes
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-06-28 22:03:11


On Jun 28, 2011, at 3:52 PM, yanyg_at_[hidden] wrote:

> Thanks, Ralph!
>
> a) Yes, I know I could use only IB with "--mca btl openib", but I just
> want to make sure I am using the IB interfaces. I am looking for an
> mpirun option that prints the interconnect protocol actually in use,
> like --prot to mpirun in MPICH2.
>
> b) Yes, my default shell is bash, but I run a C-shell script from a bash
> terminal, and mpirun is invoked inside this C-shell script. I am using the
> rsh launcher, exactly as you guessed. I tried different mpirun commands in
> the C-shell script; one of them is
>
> /path/to/bin/mpirun --mca btl openib --app appfile
>
> mpirun and orted are under /path/to/bin, and the necessary libs are
> under /path/to/lib. I tried -x, --prefix, and -path, but none of them
> propagates PATH and LD_LIBRARY_PATH as expected: orted is not found on
> the slave nodes, although it should be, since it is on the shared NFS
> partition.
>

I looked a little deeper into this. I keep forgetting that we changed our default settings a few years ago. In the dim past, OMPI would always probe the remote node to find out what shell it was using, and then use the proper command syntax for that shell. However, people complained about the extra time during launch, and very few people actually used mismatched shells.

So we flipped the default to assume that the remote shell is the same as the local one. For those like yourself who actually do have a mismatch, we left a parameter you can set to override that assumption. Just add "-mca plm_rsh_assume_same_shell 0" to your mpirun command line and it should resolve the problem.
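A minimal dry-run sketch of that fix, assuming the placeholder install prefix /path/to and the appfile from the command quoted above. The helper name launch_cmd is hypothetical; it only prints the command line so the sketch runs anywhere. The printed mpirun line is plain command-line arguments, so it can go into the C-shell script unchanged.

```shell
#!/bin/sh
# Dry-run sketch: /path/to is the placeholder prefix from the thread,
# and launch_cmd is a hypothetical helper, not part of Open MPI.
MPIRUN=/path/to/bin/mpirun

# plm_rsh_assume_same_shell=0 tells the rsh launcher not to assume the
# remote shell matches the local one, so it checks the remote shell
# instead (the older default behavior described above).
launch_cmd() {
    echo "$MPIRUN -mca plm_rsh_assume_same_shell 0 --mca btl openib --app appfile"
}

# Print the command; paste the printed line into the C-shell script to run it.
launch_cmd
```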

> Thanks,
> Yiguang
>
>
> On Jun 28, 2011, at 9:05 AM, yanyg_at_[hidden] wrote:
>
>> Hello All,
>>
>> I installed Open MPI 1.4.3 on our new HPC blades, with an InfiniBand
>> interconnect.
>>
>> My system environment is as follows:
>>
>> 1) uname -a output:
>> Linux gulftown 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT
>> 2010 x86_64 x86_64 x86_64 GNU/Linux
>>
>> 2) /home is mounted over all nodes, and mpirun is started under
>> /home/...
>>
>> Open MPI and the application codes are compiled with the Intel(R)
>> compilers V11. The InfiniBand stack is Mellanox OFED 1.5.2.
>>
>> I have two questions about mpirun:
>>
>> a) how can I find out which network interconnect protocol the MPI
>> application is using?
>>
>> I specify "--mca btl openib,self,sm,tcp" to mpirun, but I want to
>> make sure it really uses the InfiniBand interconnect.
>
> Why specify tcp if you don't want it used? Just leave that off and it
> will have no choice but to use IB.
>
>
>
>>
>> b) when I run mpirun, I get the following message:
>
>> It seems orted is not found on the slave nodes. If I set PATH and
>> LD_LIBRARY_PATH through the --prefix, --path, or -x options to mpirun,
>> to make orted and the related dynamic libs available on the slave
>> nodes, it does not work as the mpirun manual page says it should. The
>> only working case is setting PATH and LD_LIBRARY_PATH in ~/.bashrc,
>> which the slave nodes also read for the login shell. I do not want to
>> set PATH and LD_LIBRARY_PATH in ~/.bashrc; I would rather pass options
>> to mpirun directly.
>
> Should work with either prefix or -x options, assuming the right
> syntax with the latter.
>
> I take it your default shell is bash, and that you are using the rsh
> launcher (as opposed to something like torque)? Are you launching
> from your default shell, or did you perhaps change shell?
>
> Can you send the actual mpirun command you typed?
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
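On question (a) above, Ralph's suggestion amounts to removing tcp from the BTL list. A dry-run sketch, assuming the placeholder prefix /path/to and a hypothetical program ./my_app started with -np 4; btl_cmd is a hypothetical helper that only prints the command line.

```shell
#!/bin/sh
# Dry-run sketch: /path/to is a placeholder prefix, ./my_app is a
# hypothetical program, and btl_cmd only prints the command line.
MPIRUN=/path/to/bin/mpirun

# Original selection from the post, with tcp still allowed:
#   openib,self,sm,tcp
# Suggested selection: keep InfiniBand (openib), loopback (self) and
# shared memory (sm), and drop tcp.
BTLS="openib,self,sm"

btl_cmd() {
    echo "$MPIRUN --mca btl $BTLS -np 4 ./my_app"
}

btl_cmd
```

With tcp absent, processes on different nodes can only reach each other through openib, so a job that runs and communicates across nodes should be using InfiniBand; if openib cannot connect two processes, the job should abort instead of quietly falling back to TCP. Open MPI also accepts an exclusion form, --mca btl ^tcp, which enables every available BTL except tcp.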
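On the --prefix and -x options discussed above, here is a dry-run sketch of both forms, assuming Open MPI is installed under the placeholder /path/to (mpirun and orted in /path/to/bin, libraries in /path/to/lib) and a hypothetical program ./my_app. Per the mpirun manual page, --prefix takes the installation directory itself (not its bin subdirectory) and is used to set PATH and LD_LIBRARY_PATH on the remote node before Open MPI or the target process is started, while -x exports one environment variable per occurrence. The helpers prefix_cmd and export_cmd are hypothetical and only print command lines.

```shell
#!/bin/sh
# Dry-run sketch: /path/to is the thread's placeholder install prefix and
# ./my_app is hypothetical; the helpers only print command lines.
OMPI_PREFIX=/path/to

# --prefix names the install root; mpirun uses it to set PATH and
# LD_LIBRARY_PATH on the remote node, which is what orted needs there.
prefix_cmd() {
    echo "$OMPI_PREFIX/bin/mpirun --prefix $OMPI_PREFIX -np 4 ./my_app"
}

# -x exports a variable from the local environment; use one -x per name.
export_cmd() {
    echo "$OMPI_PREFIX/bin/mpirun -x PATH -x LD_LIBRARY_PATH -np 4 ./my_app"
}

prefix_cmd
export_cmd
```

As the thread shows, neither form helps if the rsh launcher builds its remote command for the wrong shell, which is what plm_rsh_assume_same_shell addresses.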