Let me get this straight. You are executing mpirun from inside a c-shell script, launching onto nodes where you will by default be running bash. The param I gave you should support that mode: it tells OMPI to probe the remote node to discover which shell it will run under there, and then formats the orted cmd line accordingly. If that isn't working (it almost never gets used, so it may have bit-rotted), then your only option is to convert the c-shell script to bash.
However, you are saying that the app you are asking us to run is a c-shell script??? Have you included the #!/bin/csh directive at the top of that file so the system will automatically exec it using csh?
Note that the orted is up and running before your "app" is executed, so the fact that your "app" is a c-shell script is irrelevant to the orted launch.
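
For what it's worth, here is a minimal sketch of the two pieces discussed above: a c-shell "app" that starts with the interpreter directive, and the mpirun line with the shell probe turned on. The file name app.csh and the host names node1,node2 are made up for illustration, and the mpirun line is left as a comment because it needs a real cluster to run.

```shell
#!/bin/bash
# Sketch only: app.csh and the hosts node1,node2 are hypothetical names.

# Write a small csh "app" whose very first line is the interpreter
# directive, so the system execs it with csh whatever the caller's shell is.
cat > app.csh <<'EOF'
#!/bin/csh -f
echo "running under csh on `hostname`"
EOF
chmod +x app.csh

# Show the first line so you can confirm the directive is in place.
head -n 1 app.csh

# Launch line with the remote-shell probe enabled (not executed here):
# mpirun -mca plm_rsh_assume_same_shell 0 -np 2 -host node1,node2 ./app.csh
```

If the directive is missing or is not the first line, the script would most likely be interpreted by whatever shell happens to launch it, which is a separate problem from the orted launch itself.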
On Jul 5, 2011, at 9:15 AM, yanyg_at_[hidden] wrote:
> Thanks, Ralph.
> Your information is very deep and detailed.
> I tried your suggestion to set "-mca
> plm_rsh_assume_same_shell 0", but it still does not work. My
> situation is that we start a c-shell script from a bash shell, which in
> turn invokes mpirun to launch onto other slave nodes. These slave nodes
> have bash as the login shell by default, and mpirun will execute another
> c-shell script on each node. Could this mess things up a little and be
> related to the orted missing message?
> Thanks again,
> On Jun 28, 2011, at 3:52 PM, yanyg_at_[hidden] wrote:
> I looked a little deeper into this. I keep forgetting that we changed
> our default settings a few years ago. In the dim past, OMPI would
> always probe the remote node to find out what shell it was using,
> and then use the proper command syntax for that shell. However,
> people complained about the extra time during launch, and very
> very few people actually used mis-matched shells.
> So we changed the setting the other way to default to assuming the
> remote shell is the same as the local one. For those like yourself
> that actually do have a mismatch, we left a parameter you can set
> to override that assumption. Just add "-mca
> plm_rsh_assume_same_shell 0" to your mpirun cmd line and it
> should resolve the problem.