Thanks for the fixes and the "!".
The "!" works, but i still don't have any xterms from my remote nodes
even with all my xhost+ and -x DISPLAY tricks explained below :(
The output-filename option now creates files, but only for the local processes:
[jody_at_localhost neander]$ mpirun -np 8 -hostfile testhosts --output-filename gnana ./MPITest
... output ...
[jody_at_localhost neander]$ ls -l gna*
-rw-r--r-- 1 jody morpho 549 2009-02-03 18:02 gnana.0
-rw-r--r-- 1 jody morpho 549 2009-02-03 18:02 gnana.1
-rw-r--r-- 1 jody morpho 549 2009-02-03 18:02 gnana.2
(i set slots=3 on my workstation, hence the three files)
Regarding xterms - i'm also no big expert on xterms, but i managed to
get things working for my environment...
Generally, in order to enable X forwarding, i *would* set the X11 forwarding
option in /etc/ssh/sshd_config on the server and in /etc/ssh/ssh_config
on the client.
I say 'would', because to actually use x forwarding you need to call
ssh with the '-X' option.
Correct me if i'm wrong, but i suspect the -X option is not used when
Open MPI makes a connection.
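For reference, the OpenSSH keywords usually involved are X11Forwarding (in
sshd_config, server side) and ForwardX11 (in ssh_config, client side). Below
is a minimal sketch for checking whether a config file enables one of them.
The function name is made up for this example, and the check is deliberately
simple: it looks only at the first occurrence of the keyword and ignores
Host/Match blocks and the "Keyword=value" spelling.

```shell
# ssh_option_enabled FILE KEYWORD
# Succeeds if the first non-comment line in FILE that sets KEYWORD sets it
# to "yes". (Hypothetical helper, not part of OpenSSH or Open MPI.)
ssh_option_enabled() {
    awk -v key="$2" '
        $1 ~ /^#/ { next }                  # skip comment lines
        tolower($1) == tolower(key) {       # keywords are case-insensitive
            found = (tolower($2) == "yes")
            exit                            # first value wins
        }
        END { exit !found }                 # exit status 0 only if enabled
    ' "$1"
}
```

It could be used like this:
  ssh_option_enabled /etc/ssh/sshd_config X11Forwarding && echo "server side ok"
  ssh_option_enabled /etc/ssh/ssh_config ForwardX11 && echo "client side ok"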
So here is what i currently do to get my xterms running:
On my workstation i call
  xhost + <hostname>
for all machines in my hostfile, to allow them to use X on my workstation.
Then i set my DISPLAY variable to point to my workstation.
Finally, i call mpirun with the -x option (to export the DISPLAY
variable to all nodes):
mpirun -np 4 -hostfile myfiles -x DISPLAY run_xterm.sh MyApplication arg1 arg2
Here run_xterm.sh is a shell script which creates a useful title for
the xterm window
and calls the application with all its arguments (-hold leaves the
xterm open after the program terminates):
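# feedback for command line
echo "Running on node `hostname`"
# for version 1.2 use undocumented env variable
# (the variable name OMPI_MCA_ns_nds_vpid is an assumption here)
export ID=$OMPI_MCA_ns_nds_vpid
# for version 1.3 use documented env variable
if [ X$ID = X ]; then
  export ID=$OMPI_COMM_WORLD_RANK
fi
export TITLE="node #$ID"
# start terminal; "$@" keeps each argument intact
xterm -T "$TITLE" -hold -e "$@"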
(i have similar scripts to run gdb or valgrind in xterm windows)
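A gdb variant could look roughly like the sketch below. It is only an
illustration: the script name run_gdb.sh and both function names are made up,
and it assumes the documented OMPI_COMM_WORLD_RANK variable of version 1.3
(the placeholder "unknown" is used when that variable is not set).

```shell
#!/bin/sh
# Hypothetical run_gdb.sh: like run_xterm.sh, but starts the program in gdb.

# rank_title: build the xterm window title from the rank variable.
rank_title() {
    echo "gdb node #${OMPI_COMM_WORLD_RANK:-unknown}"
}

# run_in_gdb_xterm PROGRAM [ARGS...]
# Opens an xterm that runs gdb on PROGRAM; gdb's --args option hands
# everything after the program name to the program itself.
run_in_gdb_xterm() {
    xterm -T "$(rank_title)" -hold -e gdb --args "$@"
}

# Only start something if a program was given on the command line.
if [ $# -gt 0 ]; then
    run_in_gdb_xterm "$@"
fi
```

It would be called the same way as the xterm script:
  mpirun -np 4 -hostfile myfiles -x DISPLAY run_gdb.sh MyApplication arg1 arg2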
I know that the 'xhost +' is a horror for certain sysadmins,
but i feel quite safe, because the machines listed in my hostfile
are not accessible from outside our department.
I haven't found any other alternative to have nice xterms when i can't
use 'ssh -X'.
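The xhost step described above can be collected into a small helper that
walks the hostfile. This is a sketch under assumptions: the function names
are made up, and the hostfile is assumed to have the host name as the first
field of each line, with "#" starting a comment. The host name is attached
directly to the "+", since according to the xhost man page a "+" on its own
grants access to everyone.

```shell
#!/bin/sh
# Sketch only: both helper names are invented for this example.

# hostfile_hosts FILE
# Prints each distinct host name in FILE: the first field of every line
# that is neither blank nor a comment.
hostfile_hosts() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1" | sort -u
}

# allow_hosts FILE
# Runs "xhost +<host>" for every host listed in FILE.
allow_hosts() {
    for h in $(hostfile_hosts "$1"); do
        xhost "+$h"
    done
}
```

With these defined, the whole sequence would be something like this
(the workstation name is a placeholder):
  allow_hosts myfiles
  export DISPLAY=myworkstation:0.0
  mpirun -np 4 -hostfile myfiles -x DISPLAY run_xterm.sh MyApplication arg1 arg2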
To come back to the '--xterm' option: i just ran my xterm-script after
doing the above xhost+ and DISPLAY things, and it worked - all local and remote
processes created their xterm windows. (In other words, the environment was
set to have my remote nodes use xterms on my workstation.)
Immediately thereafter i called the same application with
mpirun -np 8 -hostfile testhosts --xterm 2,3,4,5! -x DISPLAY ./MPITest
but still, only the local process (#2) created an xterm.
Do you think it would be possible to have Open MPI make its
ssh connections with '-X',
or are there technical or security-related objections?
On Mon, Feb 2, 2009 at 4:47 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> On Feb 2, 2009, at 2:55 AM, jody wrote:
>> Hi Ralph
>> The new options are great stuff!
>> Following your suggestion, i downloaded and installed
>> and tested the new options. (i have a simple cluster of
>> 8 machines over tcp). Not everything worked as specified, though:
>> * timestamp-output : works
>> * xterm : doesn't work completely -
>> comma-separated rank list:
>> An xterm is opened only for the local processes. The other processes
>> (the ones on remote machines) only output to the stdout of the
>> calling window.
>> (Just to be sure i started my own script for opening separate xterms
>> - that did work for the remoties, too)
> This is a problem we wrestled with for some time. The issue is that we
> really aren't comfortable modifying the DISPLAY envar on the remote nodes
> like you do in your script. It is fine for a user to do whatever they want,
> but for OMPI to do it...that's another matter. We can't even know for sure
> what to do because of the wide range of scenarios that might occur (e.g., is
> mpirun local to you, or on a remote node connected to you via xterm, etc.).
> What you (the user) need to do is ensure that X11 is set up properly so that
> an X window opened on the remote host is displayed on your screen. In this
> case, I believe you have to enable X forwarding - I'm not an xterm expert, so
> I can't advise you on how to do this. I suspect you may already know - in
> which case, can you please pass it along and I'll add it to our docs? :-)
>> If a '-1' is given instead of a list of ranks, it fails (locally &
>> with remotes):
>> [jody_at_localhost neander]$ mpirun -np 4 --xterm -1 ./MPITest
>> Sorry! You were supposed to get help about:
>> from the file:
>> But I couldn't find any file matching that name. Sorry!
>> mpirun was unable to start the specified application as it
>> encountered an error
>> on node localhost. More information may be available above.
> Fixed as of r20398 - this was a bug, had an if statement out of sequence.
>> * output-filename : doesn't work here:
>> [jody_at_localhost neander]$ mpirun -np 4 --output-filename gnagna
>> [jody_at_localhost neander]$ ls -l gna*
>> -rw-r--r-- 1 jody morpho 549 2009-02-02 09:07 gnagna.%10lu
>> There is output from the processes on remote machines on stdout, but
>> none from the local ones.
> Fixed as of r20400 - had a format statement syntax that was okay in some
> compilers, but not others.
>> A question about installing: i installed the usual way (configure,
>> make all install),
>> but the new man-files apparently weren't copied to their destination:
>> If i do 'man mpirun' i get shown the contents of an old man-file
>> (without the new options).
>> I had to do ' less /opt//openmpi-1.4a1r20394/share/man/man1/mpirun.1'
>> to see them.
> Strange - the install should put them in the right place, but I wonder if
> you updated your manpath to point at it?
>> About the xterm-option : when the application ends all xterms are
>> closed immediately.
>> (when doing things 'by hand' i used the -hold option for xterm)
>> Would it be possible to add this feature for your xterm option?
>> Perhaps by adding a '!' at the end of the rank list?
> Done! A "!" at the end of the list will activate -hold as of r20398.
>> About orte_iof: with the new version it works, but no matter which
>> rank i specify,
>> it only prints out rank0's output:
>> [jody_at_localhost ~]$ orte-iof --pid 31049 --rank 4 --stdout
>> [localhost]I am #0/9 before the barrier
> The problem here is that the option name changed from "rank" to "ranks"
> since you can now specify any number of ranks as comma-separated ranges. I
> have updated orte-iof so it will gracefully fail if you provide an
> unrecognized cmd line option and output the "help" detailing the accepted
> options.
>> On Sun, Feb 1, 2009 at 10:49 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>> I'm afraid we discovered a bug in optimized builds with r20392. Please
>>> use any tarball with r20394 or above.
>>> Sorry for the confusion
>>> On Feb 1, 2009, at 5:27 AM, Jeff Squyres wrote:
>>>> On Jan 31, 2009, at 11:39 AM, Ralph Castain wrote:
>>>>> For anyone following this thread:
>>>>> I have completed the IOF options discussed below. Specifically, I have
>>>>> added the following:
>>>>> * a new "timestamp-output" option that timestamps each line of output
>>>>> * a new "output-filename" option that redirects each proc's output to a
>>>>> separate rank-named file.
>>>>> * a new "xterm" option that redirects the output of the specified ranks
>>>>> to a separate xterm window.
>>>>> You can obtain a copy of the updated code at:
>>>> Sweet stuff. :-)
>>>> Note that the URL/tarball that Ralph cites is a nightly snapshot and
>>>> will expire after a while -- we only keep the 5 most recent nightly tarballs
>>>> available. You can find Ralph's new IOF stuff in any 1.4a1 nightly tarball
>>>> after the one he cited above. Note that the last part of the tarball name
>>>> refers to the subversion commit number (which increases monotonically); any
>>>> 1.4 nightly snapshot tarball beyond "r20392" will contain this new IOF
>>>> stuff. Here's where to get our nightly snapshot tarballs:
>>>> Don't read anything into the "1.4" version number -- we've just bumped the
>>>> version number internally to be different than the current stable series
>>>> (1.3). We haven't yet branched for the v1.4 series; hence, "1.4a1"
>>>> currently refers to our development trunk.
>>>> Jeff Squyres
>>>> Cisco Systems
>>>> users mailing list