Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Handling output of processes
From: jody (jody.xha_at_[hidden])
Date: 2009-02-03 12:16:51


Hi Ralph
Thanks for the fixes and the "!".

--xterm:
The "!" works, but I still don't get any xterms from my remote nodes,
even with all my xhost+ and -x DISPLAY tricks explained below :(

--output-filename
It creates files, but only for the local processes:
[jody_at_localhost neander]$ mpirun -np 8 -hostfile testhosts --output-filename gnana ./MPITest
   ... output ...
[jody_at_localhost neander]$ ls -l gna*
-rw-r--r-- 1 jody morpho 549 2009-02-03 18:02 gnana.0
-rw-r--r-- 1 jody morpho 549 2009-02-03 18:02 gnana.1
-rw-r--r-- 1 jody morpho 549 2009-02-03 18:02 gnana.2
(I set slots=3 on my workstation, so these three files belong to the local ranks.)
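A quick way to see which ranks are missing their file is to compare the listing against the expected rank count. The helper below is a hypothetical sketch (not part of Open MPI) that assumes the `<prefix>.<rank>` naming shown in the `ls` output above; `check_rank_files` is an illustrative name.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): list the per-rank files that
# --output-filename should have produced but didn't. Assumes the
# <prefix>.<rank> naming seen in the gnana.0 ... gnana.2 listing.
# Usage: check_rank_files <prefix> <number-of-ranks>
check_rank_files() {
  crf_missing=0
  crf_rank=0
  while [ "$crf_rank" -lt "$2" ]; do
    if [ ! -f "$1.$crf_rank" ]; then
      echo "missing: $1.$crf_rank"
      crf_missing=$((crf_missing + 1))
    fi
    crf_rank=$((crf_rank + 1))
  done
  # Succeed only if every rank has a file.
  [ "$crf_missing" -eq 0 ]
}
```

For the 8-process run above, `check_rank_files gnana 8` would report gnana.3 through gnana.7 as missing, i.e. exactly the remote ranks.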

---
Regarding xterms: I'm no big expert on xterms either, but I managed to
get things working in my environment...
Generally, in order to enable X forwarding, I *would* set the option
   X11Forwarding yes
in /etc/ssh/sshd_config on the server, and
   ForwardX11 yes
in /etc/ssh/ssh_config on the client.
I say 'would', because X forwarding is only used when the ssh client
requests it, e.g. with the '-X' option.
Correct me if I'm wrong, but I suspect the -X option is not used when
Open MPI makes a connection.
So what I currently do to get my xterms running:
on my workstation I call
   xhost +<hostname>
for all machines in my hostfile, to allow them to use X on my workstation.
Then I set my DISPLAY variable to point to my workstation:
  export DISPLAY=<mymachine>:0.0
Finally, I call mpirun with the -x option (to export the DISPLAY
variable to all nodes):
  mpirun -np 4 -hostfile myfiles -x DISPLAY run_xterm.sh MyApplication arg1 arg2
Here run_xterm.sh is a shell script that creates a useful title for the
xterm window and calls the application with all its arguments (-hold
leaves the xterm open after the program terminates):
#!/bin/sh -f
# feedback for command line
echo "Running on node `hostname`"
# Open MPI 1.3 sets the documented OMPI_COMM_WORLD_RANK;
# fall back to the undocumented variable used by 1.2
export ID=$OMPI_COMM_WORLD_RANK
if [ "X$ID" = X ]; then
  export ID=$OMPI_MCA_ns_nds_vpid
fi
export TITLE="node #$ID"
# start terminal; "$@" passes every argument through unchanged
xterm -T "$TITLE" -hold -e "$@"
exit 0
(I have similar scripts to run gdb or valgrind in xterm windows.)
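Since the gdb and valgrind variants need the same rank lookup, it can be factored into one function. This is a sketch under the same assumptions as the script above (1.3 sets OMPI_COMM_WORLD_RANK, 1.2 sets OMPI_MCA_ns_nds_vpid); the function name `mpi_rank` and the "unknown" fallback are illustrative choices, not anything Open MPI provides.

```shell
#!/bin/sh
# Hypothetical helper: print this process's rank, preferring the
# documented 1.3 variable and falling back to the 1.2 one.
# Prints "unknown" (our own placeholder) if neither is set.
mpi_rank() {
  if [ -n "$OMPI_COMM_WORLD_RANK" ]; then
    echo "$OMPI_COMM_WORLD_RANK"
  elif [ -n "$OMPI_MCA_ns_nds_vpid" ]; then
    echo "$OMPI_MCA_ns_nds_vpid"
  else
    echo "unknown"
  fi
}
```

A wrapper could then use e.g. `xterm -T "node #$(mpi_rank)" -hold -e gdb "$@"`.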
I know that 'xhost +' is a horror for certain sysadmins,
but I feel quite safe, because the machines listed in my hostfile
are not accessible from outside our department.
I haven't found any other way to get nice xterms when I can't
use 'ssh -X'.
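The per-host xhost step can be driven from the hostfile itself. A minimal sketch, assuming the usual Open MPI hostfile layout (one host per line, optional "slots=N", '#' comments); the function names are hypothetical.

```shell
#!/bin/sh
# Print only the host names from a hostfile, ignoring comments,
# blank lines and "slots=N" fields.
hostfile_hosts() {
  awk '{ sub(/#.*/, ""); if (NF > 0) print $1 }' "$1"
}

# Allow every host in the hostfile to open windows on this X display.
# "xhost +name" (no space) adds a single host; "xhost +" on its own
# switches access control off for everyone.
allow_hostfile_hosts() {
  for ahh_host in $(hostfile_hosts "$1"); do
    xhost "+$ahh_host"
  done
}
```

Calling `allow_hostfile_hosts testhosts` before `export DISPLAY=<mymachine>:0.0` keeps the access list limited to the cluster nodes.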
To come back to the '--xterm' option: I just ran my xterm script after
doing the above xhost and DISPLAY setup, and it worked: all local and remote
processes created their xterm windows. (In other words, the environment was
set up so that my remote nodes could open xterms on my workstation.)
Immediately afterwards I called the same application with
   mpirun -np 8 -hostfile testhosts --xterm 2,3,4,5! -x DISPLAY ./MPITest
but still only the local process (#2) created an xterm.
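To make the --xterm argument format discussed in this thread concrete (a comma-separated rank list, optionally ending in "!" to request xterm -hold), here is a hypothetical shell helper that splits such an argument. It is only an illustration of the syntax, not Open MPI's own parser.

```shell
#!/bin/sh
# Hypothetical helper, not Open MPI's parser: split a --xterm argument
# such as "2,3,4,5!" into its rank list and whether the trailing "!"
# (xterm -hold) was given.
parse_xterm_spec() {
  pxs_spec=$1
  pxs_hold=no
  case $pxs_spec in
    *!)
      pxs_hold=yes
      pxs_spec=${pxs_spec%!}
      ;;
  esac
  printf 'ranks=%s hold=%s\n' "$(printf '%s' "$pxs_spec" | tr ',' ' ')" "$pxs_hold"
}
```

For example, `parse_xterm_spec '2,3,4,5!'` prints `ranks=2 3 4 5 hold=yes`.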
Do you think it would be possible to have Open MPI make its
ssh connections with '-X', or are there technical or
security-related objections?
Regards
  Jody
On Mon, Feb 2, 2009 at 4:47 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
> On Feb 2, 2009, at 2:55 AM, jody wrote:
>
>> Hi Ralph
>> The new options are great stuff!
>> Following your suggestion, I downloaded and installed
>>
>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz
>>
>> and tested the new options (I have a simple cluster of
>> 8 machines over TCP). Not everything worked as specified, though:
>> * timestamp-output : works
>
> good!
>
>>
>> * xterm : doesn't work completely -
>>  comma-separated rank list:
>>  An xterm is opened only for the local processes. The other processes
>>  (the ones on remote machines) only write to the stdout of the
>>  calling window.
>>  (Just to be sure, I started my own script for opening separate xterms;
>>  that did work for the remote ones, too.)
>
> This is a problem we wrestled with for some time. The issue is that we
> really aren't comfortable modifying the DISPLAY envar on the remote nodes
> like you do in your script. It is fine for a user to do whatever they want,
> but for OMPI to do it...that's another matter. We can't even know for sure
> what to do because of the wide range of scenarios that might occur (e.g., is
> mpirun local to you, or on a remote node connected to you via xterm,
> or...?).
>
> What you (the user) need to do is ensure that X11 is set up properly so that
> an X window opened on the remote host is displayed on your screen. In this
> case, I believe you have to enable X forwarding - I'm not an xterm expert, so
> I can't advise you on how to do this. Suspect you may already know - in
> which case, can you please pass it along and I'll add it to our docs? :-)
>
>>
>>
>>  If a '-1' is given instead of a list of ranks, it fails (locally &
>> with remotes):
>>    [jody_at_localhost neander]$  mpirun -np 4 --xterm -1 ./MPITest
>>
>>  --------------------------------------------------------------------------
>>    Sorry!  You were supposed to get help about:
>>        orte-odls-base:xterm-rank-out-of-bounds
>>    from the file:
>>        help-odls-base.txt
>>    But I couldn't find any file matching that name.  Sorry!
>>
>>  --------------------------------------------------------------------------
>>
>>  --------------------------------------------------------------------------
>>    mpirun was unable to start the specified application as it encountered an error
>>    on node localhost. More information may be available above.
>>
>>  --------------------------------------------------------------------------
>
>
> Fixed as of r20398 - this was a bug, had an if statement out of sequence.
>
>
>>
>> * output-filename : doesn't work here:
>>   [jody_at_localhost neander]$  mpirun -np 4 --output-filename gnagna ./MPITest
>>   [jody_at_localhost neander]$ ls -l gna*
>>   -rw-r--r-- 1 jody morpho 549 2009-02-02 09:07 gnagna.%10lu
>>
>>   There is output from the processes on remote machines on stdout, but none
>>   from the local ones.
>
> Fixed as of r20400 - had a format statement syntax that was okay in some
> compilers, but not others.
>
>>
>>
>>
>> A question about installing: I installed the usual way (configure,
>> make all install), but the new man pages apparently weren't copied to
>> their destination: if I do 'man mpirun', I get the contents of an old
>> man page (without the new options).
>> I had to do 'less /opt//openmpi-1.4a1r20394/share/man/man1/mpirun.1'
>> to see them.
>
> Strange - the install should put them in the right place, but I wonder if
> you updated your manpath to point at it?
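A minimal sketch of updating the man search path, assuming the install prefix is the one from the `less` command in Jody's message. The function name is hypothetical. The trailing ':' is deliberate: man-db (the usual Linux man) treats a MANPATH that ends in a colon as "prepend to the default list", so the system pages stay reachable.

```shell
#!/bin/sh
# Hypothetical helper: put <prefix>/share/man in front of the man search path.
# With an empty MANPATH this leaves a trailing ':', which man-db expands
# to its default search path.
prepend_manpath() {
  MANPATH="$1/share/man:${MANPATH:-}"
  export MANPATH
}
```

Usage: `prepend_manpath /opt//openmpi-1.4a1r20394` followed by `man mpirun` should then show the new page.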
>
>>
>>
>> About the xterm option: when the application ends, all xterms are
>> closed immediately.
>> (When doing things 'by hand' I used the -hold option for xterm.)
>> Would it be possible to add this feature to your xterm option?
>> Perhaps by adding a '!' at the end of the rank list?
>
> Done! A "!" at the end of the list will activate -hold as of r20398.
>
>>
>>
>> About orte-iof: with the new version it works, but no matter which
>> rank I specify, it only prints rank 0's output:
>>  [jody_at_localhost ~]$ orte-iof --pid 31049   --rank 4 --stdout
>>  [localhost]I am #0/9 before the barrier
>>
>
> The problem here is that the option name changed from "rank" to "ranks"
> since you can now specify any number of ranks as comma-separated ranges. I
> have updated orte-iof so it will gracefully fail if you provide an
> unrecognized cmd line option and output the "help" detailing the accepted
> options.
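With the rename, Jody's command would presumably become `orte-iof --pid 31049 --ranks 4 --stdout`. To illustrate what "comma-separated ranges" can express, the hypothetical helper below expands a spec such as "0-2,4" into individual ranks. The "a-b" range syntax is an assumption; the help text orte-iof prints is the authority on the real format.

```shell
#!/bin/sh
# Hypothetical helper: expand a rank specification such as "0-2,4"
# into "0 1 2 4". Assumes non-negative ranks and "a-b" ranges.
expand_ranks() {
  er_out=
  er_oldifs=$IFS
  IFS=,
  for er_item in $1; do
    case $er_item in
      *-*)
        er_i=${er_item%-*}
        er_hi=${er_item#*-}
        while [ "$er_i" -le "$er_hi" ]; do
          er_out="$er_out $er_i"
          er_i=$((er_i + 1))
        done
        ;;
      *)
        er_out="$er_out $er_item"
        ;;
    esac
  done
  IFS=$er_oldifs
  printf '%s\n' "${er_out# }"
}
```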
>
>
>>
>>
>> Thanks
>>
>> Jody
>>
>> On Sun, Feb 1, 2009 at 10:49 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>> I'm afraid we discovered a bug in optimized builds with r20392. Please use
>>> any tarball with r20394 or above.
>>>
>>> Sorry for the confusion
>>> Ralph
>>>
>>>
>>> On Feb 1, 2009, at 5:27 AM, Jeff Squyres wrote:
>>>
>>>> On Jan 31, 2009, at 11:39 AM, Ralph Castain wrote:
>>>>
>>>>> For anyone following this thread:
>>>>>
>>>>> I have completed the IOF options discussed below. Specifically, I have
>>>>> added the following:
>>>>>
>>>>> * a new "timestamp-output" option that timestamps each line of output
>>>>>
>>>>> * a new "output-filename" option that redirects each proc's output to a
>>>>> separate rank-named file.
>>>>>
>>>>> * a new "xterm" option that redirects the output of the specified ranks
>>>>> to a separate xterm window.
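On builds without timestamp-output, a rough substitute is to timestamp lines after they reach mpirun's stdout. This is a swapped-in technique, not what the new option does internally; the function name is hypothetical.

```shell
#!/bin/sh
# Prefix each line read on stdin with the local date and time.
# The "|| [ -n ... ]" keeps a final line that lacks a newline.
timestamp_lines() {
  while IFS= read -r tl_line || [ -n "$tl_line" ]; do
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$tl_line"
  done
}
```

Usage: `mpirun -np 4 ./MPITest 2>&1 | timestamp_lines`. Note that the times record when each line arrived at mpirun, not when the remote process printed it.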
>>>>>
>>>>> You can obtain a copy of the updated code at:
>>>>>
>>>>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz
>>>>
>>>> Sweet stuff.  :-)
>>>>
>>>> Note that the URL/tarball that Ralph cites is a nightly snapshot and will
>>>> expire after a while -- we only keep the 5 most recent nightly tarballs
>>>> available.  You can find Ralph's new IOF stuff in any 1.4a1 nightly tarball
>>>> after the one he cited above.  Note that the last part of the tarball name
>>>> refers to the subversion commit number (which increases monotonically); any
>>>> 1.4 nightly snapshot tarball beyond "r20392" will contain this new IOF
>>>> stuff.  Here's where to get our nightly snapshot tarballs:
>>>>
>>>>  http://www.open-mpi.org/nightly/trunk/
>>>>
>>>> Don't read anything into the "1.4" version number -- we've just bumped the
>>>> version number internally to be different from the current stable series
>>>> (1.3).  We haven't yet branched for the v1.4 series; hence, "1.4a1"
>>>> currently refers to our development trunk.
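Because the revision number is the last part of the tarball name and only increases, checking whether a snapshot includes a given fix is a numeric comparison. A sketch with hypothetical function names, assuming names of the form `openmpi-<version>r<revision>.tar.gz` as in the URL above.

```shell
#!/bin/sh
# Print the Subversion revision embedded in a nightly tarball name,
# e.g. openmpi-1.4a1r20392.tar.gz -> 20392 (nothing if absent).
tarball_rev() {
  printf '%s\n' "$1" | sed -n 's/.*r\([0-9][0-9]*\)\.tar\.gz$/\1/p'
}

# Succeed if the tarball's revision is known and at least $2.
tarball_at_least() {
  tal_rev=$(tarball_rev "$1")
  [ -n "$tal_rev" ] && [ "$tal_rev" -ge "$2" ]
}
```

For example, `tarball_at_least openmpi-1.4a1r20398.tar.gz 20394` succeeds, in line with Ralph's advice to use r20394 or above, while the r20392 tarball fails the check.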
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>