Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Handling output of processes
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-02-02 10:47:42


On Feb 2, 2009, at 2:55 AM, jody wrote:

> Hi Ralph
> The new options are great stuff!
> Following your suggestion, i downloaded and installed
>
> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz
>
> and tested the new options. (i have a simple cluster of
> 8 machines over tcp). Not everything worked as specified, though:
> * timestamp-output : works

good!

>
> * xterm : doesn't work completely -
> comma-separated rank list:
> An xterm is opened only for the local processes. The other processes
> (the ones on remote machines) only output to the stdout of the
> calling window.
> (Just to be sure, I started my own script for opening separate xterms
> - that did work for the remotes, too)

This is a problem we wrestled with for some time. The issue is that we
really aren't comfortable modifying the DISPLAY envar on the remote
nodes like you do in your script. It is fine for a user to do whatever
they want, but for OMPI to do it...that's another matter. We can't
even know for sure what to do because of the wide range of scenarios
that might occur (e.g., is mpirun local to you, or on a remote node
connected to you via xterm, or...?).

What you (the user) need to do is ensure that X11 is set up properly so
that an X window opened on the remote host is displayed on your screen.
In this case, I believe you have to enable X forwarding - I'm not an
xterm expert, so I can't advise you on how to do this. I suspect you may
already know how - in which case, can you please pass it along and I'll
add it to our docs? :-)
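
For what it's worth, here is a minimal sketch of a check you could run on
a node before using --xterm. It only classifies the DISPLAY value that the
shell sees; the function name classify_display and the labels it prints
are made up for illustration and are not part of OMPI.

```shell
#!/bin/sh
# classify_display VALUE: print a rough label for an X11 DISPLAY value.
# Hypothetical helper for illustration only; not an OMPI tool.
classify_display() {
    case ${1-} in
        "")         echo "unset" ;;           # no display configured at all
        :*|unix:*)  echo "local-only" ;;      # e.g. ":0" names the X server on this same machine
        *:*)        echo "host-qualified" ;;  # e.g. "localhost:10.0", as ssh X11 forwarding typically sets
        *)          echo "malformed" ;;       # no colon, so not a display name
    esac
}

# Usage: report on the DISPLAY of the current shell.
#   classify_display "${DISPLAY-}"
```

Assuming ssh with X11 forwarding is how you reach the nodes, running
something like ssh -X node01 'echo $DISPLAY' (node01 is a placeholder) and
passing the result to classify_display should print "host-qualified" when
forwarding is active. A "local-only" value on a remote node would point at
that node's own X server rather than your screen. Whether a forwarded
display is sufficient for the xterm option in your setup is something I
can't confirm.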

>
>
> If a '-1' is given instead of a list of ranks, it fails (locally &
> with remotes):
> [jody_at_localhost neander]$ mpirun -np 4 --xterm -1 ./MPITest
>
> --------------------------------------------------------------------------
> Sorry! You were supposed to get help about:
> orte-odls-base:xterm-rank-out-of-bounds
> from the file:
> help-odls-base.txt
> But I couldn't find any file matching that name. Sorry!
>
> --------------------------------------------------------------------------
>
> --------------------------------------------------------------------------
> mpirun was unable to start the specified application as it
> encountered an error
> on node localhost. More information may be available above.
>
> --------------------------------------------------------------------------

Fixed as of r20398 - this was a bug: an if statement was out of
sequence.

>
> * output-filename : doesn't work here:
> [jody_at_localhost neander]$ mpirun -np 4 --output-filename
> gnagna ./MPITest
> [jody_at_localhost neander]$ ls -l gna*
> -rw-r--r-- 1 jody morpho 549 2009-02-02 09:07 gnagna.%10lu
>
> There is output from the processes on remote machines on stdout,
> but none
> from the local ones.

Fixed as of r20400 - we had a format statement whose syntax was okay
with some compilers, but not others.

>
>
>
> A question about installing: i installed the usual way (configure,
> make all install),
> but the new man-files apparently weren't copied to their destination:
> If i do 'man mpirun' i get shown the contents of an old man-file
> (without the new options).
> I had to do ' less /opt//openmpi-1.4a1r20394/share/man/man1/mpirun.1'
> to see them.

Strange - the install should put them in the right place. I wonder
whether you updated your manpath to point at it?
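
If the manpath does turn out to be the problem, something along these
lines may help. It is a sketch that assumes the man pages landed under
<prefix>/share/man, as the path you used with less suggests;
prepend_manpath is a hypothetical helper name.

```shell
#!/bin/sh
# prepend_manpath PREFIX: print a MANPATH value that lists PREFIX/share/man
# ahead of whatever MANPATH currently holds (possibly nothing).
# Hypothetical helper for illustration only.
prepend_manpath() {
    printf '%s/share/man:%s\n' "$1" "${MANPATH-}"
}

# Usage (install prefix taken from the path quoted above):
#   MANPATH=$(prepend_manpath /opt/openmpi-1.4a1r20394)
#   export MANPATH
#   man mpirun
```

If MANPATH was previously unset, the result ends in a colon. As far as I
know, man-db treats that empty trailing entry as "also search the system
default locations", but other man implementations may behave differently.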

>
>
> About the xterm-option : when the application ends all xterms are
> closed immediately.
> (when doing things 'by hand' i used the -hold option for xterm)
> Would it be possible to add this feature for your xterm option?
> Perhaps by adding a '!' at the end of the rank list?

Done! A "!" at the end of the list will activate -hold as of r20398.
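
As a rough sketch of how the rank list would be written with and without
the hold marker, the helper below just builds the argument string. The
name xterm_rank_arg and its "hold" keyword are invented for this example.

```shell
#!/bin/sh
# xterm_rank_arg RANKS [hold]: print the value to pass to mpirun's --xterm
# option. When the second argument is "hold", a "!" is appended to the
# rank list. Hypothetical helper for illustration only.
xterm_rank_arg() {
    ranks=$1
    hold=${2-}
    if [ "$hold" = "hold" ]; then
        printf '%s!\n' "$ranks"
    else
        printf '%s\n' "$ranks"
    fi
}

# Usage (not executed here):
#   mpirun -np 4 --xterm "$(xterm_rank_arg 0,2 hold)" ./MPITest
```

When typing the list by hand in an interactive shell, putting it in single
quotes (for example '0,2!') is a safe way to keep the shell from
interpreting the "!" itself.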

>
>
> About orte_iof: with the new version it works, but no matter which
> rank i specify,
> it only prints out rank0's output:
> [jody_at_localhost ~]$ orte-iof --pid 31049 --rank 4 --stdout
> [localhost]I am #0/9 before the barrier
>

The problem here is that the option name changed from "rank" to
"ranks", since you can now specify any number of ranks as
comma-separated ranges. I have updated orte-iof so that, if you provide
an unrecognized command-line option, it fails gracefully and prints the
"help" text detailing the accepted options.
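
To illustrate what a list of comma-separated ranges denotes, here is a
small sketch that expands such a list into individual ranks. It assumes
ranges are written as lo-hi (for example 0-2), which this message does not
spell out; expand_ranks is a hypothetical helper, not part of orte-iof.

```shell
#!/bin/sh
# expand_ranks SPEC: expand a comma-separated list of ranks and lo-hi
# ranges into a space-separated list of individual ranks.
# The lo-hi syntax is an assumption; negative values are rejected.
# The body runs in a subshell so the IFS change does not leak out.
expand_ranks() (
    IFS=,
    set -f          # no filename globbing while splitting on commas
    out=
    for part in $1; do
        case $part in
            -*)
                echo "expand_ranks: negative value '$part' is not handled" >&2
                exit 1
                ;;
            *-*)
                lo=${part%-*}
                hi=${part#*-}
                i=$lo
                while [ "$i" -le "$hi" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            *)
                out="$out $part"
                ;;
        esac
    done
    echo "${out# }"
)

# Usage:
#   expand_ranks 0-2,5
```

With the renamed option, the command from your test would presumably read
orte-iof --pid 31049 --ranks 4 --stdout.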

>
>
> Thanks
>
> Jody
>
> On Sun, Feb 1, 2009 at 10:49 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>> I'm afraid we discovered a bug in optimized builds with r20392.
>> Please use
>> any tarball with r20394 or above.
>>
>> Sorry for the confusion
>> Ralph
>>
>>
>> On Feb 1, 2009, at 5:27 AM, Jeff Squyres wrote:
>>
>>> On Jan 31, 2009, at 11:39 AM, Ralph Castain wrote:
>>>
>>>> For anyone following this thread:
>>>>
>>>> I have completed the IOF options discussed below. Specifically, I
>>>> have
>>>> added the following:
>>>>
>>>> * a new "timestamp-output" option that timestamps each line of
>>>> output
>>>>
>>>> * a new "output-filename" option that redirects each proc's
>>>> output to a
>>>> separate rank-named file.
>>>>
>>>> * a new "xterm" option that redirects the output of the specified
>>>> ranks
>>>> to a separate xterm window.
>>>>
>>>> You can obtain a copy of the updated code at:
>>>>
>>>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz
>>>
>>> Sweet stuff. :-)
>>>
>>> Note that the URL/tarball that Ralph cites is a nightly snapshot
>>> and will
>>> expire after a while -- we only keep the 5 most recent nightly
>>> tarballs
>>> available. You can find Ralph's new IOF stuff in any 1.4a1
>>> nightly tarball
>>> after the one he cited above. Note that the last part of the
>>> tarball name
>>> refers to the subversion commit number (which increases
>>> monotonically); any
>>> 1.4 nightly snapshot tarball beyond "r20392" will contain this new
>>> IOF
>>> stuff. Here's where to get our nightly snapshot tarballs:
>>>
>>> http://www.open-mpi.org/nightly/trunk/
>>>
>>> Don't read anything into the "1.4" version number -- we've just
>>> bumped the
>>> version number internally to be different than the current stable
>>> series
>>> (1.3). We haven't yet branched for the v1.4 series; hence, "1.4a1"
>>> currently refers to our development trunk.
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users