Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Handling output of processes
From: jody (jody.xha_at_[hidden])
Date: 2009-02-02 05:06:38


Hi Ralph,
One more thing I noticed while trying out orte_iof again.

The option --report-pid crashes mpirun:
[jody_at_localhost neander]$ mpirun -report-pid -np 2 ./MPITest
[localhost:31146] *** Process received signal ***
[localhost:31146] Signal: Segmentation fault (11)
[localhost:31146] Signal code: Address not mapped (1)
[localhost:31146] Failing at address: 0x24
[localhost:31146] [ 0] [0x11040c]
[localhost:31146] [ 1] /opt/openmpi/lib/openmpi/mca_odls_default.so [0x1e8f9d]
[localhost:31146] [ 2] /opt/openmpi/lib/libopen-rte.so.0(orte_daemon_cmd_processor+0x4d1) [0x132541]
[localhost:31146] [ 3] /opt/openmpi/lib/libopen-pal.so.0 [0x170248]
[localhost:31146] [ 4] /opt/openmpi/lib/libopen-pal.so.0(opal_event_loop+0x27) [0x170497]
[localhost:31146] [ 5] /opt/openmpi/lib/libopen-pal.so.0(opal_progress+0xcb) [0x16399b]
[localhost:31146] [ 6] /opt/openmpi/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x30d) [0x1441ad]
[localhost:31146] [ 7] /opt/openmpi/lib/openmpi/mca_plm_rsh.so [0x1c833b]
[localhost:31146] [ 8] mpirun [0x804acf6]
[localhost:31146] [ 9] mpirun [0x804a0a6]
[localhost:31146] [10] /lib/libc.so.6(__libc_start_main+0xe0) [0x98d390]
[localhost:31146] [11] mpirun [0x8049fd1]
[localhost:31146] *** End of error message ***
Segmentation fault

This always happens, regardless of the number of processes and
whether I run locally only or with remote machines.

Jody

On Mon, Feb 2, 2009 at 10:55 AM, jody <jody.xha_at_[hidden]> wrote:
> Hi Ralph,
> The new options are great stuff!
> Following your suggestion, I downloaded and installed
>
> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz
>
> and tested the new options. (I have a simple cluster of
> 8 machines over TCP.) Not everything worked as specified, though:
> * timestamp-output : works
> * xterm : only partly works.
> With a comma-separated rank list:
> An xterm is opened only for the local processes. The other processes
> (the ones on remote machines) only write to the stdout of the
> calling window.
> (Just to be sure, I started my own script for opening separate xterms;
> that did work for the remote processes, too.)
>
> If a '-1' is given instead of a list of ranks, it fails (locally &
> with remotes):
> [jody_at_localhost neander]$ mpirun -np 4 --xterm -1 ./MPITest
> --------------------------------------------------------------------------
> Sorry! You were supposed to get help about:
> orte-odls-base:xterm-rank-out-of-bounds
> from the file:
> help-odls-base.txt
> But I couldn't find any file matching that name. Sorry!
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun was unable to start the specified application as it encountered an error
> on node localhost. More information may be available above.
> --------------------------------------------------------------------------
> * output-filename : doesn't work here:
> [jody_at_localhost neander]$ mpirun -np 4 --output-filename gnagna ./MPITest
> [jody_at_localhost neander]$ ls -l gna*
> -rw-r--r-- 1 jody morpho 549 2009-02-02 09:07 gnagna.%10lu
>
> There is output from the processes on remote machines on stdout, but none
> from the local ones.
>
>
> A question about installing: I installed the usual way (configure,
> make all install),
> but the new man pages apparently weren't copied to their destination:
> if I do 'man mpirun', I am shown the contents of an old man page
> (without the new options).
> I had to do 'less /opt//openmpi-1.4a1r20394/share/man/man1/mpirun.1'
> to see them.
>
> About the xterm option: when the application ends, all xterms are
> closed immediately.
> (When doing things 'by hand' I used the -hold option for xterm.)
> Would it be possible to add this feature for your xterm option?
> Perhaps by adding a '!' at the end of the rank list?
>
> About orte_iof: with the new version it works, but no matter which
> rank I specify,
> it only prints out rank 0's output:
> [jody_at_localhost ~]$ orte-iof --pid 31049 --rank 4 --stdout
> [localhost]I am #0/9 before the barrier
>
>
>
> Thanks
>
> Jody
>
> On Sun, Feb 1, 2009 at 10:49 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>> I'm afraid we discovered a bug in optimized builds with r20392. Please use
>> any tarball with r20394 or above.
>>
>> Sorry for the confusion
>> Ralph
>>
>>
>> On Feb 1, 2009, at 5:27 AM, Jeff Squyres wrote:
>>
>>> On Jan 31, 2009, at 11:39 AM, Ralph Castain wrote:
>>>
>>>> For anyone following this thread:
>>>>
>>>> I have completed the IOF options discussed below. Specifically, I have
>>>> added the following:
>>>>
>>>> * a new "timestamp-output" option that timestamps each line of output
>>>>
>>>> * a new "output-filename" option that redirects each proc's output to a
>>>> separate rank-named file.
>>>>
>>>> * a new "xterm" option that redirects the output of the specified ranks
>>>> to a separate xterm window.
>>>>
>>>> You can obtain a copy of the updated code at:
>>>>
>>>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz
>>>
>>> Sweet stuff. :-)
>>>
>>> Note that the URL/tarball that Ralph cites is a nightly snapshot and will
>>> expire after a while -- we only keep the 5 most recent nightly tarballs
>>> available. You can find Ralph's new IOF stuff in any 1.4a1 nightly tarball
>>> after the one he cited above. Note that the last part of the tarball name
>>> refers to the subversion commit number (which increases monotonically); any
>>> 1.4 nightly snapshot tarball beyond "r20392" will contain this new IOF
>>> stuff. Here's where to get our nightly snapshot tarballs:
>>>
>>> http://www.open-mpi.org/nightly/trunk/
>>>
>>> Don't read anything into the "1.4" version number -- we've just bumped the
>>> version number internally to be different than the current stable series
>>> (1.3). We haven't yet branched for the v1.4 series; hence, "1.4a1"
>>> currently refers to our development trunk.
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
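
The quoted messages say that the last part of a nightly tarball name is the subversion revision, that it increases monotonically, and that only tarballs at r20394 or above should be used. A minimal sketch of that check follows. The name pattern is an assumption inferred from the single tarball name quoted in this thread, and the helper names svn_revision and is_recommended are hypothetical, not part of Open MPI.

```python
import re

# Assumed naming pattern, inferred from the one name quoted in this thread:
#   openmpi-<version>r<svn revision>.tar.gz
TARBALL_RE = re.compile(r"^openmpi-(?P<version>.+?)r(?P<rev>\d+)\.tar\.gz$")

# Lowest revision Ralph recommends in this thread.
MIN_REVISION = 20394


def svn_revision(name_or_url):
    """Return the subversion revision encoded in a nightly tarball name."""
    # Accept either a bare file name or a full URL.
    name = name_or_url.rsplit("/", 1)[-1]
    match = TARBALL_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized tarball name: {name!r}")
    return int(match.group("rev"))


def is_recommended(name_or_url, min_revision=MIN_REVISION):
    """True if the tarball is at or above the recommended revision."""
    return svn_revision(name_or_url) >= min_revision


if __name__ == "__main__":
    cited = "http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz"
    print(svn_revision(cited), is_recommended(cited))
```

Because revisions only ever increase, a plain integer comparison is enough; the version prefix ("1.4a1") does not need to be parsed.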