
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Handling output of processes
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-02-02 07:55:09


Hmmm... well, it shouldn't crash (so I'll have to fix that), but it
should fail. The --report-pid option takes an argument, which wasn't
provided here. I'll check the man page to ensure it is up to date.

What it should tell you is that --report-pid takes either a '-' to
indicate that the pid should be output to stdout, a '+' for stderr, or
a filename.
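Based on that description, the command from the report would need an argument, for example `mpirun --report-pid - -np 2 ./MPITest` to print mpirun's pid on stdout. The shell sketch below only illustrates the three-way convention described above; `report_pid` is a hypothetical helper written for this note, not part of Open MPI, and its check for a missing argument mirrors the failure (rather than a crash) that mpirun should produce.

```shell
#!/bin/sh
# Hypothetical illustration of the --report-pid argument convention
# described above; this is NOT Open MPI's implementation.
#   report_pid DEST PID
#     DEST = "-"   -> write PID to stdout
#     DEST = "+"   -> write PID to stderr
#     DEST = other -> treat DEST as a filename and write PID there
report_pid() {
    dest=$1
    pid=$2
    case "$dest" in
        "")
            # No argument given: fail with a message instead of crashing.
            echo "report_pid: expected '-', '+', or a filename" >&2
            return 1
            ;;
        -)
            printf '%s\n' "$pid"
            ;;
        +)
            printf '%s\n' "$pid" >&2
            ;;
        *)
            printf '%s\n' "$pid" > "$dest"
            ;;
    esac
}
```

For example, `report_pid - $$` prints the current shell's pid on stdout, while `report_pid /tmp/mpirun.pid $$` writes it to that file.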

Thanks for smoke testing it!
Ralph

On Feb 2, 2009, at 3:06 AM, jody wrote:

> Hi Ralph,
> One more thing I noticed while trying out orte_iof again.
>
> The option --report-pid crashes mpirun:
> [jody_at_localhost neander]$ mpirun -report-pid -np 2 ./MPITest
> [localhost:31146] *** Process received signal ***
> [localhost:31146] Signal: Segmentation fault (11)
> [localhost:31146] Signal code: Address not mapped (1)
> [localhost:31146] Failing at address: 0x24
> [localhost:31146] [ 0] [0x11040c]
> [localhost:31146] [ 1] /opt/openmpi/lib/openmpi/mca_odls_default.so [0x1e8f9d]
> [localhost:31146] [ 2] /opt/openmpi/lib/libopen-rte.so.0(orte_daemon_cmd_processor+0x4d1) [0x132541]
> [localhost:31146] [ 3] /opt/openmpi/lib/libopen-pal.so.0 [0x170248]
> [localhost:31146] [ 4] /opt/openmpi/lib/libopen-pal.so.0(opal_event_loop+0x27) [0x170497]
> [localhost:31146] [ 5] /opt/openmpi/lib/libopen-pal.so.0(opal_progress+0xcb) [0x16399b]
> [localhost:31146] [ 6] /opt/openmpi/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x30d) [0x1441ad]
> [localhost:31146] [ 7] /opt/openmpi/lib/openmpi/mca_plm_rsh.so [0x1c833b]
> [localhost:31146] [ 8] mpirun [0x804acf6]
> [localhost:31146] [ 9] mpirun [0x804a0a6]
> [localhost:31146] [10] /lib/libc.so.6(__libc_start_main+0xe0) [0x98d390]
> [localhost:31146] [11] mpirun [0x8049fd1]
> [localhost:31146] *** End of error message ***
> Segmentation fault
>
> This always happens, regardless of the number of processes and
> whether I run locally only or with remote machines.
>
> Jody
>
> On Mon, Feb 2, 2009 at 10:55 AM, jody <jody.xha_at_[hidden]> wrote:
>> Hi Ralph,
>> The new options are great stuff!
>> Following your suggestion, I downloaded and installed
>>
>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz
>>
>> and tested the new options. (I have a simple cluster of
>> 8 machines connected over TCP.) Not everything worked as specified, though:
>> * timestamp-output: works
>> * xterm: doesn't work completely.
>> With a comma-separated rank list, an xterm is opened only for the local
>> processes. The other processes (the ones on remote machines) only write
>> to the stdout of the calling window.
>> (Just to be sure, I started my own script for opening separate xterms;
>> that did work for the remote processes, too.)
>>
>> If '-1' is given instead of a list of ranks, it fails (both locally and
>> with remote machines):
>> [jody_at_localhost neander]$ mpirun -np 4 --xterm -1 ./MPITest
>>
>> --------------------------------------------------------------------------
>> Sorry! You were supposed to get help about:
>> orte-odls-base:xterm-rank-out-of-bounds
>> from the file:
>> help-odls-base.txt
>> But I couldn't find any file matching that name. Sorry!
>>
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>> mpirun was unable to start the specified application as it encountered an error
>> on node localhost. More information may be available above.
>>
>> --------------------------------------------------------------------------
>> * output-filename: doesn't work here:
>> [jody_at_localhost neander]$ mpirun -np 4 --output-filename gnagna ./MPITest
>> [jody_at_localhost neander]$ ls -l gna*
>> -rw-r--r-- 1 jody morpho 549 2009-02-02 09:07 gnagna.%10lu
>>
>> There is output on stdout from the processes on remote machines, but
>> none from the local ones.
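The file name `gnagna.%10lu` looks like a format specifier that was written out literally instead of being filled in with a rank. For comparison, here is a rough sketch of the naming one would expect from "a separate rank-named file", assuming a `<base>.<rank>` scheme; the exact scheme mpirun intends is not stated in this thread, and `rank_output_file` and `list_rank_files` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical sketch: one output file name per rank, assuming a
# "<base>.<rank>" naming scheme (an assumption, not mpirun's documented
# behaviour).
rank_output_file() {
    # $1 = base name given to --output-filename, $2 = rank number
    printf '%s.%d\n' "$1" "$2"
}

# Print the names expected for a job with $2 processes, one per line.
list_rank_files() {
    base=$1
    np=$2
    rank=0
    while [ "$rank" -lt "$np" ]; do
        rank_output_file "$base" "$rank"
        rank=$((rank + 1))
    done
}
```

Under this assumed scheme, `list_rank_files gnagna 4` would list gnagna.0 through gnagna.3, one file per process, rather than the single gnagna.%10lu file seen above.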
>>
>>
>> A question about installing: I installed the usual way (configure,
>> make all install), but the new man pages apparently weren't copied to
>> their destination: if I do 'man mpirun', I am shown the contents of an
>> old man page (without the new options). I had to do
>> 'less /opt//openmpi-1.4a1r20394/share/man/man1/mpirun.1' to see them.
>>
>> About the xterm option: when the application ends, all xterms are
>> closed immediately. (When doing things by hand, I used xterm's -hold
>> option.) Would it be possible to add this feature to your xterm option?
>> Perhaps by adding a '!' at the end of the rank list?
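A sketch of how such a rank list could be interpreted, covering the comma-separated form, the '-1' value used above to mean every rank, and the trailing '!' proposed here. All of this is an assumption for illustration: `expand_xterm_ranks` and `xterm_wants_hold` are hypothetical helpers, not mpirun code, and a hold request would simply be mapped to xterm's -hold option.

```shell
#!/bin/sh
# Hypothetical sketch of interpreting an --xterm rank list for a job of
# NP processes:
#   "1,3" -> ranks 1 and 3
#   "-1"  -> every rank (0 .. NP-1)
#   a trailing "!" (jody's proposal) asks for the windows to stay open.
# Prints the selected ranks, one per line; returns 1 on an invalid rank.
expand_xterm_ranks() {
    list=$1
    np=$2
    list=${list%!}              # drop the optional "keep open" marker
    if [ "$list" = "-1" ]; then
        r=0
        while [ "$r" -lt "$np" ]; do
            echo "$r"
            r=$((r + 1))
        done
        return 0
    fi
    old_ifs=$IFS
    IFS=,                       # split the list on commas
    for r in $list; do
        # Reject empty fields, non-numeric fields, and ranks >= NP.
        case "$r" in
            ''|*[!0-9]*) IFS=$old_ifs; return 1 ;;
        esac
        if [ "$r" -ge "$np" ]; then
            IFS=$old_ifs
            return 1
        fi
        echo "$r"
    done
    IFS=$old_ifs
}

# Succeeds if the list ends with the proposed "!" marker.
xterm_wants_hold() {
    case "$1" in
        *!) return 0 ;;
        *)  return 1 ;;
    esac
}
```

For a four-process job, `expand_xterm_ranks -1 4` prints ranks 0 to 3, and `xterm_wants_hold '0,2!'` succeeds, which a launcher script could translate into starting each window with `xterm -hold`.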
>>
>> About orte_iof: with the new version it works, but no matter which
>> rank I specify, it only prints out rank 0's output:
>> [jody_at_localhost ~]$ orte-iof --pid 31049 --rank 4 --stdout
>> [localhost]I am #0/9 before the barrier
>>
>>
>>
>> Thanks
>>
>> Jody
>>
>> On Sun, Feb 1, 2009 at 10:49 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>> I'm afraid we discovered a bug in optimized builds with r20392.
>>> Please use any tarball with r20394 or above.
>>>
>>> Sorry for the confusion
>>> Ralph
>>>
>>>
>>> On Feb 1, 2009, at 5:27 AM, Jeff Squyres wrote:
>>>
>>>> On Jan 31, 2009, at 11:39 AM, Ralph Castain wrote:
>>>>
>>>>> For anyone following this thread:
>>>>>
>>>>> I have completed the IOF options discussed below. Specifically, I
>>>>> have added the following:
>>>>>
>>>>> * a new "timestamp-output" option that timestamps each line of output
>>>>>
>>>>> * a new "output-filename" option that redirects each proc's output to
>>>>> a separate rank-named file.
>>>>>
>>>>> * a new "xterm" option that redirects the output of the specified
>>>>> ranks to a separate xterm window.
>>>>>
>>>>> You can obtain a copy of the updated code at:
>>>>>
>>>>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz
>>>>
>>>> Sweet stuff. :-)
>>>>
>>>> Note that the URL/tarball that Ralph cites is a nightly snapshot and
>>>> will expire after a while (we only keep the 5 most recent nightly
>>>> tarballs available). You can find Ralph's new IOF stuff in any 1.4a1
>>>> nightly tarball after the one he cited above. Note that the last part
>>>> of the tarball name refers to the Subversion commit number (which
>>>> increases monotonically); any 1.4 nightly snapshot tarball beyond
>>>> "r20392" will contain this new IOF stuff. Here's where to get our
>>>> nightly snapshot tarballs:
>>>>
>>>> http://www.open-mpi.org/nightly/trunk/
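Since the revision number is embedded in the tarball name, checking whether a downloaded snapshot is recent enough (for example, r20394 or later, as Ralph recommends above) is a matter of string handling. A minimal shell sketch, assuming the `openmpi-<version>r<revision>.tar.gz` naming of the tarball cited in this thread; `tarball_revision` and `tarball_at_least` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: pull the Subversion revision out of a nightly tarball name such
# as openmpi-1.4a1r20392.tar.gz and compare it with a minimum revision.
# Assumes the "<version>r<revision>.tar.gz" naming shown in this thread.
tarball_revision() {
    name=${1##*/}            # drop any leading URL or directory
    name=${name%.tar.gz}     # -> openmpi-1.4a1r20392
    rev=${name##*r}          # text after the last "r" -> 20392
    case "$rev" in
        ''|*[!0-9]*) return 1 ;;   # not a plain number: unexpected name
    esac
    echo "$rev"
}

# Succeeds if the tarball's revision is at least $2.
tarball_at_least() {
    rev=$(tarball_revision "$1") || return 1
    [ "$rev" -ge "$2" ]
}
```

With these helpers, `tarball_at_least openmpi-1.4a1r20392.tar.gz 20394` fails, matching the advice to skip r20392, while any r20394 or later tarball passes.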
>>>>
>>>> Don't read anything into the "1.4" version number; we've just bumped
>>>> the version number internally to be different from the current stable
>>>> series (1.3). We haven't yet branched for the v1.4 series; hence,
>>>> "1.4a1" currently refers to our development trunk.
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>