Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Handling output of processes
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-02-02 08:50:50


Okay, I have this fixed and the man page updated as of r20396.

Thanks again for finding and reporting this bug!

Ralph

On Feb 2, 2009, at 5:55 AM, Ralph Castain wrote:

> Hmmm...well, it shouldn't crash (so I'll have to fix that), but it
> should fail. The --report-pid option takes an argument, which wasn't
> provided here. I'll check the man page to ensure it is up-to-date.
>
> What it should tell you is that --report-pid takes either a '-' to
> indicate that the pid should be output to stdout, a '+' for stderr,
> or a filename.
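>
> For example, an untested sketch of the intended usage (the file name
> below is just a placeholder):
>
>    mpirun --report-pid - -np 2 ./MPITest          (pid to stdout)
>    mpirun --report-pid mypid.txt -np 2 ./MPITest  (pid written to mypid.txt)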
>
> Thanks for smoke testing it!
> Ralph
>
> On Feb 2, 2009, at 3:06 AM, jody wrote:
>
>> Hi Ralph
>> One more thing I noticed while trying out orte_iof again.
>>
>> The option --report-pid crashes mpirun:
>> [jody_at_localhost neander]$ mpirun -report-pid -np 2 ./MPITest
>> [localhost:31146] *** Process received signal ***
>> [localhost:31146] Signal: Segmentation fault (11)
>> [localhost:31146] Signal code: Address not mapped (1)
>> [localhost:31146] Failing at address: 0x24
>> [localhost:31146] [ 0] [0x11040c]
>> [localhost:31146] [ 1] /opt/openmpi/lib/openmpi/mca_odls_default.so
>> [0x1e8f9d]
>> [localhost:31146] [ 2]
>> /opt/openmpi/lib/libopen-rte.so.0(orte_daemon_cmd_processor+0x4d1)
>> [0x132541]
>> [localhost:31146] [ 3] /opt/openmpi/lib/libopen-pal.so.0 [0x170248]
>> [localhost:31146] [ 4]
>> /opt/openmpi/lib/libopen-pal.so.0(opal_event_loop+0x27) [0x170497]
>> [localhost:31146] [ 5]
>> /opt/openmpi/lib/libopen-pal.so.0(opal_progress+0xcb) [0x16399b]
>> [localhost:31146] [ 6]
>> /opt/openmpi/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x30d)
>> [0x1441ad]
>> [localhost:31146] [ 7] /opt/openmpi/lib/openmpi/mca_plm_rsh.so
>> [0x1c833b]
>> [localhost:31146] [ 8] mpirun [0x804acf6]
>> [localhost:31146] [ 9] mpirun [0x804a0a6]
>> [localhost:31146] [10] /lib/libc.so.6(__libc_start_main+0xe0)
>> [0x98d390]
>> [localhost:31146] [11] mpirun [0x8049fd1]
>> [localhost:31146] *** End of error message ***
>> Segmentation fault
>>
>> This always happens, irrespective of the number of processes,
>> or whether I run only locally or also with remote machines.
>>
>> Jody
>>
>> On Mon, Feb 2, 2009 at 10:55 AM, jody <jody.xha_at_[hidden]> wrote:
>>> Hi Ralph
>>> The new options are great stuff!
>>> Following your suggestion, I downloaded and installed
>>>
>>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz
>>>
>>> and tested the new options. (I have a simple cluster of
>>> 8 machines over TCP.) Not everything worked as specified, though:
>>> * timestamp-output: works
>>> * xterm: doesn't work completely.
>>> With a comma-separated rank list, an xterm is opened only for the
>>> local processes; the ones on remote machines just write to the
>>> stdout of the calling window.
>>> (Just to be sure, I started my own script for opening separate xterms
>>> - that did work for the remote processes, too.)
>>>
>>> If a '-1' is given instead of a list of ranks, it fails (locally &
>>> with remotes):
>>> [jody_at_localhost neander]$ mpirun -np 4 --xterm -1 ./MPITest
>>>
>>> --------------------------------------------------------------------------
>>> Sorry! You were supposed to get help about:
>>> orte-odls-base:xterm-rank-out-of-bounds
>>> from the file:
>>> help-odls-base.txt
>>> But I couldn't find any file matching that name. Sorry!
>>>
>>> --------------------------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------
>>> mpirun was unable to start the specified application as it
>>> encountered an error
>>> on node localhost. More information may be available above.
>>>
>>> --------------------------------------------------------------------------
>>> * output-filename: doesn't work here:
>>> [jody_at_localhost neander]$ mpirun -np 4 --output-filename gnagna ./MPITest
>>> [jody_at_localhost neander]$ ls -l gna*
>>> -rw-r--r-- 1 jody morpho 549 2009-02-02 09:07 gnagna.%10lu
>>>
>>> The processes on remote machines write their output to stdout, but
>>> there is none at all from the local ones.
>>>
>>>
>>> A question about installing: I installed the usual way (configure,
>>> make all install), but the new man files apparently weren't copied to
>>> their destination: if I do 'man mpirun' I am shown the contents of an
>>> old man file (without the new options).
>>> I had to do 'less /opt//openmpi-1.4a1r20394/share/man/man1/mpirun.1'
>>> to see them.
>>>
>>> About the xterm option: when the application ends, all xterms are
>>> closed immediately.
>>> (When doing things 'by hand' I used the -hold option for xterm;
>>> see the sketch below.)
>>> Would it be possible to add this feature to your xterm option?
>>> Perhaps by adding a '!' at the end of the rank list?
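>>> For illustration, a rough equivalent of the by-hand approach (not my
>>> actual script):
>>>
>>>   mpirun -np 4 xterm -hold -e ./MPITest
>>>
>>> That way each window stays open after its process exits.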
>>>
>>> About orte_iof: with the new version it works, but no matter which
>>> rank I specify, it only prints out rank 0's output:
>>> [jody_at_localhost ~]$ orte-iof --pid 31049 --rank 4 --stdout
>>> [localhost]I am #0/9 before the barrier
>>>
>>>
>>>
>>> Thanks
>>>
>>> Jody
>>>
>>> On Sun, Feb 1, 2009 at 10:49 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>> I'm afraid we discovered a bug in optimized builds with r20392.
>>>> Please use
>>>> any tarball with r20394 or above.
>>>>
>>>> Sorry for the confusion
>>>> Ralph
>>>>
>>>>
>>>> On Feb 1, 2009, at 5:27 AM, Jeff Squyres wrote:
>>>>
>>>>> On Jan 31, 2009, at 11:39 AM, Ralph Castain wrote:
>>>>>
>>>>>> For anyone following this thread:
>>>>>>
>>>>>> I have completed the IOF options discussed below. Specifically,
>>>>>> I have
>>>>>> added the following:
>>>>>>
>>>>>> * a new "timestamp-output" option that timestamp's each line of
>>>>>> output
>>>>>>
>>>>>> * a new "output-filename" option that redirects each proc's
>>>>>> output to a
>>>>>> separate rank-named file.
>>>>>>
>>>>>> * a new "xterm" option that redirects the output of the
>>>>>> specified ranks
>>>>>> to a separate xterm window.
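>>>>>>
>>>>>> For illustration, rough sketches of the intended usage (untested;
>>>>>> check the mpirun man page for the exact argument forms, and note
>>>>>> that ./my_app is just a placeholder application name):
>>>>>>
>>>>>>   mpirun -np 4 --timestamp-output ./my_app
>>>>>>   mpirun -np 4 --output-filename myout ./my_app
>>>>>>   mpirun -np 4 --xterm 0,2 ./my_app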
>>>>>>
>>>>>> You can obtain a copy of the updated code at:
>>>>>>
>>>>>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r20392.tar.gz
>>>>>
>>>>> Sweet stuff. :-)
>>>>>
>>>>> Note that the URL/tarball that Ralph cites is a nightly snapshot
>>>>> and will
>>>>> expire after a while -- we only keep the 5 most recent nightly
>>>>> tarballs
>>>>> available. You can find Ralph's new IOF stuff in any 1.4a1
>>>>> nightly tarball
>>>>> after the one he cited above. Note that the last part of the
>>>>> tarball name
>>>>> refers to the subversion commit number (which increases
>>>>> monotonically); any
>>>>> 1.4 nightly snapshot tarball beyond "r20392" will contain this
>>>>> new IOF
>>>>> stuff. Here's where to get our nightly snapshot tarballs:
>>>>>
>>>>> http://www.open-mpi.org/nightly/trunk/
>>>>>
>>>>> Don't read anything into the "1.4" version number -- we've just
>>>>> bumped the
>>>>> version number internally to be different than the current
>>>>> stable series
>>>>> (1.3). We haven't yet branched for the v1.4 series; hence,
>>>>> "1.4a1"
>>>>> currently refers to our development trunk.
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> Cisco Systems