Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] XML request
From: Greg Watson (g.watson_at_[hidden])
Date: 2009-09-09 09:54:46


Hi Ralph,

Looks good so far. The way I want to use this is to pass /dev/tty as
the xml-file and send any other stdout or stderr to /dev/null. I could
use something like 'mpirun -xml-file /dev/tty .... >/dev/null 2>&1',
but that syntax is shell specific, which causes a problem for the ssh
exec service. I noticed that mpirun has a -output-filename option, but
when I try -output-filename /dev/null, I get:

[Jarrah.local:01581] opal_os_dirpath_create: Error: Unable to create
directory (/dev), unable to set the correct mode [-1]
[Jarrah.local:01581] [[22927,0],0] ORTE_ERROR_LOG: Error in file
ess_hnp_module.c at line 406

Also, I'm not sure if -output-filename redirects both stdout and
stderr, or just stdout.
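
One way to sidestep the shell-specific redirection is a small wrapper on the remote side that starts mpirun without a shell and discards stdout and stderr itself. The following is only a sketch, assuming Python 3 is available on the remote host and mpirun is on the PATH. The function names are hypothetical; the only mpirun option used is the -xml-file option from this thread.

```python
import subprocess


def build_mpirun_argv(xml_target, app_argv):
    """Return an mpirun argument list using the -xml-file option from this thread."""
    return ["mpirun", "-xml-file", xml_target, *app_argv]


def run_discarding_output(argv):
    """Run argv directly (no shell) with stdout and stderr sent to the null device.

    Returns the exit status of the command.
    """
    completed = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,  # same effect as >/dev/null
        stderr=subprocess.DEVNULL,  # same effect as 2>/dev/null
        check=False,
    )
    return completed.returncode


# Hypothetical usage on the remote host:
#   status = run_discarding_output(build_mpirun_argv("/dev/tty", ["./my_app"]))
```

Because the argument list is passed straight to the process launcher, nothing here depends on whether the login shell is sh, csh, or something else.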

Any suggestions would be appreciated.

Thanks,
Greg

On Sep 2, 2009, at 2:04 PM, Ralph Castain wrote:

> Okay Greg - give r21930 a whirl. It takes a new cmd line arg,
> -xml-file foo, as discussed below.
>
> You can also specify it as an MCA param: -mca orte_xml_file foo, or
> OMPI_MCA_orte_xml_file=foo
>
> Let me know how it works
> Ralph
>
> On Aug 31, 2009, at 7:26 PM, Greg Watson wrote:
>
>> Hey Ralph,
>>
>> Unfortunately I don't think this is going to work for us. Most of
>> the time we're starting the mpirun command using the ssh exec or
>> shell service, neither of which provide any mechanism for reading
>> from file descriptors other than 1 or 2. The only alternatives I
>> see are:
>>
>> 1. Provide a separate command that starts mpirun at the end of a
>> pipe that is connected to the fd passed using the -xml-fd argument.
>> This command would need to be part of the OMPI distribution,
>> because the whole purpose of the XML was to provide an out-of-the-
>> box experience when using PTP with OMPI.
>>
>> 2. Implement an -xml-file option, but I could write the code for you.
>>
>> 3. Go back to limiting XML output to the map only.
>>
>> None of these are particularly ideal. If you can think of anything
>> else, let me know.
>>
>> Regards,
>> Greg
>>
>> On Aug 30, 2009, at 10:36 AM, Ralph Castain wrote:
>>
>>> What if we instead offered a -xml-fd N option? I would rather not
>>> create a file myself. However, since you are calling mpirun
>>> yourself, this would allow you to create a pipe on your end, and
>>> then pass us the write end of the pipe. We would then send all XML
>>> output down that pipe.
>>>
>>> Jeff and I chatted about this and felt this might represent the
>>> cleanest solution. Sound okay?
>>>
>>>
>>> On Aug 28, 2009, at 6:33 AM, Greg Watson wrote:
>>>
>>>> Ralph,
>>>>
>>>> Would this be doable? If we could guarantee that the only output
>>>> that went to the file was XML then that would solve the problem.
>>>>
>>>> Greg
>>>>
>>>> On Aug 28, 2009, at 5:39 AM, Ashley Pittman wrote:
>>>>
>>>>> On Thu, 2009-08-27 at 23:46 -0400, Greg Watson wrote:
>>>>>> I didn't realize it would be such a problem. Unfortunately there is
>>>>>> simply no way to reliably parse this kind of output, because it is
>>>>>> impossible to know what the error messages are going to be, and
>>>>>> presumably they could include XML-like formatting as well. The whole
>>>>>> point of the XML was to try and simplify the parsing of the mpirun
>>>>>> output, but it now looks like it's actually more difficult.
>>>>>
>>>>> I thought this might be difficult when I saw you were attempting it.
>>>>>
>>>>> Let me tell you about what Valgrind does because they have similar
>>>>> problems. Initially they had just added a --xml=yes option which put
>>>>> most of the valgrind (as distinct from application) output in xml
>>>>> tags. This works for simple cases and if you mix it with
>>>>> --log-file=<filename> it keeps the valgrind output separate from the
>>>>> application output.
>>>>>
>>>>> Unfortunately there are lots of places throughout the code where
>>>>> developers have inserted print statements (in the valgrind case these
>>>>> all go to the logfile) which means the xml is interspersed with
>>>>> non-xml output and hence impossible to parse reliably.
>>>>>
>>>>> What they have now done in the current release is to add an extra
>>>>> --xml-file=<file> option as well as the --log-file=<file> option. Now
>>>>> in the simple case all output from a normal run goes well formatted
>>>>> to the xml file and the log file remains empty. Any tool that wraps
>>>>> around valgrind can parse the xml, which is guaranteed to be well
>>>>> formatted, and it can detect the presence of other messages by
>>>>> looking for output in the standard log file. The onus is then on tool
>>>>> writers to look at the remaining cases and decide if they are common
>>>>> or important enough to wrap in xml and propose a patch or removal of
>>>>> the non-formatted message entirely.
>>>>>
>>>>> The above seems to work well. Having a separate log file for xml is a
>>>>> huge step forward as it means whilst the xml isn't necessarily
>>>>> complete you can both parse it and are able to tell when it's missing
>>>>> something.
>>>>>
>>>>> Of course when looking at this level of tool integration it's better
>>>>> to use sockets than files (e.g. --xml-socket=localhost:1234 rather
>>>>> than --xml-file=/tmp/app_XXXX.xml) but I'll leave that up to you.
>>>>>
>>>>> I hope this gives you something to think over.
>>>>>
>>>>> Ashley,
>>>>>
>>>>> --
>>>>>
>>>>> Ashley Pittman, Bath, UK.
>>>>>
>>>>> Padb - A parallel job inspection tool for cluster computing
>>>>> http://padb.pittman.org.uk
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
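
For reference, the pipe arrangement Ralph proposed on Aug 30 (the caller creates a pipe and hands mpirun the write end) might look roughly like the sketch below on the launching side. This is an illustration only: -xml-fd was a proposal in this thread, not an option confirmed here, so the child command is supplied by the caller through a hypothetical make_argv callback. It assumes Python 3 on a POSIX system.

```python
import os
import subprocess


def run_with_xml_pipe(make_argv):
    """Start a child holding the write end of a pipe; return (xml_bytes, exit_status).

    make_argv is called with the write-end descriptor number and must return
    the child's argument list. With the option proposed in this thread that
    might be: lambda fd: ["mpirun", "-xml-fd", str(fd), "./my_app"]
    (hypothetical, since the thread moved on to -xml-file instead).
    """
    read_fd, write_fd = os.pipe()
    try:
        # pass_fds keeps write_fd open in the child under the same number.
        child = subprocess.Popen(make_argv(write_fd), pass_fds=(write_fd,))
    except BaseException:
        os.close(read_fd)
        raise
    finally:
        # The parent must close its own copy of the write end, otherwise
        # the read below would never see end-of-file.
        os.close(write_fd)

    with os.fdopen(read_fd, "rb") as xml_stream:
        xml_bytes = xml_stream.read()  # everything the child wrote to the pipe
    return xml_bytes, child.wait()
```

Anything the child prints on stdout or stderr stays on those streams, so only what it writes to the passed descriptor reaches the reader. As Greg points out above, this only helps when the launcher can hand over an extra descriptor, which the ssh exec and shell services cannot.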