Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] XML request
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-09-09 12:17:04


Hmmm....I never considered the possibility of output-filename being
used that way. Interesting idea!

I can fix that one, I think - let me see what I can do.

BTW: output-filename redirects stdout, stderr, and stddiag. So you
would get rid of everything that doesn't come through the xml path.
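
For reference, the effect of the shell syntax quoted below ('>/dev/null 2>&1') can be had without any shell if the launching process sets up the descriptors itself. The following is only a minimal Python sketch, assuming the caller is able to spawn the process directly; run_quiet is a hypothetical helper name, not part of Open MPI.

```python
import subprocess


def run_quiet(argv):
    """Run argv with stdout and stderr sent to the null device.

    No shell is involved, so no shell-specific redirection syntax is needed.
    Returns the exit code of the command.
    """
    completed = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,   # equivalent of >/dev/null
        stderr=subprocess.DEVNULL,   # equivalent of 2>/dev/null
        check=False,
    )
    return completed.returncode
```

A call might look like run_quiet(["mpirun", "-xml-file", "/dev/tty", "-np", "2", "./a.out"]), where the "-np 2 ./a.out" part is purely illustrative.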

On Sep 9, 2009, at 7:54 AM, Greg Watson wrote:

> Hi Ralph,
>
> Looks good so far. The way I want to use this is to use /dev/tty as
> the xml-file and send any other stdout or stderr to /dev/null. I
> could use something like 'mpirun -xml-file /dev/tty .... >/dev/null
> 2>&1', but the syntax is shell-specific, which causes a problem for
> the ssh exec service. I noticed that mpirun has a -output-filename
> option, but when I try -output-filename /dev/null, I get:
>
> [Jarrah.local:01581] opal_os_dirpath_create: Error: Unable to create
> directory (/dev), unable to set the correct mode [-1]
> [Jarrah.local:01581] [[22927,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 406
>
> Also, I'm not sure if -output-filename redirects both stdout and
> stderr, or just stdout.
>
> Any suggestions would be appreciated.
>
> Thanks,
> Greg
>
>
> On Sep 2, 2009, at 2:04 PM, Ralph Castain wrote:
>
>> Okay Greg - give r21930 a whirl. It takes a new cmd line arg,
>> -xml-file foo, as discussed below.
>>
>> You can also specify it as an MCA param: -mca orte_xml_file foo, or
>> OMPI_MCA_orte_xml_file=foo
>>
>> Let me know how it works
>> Ralph
>>
>> On Aug 31, 2009, at 7:26 PM, Greg Watson wrote:
>>
>>> Hey Ralph,
>>>
>>> Unfortunately I don't think this is going to work for us. Most of
>>> the time we're starting the mpirun command using the ssh exec or
>>> shell service, neither of which provides any mechanism for reading
>>> from file descriptors other than 1 or 2. The only alternatives I
>>> see are:
>>>
>>> 1. Provide a separate command that starts mpirun at the end of a
>>> pipe that is connected to the fd passed using the -xml-fd
>>> argument. This command would need to be part of the OMPI
>>> distribution, because the whole purpose of the XML was to provide
>>> an out-of-the-box experience when using PTP with OMPI.
>>>
>>> 2. Implement an -xml-file option, but I could write the code for
>>> you.
>>>
>>> 3. Go back to limiting XML output to the map only.
>>>
>>> None of these are particularly ideal. If you can think of anything
>>> else, let me know.
>>>
>>> Regards,
>>> Greg
>>>
>>> On Aug 30, 2009, at 10:36 AM, Ralph Castain wrote:
>>>
>>>> What if we instead offered a -xml-fd N option? I would rather not
>>>> create a file myself. However, since you are calling mpirun
>>>> yourself, this would allow you to create a pipe on your end, and
>>>> then pass us the write end of the pipe. We would then send all
>>>> XML output down that pipe.
>>>>
>>>> Jeff and I chatted about this and felt this might represent the
>>>> cleanest solution. Sound okay?
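
The pipe arrangement described above can be sketched from the caller's side as follows. This is only an illustration in Python, assuming a POSIX system; run_with_xml_fd and make_argv are hypothetical names, and -xml-fd is the option proposed in this message, not one confirmed to exist.

```python
import os
import subprocess


def run_with_xml_fd(make_argv):
    """Spawn a child holding the write end of a pipe.

    make_argv is called with the descriptor number the child will inherit
    and must return the child's argument list. Returns (data, exit_code),
    where data is everything the child wrote to that descriptor.
    """
    read_fd, write_fd = os.pipe()
    try:
        # pass_fds keeps write_fd open, under the same number, in the child.
        child = subprocess.Popen(make_argv(write_fd), pass_fds=(write_fd,))
    except BaseException:
        os.close(read_fd)
        raise
    finally:
        # Close our copy of the write end so the read below sees EOF
        # once the child has closed its copy or exited.
        os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    return data, child.wait()
```

With the proposed option, make_argv would be something like lambda fd: ["mpirun", "-xml-fd", str(fd), "./a.out"], with "./a.out" standing in for the real application. Because only XML would travel down the pipe, the caller would not have to separate it from stdout or stderr.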
>>>>
>>>>
>>>> On Aug 28, 2009, at 6:33 AM, Greg Watson wrote:
>>>>
>>>>> Ralph,
>>>>>
>>>>> Would this be doable? If we could guarantee that the only output
>>>>> that went to the file was XML then that would solve the problem.
>>>>>
>>>>> Greg
>>>>>
>>>>> On Aug 28, 2009, at 5:39 AM, Ashley Pittman wrote:
>>>>>
>>>>>> On Thu, 2009-08-27 at 23:46 -0400, Greg Watson wrote:
>>>>>>> I didn't realize it would be such a problem. Unfortunately there
>>>>>>> is simply no way to reliably parse this kind of output, because it
>>>>>>> is impossible to know what the error messages are going to be, and
>>>>>>> presumably they could include XML-like formatting as well. The
>>>>>>> whole point of the XML was to try and simplify the parsing of the
>>>>>>> mpirun output, but it now looks like it's actually more difficult.
>>>>>>
>>>>>> I thought this might be difficult when I saw you were attempting it.
>>>>>>
>>>>>> Let me tell you about what Valgrind does, because they have
>>>>>> similar problems. Initially they had just added a --xml=yes option,
>>>>>> which put most of the valgrind (as distinct from application)
>>>>>> output in xml tags. This works for simple cases, and if you mix it
>>>>>> with --log-file=<filename> it keeps the valgrind output separate
>>>>>> from the application output.
>>>>>>
>>>>>> Unfortunately there are lots of places throughout the code where
>>>>>> developers have inserted print statements (in the valgrind case
>>>>>> these all go to the logfile), which means the xml is interspersed
>>>>>> with non-xml output and hence impossible to parse reliably.
>>>>>>
>>>>>> What they have now done in the current release is to add an extra
>>>>>> --xml-file=<file> option as well as the --log-file=<file> option.
>>>>>> Now in the simple case all output from a normal run goes, well
>>>>>> formatted, to the xml file and the log file remains empty. Any tool
>>>>>> that wraps around valgrind can parse the xml, which is guaranteed
>>>>>> to be well formatted, and it can detect the presence of other
>>>>>> messages by looking for output in the standard log file. The onus
>>>>>> is then on tool writers to look at the remaining cases, decide
>>>>>> whether they are common or important enough to wrap in xml, and
>>>>>> propose either a patch or removal of the non-formatted message
>>>>>> entirely.
>>>>>>
>>>>>> The above seems to work well. Having a separate log file for xml is
>>>>>> a huge step forward, as it means that whilst the xml isn't
>>>>>> necessarily complete you can both parse it and tell when it's
>>>>>> missing something.
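
The consumer side of the two-file scheme described above might look roughly like this. It is a minimal Python sketch, assuming the tool was run with separate XML and log files as described; read_tool_output is a hypothetical helper name.

```python
import os
import xml.etree.ElementTree as ET


def read_tool_output(xml_path, log_path):
    """Return (root, stray) for a run that wrote XML and log files separately.

    root is the parsed XML root element; ET.parse raises ParseError if the
    XML file is not well formed. stray holds any text found in the log file,
    i.e. messages that did not go through the XML path.
    """
    root = ET.parse(xml_path).getroot()
    stray = ""
    if os.path.exists(log_path) and os.path.getsize(log_path) > 0:
        with open(log_path, "r", errors="replace") as log:
            stray = log.read()
    return root, stray
```

An empty stray value corresponds to the simple case in which everything was reported as XML; a non-empty one tells the wrapping tool that something is missing from the XML.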
>>>>>>
>>>>>> Of course when looking at this level of tool integration it's
>>>>>> better to use sockets than files (e.g. --xml-socket=localhost:1234
>>>>>> rather than --xml-file=/tmp/app_XXXX.xml), but I'll leave that up
>>>>>> to you.
>>>>>>
>>>>>> I hope this gives you something to think over.
>>>>>>
>>>>>> Ashley,
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Ashley Pittman, Bath, UK.
>>>>>>
>>>>>> Padb - A parallel job inspection tool for cluster computing
>>>>>> http://padb.pittman.org.uk
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel