
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] XML request
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-09-09 23:05:50


Well, I fixed it so that output-filename can take /dev/null as an
argument. Unfortunately, that doesn't help this issue as it merrily
redirects all stdout/err from the procs to /dev/null. :-/

Of course, that -is- what output-filename was supposed to do. It is a
way of sending the output from the procs to rank-specific files, not a
way of redirecting mpirun's stdout/err.

I guess I don't see any way to do what you want other than to do the
output redirection at the shell level. We face the same problem
regarding the shell-specific syntax when we do ssh, but we get around
it by (a) sensing the local shell type, and then (b) adding the logic
to create the proper shell command. You are welcome to look at our
code as an example of how to do this in C:

orte/mca/plm/rsh/plm_rsh_module.c

I would think there are Java classes already set up to resolve that
problem, though - seems a pretty basic issue.

Sorry I can't be of more help...
Ralph

On Sep 9, 2009, at 10:17 AM, Ralph Castain wrote:

> Hmmm....I never considered the possibility of output-filename being
> used that way. Interesting idea!
>
> I can fix that one, I think - let me see what I can do.
>
> BTW: output-filename redirects stdout, stderr, and stddiag. So you
> would get rid of everything that doesn't come through the xml path.
>
>
> On Sep 9, 2009, at 7:54 AM, Greg Watson wrote:
>
>> Hi Ralph,
>>
>> Looks good so far. The way I want to use this is to use /dev/tty as
>> the xml-file and send any other stdout or stderr to /dev/null. I
>> could use something like 'mpirun -xml-file /dev/tty .... >/dev/null
>> 2>&1', but the syntax is shell specific, which causes a problem with
>> the ssh exec service. I noticed that mpirun has a -output-filename
>> option, but when I try -output-filename /dev/null, I get:
>>
>> [Jarrah.local:01581] opal_os_dirpath_create: Error: Unable to
>> create directory (/dev), unable to set the correct mode [-1]
>> [Jarrah.local:01581] [[22927,0],0] ORTE_ERROR_LOG: Error in file
>> ess_hnp_module.c at line 406
>>
>> Also, I'm not sure if -output-filename redirects both stdout and
>> stderr, or just stdout.
>>
>> Any suggestions would be appreciated.
>>
>> Thanks,
>> Greg
>>
>>
>> On Sep 2, 2009, at 2:04 PM, Ralph Castain wrote:
>>
>>> Okay Greg - give r21930 a whirl. It takes a new cmd line arg
>>> -xml-file foo as discussed below.
>>>
>>> You can also specify it as an MCA param: -mca orte_xml_file foo,
>>> or OMPI_MCA_orte_xml_file=foo
>>>
>>> Let me know how it works
>>> Ralph
>>>
>>> On Aug 31, 2009, at 7:26 PM, Greg Watson wrote:
>>>
>>>> Hey Ralph,
>>>>
>>>> Unfortunately I don't think this is going to work for us. Most of
>>>> the time we're starting the mpirun command using the ssh exec or
>>>> shell service, neither of which provide any mechanism for reading
>>>> from file descriptors other than 1 or 2. The only alternatives I
>>>> see are:
>>>>
>>>> 1. Provide a separate command that starts mpirun at the end of a
>>>> pipe that is connected to the fd passed using the -xml-fd
>>>> argument. This command would need to be part of the OMPI
>>>> distribution, because the whole purpose of the XML was to provide
>>>> an out-of-the-box experience when using PTP with OMPI.
>>>>
>>>> 2. Implement an -xml-file option, but I could write the code for
>>>> you.
>>>>
>>>> 3. Go back to limiting XML output to the map only.
>>>>
>>>> None of these are particularly ideal. If you can think of
>>>> anything else, let me know.
>>>>
>>>> Regards,
>>>> Greg
>>>>
>>>> On Aug 30, 2009, at 10:36 AM, Ralph Castain wrote:
>>>>
>>>>> What if we instead offered a -xml-fd N option? I would rather
>>>>> not create a file myself. However, since you are calling mpirun
>>>>> yourself, this would allow you to create a pipe on your end, and
>>>>> then pass us the write end of the pipe. We would then send all
>>>>> XML output down that pipe.
>>>>>
>>>>> Jeff and I chatted about this and felt this might represent the
>>>>> cleanest solution. Sound okay?
>>>>>
>>>>>
>>>>> On Aug 28, 2009, at 6:33 AM, Greg Watson wrote:
>>>>>
>>>>>> Ralph,
>>>>>>
>>>>>> Would this be doable? If we could guarantee that the only
>>>>>> output that went to the file was XML then that would solve the
>>>>>> problem.
>>>>>>
>>>>>> Greg
>>>>>>
>>>>>> On Aug 28, 2009, at 5:39 AM, Ashley Pittman wrote:
>>>>>>
>>>>>>> On Thu, 2009-08-27 at 23:46 -0400, Greg Watson wrote:
>>>>>>>> I didn't realize it would be such a problem. Unfortunately there is
>>>>>>>> simply no way to reliably parse this kind of output, because it is
>>>>>>>> impossible to know what the error messages are going to be, and
>>>>>>>> presumably they could include XML-like formatting as well. The whole
>>>>>>>> point of the XML was to try and simplify the parsing of the mpirun
>>>>>>>> output, but it now looks like it's actually more difficult.
>>>>>>>
>>>>>>> I thought this might be difficult when I saw you were attempting it.
>>>>>>>
>>>>>>> Let me tell you about what Valgrind does, because they have similar
>>>>>>> problems. Initially they just added a --xml=yes option, which put most
>>>>>>> of the valgrind (as distinct from application) output in xml tags. This
>>>>>>> works for simple cases, and if you mix it with --log-file=<filename> it
>>>>>>> keeps the valgrind output separate from the application output.
>>>>>>>
>>>>>>> Unfortunately there are lots of places throughout the code where
>>>>>>> developers have inserted print statements (in the valgrind case these
>>>>>>> all go to the log file), which means the xml is interspersed with
>>>>>>> non-xml output and hence impossible to parse reliably.
>>>>>>>
>>>>>>> What they have now done in the current release is to add an extra
>>>>>>> --xml-file=<file> option as well as the --log-file=<file> option. Now,
>>>>>>> in the simple case, all output from a normal run goes well formatted to
>>>>>>> the xml file and the log file remains empty. Any tool that wraps around
>>>>>>> valgrind can parse the xml, which is guaranteed to be well formatted,
>>>>>>> and it can detect the presence of other messages by looking for output
>>>>>>> in the standard log file. The onus is then on tool writers to look at
>>>>>>> the remaining cases and decide if they are common or important enough
>>>>>>> to wrap in xml, and to propose either a patch or the removal of the
>>>>>>> non-formatted message entirely.
>>>>>>>
>>>>>>> The above seems to work well. Having a separate log file for xml is a
>>>>>>> huge step forward, as it means that whilst the xml isn't necessarily
>>>>>>> complete, you can both parse it and tell when it's missing something.
>>>>>>>
>>>>>>> Of course, when looking at this level of tool integration it's better
>>>>>>> to use sockets than files (e.g. --xml-socket=localhost:1234 rather
>>>>>>> than --xml-file=/tmp/app_XXXX.xml), but I'll leave that up to you.
>>>>>>>
>>>>>>> I hope this gives you something to think over.
>>>>>>>
>>>>>>> Ashley,
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Ashley Pittman, Bath, UK.
>>>>>>>
>>>>>>> Padb - A parallel job inspection tool for cluster computing
>>>>>>> http://padb.pittman.org.uk
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> devel_at_[hidden]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel