Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] XML request
From: Greg Watson (g.watson_at_[hidden])
Date: 2009-08-20 14:36:18


Hi Ralph,

Cool!

Regarding the scope of the tags, I never really thought about output
from the command itself. I propose that any output that can't
otherwise be classified be sent using the appropriate <stdout> or
<stderr> tags with no "rank" attribute.
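A minimal sketch of that proposal, in Python for illustration only. The helper name `wrap_output` is hypothetical and is not part of Open MPI; the `&#010;` newline encoding is copied from the sample output quoted further down in this thread.

```python
from xml.sax.saxutils import escape


def wrap_output(line, stream, rank=None):
    """Wrap one line of output in a <stdout> or <stderr> tag.

    Hypothetical helper, not Open MPI code. When rank is None the
    "rank" attribute is omitted, marking output that cannot be
    attributed to a particular process (e.g. from mpirun itself).
    """
    if stream not in ("stdout", "stderr"):
        raise ValueError("stream must be 'stdout' or 'stderr'")
    attr = "" if rank is None else ' rank="%d"' % rank
    # Escape &, < and >, then encode the trailing newline as &#010;
    # to match the sample output quoted in this thread.
    body = escape(line.rstrip("\n")) + "&#010;"
    return "<%s%s>%s</%s>" % (stream, attr, body, stream)
```

With this shape, a consumer can tell process output from everything else just by checking whether the "rank" attribute is present.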

Cheers,
Greg

On Aug 20, 2009, at 1:52 PM, Ralph Castain wrote:

> Hi Greg
>
> I can catch most of these and will do so as they flow through a
> single code path. However, there are places sprinkled throughout the
> code where people directly output warning and error info - these
> will be more problematic and represent a degree of change that is
> probably outside the comfort zone for the 1.3 series.
>
> After talking with Jeff about it, we propose that I make the simple
> change that will catch messages like those below. For the broader
> problem, we believe that some discussion with you about the degree
> of granularity exposed through the xml output might help define the
> overall solution. For example, can we just label all stderr messages
> with <stderr></stderr> tags, or do you need more detailed tagging
> (e.g., rank, file, line, etc.)?
>
> That discussion can occur later - for now, I'll catch these. Will
> let you know when it is ready to test!
>
> Ralph
>
> On Aug 20, 2009, at 11:16 AM, Greg Watson wrote:
>
>> Ralph,
>>
>> One more thing. Even with XML enabled, I notice that some error
>> messages are still sent to stderr without XML tags (see below.) Any
>> chance these could be sent to stdout wrapped in <stderr></stderr>
>> tags?
>>
>> Thanks,
>> Greg
>>
>> $ mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map -np 1 ./pop pop_in
>> <mpirun>
>> <map>
>> <host name="4pcnuggets" slots="1" max_slots="0">
>> <process rank="0"/>
>> </host>
>> </map>
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>> with errorcode 0.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> --------------------------------------------------------------------------
>> <stdout rank="0">------------------------------------------------------------------------&#010;</stdout>
>> <stdout rank="0"> &#010;</stdout>
>> <stdout rank="0"> Parallel Ocean Program (POP) &#010;</stdout>
>> <stdout rank="0"> Version 2.0.1 Released 21 Jan 2004&#010;</stdout>
>> <stdout rank="0"> &#010;</stdout>
>> <stdout rank="0">------------------------------------------------------------------------&#010;</stdout>
>> <stdout rank="0">------------------------------------------------------------------------&#010;</stdout>
>> <stdout rank="0"> &#010;</stdout>
>> <stdout rank="0">POP aborting...&#010;</stdout>
>> <stdout rank="0"> Input nprocs not same as system request&#010;</stdout>
>> <stdout rank="0"> &#010;</stdout>
>> <stdout rank="0">------------------------------------------------------------------------&#010;</stdout>
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 0 with PID 15201 on
>> node 4pcnuggets exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>>
>>
>> On Aug 19, 2009, at 10:48 AM, Greg Watson wrote:
>>
>>> Ralph,
>>>
>>> Looks like it's working now.
>>>
>>> Thanks,
>>> Greg
>>>
>>> On Aug 18, 2009, at 5:21 PM, Ralph Castain wrote:
>>>
>>>> Give r21836 a try and see if it still gets out of order.
>>>>
>>>> Ralph
>>>>
>>>>
>>>> On Aug 18, 2009, at 2:18 PM, Greg Watson wrote:
>>>>
>>>>> Ralph,
>>>>>
>>>>> Not sure that's it because all XML output should be via stdout.
>>>>>
>>>>> Greg
>>>>>
>>>>> On Aug 18, 2009, at 3:53 PM, Ralph Castain wrote:
>>>>>
>>>>>> Hmmm... let me try adding an fflush after the <mpirun> output to
>>>>>> force it out. Best guess is that you are seeing a little race
>>>>>> condition - the map output is coming over stderr, while the
>>>>>> <mpirun> tag is coming over stdout.
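To illustrate the hypothesis above, here is a rough Python sketch of flushing the start tag before anything is written to the other stream. It is an assumption-laden illustration, not the actual change: the function name `emit_start_then_map` and the placeholder map string are hypothetical, and the thread does not confirm that buffering was the real cause. Standard output is typically fully buffered when it is not attached to a terminal, which would be consistent with the ordering changing under ssh.

```python
import sys


def emit_start_then_map(out=sys.stdout, err=sys.stderr):
    """Write the <mpirun> start tag, flush it, then write the map.

    Hypothetical illustration only. Without the flush, the start tag
    could sit in the stdout buffer while the map, written to a
    different stream, reaches the reader first.
    """
    out.write("<mpirun>\n")
    out.flush()  # force the start tag out before touching stderr
    err.write("<map>...</map>\n")  # placeholder for the real map output
    err.flush()
```

The flush only fixes the ordering at the writer; if both streams are later merged by another process, sending all XML over a single stream is the more robust option.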
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 18, 2009 at 12:53 PM, Greg Watson <g.watson_at_[hidden]> wrote:
>>>>>> Hi Ralph,
>>>>>>
>>>>>> I'm seeing something strange. When I run "mpirun -mca
>>>>>> orte_show_resolved_nodenames 1 -xml -display-map...", I see:
>>>>>>
>>>>>> <mpirun>
>>>>>> <map>
>>>>>> <host name="Jarrah.local" slots="1" max_slots="0">
>>>>>> <process rank="0"/>
>>>>>> <process rank="1"/>
>>>>>> <process rank="2"/>
>>>>>> <process rank="3"/>
>>>>>> </host>
>>>>>> </map>
>>>>>> ...
>>>>>> </mpirun>
>>>>>>
>>>>>> but when I run "ssh localhost mpirun -mca
>>>>>> orte_show_resolved_nodenames 1 -xml -display-map...", I see:
>>>>>>
>>>>>> <map>
>>>>>> <host name="Jarrah.local" slots="1" max_slots="0">
>>>>>> <process rank="0"/>
>>>>>> <process rank="1"/>
>>>>>> <process rank="2"/>
>>>>>> <process rank="3"/>
>>>>>> </host>
>>>>>> </map>
>>>>>> <mpirun>
>>>>>> ...
>>>>>> </mpirun>
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> Thanks,
>>>>>> Greg
>>>>>>
>>>>>>
>>>>>> On Aug 17, 2009, at 11:16 PM, Ralph Castain wrote:
>>>>>>
>>>>>> Should be done on trunk with r21826 - would you please give it
>>>>>> a try and let me know if that meets requirements? If so, I'll
>>>>>> move it to 1.3.4.
>>>>>>
>>>>>> Thanks
>>>>>> Ralph
>>>>>>
>>>>>> On Aug 17, 2009, at 6:42 AM, Greg Watson wrote:
>>>>>>
>>>>>> Hi Ralph,
>>>>>>
>>>>>> Yes, you'd just need to issue the start tag prior to any other XML
>>>>>> output, then the end tag when it's guaranteed that all other XML
>>>>>> output has been sent.
>>>>>>
>>>>>> Greg
>>>>>>
>>>>>> On Aug 17, 2009, at 7:44 AM, Ralph Castain wrote:
>>>>>>
>>>>>> All things are possible - some just a tad more painful than
>>>>>> others.
>>>>>>
>>>>>> It looks like you want the mpirun tags to flow around all
>>>>>> output during the run - i.e., there is only one pair of mpirun
>>>>>> tags that surround anything that might come out of the job. True?
>>>>>>
>>>>>> If so, that would be trivial.
>>>>>>
>>>>>> On Aug 14, 2009, at 9:25 AM, Greg Watson wrote:
>>>>>>
>>>>>> Ralph,
>>>>>>
>>>>>> Would it be possible to get mpirun to issue start and end tags
>>>>>> if the -xml option is used? Currently there is no way to
>>>>>> determine when the output starts and finishes, which makes
>>>>>> parsing the XML tricky, particularly if something else
>>>>>> generates output (e.g. the shell). Something like this would be
>>>>>> ideal:
>>>>>>
>>>>>> <mpirun>
>>>>>> <map>
>>>>>> ...
>>>>>> </map>
>>>>>> <stdout>...</stdout>
>>>>>> <stderr>...</stderr>
>>>>>> </mpirun>
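As a sketch of why the enclosing tags help a consumer, the following Python fragment pulls the `<mpirun>` element out of a stream that may also contain unrelated text such as shell output. The function name `extract_mpirun` is hypothetical, and the approach assumes everything between the start and end tags is well-formed XML.

```python
import xml.etree.ElementTree as ET


def extract_mpirun(text):
    """Return the parsed <mpirun> element found in text, or None.

    Hypothetical helper: anything before the start tag or after the
    end tag (e.g. shell output) is ignored.
    """
    start = text.find("<mpirun>")
    end = text.find("</mpirun>", start)
    if start == -1 or end == -1:
        return None
    # Slice out exactly the enclosed document and parse it.
    return ET.fromstring(text[start:end + len("</mpirun>")])
```

Without the start and end tags there is no reliable boundary to slice on, which is the parsing difficulty described above.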
>>>>>>
>>>>>> If we could get it in 1.3.4 even better. :-)
>>>>>>
>>>>>> Thanks,
>>>>>> Greg
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel