Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] XML request
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-08-20 15:24:15


Okay - try r21858.

Ralph

On Aug 20, 2009, at 12:36 PM, Greg Watson wrote:

> Hi Ralph,
>
> Cool!
>
> Regarding the scope of the tags, I never really thought about output
> from the command itself. I propose that any output that can't
> otherwise be classified be sent using the appropriate <stdout> or
> <stderr> tags with no "rank" attribute.
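>
> Just to illustrate what I have in mind (made-up output, not anything
> mpirun produces today), a warning that comes from mpirun itself rather
> than from a rank could be delivered as:
>
> <stderr>some warning text from mpirun itself&#010;</stderr>
>
> i.e., the same element used for rank output, just without the "rank"
> attribute.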
>
> Cheers,
> Greg
>
> On Aug 20, 2009, at 1:52 PM, Ralph Castain wrote:
>
>> Hi Greg
>>
>> I can catch most of these and will do so as they flow through a
>> single code path. However, there are places sprinkled throughout
>> the code where people directly output warning and error info -
>> these will be more problematic and represent a degree of change
>> that is probably outside the comfort zone for the 1.3 series.
>>
>> After talking with Jeff about it, we propose that I make the simple
>> change that will catch messages like those below. For the broader
>> problem, we believe that some discussion with you about the degree
>> of granularity exposed through the xml output might help define the
>> overall solution. For example, can we just label all stderr
>> messages with <stderr></stderr> tags, or do you need more detailed
>> tagging (e.g., rank, file, line, etc.)?
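>>
>> (To make the question concrete - these are made-up fragments, not
>> existing output - the coarse form would be something like
>>
>> <stderr>message text here</stderr>
>>
>> while more detailed tagging might look like
>>
>> <stderr rank="0" file="somefile.c" line="123">message text here</stderr>
>>
>> where the file/line values are just placeholders for whatever we
>> decide to expose.)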
>>
>> That discussion can occur later - for now, I'll catch these. Will
>> let you know when it is ready to test!
>>
>> Ralph
>>
>> On Aug 20, 2009, at 11:16 AM, Greg Watson wrote:
>>
>>> Ralph,
>>>
>>> One more thing. Even with XML enabled, I notice that some error
>>> messages are still sent to stderr without XML tags (see below.)
>>> Any chance these could be sent to stdout wrapped in
>>> <stderr></stderr> tags?
>>>
>>> Thanks,
>>> Greg
>>>
>>> $ mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map -np 1 ./pop pop_in
>>> <mpirun>
>>> <map>
>>> <host name="4pcnuggets" slots="1" max_slots="0">
>>> <process rank="0"/>
>>> </host>
>>> </map>
>>> --------------------------------------------------------------------------
>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>> with errorcode 0.
>>>
>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>> You may or may not see output from other processes, depending on
>>> exactly when Open MPI kills them.
>>> --------------------------------------------------------------------------
>>> <stdout rank="0"> ------------------------------------------------------------------------&#010;</stdout>
>>> <stdout rank="0"> &#010;</stdout>
>>> <stdout rank="0"> Parallel Ocean Program (POP) &#010;</stdout>
>>> <stdout rank="0"> Version 2.0.1 Released 21 Jan 2004&#010;</stdout>
>>> <stdout rank="0"> &#010;</stdout>
>>> <stdout rank="0"> ------------------------------------------------------------------------&#010;</stdout>
>>> <stdout rank="0"> ------------------------------------------------------------------------&#010;</stdout>
>>> <stdout rank="0"> &#010;</stdout>
>>> <stdout rank="0">POP aborting...&#010;</stdout>
>>> <stdout rank="0"> Input nprocs not same as system request&#010;</stdout>
>>> <stdout rank="0"> &#010;</stdout>
>>> <stdout rank="0"> ------------------------------------------------------------------------&#010;</stdout>
>>> --------------------------------------------------------------------------
>>> mpirun has exited due to process rank 0 with PID 15201 on
>>> node 4pcnuggets exiting without calling "finalize". This may
>>> have caused other processes in the application to be
>>> terminated by signals sent by mpirun (as reported here).
>>> --------------------------------------------------------------------------
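>>>
>>> For instance (just a sketch of what I'm asking for, not real
>>> output), the MPI_ABORT block above could be delivered as:
>>>
>>> <stderr>MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD&#010;with errorcode 0.&#010;</stderr>
>>>
>>> so that a parser never sees untagged text on either stream.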
>>>
>>>
>>> On Aug 19, 2009, at 10:48 AM, Greg Watson wrote:
>>>
>>>> Ralph,
>>>>
>>>> Looks like it's working now.
>>>>
>>>> Thanks,
>>>> Greg
>>>>
>>>> On Aug 18, 2009, at 5:21 PM, Ralph Castain wrote:
>>>>
>>>>> Give r21836 a try and see if it still gets out of order.
>>>>>
>>>>> Ralph
>>>>>
>>>>>
>>>>> On Aug 18, 2009, at 2:18 PM, Greg Watson wrote:
>>>>>
>>>>>> Ralph,
>>>>>>
>>>>>> Not sure that's it because all XML output should be via stdout.
>>>>>>
>>>>>> Greg
>>>>>>
>>>>>> On Aug 18, 2009, at 3:53 PM, Ralph Castain wrote:
>>>>>>
>>>>>>> Hmmm....let me try adding a fflush after the <mpirun> output
>>>>>>> to force it out. Best guess is that you are seeing a little
>>>>>>> race condition - the map output is coming over stderr, while
>>>>>>> the <mpirun> tag is coming over stdout.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 18, 2009 at 12:53 PM, Greg Watson <g.watson_at_[hidden]
>>>>>>> > wrote:
>>>>>>> Hi Ralph,
>>>>>>>
>>>>>>> I'm seeing something strange. When I run "mpirun -mca
>>>>>>> orte_show_resolved_nodenames 1 -xml -display-map...", I see:
>>>>>>>
>>>>>>> <mpirun>
>>>>>>> <map>
>>>>>>> <host name="Jarrah.local" slots="1" max_slots="0">
>>>>>>> <process rank="0"/>
>>>>>>> <process rank="1"/>
>>>>>>> <process rank="2"/>
>>>>>>> <process rank="3"/>
>>>>>>> </host>
>>>>>>> </map>
>>>>>>> ...
>>>>>>> </mpirun>
>>>>>>>
>>>>>>> but when I run " ssh localhost mpirun -mca
>>>>>>> orte_show_resolved_nodenames 1 -xml -display-map...", I see:
>>>>>>>
>>>>>>> <map>
>>>>>>> <host name="Jarrah.local" slots="1" max_slots="0">
>>>>>>> <process rank="0"/>
>>>>>>> <process rank="1"/>
>>>>>>> <process rank="2"/>
>>>>>>> <process rank="3"/>
>>>>>>> </host>
>>>>>>> </map>
>>>>>>> <mpirun>
>>>>>>> ...
>>>>>>> </mpirun>
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Greg
>>>>>>>
>>>>>>>
>>>>>>> On Aug 17, 2009, at 11:16 PM, Ralph Castain wrote:
>>>>>>>
>>>>>>> Should be done on trunk with r21826 - would you please give it
>>>>>>> a try and let me know if that meets requirements? If so, I'll
>>>>>>> move it to 1.3.4.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ralph
>>>>>>>
>>>>>>> On Aug 17, 2009, at 6:42 AM, Greg Watson wrote:
>>>>>>>
>>>>>>> Hi Ralph,
>>>>>>>
>>>>>>> Yes, you'd just need to issue the start tag prior to any other
>>>>>>> XML output, then the end tag once it's guaranteed all other XML
>>>>>>> output has been sent.
>>>>>>>
>>>>>>> Greg
>>>>>>>
>>>>>>> On Aug 17, 2009, at 7:44 AM, Ralph Castain wrote:
>>>>>>>
>>>>>>> All things are possible - some just a tad more painful than
>>>>>>> others.
>>>>>>>
>>>>>>> It looks like you want the mpirun tags to flow around all
>>>>>>> output during the run - i.e., there is only one pair of mpirun
>>>>>>> tags that surround anything that might come out of the job.
>>>>>>> True?
>>>>>>>
>>>>>>> If so, that would be trivial.
>>>>>>>
>>>>>>> On Aug 14, 2009, at 9:25 AM, Greg Watson wrote:
>>>>>>>
>>>>>>> Ralph,
>>>>>>>
>>>>>>> Would it be possible to get mpirun to issue start and end tags
>>>>>>> if the -xml option is used? Currently there is no way to
>>>>>>> determine when the output starts and finishes, which makes
>>>>>>> parsing the XML tricky, particularly if something else
>>>>>>> generates output (e.g. the shell). Something like this would
>>>>>>> be ideal:
>>>>>>>
>>>>>>> <mpirun>
>>>>>>> <map>
>>>>>>> ...
>>>>>>> </map>
>>>>>>> <stdout>...</stdout>
>>>>>>> <stderr>...</stderr>
>>>>>>> </mpirun>
>>>>>>>
>>>>>>> If we could get it into 1.3.4, even better. :-)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Greg
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel