Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] XML request
From: Greg Watson (g.watson_at_[hidden])
Date: 2009-08-25 11:23:00


Ralph,

Looks like some messages are taking a different path:

$ mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map -np 3 xxx
<mpirun>
<map>
        <host name="Jarrah.local" slots="1" max_slots="0">
                <process rank="0"/>
                <process rank="1"/>
                <process rank="2"/>
        </host>
</map>
<stderr>--------------------------------------------------------------------------&#010;</stderr>
<stderr>mpirun was unable to launch the specified application as it could not find an executable:&#010;</stderr>
<stderr>&#010;</stderr>
<stderr>Executable: xxx&#010;</stderr>
<stderr>Node: Jarrah.local&#010;</stderr>
<stderr>&#010;</stderr>
<stderr>while attempting to start process rank 0.&#010;</stderr>
<stderr>--------------------------------------------------------------------------&#010;</stderr>
3 total processes failed to start
</mpirun>

Cheers,
Greg
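
[Editorial sketch, not part of the original message.] The line "3 total processes failed to start" in the output above is the kind of untagged text a consumer of the XML has to spot. A minimal Python sketch, assuming every well-formed line is either a structural map tag or a complete single-line <stdout>/<stderr> element; the regular expression and the name `untagged_lines` are illustrative assumptions, not anything provided by Open MPI:

```python
import re

# Assumption: inside <mpirun>, a well-formed line is either a structural tag
# (mpirun/map/host/process) or a complete <stdout>/<stderr> element on one line.
TAGGED = re.compile(
    r"^\s*(</?(mpirun|map|host)\b[^>]*>"      # opening or closing structural tag
    r"|<process\b[^>]*/>"                     # self-closing process entry
    r"|<(stdout|stderr)\b[^>]*>.*</\3>)\s*$"  # complete stream element
)

def untagged_lines(output: str) -> list[str]:
    """Return the non-empty lines that are not wrapped in an expected tag."""
    return [ln for ln in output.splitlines()
            if ln.strip() and not TAGGED.match(ln)]

sample = """<mpirun>
<map>
        <host name="Jarrah.local" slots="1" max_slots="0">
                <process rank="0"/>
        </host>
</map>
<stderr>Executable: xxx&#010;</stderr>
3 total processes failed to start
</mpirun>"""

print(untagged_lines(sample))
```

On the sample, the only line reported is the unwrapped summary message. A line-based check like this is deliberately crude: it would also flag any element that the sender split across lines.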

On Aug 20, 2009, at 3:24 PM, Ralph Castain wrote:

> Okay - try r21858.
>
> Ralph
>
> On Aug 20, 2009, at 12:36 PM, Greg Watson wrote:
>
>> Hi Ralph,
>>
>> Cool!
>>
>> Regarding the scope of the tags, I never really thought about
>> output from the command itself. I propose that any output that
>> can't otherwise be classified be sent using the appropriate
>> <stdout> or <stderr> tags with no "rank" attribute.
>>
>> Cheers,
>> Greg
>>
>> On Aug 20, 2009, at 1:52 PM, Ralph Castain wrote:
>>
>>> Hi Greg
>>>
>>> I can catch most of these and will do so as they flow through a
>>> single code path. However, there are places sprinkled throughout
>>> the code where people directly output warning and error info -
>>> these will be more problematic and represent a degree of change
>>> that is probably outside the comfort zone for the 1.3 series.
>>>
>>> After talking with Jeff about it, we propose that I make the
>>> simple change that will catch messages like those below. For the
>>> broader problem, we believe that some discussion with you about
>>> the degree of granularity exposed through the xml output might
>>> help define the overall solution. For example, can we just label
>>> all stderr messages with <stderr></stderr> tags, or do you need
>>> more detailed tagging (e.g., rank, file, line, etc.)?
>>>
>>> That discussion can occur later - for now, I'll catch these. Will
>>> let you know when it is ready to test!
>>>
>>> Ralph
>>>
>>> On Aug 20, 2009, at 11:16 AM, Greg Watson wrote:
>>>
>>>> Ralph,
>>>>
>>>> One more thing. Even with XML enabled, I notice that some error
>>>> messages are still sent to stderr without XML tags (see below.)
>>>> Any chance these could be sent to stdout wrapped in
>>>> <stderr></stderr> tags?
>>>>
>>>> Thanks,
>>>> Greg
>>>>
>>>> $ mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map -np 1 ./pop pop_in
>>>> <mpirun>
>>>> <map>
>>>> <host name="4pcnuggets" slots="1" max_slots="0">
>>>> <process rank="0"/>
>>>> </host>
>>>> </map>
>>>> --------------------------------------------------------------------------
>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>> with errorcode 0.
>>>>
>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>> You may or may not see output from other processes, depending on
>>>> exactly when Open MPI kills them.
>>>> --------------------------------------------------------------------------
>>>> <stdout rank="0">------------------------------------------------------------------------&#010;</stdout>
>>>> <stdout rank="0"> &#010;</stdout>
>>>> <stdout rank="0"> Parallel Ocean Program (POP) &#010;</stdout>
>>>> <stdout rank="0"> Version 2.0.1 Released 21 Jan 2004&#010;</stdout>
>>>> <stdout rank="0"> &#010;</stdout>
>>>> <stdout rank="0">------------------------------------------------------------------------&#010;</stdout>
>>>> <stdout rank="0">------------------------------------------------------------------------&#010;</stdout>
>>>> <stdout rank="0"> &#010;</stdout>
>>>> <stdout rank="0">POP aborting...&#010;</stdout>
>>>> <stdout rank="0"> Input nprocs not same as system request&#010;</stdout>
>>>> <stdout rank="0"> &#010;</stdout>
>>>> <stdout rank="0">------------------------------------------------------------------------&#010;</stdout>
>>>> --------------------------------------------------------------------------
>>>> mpirun has exited due to process rank 0 with PID 15201 on
>>>> node 4pcnuggets exiting without calling "finalize". This may
>>>> have caused other processes in the application to be
>>>> terminated by signals sent by mpirun (as reported here).
>>>> --------------------------------------------------------------------------
>>>>
>>>>
>>>> On Aug 19, 2009, at 10:48 AM, Greg Watson wrote:
>>>>
>>>>> Ralph,
>>>>>
>>>>> Looks like it's working now.
>>>>>
>>>>> Thanks,
>>>>> Greg
>>>>>
>>>>> On Aug 18, 2009, at 5:21 PM, Ralph Castain wrote:
>>>>>
>>>>>> Give r21836 a try and see if it still gets out of order.
>>>>>>
>>>>>> Ralph
>>>>>>
>>>>>>
>>>>>> On Aug 18, 2009, at 2:18 PM, Greg Watson wrote:
>>>>>>
>>>>>>> Ralph,
>>>>>>>
>>>>>>> Not sure that's it because all XML output should be via stdout.
>>>>>>>
>>>>>>> Greg
>>>>>>>
>>>>>>> On Aug 18, 2009, at 3:53 PM, Ralph Castain wrote:
>>>>>>>
>>>>>>>> Hmmm....let me try adding a fflush after the <mpirun> output
>>>>>>>> to force it out. Best guess is that you are seeing a little
>>>>>>>> race condition - the map output is coming over stderr, while
>>>>>>>> the <mpirun> tag is coming over stdout.
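
[Editorial sketch, not part of the original message.] The reordering Ralph guesses at can be reproduced outside Open MPI. When stdout is a pipe rather than a terminal (as with ssh and no tty), it is typically block-buffered, while stderr is written through immediately, so a later stderr write can overtake an earlier stdout write unless stdout is flushed. The Python below is only an analogy for the C `fflush` he mentions; the child program and the helper `run` are hypothetical, and this is not a claim about what mpirun does internally:

```python
import os
import subprocess
import sys

# Child program: one line to stdout, then one line to stderr.
CHILD = """
import sys
sys.stdout.write("<mpirun>\\n")
if {flush}:
    sys.stdout.flush()   # analogous to fflush(stdout) in C
sys.stderr.write("<map/>\\n")
sys.stderr.flush()       # make sure stderr reaches the pipe immediately
"""

def run(flush: bool) -> str:
    """Run the child with both streams merged into one pipe; return what arrives."""
    # Drop PYTHONUNBUFFERED so the child's stdout keeps its default buffering.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(
        [sys.executable, "-c", CHILD.format(flush=flush)],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        env=env,
    )
    return result.stdout

print("without flush:", run(False).split())
print("with flush:   ", run(True).split())
```

With default CPython buffering, the unflushed run should deliver the map line before the <mpirun> tag, and the flushed run should restore the intended order.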
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 18, 2009 at 12:53 PM, Greg Watson <g.watson_at_[hidden]> wrote:
>>>>>>>> Hi Ralph,
>>>>>>>>
>>>>>>>> I'm seeing something strange. When I run "mpirun -mca
>>>>>>>> orte_show_resolved_nodenames 1 -xml -display-map...", I see:
>>>>>>>>
>>>>>>>> <mpirun>
>>>>>>>> <map>
>>>>>>>> <host name="Jarrah.local" slots="1" max_slots="0">
>>>>>>>> <process rank="0"/>
>>>>>>>> <process rank="1"/>
>>>>>>>> <process rank="2"/>
>>>>>>>> <process rank="3"/>
>>>>>>>> </host>
>>>>>>>> </map>
>>>>>>>> ...
>>>>>>>> </mpirun>
>>>>>>>>
>>>>>>>> but when I run " ssh localhost mpirun -mca
>>>>>>>> orte_show_resolved_nodenames 1 -xml -display-map...", I see:
>>>>>>>>
>>>>>>>> <map>
>>>>>>>> <host name="Jarrah.local" slots="1" max_slots="0">
>>>>>>>> <process rank="0"/>
>>>>>>>> <process rank="1"/>
>>>>>>>> <process rank="2"/>
>>>>>>>> <process rank="3"/>
>>>>>>>> </host>
>>>>>>>> </map>
>>>>>>>> <mpirun>
>>>>>>>> ...
>>>>>>>> </mpirun>
>>>>>>>>
>>>>>>>> Any ideas?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Greg
>>>>>>>>
>>>>>>>>
>>>>>>>> On Aug 17, 2009, at 11:16 PM, Ralph Castain wrote:
>>>>>>>>
>>>>>>>> Should be done on trunk with r21826 - would you please give
>>>>>>>> it a try and let me know if that meets requirements? If so,
>>>>>>>> I'll move it to 1.3.4.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Ralph
>>>>>>>>
>>>>>>>> On Aug 17, 2009, at 6:42 AM, Greg Watson wrote:
>>>>>>>>
>>>>>>>> Hi Ralph,
>>>>>>>>
>>>>>>>> Yes, you'd just need to issue the start tag prior to any other
>>>>>>>> XML output, then the end tag when it's guaranteed that all other
>>>>>>>> XML output has been sent.
>>>>>>>>
>>>>>>>> Greg
>>>>>>>>
>>>>>>>> On Aug 17, 2009, at 7:44 AM, Ralph Castain wrote:
>>>>>>>>
>>>>>>>> All things are possible - some just a tad more painful than
>>>>>>>> others.
>>>>>>>>
>>>>>>>> It looks like you want the mpirun tags to flow around all
>>>>>>>> output during the run - i.e., there is only one pair of
>>>>>>>> mpirun tags that surround anything that might come out of the
>>>>>>>> job. True?
>>>>>>>>
>>>>>>>> If so, that would be trivial.
>>>>>>>>
>>>>>>>> On Aug 14, 2009, at 9:25 AM, Greg Watson wrote:
>>>>>>>>
>>>>>>>> Ralph,
>>>>>>>>
>>>>>>>> Would it be possible to get mpirun to issue start and end
>>>>>>>> tags if the -xml option is used? Currently there is no way to
>>>>>>>> determine when the output starts and finishes, which makes
>>>>>>>> parsing the XML tricky, particularly if something else
>>>>>>>> generates output (e.g. the shell). Something like this would
>>>>>>>> be ideal:
>>>>>>>>
>>>>>>>> <mpirun>
>>>>>>>> <map>
>>>>>>>> ...
>>>>>>>> </map>
>>>>>>>> <stdout>...</stdout>
>>>>>>>> <stderr>...</stderr>
>>>>>>>> </mpirun>
>>>>>>>>
>>>>>>>> If we could get it in 1.3.4 even better. :-)
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Greg
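
[Editorial sketch, not part of the original message.] The point of the enclosing tags is that a consumer can cut the XML out of whatever else the shell prints and hand it to an ordinary parser. A rough Python illustration, assuming the block is well-formed XML and that the program output contains no raw `<` or `&`; the function name `parse_mpirun_block` and the sample text are made up for the example:

```python
import re
import xml.etree.ElementTree as ET

def parse_mpirun_block(raw: str):
    """Return (ranks, streams) from the first <mpirun>...</mpirun> block in raw."""
    match = re.search(r"<mpirun>.*?</mpirun>", raw, re.DOTALL)
    if match is None:
        raise ValueError("no <mpirun> block found")
    root = ET.fromstring(match.group(0))
    # Ranks listed in the process map.
    ranks = [int(p.get("rank")) for p in root.iter("process")]
    # Stream elements in document order; rank is None when the attribute is absent.
    streams = [(e.tag, e.get("rank"), e.text or "")
               for e in root if e.tag in ("stdout", "stderr")]
    return ranks, streams

raw = """Last login: (shell noise before the XML)
<mpirun>
<map>
<host name="Jarrah.local" slots="1" max_slots="0">
<process rank="0"/>
<process rank="1"/>
</host>
</map>
<stdout rank="0">hello&#010;</stdout>
<stderr>Executable: xxx&#010;</stderr>
</mpirun>
$ """

ranks, streams = parse_mpirun_block(raw)
print(ranks)
print(streams)
```

The XML parser turns the `&#010;` character reference into a newline, so the stream text comes back already decoded. A stream element with no rank attribute, as proposed later in the thread for output that cannot be classified, shows up with rank `None`.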
>>>>>>>> _______________________________________________
>>>>>>>> devel mailing list
>>>>>>>> devel_at_[hidden]
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel