Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] -display-map
From: Greg Watson (g.watson_at_[hidden])
Date: 2009-01-20 10:02:14


I don't think there's any reason we'd want stdout/err not to be
encapsulated, so forcing tag-output makes sense.

Greg

On Jan 20, 2009, at 9:20 AM, Ralph Castain wrote:

> You need to add --tag-output - this is a separate option as it
> applies both to xml and non-xml situations.
>
> If you like, I can force tag-output "on" by default whenever -xml is
> specified.
>
> Ralph
>
>
> On Jan 16, 2009, at 12:52 PM, Greg Watson wrote:
>
>> Ralph,
>>
>> Is there something I need to do to enable stdout/err encapsulation
>> (apart from -xml)? Here's what I see:
>>
>> $ mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map -np
>> 5 /Users/greg/Documents/workspace1/testMPI/Debug/testMPI
>> <map>
>> <host name="Jarrah.local" slots="8" max_slots="0">
>> <noderesolve resolved="node0"/>
>> <noderesolve resolved="node1"/>
>> <noderesolve resolved="node2"/>
>> <noderesolve resolved="node3"/>
>> <noderesolve resolved="node4"/>
>> <noderesolve resolved="node5"/>
>> <noderesolve resolved="node6"/>
>> <noderesolve resolved="node7"/>
>> <process rank="0"/>
>> <process rank="1"/>
>> <process rank="2"/>
>> <process rank="3"/>
>> <process rank="4"/>
>> </host>
>> </map>
>> n = 0
>> n = 0
>> n = 0
>> n = 0
>> n = 0
>>
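The output above mixes the <map> block with untagged program output (the "n = 0" lines), so a consumer has to cut the map out before parsing it. A minimal Python sketch of that step, assuming the element and attribute names shown in this thread; the helpers extract_map and parse_map are hypothetical names for illustration and are not part of Open MPI:

```python
import xml.etree.ElementTree as ET

# Sample shaped like the mpirun output quoted above (shortened).
OUTPUT = """\
<map>
<host name="Jarrah.local" slots="8" max_slots="0">
<noderesolve resolved="node0"/>
<noderesolve resolved="node1"/>
<process rank="0"/>
<process rank="1"/>
</host>
</map>
n = 0
n = 0
"""


def extract_map(text):
    """Return only the <map>...</map> portion of mixed mpirun output."""
    start = text.index("<map>")
    end = text.index("</map>") + len("</map>")
    return text[start:end]


def parse_map(xml_text):
    """Map each host name to its slots, resolved aliases, and process ranks."""
    root = ET.fromstring(xml_text)
    hosts = {}
    for host in root.findall("host"):
        hosts[host.get("name")] = {
            "slots": int(host.get("slots")),
            "aliases": [n.get("resolved") for n in host.findall("noderesolve")],
            "ranks": [int(p.get("rank")) for p in host.findall("process")],
        }
    return hosts


hosts = parse_map(extract_map(OUTPUT))
print(hosts)
```

Cutting on the literal <map> and </map> strings only works while program output does not itself contain those strings, which is one reason tagged stdout/err is attractive.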
>> On Jan 15, 2009, at 1:13 PM, Ralph Castain wrote:
>>
>>> Okay, it is in the trunk as of r20284 - I'll file the request to
>>> have it moved to 1.3.1.
>>>
>>> Let me know if you get a chance to test the stdout/err stuff in
>>> the trunk - we should try and iterate it so any changes can make
>>> 1.3.1 as well.
>>>
>>> Thanks!
>>> Ralph
>>>
>>>
>>> On Jan 15, 2009, at 11:03 AM, Greg Watson wrote:
>>>
>>>> Ralph,
>>>>
>>>> I think the second form would be ideal and would simplify things
>>>> greatly.
>>>>
>>>> Greg
>>>>
>>>> On Jan 15, 2009, at 10:53 AM, Ralph Castain wrote:
>>>>
>>>>> Here is what I was able to do - note that the resolve messages
>>>>> are associated with the specific hostname, not the overall map:
>>>>>
>>>>> <map>
>>>>> <host name="graywolf54.lanl.gov" slots="1" max_slots="0">
>>>>> <noderesolve name="graywolf54.lanl.gov" resolved="localhost"/>
>>>>> <process rank="0"/>
>>>>> <process rank="1"/>
>>>>> <process rank="2"/>
>>>>> </host>
>>>>> </map>
>>>>>
>>>>> Will that work for you? If you like, I can remove the name=
>>>>> field from the noderesolve element since the info is specific to
>>>>> the host element that contains it. In other words, I can make it
>>>>> look like this:
>>>>>
>>>>> <map>
>>>>> <host name="graywolf54.lanl.gov" slots="1" max_slots="0">
>>>>> <noderesolve resolved="localhost"/>
>>>>> <process rank="0"/>
>>>>> <process rank="1"/>
>>>>> <process rank="2"/>
>>>>> </host>
>>>>> </map>
>>>>>
>>>>> if that would help.
>>>>>
>>>>> Ralph
>>>>>
>>>>>
>>>>> On Jan 14, 2009, at 7:57 AM, Ralph Castain wrote:
>>>>>
>>>>>> We -may- be able to do a more formal XML output at some point.
>>>>>> The problem will be the natural interleaving of stdout/err from
>>>>>> the various procs due to the async behavior of MPI. Mpirun
>>>>>> receives fragmented output in the forwarding system, limited by
>>>>>> the buffer sizes and the amount of data we can read at any one
>>>>>> "bite" from the pipes connecting us to the procs. So even
>>>>>> though the user -thinks- they output a single large line of
>>>>>> stuff, it may show up at mpirun as a series of fragments.
>>>>>> Hence, it gets tricky to know how to put appropriate XML
>>>>>> brackets around it.
>>>>>>
>>>>>> Given this input about when you actually want resolved name
>>>>>> info, I can at least do something about that area. Won't be in
>>>>>> 1.3.0, but should make 1.3.1.
>>>>>>
>>>>>> As for XML-tagged stdout/err: the OMPI community asked me not
>>>>>> to turn that feature "on" for 1.3.0 as they felt it hasn't been
>>>>>> adequately tested yet. The code is present, but cannot be
>>>>>> activated in 1.3.0. However, I believe it is activated on the
>>>>>> trunk when you do --xml --tag-output, so perhaps some
>>>>>> testing will help us debug and validate it adequately for 1.3.1?
>>>>>>
>>>>>> Thanks
>>>>>> Ralph
>>>>>>
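The fragmentation problem described in the message above can be illustrated with a small Python sketch: fragments are buffered per rank and a tag is emitted only once a full line has arrived. This is not how mpirun's forwarding code works; the class name LineAssembler and the <stdout rank="..."> tag are assumptions made for illustration.

```python
from collections import defaultdict
from xml.sax.saxutils import escape


class LineAssembler:
    """Buffer output fragments per rank and emit tags only for whole lines."""

    def __init__(self):
        self._pending = defaultdict(str)  # rank -> text not yet ended by "\n"

    @staticmethod
    def _tag(rank, line):
        # Hypothetical tag format; escape() protects "<", ">" and "&".
        return f'<stdout rank="{rank}">{escape(line)}</stdout>'

    def feed(self, rank, fragment):
        """Add a fragment from `rank`; return tags for any completed lines."""
        data = self._pending[rank] + fragment
        *complete, rest = data.split("\n")
        self._pending[rank] = rest
        return [self._tag(rank, line) for line in complete]

    def flush(self):
        """Emit whatever is left, e.g. when the job ends mid-line."""
        out = [self._tag(rank, rest)
               for rank, rest in sorted(self._pending.items()) if rest]
        self._pending.clear()
        return out


asm = LineAssembler()
first = asm.feed(0, "hel")          # incomplete line, nothing emitted yet
second = asm.feed(1, "n = 0\n")     # a complete line from another rank
third = asm.feed(0, "lo\nwor")      # completes "hello", leaves "wor" pending
rest = asm.flush()
```

The trade-off is latency: a rank that never writes a newline produces no output until flush, which may be why a forwarding layer would prefer to tag fragments as they arrive.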
>>>>>>
>>>>>> On Jan 14, 2009, at 7:02 AM, Greg Watson wrote:
>>>>>>
>>>>>>> Ralph,
>>>>>>>
>>>>>>> The only time we use the resolved names is when we get a map,
>>>>>>> so we consider them part of the map output.
>>>>>>>
>>>>>>> If quasi-XML is all that will ever be possible with 1.3, then
>>>>>>> you may as well leave as-is and we will attempt to clean it up
>>>>>>> in Eclipse. It would be nice if a future version of ompi could
>>>>>>> output correct XML (including stdout) as this would vastly
>>>>>>> simplify the parsing we need to do.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Greg
>>>>>>>
>>>>>>> On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote:
>>>>>>>
>>>>>>>> Hmmm...well, I can't do either for 1.3.0 as it is departing
>>>>>>>> this afternoon.
>>>>>>>>
>>>>>>>> The first option would be very hard to do. I would have to
>>>>>>>> expose the display-map option across the code base and check
>>>>>>>> it prior to printing anything about resolving node names. I
>>>>>>>> guess I should ask: do you only want noderesolve statements
>>>>>>>> when we are displaying the map? Right now, I will output them
>>>>>>>> regardless.
>>>>>>>>
>>>>>>>> The second option could be done. I could check if any
>>>>>>>> "display" option has been specified, and output the <ompi>
>>>>>>>> root at that time (likewise for the end). Anything we output
>>>>>>>> in-between would be encapsulated between the two, but that
>>>>>>>> would include any user output to stdout and/or stderr - which
>>>>>>>> for 1.3.0 is not in xml.
>>>>>>>>
>>>>>>>> Any thoughts?
>>>>>>>>
>>>>>>>> Ralph
>>>>>>>>
>>>>>>>> PS. Guess I should clarify that I was not striving for true
>>>>>>>> XML interaction here, but rather a quasi-XML format that
>>>>>>>> would help you to filter the output. I have no problem trying
>>>>>>>> to get to something more formally correct, but it could be
>>>>>>>> tricky in some places to achieve it due to the inherent async
>>>>>>>> nature of the beast.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:
>>>>>>>>
>>>>>>>>> Ralph,
>>>>>>>>>
>>>>>>>>> The XML is looking better now, but there is still one
>>>>>>>>> problem. To be valid, there needs to be only one root
>>>>>>>>> element, but currently you don't have any (or many). So
>>>>>>>>> rather than:
>>>>>>>>>
>>>>>>>>> <noderesolve name="node0" resolved="Jarrah.local"/>
>>>>>>>>> <noderesolve name="node1" resolved="Jarrah.local"/>
>>>>>>>>> <map>
>>>>>>>>> <host name="Jarrah.local" slots="8" max_slots="0">
>>>>>>>>> <process rank="0"/>
>>>>>>>>> <process rank="1"/>
>>>>>>>>> <process rank="2"/>
>>>>>>>>> <process rank="3"/>
>>>>>>>>> <process rank="4"/>
>>>>>>>>> </host>
>>>>>>>>> </map>
>>>>>>>>>
>>>>>>>>> the XML should be:
>>>>>>>>>
>>>>>>>>> <map>
>>>>>>>>> <noderesolve name="node0" resolved="Jarrah.local"/>
>>>>>>>>> <noderesolve name="node1" resolved="Jarrah.local"/>
>>>>>>>>> <host name="Jarrah.local" slots="8" max_slots="0">
>>>>>>>>> <process rank="0"/>
>>>>>>>>> <process rank="1"/>
>>>>>>>>> <process rank="2"/>
>>>>>>>>> <process rank="3"/>
>>>>>>>>> <process rank="4"/>
>>>>>>>>> </host>
>>>>>>>>> </map>
>>>>>>>>>
>>>>>>>>> or:
>>>>>>>>>
>>>>>>>>> <ompi>
>>>>>>>>> <noderesolve name="node0" resolved="Jarrah.local"/>
>>>>>>>>> <noderesolve name="node1" resolved="Jarrah.local"/>
>>>>>>>>> <map>
>>>>>>>>> <host name="Jarrah.local" slots="8" max_slots="0">
>>>>>>>>> <process rank="0"/>
>>>>>>>>> <process rank="1"/>
>>>>>>>>> <process rank="2"/>
>>>>>>>>> <process rank="3"/>
>>>>>>>>> <process rank="4"/>
>>>>>>>>> </host>
>>>>>>>>> </map>
>>>>>>>>> </ompi>
>>>>>>>>>
>>>>>>>>> Would either of these be possible?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Greg
>>>>>>>>>
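The single-root requirement raised in the message above is easy to check with a standard XML parser. A minimal Python sketch using sample text shaped like the output in this thread; is_well_formed is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as one XML document with a single root."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Two top-level elements (noderesolve, then map): no single root element.
fragments = (
    '<noderesolve name="node0" resolved="Jarrah.local"/>\n'
    "<map>\n"
    '<host name="Jarrah.local" slots="8" max_slots="0">\n'
    '<process rank="0"/>\n'
    "</host>\n"
    "</map>\n"
)

# The same content wrapped in one enclosing element, as in the second proposal.
wrapped = "<ompi>\n" + fragments + "</ompi>\n"

print(is_well_formed(fragments), is_well_formed(wrapped))
```

A client that cannot change the producer could apply the same wrapping itself before parsing, provided everything between the wrapper tags is already XML.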
>>>>>>>>> On Dec 8, 2008, at 2:18 PM, Greg Watson wrote:
>>>>>>>>>
>>>>>>>>>> Ok thanks. I'll test from trunk in future.
>>>>>>>>>>
>>>>>>>>>> Greg
>>>>>>>>>>
>>>>>>>>>> On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote:
>>>>>>>>>>
>>>>>>>>>>> Working its way around the CMR process now.
>>>>>>>>>>>
>>>>>>>>>>> Might be easier in the future if we could test/debug this
>>>>>>>>>>> in the trunk, though. Otherwise, the CMR procedure will
>>>>>>>>>>> fall behind and a fix might miss a release window.
>>>>>>>>>>>
>>>>>>>>>>> Anyway, hopefully this one will make the 1.3.0 release
>>>>>>>>>>> cutoff.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Ralph
>>>>>>>>>>>
>>>>>>>>>>> On Dec 8, 2008, at 9:56 AM, Greg Watson wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Ralph,
>>>>>>>>>>>>
>>>>>>>>>>>> This is now in 1.3rc2, thanks. However there are a couple
>>>>>>>>>>>> of problems. Here is what I see:
>>>>>>>>>>>>
>>>>>>>>>>>> [Jarrah.watson.ibm.com:58957] <noderesolve name="node0"
>>>>>>>>>>>> resolved="Jarrah.watson.ibm.com">
>>>>>>>>>>>>
>>>>>>>>>>>> For some reason each line is prefixed with "[...]", any
>>>>>>>>>>>> idea why this is? Also the end tag should be "/>" not ">".
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Greg
>>>>>>>>>>>>
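Until the "[host:pid]" prefix reported above is removed at the source, a consumer could strip it before parsing. A minimal Python sketch, assuming the prefix always has the form "[name:digits]" at the start of a line; strip_prefix is a hypothetical helper, and it does not repair the unterminated ">" end tag:

```python
import re

# "[" + host name without spaces or "]" + ":" + pid + "]" + optional spaces
PREFIX = re.compile(r"^\[[^\]\s]+:\d+\]\s*")


def strip_prefix(line):
    """Remove a leading "[host:pid] " marker if one is present."""
    return PREFIX.sub("", line, count=1)


sample = '[Jarrah.watson.ibm.com:58957] <noderesolve name="node0"'
print(strip_prefix(sample))
```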
>>>>>>>>>>>> On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Great, thanks. I'll take a look once it comes over to 1.3.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Nov 24, 2008, at 2:59 PM, Ralph Castain wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yo Greg
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is in the trunk as of r20032. I'll bring it over
>>>>>>>>>>>>>> to 1.3 in a few days.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I implemented it as another MCA param
>>>>>>>>>>>>>> "orte_show_resolved_nodenames" so you can actually get
>>>>>>>>>>>>>> the info as you execute the job, if you want. The xml
>>>>>>>>>>>>>> tag is "noderesolve" - let me know if you need any
>>>>>>>>>>>>>> changes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 22, 2008, at 11:55 AM, Greg Watson wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ralph,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I guess the issue for us is that we will have to run
>>>>>>>>>>>>>>> two commands to get the information we need. One to
>>>>>>>>>>>>>>> get the configuration information, such as version and
>>>>>>>>>>>>>>> MCA parameters, and one to get the host information,
>>>>>>>>>>>>>>> whereas it would seem more logical that this should
>>>>>>>>>>>>>>> all be available via some kind of "configuration
>>>>>>>>>>>>>>> discovery" command. I understand the issue with
>>>>>>>>>>>>>>> supplying the hostfile though, so maybe this just
>>>>>>>>>>>>>>> points at the need for us to separate configuration
>>>>>>>>>>>>>>> information from the host information. In any case,
>>>>>>>>>>>>>>> we'll work with what you think is best.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hmmm...just to be sure we are all clear on this. The
>>>>>>>>>>>>>>>> reason we proposed to use mpirun is that "hostfile"
>>>>>>>>>>>>>>>> has no meaning outside of mpirun. That's why
>>>>>>>>>>>>>>>> ompi_info can't do anything in this regard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We have no idea what hostfile the user may specify
>>>>>>>>>>>>>>>> until we actually get the mpirun cmd line. They may
>>>>>>>>>>>>>>>> have specified a default-hostfile, but they could
>>>>>>>>>>>>>>>> also specify hostfiles for the individual
>>>>>>>>>>>>>>>> app_contexts. These may or may not include the node
>>>>>>>>>>>>>>>> upon which mpirun is executing.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So the only way to provide you with a separate
>>>>>>>>>>>>>>>> command to get a hostfile<->nodename mapping would
>>>>>>>>>>>>>>>> require you to provide us with the default-hostfile
>>>>>>>>>>>>>>>> and/or hostfile cmd line options just as if you were
>>>>>>>>>>>>>>>> issuing the mpirun cmd. We just wouldn't launch - but
>>>>>>>>>>>>>>>> it would be the exact equivalent of doing
>>>>>>>>>>>>>>>> "mpirun --do-not-launch".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Am I missing something? If so, please do correct me -
>>>>>>>>>>>>>>>> I would be happy to provide a tool if that would make
>>>>>>>>>>>>>>>> it easier. Just not sure what that tool would do.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ralph,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It seems a little strange to be using mpirun for
>>>>>>>>>>>>>>>>> this, but barring providing a separate command, or
>>>>>>>>>>>>>>>>> using ompi_info, I think this would solve our problem.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sorry for delay - had to ponder this one for a while.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jeff and I agree that adding something to ompi_info
>>>>>>>>>>>>>>>>>> would not be a good idea. Ompi_info has no
>>>>>>>>>>>>>>>>>> knowledge or understanding of hostfiles, and adding
>>>>>>>>>>>>>>>>>> that capability to it would be a major distortion
>>>>>>>>>>>>>>>>>> of its intended use.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> However, we think we can offer an alternative that
>>>>>>>>>>>>>>>>>> might better solve the problem. Remember, we now
>>>>>>>>>>>>>>>>>> treat hostfiles in a very different manner than
>>>>>>>>>>>>>>>>>> before - see the wiki page for a complete
>>>>>>>>>>>>>>>>>> description, or "man orte_hosts".
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So the problem is that, to provide you with what
>>>>>>>>>>>>>>>>>> you want, we need to "dump" the information from
>>>>>>>>>>>>>>>>>> whatever default-hostfile was provided, and, if no
>>>>>>>>>>>>>>>>>> default-hostfile was provided, then the information
>>>>>>>>>>>>>>>>>> from each hostfile that was provided with an
>>>>>>>>>>>>>>>>>> app_context.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The best way we could think of to do this is to add
>>>>>>>>>>>>>>>>>> another mpirun cmd line option --dump-hostfiles
>>>>>>>>>>>>>>>>>> that would output the line-by-line name from the
>>>>>>>>>>>>>>>>>> hostfile plus the name we resolved it to. Of
>>>>>>>>>>>>>>>>>> course, --xml would cause it to be in xml format.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Would that meet your needs?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Ralph,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We've been discussing this back and forth a bit
>>>>>>>>>>>>>>>>>>> internally and don't really see an easy solution.
>>>>>>>>>>>>>>>>>>> Our problem is that Eclipse is not running on the
>>>>>>>>>>>>>>>>>>> head node, so gethostbyname will not necessarily
>>>>>>>>>>>>>>>>>>> resolve to the same address. For example, the
>>>>>>>>>>>>>>>>>>> hostfile might refer to the head node by an
>>>>>>>>>>>>>>>>>>> internal network address that is not visible to
>>>>>>>>>>>>>>>>>>> the outside world. Since gethostbyname also looks
>>>>>>>>>>>>>>>>>>> in /etc/hosts, it may resolve locally but not on a
>>>>>>>>>>>>>>>>>>> remote system. The only thing I can think of would
>>>>>>>>>>>>>>>>>>> be, rather than us reading the hostfile directly
>>>>>>>>>>>>>>>>>>> as we do now, to provide an option to ompi_info
>>>>>>>>>>>>>>>>>>> that would dump the hostfile using the same rules
>>>>>>>>>>>>>>>>>>> that you apply when you're using the hostfile.
>>>>>>>>>>>>>>>>>>> Would that be feasible?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Sorry for delay - was on vacation and am now
>>>>>>>>>>>>>>>>>>>> trying to work my way back to the surface.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm not sure I can fix this one for two reasons:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. In general, OMPI doesn't really care what name
>>>>>>>>>>>>>>>>>>>> is used for the node. However, the problem is
>>>>>>>>>>>>>>>>>>>> that it needs to be consistent. In this case,
>>>>>>>>>>>>>>>>>>>> ORTE has already used the name returned by
>>>>>>>>>>>>>>>>>>>> gethostname to create its session directory
>>>>>>>>>>>>>>>>>>>> structure long before mpirun reads a hostfile.
>>>>>>>>>>>>>>>>>>>> This is why we retain the value from gethostname
>>>>>>>>>>>>>>>>>>>> instead of allowing it to be overwritten by the
>>>>>>>>>>>>>>>>>>>> name in whatever allocation we are given. Using
>>>>>>>>>>>>>>>>>>>> the name in hostfile would require that I either
>>>>>>>>>>>>>>>>>>>> find some way to remember any prior name, or that
>>>>>>>>>>>>>>>>>>>> I tear down and rebuild the session directory
>>>>>>>>>>>>>>>>>>>> tree - neither seems attractive nor simple (e.g.,
>>>>>>>>>>>>>>>>>>>> what happens when the user provides multiple
>>>>>>>>>>>>>>>>>>>> entries in the hostfile for the node, each with a
>>>>>>>>>>>>>>>>>>>> different IP address based on another interface
>>>>>>>>>>>>>>>>>>>> in that node? Sounds crazy, but we have already
>>>>>>>>>>>>>>>>>>>> seen it done - which one do I use?).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2. We don't actually store the hostfile info
>>>>>>>>>>>>>>>>>>>> anywhere - we just use it and forget it. For us
>>>>>>>>>>>>>>>>>>>> to add an XML attribute containing any
>>>>>>>>>>>>>>>>>>>> hostfile-related info would therefore require us to
>>>>>>>>>>>>>>>>>>>> re-read the hostfile. I could have it do that -only-
>>>>>>>>>>>>>>>>>>>> in the case of "XML output required", but it
>>>>>>>>>>>>>>>>>>>> seems rather ugly.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> An alternative might be for you to simply do a
>>>>>>>>>>>>>>>>>>>> "gethostbyname" lookup of the IP address or
>>>>>>>>>>>>>>>>>>>> hostname to see if it matches instead of just
>>>>>>>>>>>>>>>>>>>> doing a strcmp. This is what we have to do
>>>>>>>>>>>>>>>>>>>> internally as we frequently have problems with
>>>>>>>>>>>>>>>>>>>> FQDN vs. non-FQDN vs. IP addresses etc. If the
>>>>>>>>>>>>>>>>>>>> local OS hasn't cached the IP address for the
>>>>>>>>>>>>>>>>>>>> node in question it can take a little time to DNS
>>>>>>>>>>>>>>>>>>>> resolve it, but otherwise works fine.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I can point you to the code in OPAL that we use -
>>>>>>>>>>>>>>>>>>>> I would think something similar would be easy to
>>>>>>>>>>>>>>>>>>>> implement in your code and would readily solve
>>>>>>>>>>>>>>>>>>>> the problem.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>>>>>
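The suggestion above, comparing resolved addresses instead of doing a plain string compare, can be sketched in Python as follows. This is not the OPAL code mentioned in the message; resolve_addresses and same_host are hypothetical names, and the result depends on the resolver of the machine running the check, which is the limitation raised elsewhere in this thread for a client that is not on the head node.

```python
import socket


def resolve_addresses(name):
    """Return the set of IP addresses `name` resolves to on this machine."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return set()  # unknown name: nothing to compare against
    return {info[4][0] for info in infos}


def same_host(a, b, resolve=resolve_addresses):
    """True if the two names are equal or share at least one address."""
    if a == b:
        return True
    return bool(resolve(a) & resolve(b))


# Stand-in resolver so the example does not depend on real DNS entries.
TABLE = {
    "node0": {"10.0.0.1"},
    "Jarrah.local": {"10.0.0.1", "192.168.1.5"},
    "node9": {"10.0.0.9"},
}


def fake_resolve(name):
    return TABLE.get(name, set())


print(same_host("node0", "Jarrah.local", resolve=fake_resolve))
print(same_host("node0", "node9", resolve=fake_resolve))
```

Treating any shared address as a match handles multi-homed nodes, at the cost of a false match if two distinct hosts ever resolve to the same address.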
>>>>>>>>>>>>>>>>>>>> On Sep 19, 2008, at 7:18 AM, Greg Watson wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Ralph,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The problem we're seeing is just with the head
>>>>>>>>>>>>>>>>>>>>> node. If I specify a particular IP address for
>>>>>>>>>>>>>>>>>>>>> the head node in the hostfile, it gets changed
>>>>>>>>>>>>>>>>>>>>> to the FQDN when displayed in the map. This is a
>>>>>>>>>>>>>>>>>>>>> problem for us as we need to be able to match
>>>>>>>>>>>>>>>>>>>>> the two, and since we're not necessarily running
>>>>>>>>>>>>>>>>>>>>> on the head node, we can't always do the same
>>>>>>>>>>>>>>>>>>>>> resolution you're doing.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Would it be possible to use the same address
>>>>>>>>>>>>>>>>>>>>> that is specified in the hostfile, or
>>>>>>>>>>>>>>>>>>>>> alternatively provide an XML attribute that
>>>>>>>>>>>>>>>>>>>>> contains this information?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Not in that regard, depending upon what you
>>>>>>>>>>>>>>>>>>>>>> mean by "recently". The only changes I am aware
>>>>>>>>>>>>>>>>>>>>>> of wrt nodes consisted of some changes to the
>>>>>>>>>>>>>>>>>>>>>> order in which we use the nodes when specified
>>>>>>>>>>>>>>>>>>>>>> by hostfile or -host, and a little #if
>>>>>>>>>>>>>>>>>>>>>> protectionism needed by Brian for the Cray port.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Are you seeing this for every node? Reason I
>>>>>>>>>>>>>>>>>>>>>> ask: I can't offhand think of anything in the
>>>>>>>>>>>>>>>>>>>>>> code base that would replace a host name with
>>>>>>>>>>>>>>>>>>>>>> the FQDN because we don't get that info for
>>>>>>>>>>>>>>>>>>>>>> remote nodes. The only exception is the head
>>>>>>>>>>>>>>>>>>>>>> node (where mpirun sits) - in that lone case,
>>>>>>>>>>>>>>>>>>>>>> we default to the name returned to us by
>>>>>>>>>>>>>>>>>>>>>> gethostname(). We do that because the head node
>>>>>>>>>>>>>>>>>>>>>> is frequently accessible on a more global basis
>>>>>>>>>>>>>>>>>>>>>> than the compute nodes - thus, the FQDN is
>>>>>>>>>>>>>>>>>>>>>> required to ensure that there is no address
>>>>>>>>>>>>>>>>>>>>>> confusion on the network.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> If the user refers to compute nodes in a
>>>>>>>>>>>>>>>>>>>>>> hostfile or -host (or in an allocation from a
>>>>>>>>>>>>>>>>>>>>>> resource manager) by non-FQDN, we just assume
>>>>>>>>>>>>>>>>>>>>>> they know what they are doing and the name will
>>>>>>>>>>>>>>>>>>>>>> correctly resolve to a unique address.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sep 10, 2008, at 9:45 AM, Greg Watson wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Has the behavior of the -display-map option changed
>>>>>>>>>>>>>>>>>>>>>>> recently in the 1.3 branch? We're now seeing the host name
>>>>>>>>>>>>>>>>>>>>>>> as a fully resolved DN rather than the entry
>>>>>>>>>>>>>>>>>>>>>>> that was specified in the hostfile. Is there
>>>>>>>>>>>>>>>>>>>>>>> any particular reason for this? If so, would
>>>>>>>>>>>>>>>>>>>>>>> it be possible to add the hostfile entry to
>>>>>>>>>>>>>>>>>>>>>>> the output since we need to be able to match
>>>>>>>>>>>>>>>>>>>>>>> the two?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>>>> devel mailing list
>>>>>>>>>>>>>>>>>>>>>>> devel_at_[hidden]
>>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel