Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] -display-map
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-01-20 12:00:14


I'm embarrassed to admit that I never actually implemented the xml
option for tag-output...this has been rectified with r20302.

Let me know if that works for you - sorry for the confusion.

Ralph

On Jan 20, 2009, at 8:08 AM, Greg Watson wrote:

> Ralph,
>
> The encapsulation is not quite right yet. I'm seeing this:
>
> [1,0]<stdout>n = 0
> [1,1]<stdout>n = 0
>
> but it should be:
>
> <stdout rank="0">n = 0</stdout>
> <stdout rank="1">n = 0</stdout>
>
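A minimal Python sketch of the rewrite Greg describes. It assumes the `--tag-output` prefix always has the form `[job,rank]<stream>` seen in the two sample lines, and it handles only `stdout` and `stderr`. The function name `tagged_to_xml` is hypothetical and not part of Open MPI. It also escapes `&`, `<` and `>` in the program's own text, which the target format needs in order to stay well-formed.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumed prefix grammar, based only on the sample lines: "[job,rank]<stream>text".
TAG_RE = re.compile(r"^\[(\d+),(\d+)\]<(stdout|stderr)>(.*)$")


def tagged_to_xml(line: str) -> str:
    """Rewrite one tagged line as <stream rank="N">text</stream>.

    Lines that do not carry a recognizable tag are returned unchanged.
    """
    match = TAG_RE.match(line)
    if match is None:
        return line
    _job, rank, stream, text = match.groups()
    # escape() turns &, < and > in the program output into entities.
    return f"<{stream} rank={quoteattr(rank)}>{escape(text)}</{stream}>"


for raw in ["[1,0]<stdout>n = 0", "[1,1]<stdout>n = 0"]:
    print(tagged_to_xml(raw))
```

When run on the two sample lines, it prints the two `<stdout rank=...>` elements Greg asks for.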
> Thanks,
>
> Greg
>
> On Jan 20, 2009, at 9:20 AM, Ralph Castain wrote:
>
>> You need to add --tag-output - this is a separate option as it
>> applies both to xml and non-xml situations.
>>
>> If you like, I can force tag-output "on" by default whenever -xml
>> is specified.
>>
>> Ralph
>>
>>
>> On Jan 16, 2009, at 12:52 PM, Greg Watson wrote:
>>
>>> Ralph,
>>>
>>> Is there something I need to do to enable stdout/err encapsulation
>>> (apart from -xml)? Here's what I see:
>>>
>>> $ mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map -np
>>> 5 /Users/greg/Documents/workspace1/testMPI/Debug/testMPI
>>> <map>
>>> <host name="Jarrah.local" slots="8" max_slots="0">
>>> <noderesolve resolved="node0"/>
>>> <noderesolve resolved="node1"/>
>>> <noderesolve resolved="node2"/>
>>> <noderesolve resolved="node3"/>
>>> <noderesolve resolved="node4"/>
>>> <noderesolve resolved="node5"/>
>>> <noderesolve resolved="node6"/>
>>> <noderesolve resolved="node7"/>
>>> <process rank="0"/>
>>> <process rank="1"/>
>>> <process rank="2"/>
>>> <process rank="3"/>
>>> <process rank="4"/>
>>> </host>
>>> </map>
>>> n = 0
>>> n = 0
>>> n = 0
>>> n = 0
>>> n = 0
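Until stdout is tagged, a client has to separate the map from the untagged program output on its own. The following is a minimal Python sketch. It assumes, as in the run above, that the map is printed as one `<map>...</map>` block and that the program itself never prints the string `<map>`. The helper `split_map_output` is hypothetical.

```python
import xml.etree.ElementTree as ET


def split_map_output(output: str):
    """Return (map_element_or_None, list_of_non_blank_remaining_lines)."""
    start = output.find("<map>")
    end = output.find("</map>", start)
    if start == -1 or end == -1:
        return None, [line for line in output.splitlines() if line.strip()]
    end += len("</map>")
    map_elem = ET.fromstring(output[start:end])
    # Whatever surrounds the map is treated as plain program output.
    remaining = (output[:start] + output[end:]).splitlines()
    return map_elem, [line for line in remaining if line.strip()]


sample = """<map>
<host name="Jarrah.local" slots="8" max_slots="0">
<noderesolve resolved="node0"/>
<process rank="0"/>
<process rank="1"/>
</host>
</map>
n = 0
n = 0
"""
map_elem, program_lines = split_map_output(sample)
print([p.get("rank") for p in map_elem.iter("process")], program_lines)
```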
>>>
>>> On Jan 15, 2009, at 1:13 PM, Ralph Castain wrote:
>>>
>>>> Okay, it is in the trunk as of r20284 - I'll file the request to
>>>> have it moved to 1.3.1.
>>>>
>>>> Let me know if you get a chance to test the stdout/err stuff in
>>>> the trunk - we should try and iterate it so any changes can make
>>>> 1.3.1 as well.
>>>>
>>>> Thanks!
>>>> Ralph
>>>>
>>>>
>>>> On Jan 15, 2009, at 11:03 AM, Greg Watson wrote:
>>>>
>>>>> Ralph,
>>>>>
>>>>> I think the second form would be ideal and would simplify things
>>>>> greatly.
>>>>>
>>>>> Greg
>>>>>
>>>>> On Jan 15, 2009, at 10:53 AM, Ralph Castain wrote:
>>>>>
>>>>>> Here is what I was able to do - note that the resolve messages
>>>>>> are associated with the specific hostname, not the overall map:
>>>>>>
>>>>>> <map>
>>>>>> <host name="graywolf54.lanl.gov" slots="1" max_slots="0">
>>>>>> <noderesolve name="graywolf54.lanl.gov" resolved="localhost"/>
>>>>>> <process rank="0"/>
>>>>>> <process rank="1"/>
>>>>>> <process rank="2"/>
>>>>>> </host>
>>>>>> </map>
>>>>>>
>>>>>> Will that work for you? If you like, I can remove the name=
>>>>>> field from the noderesolve element since the info is specific
>>>>>> to the host element that contains it. In other words, I can
>>>>>> make it look like this:
>>>>>>
>>>>>> <map>
>>>>>> <host name="graywolf54.lanl.gov" slots="1" max_slots="0">
>>>>>> <noderesolve resolved="localhost"/>
>>>>>> <process rank="0"/>
>>>>>> <process rank="1"/>
>>>>>> <process rank="2"/>
>>>>>> </host>
>>>>>> </map>
>>>>>>
>>>>>> if that would help.
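With `noderesolve` nested inside `host`, a client can build a per-host table in a single pass. The sketch below uses Python's standard `xml.etree.ElementTree` and assumes the element and attribute names shown above (`host`, `noderesolve`, `process`, `name`, `slots`, `resolved`, `rank`). `HostInfo` and `parse_map` are illustrative names only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HostInfo:
    slots: int
    resolved: List[str] = field(default_factory=list)
    ranks: List[int] = field(default_factory=list)


def parse_map(xml_text: str) -> Dict[str, HostInfo]:
    """Map each <host name=...> to its slots, resolved names and process ranks."""
    root = ET.fromstring(xml_text)
    hosts: Dict[str, HostInfo] = {}
    for host in root.iter("host"):
        hosts[host.get("name")] = HostInfo(
            slots=int(host.get("slots", "0")),
            # findall() only looks at direct children, i.e. this host's entries.
            resolved=[nr.get("resolved") for nr in host.findall("noderesolve")],
            ranks=[int(p.get("rank")) for p in host.findall("process")],
        )
    return hosts


hosts = parse_map(
    '<map><host name="graywolf54.lanl.gov" slots="1" max_slots="0">'
    '<noderesolve resolved="localhost"/>'
    '<process rank="0"/><process rank="1"/><process rank="2"/>'
    '</host></map>'
)
print(hosts)
```

Because each resolved name lives inside its host element, the second form needs no `name=` attribute to know which host it belongs to.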
>>>>>>
>>>>>> Ralph
>>>>>>
>>>>>>
>>>>>> On Jan 14, 2009, at 7:57 AM, Ralph Castain wrote:
>>>>>>
>>>>>>> We -may- be able to do a more formal XML output at some point.
>>>>>>> The problem will be the natural interleaving of stdout/err
>>>>>>> from the various procs due to the async behavior of MPI.
>>>>>>> Mpirun receives fragmented output in the forwarding system,
>>>>>>> limited by the buffer sizes and the amount of data we can read
>>>>>>> at any one "bite" from the pipes connecting us to the procs.
>>>>>>> So even though the user -thinks- they output a single large
>>>>>>> line of stuff, it may show up at mpirun as a series of
>>>>>>> fragments. Hence, it gets tricky to know how to put
>>>>>>> appropriate XML brackets around it.
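The fragmentation problem can be illustrated with a per-(rank, stream) line buffer. Fragments are accumulated, and only complete lines are wrapped in XML, so a line that arrives split across several pipe reads still ends up in one element. This is a hypothetical Python sketch of the idea, not ORTE's I/O forwarding code. The element layout follows the `<stdout rank="N">` format requested at the top of this thread.

```python
from collections import defaultdict
from typing import Dict, List, Tuple
from xml.sax.saxutils import escape


class LineAssembler:
    """Hypothetical per-(rank, stream) buffer that only emits complete lines."""

    def __init__(self) -> None:
        self._partial: Dict[Tuple[int, str], str] = defaultdict(str)

    @staticmethod
    def _wrap(rank: int, stream: str, text: str) -> str:
        return f'<{stream} rank="{rank}">{escape(text)}</{stream}>'

    def feed(self, rank: int, stream: str, fragment: str) -> List[str]:
        """Add one fragment as read from a pipe; return any lines it completes."""
        key = (rank, stream)
        lines = (self._partial[key] + fragment).split("\n")
        # The last piece has no newline yet, so keep it until more data arrives.
        self._partial[key] = lines.pop()
        return [self._wrap(rank, stream, line) for line in lines]

    def flush(self) -> List[str]:
        """At job end, emit any trailing text that never got a newline."""
        out = [self._wrap(r, s, t) for (r, s), t in self._partial.items() if t]
        self._partial.clear()
        return out


asm = LineAssembler()
# Rank 0's line "n = 0" arrives in two pieces, interleaved with rank 1's output.
for rank, frag in [(0, "n = "), (1, "n = 0\n"), (0, "0\nx")]:
    for element in asm.feed(rank, "stdout", frag):
        print(element)
for element in asm.flush():
    print(element)
```

The design choice here is to delay output until a newline arrives. The trade-off is that a process printing a very long line without a newline is held back until it finishes the line or the job ends.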
>>>>>>>
>>>>>>> Given this input about when you actually want resolved name
>>>>>>> info, I can at least do something about that area. Won't be in
>>>>>>> 1.3.0, but should make 1.3.1.
>>>>>>>
>>>>>>> As for XML-tagged stdout/err: the OMPI community asked me not
>>>>>>> to turn that feature "on" for 1.3.0 as they felt it hasn't
>>>>>>> been adequately tested yet. The code is present, but cannot be
>>>>>>> activated in 1.3.0. However, I believe it is activated on the
>>>>>>> trunk when you do --xml --tag-output, so perhaps some
>>>>>>> testing will help us debug and validate it adequately for 1.3.1?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ralph
>>>>>>>
>>>>>>>
>>>>>>> On Jan 14, 2009, at 7:02 AM, Greg Watson wrote:
>>>>>>>
>>>>>>>> Ralph,
>>>>>>>>
>>>>>>>> The only time we use the resolved names is when we get a map,
>>>>>>>> so we consider them part of the map output.
>>>>>>>>
>>>>>>>> If quasi-XML is all that will ever be possible with 1.3, then
>>>>>>>> you may as well leave as-is and we will attempt to clean it
>>>>>>>> up in Eclipse. It would be nice if a future version of ompi
>>>>>>>> could output correct XML (including stdout) as this would
>>>>>>>> vastly simplify the parsing we need to do.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Greg
>>>>>>>>
>>>>>>>> On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote:
>>>>>>>>
>>>>>>>>> Hmmm...well, I can't do either for 1.3.0 as it is departing
>>>>>>>>> this afternoon.
>>>>>>>>>
>>>>>>>>> The first option would be very hard to do. I would have to
>>>>>>>>> expose the display-map option across the code base and check
>>>>>>>>> it prior to printing anything about resolving node names. I
>>>>>>>>> guess I should ask: do you only want noderesolve statements
>>>>>>>>> when we are displaying the map? Right now, I will output
>>>>>>>>> them regardless.
>>>>>>>>>
>>>>>>>>> The second option could be done. I could check if any
>>>>>>>>> "display" option has been specified, and output the <ompi>
>>>>>>>>> root at that time (likewise for the end). Anything we output
>>>>>>>>> in-between would be encapsulated between the two, but that
>>>>>>>>> would include any user output to stdout and/or stderr -
>>>>>>>>> which for 1.3.0 is not in xml.
>>>>>>>>>
>>>>>>>>> Any thoughts?
>>>>>>>>>
>>>>>>>>> Ralph
>>>>>>>>>
>>>>>>>>> PS. Guess I should clarify that I was not striving for true
>>>>>>>>> XML interaction here, but rather a quasi-XML format that
>>>>>>>>> would help you to filter the output. I have no problem
>>>>>>>>> trying to get to something more formally correct, but it
>>>>>>>>> could be tricky in some places to achieve it due to the
>>>>>>>>> inherent async nature of the beast.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:
>>>>>>>>>
>>>>>>>>>> Ralph,
>>>>>>>>>>
>>>>>>>>>> The XML is looking better now, but there is still one
>>>>>>>>>> problem. To be well-formed, there needs to be exactly one
>>>>>>>>>> root element, but currently there is none (instead there
>>>>>>>>>> are several top-level elements). So rather than:
>>>>>>>>>>
>>>>>>>>>> <noderesolve name="node0" resolved="Jarrah.local"/>
>>>>>>>>>> <noderesolve name="node1" resolved="Jarrah.local"/>
>>>>>>>>>> <map>
>>>>>>>>>> <host name="Jarrah.local" slots="8" max_slots="0">
>>>>>>>>>> <process rank="0"/>
>>>>>>>>>> <process rank="1"/>
>>>>>>>>>> <process rank="2"/>
>>>>>>>>>> <process rank="3"/>
>>>>>>>>>> <process rank="4"/>
>>>>>>>>>> </host>
>>>>>>>>>> </map>
>>>>>>>>>>
>>>>>>>>>> the XML should be:
>>>>>>>>>>
>>>>>>>>>> <map>
>>>>>>>>>> <noderesolve name="node0" resolved="Jarrah.local"/>
>>>>>>>>>> <noderesolve name="node1" resolved="Jarrah.local"/>
>>>>>>>>>> <host name="Jarrah.local" slots="8" max_slots="0">
>>>>>>>>>> <process rank="0"/>
>>>>>>>>>> <process rank="1"/>
>>>>>>>>>> <process rank="2"/>
>>>>>>>>>> <process rank="3"/>
>>>>>>>>>> <process rank="4"/>
>>>>>>>>>> </host>
>>>>>>>>>> </map>
>>>>>>>>>>
>>>>>>>>>> or:
>>>>>>>>>>
>>>>>>>>>> <ompi>
>>>>>>>>>> <noderesolve name="node0" resolved="Jarrah.local"/>
>>>>>>>>>> <noderesolve name="node1" resolved="Jarrah.local"/>
>>>>>>>>>> <map>
>>>>>>>>>> <host name="Jarrah.local" slots="8" max_slots="0">
>>>>>>>>>> <process rank="0"/>
>>>>>>>>>> <process rank="1"/>
>>>>>>>>>> <process rank="2"/>
>>>>>>>>>> <process rank="3"/>
>>>>>>>>>> <process rank="4"/>
>>>>>>>>>> </host>
>>>>>>>>>> </map>
>>>>>>>>>> </ompi>
>>>>>>>>>>
>>>>>>>>>> Would either of these be possible?
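A quick way to check whether a given output meets the single-root rule is to pass it to an XML parser. The following Python sketch uses `xml.etree.ElementTree`, and the helper names are hypothetical. The current output, with its two top-level elements, is rejected. Wrapping it in an `<ompi>` root lets it parse. An element left unclosed is also rejected, for example a `<noderesolve ...>` that ends in `>` instead of `/>`.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if `text` parses as a single XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def wrap_in_root(fragments: str, root: str = "ompi") -> str:
    """Enclose a sequence of top-level elements in one root element."""
    return f"<{root}>\n{fragments}\n</{root}>"


current = """<noderesolve name="node0" resolved="Jarrah.local"/>
<map>
<host name="Jarrah.local" slots="8" max_slots="0">
<process rank="0"/>
</host>
</map>"""

print(is_well_formed(current))                # False: two top-level elements
print(is_well_formed(wrap_in_root(current)))  # True: single <ompi> root
```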
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Greg
>>>>>>>>>>
>>>>>>>>>> On Dec 8, 2008, at 2:18 PM, Greg Watson wrote:
>>>>>>>>>>
>>>>>>>>>>> Ok thanks. I'll test from trunk in future.
>>>>>>>>>>>
>>>>>>>>>>> Greg
>>>>>>>>>>>
>>>>>>>>>>> On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Working its way around the CMR process now.
>>>>>>>>>>>>
>>>>>>>>>>>> Might be easier in the future if we could test/debug this
>>>>>>>>>>>> in the trunk, though. Otherwise, the CMR procedure will
>>>>>>>>>>>> fall behind and a fix might miss a release window.
>>>>>>>>>>>>
>>>>>>>>>>>> Anyway, hopefully this one will make the 1.3.0 release
>>>>>>>>>>>> cutoff.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Ralph
>>>>>>>>>>>>
>>>>>>>>>>>> On Dec 8, 2008, at 9:56 AM, Greg Watson wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Ralph,
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is now in 1.3rc2, thanks. However there are a
>>>>>>>>>>>>> couple of problems. Here is what I see:
>>>>>>>>>>>>>
>>>>>>>>>>>>> [Jarrah.watson.ibm.com:58957] <noderesolve name="node0"
>>>>>>>>>>>>> resolved="Jarrah.watson.ibm.com">
>>>>>>>>>>>>>
>>>>>>>>>>>>> For some reason each line is prefixed with "[...]", any
>>>>>>>>>>>>> idea why this is? Also the end tag should be "/>" not ">".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Great, thanks. I'll take a look once it comes over to
>>>>>>>>>>>>>> 1.3.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Nov 24, 2008, at 2:59 PM, Ralph Castain wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yo Greg
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is in the trunk as of r20032. I'll bring it over
>>>>>>>>>>>>>>> to 1.3 in a few days.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I implemented it as another MCA param
>>>>>>>>>>>>>>> "orte_show_resolved_nodenames" so you can actually get
>>>>>>>>>>>>>>> the info as you execute the job, if you want. The xml
>>>>>>>>>>>>>>> tag is "noderesolve" - let me know if you need any
>>>>>>>>>>>>>>> changes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Oct 22, 2008, at 11:55 AM, Greg Watson wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ralph,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I guess the issue for us is that we will have to run
>>>>>>>>>>>>>>>> two commands to get the information we need. One to
>>>>>>>>>>>>>>>> get the configuration information, such as version
>>>>>>>>>>>>>>>> and MCA parameters, and one to get the host
>>>>>>>>>>>>>>>> information, whereas it would seem more logical that
>>>>>>>>>>>>>>>> this should all be available via some kind of
>>>>>>>>>>>>>>>> "configuration discovery" command. I understand the
>>>>>>>>>>>>>>>> issue with supplying the hostfile though, so maybe
>>>>>>>>>>>>>>>> this just points at the need for us to separate
>>>>>>>>>>>>>>>> configuration information from the host information.
>>>>>>>>>>>>>>>> In any case, we'll work with what you think is best.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hmmm...just to be sure we are all clear on this. The
>>>>>>>>>>>>>>>>> reason we proposed to use mpirun is that "hostfile"
>>>>>>>>>>>>>>>>> has no meaning outside of mpirun. That's why
>>>>>>>>>>>>>>>>> ompi_info can't do anything in this regard.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We have no idea what hostfile the user may specify
>>>>>>>>>>>>>>>>> until we actually get the mpirun cmd line. They may
>>>>>>>>>>>>>>>>> have specified a default-hostfile, but they could
>>>>>>>>>>>>>>>>> also specify hostfiles for the individual
>>>>>>>>>>>>>>>>> app_contexts. These may or may not include the node
>>>>>>>>>>>>>>>>> upon which mpirun is executing.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So the only way to provide you with a separate
>>>>>>>>>>>>>>>>> command to get a hostfile<->nodename mapping would
>>>>>>>>>>>>>>>>> require you to provide us with the default-hostfile
>>>>>>>>>>>>>>>>> and/or hostfile cmd line options just as if you were
>>>>>>>>>>>>>>>>> issuing the mpirun cmd. We just wouldn't launch -
>>>>>>>>>>>>>>>>> but it would be the exact equivalent of doing
>>>>>>>>>>>>>>>>> "mpirun --do-not-launch".
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Am I missing something? If so, please do correct me
>>>>>>>>>>>>>>>>> - I would be happy to provide a tool if that would
>>>>>>>>>>>>>>>>> make it easier. Just not sure what that tool would do.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ralph,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It seems a little strange to be using mpirun for
>>>>>>>>>>>>>>>>>> this, but barring providing a separate command, or
>>>>>>>>>>>>>>>>>> using ompi_info, I think this would solve our
>>>>>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Sorry for delay - had to ponder this one for a while.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Jeff and I agree that adding something to
>>>>>>>>>>>>>>>>>>> ompi_info would not be a good idea. Ompi_info has
>>>>>>>>>>>>>>>>>>> no knowledge or understanding of hostfiles, and
>>>>>>>>>>>>>>>>>>> adding that capability to it would be a major
>>>>>>>>>>>>>>>>>>> distortion of its intended use.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> However, we think we can offer an alternative that
>>>>>>>>>>>>>>>>>>> might better solve the problem. Remember, we now
>>>>>>>>>>>>>>>>>>> treat hostfiles in a very different manner than
>>>>>>>>>>>>>>>>>>> before - see the wiki page for a complete
>>>>>>>>>>>>>>>>>>> description, or "man orte_hosts".
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So the problem is that, to provide you with what
>>>>>>>>>>>>>>>>>>> you want, we need to "dump" the information from
>>>>>>>>>>>>>>>>>>> whatever default-hostfile was provided, and, if no
>>>>>>>>>>>>>>>>>>> default-hostfile was provided, then the
>>>>>>>>>>>>>>>>>>> information from each hostfile that was provided
>>>>>>>>>>>>>>>>>>> with an app_context.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The best way we could think of to do this is to
>>>>>>>>>>>>>>>>>>> add another mpirun cmd line option, --dump-hostfiles,
>>>>>>>>>>>>>>>>>>> that would output the line-by-line name
>>>>>>>>>>>>>>>>>>> from the hostfile plus the name we resolved it to.
>>>>>>>>>>>>>>>>>>> Of course, --xml would cause it to be in xml format.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Would that meet your needs?
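The proposed option is described here only in prose, so the following is just a Python sketch of what such a dump could look like. It assumes the common hostfile layout of one host name per line, followed by optional `slots=`/`max_slots=` fields, with `#` starting a comment. It uses `socket.gethostbyname` as a stand-in resolver and does not reproduce ORTE's own resolution rules. The `noderesolve` element layout is borrowed from the later messages in this thread, and the function names are hypothetical.

```python
import socket
from typing import Callable, Iterable, Iterator
from xml.sax.saxutils import quoteattr


def hostfile_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield the host name from each non-empty, non-comment hostfile line."""
    for line in lines:
        entry = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if entry:
            yield entry.split()[0]              # ignore slots=/max_slots= fields


def dump_hostfile(lines: Iterable[str],
                  resolve: Callable[[str], str] = socket.gethostbyname) -> Iterator[str]:
    """Yield one <noderesolve> element per hostfile entry."""
    for name in hostfile_names(lines):
        try:
            resolved = resolve(name)
        except OSError:       # socket.gaierror is a subclass of OSError
            resolved = name   # keep unresolvable names as written
        yield f"<noderesolve name={quoteattr(name)} resolved={quoteattr(resolved)}/>"
```

To use it, pass an open hostfile, e.g. `for e in dump_hostfile(open(path)): print(e)`. The `resolve` parameter lets a caller swap in a different lookup, which matters because, as the thread points out, the result depends on which machine performs the resolution.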
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Ralph,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We've been discussing this back and forth a bit
>>>>>>>>>>>>>>>>>>>> internally and don't really see an easy solution.
>>>>>>>>>>>>>>>>>>>> Our problem is that Eclipse is not running on the
>>>>>>>>>>>>>>>>>>>> head node, so gethostbyname will not necessarily
>>>>>>>>>>>>>>>>>>>> resolve to the same address. For example, the
>>>>>>>>>>>>>>>>>>>> hostfile might refer to the head node by an
>>>>>>>>>>>>>>>>>>>> internal network address that is not visible to
>>>>>>>>>>>>>>>>>>>> the outside world. Since gethostbyname also looks
>>>>>>>>>>>>>>>>>>>> in /etc/hosts, it may resolve locally but not on
>>>>>>>>>>>>>>>>>>>> a remote system. The only thing I can think of
>>>>>>>>>>>>>>>>>>>> would be, rather than us reading the hostfile
>>>>>>>>>>>>>>>>>>>> directly as we do now, to provide an option to
>>>>>>>>>>>>>>>>>>>> ompi_info that would dump the hostfile using the
>>>>>>>>>>>>>>>>>>>> same rules that you apply when you're using the
>>>>>>>>>>>>>>>>>>>> hostfile. Would that be feasible?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Sorry for delay - was on vacation and am now
>>>>>>>>>>>>>>>>>>>>> trying to work my way back to the surface.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm not sure I can fix this one for two reasons:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1. In general, OMPI doesn't really care what
>>>>>>>>>>>>>>>>>>>>> name is used for the node. However, the problem
>>>>>>>>>>>>>>>>>>>>> is that it needs to be consistent. In this case,
>>>>>>>>>>>>>>>>>>>>> ORTE has already used the name returned by
>>>>>>>>>>>>>>>>>>>>> gethostname to create its session directory
>>>>>>>>>>>>>>>>>>>>> structure long before mpirun reads a hostfile.
>>>>>>>>>>>>>>>>>>>>> This is why we retain the value from gethostname
>>>>>>>>>>>>>>>>>>>>> instead of allowing it to be overwritten by the
>>>>>>>>>>>>>>>>>>>>> name in whatever allocation we are given. Using
>>>>>>>>>>>>>>>>>>>>> the name in hostfile would require that I either
>>>>>>>>>>>>>>>>>>>>> find some way to remember any prior name, or
>>>>>>>>>>>>>>>>>>>>> that I tear down and rebuild the session
>>>>>>>>>>>>>>>>>>>>> directory tree - neither seems attractive nor
>>>>>>>>>>>>>>>>>>>>> simple (e.g., what happens when the user
>>>>>>>>>>>>>>>>>>>>> provides multiple entries in the hostfile for
>>>>>>>>>>>>>>>>>>>>> the node, each with a different IP address based
>>>>>>>>>>>>>>>>>>>>> on another interface in that node? Sounds crazy,
>>>>>>>>>>>>>>>>>>>>> but we have already seen it done - which one do
>>>>>>>>>>>>>>>>>>>>> I use?).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2. We don't actually store the hostfile info
>>>>>>>>>>>>>>>>>>>>> anywhere - we just use it and forget it. For us
>>>>>>>>>>>>>>>>>>>>> to add an XML attribute containing any
>>>>>>>>>>>>>>>>>>>>> hostfile-related info would therefore require us
>>>>>>>>>>>>>>>>>>>>> to re-read the hostfile. I could have it do that
>>>>>>>>>>>>>>>>>>>>> -only- in the case of "XML output required", but
>>>>>>>>>>>>>>>>>>>>> it seems rather ugly.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> An alternative might be for you to simply do a
>>>>>>>>>>>>>>>>>>>>> "gethostbyname" lookup of the IP address or
>>>>>>>>>>>>>>>>>>>>> hostname to see if it matches instead of just
>>>>>>>>>>>>>>>>>>>>> doing a strcmp. This is what we have to do
>>>>>>>>>>>>>>>>>>>>> internally as we frequently have problems with
>>>>>>>>>>>>>>>>>>>>> FQDN vs. non-FQDN vs. IP addresses etc. If the
>>>>>>>>>>>>>>>>>>>>> local OS hasn't cached the IP address for the
>>>>>>>>>>>>>>>>>>>>> node in question it can take a little time to
>>>>>>>>>>>>>>>>>>>>> DNS resolve it, but otherwise works fine.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I can point you to the code in OPAL that we use
>>>>>>>>>>>>>>>>>>>>> - I would think something similar would be easy
>>>>>>>>>>>>>>>>>>>>> to implement in your code and would readily
>>>>>>>>>>>>>>>>>>>>> solve the problem.
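The following is a minimal Python sketch of the comparison Ralph suggests. It resolves both names and treats them as the same node if their address sets overlap, instead of comparing the strings. Because it uses `socket.getaddrinfo`, it shares the limitation Greg raised: the lookup runs on the machine performing the check, which may not see the cluster's internal names. This is not the OPAL code mentioned above, and the function names are hypothetical.

```python
import socket
from typing import Set


def resolve_addresses(host: str) -> Set[str]:
    """Every IP address `host` resolves to on this machine (IP literals map to themselves)."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def same_host(a: str, b: str) -> bool:
    """Treat two names as the same node if they share at least one address."""
    if a == b:
        return True
    try:
        return bool(resolve_addresses(a) & resolve_addresses(b))
    except socket.gaierror:
        # At least one name does not resolve here; fall back to "different".
        return False
```

For example, `same_host("Jarrah.local", "10.0.0.5")` would return True only if, on the machine running it, the name resolves to that address.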
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sep 19, 2008, at 7:18 AM, Greg Watson wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Ralph,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The problem we're seeing is just with the head
>>>>>>>>>>>>>>>>>>>>>> node. If I specify a particular IP address for
>>>>>>>>>>>>>>>>>>>>>> the head node in the hostfile, it gets changed
>>>>>>>>>>>>>>>>>>>>>> to the FQDN when displayed in the map. This is
>>>>>>>>>>>>>>>>>>>>>> a problem for us as we need to be able to match
>>>>>>>>>>>>>>>>>>>>>> the two, and since we're not necessarily
>>>>>>>>>>>>>>>>>>>>>> running on the head node, we can't always do
>>>>>>>>>>>>>>>>>>>>>> the same resolution you're doing.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Would it be possible to use the same address
>>>>>>>>>>>>>>>>>>>>>> that is specified in the hostfile, or
>>>>>>>>>>>>>>>>>>>>>> alternatively provide an XML attribute that
>>>>>>>>>>>>>>>>>>>>>> contains this information?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Not in that regard, depending upon what you
>>>>>>>>>>>>>>>>>>>>>>> mean by "recently". The only changes I am
>>>>>>>>>>>>>>>>>>>>>>> aware of wrt nodes consisted of some changes
>>>>>>>>>>>>>>>>>>>>>>> to the order in which we use the nodes when
>>>>>>>>>>>>>>>>>>>>>>> specified by hostfile or -host, and a little
>>>>>>>>>>>>>>>>>>>>>>> #if protectionism needed by Brian for the Cray
>>>>>>>>>>>>>>>>>>>>>>> port.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Are you seeing this for every node? Reason I
>>>>>>>>>>>>>>>>>>>>>>> ask: I can't offhand think of anything in the
>>>>>>>>>>>>>>>>>>>>>>> code base that would replace a host name with
>>>>>>>>>>>>>>>>>>>>>>> the FQDN because we don't get that info for
>>>>>>>>>>>>>>>>>>>>>>> remote nodes. The only exception is the head
>>>>>>>>>>>>>>>>>>>>>>> node (where mpirun sits) - in that lone case,
>>>>>>>>>>>>>>>>>>>>>>> we default to the name returned to us by
>>>>>>>>>>>>>>>>>>>>>>> gethostname(). We do that because the head
>>>>>>>>>>>>>>>>>>>>>>> node is frequently accessible on a more global
>>>>>>>>>>>>>>>>>>>>>>> basis than the compute nodes - thus, the FQDN
>>>>>>>>>>>>>>>>>>>>>>> is required to ensure that there is no address
>>>>>>>>>>>>>>>>>>>>>>> confusion on the network.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> If the user refers to compute nodes in a
>>>>>>>>>>>>>>>>>>>>>>> hostfile or -host (or in an allocation from a
>>>>>>>>>>>>>>>>>>>>>>> resource manager) by non-FQDN, we just assume
>>>>>>>>>>>>>>>>>>>>>>> they know what they are doing and the name
>>>>>>>>>>>>>>>>>>>>>>> will correctly resolve to a unique address.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sep 10, 2008, at 9:45 AM, Greg Watson wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Has the behavior of the -display-map option
>>>>>>>>>>>>>>>>>>>>>>>> changed recently in the 1.3 branch? We're now
>>>>>>>>>>>>>>>>>>>>>>>> seeing the host name as a fully qualified domain
>>>>>>>>>>>>>>>>>>>>>>>> name (FQDN) rather than the entry that was
>>>>>>>>>>>>>>>>>>>>>>>> specified in the hostfile. Is
>>>>>>>>>>>>>>>>>>>>>>>> there any particular reason for this? If so,
>>>>>>>>>>>>>>>>>>>>>>>> would it be possible to add the hostfile
>>>>>>>>>>>>>>>>>>>>>>>> entry to the output since we need to be able
>>>>>>>>>>>>>>>>>>>>>>>> to match the two?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>>>>> devel mailing list
>>>>>>>>>>>>>>>>>>>>>>>> devel_at_[hidden]
>>>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel