Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] -display-map
From: Greg Watson (g.watson_at_[hidden])
Date: 2009-01-20 15:00:03


Looks good now. Thanks!

Greg

On Jan 20, 2009, at 12:00 PM, Ralph Castain wrote:

> I'm embarrassed to admit that I never actually implemented the xml
> option for tag-output...this has been rectified with r20302.
>
> Let me know if that works for you - sorry for confusion.
>
> Ralph
>
>
> On Jan 20, 2009, at 8:08 AM, Greg Watson wrote:
>
>> Ralph,
>>
>> The encapsulation is not quite right yet. I'm seeing this:
>>
>> [1,0]<stdout>n = 0
>> [1,1]<stdout>n = 0
>>
>> but it should be:
>>
>> <stdout rank="0">n = 0</stdout>
>> <stdout rank="1">n = 0</stdout>
>>
>> Thanks,
>>
>> Greg
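
For anyone consuming the current output in the meantime, the mapping between the two forms shown above can be sketched in a few lines of Python. This is only an illustration, not what mpirun does internally. It assumes the first number in the brackets is a job id and the second is the rank (the samples above are consistent with that, but it is an assumption), and that stderr lines are tagged the same way as stdout. The name tagged_to_xml is made up for this example.

```python
import re
from xml.sax.saxutils import escape

# Matches lines such as "[1,0]<stdout>n = 0".
# Assumption: the bracket holds "jobid,rank"; only the rank is kept.
TAGGED = re.compile(r"^\[(\d+),(\d+)\]<(stdout|stderr)>(.*)$")


def tagged_to_xml(line):
    """Convert one tagged output line to the element form requested above.

    Lines that do not match the tagged pattern are returned unchanged.
    """
    m = TAGGED.match(line)
    if m is None:
        return line
    _job, rank, stream, text = m.groups()
    # Escape &, < and > in the payload so the result stays parseable.
    return '<%s rank="%s">%s</%s>' % (stream, rank, escape(text), stream)


if __name__ == "__main__":
    for sample in ("[1,0]<stdout>n = 0", "[1,1]<stdout>n = 0"):
        print(tagged_to_xml(sample))
```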
>>
>> On Jan 20, 2009, at 9:20 AM, Ralph Castain wrote:
>>
>>> You need to add --tag-output - this is a separate option as it
>>> applies both to xml and non-xml situations.
>>>
>>> If you like, I can force tag-output "on" by default whenever -xml
>>> is specified.
>>>
>>> Ralph
>>>
>>>
>>> On Jan 16, 2009, at 12:52 PM, Greg Watson wrote:
>>>
>>>> Ralph,
>>>>
>>>> Is there something I need to do to enable stdout/err
>>>> encapsulation (apart from -xml)? Here's what I see:
>>>>
>>>> $ mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map -np 5 /Users/greg/Documents/workspace1/testMPI/Debug/testMPI
>>>> <map>
>>>> <host name="Jarrah.local" slots="8" max_slots="0">
>>>> <noderesolve resolved="node0"/>
>>>> <noderesolve resolved="node1"/>
>>>> <noderesolve resolved="node2"/>
>>>> <noderesolve resolved="node3"/>
>>>> <noderesolve resolved="node4"/>
>>>> <noderesolve resolved="node5"/>
>>>> <noderesolve resolved="node6"/>
>>>> <noderesolve resolved="node7"/>
>>>> <process rank="0"/>
>>>> <process rank="1"/>
>>>> <process rank="2"/>
>>>> <process rank="3"/>
>>>> <process rank="4"/>
>>>> </host>
>>>> </map>
>>>> n = 0
>>>> n = 0
>>>> n = 0
>>>> n = 0
>>>> n = 0
>>>>
>>>> On Jan 15, 2009, at 1:13 PM, Ralph Castain wrote:
>>>>
>>>>> Okay, it is in the trunk as of r20284 - I'll file the request to
>>>>> have it moved to 1.3.1.
>>>>>
>>>>> Let me know if you get a chance to test the stdout/err stuff in
>>>>> the trunk - we should try and iterate it so any changes can make
>>>>> 1.3.1 as well.
>>>>>
>>>>> Thanks!
>>>>> Ralph
>>>>>
>>>>>
>>>>> On Jan 15, 2009, at 11:03 AM, Greg Watson wrote:
>>>>>
>>>>>> Ralph,
>>>>>>
>>>>>> I think the second form would be ideal and would simplify
>>>>>> things greatly.
>>>>>>
>>>>>> Greg
>>>>>>
>>>>>> On Jan 15, 2009, at 10:53 AM, Ralph Castain wrote:
>>>>>>
>>>>>>> Here is what I was able to do - note that the resolve messages
>>>>>>> are associated with the specific hostname, not the overall map:
>>>>>>>
>>>>>>> <map>
>>>>>>> <host name="graywolf54.lanl.gov" slots="1" max_slots="0">
>>>>>>> <noderesolve name="graywolf54.lanl.gov" resolved="localhost"/>
>>>>>>> <process rank="0"/>
>>>>>>> <process rank="1"/>
>>>>>>> <process rank="2"/>
>>>>>>> </host>
>>>>>>> </map>
>>>>>>>
>>>>>>> Will that work for you? If you like, I can remove the name=
>>>>>>> field from the noderesolve element since the info is specific
>>>>>>> to the host element that contains it. In other words, I can
>>>>>>> make it look like this:
>>>>>>>
>>>>>>> <map>
>>>>>>> <host name="graywolf54.lanl.gov" slots="1" max_slots="0">
>>>>>>> <noderesolve resolved="localhost"/>
>>>>>>> <process rank="0"/>
>>>>>>> <process rank="1"/>
>>>>>>> <process rank="2"/>
>>>>>>> </host>
>>>>>>> </map>
>>>>>>>
>>>>>>> if that would help.
>>>>>>>
>>>>>>> Ralph
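
To illustrate why the second form is easy to consume: with noderesolve nested inside its host element, a standard XML parser can read the map directly. The following is a rough sketch using Python's xml.etree.ElementTree, with the sample map from above embedded as a string. The function name read_map and the shape of the returned dictionary are choices made for this example only.

```python
import xml.etree.ElementTree as ET

# The second form of the map shown above, embedded for illustration.
MAP_XML = """<map>
<host name="graywolf54.lanl.gov" slots="1" max_slots="0">
<noderesolve resolved="localhost"/>
<process rank="0"/>
<process rank="1"/>
<process rank="2"/>
</host>
</map>"""


def read_map(xml_text):
    """Return {host name: {"resolved": [...], "ranks": [...]}} for a <map>."""
    root = ET.fromstring(xml_text)
    hosts = {}
    for host in root.findall("host"):
        hosts[host.get("name")] = {
            # Each noderesolve belongs to the host element that contains it.
            "resolved": [n.get("resolved") for n in host.findall("noderesolve")],
            "ranks": [int(p.get("rank")) for p in host.findall("process")],
        }
    return hosts


if __name__ == "__main__":
    print(read_map(MAP_XML))
```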
>>>>>>>
>>>>>>>
>>>>>>> On Jan 14, 2009, at 7:57 AM, Ralph Castain wrote:
>>>>>>>
>>>>>>>> We -may- be able to do a more formal XML output at some
>>>>>>>> point. The problem will be the natural interleaving of
>>>>>>>> stdout/err from the various procs due to the async behavior of MPI.
>>>>>>>> Mpirun receives fragmented output in the forwarding system,
>>>>>>>> limited by the buffer sizes and the amount of data we can
>>>>>>>> read at any one "bite" from the pipes connecting us to the
>>>>>>>> procs. So even though the user -thinks- they output a single
>>>>>>>> large line of stuff, it may show up at mpirun as a series of
>>>>>>>> fragments. Hence, it gets tricky to know how to put
>>>>>>>> appropriate XML brackets around it.
>>>>>>>>
>>>>>>>> Given this input about when you actually want resolved name
>>>>>>>> info, I can at least do something about that area. Won't be
>>>>>>>> in 1.3.0, but should make 1.3.1.
>>>>>>>>
>>>>>>>> As for XML-tagged stdout/err: the OMPI community asked me not
>>>>>>>> to turn that feature "on" for 1.3.0 as they felt it hasn't
>>>>>>>> been adequately tested yet. The code is present, but cannot
>>>>>>>> be activated in 1.3.0. However, I believe it is activated on
>>>>>>>> the trunk when you do --xml --tagged-output, so perhaps some
>>>>>>>> testing will help us debug and validate it adequately for
>>>>>>>> 1.3.1?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Ralph
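
The fragmentation problem described above can be made concrete with a small sketch. One way to get well-formed elements out of fragmented output is to buffer the text per (rank, stream) and emit an element only once a full line has arrived. This is not how the forwarding system in mpirun works; it is an assumption-laden illustration, and the LineAssembler class and its feed method are hypothetical names.

```python
from xml.sax.saxutils import escape


class LineAssembler:
    """Collect output fragments per (rank, stream) and emit whole lines."""

    def __init__(self):
        # (rank, stream) -> text received so far that has no newline yet
        self._partial = {}

    def feed(self, rank, stream, fragment):
        """Add a fragment; return XML elements for any lines now complete."""
        buf = self._partial.get((rank, stream), "") + fragment
        # Everything before the last newline is complete; the rest waits.
        *complete, rest = buf.split("\n")
        self._partial[(rank, stream)] = rest
        return [
            '<%s rank="%d">%s</%s>' % (stream, rank, escape(line), stream)
            for line in complete
        ]


if __name__ == "__main__":
    asm = LineAssembler()
    # Fragments from two ranks arrive interleaved, as described above.
    for rank, frag in ((0, "n = "), (1, "n = 0\n"), (0, "0\n")):
        for element in asm.feed(rank, "stdout", frag):
            print(element)
```

The cost of this approach is latency: a rank that never writes a newline never produces an element, so a real implementation would need some flush rule at process exit.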
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jan 14, 2009, at 7:02 AM, Greg Watson wrote:
>>>>>>>>
>>>>>>>>> Ralph,
>>>>>>>>>
>>>>>>>>> The only time we use the resolved names is when we get a
>>>>>>>>> map, so we consider them part of the map output.
>>>>>>>>>
>>>>>>>>> If quasi-XML is all that will ever be possible with 1.3,
>>>>>>>>> then you may as well leave as-is and we will attempt to
>>>>>>>>> clean it up in Eclipse. It would be nice if a future version
>>>>>>>>> of ompi could output correct XML (including stdout) as this
>>>>>>>>> would vastly simplify the parsing we need to do.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Greg
>>>>>>>>>
>>>>>>>>> On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote:
>>>>>>>>>
>>>>>>>>>> Hmmm...well, I can't do either for 1.3.0 as it is departing
>>>>>>>>>> this afternoon.
>>>>>>>>>>
>>>>>>>>>> The first option would be very hard to do. I would have to
>>>>>>>>>> expose the display-map option across the code base and
>>>>>>>>>> check it prior to printing anything about resolving node
>>>>>>>>>> names. I guess I should ask: do you only want noderesolve
>>>>>>>>>> statements when we are displaying the map? Right now, I
>>>>>>>>>> will output them regardless.
>>>>>>>>>>
>>>>>>>>>> The second option could be done. I could check if any
>>>>>>>>>> "display" option has been specified, and output the <ompi>
>>>>>>>>>> root at that time (likewise for the end). Anything we
>>>>>>>>>> output in-between would be encapsulated between the two,
>>>>>>>>>> but that would include any user output to stdout and/or
>>>>>>>>>> stderr - which for 1.3.0 is not in xml.
>>>>>>>>>>
>>>>>>>>>> Any thoughts?
>>>>>>>>>>
>>>>>>>>>> Ralph
>>>>>>>>>>
>>>>>>>>>> PS. Guess I should clarify that I was not striving for true
>>>>>>>>>> XML interaction here, but rather a quasi-XML format that
>>>>>>>>>> would help you to filter the output. I have no problem
>>>>>>>>>> trying to get to something more formally correct, but it
>>>>>>>>>> could be tricky in some places to achieve it due to the
>>>>>>>>>> inherent async nature of the beast.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:
>>>>>>>>>>
>>>>>>>>>>> Ralph,
>>>>>>>>>>>
>>>>>>>>>>> The XML is looking better now, but there is still one
>>>>>>>>>>> problem. To be valid, there needs to be only one root
>>>>>>>>>>> element, but currently you don't have any (or many). So
>>>>>>>>>>> rather than:
>>>>>>>>>>>
>>>>>>>>>>> <noderesolve name="node0" resolved="Jarrah.local"/>
>>>>>>>>>>> <noderesolve name="node1" resolved="Jarrah.local"/>
>>>>>>>>>>> <map>
>>>>>>>>>>> <host name="Jarrah.local" slots="8" max_slots="0">
>>>>>>>>>>> <process rank="0"/>
>>>>>>>>>>> <process rank="1"/>
>>>>>>>>>>> <process rank="2"/>
>>>>>>>>>>> <process rank="3"/>
>>>>>>>>>>> <process rank="4"/>
>>>>>>>>>>> </host>
>>>>>>>>>>> </map>
>>>>>>>>>>>
>>>>>>>>>>> the XML should be:
>>>>>>>>>>>
>>>>>>>>>>> <map>
>>>>>>>>>>> <noderesolve name="node0" resolved="Jarrah.local"/>
>>>>>>>>>>> <noderesolve name="node1" resolved="Jarrah.local"/>
>>>>>>>>>>> <host name="Jarrah.local" slots="8" max_slots="0">
>>>>>>>>>>> <process rank="0"/>
>>>>>>>>>>> <process rank="1"/>
>>>>>>>>>>> <process rank="2"/>
>>>>>>>>>>> <process rank="3"/>
>>>>>>>>>>> <process rank="4"/>
>>>>>>>>>>> </host>
>>>>>>>>>>> </map>
>>>>>>>>>>>
>>>>>>>>>>> or:
>>>>>>>>>>>
>>>>>>>>>>> <ompi>
>>>>>>>>>>> <noderesolve name="node0" resolved="Jarrah.local"/>
>>>>>>>>>>> <noderesolve name="node1" resolved="Jarrah.local"/>
>>>>>>>>>>> <map>
>>>>>>>>>>> <host name="Jarrah.local" slots="8" max_slots="0">
>>>>>>>>>>> <process rank="0"/>
>>>>>>>>>>> <process rank="1"/>
>>>>>>>>>>> <process rank="2"/>
>>>>>>>>>>> <process rank="3"/>
>>>>>>>>>>> <process rank="4"/>
>>>>>>>>>>> </host>
>>>>>>>>>>> </map>
>>>>>>>>>>> </ompi>
>>>>>>>>>>>
>>>>>>>>>>> Would either of these be possible?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Greg
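
The single-root requirement is easy to demonstrate. A standard parser rejects the first form above because it has more than one top-level element, while wrapping the same text in one enclosing element (the "ompi" form) makes it parse. The sketch below uses Python's xml.etree.ElementTree; the helper name parse_fragments is hypothetical, and wrapping on the consumer side is only a workaround, not something mpirun does.

```python
import xml.etree.ElementTree as ET

# The first form from above: two noderesolve elements followed by a map,
# with no single enclosing root element.
FRAGMENTS = """<noderesolve name="node0" resolved="Jarrah.local"/>
<noderesolve name="node1" resolved="Jarrah.local"/>
<map>
<host name="Jarrah.local" slots="8" max_slots="0">
<process rank="0"/>
<process rank="1"/>
</host>
</map>"""


def parse_fragments(text, root_tag="ompi"):
    """Wrap multi-root output in one synthetic root element and parse it."""
    return ET.fromstring("<%s>%s</%s>" % (root_tag, text, root_tag))


if __name__ == "__main__":
    try:
        ET.fromstring(FRAGMENTS)
    except ET.ParseError as err:
        print("unwrapped text is rejected:", err)
    root = parse_fragments(FRAGMENTS)
    print([child.tag for child in root])
```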
>>>>>>>>>>>
>>>>>>>>>>> On Dec 8, 2008, at 2:18 PM, Greg Watson wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Ok thanks. I'll test from trunk in future.
>>>>>>>>>>>>
>>>>>>>>>>>> Greg
>>>>>>>>>>>>
>>>>>>>>>>>> On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Working its way around the CMR process now.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Might be easier in the future if we could test/debug
>>>>>>>>>>>>> this in the trunk, though. Otherwise, the CMR procedure
>>>>>>>>>>>>> will fall behind and a fix might miss a release window.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anyway, hopefully this one will make the 1.3.0 release
>>>>>>>>>>>>> cutoff.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Dec 8, 2008, at 9:56 AM, Greg Watson wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Ralph,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is now in 1.3rc2, thanks. However there are a
>>>>>>>>>>>>>> couple of problems. Here is what I see:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [Jarrah.watson.ibm.com:58957] <noderesolve name="node0"
>>>>>>>>>>>>>> resolved="Jarrah.watson.ibm.com">
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For some reason each line is prefixed with "[...]", any
>>>>>>>>>>>>>> idea why this is? Also the end tag should be "/>" not
>>>>>>>>>>>>>> ">".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Great, thanks. I'll take a look once it comes over to
>>>>>>>>>>>>>>> 1.3.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Nov 24, 2008, at 2:59 PM, Ralph Castain wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yo Greg
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is in the trunk as of r20032. I'll bring it over
>>>>>>>>>>>>>>>> to 1.3 in a few days.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I implemented it as another MCA param
>>>>>>>>>>>>>>>> "orte_show_resolved_nodenames" so you can actually
>>>>>>>>>>>>>>>> get the info as you execute the job, if you want. The
>>>>>>>>>>>>>>>> xml tag is "noderesolve" - let me know if you need
>>>>>>>>>>>>>>>> any changes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Oct 22, 2008, at 11:55 AM, Greg Watson wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ralph,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I guess the issue for us is that we will have to run
>>>>>>>>>>>>>>>>> two commands to get the information we need. One to
>>>>>>>>>>>>>>>>> get the configuration information, such as version
>>>>>>>>>>>>>>>>> and MCA parameters, and one to get the host
>>>>>>>>>>>>>>>>> information, whereas it would seem more logical that
>>>>>>>>>>>>>>>>> this should all be available via some kind of
>>>>>>>>>>>>>>>>> "configuration discovery" command. I understand the
>>>>>>>>>>>>>>>>> issue with supplying the hostfile though, so maybe
>>>>>>>>>>>>>>>>> this just points at the need for us to separate
>>>>>>>>>>>>>>>>> configuration information from the host information.
>>>>>>>>>>>>>>>>> In any case, we'll work with what you think is best.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hmmm...just to be sure we are all clear on this.
>>>>>>>>>>>>>>>>>> The reason we proposed to use mpirun is that
>>>>>>>>>>>>>>>>>> "hostfile" has no meaning outside of mpirun. That's
>>>>>>>>>>>>>>>>>> why ompi_info can't do anything in this regard.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We have no idea what hostfile the user may specify
>>>>>>>>>>>>>>>>>> until we actually get the mpirun cmd line. They may
>>>>>>>>>>>>>>>>>> have specified a default-hostfile, but they could
>>>>>>>>>>>>>>>>>> also specify hostfiles for the individual
>>>>>>>>>>>>>>>>>> app_contexts. These may or may not include the node
>>>>>>>>>>>>>>>>>> upon which mpirun is executing.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So the only way to provide you with a separate
>>>>>>>>>>>>>>>>>> command to get a hostfile<->nodename mapping would
>>>>>>>>>>>>>>>>>> require you to provide us with the default-hostfile
>>>>>>>>>>>>>>>>>> and/or hostfile cmd line options just as if you
>>>>>>>>>>>>>>>>>> were issuing the mpirun cmd. We just wouldn't
>>>>>>>>>>>>>>>>>> launch - but it would be the exact equivalent of
>>>>>>>>>>>>>>>>>> doing "mpirun --do-not-launch".
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am I missing something? If so, please do correct me
>>>>>>>>>>>>>>>>>> - I would be happy to provide a tool if that would
>>>>>>>>>>>>>>>>>> make it easier. Just not sure what that tool would
>>>>>>>>>>>>>>>>>> do.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Ralph,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It seems a little strange to be using mpirun for
>>>>>>>>>>>>>>>>>>> this, but barring providing a separate command, or
>>>>>>>>>>>>>>>>>>> using ompi_info, I think this would solve our
>>>>>>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Sorry for delay - had to ponder this one for
>>>>>>>>>>>>>>>>>>>> awhile.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Jeff and I agree that adding something to
>>>>>>>>>>>>>>>>>>>> ompi_info would not be a good idea. Ompi_info has
>>>>>>>>>>>>>>>>>>>> no knowledge or understanding of hostfiles, and
>>>>>>>>>>>>>>>>>>>> adding that capability to it would be a major
>>>>>>>>>>>>>>>>>>>> distortion of its intended use.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> However, we think we can offer an alternative
>>>>>>>>>>>>>>>>>>>> that might better solve the problem. Remember, we
>>>>>>>>>>>>>>>>>>>> now treat hostfiles in a very different manner
>>>>>>>>>>>>>>>>>>>> than before - see the wiki page for a complete
>>>>>>>>>>>>>>>>>>>> description, or "man orte_hosts".
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> So the problem is that, to provide you with what
>>>>>>>>>>>>>>>>>>>> you want, we need to "dump" the information from
>>>>>>>>>>>>>>>>>>>> whatever default-hostfile was provided, and, if
>>>>>>>>>>>>>>>>>>>> no default-hostfile was provided, then the
>>>>>>>>>>>>>>>>>>>> information from each hostfile that was provided
>>>>>>>>>>>>>>>>>>>> with an app_context.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The best way we could think of to do this is to
>>>>>>>>>>>>>>>>>>>> add another mpirun cmd line option --dump-hostfiles
>>>>>>>>>>>>>>>>>>>> that would output the line-by-line name
>>>>>>>>>>>>>>>>>>>> from the hostfile plus the name we resolved it
>>>>>>>>>>>>>>>>>>>> to. Of course, --xml would cause it to be in xml
>>>>>>>>>>>>>>>>>>>> format.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Would that meet your needs?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Ralph,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We've been discussing this back and forth a bit
>>>>>>>>>>>>>>>>>>>>> internally and don't really see an easy
>>>>>>>>>>>>>>>>>>>>> solution. Our problem is that Eclipse is not
>>>>>>>>>>>>>>>>>>>>> running on the head node, so gethostbyname will
>>>>>>>>>>>>>>>>>>>>> not necessarily resolve to the same address. For
>>>>>>>>>>>>>>>>>>>>> example, the hostfile might refer to the head
>>>>>>>>>>>>>>>>>>>>> node by an internal network address that is not
>>>>>>>>>>>>>>>>>>>>> visible to the outside world. Since gethostbyname
>>>>>>>>>>>>>>>>>>>>> also looks in /etc/hosts, it may resolve locally
>>>>>>>>>>>>>>>>>>>>> but not on a remote system. The only thing I can
>>>>>>>>>>>>>>>>>>>>> think of would be, rather than us reading the
>>>>>>>>>>>>>>>>>>>>> hostfile directly as we do now, to provide an
>>>>>>>>>>>>>>>>>>>>> option to ompi_info that would dump the hostfile
>>>>>>>>>>>>>>>>>>>>> using the same rules that you apply when you're
>>>>>>>>>>>>>>>>>>>>> using the hostfile. Would that be feasible?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Sorry for delay - was on vacation and am now
>>>>>>>>>>>>>>>>>>>>>> trying to work my way back to the surface.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I'm not sure I can fix this one for two reasons:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 1. In general, OMPI doesn't really care what
>>>>>>>>>>>>>>>>>>>>>> name is used for the node. However, the problem
>>>>>>>>>>>>>>>>>>>>>> is that it needs to be consistent. In this
>>>>>>>>>>>>>>>>>>>>>> case, ORTE has already used the name returned
>>>>>>>>>>>>>>>>>>>>>> by gethostname to create its session directory
>>>>>>>>>>>>>>>>>>>>>> structure long before mpirun reads a hostfile.
>>>>>>>>>>>>>>>>>>>>>> This is why we retain the value from
>>>>>>>>>>>>>>>>>>>>>> gethostname instead of allowing it to be
>>>>>>>>>>>>>>>>>>>>>> overwritten by the name in whatever allocation
>>>>>>>>>>>>>>>>>>>>>> we are given. Using the name in hostfile would
>>>>>>>>>>>>>>>>>>>>>> require that I either find some way to remember
>>>>>>>>>>>>>>>>>>>>>> any prior name, or that I tear down and rebuild
>>>>>>>>>>>>>>>>>>>>>> the session directory tree - neither seems
>>>>>>>>>>>>>>>>>>>>>> attractive nor simple (e.g., what happens when
>>>>>>>>>>>>>>>>>>>>>> the user provides multiple entries in the
>>>>>>>>>>>>>>>>>>>>>> hostfile for the node, each with a different IP
>>>>>>>>>>>>>>>>>>>>>> address based on another interface in that
>>>>>>>>>>>>>>>>>>>>>> node? Sounds crazy, but we have already seen it
>>>>>>>>>>>>>>>>>>>>>> done - which one do I use?).
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 2. We don't actually store the hostfile info
>>>>>>>>>>>>>>>>>>>>>> anywhere - we just use it and forget it. For us
>>>>>>>>>>>>>>>>>>>>>> to add an XML attribute containing any
>>>>>>>>>>>>>>>>>>>>>> hostfile-related info would therefore require us to
>>>>>>>>>>>>>>>>>>>>>> re-read the hostfile. I could have it do that
>>>>>>>>>>>>>>>>>>>>>> -only- in the case of "XML output required", but
>>>>>>>>>>>>>>>>>>>>>> it seems rather ugly.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> An alternative might be for you to simply do a
>>>>>>>>>>>>>>>>>>>>>> "gethostbyname" lookup of the IP address or
>>>>>>>>>>>>>>>>>>>>>> hostname to see if it matches instead of just
>>>>>>>>>>>>>>>>>>>>>> doing a strcmp. This is what we have to do
>>>>>>>>>>>>>>>>>>>>>> internally as we frequently have problems with
>>>>>>>>>>>>>>>>>>>>>> FQDN vs. non-FQDN vs. IP addresses etc. If the
>>>>>>>>>>>>>>>>>>>>>> local OS hasn't cached the IP address for the
>>>>>>>>>>>>>>>>>>>>>> node in question it can take a little time to
>>>>>>>>>>>>>>>>>>>>>> DNS resolve it, but otherwise works fine.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I can point you to the code in OPAL that we use
>>>>>>>>>>>>>>>>>>>>>> - I would think something similar would be easy
>>>>>>>>>>>>>>>>>>>>>> to implement in your code and would readily
>>>>>>>>>>>>>>>>>>>>>> solve the problem.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Ralph
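
The suggestion above (resolve both names and compare addresses instead of comparing strings) might look roughly like this on the client side. This is a Python sketch, not the OPAL code referred to above. The function name same_host and the injectable resolve parameter are inventions for this example. The default resolver is the standard socket.gethostbyname, which only helps when the machine doing the lookup resolves names the same way the head node does.

```python
import socket


def same_host(a, b, resolve=socket.gethostbyname):
    """Return True if names a and b refer to the same IPv4 address.

    `resolve` maps a hostname or dotted-quad string to an address string.
    It is injectable so the lookup can be replaced or tested offline.
    """
    if a == b:
        return True  # identical strings need no lookup
    try:
        return resolve(a) == resolve(b)
    except OSError:
        # socket.gaierror and socket.herror are OSError subclasses:
        # an unresolvable name is treated as "no match".
        return False


if __name__ == "__main__":
    # Hypothetical name table standing in for DNS or /etc/hosts.
    table = {"node0": "10.0.0.1", "node0.example.com": "10.0.0.1",
             "node1": "10.0.0.2"}
    print(same_host("node0", "node0.example.com", resolve=table.__getitem__))
    print(same_host("node0", "node1", resolve=table.__getitem__))
```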
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sep 19, 2008, at 7:18 AM, Greg Watson wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Ralph,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The problem we're seeing is just with the head
>>>>>>>>>>>>>>>>>>>>>>> node. If I specify a particular IP address for
>>>>>>>>>>>>>>>>>>>>>>> the head node in the hostfile, it gets changed
>>>>>>>>>>>>>>>>>>>>>>> to the FQDN when displayed in the map. This is
>>>>>>>>>>>>>>>>>>>>>>> a problem for us as we need to be able to
>>>>>>>>>>>>>>>>>>>>>>> match the two, and since we're not necessarily
>>>>>>>>>>>>>>>>>>>>>>> running on the head node, we can't always do
>>>>>>>>>>>>>>>>>>>>>>> the same resolution you're doing.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Would it be possible to use the same address
>>>>>>>>>>>>>>>>>>>>>>> that is specified in the hostfile, or
>>>>>>>>>>>>>>>>>>>>>>> alternatively provide an XML attribute that
>>>>>>>>>>>>>>>>>>>>>>> contains this information?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sep 11, 2008, at 9:06 AM, Ralph Castain
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Not in that regard, depending upon what you
>>>>>>>>>>>>>>>>>>>>>>>> mean by "recently". The only changes I am
>>>>>>>>>>>>>>>>>>>>>>>> aware of wrt nodes consisted of some changes
>>>>>>>>>>>>>>>>>>>>>>>> to the order in which we use the nodes when
>>>>>>>>>>>>>>>>>>>>>>>> specified by hostfile or -host, and a little
>>>>>>>>>>>>>>>>>>>>>>>> #if protectionism needed by Brian for the
>>>>>>>>>>>>>>>>>>>>>>>> Cray port.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Are you seeing this for every node? Reason I
>>>>>>>>>>>>>>>>>>>>>>>> ask: I can't offhand think of anything in the
>>>>>>>>>>>>>>>>>>>>>>>> code base that would replace a host name with
>>>>>>>>>>>>>>>>>>>>>>>> the FQDN because we don't get that info for
>>>>>>>>>>>>>>>>>>>>>>>> remote nodes. The only exception is the head
>>>>>>>>>>>>>>>>>>>>>>>> node (where mpirun sits) - in that lone case,
>>>>>>>>>>>>>>>>>>>>>>>> we default to the name returned to us by
>>>>>>>>>>>>>>>>>>>>>>>> gethostname(). We do that because the head
>>>>>>>>>>>>>>>>>>>>>>>> node is frequently accessible on a more
>>>>>>>>>>>>>>>>>>>>>>>> global basis than the compute nodes - thus,
>>>>>>>>>>>>>>>>>>>>>>>> the FQDN is required to ensure that there is
>>>>>>>>>>>>>>>>>>>>>>>> no address confusion on the network.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> If the user refers to compute nodes in a
>>>>>>>>>>>>>>>>>>>>>>>> hostfile or -host (or in an allocation from a
>>>>>>>>>>>>>>>>>>>>>>>> resource manager) by non-FQDN, we just assume
>>>>>>>>>>>>>>>>>>>>>>>> they know what they are doing and the name
>>>>>>>>>>>>>>>>>>>>>>>> will correctly resolve to a unique address.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Sep 10, 2008, at 9:45 AM, Greg Watson wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Has there been a change in the behavior of
>>>>>>>>>>>>>>>>>>>>>>>>> the -display-map option recently
>>>>>>>>>>>>>>>>>>>>>>>>> in the 1.3 branch? We're now seeing the host
>>>>>>>>>>>>>>>>>>>>>>>>> name as a fully resolved DN rather than the
>>>>>>>>>>>>>>>>>>>>>>>>> entry that was specified in the hostfile. Is
>>>>>>>>>>>>>>>>>>>>>>>>> there any particular reason for this? If so,
>>>>>>>>>>>>>>>>>>>>>>>>> would it be possible to add the hostfile
>>>>>>>>>>>>>>>>>>>>>>>>> entry to the output since we need to be able
>>>>>>>>>>>>>>>>>>>>>>>>> to match the two?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Greg
>>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>>>>>> devel mailing list
>>>>>>>>>>>>>>>>>>>>>>>>> devel_at_[hidden]
>>>>>>>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel