On Mar 3, 2010, at 15:04 , Jeff Squyres wrote:
> On Mar 3, 2010, at 2:06 PM, Iain Bason wrote:
>>> 1. The individual entries now behave like pseudo-regexps rather than strict matches. We used strict matching before this for a reason. If we want to allow regexp-like behavior, then I think we should enable that with special characters -- that's the customary/usual way to do it.
>> The history of this particular piece of code is that it used to use strncmp. George Bosilca changed it last summer, incidental to a larger change (r21652). The commit comment was not particularly illuminating on this issue, in my opinion:
> You're right -- it's not illuminating... :-\
>>> 2. All other <foo>_in|exclude behavior in ompi is strict matching, not prefix matching. I'm uncomfortable with the disparity.
>> That turns out not to be the case. Look in btl_tcp_proc.c/mca_btl_tcp_retrieve_local_interfaces.
I guess this is the result of different developers with different ideas working in an inconsistent way. And that is without mentioning that we do the same checking in several places, duplicating the code in a way that doesn't enforce any consistency. Anyway, now that this problem has been highlighted, we should clearly fix it.
> Mmmm... good point. I was thinking specifically of the if_in|exclude behavior in the openib BTL. That uses strcmp, not strncmp. Here's a complete list:
> ompi_info --param all all --parsable | grep include | grep :value:
> Do we know what these others do? I only checked openib_if_*clude -- it's strcmp.
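To make the disparity under discussion concrete, here is a minimal sketch of the two matching styles. The function names `if_match_strict` and `if_match_prefix` are hypothetical, chosen for illustration; this is not the code in the openib or TCP BTL, only the general strcmp-versus-strncmp distinction.

```c
#include <stdbool.h>
#include <string.h>

/* Strict matching: the entry "eth1" matches only an interface named
 * exactly "eth1". (Hypothetical helper, not an Open MPI function.) */
static bool if_match_strict(const char *entry, const char *if_name)
{
    return strcmp(entry, if_name) == 0;
}

/* Prefix matching: only the first strlen(entry) characters are compared,
 * so the entry "eth1" also matches "eth10", "eth11", and so on.
 * (Hypothetical helper, not an Open MPI function.) */
static bool if_match_prefix(const char *entry, const char *if_name)
{
    return strncmp(entry, if_name, strlen(entry)) == 0;
}
```

With these helpers, an exclude entry of "lo" would drop "lo0" and "lo1" under prefix matching but leave them alone under strict matching, which is the kind of surprise a user could hit if different parameters follow different rules.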
>>> Additionally, if loopback is now handled properly via change #2, shouldn't the default value for the btl_tcp_if_exclude parameter now be empty?
>> That's a good question. Enabling the "lo" interface results in intra-node messages being striped across that interface in addition to the others on a system. I don't know what impact that would have, if any.
> sm and self should still be prioritized above it, right? If so, we should be ok.
> However, I think you're right that the addition of striping across lo* in addition to the other interfaces might have an unknown effect.
This is not supposed to happen. The sm BTL has a high exclusivity, which will prevent the TCP BTL from being used for the same peer. But again, that was the case a while ago; there is nothing to guarantee that the code still does what it was supposed to.
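As a rough illustration of the exclusivity idea, the sketch below selects, for one peer, only the reachable BTLs that share the highest exclusivity value. The struct, the function name, and the numeric values used with it are all assumptions made for the example; it is not the actual selection code, and as noted above the real code may no longer behave this way.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a BTL module; not the real
 * Open MPI data structure. */
struct btl_sketch {
    const char *name;
    int exclusivity;   /* higher value takes precedence */
    int reaches_peer;  /* nonzero if this BTL can reach the peer */
};

/* For one peer, set selected[i] to 1 for every reachable BTL whose
 * exclusivity equals the highest exclusivity among reachable BTLs,
 * and to 0 otherwise. Returns the number of BTLs selected. */
static size_t select_btls_for_peer(const struct btl_sketch *btls, size_t n,
                                   int *selected)
{
    int best = 0;
    int found = 0;
    size_t count = 0;

    /* First pass: find the highest exclusivity among reachable BTLs. */
    for (size_t i = 0; i < n; i++) {
        if (btls[i].reaches_peer && (!found || btls[i].exclusivity > best)) {
            best = btls[i].exclusivity;
            found = 1;
        }
    }

    /* Second pass: keep only the BTLs at that exclusivity level. */
    for (size_t i = 0; i < n; i++) {
        selected[i] = found && btls[i].reaches_peer &&
                      btls[i].exclusivity == best;
        count += (size_t)selected[i];
    }
    return count;
}
```

Under this model, an on-node peer reachable by both a high-exclusivity sm entry and a lower-exclusivity tcp entry would get sm only; tcp would carry the traffic only if sm could not reach the peer.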
> Here's a random question -- if a user does not use the sm btl, would sending messages through lo for on-node communication be potentially better than sending them through a real device, given that the real device may be far away (in the NUMA sense of "far")? I.e., are OSes typically smart enough to know that loopback traffic may be able to stay local to the NUMA node, vs. sending it out to a device and back? Or are OSes smart enough to know that if both ends of a TCP socket are on the same node -- regardless of what IP interface they use -- and if both processes are in the same NUMA locality, the data can stay local and not have to make a round trip to the device?
> (I admit that this is a fairly corner case -- doing on-node communication but *not* using the sm btl...)
>>> Actually -- thinking about this a little more, does opal_net_islocalhost() guarantee to work on peer interfaces?
>> It looks to see whether the IP address is (v4) 127.0.0.1, or (v6) ::1. I believe that these values are dictated by the relevant RFCs (but I haven't looked to make sure).
> Good enough -- thanks! (I was thinking that it might be checking interfaces, not IP addrs -- so 127.x checking should be fine here)
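For reference, an address-based (rather than interface-based) loopback test can be sketched as follows. This is not the implementation of opal_net_islocalhost(); the function name `addr_is_loopback` is hypothetical, and the sketch assumes the broader "127.x" reading, i.e. it accepts any IPv4 address in 127.0.0.0/8 as well as the IPv6 address ::1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Return true if the textual address is an IPv4 address in 127.0.0.0/8
 * or the IPv6 loopback address ::1. Anything that does not parse as an
 * IP address returns false. (Hypothetical helper, not an Open MPI
 * function.) */
static bool addr_is_loopback(const char *text)
{
    struct in_addr v4;
    struct in6_addr v6;

    if (inet_pton(AF_INET, text, &v4) == 1) {
        /* s_addr is in network byte order; the top octet must be 127. */
        return (ntohl(v4.s_addr) >> 24) == 127;
    }
    if (inet_pton(AF_INET6, text, &v6) == 1) {
        return IN6_IS_ADDR_LOOPBACK(&v6);
    }
    return false;
}
```

Because the test looks only at the address, it works equally well on a peer's advertised addresses, where no local interface name is available to inspect.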
> Jeff Squyres