Yes I understand what you are doing but there is still a possible error
case I was trying to consider and your initial placement of the call
outside of the af==AF_INET check lead me to assume you were using
something other than IPv4 which is why I was asking if you had an
example. You don't and that is fine. Thanks again for the feedback, it
is appreciated. I will make a change.
On 01/07/10 18:02, Dennis Schridde wrote:
> Hello Don!
> Am Donnerstag, 7. Januar 2010 23:22:27 schrieben Sie:
>> I am assuming you are using something other than IPv4 so I am curious
>> what the string looks like when you call
>> inet_ntop(AF_INET, (void *) &btl_addr->sin_addr,
>> btl_addr_string, INET_ADDRSTRLEN);
>> when the address is not of the AF_INET family? Do you have an example
>> of this?
> The address is indeed of family AF_INET, e.g. "10.0.0.1".
> The issue in btl_udapl_proc is that it does not initialise btl_addr_string for
> every possible code path (peer_proc->proc_addr_count <= 0).
> Thus the error message sent by the nodes may contain garbage / uninitialised
> bytes. (As in our case.)
> I fixed that by initialising btl_addr_string at the earliest possible point,
> which is outside the AF_INET check and the loop over the proc addresses.
> This also prevents the string from being copied multiple times within the
> loop, which seems just unnecessary.
> You are right with your doubts that the move out of the check for AF_INET is
> correct, since inet_ntop is called for af=AF_INET. It should better be located
> inside the if block.
> The reason that peer_proc->proc_addr_count was <= 0 must have had something to
> do with uDAPL not being setup correctly on our cluster. (I didnt check for the
> exact value of peer_proc->proc_addr_count, just guessed, because the loop was
> obviously never executed.)
> We actually want to connect the nodes via IB and uDAPL was just the default
> BTL choosen by OpenMPI. We are now using "--mca btl openib,self", as we
> figured out uDAPL is not at all needed to connect the nodes via IB.
> I still found it reasonable to report the issue detected, together with a
> possible fix.
> users mailing list