
Open MPI User's Mailing List Archives



Subject: Re: [OMPI users] btl udapl leaves string uninitialised
From: Don Kerr (Don.Kerr_at_[hidden])
Date: 2010-01-07 21:18:52


Yes, I understand what you are doing, but there is still a possible error
case I was trying to consider. Your initial placement of the call
outside of the af==AF_INET check led me to assume you were using
something other than IPv4, which is why I asked whether you had an
example. You don't, and that is fine. Thanks again for the feedback; it
is appreciated. I will make a change.

-DON

On 01/07/10 18:02, Dennis Schridde wrote:
> Hello Don!
>
> On Thursday, 7 January 2010 23:22:27, you wrote:
>
>> I am assuming you are using something other than IPv4, so I am curious
>> what the string looks like when you call
>> "
>> inet_ntop(AF_INET, (void *) &btl_addr->sin_addr,
>> btl_addr_string, INET_ADDRSTRLEN);
>> "
>>
>> when the address is not of the AF_INET family? Do you have an example
>> of this?
>>
> The address is indeed of family AF_INET, e.g. "10.0.0.1".
>
> The issue in btl_udapl_proc is that it does not initialise btl_addr_string on
> every possible code path: when peer_proc->proc_addr_count <= 0, the loop that
> would fill it never runs. Thus the error message sent by the nodes may contain
> garbage (uninitialised) bytes, as happened in our case.
>
> I fixed that by initialising btl_addr_string at the earliest possible point,
> which is outside the AF_INET check and the loop over the proc addresses.
> This also prevents the string from being copied multiple times within the
> loop, which seems unnecessary.
> You are right to doubt that moving it out of the AF_INET check is correct,
> since inet_ntop is called with af=AF_INET. It would be better placed
> inside the if block.
>
> The reason that peer_proc->proc_addr_count was <= 0 must have had something to
> do with uDAPL not being set up correctly on our cluster. (I didn't check the
> exact value of peer_proc->proc_addr_count; I just guessed, because the loop was
> obviously never executed.)
>
> We actually want to connect the nodes via IB, and uDAPL was just the default
> BTL chosen by OpenMPI. We are now using "--mca btl openib,self", as we
> figured out that uDAPL is not needed at all to connect the nodes via IB.
> I still found it reasonable to report the issue detected, together with a
> possible fix.
>
> --Dennis
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
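The fix discussed in this thread can be illustrated with a minimal C sketch. It is not Open MPI's btl_udapl_proc code: the function name find_peer_match, its parameters, and the exact-IPv4-equality matching rule are hypothetical stand-ins. It shows the two points agreed on above: the address string is initialised before any branch or loop, so the error path never prints uninitialised bytes even when the peer address count is <= 0, and inet_ntop(AF_INET, ...) is called only inside the AF_INET check, once, rather than on every loop iteration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not Open MPI's API): looks for a peer address that
 * matches `local`. Returns the matching index, or -1 if none matches.
 * On return, local_str (len >= INET_ADDRSTRLEN) always holds a
 * NUL-terminated string, even if peer_count <= 0. */
int find_peer_match(const struct sockaddr_in *local,
                    const struct sockaddr_in *peers, int peer_count,
                    char *local_str, size_t len)
{
    /* Initialise first: covers the path where the loop below never runs. */
    snprintf(local_str, len, "%s", "(unknown)");

    /* Format once, outside the loop, and only for IPv4 addresses,
     * because inet_ntop is called with AF_INET here. */
    if (local->sin_family == AF_INET) {
        if (inet_ntop(AF_INET, &local->sin_addr, local_str,
                      (socklen_t)len) == NULL) {
            snprintf(local_str, len, "%s", "(unknown)");
        }
    }

    /* Hypothetical matching rule: exact IPv4 address equality. */
    for (int i = 0; i < peer_count; i++) {
        if (peers[i].sin_family == AF_INET &&
            peers[i].sin_addr.s_addr == local->sin_addr.s_addr) {
            return i;
        }
    }

    /* Safe to print: local_str was initialised above on every path. */
    fprintf(stderr, "no peer address matches local address %s\n", local_str);
    return -1;
}
```

Calling it with peer_count == 0 (the situation reported in this thread) still yields a readable error message, either the formatted IPv4 address or "(unknown)" for a non-AF_INET address.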