Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25015
From: Aurélien Bouteiller (bouteill_at_[hidden])
Date: 2011-08-08 16:16:01


I see no problem with the latest code from Wesley. The warning is silenced, and was not indicative of any buggy behavior. So for me the problem is closed.

Aurelien

Le 8 août 2011 à 15:30, Barrett, Brian W a écrit :

> 4) Don't design an interface where you're going to have the circular
> dependency. Perhaps it's a sign that the epoch is in the wrong place, or
> handled in the wrong way, or the whole naming scheme is wrong. I'm not
> sure. I've relented that it doesn't need to be fixed now (since it is
> clearly bigger than just the patch introducing the epochs), but I still
> think it's a sign we have a basic design problem with the way we're
> handling names.
>
> Brian
>
> On 8/8/11 1:23 PM, "Thomas Herault" <herault.thomas_at_[hidden]> wrote:
>
>> proc_get_epoch( proc );
>> calls ORTE_NAME_PRINT(proc)
>> when in debug mode.
>>
>> When this is the case, ORTE_NAME_PRINT prints all fields of proc,
>> including epoch. So, proc->epoch = proc_get_epoch(proc) triggers the
>> valgrind warning, because of the print, in debug mode.
>> This is the only reason why proc->epoch must be initialized before
>> calling proc->epoch = proc_get_epoch( proc );
>>
>> I agree with you it's not a good approach, for coding style / avoiding
>> this kind of discussions.
>>
>> Alternative approaches include:
>> - having proc_get_epoch have a side effect on proc->epoch instead of
>> returning the correct epoch value. Assignment can then be done before the
>> debug message. But this is inconsistent with the rest of the interface
>> used in proc_name_fns;
>> - having proc_get_epoch use another helper than ORTE_NAME_PRINT to print
>> the debug info. But this requires coding an alternative
>> ORTE_SOMETHING_PRINT function, only for this use;
>> - having proc be defined as an opal_object, and set epoch to INVALID (or
>> even UNSET) into the constructor. This could induce changes at many
>> places, and there is always the risk that some changes are left
>> incomplete.
>>
>> Thomas
>>
>> Le 8 août 2011 à 12:20, Barrett, Brian W a écrit :
>>
>>> Ok, that makes some sense. It's a really weird way of doing
>>> initialization, though. Some of it is probably how Ralph chose to share
>>> nidpid map code, so perhaps it's unavoidable. But since the name isn't
>>> use in proc_get_epoch() (or, better not be used in there), I'm not
>>> entirely sure why that made valgrind happy...
>>>
>>> Brian
>>>
>>> On 8/8/11 9:53 AM, "Wesley Bland" <wbland_at_[hidden]> wrote:
>>>
>>>> It's not just copying the value from one to the other.
>>>> The value is initialized on the first line. The proc_name is passed
>>>> into
>>>> the function where the jobid and vpid are used (not the epoch). The
>>>> function returns a new (the correct) value for the epoch based on the
>>>> passed in jobid and vpid which is assigned to the process name.
>>>>
>>>> This is the same thing as initializing the value to NULL. We do that
>>>> all
>>>> the time to avoid compiler warnings. I just can't do that here because
>>>> this isn't a pointer. If it would make the code look better I can move
>>>> the first assignment to the top of the function where the other
>>>> initializations are.
>>>>
>>>> On Mon, Aug 8, 2011 at 11:41 AM, Barrett, Brian W <bwbarre_at_[hidden]>
>>>> wrote:
>>>>
>>>> On 8/8/11 9:34 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>>>>
>>>>> On Aug 8, 2011, at 11:30 AM, Wesley Bland wrote:
>>>>>
>>>>>> The reason is because valgrind was complaining about uninitialized
>>>>>> values that were passed into proc_get_epoch. I saw the same warnings
>>>>>> from valgrind when I ran it. I added the code to initialize the
>>>>>> values
>>>>>> to what really should be the default value and the warnings went
>>>>>> away.
>>>>>> Since the process_name_t struct isn't an object, it doesn't have an
>>>>>> initialization function like so many of the other objects in the
>>>>>> code.
>>>>>> This is what we have.
>>>>>
>>>>> Ah, I see -- you are passing peer_name into proc_get_epoch(). I
>>>>> missed
>>>>> that.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>> peer_name.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> peer_name.vpid = peer_idx;
>>>>>> + peer_name.epoch = ORTE_EPOCH_INVALID;
>>>>>> peer_name.epoch = orte_ess.proc_get_epoch(&peer_name);
>>>>
>>>>
>>>> I'm still not convinced this is a rational coding style. It seems to
>>>> me
>>>> that if you're just going to copy the epoch from peer_name to
>>>> peer_name,
>>>> just assign the epoch to INVALID and be done with it. There's no need
>>>> for
>>>> both the assignment added in this patch and the line after it.
>>>>
>>>> Now, normally I'd say that this is fixing a bug and having to fix other
>>>> people's bad code shouldn't be your problem. But since you wrote both
>>>> lines, fixing it to make sense seems reasonable to me ;).
>>>>
>>>> Brian
>>>>
>>>> --
>>>> Brian W. Barrett
>>>> Dept. 1423: Scalable System Software
>>>> Sandia National Laboratories
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Brian W. Barrett
>>> Dept. 1423: Scalable System Software
>>> Sandia National Laboratories
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>
>
> --
> Brian W. Barrett
> Dept. 1423: Scalable System Software
> Sandia National Laboratories
>
>
>
>
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel