Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25015
From: Thomas Herault (herault.thomas_at_[hidden])
Date: 2011-08-08 15:23:41


proc_get_epoch( proc );
calls ORTE_NAME_PRINT(proc)
when in debug mode.

When this is the case, ORTE_NAME_PRINT prints all fields of proc, including epoch. So, proc->epoch = proc_get_epoch(proc) triggers the valgrind warning, because of the print, in debug mode.
This is the only reason why proc->epoch must be initialized before calling proc->epoch = proc_get_epoch( proc );

I agree with you it's not a good approach, for coding style / avoiding this kind of discussions.

Alternative approaches include:
 - having proc_get_epoch have a side effect on proc->epoch instead of returning the correct epoch value. Assignment can then be done before the debug message. But this is inconsistent with the rest of the interface used in proc_name_fns;
 - having proc_get_epoch use another helper than ORTE_NAME_PRINT to print the debug info. But this requires coding an alternative ORTE_SOMETHING_PRINT function, only for this use;
 - having proc be defined as an opal_object, and set epoch to INVALID (or even UNSET) into the constructor. This could induce changes at many places, and there is always the risk that some changes are left incomplete.

Thomas

Le 8 août 2011 à 12:20, Barrett, Brian W a écrit :

> Ok, that makes some sense. It's a really weird way of doing
> initialization, though. Some of it is probably how Ralph chose to share
> nidpid map code, so perhaps it's unavoidable. But since the name isn't
> use in proc_get_epoch() (or, better not be used in there), I'm not
> entirely sure why that made valgrind happy...
>
> Brian
>
> On 8/8/11 9:53 AM, "Wesley Bland" <wbland_at_[hidden]> wrote:
>
>> It's not just copying the value from one to the other.
>> The value is initialized on the first line. The proc_name is passed into
>> the function where the jobid and vpid are used (not the epoch). The
>> function returns a new (the correct) value for the epoch based on the
>> passed in jobid and vpid which is assigned to the process name.
>>
>> This is the same thing as initializing the value to NULL. We do that all
>> the time to avoid compiler warnings. I just can't do that here because
>> this isn't a pointer. If it would make the code look better I can move
>> the first assignment to the top of the function where the other
>> initializations are.
>>
>> On Mon, Aug 8, 2011 at 11:41 AM, Barrett, Brian W <bwbarre_at_[hidden]>
>> wrote:
>>
>> On 8/8/11 9:34 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>>
>>> On Aug 8, 2011, at 11:30 AM, Wesley Bland wrote:
>>>
>>>> The reason is because valgrind was complaining about uninitialized
>>>> values that were passed into proc_get_epoch. I saw the same warnings
>>>> from valgrind when I ran it. I added the code to initialize the values
>>>> to what really should be the default value and the warnings went away.
>>>> Since the process_name_t struct isn't an object, it doesn't have an
>>>> initialization function like so many of the other objects in the code.
>>>> This is what we have.
>>>
>>> Ah, I see -- you are passing peer_name into proc_get_epoch(). I missed
>>> that.
>>>
>>> Thanks!
>>>
>>>> peer_name.jobid = ORTE_PROC_MY_NAME->jobid;
>>>> peer_name.vpid = peer_idx;
>>>> + peer_name.epoch = ORTE_EPOCH_INVALID;
>>>> peer_name.epoch = orte_ess.proc_get_epoch(&peer_name);
>>
>>
>> I'm still not convinced this is a rational coding style. It seems to me
>> that if you're just going to copy the epoch from peer_name to peer_name,
>> just assign the epoch to INVALID and be done with it. There's no need for
>> both the assignment added in this patch and the line after it.
>>
>> Now, normally I'd say that this is fixing a bug and having to fix other
>> people's bad code shouldn't be your problem. But since you wrote both
>> lines, fixing it to make sense seems reasonable to me ;).
>>
>> Brian
>>
>> --
>> Brian W. Barrett
>> Dept. 1423: Scalable System Software
>> Sandia National Laboratories
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
> --
> Brian W. Barrett
> Dept. 1423: Scalable System Software
> Sandia National Laboratories
>
>
>
>
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel