Open MPI Development Mailing List Archives

From: Mohamad Chaarawi (mschaara_at_[hidden])
Date: 2007-08-16 13:01:28


OK,
I was assuming that setting the ranks was done the same way for the plist
as for the sparse groups (which call translate_ranks). For the plist I just
forgot to add a check that the rank is not MPI_UNDEFINED before I do the
lookup and set the rank.

I just committed the fix.
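A minimal sketch of that kind of guard, expressed at the MPI API level
rather than as the actual committed change (the wrapper function here is
hypothetical): MPI_Group_rank returns MPI_UNDEFINED for a process that is
not a member of the group, and that value must never be used as an index.

    #include <mpi.h>
    #include <stddef.h>

    /* Hypothetical illustration of the fix described above: verify the
     * rank is not MPI_UNDEFINED before doing a lookup with it.  A
     * process outside 'group' gets MPI_UNDEFINED from MPI_Group_rank,
     * and indexing per-member data with it reads out of bounds. */
    static const char *lookup_member_name(MPI_Group group,
                                          const char **names, int nnames)
    {
        int rank;
        MPI_Group_rank(group, &rank);
        if (MPI_UNDEFINED == rank || rank < 0 || rank >= nnames) {
            return NULL;    /* not a member: skip the lookup entirely */
        }
        return names[rank];
    }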

On Thu, August 16, 2007 10:53 am, Tim Prins wrote:
> Mohamad,
>
> 2 processes were plenty. Like I said, when running in debug mode, it tends
> to 'work' since memory is initialized to \0 and we fall through. In an
> optimized build, looking at the mtt results it looks like it segfaults
> about 10% of the time.
>
> But if you apply the patch I sent, it will tell you when an invalid
> lookup happens, which should be every time it runs.
>
> Tim
>
> Mohamad Chaarawi wrote:
>> Hey Tim,
>>
>> I understand what you are talking about.
>> I'm trying to reproduce the problem. How many processes are you running
>> with? Because with the default (4 for the group) it's passing...
>>
>> Thanks,
>> Mohamad
>>
>> On Thu, August 16, 2007 7:49 am, Tim Prins wrote:
>>> Sorry, I pushed the wrong button and sent this before it was ready....
>>>
>>> Tim Prins wrote:
>>>> Hi folks,
>>>>
>>>> I am running into a problem with the ibm test 'group'. I will try to
>>>> explain what I think is going on, but I do not really understand the
>>>> group code so please forgive me if it is wrong...
>>>>
>>>> The test creates a group based on MPI_COMM_WORLD (group1), and a group
>>>> that has half the procs in group1 (newgroup). Next, all the processes
>>>> do:
>>>>
>>>> MPI_Group_intersection(newgroup,group1,&group2)
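(A self-contained sketch of the pattern the test exercises, reconstructed
from the description above rather than taken from the ibm test source, and
assuming an even number of processes:)

    #include <mpi.h>

    /* group1 spans MPI_COMM_WORLD, newgroup holds the first half of
     * its processes, and every process -- member of newgroup or not --
     * then calls MPI_Group_intersection. */
    int main(int argc, char **argv)
    {
        MPI_Group group1, newgroup, group2;
        int size, range[1][3];

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_group(MPI_COMM_WORLD, &group1);

        range[0][0] = 0;             /* first rank in the range   */
        range[0][1] = size / 2 - 1;  /* last rank: half the world */
        range[0][2] = 1;             /* stride                    */
        MPI_Group_range_incl(group1, 1, range, &newgroup);

        /* every process calls this, including the half that is not
         * in newgroup -- the case that triggers the bug described
         * below */
        MPI_Group_intersection(newgroup, group1, &group2);

        MPI_Group_free(&group2);
        MPI_Group_free(&newgroup);
        MPI_Group_free(&group1);
        MPI_Finalize();
        return 0;
    }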
>>>>
>>>> ompi_group_intersection figures out what procs are needed for group2,
>>>> then calls
>>>>
>>>> ompi_group_incl, passing 'newgroup' and '&group2'
>>>>
>>>> This then calls (since I am not using sparse groups)
>>>> ompi_group_incl_plist
>>>>
>>>> However, ompi_group_incl_plist assumes that the current process is a member
>>>> of the passed group ('newgroup'). Thus when it calls
>>>> ompi_group_peer_lookup on 'newgroup', half of the processes get
>>>> garbage
>>>> back since they are not in 'newgroup'. In most cases, memory is
>>>> initialized to \0 and things fall through, but we get intermittent
>>>> segfaults in optimized builds.
>>>>
>>> Here is a patch to an error check which highlights the problem:
>>> Index: group/group.h
>>> ===================================================================
>>> --- group/group.h (revision 15869)
>>> +++ group/group.h (working copy)
>>> @@ -308,7 +308,7 @@
>>>  static inline struct ompi_proc_t* ompi_group_peer_lookup(ompi_group_t *group, int peer_id)
>>>  {
>>>  #if OMPI_ENABLE_DEBUG
>>> -    if (peer_id >= group->grp_proc_count) {
>>> +    if (peer_id >= group->grp_proc_count || peer_id < 0) {
>>>          opal_output(0, "ompi_group_lookup_peer: invalid peer index (%d)", peer_id);
>>>> Thanks,
>>>>
>>>> Tim

-- 
Mohamad Chaarawi
Instructional Assistant		  http://www.cs.uh.edu/~mschaara
Department of Computer Science	  University of Houston
4800 Calhoun, PGH Room 526        Houston, TX 77204, USA