Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] shmem_collect32 does not work with nlong == 0
From: Mike Dubman (miked_at_[hidden])
Date: 2014-05-10 13:17:45


Thanks for the patch; we will review it next week.

Also, you can select different shmem collectives at runtime:

-mca scoll_mpi_enable 1 (to select MPI collectives for shmem)

On Sat, May 10, 2014 at 7:08 PM, Bert Wesarg <Bert.Wesarg_at_[hidden]> wrote:

> On 05/10/2014 02:46 PM, Bert Wesarg wrote:
>
>> Hi,
>>
>> I get a deadlock when using the shmem_collect32() routine if any of the
>> non-root PEs passes 0 as the number of elements. It looks like the
>> algorithm in _algorithm_central_collector() uses 0 as a special
>> "not yet received" value, and thus never breaks out of the loop.
>>
>
> This seems to fix it for me:
>
> diff --git i/oshmem/mca/scoll/basic/scoll_basic_collect.c w/oshmem/mca/scoll/basic/scoll_basic_collect.c
> index aa81fac..6bba7d1 100644 oshmem/mca/scoll/basic/scoll_basic_collect.c
> --- i/oshmem/mca/scoll/basic/scoll_basic_collect.c
> +++ w/oshmem/mca/scoll/basic/scoll_basic_collect.c
> @@ -553,7 +553,7 @@ static int _algorithm_central_collector(struct oshmem_group_t *group,
> wait_pe_array = malloc(sizeof(*wait_pe_array) * wait_pe_count);
> if (wait_pe_array) {
> memset((void*) wait_pe_array,
> - 0,
> + 0xff,
> sizeof(*wait_pe_array) * wait_pe_count);
> wait_pe_array[0] = nlong;
> wait_pe_count--;
> @@ -564,13 +564,13 @@ static int _algorithm_central_collector(struct oshmem_group_t *group,
> group->my_pe);
> for (i = 1; (i < group->proc_count) && (rc == OSHMEM_SUCCESS);
> i++) {
> - if (wait_pe_array[i] == 0) {
> + if (wait_pe_array[i] == (size_t)-1) {
> pe_cur = oshmem_proc_pe(group->proc_array[i]);
> value = 0;
> rc = MCA_SPML_CALL(get((void*)pSync, sizeof(value), (void*)&value, pe_cur));
> if ((rc == OSHMEM_SUCCESS)
> && (value != _SHMEM_SYNC_VALUE)
> - && (value > 0)) {
> + && (value >= 0)) {
> wait_pe_array[i] = (size_t) value;
> wait_pe_count--;
> SCOLL_VERBOSE(14,
> @@ -588,6 +588,9 @@ static int _algorithm_central_collector(struct oshmem_group_t *group,
>
> for (i = 1; (i < group->proc_count) && (rc == OSHMEM_SUCCESS);
> i++) {
> + if (!wait_pe_array[i])
> + continue;
> +
> /* Get PE ID of a peer from the group */
> pe_cur = oshmem_proc_pe(group->proc_array[i]);
>
>
>> Kind regards,
>> Bert Wesarg
>>
>>
> --
> Dipl.-Inf. Bert Wesarg
> wiss. Mitarbeiter
>
> Technische Universität Dresden
> Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
> 01062 Dresden
> Tel.: +49 (351) 463-42451
> Fax: +49 (351) 463-37773
> E-Mail: Bert.Wesarg_at_[hidden]
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/05/14768.php
>
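
The sentinel problem described in the report above can be illustrated with a small standalone sketch. This is not the Open MPI code: pending_after_report() and memset_ff_is_all_ones() are hypothetical helpers written only for this illustration, and the model ignores all communication. It only shows why a "not yet received" marker of 0 cannot be told apart from a PE that legitimately reports a length of 0, while (size_t)-1 can, and that filling a size_t with 0xff bytes (as the patch's memset does) produces exactly that marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of the root's bookkeeping: every slot starts out as
 * `sentinel` ("length not yet received"), then each PE's reported length is
 * stored. Returns how many slots still look "pending" afterwards. A non-zero
 * result corresponds to a root that would keep waiting forever.
 */
static size_t pending_after_report(const size_t *reported, size_t n,
                                   size_t sentinel)
{
    size_t *slots = malloc(n * sizeof(*slots));
    size_t pending = 0;
    size_t i;

    assert(slots != NULL);

    /* Mark every slot as "not yet received". */
    for (i = 0; i < n; i++) {
        slots[i] = sentinel;
    }

    /* Every PE reports its element count; 0 is a legitimate count. */
    for (i = 0; i < n; i++) {
        slots[i] = reported[i];
    }

    /* Count slots that still compare equal to the sentinel. */
    for (i = 0; i < n; i++) {
        if (slots[i] == sentinel) {
            pending++;
        }
    }

    free(slots);
    return pending;
}

/* Filling a size_t with 0xff bytes sets all bits, i.e. (size_t)-1. */
static int memset_ff_is_all_ones(void)
{
    size_t v;

    memset(&v, 0xff, sizeof(v));
    return v == (size_t)-1;
}
```

With reported lengths {4, 0, 2}, a sentinel of 0 leaves one slot looking pending even though every PE has reported, whereas a sentinel of (size_t)-1 leaves none. The trade-off is that (size_t)-1 itself can no longer be used as a length, which is unlikely to matter in practice.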