Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] shmem_collect32 does not work with nlong == 0
From: Bert Wesarg (Bert.Wesarg_at_[hidden])
Date: 2014-05-10 12:08:56


On 05/10/2014 02:46 PM, Bert Wesarg wrote:
> Hi,
>
> I get a deadlock when using the shmem_collect32() routine and any of the
> non-root PEs pass 0 as the number of elements. It looks like the
> algorithm in _algorithm_central_collector() does use 0 as a special
> value, and thus does not break out of the loop.

This seems to fix it for me:

diff --git i/oshmem/mca/scoll/basic/scoll_basic_collect.c w/oshmem/mca/scoll/basic/scoll_basic_collect.c
index aa81fac..6bba7d1 100644 oshmem/mca/scoll/basic/scoll_basic_collect.c
--- i/oshmem/mca/scoll/basic/scoll_basic_collect.c
+++ w/oshmem/mca/scoll/basic/scoll_basic_collect.c
@@ -553,7 +553,7 @@ static int _algorithm_central_collector(struct oshmem_group_t *group,
          wait_pe_array = malloc(sizeof(*wait_pe_array) * wait_pe_count);
          if (wait_pe_array) {
              memset((void*) wait_pe_array,
-                    0,
+                    0xff,
                     sizeof(*wait_pe_array) * wait_pe_count);
              wait_pe_array[0] = nlong;
              wait_pe_count--;
@@ -564,13 +564,13 @@ static int _algorithm_central_collector(struct oshmem_group_t *group,
                                group->my_pe);
                  for (i = 1; (i < group->proc_count) && (rc == OSHMEM_SUCCESS);
                          i++) {
-                     if (wait_pe_array[i] == 0) {
+                     if (wait_pe_array[i] == (size_t)-1) {
                          pe_cur = oshmem_proc_pe(group->proc_array[i]);
                          value = 0;
                          rc = MCA_SPML_CALL(get((void*)pSync, sizeof(value), (void*)&value, pe_cur));
                          if ((rc == OSHMEM_SUCCESS)
                                  && (value != _SHMEM_SYNC_VALUE)
-                                 && (value > 0)) {
+                                 && (value >= 0)) {
                              wait_pe_array[i] = (size_t) value;
                              wait_pe_count--;
                              SCOLL_VERBOSE(14,
@@ -588,6 +588,9 @@ static int _algorithm_central_collector(struct oshmem_group_t *group,

              for (i = 1; (i < group->proc_count) && (rc == OSHMEM_SUCCESS);
                      i++) {
+                 if (!wait_pe_array[i])
+                     continue;
+
                  /* Get PE ID of a peer from the group */
                  pe_cur = oshmem_proc_pe(group->proc_array[i]);

>
> Kind regards,
> Bert Wesarg
>

-- 
Dipl.-Inf. Bert Wesarg
Research associate
Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
01062 Dresden
Tel.: +49 (351) 463-42451
Fax: +49 (351) 463-37773
E-Mail: Bert.Wesarg_at_[hidden]