Open MPI Development Mailing List Archives

From: Terry D. Dontje (Terry.Dontje_at_[hidden])
Date: 2007-07-20 15:34:49


I think I've found a problem that causes at least some of my runs of the
MT tests to abort or hang. The OB1 send request structure contains a
req_send_range_lock that is never constructed, so the appropriate
(pthread_)mutex_init call is never made for it. I've applied the
following patch (given to me by Jeff) to
ompi/mca/pml/ob1/pml_ob1_sendreq.c:

Index: pml_ob1_sendreq.c
===================================================================
--- pml_ob1_sendreq.c (revision 15535)
+++ pml_ob1_sendreq.c (working copy)
@@ -136,12 +136,18 @@
     req->req_rdma_cnt = 0;
     req->req_throttle_sends = false;
     OBJ_CONSTRUCT(&req->req_send_ranges, opal_list_t);
+    OBJ_CONSTRUCT(&req->req_send_range_lock, opal_mutex_t);
 }
+static void mca_pml_ob1_send_request_destruct(mca_pml_ob1_send_request_t* req)
+{
+    OBJ_DESTRUCT(&req->req_send_range_lock);
+}
+
 OBJ_CLASS_INSTANCE( mca_pml_ob1_send_request_t,
                     mca_pml_base_send_request_t,
                     mca_pml_ob1_send_request_construct,
-                    NULL );
+                    mca_pml_ob1_send_request_destruct);
 /**
  * Completion of a short message - nothing left to schedule. Note that this

With this patch, at least one of my tests now passes consistently (I
haven't tried the other tests yet). Does the fix make sense, and could
the other PMLs have similar issues?
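
To make the pattern concrete, here is a minimal sketch in plain pthreads of
the construct/destruct pairing the patch relies on. It is only an
illustration: demo_request_t and the demo_request_* functions are made-up
names, not the OB1 structures, and it assumes that on a pthreads build the
OPAL mutex constructor and destructor end up wrapping pthread_mutex_init and
pthread_mutex_destroy.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a send request; NOT the real OB1 structure. */
typedef struct {
    size_t          pending_ranges; /* stand-in for the list of send ranges */
    pthread_mutex_t range_lock;     /* protects pending_ranges */
    bool            lock_ready;     /* true once range_lock is initialized */
} demo_request_t;

/* Constructor: every member, including the lock, is set up here.
 * Returns 0 on success or the pthread_mutex_init error code. */
static int demo_request_construct(demo_request_t *req)
{
    req->pending_ranges = 0;
    req->lock_ready = false;
    int rc = pthread_mutex_init(&req->range_lock, NULL);
    if (rc == 0) {
        req->lock_ready = true;
    }
    return rc;
}

/* Destructor: releases exactly what the constructor acquired. */
static int demo_request_destruct(demo_request_t *req)
{
    if (!req->lock_ready) {
        return 0; /* nothing to release */
    }
    req->lock_ready = false;
    return pthread_mutex_destroy(&req->range_lock);
}

/* Records one more range under the lock.
 * Returns the new count, or -1 if the lock could not be taken. */
static long demo_request_add_range(demo_request_t *req)
{
    if (pthread_mutex_lock(&req->range_lock) != 0) {
        return -1;
    }
    size_t count = ++req->pending_ranges;
    pthread_mutex_unlock(&req->range_lock);
    return (long)count;
}
```

A caller would run demo_request_construct() once before the first
demo_request_add_range() and demo_request_destruct() once after the last.
Locking a pthread mutex that was never initialized is undefined behavior,
which would be consistent with the intermittent aborts and hangs described
above.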

Thanks,

--td