Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Still seeing hangs in OMPI 1.3
From: Rolf vandeVaart (Rolf.Vandevaart_at_[hidden])
Date: 2008-08-22 15:15:56


Hi George:

I did some more experimenting. Just copying over the btl_sm_fifo.h file
was not enough. I also had to make this change (which I found in the
trunk) to the btl_sm_component.c file. After that, my hangs went away.

 burpen-csx10-0 164 =>svn diff btl_sm_component.c
Index: btl_sm_component.c
===================================================================
--- btl_sm_component.c (revision 19393)
+++ btl_sm_component.c (working copy)
@@ -389,9 +389,7 @@
             opal_atomic_lock(fifo->tail_lock);
         }

-        hdr = (mca_btl_sm_hdr_t*)ompi_cb_fifo_read_from_tail(&fifo->tail->cb_fifo,
-                fifo->tail->cb_overflow,
-                &useless );
+        hdr = (mca_btl_sm_hdr_t*)ompi_fifo_read_from_tail(fifo);

         /* release thread lock */
         if(opal_using_threads()) {
 burpen-csx10-0 165 =>
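
To illustrate why the choice of read call in the diff above could matter, here is a minimal sketch. It assumes the FIFO is a chain of fixed-size circular buffers, and that the higher-level read moves on to the next buffer once the current tail is drained and marked as overflowed. All toy_* names and the two-slot buffer size are hypothetical stand-ins for illustration; this is a simplified model, not the Open MPI data structures, and it is not a confirmed diagnosis of the hang.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CB_SIZE 2   /* slots per circular buffer (toy value, an assumption) */

/* Hypothetical, simplified stand-ins; NOT the Open MPI types. */
typedef struct toy_cb {
    void *slot[TOY_CB_SIZE];
    int head, tail, count;
    int overflow;            /* set when the writer moved on to the next buffer */
    struct toy_cb *next;
} toy_cb_t;

typedef struct {
    toy_cb_t *head;          /* writers append here */
    toy_cb_t *tail;          /* readers drain here */
} toy_fifo_t;

/* Low-level read: pop one item from a single circular buffer, NULL if empty. */
static void *toy_cb_read_from_tail(toy_cb_t *cb)
{
    if (cb->count == 0) {
        return NULL;
    }
    void *item = cb->slot[cb->tail];
    cb->tail = (cb->tail + 1) % TOY_CB_SIZE;
    cb->count--;
    return item;
}

/* Wrapper read: if the tail buffer is drained and was marked as overflowed,
 * advance the tail to the next buffer and read from there. */
static void *toy_fifo_read_from_tail(toy_fifo_t *f)
{
    void *item = toy_cb_read_from_tail(f->tail);
    if (item == NULL && f->tail->overflow && f->tail->next != NULL) {
        f->tail = f->tail->next;
        item = toy_cb_read_from_tail(f->tail);
    }
    return item;
}

/* Write one item; when the head buffer is full, chain `spare` behind it. */
static int toy_fifo_write_to_head(toy_fifo_t *f, toy_cb_t *spare, void *item)
{
    toy_cb_t *cb = f->head;
    if (cb->count == TOY_CB_SIZE) {
        if (spare == NULL || spare == cb) {
            return -1;       /* no room and nothing to chain */
        }
        cb->overflow = 1;
        cb->next = spare;
        f->head = spare;
        cb = spare;
    }
    cb->slot[cb->head] = item;
    cb->head = (cb->head + 1) % TOY_CB_SIZE;
    cb->count++;
    return 0;
}

/* Write three items into two-slot buffers, then drain and count what the
 * reader actually receives with each style of read. */
int toy_demo(int use_wrapper)
{
    toy_cb_t a = {0}, b = {0};
    toy_fifo_t f = { &a, &a };
    int vals[3] = { 10, 20, 30 };

    for (int i = 0; i < 3; i++) {
        int rc = toy_fifo_write_to_head(&f, &b, &vals[i]);
        assert(rc == 0);
        (void)rc;
    }

    int n = 0;
    for (;;) {
        void *item = use_wrapper ? toy_fifo_read_from_tail(&f)
                                 : toy_cb_read_from_tail(f.tail);
        if (item == NULL) {
            break;
        }
        n++;
    }
    return n;
}
```

In this toy model, toy_demo(0) reads only from the current tail buffer and never reaches the third item, while toy_demo(1) goes through the wrapper and drains all three. A receiver that never sees a queued item would look like a hang from the outside.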

Rolf vandeVaart wrote:
> George:
>
> We are still seeing hangs in OMPI 1.3, which I assume are due to the
> PML issue. However, we do not see them in the trunk. My investigation
> is at an early stage, but I am wondering if the merge of the changes
> into v1.3 may be missing a file. From the original fix in the trunk,
> I see the following:
>
> Changeset 19309 (trunk)
> btl_sm.c (modified) (2 diffs)
> btl_sm_component.c (modified) (7 diffs)
> btl_sm_fifo.h (modified) (1 diff)
>
> For the ompi v1.3 I see this:
> Changeset 19378 (v1.3)
> btl/sm/btl_sm.c (modified) (1 diff)
> btl/sm/btl_sm_component.c (modified) (2 diffs)
> coll/sm/coll_sm_module.c (modified) (1 diff)
> pml/ob1/pml_ob1_sendreq.c (modified) (1 diff)
>
> The 1.3 changeset has two extra files (coll_sm_module.c and
> pml_ob1_sendreq.c), but they were just formatting fixes. So, my
> concern is the missing btl_sm_fifo.h change in 1.3. I have not tried
> it out yet, but wanted to see if anyone else is still seeing 1.3 hangs.
>
> Rolf