
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Still seeing hangs in OMPI 1.3
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-08-22 15:34:15


Rolf,

You're absolutely right. I don't know how this change was left out of the
merge... Anyway, I have just corrected the mistake. Now the 1.3 branch
should [really] be working.

   Thanks,
     george.

On Aug 22, 2008, at 9:15 PM, Rolf vandeVaart wrote:

> Hi George:
>
> I did some more experimenting. Just copying over the btl_sm_fifo.h
> file was not enough. I also had to make this change (which I found
> in the trunk) to the btl_sm_component.c file. After that, my hangs
> went away.
>
> burpen-csx10-0 164 =>svn diff btl_sm_component.c
> Index: btl_sm_component.c
> ===================================================================
> --- btl_sm_component.c (revision 19393)
> +++ btl_sm_component.c (working copy)
> @@ -389,9 +389,7 @@
> opal_atomic_lock(fifo->tail_lock);
> }
>
> - hdr = (mca_btl_sm_hdr_t*)ompi_cb_fifo_read_from_tail(&fifo->tail->cb_fifo,
> -                                                      fifo->tail->cb_overflow,
> -                                                      &useless );
> + hdr = (mca_btl_sm_hdr_t*)ompi_fifo_read_from_tail(fifo);
>
> /* release thread lock */
> if(opal_using_threads()) {
> burpen-csx10-0 165 =>
>
>
>
> Rolf vandeVaart wrote:
>> George:
>>
>> We are still seeing hangs in OMPI 1.3 which I assume are due to the
>> PML issue. However, we do not see it in the trunk. My
>> investigation is early, but I am wondering if the merge of the
>> changes into v1.3 may be missing a file. From the original fix in
>> the trunk, I see the following:
>>
>> Changeset 19309 (trunk)
>> btl_sm.c (modified) (2 diffs)
>> btl_sm_component.c (modified) (7 diffs)
>> btl_sm_fifo.h (modified) (1 diff)
>>
>> For the ompi v1.3 I see this:
>> Changeset 19378 (v1.3)
>> btl/sm/btl_sm.c (modified) (1 diff)
>> btl/sm/btl_sm_component.c (modified) (2 diffs)
>> coll/sm/coll_sm_module.c (modified) (1 diff)
>> pml/ob1/pml_ob1_sendreq.c (modified) (1 diff)
>>
>> The 1.3 changeset has those two extra files, but they were just
>> formatting fixes. So, my concern is the missing btl_sm_fifo.h
>> change in 1.3. I have not tried it out yet, but wanted to see if
>> anyone else is still seeing 1.3 hangs.
>>
>> Rolf
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


