Commit 9946 solves the problem. I mixed up the return value of the
trylock call, assuming that any non-zero value was a success when
in fact 0 is a success. Anyway, it's now fixed on the trunk.
On May 16, 2006, at 11:07 AM, Rolf Vandevaart wrote:
> Hi Brian:
> Here is the stack trace from the core dump. I am also trying to
> better understand what is happening here, but I figured I needed to
> get this off to you.
> burl-ct-v440-4 96 =>dbx connectivity core
> For information about new features see `help changes'
> To remove this message, put `dbxenv suppress_startup_message 7.4' in
> your .dbxrc
> Reading connectivity
> core file header read successfully
> (dbx) where
> current thread: t_at_1
>  _lwp_kill(0x0, 0x6, 0x0, 0x6, 0xfc00, 0x0), at 0xfd840f90
>  raise(0x6, 0x0, 0xfd824a98, 0xffffffff, 0xfd868284, 0x6), at
>  abort(0xffbfee00, 0x1, 0x0, 0xa83f0, 0xfd86b298, 0x0), at
> => opal_mutex_lock(m = 0xfd0b12e8), line 101 in "mutex_unix.h"
>  __ompi_free_list_wait(fl = 0xfd0b1298, item = 0xffbfef88), line
> 167 in "ompi_free_list.h"
>  mca_pml_ob1_recv_frag_match(btl = 0xfcfbc778, hdr = 0xdc897260,
> segments = 0xdc897218, num_segments = 1U), line 550 in
>  mca_pml_ob1_recv_frag_callback(btl = 0xfcfbc778, tag =
> '\001', des
> = 0xdc8971d0, cbdata = (nil)), line 80 in "pml_ob1_recvfrag.c"
>  mca_btl_sm_component_progress(), line 396 in
>  mca_bml_r2_progress(), line 103 in "bml_r2.c"
>  opal_progress(), line 288 in "opal_progress.c"
>  opal_condition_wait(c = 0xff29d3b8, m = 0xff29d430), line 75 in
>  mca_pml_ob1_recv(addr = 0xffbff4b0, count = 1U, datatype =
> 0x21458, src = 0, tag = 0, comm = 0x215a0, status = 0xffbff4c0), line
> 101 in "pml_ob1_irecv.c"
>  PMPI_Recv(buf = 0xffbff4b0, count = 1, type = 0x21458, source =
> 0, tag = 0, comm = 0x215a0, status = 0xffbff4c0), line 66 in "precv.c"
>  main(argc = 2, argv = 0xffbff53c), line 69 in "connectivity.c"
> Brian Barrett wrote On 05/11/06 02:57,:
>> Eeeks! That sounds like a bug. Can you attach a debugger and get a
>> stack trace for the situation where that occurs?
>> On May 10, 2006, at 10:17 PM, Rolf Vandevaart wrote:
>>> I have built a library with "--enable-mpi-threads --with- (from
>>> the trunk) and tried running a simple non-threaded program linked
>>> against it.
>>> The program just calls MPI_Send and MPI_Recv so that every process
>>> sends an
>>> MPI_INT to every other process.
>>> When I run it I see the following:
>>> burl-ct-v440-4 86 =>mpirun -np 4 connectivity -v
>>> burl-ct-v440-4: checking connection 0 <-> 1
>>> burl-ct-v440-4: checking connection 1 <-> 2
>>> burl-ct-v440-4: checking connection 0 <-> 2
>>> opal_mutex_lock(): Deadlock situation detected/avoided
>>> Signal:6 info.si_errno:0(Error 0) si_code:-1()
>>> *** End of error message ***
>>> burl-ct-v440-4 87 =>
>>> Since I had debug enabled, I got to see that one of the processes
>>> was trying to grab a lock that it already held. (Nice feature having
>>> that error printed out!)
>>> Has anyone else seen this? As I said, this is a non-threaded program,
>>> so there is only one thread per process. I am wondering if I am missing
>>> something basic in the building of my library. This test works
>>> fine against
>>> a library configured without "--enable-mpi-threads --with-
>>> devel mailing list
"Half of what I say is meaningless; but I say it so that the other
half may reach you"