Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] MPI_Errhandler_fatal_c failure
From: TERRY DONTJE (terry.dontje_at_[hidden])
Date: 2011-08-18 12:29:16


Just ran MPI_Errhandler_fatal_c with r25063 and it still fails.
Everything is the same except I don't see the "readv failed.." message.

Have you tried to run this code yourself? It is pretty simple and
fails on one node with np=4.

--td

On 8/18/2011 10:57 AM, Wesley Bland wrote:
> I just checked in a fix (I hope). I think the problem was that the errmgr
> was removing children from the list of odls children without using the
> mutex to prevent race conditions. Let me know if the MTT is still having
> problems tomorrow.
>
> Wes
>
>> I am seeing the Intel test suite tests MPI_Errhandler_fatal_c and
>> MPI_Errhandler_fatal_f fail with an oob failure quite a bit. I had not
>> seen these tests fail under MTT until the epoch code was added, so I
>> suspect the epoch code might be at fault. Could someone
>> familiar with the epoch changes (Wesley) take a look at this failure?
>>
>> Note that this fails intermittently, but for me it fails more often
>> than not. Attached is a log file of a run that succeeds followed by a
>> failing run. The messages of concern are those involving
>> mca_oob_tcp_msg_recv and below.
>>
>> thanks,
>>
>> --
>> Oracle
>> Terry D. Dontje | Principal Software Engineer
>> Developer Tools Engineering | +1.781.442.2631
>> Oracle *- Performance Technologies*
>> 95 Network Drive, Burlington, MA 01803
>> Email terry.dontje_at_[hidden]
>>
>>
>>
>>
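
For readers following the thread: the kind of race Wesley describes (removing entries from a shared list of children without holding the mutex) can be illustrated with a minimal sketch. This is not the Open MPI errmgr or odls code. The names `child_t`, `child_list_t`, `child_list_add`, `child_list_remove`, and `child_list_count` are hypothetical, and a plain pthread mutex stands in for whatever locking the real code uses. The only point shown is that every traversal or modification of the shared list happens while the lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a list of child processes (not Open MPI types). */
typedef struct child {
    int pid;
    struct child *next;
} child_t;

typedef struct {
    child_t *head;
    pthread_mutex_t lock;   /* guards head and every node reachable from it */
} child_list_t;

void child_list_init(child_list_t *l)
{
    assert(l != NULL);
    l->head = NULL;
    pthread_mutex_init(&l->lock, NULL);
}

/* Push a new child onto the list; returns 0 on success, -1 on allocation failure. */
int child_list_add(child_list_t *l, int pid)
{
    child_t *c = malloc(sizeof(*c));
    if (c == NULL) {
        return -1;
    }
    c->pid = pid;
    pthread_mutex_lock(&l->lock);
    c->next = l->head;
    l->head = c;
    pthread_mutex_unlock(&l->lock);
    return 0;
}

/* Remove the child with the given pid; returns 1 if removed, 0 if not found.
 * The unlink and free happen under the lock, so a concurrent traversal
 * cannot observe a half-removed or freed node. */
int child_list_remove(child_list_t *l, int pid)
{
    int removed = 0;
    pthread_mutex_lock(&l->lock);
    for (child_t **pp = &l->head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->pid == pid) {
            child_t *dead = *pp;
            *pp = dead->next;
            free(dead);
            removed = 1;
            break;
        }
    }
    pthread_mutex_unlock(&l->lock);
    return removed;
}

/* Count the children currently on the list, also under the lock. */
int child_list_count(child_list_t *l)
{
    int n = 0;
    pthread_mutex_lock(&l->lock);
    for (child_t *c = l->head; c != NULL; c = c->next) {
        n++;
    }
    pthread_mutex_unlock(&l->lock);
    return n;
}
```

If `child_list_remove` skipped the lock, a thread walking the list in `child_list_count` could follow a pointer to a node that another thread had just freed, which is the general shape of intermittent failure that such a race tends to produce.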

-- 
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]


