Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] trunk regressions
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-04-10 09:58:23


On Apr 10, 2012, at 7:51 AM, TERRY DONTJE wrote:

> Fair enough, sorry about the false report.

No problem - it's a good reminder to all that we changed this policy. Previously, we allowed oversubscription by default, even on managed systems. This generated significant concerns from sysadmins who manage multi-tenant (i.e., shared-node) systems, as it caused obvious problems. So we now respect allocations from managed systems unless directed otherwise.

MTT setups probably require adjustment for tests like loop_spawn.
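
As a rough illustration of the policy described above (not ORTE's actual code), the decision can be sketched in Python. The function name `can_launch` and its parameters are hypothetical, and the behavior on unmanaged systems is an assumption inferred from the phrase "even on managed systems":

```python
def can_launch(requested_procs, allocated_slots, managed, oversubscribe=False):
    """Illustrative model of the slot-limit policy; names are hypothetical."""
    # A job that fits within its allocation is always allowed.
    if requested_procs <= allocated_slots:
        return True
    # The user explicitly said it is okay to exceed the allocation.
    if oversubscribe:
        return True
    # Managed systems: respect the allocation unless directed otherwise.
    # Unmanaged systems: assumed here to still permit oversubscription.
    return not managed


# loop_spawn-like case: more children than allocated slots, managed environment.
print(can_launch(8, 4, managed=True))                      # refused
print(can_launch(8, 4, managed=True, oversubscribe=True))  # allowed
```

For a test like loop_spawn under a resource manager, this corresponds to the case where the launch is refused until --oversubscribe is set.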

>
> I sent you email about the other failures (final and MPI_Errhandler).
>
> --td
>
> On 4/10/2012 9:40 AM, Ralph Castain wrote:
>>
>> I looked closer at the MTT output, Terry, and loop_spawn is actually behaving correctly. The problem is that (a) the test creates more children than allocated slots, and (b) the tests are being executed in a managed environment, and so we enforce the slot limit. The solution is to set the --oversubscribe flag so that ORTE knows it is okay to run more procs than allocated slots.
>>
>> Set that and it will run just fine.
>>
>> On Apr 10, 2012, at 4:44 AM, TERRY DONTJE wrote:
>>
>>> Thanks Ralph, the comm_join issue seems to be fixed, but the other issues mentioned still seem to persist. I'll look at this later today unless someone else decides to fix them :-).
>>>
>>> --td
>>>
>>> On 4/9/2012 6:45 PM, Ralph Castain wrote:
>>>>
>>>> Should all be fixed now.
>>>>
>>>> On Apr 9, 2012, at 7:17 AM, TERRY DONTJE wrote:
>>>>
>>>>> After looking at Oracle's MTT results, there seem to be one or more regressions between r26240 and r26249 detected by the ibm and intel test suites. An example is the failures in the comm_join, final and loop_spawn tests of the ibm test suite, as seen in http://www.open-mpi.org/mtt/index.php?do_redir=2055.
>>>>>
>>>>> Note, I've seen similar errors detected by IU runs too.
>>>>>
>>>>> I'll look further into this but I thought I would post this just in case someone else has seen this.
>>>>> --
>>>>> Terry D. Dontje | Principal Software Engineer
>>>>> Developer Tools Engineering | +1.781.442.2631
>>>>> Oracle - Performance Technologies
>>>>> 95 Network Drive, Burlington, MA 01803
>>>>> Email terry.dontje_at_[hidden]
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel