Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ctrl+c to abort a job with openmpi-1.7.5rc2
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-03-13 20:44:15


No problem - we appreciate you taking the time to confirm. Jeff encountered something late today, so we may indeed still have a lingering problem. :-(

Will keep you posted
Ralph

On Mar 13, 2014, at 5:08 PM, tmishima_at_[hidden] wrote:

>
>
> Hi Ralph, I'm late to your release again due to TD.
>
> At that time, I manually applied #4386 and #4383 to the 1.7 branch
> (namely openmpi-1.7.5rc2) and ran the check. I might have
> made a mistake.
>
> I now see that openmpi-1.7.5rc3 has just been released, and I have
> confirmed that it works fine. Thanks.
>
> Tetsuya
>
>> It's okay - we thought we had it fixed, but not for that scenario.
>>
>> On Mar 12, 2014, at 9:02 PM, tmishima_at_[hidden] wrote:
>>
>>>
>>>
>>> Sorry for disturbing, please keep going ...
>>>
>>> Tetsuya
>>>
>>>> Yes, I know - I am just finishing the fix now.
>>>>
>>>> On Mar 12, 2014, at 8:48 PM, tmishima_at_[hidden] wrote:
>>>>
>>>>>
>>>>>
>>>>> Hi Ralph, this problem is not fixed completely by today's latest
>>>>> ticket #4383, I guess ...
>>>>>
>>>>> https://svn.open-mpi.org/trac/ompi/ticket/4383
>>>>>
>>>>> For example, when returning ORTE_ERR_SILENT from line 514 of
>>>>> rmaps_rr_mapper.c, the problem still occurs. I ran the job in an
>>>>> unmanaged environment (rsh without Torque):
>>>>>
>>>>> [mishima_at_manage openmpi-1.7.5rc2]$ mpirun -np 6 -host node05,node06 -nooversubscribe ~/mis/openmpi/demos/myprog
>>>>> --------------------------------------------------------------------------
>>>>> There are not enough slots available in the system to satisfy the 6 slots
>>>>> that were requested by the application:
>>>>> /home/mishima/mis/openmpi/demos/myprog
>>>>>
>>>>> Either request fewer slots for your application, or make more slots available
>>>>> for use.
>>>>> --------------------------------------------------------------------------
>>>>> Abort is in progress...hit ctrl-c again within 5 seconds to forcibly terminate
>>>>> Abort is in progress...hit ctrl-c again within 5 seconds to forcibly terminate
>>>>> .....
>>>>>
>>>>> rmaps_rr_mapper.c:
>>>>> 509     /* quick check to see if we can map all the procs */
>>>>> 510     if (num_slots < (app->num_procs * orte_rmaps_base.cpus_per_rank)) {
>>>>> 511         if (ORTE_MAPPING_NO_OVERSUBSCRIBE & ORTE_GET_MAPPING_DIRECTIVE(jdata->map->mapping)) {
>>>>> 512             orte_show_help("help-orte-rmaps-base.txt", "orte-rmaps-base:alloc-error",
>>>>> 513                            true, app->num_procs, app->app);
>>>>> 514             return ORTE_ERR_SILENT;
>>>>> 515         }
>>>>>
>>>>>
>>>>> Tetsuya
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
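
The quick check quoted from rmaps_rr_mapper.c in the thread can be illustrated in isolation with the rough sketch below. It is not the ORTE source: `sketch_check_slots`, `SKETCH_SUCCESS` and `SKETCH_ERR_SILENT` are hypothetical stand-ins, the boolean parameter replaces the mapping-directive bit test, and `fprintf` replaces `orte_show_help`. It shows only the condition under which the mapper gives up early. It does not reproduce the repeated "Abort is in progress" behaviour reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical return codes; the real code returns ORTE_ERR_SILENT. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_SILENT = -1 };

/*
 * Mirrors the shape of the quoted check: if fewer slots are available
 * than the job needs AND oversubscription is forbidden, report the
 * problem once and return a "silent" error so that the caller does not
 * print a second message.
 */
int sketch_check_slots(int num_slots, int num_procs,
                       int cpus_per_rank, bool no_oversubscribe)
{
    assert(cpus_per_rank > 0);  /* assumption: at least one cpu per rank */

    if (num_slots < num_procs * cpus_per_rank) {
        if (no_oversubscribe) {
            /* Stand-in for orte_show_help(...) in the quoted code. */
            fprintf(stderr,
                    "not enough slots: %d needed, %d available\n",
                    num_procs * cpus_per_rank, num_slots);
            return SKETCH_ERR_SILENT;
        }
    }
    return SKETCH_SUCCESS;
}
```

For instance, assuming (purely for illustration) that the two hosts offer 2 slots in total, `sketch_check_slots(2, 6, 1, true)` returns `SKETCH_ERR_SILENT`, while the same call with `false` as the last argument returns `SKETCH_SUCCESS` because oversubscription is then allowed.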