
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] confusion between slot and procs on mca/rmaps
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-12-01 07:52:15


Done in r24126

On Dec 1, 2010, at 5:11 AM, Damien Guinier wrote:

> Oops.
>
> OK, you can commit it. The whole problem is with the word "procs": in the source code it is used to mean both "processes" and "cores".
>
>
> On 01/12/2010 11:37, Damien Guinier wrote:
>> Ok, you can commit it. All problem is on "procs" work, on source code, "processes" AND "cores" definition is used.
>>
>> Thank you for your help.
>> Damien
>>
>> On 01/12/2010 10:47, Ralph Castain wrote:
>>> I just checked and it appears bycore does correctly translate to byslot. So your patch does indeed appear to be correct. If you don't mind, I'm going to apply it for you as I'm working on a correction for how we handle oversubscribe flags, and I want to ensure the patch gets included so we compute oversubscribe correctly.
>>>
>>> Thanks for catching this!
>>>
>>> On Nov 30, 2010, at 10:33 PM, Ralph Castain wrote:
>>>
>>>> Afraid I don't speak much slurm any more (thank goodness!).
>>>>
>>>> From your output, it looks like the system is mapping bynode instead of byslot. IIRC, isn't bycore just supposed to be a synonym for byslot? So perhaps the problem is that "bycore" causes us to set the "bynode" flag by mistake. Did you check that?
>>>>
>>>> BTW: when running cpus-per-proc, a slot doesn't have X processes. I suspect this is just a language thing, but it will create confusion. A slot consists of X cpus - we still assign only one process to each slot.
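>>>>
>>>> For example (an illustrative reading of the numbers in the output below, not something taken from the code): salloc -n 8 -c 8 on 32-core nodes gives each node 32 / 8 = 4 slots of 8 cpus each, so with one process per slot at most 4 ranks end up on any node - which is exactly what the patched mapping below shows.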
>>>>
>>>> On Nov 30, 2010, at 10:47 AM, Damien Guinier wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Most of the time there is no difference between a "proc" and a "slot". But when you use "mpirun -cpus-per-proc X", a slot has X procs.
>>>>> In orte/mca/rmaps/base/rmaps_base_common_mappers.c there is a confusion between proc and slot; this little error affects the mapping:
>>>>>
>>>>> With the latest OMPI version, on 32-core compute nodes:
>>>>> salloc -n 8 -c 8 mpirun -bind-to-core -bycore ./a.out
>>>>> [rank:0]<stdout>: host:compute18
>>>>> [rank:1]<stdout>: host:compute19
>>>>> [rank:2]<stdout>: host:compute18
>>>>> [rank:3]<stdout>: host:compute19
>>>>> [rank:4]<stdout>: host:compute18
>>>>> [rank:5]<stdout>: host:compute19
>>>>> [rank:6]<stdout>: host:compute18
>>>>> [rank:7]<stdout>: host:compute19
>>>>>
>>>>> With the patch:
>>>>> [rank:0]<stdout>: host:compute18
>>>>> [rank:1]<stdout>: host:compute18
>>>>> [rank:2]<stdout>: host:compute18
>>>>> [rank:3]<stdout>: host:compute18
>>>>> [rank:4]<stdout>: host:compute19
>>>>> [rank:5]<stdout>: host:compute19
>>>>> [rank:6]<stdout>: host:compute19
>>>>> [rank:7]<stdout>: host:compute19
>>>>>
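>>>>> To make the difference concrete, here is a minimal standalone sketch (an illustration only - it assumes "bynode" means round-robin across nodes and "byslot" means filling a node's slots before moving on; it is not the actual rmaps code). The first loop reproduces the placement I get today, the second the placement with the patch:
>>>>>
>>>>> /* byslot_vs_bynode.c - illustration only, not the Open MPI rmaps code.
>>>>>  * Assumes 2 nodes with 4 slots each (32 cores / 8 cpus-per-proc),
>>>>>  * matching the salloc -n 8 -c 8 example above. */
>>>>> #include <stdio.h>
>>>>>
>>>>> #define NUM_RANKS      8
>>>>> #define NUM_NODES      2
>>>>> #define SLOTS_PER_NODE 4
>>>>>
>>>>> static const char *nodes[NUM_NODES] = { "compute18", "compute19" };
>>>>>
>>>>> int main(void)
>>>>> {
>>>>>     /* bynode: round-robin ranks across the nodes */
>>>>>     printf("bynode mapping:\n");
>>>>>     for (int rank = 0; rank < NUM_RANKS; rank++)
>>>>>         printf("[rank:%d] host:%s\n", rank, nodes[rank % NUM_NODES]);
>>>>>
>>>>>     /* byslot: fill every slot on a node before moving to the next */
>>>>>     printf("byslot mapping:\n");
>>>>>     for (int rank = 0; rank < NUM_RANKS; rank++)
>>>>>         printf("[rank:%d] host:%s\n", rank, nodes[rank / SLOTS_PER_NODE]);
>>>>>     return 0;
>>>>> }
>>>>>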
>>>>> Can you tell me whether my patch is correct?
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Damien
>>>>>
>>>>> <patch_cpu_per_rank.txt>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel