
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] confusion between slot and procs on mca/rmaps
From: Damien Guinier (damien.guinier_at_[hidden])
Date: 2010-12-01 07:11:23


Oops, I meant "word", not "work":

OK, you can commit it. The whole problem is with the word "procs": in the
source code it is used to mean both "processes" and "cores".

On 01/12/2010 11:37, Damien Guinier wrote:
> Ok, you can commit it. All problem is on "procs" work, on source code,
> "processes" AND "cores" definition is used.
>
> Thank you for your help.
> Damien
>
> On 01/12/2010 10:47, Ralph Castain wrote:
>> I just checked and it appears bycore does correctly translate to
>> byslot. So your patch does indeed appear to be correct. If you don't
>> mind, I'm going to apply it for you as I'm working on a correction
>> for how we handle oversubscribe flags, and I want to ensure the patch
>> gets included so we compute oversubscribe correctly.
>>
>> Thanks for catching this!
>>
>> On Nov 30, 2010, at 10:33 PM, Ralph Castain wrote:
>>
>>> Afraid I don't speak much slurm any more (thank goodness!).
>>>
>>> From your output, it looks like the system is mapping bynode
>>> instead of byslot. IIRC, isn't bycore just supposed to be a
>>> synonym for byslot? So perhaps the problem is that "bycore" causes
>>> us to set the "bynode" flag by mistake. Did you check that?
>>>
>>> BTW: when running cpus-per-proc, a slot doesn't have X processes. I
>>> suspect this is just a language thing, but it will create confusion.
>>> A slot consists of X cpus - we still assign only one process to each
>>> slot.
>>>
>>> On Nov 30, 2010, at 10:47 AM, Damien Guinier wrote:
>>>
>>>> Hi all,
>>>>
>>>> Most of the time there is no difference between "proc" and "slot", but
>>>> when you use "mpirun -cpus-per-proc X", a slot has X procs.
>>>> In orte/mca/rmaps/base/rmaps_base_common_mappers.c there is some
>>>> confusion between proc and slot. This small error affects the mapping:
>>>>
>>>> With the latest OMPI version, on 32-core compute nodes:
>>>> salloc -n 8 -c 8 mpirun -bind-to-core -bycore ./a.out
>>>> [rank:0]<stdout>: host:compute18
>>>> [rank:1]<stdout>: host:compute19
>>>> [rank:2]<stdout>: host:compute18
>>>> [rank:3]<stdout>: host:compute19
>>>> [rank:4]<stdout>: host:compute18
>>>> [rank:5]<stdout>: host:compute19
>>>> [rank:6]<stdout>: host:compute18
>>>> [rank:7]<stdout>: host:compute19
>>>>
>>>> With the patch:
>>>> [rank:0]<stdout>: host:compute18
>>>> [rank:1]<stdout>: host:compute18
>>>> [rank:2]<stdout>: host:compute18
>>>> [rank:3]<stdout>: host:compute18
>>>> [rank:4]<stdout>: host:compute19
>>>> [rank:5]<stdout>: host:compute19
>>>> [rank:6]<stdout>: host:compute19
>>>> [rank:7]<stdout>: host:compute19
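A hypothetical sketch (not the actual Open MPI mapper) of why the two listings above differ: with two 32-core nodes and 8 cpus per proc, four ranks fit per node, so bynode placement round-robins across the nodes while byslot fills one node before moving to the next.

/* Hypothetical sketch of bynode vs. byslot placement order; not the
 * actual Open MPI mapper. 8 ranks, 2 nodes, 4 ranks per node
 * (32 cores / 8 cpus-per-proc). */
#include <stdio.h>

int main(void)
{
    const char *nodes[] = { "compute18", "compute19" };
    const int nnodes = 2, nranks = 8, ranks_per_node = 4;

    puts("bynode (round-robin, like the unpatched output):");
    for (int r = 0; r < nranks; r++)
        printf("  rank %d -> %s\n", r, nodes[r % nnodes]);

    puts("byslot (fill a node first, like the patched output):");
    for (int r = 0; r < nranks; r++)
        printf("  rank %d -> %s\n", r, nodes[(r / ranks_per_node) % nnodes]);

    return 0;
}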
>>>>
>>>> Can you tell me whether my patch is correct?
>>>>
>>>> Thank you,
>>>>
>>>> Damien
>>>>
>>>> <patch_cpu_per_rank.txt>
>>>> _______________________________________________
>>>>
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel