Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi tar.gz for 1.6.1 or 1.6.2
From: Anne M. Hammond (hammond_at_[hidden])
Date: 2012-07-17 16:59:54


Thanks! I ran the command:

mpirun --slot-list 0-3 -np 4 --report-bindings $EXECUTABLE

and this is the output of standard error:

[node50.cl.corp.com:15473] [[45030,0],0] odls:default:fork binding child [[45030,1],0] to slot_list 0-3
[node50.cl.corp.com:15473] [[45030,0],0] odls:default:fork binding child [[45030,1],1] to slot_list 0-3
[node50.cl.corp.com:15473] [[45030,0],0] odls:default:fork binding child [[45030,1],2] to slot_list 0-3
[node50.cl.corp.com:15473] [[45030,0],0] odls:default:fork binding child [[45030,1],3] to slot_list 0-3
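
As a cross-check on the --report-bindings lines above, the kernel's view of a process's affinity can be read on Linux from the Cpus_allowed_list field of /proc/<pid>/status (taskset -cp <pid> from util-linux reports the same information). A minimal sketch, where allowed_cpus is a hypothetical helper name and not part of Open MPI:

```shell
#!/bin/sh
# allowed_cpus (hypothetical helper): print the CPU list that the Linux
# kernel allows the given PID to run on, e.g. "0-3" for a process bound
# to logical CPUs 0 through 3.
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "/proc/$1/status"
}

# Example: show the affinity of the current shell (Linux only).
if [ -r "/proc/$$/status" ]; then
    allowed_cpus "$$"
fi
```

Running it against the PID of each MPI rank should, under these assumptions, print 0-3 for a job launched with --slot-list 0-3.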

top shows the four processes running on the first 4 cores (Cpu0-Cpu3):

top - 11:17:06 up 35 days, 1:03, 2 users, load average: 3.15, 1.15, 0.41
Tasks: 453 total, 6 running, 446 sleeping, 1 stopped, 0 zombie
Cpu0 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8059116k total, 1577220k used, 6481896k free, 62020k buffers
Swap: 16787916k total, 61108k used, 16726808k free, 718036k cached
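
To confirm that logical CPUs 0-3 really sit on the first socket, the "processor" and "physical id" fields of /proc/cpuinfo (quoted further down in this thread) can be paired up. A rough sketch, assuming the usual Linux /proc/cpuinfo layout; cpus_on_socket is a hypothetical helper name:

```shell
#!/bin/sh
# cpus_on_socket (hypothetical helper): read /proc/cpuinfo-style text on
# stdin and print the logical processor numbers whose "physical id"
# matches the socket number given as the first argument.
cpus_on_socket() {
    awk -F': *' -v want="$1" '
        /^processor/   { cpu = $2 }
        /^physical id/ { if ($2 == want) print cpu }
    '
}

# Example: list the logical CPUs on socket 0 of this machine (Linux only).
if [ -r /proc/cpuinfo ]; then
    cpus_on_socket 0 < /proc/cpuinfo
fi
```

For the cpuinfo quoted below, socket 0 corresponds to processors 0 through 3 and socket 1 to processors 4 through 7, which matches the top output.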

For a multinode job, a rankfile is needed:

http://www.open-mpi.org/faq/?category=tuning#using-paffinity-v1.3
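
For reference, a rankfile in the format described in that FAQ entry has one "rank N=host slot=..." line per process. The sketch below generates such a file for a list of nodes, placing one rank on each of the first few cores of every node. The helper name make_rankfile and the node names are made up for illustration, and the slot numbering should be checked against the FAQ for the Open MPI version in use:

```shell
#!/bin/sh
# make_rankfile (hypothetical helper): print one rankfile line per
# (host, core) pair. The first argument is the number of cores to use on
# each host; the remaining arguments are host names.
make_rankfile() {
    cores_per_host=$1
    shift
    rank=0
    for host in "$@"; do
        core=0
        while [ "$core" -lt "$cores_per_host" ]; do
            echo "rank $rank=$host slot=$core"
            rank=$((rank + 1))
            core=$((core + 1))
        done
    done
}

# Example: two illustrative nodes, four ranks each on cores 0-3.
make_rankfile 4 node50 node51

# The output would be saved to a file and passed to mpirun, e.g.:
#   make_rankfile 4 node50 node51 > rankfile.txt
#   mpirun -np 8 -rf rankfile.txt --report-bindings $EXECUTABLE
```

With a PBS or SLURM job, the host names would come from the scheduler's node list rather than being typed in by hand.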

Appreciate the suggestions and solution.

On Jul 16, 2012, at 5:08 PM, Ralph Castain wrote:

> Or you could just do:
>
> mpirun --slot-list 0-3 -np 4 hostname
>
> That will put the four procs on the cpu numbers 0-3, which should all be on the first socket
>
>
> On Jul 16, 2012, at 3:23 PM, Dominik Goeddeke wrote:
>
>> in the "old" 1.4.x and 1.5.x, I achieved this by using rankfiles (see FAQ), and it worked very well. With these versions, --byslot etc. didn't work for me, I always needed the rankfiles. I haven't tried the overhauled "convenience wrappers" in 1.6 that you are using for this feature yet, but I see no reason why the "old" way should not work, although it requires some shell magic if rankfiles are to be generated automatically from e.g. PBS or SLURM node lists.
>>
>> Dominik
>>
>> On 07/17/2012 12:13 AM, Anne M. Hammond wrote:
>>> There are 2 physical processors, each with 4 cores (no hyperthreading).
>>>
>>> I want to instruct openmpi to run only on the first processor, using 4 cores.
>>>
>>>
>>> [hammond_at_node48 ~]$ cat /proc/cpuinfo
>>> processor : 0
>>> vendor_id : AuthenticAMD
>>> cpu family : 16
>>> model : 4
>>> model name : Quad-Core AMD Opteron(tm) Processor 2376
>>> stepping : 2
>>> cpu MHz : 2311.694
>>> cache size : 512 KB
>>> physical id : 0
>>> siblings : 4
>>> core id : 0
>>> cpu cores : 4
>>> apicid : 0
>>> initial apicid : 0
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 5
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
>>> bogomips : 4623.38
>>> TLB size : 1024 4K pages
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 48 bits physical, 48 bits virtual
>>> power management: ts ttp tm stc 100mhzsteps hwpstate
>>>
>>> processor : 1
>>> vendor_id : AuthenticAMD
>>> cpu family : 16
>>> model : 4
>>> model name : Quad-Core AMD Opteron(tm) Processor 2376
>>> stepping : 2
>>> cpu MHz : 2311.694
>>> cache size : 512 KB
>>> physical id : 0
>>> siblings : 4
>>> core id : 1
>>> cpu cores : 4
>>> apicid : 1
>>> initial apicid : 1
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 5
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
>>> bogomips : 4623.17
>>> TLB size : 1024 4K pages
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 48 bits physical, 48 bits virtual
>>> power management: ts ttp tm stc 100mhzsteps hwpstate
>>>
>>> processor : 2
>>> vendor_id : AuthenticAMD
>>> cpu family : 16
>>> model : 4
>>> model name : Quad-Core AMD Opteron(tm) Processor 2376
>>> stepping : 2
>>> cpu MHz : 2311.694
>>> cache size : 512 KB
>>> physical id : 0
>>> siblings : 4
>>> core id : 2
>>> cpu cores : 4
>>> apicid : 2
>>> initial apicid : 2
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 5
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
>>> bogomips : 4623.19
>>> TLB size : 1024 4K pages
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 48 bits physical, 48 bits virtual
>>> power management: ts ttp tm stc 100mhzsteps hwpstate
>>>
>>> processor : 3
>>> vendor_id : AuthenticAMD
>>> cpu family : 16
>>> model : 4
>>> model name : Quad-Core AMD Opteron(tm) Processor 2376
>>> stepping : 2
>>> cpu MHz : 2311.694
>>> cache size : 512 KB
>>> physical id : 0
>>> siblings : 4
>>> core id : 3
>>> cpu cores : 4
>>> apicid : 3
>>> initial apicid : 3
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 5
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
>>> bogomips : 4623.16
>>> TLB size : 1024 4K pages
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 48 bits physical, 48 bits virtual
>>> power management: ts ttp tm stc 100mhzsteps hwpstate
>>>
>>> processor : 4
>>> vendor_id : AuthenticAMD
>>> cpu family : 16
>>> model : 4
>>> model name : Quad-Core AMD Opteron(tm) Processor 2376
>>> stepping : 2
>>> cpu MHz : 2311.694
>>> cache size : 512 KB
>>> physical id : 1
>>> siblings : 4
>>> core id : 0
>>> cpu cores : 4
>>> apicid : 4
>>> initial apicid : 4
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 5
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
>>> bogomips : 4623.16
>>> TLB size : 1024 4K pages
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 48 bits physical, 48 bits virtual
>>> power management: ts ttp tm stc 100mhzsteps hwpstate
>>>
>>> processor : 5
>>> vendor_id : AuthenticAMD
>>> cpu family : 16
>>> model : 4
>>> model name : Quad-Core AMD Opteron(tm) Processor 2376
>>> stepping : 2
>>> cpu MHz : 2311.694
>>> cache size : 512 KB
>>> physical id : 1
>>> siblings : 4
>>> core id : 1
>>> cpu cores : 4
>>> apicid : 5
>>> initial apicid : 5
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 5
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
>>> bogomips : 4623.16
>>> TLB size : 1024 4K pages
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 48 bits physical, 48 bits virtual
>>> power management: ts ttp tm stc 100mhzsteps hwpstate
>>>
>>> processor : 6
>>> vendor_id : AuthenticAMD
>>> cpu family : 16
>>> model : 4
>>> model name : Quad-Core AMD Opteron(tm) Processor 2376
>>> stepping : 2
>>> cpu MHz : 2311.694
>>> cache size : 512 KB
>>> physical id : 1
>>> siblings : 4
>>> core id : 2
>>> cpu cores : 4
>>> apicid : 6
>>> initial apicid : 6
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 5
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
>>> bogomips : 4623.17
>>> TLB size : 1024 4K pages
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 48 bits physical, 48 bits virtual
>>> power management: ts ttp tm stc 100mhzsteps hwpstate
>>>
>>> processor : 7
>>> vendor_id : AuthenticAMD
>>> cpu family : 16
>>> model : 4
>>> model name : Quad-Core AMD Opteron(tm) Processor 2376
>>> stepping : 2
>>> cpu MHz : 2311.694
>>> cache size : 512 KB
>>> physical id : 1
>>> siblings : 4
>>> core id : 3
>>> cpu cores : 4
>>> apicid : 7
>>> initial apicid : 7
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 5
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
>>> bogomips : 4623.18
>>> TLB size : 1024 4K pages
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 48 bits physical, 48 bits virtual
>>> power management: ts ttp tm stc 100mhzsteps hwpstate
>>>
>>>
>>> On Jul 16, 2012, at 4:09 PM, Elken, Tom wrote:
>>>
>>>> Anne,
>>>>
>>>> output from "cat /proc/cpuinfo" on your node "hostname" may help those trying to answer.
>>>>
>>>> -Tom
>>>>
>>>>> -----Original Message-----
>>>>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
>>>>> Behalf Of Ralph Castain
>>>>> Sent: Monday, July 16, 2012 2:47 PM
>>>>> To: Open MPI Users
>>>>> Subject: Re: [OMPI users] openmpi tar.gz for 1.6.1 or 1.6.2
>>>>>
>>>>> I gather there are two sockets on this node? So the second cmd line is equivalent
>>>>> to leaving "num-sockets" off of the cmd line?
>>>>>
>>>>> I haven't tried what you are doing, so it is quite possible this is a bug.
>>>>>
>>>>>
>>>>> On Jul 16, 2012, at 1:49 PM, Anne M. Hammond wrote:
>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Built the latest snapshot. Still getting an error when trying to run
>>>>>> on only one socket (see below): Is there a workaround?
>>>>>>
>>>>>> [hammond_at_node65 bin]$ ./mpirun -np 4 --num-sockets 1 --npersocket 4 hostname
>>>>>> --------------------------------------------------------------------------
>>>>>> An invalid physical processor ID was returned when attempting to
>>>>>> bind an MPI process to a unique processor.
>>>>>>
>>>>>> This usually means that you requested binding to more processors than
>>>>>> exist (e.g., trying to bind N MPI processes to M processors, where N >
>>>>>> M). Double check that you have enough unique processors for all the
>>>>>> MPI processes that you are launching on this host.
>>>>>>
>>>>>> You job will now abort.
>>>>>> --------------------------------------------------------------------------
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun was unable to start the specified application as it
>>>>>> encountered an error:
>>>>>>
>>>>>> Error name: Fatal
>>>>>> Node: node65.cl.corp.com
>>>>>>
>>>>>> when attempting to start process rank 0.
>>>>>> --------------------------------------------------------------------------
>>>>>> 4 total processes failed to start
>>>>>>
>>>>>>
>>>>>> [hammond_at_node65 bin]$ ./mpirun -np 4 --num-sockets 2 --npersocket 4 hostname
>>>>>> node65.cl.corp.com
>>>>>> node65.cl.corp.com
>>>>>> node65.cl.corp.com
>>>>>> node65.cl.corp.com
>>>>>> [hammond_at_node65 bin]$
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Jul 16, 2012, at 12:56 PM, Ralph Castain wrote:
>>>>>>
>>>>>>> Jeff is at the MPI Forum this week, so his answers will be delayed. Last I heard, it was close, but no specific date has been set.
>>>>>>>
>>>>>>>
>>>>>>> On Jul 16, 2012, at 11:49 AM, Michael E. Thomadakis wrote:
>>>>>>>
>>>>>>>> When is the expected date for the official 1.6.1 (or 1.6.2 ?) to be available ?
>>>>>>>>
>>>>>>>> mike
>>>>>>>>
>>>>>>>> On 07/16/2012 01:44 PM, Ralph Castain wrote:
>>>>>>>>> You can get it here:
>>>>>>>>>
>>>>>>>>> http://www.open-mpi.org/nightly/v1.6/
>>>>>>>>>
>>>>>>>>> On Jul 16, 2012, at 10:22 AM, Anne M. Hammond wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> For benchmarking, we would like to use openmpi with
>>>>>>>>>> --num-sockets 1
>>>>>>>>>>
>>>>>>>>>> This fails in 1.6, but Bug Report #3119 indicates it is changed in
>>>>>>>>>> 1.6.1.
>>>>>>>>>>
>>>>>>>>>> Is 1.6.1 or 1.6.2 available in tar.gz form?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Anne
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> users mailing list
>>>>>>>>>> users_at_[hidden]
>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Anne M. Hammond - Systems / Network Administration - Tech-X Corp
>>>>>> hammond_at_txcorp.com 720-974-1840
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> Anne M. Hammond - Systems / Network Administration - Tech-X Corp
>>> hammond_at_txcorp.com 720-974-1840
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Jun.-Prof. Dr. Dominik Göddeke
>> Hardware-orientierte Numerik für große Systeme
>> Institut für Angewandte Mathematik (LS III)
>> Fakultät für Mathematik, Technische Universität Dortmund
>>
>> http://www.mathematik.tu-dortmund.de/~goeddeke
>>
>> Tel. +49-(0)231-755-7218 Fax +49-(0)231-755-5933
>> --
>> Sent from my old-fashioned computer and not from a mobile device.
>> I proudly boycott 24/7 availability.
>>
>>
>

Anne M. Hammond - Systems / Network Administration - Tech-X Corp
                  hammond_at_txcorp.com 720-974-1840