
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] problem with rankfile
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-09-10 14:53:49


Hmmm...well, let's try to isolate this a little. Would you mind installing a copy of the current trunk on this machine and trying it?

I ask because I'd like to better understand whether the problem is in the actual binding mechanism (i.e., hwloc) or in the code that computes where to bind the process (i.e., in orte). The trunk uses a completely different method for the latter, so if the trunk works, then we can probably rule out hwloc as the culprit here.
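[Editor's sketch, not part of the original mail: one way to separate the two layers is to exercise hwloc's binding directly, with Open MPI out of the picture. It assumes the hwloc command-line utilities (lstopo, hwloc-bind) from the tarball linked below are installed, and is guarded so it degrades gracefully where they are not.]

```shell
#!/bin/sh
# Illustrative sketch: probe hwloc's binding support directly,
# bypassing Open MPI entirely.
if command -v hwloc-bind >/dev/null 2>&1; then
    # Show the topology hwloc detects (sockets, cores, PUs).
    lstopo
    # Try to bind a trivial command to the first core; a failure here
    # points at hwloc (or the OS binding support), not at orte.
    hwloc-bind core:0 -- echo "binding to core:0 succeeded"
else
    echo "hwloc utilities not found on PATH; install hwloc first"
fi
```

If hwloc-bind itself fails on the affected machine, the problem is below Open MPI; if it succeeds while `mpiexec -bind-to-core` still fails, the orte side is the more likely suspect.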

On Sep 10, 2012, at 4:34 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:

> Hi,
>
>>> are the following outputs helpful to find the error with
>>> a rankfile on Solaris?
>>
>> If you can't bind on the new Solaris machine, then the rankfile
>> won't do you any good. It looks like we are getting the incorrect
>> number of cores on that machine - is it possible that it has
>> hardware threads, and doesn't report "cores"? Can you download
>> and run a copy of lstopo to check the output? You get that from
>> the hwloc folks:
>>
>> http://www.open-mpi.org/software/hwloc/v1.5/
>
> I downloaded and installed the package on our machines. Perhaps it is
> easier to detect the error if you have more information, so below I
> describe the hardware architectures of all machines on which a simple
> program breaks when I try to bind processes to sockets or cores.
>
> I tried the following five commands, with "h" being one of "tyr", "rs0",
> "linpc0", "linpc1", "linpc2", "linpc4", "sunpc0", "sunpc1",
> "sunpc2", or "sunpc4", in a shell script that I started on
> my local machine ("tyr"). "works on" means that the small program
> (MPI_Init, printf, MPI_Finalize) didn't break. I didn't check whether
> the layout of the processes was correct.
>
>
> mpiexec -report-bindings -np 4 -host h init_finalize
>
> works on: tyr, rs0, linpc0, linpc1, linpc2, linpc4, sunpc0, sunpc1,
> sunpc2, sunpc4
> breaks on: -
>
>
> mpiexec -report-bindings -np 4 -host h -bind-to-core -bycore init_finalize
>
> works on: linpc2, sunpc1
> breaks on: tyr, rs0, linpc0, linpc1, linpc4, sunpc0, sunpc2, sunpc4
>
>
> mpiexec -report-bindings -np 4 -host h -bind-to-core -bysocket init_finalize
>
> works on: linpc2, sunpc1
> breaks on: tyr, rs0, linpc0, linpc1, linpc4, sunpc0, sunpc2, sunpc4
>
>
> mpiexec -report-bindings -np 4 -host h -bind-to-socket -bycore init_finalize
>
> works on: tyr, linpc1, linpc2, sunpc1, sunpc2
> breaks on: rs0, linpc0, linpc4, sunpc0, sunpc4
>
>
> mpiexec -report-bindings -np 4 -host h -bind-to-socket -bysocket init_finalize
>
> works on: tyr, linpc1, linpc2, sunpc1, sunpc2
> breaks on: rs0, linpc0, linpc4, sunpc0, sunpc4
>
>
>
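[Editor's sketch, not part of the original mail: the five-command matrix above can be driven by a small script of roughly this shape. The host list and mpiexec flags are taken from the report; the loop structure, the guard, and the labels are illustrative.]

```shell
#!/bin/sh
# Illustrative driver for the test matrix described above. The guard keeps
# it harmless on machines without Open MPI installed.
command -v mpiexec >/dev/null 2>&1 || { echo "mpiexec not on PATH; nothing to run"; exit 0; }

hosts="tyr rs0 linpc0 linpc1 linpc2 linpc4 sunpc0 sunpc1 sunpc2 sunpc4"

for h in $hosts; do
  # The empty string is the plain run; the other four are the binding variants.
  for opts in "" \
              "-bind-to-core -bycore" \
              "-bind-to-core -bysocket" \
              "-bind-to-socket -bycore" \
              "-bind-to-socket -bysocket"; do
    echo "=== host=$h opts='$opts' ==="
    mpiexec -report-bindings -np 4 -host "$h" $opts init_finalize \
      || echo "BREAKS on $h with '$opts'"
  done
done
```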
> "lstopo" shows the following hardware configurations for the above
> machines. The first line always shows the installed architecture.
> "lstopo" does a good job as far as I can see.
>
> tyr:
> ----
>
> UltraSPARC-IIIi, 2 single core processors, no hardware threads
>
> tyr fd1026 183 lstopo
> Machine (4096MB)
> NUMANode L#0 (P#2 2048MB) + Socket L#0 + Core L#0 + PU L#0 (P#0)
> NUMANode L#1 (P#1 2048MB) + Socket L#1 + Core L#1 + PU L#1 (P#1)
>
> tyr fd1026 116 psrinfo -pv
> The physical processor has 1 virtual processor (0)
> UltraSPARC-IIIi (portid 0 impl 0x16 ver 0x34 clock 1600 MHz)
> The physical processor has 1 virtual processor (1)
> UltraSPARC-IIIi (portid 1 impl 0x16 ver 0x34 clock 1600 MHz)
>
>
> rs0, rs1:
> ---------
>
> SPARC64-VII, 2 quad-core processors, 2 hardware threads / core
>
> rs0 fd1026 105 lstopo
> Machine (32GB) + NUMANode L#0 (P#1 32GB)
> Socket L#0
> Core L#0
> PU L#0 (P#0)
> PU L#1 (P#1)
> Core L#1
> PU L#2 (P#2)
> PU L#3 (P#3)
> Core L#2
> PU L#4 (P#4)
> PU L#5 (P#5)
> Core L#3
> PU L#6 (P#6)
> PU L#7 (P#7)
> Socket L#1
> Core L#4
> PU L#8 (P#8)
> PU L#9 (P#9)
> Core L#5
> PU L#10 (P#10)
> PU L#11 (P#11)
> Core L#6
> PU L#12 (P#12)
> PU L#13 (P#13)
> Core L#7
> PU L#14 (P#14)
> PU L#15 (P#15)
>
> tyr fd1026 117 ssh rs0 psrinfo -pv
> The physical processor has 8 virtual processors (0-7)
> SPARC64-VII (portid 1024 impl 0x7 ver 0x91 clock 2400 MHz)
> The physical processor has 8 virtual processors (8-15)
> SPARC64-VII (portid 1032 impl 0x7 ver 0x91 clock 2400 MHz)
>
>
> linpc0, linpc3:
> ---------------
>
> AMD Athlon64 X2, 1 dual-core processor, no hardware threads
>
> linpc0 fd1026 102 lstopo
> Machine (4023MB) + Socket L#0
> L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
> L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
>
>
> It is strange that openSuSE-Linux-12.1 thinks that two
> dual-core processors are available although the machines
> are only equipped with one processor.
>
> linpc0 fd1026 104 cat /proc/cpuinfo | grep -e processor -e "cpu core"
> processor : 0
> cpu cores : 2
> processor : 1
> cpu cores : 2
>
>
> linpc1:
> -------
>
> Intel Xeon, 2 single core processors, no hardware threads
>
> linpc1 fd1026 104 lstopo
> Machine (3829MB)
> Socket L#0 + Core L#0 + PU L#0 (P#0)
> Socket L#1 + Core L#1 + PU L#1 (P#1)
>
> tyr fd1026 118 ssh linpc1 cat /proc/cpuinfo | grep -e processor -e "cpu core"
> processor : 0
> cpu cores : 1
> processor : 1
> cpu cores : 1
>
>
> linpc2:
> -------
>
> AMD Opteron 280, 2 dual-core processors, no hardware threads
>
> linpc2 fd1026 103 lstopo
> Machine (8190MB)
> NUMANode L#0 (P#0 4094MB) + Socket L#0
> L2 L#0 (1024KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
> L2 L#1 (1024KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
> NUMANode L#1 (P#1 4096MB) + Socket L#1
> L2 L#2 (1024KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
> L2 L#3 (1024KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
>
> It is strange that openSuSE-Linux-12.1 thinks that four
> dual-core processors are available although the machine
> is only equipped with two processors.
>
> linpc2 fd1026 104 cat /proc/cpuinfo | grep -e processor -e "cpu core"
> processor : 0
> cpu cores : 2
> processor : 1
> cpu cores : 2
> processor : 2
> cpu cores : 2
> processor : 3
> cpu cores : 2
>
>
>
> linpc4:
> -------
>
> AMD Opteron 1218, 1 dual-core processor, no hardware threads
>
> linpc4 fd1026 100 lstopo
> Machine (4024MB) + Socket L#0
> L2 L#0 (1024KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
> L2 L#1 (1024KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
>
> It is strange that openSuSE-Linux-12.1 thinks that two
> dual-core processors are available although the machine
> is only equipped with one processor.
>
> tyr fd1026 230 ssh linpc4 cat /proc/cpuinfo | grep -e processor -e "cpu core"
> processor : 0
> cpu cores : 2
> processor : 1
> cpu cores : 2
>
>
>
> sunpc0, sunpc3:
> ---------------
>
> AMD Athlon64 X2, 1 dual-core processor, no hardware threads
>
> sunpc0 fd1026 104 lstopo
> Machine (4094MB) + NUMANode L#0 (P#0 4094MB) + Socket L#0
> Core L#0 + PU L#0 (P#0)
> Core L#1 + PU L#1 (P#1)
>
> tyr fd1026 111 ssh sunpc0 psrinfo -pv
> The physical processor has 2 virtual processors (0 1)
> x86 (chipid 0x0 AuthenticAMD family 15 model 43 step 1 clock 2000 MHz)
> AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
>
>
> sunpc1:
> -------
>
> AMD Opteron 280, 2 dual-core processors, no hardware threads
>
> sunpc1 fd1026 104 lstopo
> Machine (8191MB)
> NUMANode L#0 (P#1 4095MB) + Socket L#0
> Core L#0 + PU L#0 (P#0)
> Core L#1 + PU L#1 (P#1)
> NUMANode L#1 (P#2 4096MB) + Socket L#1
> Core L#2 + PU L#2 (P#2)
> Core L#3 + PU L#3 (P#3)
>
> tyr fd1026 112 ssh sunpc1 psrinfo -pv
> The physical processor has 2 virtual processors (0 1)
> x86 (chipid 0x0 AuthenticAMD family 15 model 33 step 2 clock 2411 MHz)
> Dual Core AMD Opteron(tm) Processor 280
> The physical processor has 2 virtual processors (2 3)
> x86 (chipid 0x1 AuthenticAMD family 15 model 33 step 2 clock 2411 MHz)
> Dual Core AMD Opteron(tm) Processor 280
>
>
> sunpc2:
> -------
>
> Intel Xeon, 2 single core processors, no hardware threads
>
> sunpc2 fd1026 104 lstopo
> Machine (3904MB) + NUMANode L#0 (P#0 3904MB)
> Socket L#0 + Core L#0 + PU L#0 (P#0)
> Socket L#1 + Core L#1 + PU L#1 (P#1)
>
> tyr fd1026 114 ssh sunpc2 psrinfo -pv
> The physical processor has 1 virtual processor (0)
> x86 (chipid 0x0 GenuineIntel family 15 model 2 step 9 clock 2791 MHz)
> Intel(r) Xeon(tm) CPU 2.80GHz
> The physical processor has 1 virtual processor (1)
> x86 (chipid 0x3 GenuineIntel family 15 model 2 step 9 clock 2791 MHz)
> Intel(r) Xeon(tm) CPU 2.80GHz
>
>
> sunpc4:
> -------
>
> AMD Opteron 1218, 1 dual-core processor, no hardware threads
>
> sunpc4 fd1026 104 lstopo
> Machine (4096MB) + NUMANode L#0 (P#0 4096MB) + Socket L#0
> Core L#0 + PU L#0 (P#0)
> Core L#1 + PU L#1 (P#1)
>
> tyr fd1026 115 ssh sunpc4 psrinfo -pv
> The physical processor has 2 virtual processors (0 1)
> x86 (chipid 0x0 AuthenticAMD family 15 model 67 step 2 clock 2613 MHz)
> Dual-Core AMD Opteron(tm) Processor 1218
>
>
>
>
> Among others, I got the following error messages (I can provide
> the complete file if you are interested).
>
> ##################
> ##################
> mpiexec -report-bindings -np 4 -host tyr -bind-to-core -bycore init_finalize
> [tyr.informatik.hs-fulda.de:23208] [[30908,0],0] odls:default:fork binding child
> [[30908,1],2] to cpus 0004
> --------------------------------------------------------------------------
> An attempt to set processor affinity has failed - please check to
> ensure that your system supports such functionality. If so, then
> this is probably something that should be reported to the OMPI developers.
> --------------------------------------------------------------------------
> [tyr.informatik.hs-fulda.de:23208] [[30908,0],0] odls:default:fork binding child
> [[30908,1],0] to cpus 0001
> [tyr.informatik.hs-fulda.de:23208] [[30908,0],0] odls:default:fork binding child
> [[30908,1],1] to cpus 0002
> --------------------------------------------------------------------------
> mpiexec was unable to start the specified application as it encountered an error
> on node tyr.informatik.hs-fulda.de. More information may be available above.
> --------------------------------------------------------------------------
> 4 total processes failed to start
>
>
> ##################
> ##################
> mpiexec -report-bindings -np 4 -host tyr -bind-to-core -bysocket init_finalize
> --------------------------------------------------------------------------
> An invalid physical processor ID was returned when attempting to bind
> an MPI process to a unique processor.
>
> This usually means that you requested binding to more processors than
> exist (e.g., trying to bind N MPI processes to M processors, where N >
> M). Double check that you have enough unique processors for all the
> MPI processes that you are launching on this host.
>
> You job will now abort.
> --------------------------------------------------------------------------
> [tyr.informatik.hs-fulda.de:23215] [[30907,0],0] odls:default:fork binding child
> [[30907,1],0] to socket 0 cpus 0001
> [tyr.informatik.hs-fulda.de:23215] [[30907,0],0] odls:default:fork binding child
> [[30907,1],1] to socket 1 cpus 0002
> --------------------------------------------------------------------------
> mpiexec was unable to start the specified application as it encountered an error
> on node tyr.informatik.hs-fulda.de. More information may be available above.
> --------------------------------------------------------------------------
> 4 total processes failed to start
>
>
> ##################
> ##################
> mpiexec -report-bindings -np 4 -host rs0 -bind-to-core -bycore init_finalize
> --------------------------------------------------------------------------
> An attempt to set processor affinity has failed - please check to
> ensure that your system supports such functionality. If so, then
> this is probably something that should be reported to the OMPI developers.
> --------------------------------------------------------------------------
> [rs0.informatik.hs-fulda.de:05715] [[30936,0],1] odls:default:fork binding child
> [[30936,1],0] to cpus 0001
> --------------------------------------------------------------------------
> mpiexec was unable to start the specified application as it encountered an
> error:
>
> Error name: Resource temporarily unavailable
> Node: rs0
>
> when attempting to start process rank 0.
> --------------------------------------------------------------------------
> 4 total processes failed to start
>
>
> ##################
> ##################
> mpiexec -report-bindings -np 4 -host rs0 -bind-to-core -bysocket init_finalize
> --------------------------------------------------------------------------
> An attempt to set processor affinity has failed - please check to
> ensure that your system supports such functionality. If so, then
> this is probably something that should be reported to the OMPI developers.
> --------------------------------------------------------------------------
> [rs0.informatik.hs-fulda.de:05743] [[30916,0],1] odls:default:fork binding child
> [[30916,1],0] to socket 0 cpus 0001
> --------------------------------------------------------------------------
> mpiexec was unable to start the specified application as it encountered an
> error:
>
> Error name: Resource temporarily unavailable
> Node: rs0
>
> when attempting to start process rank 0.
> --------------------------------------------------------------------------
> 4 total processes failed to start
>
>
> ##################
> ##################
> mpiexec -report-bindings -np 4 -host rs0 -bind-to-socket -bycore init_finalize
> --------------------------------------------------------------------------
> An attempt to set processor affinity has failed - please check to
> ensure that your system supports such functionality. If so, then
> this is probably something that should be reported to the OMPI developers.
> --------------------------------------------------------------------------
> [rs0.informatik.hs-fulda.de:05771] [[30912,0],1] odls:default:fork binding child
> [[30912,1],0] to socket 0 cpus 0055
> --------------------------------------------------------------------------
> mpiexec was unable to start the specified application as it encountered an
> error:
>
> Error name: Resource temporarily unavailable
> Node: rs0
>
> when attempting to start process rank 0.
> --------------------------------------------------------------------------
> 4 total processes failed to start
>
>
> ##################
> ##################
> mpiexec -report-bindings -np 4 -host rs0 -bind-to-socket -bysocket init_finalize
> --------------------------------------------------------------------------
> An attempt to set processor affinity has failed - please check to
> ensure that your system supports such functionality. If so, then
> this is probably something that should be reported to the OMPI developers.
> --------------------------------------------------------------------------
> [rs0.informatik.hs-fulda.de:05799] [[30924,0],1] odls:default:fork binding child
> [[30924,1],0] to socket 0 cpus 0055
> --------------------------------------------------------------------------
> mpiexec was unable to start the specified application as it encountered an
> error:
>
> Error name: Resource temporarily unavailable
> Node: rs0
>
> when attempting to start process rank 0.
> --------------------------------------------------------------------------
> 4 total processes failed to start
>
>
> ##################
> ##################
> mpiexec -report-bindings -np 4 -host linpc0 -bind-to-core -bycore init_finalize
> --------------------------------------------------------------------------
> An attempt to set processor affinity has failed - please check to
> ensure that your system supports such functionality. If so, then
> this is probably something that should be reported to the OMPI developers.
> --------------------------------------------------------------------------
> [linpc0:02275] [[30964,0],1] odls:default:fork binding child [[30964,1],0] to
> cpus 0001
> [linpc0:02275] [[30964,0],1] odls:default:fork binding child [[30964,1],1] to
> cpus 0002
> [linpc0:02275] [[30964,0],1] odls:default:fork binding child [[30964,1],2] to
> cpus 0004
> --------------------------------------------------------------------------
> mpiexec was unable to start the specified application as it encountered an error
> on node linpc0. More information may be available above.
> --------------------------------------------------------------------------
> 4 total processes failed to start
>
>
> ##################
> ##################
> mpiexec -report-bindings -np 4 -host linpc0 -bind-to-core -bysocket
> init_finalize
> --------------------------------------------------------------------------
> An invalid physical processor ID was returned when attempting to bind
> an MPI process to a unique processor.
>
> This usually means that you requested binding to more processors than
> exist (e.g., trying to bind N MPI processes to M processors, where N >
> M). Double check that you have enough unique processors for all the
> MPI processes that you are launching on this host.
>
> You job will now abort.
> --------------------------------------------------------------------------
> [linpc0:02326] [[30960,0],1] odls:default:fork binding child [[30960,1],0] to
> socket 0 cpus 0001
> [linpc0:02326] [[30960,0],1] odls:default:fork binding child [[30960,1],1] to
> socket 0 cpus 0002
> --------------------------------------------------------------------------
> mpiexec was unable to start the specified application as it encountered an error
> on node linpc0. More information may be available above.
> --------------------------------------------------------------------------
> 4 total processes failed to start
>
>
> ##################
> ##################
> mpiexec -report-bindings -np 4 -host linpc0 -bind-to-socket -bycore
> init_finalize
> --------------------------------------------------------------------------
> Unable to bind to socket 0 on node linpc0.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec was unable to start the specified application as it encountered an
> error:
>
> Error name: Fatal
> Node: linpc0
>
> when attempting to start process rank 0.
> --------------------------------------------------------------------------
> 4 total processes failed to start
>
>
> ##################
> ##################
> mpiexec -report-bindings -np 4 -host linpc0 -bind-to-socket -bysocket
> init_finalize
> --------------------------------------------------------------------------
> Unable to bind to socket 0 on node linpc0.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec was unable to start the specified application as it encountered an
> error:
>
> Error name: Fatal
> Node: linpc0
>
> when attempting to start process rank 0.
> --------------------------------------------------------------------------
> 4 total processes failed to start
>
>
>
> Hopefully this helps to track the error. Thank you very much
> for your help in advance.
>
>
> Kind regards
>
> Siegmar
>
>
>
>>> I wrapped long lines so that they
>>> are easier to read. Have you had time to look at the
>>> segmentation fault with a rankfile which I reported in my
>>> last email (see below)?
>>
>> I'm afraid not - been too busy lately. I'd suggest first focusing
>> on getting binding to work.
>>
>>>
>>> "tyr" is a two-processor, single-core machine.
>>>
>>> tyr fd1026 116 mpiexec -report-bindings -np 4 \
>>> -bind-to-socket -bycore rank_size
>>> [tyr.informatik.hs-fulda.de:18614] [[27298,0],0] odls:default:
>>> fork binding child [[27298,1],0] to socket 0 cpus 0001
>>> [tyr.informatik.hs-fulda.de:18614] [[27298,0],0] odls:default:
>>> fork binding child [[27298,1],1] to socket 1 cpus 0002
>>> [tyr.informatik.hs-fulda.de:18614] [[27298,0],0] odls:default:
>>> fork binding child [[27298,1],2] to socket 0 cpus 0001
>>> [tyr.informatik.hs-fulda.de:18614] [[27298,0],0] odls:default:
>>> fork binding child [[27298,1],3] to socket 1 cpus 0002
>>> I'm process 0 of 4 ...
>>>
>>>
>>> tyr fd1026 121 mpiexec -report-bindings -np 4 \
>>> -bind-to-socket -bysocket rank_size
>>> [tyr.informatik.hs-fulda.de:18656] [[27380,0],0] odls:default:
>>> fork binding child [[27380,1],0] to socket 0 cpus 0001
>>> [tyr.informatik.hs-fulda.de:18656] [[27380,0],0] odls:default:
>>> fork binding child [[27380,1],1] to socket 1 cpus 0002
>>> [tyr.informatik.hs-fulda.de:18656] [[27380,0],0] odls:default:
>>> fork binding child [[27380,1],2] to socket 0 cpus 0001
>>> [tyr.informatik.hs-fulda.de:18656] [[27380,0],0] odls:default:
>>> fork binding child [[27380,1],3] to socket 1 cpus 0002
>>> I'm process 0 of 4 ...
>>>
>>>
>>> tyr fd1026 117 mpiexec -report-bindings -np 4 \
>>> -bind-to-core -bycore rank_size
>>> [tyr.informatik.hs-fulda.de:18623] [[27307,0],0] odls:default:
>>> fork binding child [[27307,1],2] to cpus 0004
>>> ------------------------------------------------------------------
>>> An attempt to set processor affinity has failed - please check to
>>> ensure that your system supports such functionality. If so, then
>>> this is probably something that should be reported to the OMPI
>>> developers.
>>> ------------------------------------------------------------------
>>> [tyr.informatik.hs-fulda.de:18623] [[27307,0],0] odls:default:
>>> fork binding child [[27307,1],0] to cpus 0001
>>> [tyr.informatik.hs-fulda.de:18623] [[27307,0],0] odls:default:
>>> fork binding child [[27307,1],1] to cpus 0002
>>> ------------------------------------------------------------------
>>> mpiexec was unable to start the specified application
>>> as it encountered an error
>>> on node tyr.informatik.hs-fulda.de. More information may be
>>> available above.
>>> ------------------------------------------------------------------
>>> 4 total processes failed to start
>>>
>>>
>>>
>>> tyr fd1026 118 mpiexec -report-bindings -np 4 \
>>> -bind-to-core -bysocket rank_size
>>> ------------------------------------------------------------------
>>> An invalid physical processor ID was returned when attempting to
>>> bind
>>> an MPI process to a unique processor.
>>>
>>> This usually means that you requested binding to more processors
>>> than
>>>
>>> exist (e.g., trying to bind N MPI processes to M processors,
>>> where N >
>>> M). Double check that you have enough unique processors for
>>> all the
>>> MPI processes that you are launching on this host.
>>>
>>> You job will now abort.
>>> ------------------------------------------------------------------
>>> [tyr.informatik.hs-fulda.de:18631] [[27347,0],0] odls:default:
>>> fork binding child [[27347,1],0] to socket 0 cpus 0001
>>> [tyr.informatik.hs-fulda.de:18631] [[27347,0],0] odls:default:
>>> fork binding child [[27347,1],1] to socket 1 cpus 0002
>>> ------------------------------------------------------------------
>>> mpiexec was unable to start the specified application as it
>>> encountered an error
>>> on node tyr.informatik.hs-fulda.de. More information may be
>>> available above.
>>> ------------------------------------------------------------------
>>> 4 total processes failed to start
>>> tyr fd1026 119
>>>
>>>
>>>
>>> "linpc3" and "linpc4" are two-processor, dual-core machines.
>>>
>>> linpc4 fd1026 102 mpiexec -report-bindings -host linpc3,linpc4 \
>>> -np 4 -bind-to-core -bycore rank_size
>>> [linpc4:16842] [[40914,0],0] odls:default:
>>> fork binding child [[40914,1],1] to cpus 0001
>>> [linpc4:16842] [[40914,0],0] odls:default:
>>> fork binding child [[40914,1],3] to cpus 0002
>>> [linpc3:31384] [[40914,0],1] odls:default:
>>> fork binding child [[40914,1],0] to cpus 0001
>>> [linpc3:31384] [[40914,0],1] odls:default:
>>> fork binding child [[40914,1],2] to cpus 0002
>>> I'm process 1 of 4 ...
>>>
>>>
>>> linpc4 fd1026 102 mpiexec -report-bindings -host linpc3,linpc4 \
>>> -np 4 -bind-to-core -bysocket rank_size
>>> [linpc4:16846] [[40918,0],0] odls:default:
>>> fork binding child [[40918,1],1] to socket 0 cpus 0001
>>> [linpc4:16846] [[40918,0],0] odls:default:
>>> fork binding child [[40918,1],3] to socket 0 cpus 0002
>>> [linpc3:31435] [[40918,0],1] odls:default:
>>> fork binding child [[40918,1],0] to socket 0 cpus 0001
>>> [linpc3:31435] [[40918,0],1] odls:default:
>>> fork binding child [[40918,1],2] to socket 0 cpus 0002
>>> I'm process 1 of 4 ...
>>>
>>>
>>>
>>>
>>> linpc4 fd1026 104 mpiexec -report-bindings -host linpc3,linpc4 \
>>> -np 4 -bind-to-socket -bycore rank_size
>>> ------------------------------------------------------------------
>>> Unable to bind to socket 0 on node linpc3.
>>> ------------------------------------------------------------------
>>> ------------------------------------------------------------------
>>> Unable to bind to socket 0 on node linpc4.
>>> ------------------------------------------------------------------
>>> ------------------------------------------------------------------
>>> mpiexec was unable to start the specified application as it
>>> encountered an error:
>>>
>>> Error name: Fatal
>>> Node: linpc4
>>>
>>> when attempting to start process rank 1.
>>> ------------------------------------------------------------------
>>> 4 total processes failed to start
>>> linpc4 fd1026 105
>>>
>>>
>>> linpc4 fd1026 105 mpiexec -report-bindings -host linpc3,linpc4 \
>>> -np 4 -bind-to-socket -bysocket rank_size
>>> ------------------------------------------------------------------
>>> Unable to bind to socket 0 on node linpc4.
>>> ------------------------------------------------------------------
>>> ------------------------------------------------------------------
>>> Unable to bind to socket 0 on node linpc3.
>>> ------------------------------------------------------------------
>>> ------------------------------------------------------------------
>>> mpiexec was unable to start the specified application as it
>>> encountered an error:
>>>
>>> Error name: Fatal
>>> Node: linpc4
>>>
>>> when attempting to start process rank 1.
>>> --------------------------------------------------------------------------
>>> 4 total processes failed to start
>>>
>>>
>>> It's interesting that commands that work on Solaris fail on Linux
>>> and vice versa.
>>>
>>>
>>> Kind regards
>>>
>>> Siegmar
>>>
>>>>> I couldn't really say for certain - I don't see anything obviously
>>>>> wrong with your syntax, and the code appears to be working or else
>>>>> it would fail on the other nodes as well. The fact that it fails
>>>>> solely on that machine seems suspect.
>>>>>
>>>>> Set aside the rankfile for the moment and try to just bind to cores
>>>>> on that machine, something like:
>>>>>
>>>>> mpiexec --report-bindings -bind-to-core
>>>>> -host rs0.informatik.hs-fulda.de -n 2 rank_size
>>>>>
>>>>> If that doesn't work, then the problem isn't with rankfile
>>>>
>>>> It doesn't work but I found out something else as you can see below.
>>>> I get a segmentation fault for some rankfiles.
>>>>
>>>>
>>>> tyr small_prog 110 mpiexec --report-bindings -bind-to-core
>>>> -host rs0.informatik.hs-fulda.de -n 2 rank_size
>>>> --------------------------------------------------------------------------
>>>> An attempt to set processor affinity has failed - please check to
>>>> ensure that your system supports such functionality. If so, then
>>>> this is probably something that should be reported to the OMPI developers.
>>>> --------------------------------------------------------------------------
>>>> [rs0.informatik.hs-fulda.de:14695] [[30561,0],1] odls:default:
>>>> fork binding child [[30561,1],0] to cpus 0001
>>>> --------------------------------------------------------------------------
>>>> mpiexec was unable to start the specified application as it
>>>> encountered an error:
>>>>
>>>> Error name: Resource temporarily unavailable
>>>> Node: rs0.informatik.hs-fulda.de
>>>>
>>>> when attempting to start process rank 0.
>>>> --------------------------------------------------------------------------
>>>> 2 total processes failed to start
>>>> tyr small_prog 111
>>>>
>>>>
>>>>
>>>>
>>>> Perhaps I have a hint for the error on Solaris Sparc. I use the
>>>> following rankfile to keep everything simple.
>>>>
>>>> rank 0=tyr.informatik.hs-fulda.de slot=0:0
>>>> rank 1=linpc0.informatik.hs-fulda.de slot=0:0
>>>> rank 2=linpc1.informatik.hs-fulda.de slot=0:0
>>>> #rank 3=linpc2.informatik.hs-fulda.de slot=0:0
>>>> rank 4=linpc3.informatik.hs-fulda.de slot=0:0
>>>> rank 5=linpc4.informatik.hs-fulda.de slot=0:0
>>>> rank 6=sunpc0.informatik.hs-fulda.de slot=0:0
>>>> rank 7=sunpc1.informatik.hs-fulda.de slot=0:0
>>>> rank 8=sunpc2.informatik.hs-fulda.de slot=0:0
>>>> rank 9=sunpc3.informatik.hs-fulda.de slot=0:0
>>>> rank 10=sunpc4.informatik.hs-fulda.de slot=0:0
>>>>
>>>> When I execute "mpiexec -report-bindings -rf my_rankfile rank_size"
>>>> on a Linux-x86_64 or Solaris-10-x86_64 machine everything works fine.
>>>>
>>>> linpc4 small_prog 104 mpiexec -report-bindings -rf my_rankfile rank_size
>>>> [linpc4:08018] [[49482,0],0] odls:default:fork binding child
>>>> [[49482,1],5] to slot_list 0:0
>>>> [linpc3:22030] [[49482,0],4] odls:default:fork binding child
>>>> [[49482,1],4] to slot_list 0:0
>>>> [linpc0:12887] [[49482,0],2] odls:default:fork binding child
>>>> [[49482,1],1] to slot_list 0:0
>>>> [linpc1:08323] [[49482,0],3] odls:default:fork binding child
>>>> [[49482,1],2] to slot_list 0:0
>>>> [sunpc1:17786] [[49482,0],6] odls:default:fork binding child
>>>> [[49482,1],7] to slot_list 0:0
>>>> [sunpc3.informatik.hs-fulda.de:08482] [[49482,0],8] odls:default:fork
>>>> binding child [[49482,1],9] to slot_list 0:0
>>>> [sunpc0.informatik.hs-fulda.de:11568] [[49482,0],5] odls:default:fork
>>>> binding child [[49482,1],6] to slot_list 0:0
>>>> [tyr.informatik.hs-fulda.de:21484] [[49482,0],1] odls:default:fork
>>>> binding child [[49482,1],0] to slot_list 0:0
>>>> [sunpc2.informatik.hs-fulda.de:28638] [[49482,0],7] odls:default:fork
>>>> binding child [[49482,1],8] to slot_list 0:0
>>>> ...
>>>>
>>>>
>>>>
>>>> I get a segmentation fault when I run it on my local machine
>>>> (Solaris Sparc).
>>>>
>>>> tyr small_prog 141 mpiexec -report-bindings -rf my_rankfile rank_size
>>>> [tyr.informatik.hs-fulda.de:21421] [[29113,0],0] ORTE_ERROR_LOG:
>>>> Data unpack would read past end of buffer in file
>>>> ../../../../openmpi-1.6/orte/mca/odls/base/odls_base_default_fns.c
>>>> at line 927
>>>> [tyr:21421] *** Process received signal ***
>>>> [tyr:21421] Signal: Segmentation Fault (11)
>>>> [tyr:21421] Signal code: Address not mapped (1)
>>>> [tyr:21421] Failing at address: 5ba
>>>> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:0x15d3ec
>>>> /lib/libc.so.1:0xcad04
>>>> /lib/libc.so.1:0xbf3b4
>>>> /lib/libc.so.1:0xbf59c
>>>> /lib/libc.so.1:0x58bd0 [ Signal 11 (SEGV)]
>>>> /lib/libc.so.1:free+0x24
>>>> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:
>>>> orte_odls_base_default_construct_child_list+0x1234
>>>> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/openmpi/
>>>> mca_odls_default.so:0x90b8
>>>> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:0x5e8d4
>>>> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:
>>>> orte_daemon_cmd_processor+0x328
>>>> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:0x12e324
>>>> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:
>>>> opal_event_base_loop+0x228
>>>> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:
>>>> opal_progress+0xec
>>>> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:
>>>> orte_plm_base_report_launched+0x1c4
>>>> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:
>>>> orte_plm_base_launch_apps+0x318
>>>> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/openmpi/mca_plm_rsh.so:
>>>> orte_plm_rsh_launch+0xac4
>>>> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/bin/orterun:orterun+0x16a8
>>>> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/bin/orterun:main+0x24
>>>> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/bin/orterun:_start+0xd8
>>>> [tyr:21421] *** End of error message ***
>>>> Segmentation fault
>>>> tyr small_prog 142
>>>>
>>>>
>>>> The funny thing is that I get a segmentation fault on the Linux
>>>> machine as well if I change my rankfile in the following way.
>>>>
>>>> rank 0=tyr.informatik.hs-fulda.de slot=0:0
>>>> rank 1=linpc0.informatik.hs-fulda.de slot=0:0
>>>> #rank 2=linpc1.informatik.hs-fulda.de slot=0:0
>>>> #rank 3=linpc2.informatik.hs-fulda.de slot=0:0
>>>> #rank 4=linpc3.informatik.hs-fulda.de slot=0:0
>>>> rank 5=linpc4.informatik.hs-fulda.de slot=0:0
>>>> rank 6=sunpc0.informatik.hs-fulda.de slot=0:0
>>>> #rank 7=sunpc1.informatik.hs-fulda.de slot=0:0
>>>> #rank 8=sunpc2.informatik.hs-fulda.de slot=0:0
>>>> #rank 9=sunpc3.informatik.hs-fulda.de slot=0:0
>>>> rank 10=sunpc4.informatik.hs-fulda.de slot=0:0
>>>>
>>>>
>>>> linpc4 small_prog 107 mpiexec -report-bindings -rf my_rankfile rank_size
>>>> [linpc4:08402] [[65226,0],0] ORTE_ERROR_LOG: Data unpack would
>>>> read past end of buffer in file
>>>> ../../../../openmpi-1.6/orte/mca/odls/base/odls_base_default_fns.c
>>>> at line 927
>>>> [linpc4:08402] *** Process received signal ***
>>>> [linpc4:08402] Signal: Segmentation fault (11)
>>>> [linpc4:08402] Signal code: Address not mapped (1)
>>>> [linpc4:08402] Failing at address: 0x5f32fffc
>>>> [linpc4:08402] [ 0] [0xffffe410]
>>>> [linpc4:08402] [ 1] /usr/local/openmpi-1.6_32_cc/lib/openmpi/
>>>> mca_odls_default.so(+0x4023) [0xf73ec023]
>>>> [linpc4:08402] [ 2] /usr/local/openmpi-1.6_32_cc/lib/
>>>> libopen-rte.so.4(+0x42b91) [0xf7667b91]
>>>> [linpc4:08402] [ 3] /usr/local/openmpi-1.6_32_cc/lib/
>>>> libopen-rte.so.4(orte_daemon_cmd_processor+0x313) [0xf76655c3]
>>>> [linpc4:08402] [ 4] /usr/local/openmpi-1.6_32_cc/lib/
>>>> libopen-rte.so.4(+0x8f366) [0xf76b4366]
>>>> [linpc4:08402] [ 5] /usr/local/openmpi-1.6_32_cc/lib/
>>>> libopen-rte.so.4(opal_event_base_loop+0x18c) [0xf76b46bc]
>>>> [linpc4:08402] [ 6] /usr/local/openmpi-1.6_32_cc/lib/
>>>> libopen-rte.so.4(opal_event_loop+0x26) [0xf76b4526]
>>>> [linpc4:08402] [ 7] /usr/local/openmpi-1.6_32_cc/lib/
>>>> libopen-rte.so.4(opal_progress+0xba) [0xf769303a]
>>>> [linpc4:08402] [ 8] /usr/local/openmpi-1.6_32_cc/lib/
>>>> libopen-rte.so.4(orte_plm_base_report_launched+0x13f) [0xf767d62f]
>>>> [linpc4:08402] [ 9] /usr/local/openmpi-1.6_32_cc/lib/
>>>> libopen-rte.so.4(orte_plm_base_launch_apps+0x1b7) [0xf767bf27]
>>>> [linpc4:08402] [10] /usr/local/openmpi-1.6_32_cc/lib/openmpi/
>>>> mca_plm_rsh.so(orte_plm_rsh_launch+0xb2d) [0xf74228fd]
>>>> [linpc4:08402] [11] mpiexec(orterun+0x102f) [0x804e7bf]
>>>> [linpc4:08402] [12] mpiexec(main+0x13) [0x804c273]
>>>> [linpc4:08402] [13] /lib/libc.so.6(__libc_start_main+0xf3) [0xf745e003]
>>>> [linpc4:08402] *** End of error message ***
>>>> Segmentation fault
>>>> linpc4 small_prog 107
>>>>
>>>>
>>>> Hopefully this information helps to fix the problem.
>>>>
>>>>
>>>> Kind regards
>>>>
>>>> Siegmar
>>>>
>>>>
>>>>
>>>>
>>>>> On Sep 5, 2012, at 5:50 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm new to rankfiles, so I played around a little with different
>>>>>> options. I thought that the following entry would be similar to an
>>>>>> entry in an appfile, and that Open MPI could place the process with
>>>>>> rank 0 on any core of any processor.
>>>>>>
>>>>>> rank 0=tyr.informatik.hs-fulda.de
>>>>>>
>>>>>> Unfortunately it isn't allowed, and I got an error. Could somebody
>>>>>> add the missing help text to the file?
>>>>>>
>>>>>>
>>>>>> tyr small_prog 126 mpiexec -rf my_rankfile -report-bindings rank_size
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> Sorry! You were supposed to get help about:
>>>>>> no-slot-list
>>>>>> from the file:
>>>>>> help-rmaps_rank_file.txt
>>>>>> But I couldn't find that topic in the file. Sorry!
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> As you can see below, I could use a rankfile on my old local machine
>>>>>> (Sun Ultra 45), but not on our "new" one (Sun Server M4000). Today I
>>>>>> logged into the machine via ssh and tried the same command once more
>>>>>> as a local user, without success. It is more or less the same error
>>>>>> as before, when I tried to bind the process to a remote machine.
>>>>>>
>>>>>> rs0 small_prog 118 mpiexec -rf my_rankfile -report-bindings rank_size
>>>>>> [rs0.informatik.hs-fulda.de:13745] [[19734,0],0] odls:default:fork
>>>>>> binding child [[19734,1],0] to slot_list 0:0
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> We were unable to successfully process/set the requested processor
>>>>>> affinity settings:
>>>>>>
>>>>>> Specified slot list: 0:0
>>>>>> Error: Cross-device link
>>>>>>
>>>>>> This could mean that a non-existent processor was specified, or
>>>>>> that the specification had improper syntax.
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> mpiexec was unable to start the specified application as it encountered an error:
>>>>>>
>>>>>> Error name: No such file or directory
>>>>>> Node: rs0.informatik.hs-fulda.de
>>>>>>
>>>>>> when attempting to start process rank 0.
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> rs0 small_prog 119
>>>>>>
>>>>>>
>>>>>> The application is available.
>>>>>>
>>>>>> rs0 small_prog 119 which rank_size
>>>>>> /home/fd1026/SunOS/sparc/bin/rank_size
>>>>>>
>>>>>>
>>>>>> Is it a problem in the Open MPI implementation or in my rankfile?
>>>>>> How can I find out which sockets and cores per socket are available,
>>>>>> so that I can use correct values in my rankfile? In LAM/MPI there
>>>>>> was a command "lamnodes" which I could use to get such information.
>>>>>> Thank you very much for any help in advance.
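[Editor's note: elsewhere in this thread, lstopo from hwloc is the suggested way to inspect the socket/core layout (psrinfo -v serves a similar role on Solaris). As a quick cross-check only, the short Python sketch below reports the logical-processor count the OS sees; it cannot distinguish sockets, cores, and hardware threads, for which lstopo remains the right tool.]

```python
import os

# Number of logical processors the OS reports (cores x hardware threads).
# Note: this is a flat count; it says nothing about the socket/core layout.
logical = os.cpu_count()
print("logical processors:", logical)

# On Linux, the set of CPUs this process is actually allowed to run on
# (may be smaller than cpu_count() if an affinity mask is in effect).
if hasattr(os, "sched_getaffinity"):
    print("allowed CPUs:", sorted(os.sched_getaffinity(0)))
```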
>>>>>>
>>>>>>
>>>>>> Kind regards
>>>>>>
>>>>>> Siegmar
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> Are *all* the machines Sparc? Or just the 3rd one (rs0)?
>>>>>>>
>>>>>>> Yes, both machines are Sparc. I tried first in a homogeneous
>>>>>>> environment.
>>>>>>>
>>>>>>> tyr fd1026 106 psrinfo -v
>>>>>>> Status of virtual processor 0 as of: 09/04/2012 07:32:14
>>>>>>> on-line since 08/31/2012 15:44:42.
>>>>>>> The sparcv9 processor operates at 1600 MHz,
>>>>>>> and has a sparcv9 floating point processor.
>>>>>>> Status of virtual processor 1 as of: 09/04/2012 07:32:14
>>>>>>> on-line since 08/31/2012 15:44:39.
>>>>>>> The sparcv9 processor operates at 1600 MHz,
>>>>>>> and has a sparcv9 floating point processor.
>>>>>>> tyr fd1026 107
>>>>>>>
>>>>>>> My local machine (tyr) is a dual-processor machine, and the
>>>>>>> other one is equipped with two quad-core processors, each core
>>>>>>> capable of running two hardware threads.
>>>>>>>
>>>>>>>
>>>>>>> Kind regards
>>>>>>>
>>>>>>> Siegmar
>>>>>>>
>>>>>>>
>>>>>>>> On Sep 3, 2012, at 12:43 PM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> the man page for "mpiexec" shows the following:
>>>>>>>>>
>>>>>>>>> cat myrankfile
>>>>>>>>> rank 0=aa slot=1:0-2
>>>>>>>>> rank 1=bb slot=0:0,1
>>>>>>>>> rank 2=cc slot=1-2
>>>>>>>>> mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out
>>>>>>>>>
>>>>>>>>> So that
>>>>>>>>> Rank 0 runs on node aa, bound to socket 1, cores 0-2.
>>>>>>>>> Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
>>>>>>>>> Rank 2 runs on node cc, bound to cores 1 and 2.
>>>>>>>>>
>>>>>>>>> Does it mean that the process with rank 0 should be bound to
>>>>>>>>> core 0, 1, or 2 of socket 1?
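[Editor's note: read literally, the man-page example means the rank is bound to the whole set of listed cores, i.e. slot=1:0-2 pins the process to socket 1, cores 0 through 2, and it may be scheduled on any of those three. The Python sketch below (hypothetical helper names, not part of Open MPI) illustrates how such a slot list decomposes.]

```python
# Illustrative only: expand a rankfile slot list such as "1:0-2",
# "0:0,1", or "1-2" into explicit socket/core id lists, mirroring
# the man-page examples quoted above.

def expand_ids(spec):
    """Expand "0-2", "0,1", or "3" into a list of ints."""
    ids = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids

def parse_slot_list(slot_list):
    """Return (sockets, cores); sockets is None for a plain core list."""
    if ":" in slot_list:
        socket_spec, core_spec = slot_list.split(":")
        return expand_ids(socket_spec), expand_ids(core_spec)
    return None, expand_ids(slot_list)

print(parse_slot_list("1:0-2"))  # ([1], [0, 1, 2]) - socket 1, cores 0-2
print(parse_slot_list("0:0,1"))  # ([0], [0, 1])    - socket 0, cores 0 and 1
print(parse_slot_list("1-2"))    # (None, [1, 2])   - cores 1 and 2, any socket
```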
>>>>>>>>>
>>>>>>>>> I tried to use a rankfile and have a problem. My rankfile contains
>>>>>>>>> the following lines.
>>>>>>>>>
>>>>>>>>> rank 0=tyr.informatik.hs-fulda.de slot=0:0
>>>>>>>>> rank 1=tyr.informatik.hs-fulda.de slot=1:0
>>>>>>>>> #rank 2=rs0.informatik.hs-fulda.de slot=0:0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Everything is fine if I use the file with just my local machine
>>>>>>>>> (the first two lines).
>>>>>>>>>
>>>>>>>>> tyr small_prog 115 mpiexec -report-bindings -rf my_rankfile rank_size
>>>>>>>>> [tyr.informatik.hs-fulda.de:01133] [[9849,0],0]
>>>>>>>>> odls:default:fork binding child [[9849,1],0] to slot_list 0:0
>>>>>>>>> [tyr.informatik.hs-fulda.de:01133] [[9849,0],0]
>>>>>>>>> odls:default:fork binding child [[9849,1],1] to slot_list 1:0
>>>>>>>>> I'm process 0 of 2 available processes running on tyr.informatik.hs-fulda.de.
>>>>>>>>> MPI standard 2.1 is supported.
>>>>>>>>> I'm process 1 of 2 available processes running on tyr.informatik.hs-fulda.de.
>>>>>>>>> MPI standard 2.1 is supported.
>>>>>>>>> tyr small_prog 116
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I can also change the socket number, and the processes will be
>>>>>>>>> bound to the correct cores. Unfortunately it doesn't work if I
>>>>>>>>> add another machine (third line).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> tyr small_prog 112 mpiexec -report-bindings -rf my_rankfile rank_size
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> We were unable to successfully process/set the requested processor
>>>>>>>>> affinity settings:
>>>>>>>>>
>>>>>>>>> Specified slot list: 0:0
>>>>>>>>> Error: Cross-device link
>>>>>>>>>
>>>>>>>>> This could mean that a non-existent processor was specified, or
>>>>>>>>> that the specification had improper syntax.
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0]
>>>>>>>>> odls:default:fork binding child [[10212,1],0] to slot_list 0:0
>>>>>>>>> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0]
>>>>>>>>> odls:default:fork binding child [[10212,1],1] to slot_list 1:0
>>>>>>>>> [rs0.informatik.hs-fulda.de:12047] [[10212,0],1]
>>>>>>>>> odls:default:fork binding child [[10212,1],2] to slot_list 0:0
>>>>>>>>> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0]
>>>>>>>>> ORTE_ERROR_LOG: A message is attempting to be sent to a process
>>>>>>>>> whose contact information is unknown in file
>>>>>>>>> ../../../../../openmpi-1.6/orte/mca/rml/oob/rml_oob_send.c at line 145
>>>>>>>>> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0] attempted to send
>>>>>>>>> to [[10212,1],0]: tag 20
>>>>>>>>> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0] ORTE_ERROR_LOG:
>>>>>>>>> A message is attempting to be sent to a process whose contact
>>>>>>>>> information is unknown in file
>>>>>>>>> ../../../../openmpi-1.6/orte/mca/odls/base/odls_base_default_fns.c
>>>>>>>>> at line 2501
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpiexec was unable to start the specified application as it
>>>>>>>>> encountered an error:
>>>>>>>>>
>>>>>>>>> Error name: Error 0
>>>>>>>>> Node: rs0.informatik.hs-fulda.de
>>>>>>>>>
>>>>>>>>> when attempting to start process rank 2.
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> tyr small_prog 113
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The other machine has two 8 core processors.
>>>>>>>>>
>>>>>>>>> tyr small_prog 121 ssh rs0 psrinfo -v
>>>>>>>>> Status of virtual processor 0 as of: 09/03/2012 19:51:15
>>>>>>>>> on-line since 07/26/2012 15:03:14.
>>>>>>>>> The sparcv9 processor operates at 2400 MHz,
>>>>>>>>> and has a sparcv9 floating point processor.
>>>>>>>>> Status of virtual processor 1 as of: 09/03/2012 19:51:15
>>>>>>>>> ...
>>>>>>>>> Status of virtual processor 15 as of: 09/03/2012 19:51:15
>>>>>>>>> on-line since 07/26/2012 15:03:16.
>>>>>>>>> The sparcv9 processor operates at 2400 MHz,
>>>>>>>>> and has a sparcv9 floating point processor.
>>>>>>>>> tyr small_prog 122
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Is it necessary to specify another option on the command line or
>>>>>>>>> is my rankfile faulty? Thank you very much for any suggestions in
>>>>>>>>> advance.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Kind regards
>>>>>>>>>
>>>>>>>>> Siegmar
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> users_at_[hidden]
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> users_at_[hidden]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users