
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] problem with rankfile
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-09-07 08:41:34


Hi,

are the following outputs helpful for finding the rankfile error
on Solaris? I wrapped long lines so that they are easier to read.
Have you had time to look at the segmentation fault with a
rankfile that I reported in my last email (see below)?

"tyr" is a two processor single core machine.

tyr fd1026 116 mpiexec -report-bindings -np 4 \
  -bind-to-socket -bycore rank_size
[tyr.informatik.hs-fulda.de:18614] [[27298,0],0] odls:default:
  fork binding child [[27298,1],0] to socket 0 cpus 0001
[tyr.informatik.hs-fulda.de:18614] [[27298,0],0] odls:default:
  fork binding child [[27298,1],1] to socket 1 cpus 0002
[tyr.informatik.hs-fulda.de:18614] [[27298,0],0] odls:default:
  fork binding child [[27298,1],2] to socket 0 cpus 0001
[tyr.informatik.hs-fulda.de:18614] [[27298,0],0] odls:default:
  fork binding child [[27298,1],3] to socket 1 cpus 0002
I'm process 0 of 4 ...

tyr fd1026 121 mpiexec -report-bindings -np 4 \
 -bind-to-socket -bysocket rank_size
[tyr.informatik.hs-fulda.de:18656] [[27380,0],0] odls:default:
  fork binding child [[27380,1],0] to socket 0 cpus 0001
[tyr.informatik.hs-fulda.de:18656] [[27380,0],0] odls:default:
  fork binding child [[27380,1],1] to socket 1 cpus 0002
[tyr.informatik.hs-fulda.de:18656] [[27380,0],0] odls:default:
  fork binding child [[27380,1],2] to socket 0 cpus 0001
[tyr.informatik.hs-fulda.de:18656] [[27380,0],0] odls:default:
  fork binding child [[27380,1],3] to socket 1 cpus 0002
I'm process 0 of 4 ...
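As a side note, the rank-to-socket pattern above is what a simple
round-robin mapping would predict: on a machine with two sockets and
one core per socket, "-bycore" and "-bysocket" collapse to the same
alternating assignment, which is exactly what "-report-bindings"
shows. A small Python sketch of such a mapping (my own illustration,
not OMPI's actual mapper):

```python
def map_ranks(nranks, sockets, cores_per_socket, policy):
    """Illustrative rank -> (socket, core) mapping; NOT OMPI's real mapper."""
    slots = []
    if policy == "bycore":
        # fill all cores of socket 0, then socket 1, ...
        for s in range(sockets):
            for c in range(cores_per_socket):
                slots.append((s, c))
    elif policy == "bysocket":
        # rotate across sockets, taking one core per socket per round
        for c in range(cores_per_socket):
            for s in range(sockets):
                slots.append((s, c))
    else:
        raise ValueError("unknown policy: " + policy)
    # extra ranks wrap around the available slots
    return [slots[r % len(slots)] for r in range(nranks)]

# "tyr": 2 sockets, 1 core each -> both policies alternate sockets
print(map_ranks(4, 2, 1, "bycore"))    # [(0, 0), (1, 0), (0, 0), (1, 0)]
print(map_ranks(4, 2, 1, "bysocket"))  # [(0, 0), (1, 0), (0, 0), (1, 0)]
```

On a machine like linpc3/linpc4 (2 sockets x 2 cores) the two
policies would differ: "bycore" yields (0,0), (0,1), (1,0), (1,1)
while "bysocket" yields (0,0), (1,0), (0,1), (1,1).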

tyr fd1026 117 mpiexec -report-bindings -np 4 \
  -bind-to-core -bycore rank_size
[tyr.informatik.hs-fulda.de:18623] [[27307,0],0] odls:default:
  fork binding child [[27307,1],2] to cpus 0004
------------------------------------------------------------------
An attempt to set processor affinity has failed - please check to
ensure that your system supports such functionality. If so, then
this is probably something that should be reported to the OMPI
  developers.
------------------------------------------------------------------
[tyr.informatik.hs-fulda.de:18623] [[27307,0],0] odls:default:
  fork binding child [[27307,1],0] to cpus 0001
[tyr.informatik.hs-fulda.de:18623] [[27307,0],0] odls:default:
  fork binding child [[27307,1],1] to cpus 0002
------------------------------------------------------------------
mpiexec was unable to start the specified application as it
  encountered an error on node tyr.informatik.hs-fulda.de.
  More information may be available above.
------------------------------------------------------------------
4 total processes failed to start

tyr fd1026 118 mpiexec -report-bindings -np 4 \
  -bind-to-core -bysocket rank_size
------------------------------------------------------------------
An invalid physical processor ID was returned when attempting to
bind an MPI process to a unique processor.

This usually means that you requested binding to more processors
than exist (e.g., trying to bind N MPI processes to M processors,
where N > M). Double check that you have enough unique processors
for all the MPI processes that you are launching on this host.

Your job will now abort.
------------------------------------------------------------------
[tyr.informatik.hs-fulda.de:18631] [[27347,0],0] odls:default:
  fork binding child [[27347,1],0] to socket 0 cpus 0001
[tyr.informatik.hs-fulda.de:18631] [[27347,0],0] odls:default:
  fork binding child [[27347,1],1] to socket 1 cpus 0002
------------------------------------------------------------------
mpiexec was unable to start the specified application as it
  encountered an error
on node tyr.informatik.hs-fulda.de. More information may be
  available above.
------------------------------------------------------------------
4 total processes failed to start
tyr fd1026 119
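The "-bind-to-core -bysocket" failure at least matches the generic
check that the message describes: binding 4 processes to unique cores
cannot work when the machine has only 2 cores in total. A toy version
of that N > M check (hypothetical, not the actual OMPI code):

```python
def check_bind_to_core(nranks, total_cores):
    """Toy version of the N > M sanity check the error text describes."""
    if nranks > total_cores:
        raise ValueError(
            "cannot bind %d ranks to %d unique cores" % (nranks, total_cores))
    return True

check_bind_to_core(2, 2)      # fine: one rank per core
try:
    check_bind_to_core(4, 2)  # 4 ranks but only 2 cores -> refused
except ValueError as e:
    print("binding refused:", e)
```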

"linpc3" and "linpc4" are two processor dual core machines.

linpc4 fd1026 102 mpiexec -report-bindings -host linpc3,linpc4 \
 -np 4 -bind-to-core -bycore rank_size
[linpc4:16842] [[40914,0],0] odls:default:
  fork binding child [[40914,1],1] to cpus 0001
[linpc4:16842] [[40914,0],0] odls:default:
  fork binding child [[40914,1],3] to cpus 0002
[linpc3:31384] [[40914,0],1] odls:default:
  fork binding child [[40914,1],0] to cpus 0001
[linpc3:31384] [[40914,0],1] odls:default:
  fork binding child [[40914,1],2] to cpus 0002
I'm process 1 of 4 ...

linpc4 fd1026 102 mpiexec -report-bindings -host linpc3,linpc4 \
  -np 4 -bind-to-core -bysocket rank_size
[linpc4:16846] [[40918,0],0] odls:default:
  fork binding child [[40918,1],1] to socket 0 cpus 0001
[linpc4:16846] [[40918,0],0] odls:default:
  fork binding child [[40918,1],3] to socket 0 cpus 0002
[linpc3:31435] [[40918,0],1] odls:default:
  fork binding child [[40918,1],0] to socket 0 cpus 0001
[linpc3:31435] [[40918,0],1] odls:default:
  fork binding child [[40918,1],2] to socket 0 cpus 0002
I'm process 1 of 4 ...

linpc4 fd1026 104 mpiexec -report-bindings -host linpc3,linpc4 \
  -np 4 -bind-to-socket -bycore rank_size
------------------------------------------------------------------
Unable to bind to socket 0 on node linpc3.
------------------------------------------------------------------
------------------------------------------------------------------
Unable to bind to socket 0 on node linpc4.
------------------------------------------------------------------
------------------------------------------------------------------
mpiexec was unable to start the specified application as it
  encountered an error:

Error name: Fatal
Node: linpc4

when attempting to start process rank 1.
------------------------------------------------------------------
4 total processes failed to start
linpc4 fd1026 105

linpc4 fd1026 105 mpiexec -report-bindings -host linpc3,linpc4 \
  -np 4 -bind-to-socket -bysocket rank_size
------------------------------------------------------------------
Unable to bind to socket 0 on node linpc4.
------------------------------------------------------------------
------------------------------------------------------------------
Unable to bind to socket 0 on node linpc3.
------------------------------------------------------------------
------------------------------------------------------------------
mpiexec was unable to start the specified application as it
  encountered an error:

Error name: Fatal
Node: linpc4

when attempting to start process rank 1.
------------------------------------------------------------------
4 total processes failed to start

It's interesting that commands that work on Solaris fail on Linux
and vice versa.
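For reference, all the rankfile lines in this thread use the form
"rank N=host slot=socket:core". A short Python sketch that parses
just this subset of the syntax (my own illustration, not OMPI's
parser):

```python
import re

# Matches the "rank N=host slot=S:C" subset of rankfile syntax
# used in this thread; comments and blank lines are skipped.
LINE_RE = re.compile(
    r"rank\s+(?P<rank>\d+)\s*=\s*(?P<host>\S+)\s+"
    r"slot=(?P<socket>\d+):(?P<core>\d+)")

def parse_rankfile(text):
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # commented-out ranks such as "#rank 3=..."
        m = LINE_RE.match(line)
        if not m:
            raise ValueError("unrecognized rankfile line: %r" % line)
        entries[int(m.group("rank"))] = (
            m.group("host"), int(m.group("socket")), int(m.group("core")))
    return entries

example = """\
rank 0=tyr.informatik.hs-fulda.de slot=0:0
#rank 3=linpc2.informatik.hs-fulda.de slot=0:0
rank 1=linpc0.informatik.hs-fulda.de slot=0:0
"""
print(parse_rankfile(example))
```

It may be worth noting that the rankfiles that crash are exactly the
ones with gaps in the rank numbering (commented-out ranks), so
non-contiguous rank numbers could be what triggers the unpack error.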

Kind regards

Siegmar

> > I couldn't really say for certain - I don't see anything obviously
> > wrong with your syntax, and the code appears to be working or else
> > it would fail on the other nodes as well. The fact that it fails
> > solely on that machine seems suspect.
> >
> > Set aside the rankfile for the moment and try to just bind to cores
> > on that machine, something like:
> >
> > mpiexec --report-bindings -bind-to-core
> > -host rs0.informatik.hs-fulda.de -n 2 rank_size
> >
> > If that doesn't work, then the problem isn't with rankfile
>
> It doesn't work but I found out something else as you can see below.
> I get a segmentation fault for some rankfiles.
>
>
> tyr small_prog 110 mpiexec --report-bindings -bind-to-core
> -host rs0.informatik.hs-fulda.de -n 2 rank_size
> --------------------------------------------------------------------------
> An attempt to set processor affinity has failed - please check to
> ensure that your system supports such functionality. If so, then
> this is probably something that should be reported to the OMPI developers.
> --------------------------------------------------------------------------
> [rs0.informatik.hs-fulda.de:14695] [[30561,0],1] odls:default:
> fork binding child [[30561,1],0] to cpus 0001
> --------------------------------------------------------------------------
> mpiexec was unable to start the specified application as it
> encountered an error:
>
> Error name: Resource temporarily unavailable
> Node: rs0.informatik.hs-fulda.de
>
> when attempting to start process rank 0.
> --------------------------------------------------------------------------
> 2 total processes failed to start
> tyr small_prog 111
>
>
>
>
> Perhaps I have a hint for the error on Solaris Sparc. I use the
> following rankfile to keep everything simple.
>
> rank 0=tyr.informatik.hs-fulda.de slot=0:0
> rank 1=linpc0.informatik.hs-fulda.de slot=0:0
> rank 2=linpc1.informatik.hs-fulda.de slot=0:0
> #rank 3=linpc2.informatik.hs-fulda.de slot=0:0
> rank 4=linpc3.informatik.hs-fulda.de slot=0:0
> rank 5=linpc4.informatik.hs-fulda.de slot=0:0
> rank 6=sunpc0.informatik.hs-fulda.de slot=0:0
> rank 7=sunpc1.informatik.hs-fulda.de slot=0:0
> rank 8=sunpc2.informatik.hs-fulda.de slot=0:0
> rank 9=sunpc3.informatik.hs-fulda.de slot=0:0
> rank 10=sunpc4.informatik.hs-fulda.de slot=0:0
>
> When I execute "mpiexec -report-bindings -rf my_rankfile rank_size"
> on a Linux-x86_64 or Solaris-10-x86_64 machine everything works fine.
>
> linpc4 small_prog 104 mpiexec -report-bindings -rf my_rankfile rank_size
> [linpc4:08018] [[49482,0],0] odls:default:fork binding child
> [[49482,1],5] to slot_list 0:0
> [linpc3:22030] [[49482,0],4] odls:default:fork binding child
> [[49482,1],4] to slot_list 0:0
> [linpc0:12887] [[49482,0],2] odls:default:fork binding child
> [[49482,1],1] to slot_list 0:0
> [linpc1:08323] [[49482,0],3] odls:default:fork binding child
> [[49482,1],2] to slot_list 0:0
> [sunpc1:17786] [[49482,0],6] odls:default:fork binding child
> [[49482,1],7] to slot_list 0:0
> [sunpc3.informatik.hs-fulda.de:08482] [[49482,0],8] odls:default:fork
> binding child [[49482,1],9] to slot_list 0:0
> [sunpc0.informatik.hs-fulda.de:11568] [[49482,0],5] odls:default:fork
> binding child [[49482,1],6] to slot_list 0:0
> [tyr.informatik.hs-fulda.de:21484] [[49482,0],1] odls:default:fork
> binding child [[49482,1],0] to slot_list 0:0
> [sunpc2.informatik.hs-fulda.de:28638] [[49482,0],7] odls:default:fork
> binding child [[49482,1],8] to slot_list 0:0
> ...
>
>
>
> I get a segmentation fault when I run it on my local machine
> (Solaris Sparc).
>
> tyr small_prog 141 mpiexec -report-bindings -rf my_rankfile rank_size
> [tyr.informatik.hs-fulda.de:21421] [[29113,0],0] ORTE_ERROR_LOG:
> Data unpack would read past end of buffer in file
> ../../../../openmpi-1.6/orte/mca/odls/base/odls_base_default_fns.c
> at line 927
> [tyr:21421] *** Process received signal ***
> [tyr:21421] Signal: Segmentation Fault (11)
> [tyr:21421] Signal code: Address not mapped (1)
> [tyr:21421] Failing at address: 5ba
> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:0x15d3ec
> /lib/libc.so.1:0xcad04
> /lib/libc.so.1:0xbf3b4
> /lib/libc.so.1:0xbf59c
> /lib/libc.so.1:0x58bd0 [ Signal 11 (SEGV)]
> /lib/libc.so.1:free+0x24
> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:
> orte_odls_base_default_construct_child_list+0x1234
> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/openmpi/
> mca_odls_default.so:0x90b8
> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:0x5e8d4
> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:
> orte_daemon_cmd_processor+0x328
> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:0x12e324
> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:
> opal_event_base_loop+0x228
> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:
> opal_progress+0xec
> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:
> orte_plm_base_report_launched+0x1c4
> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/libopen-rte.so.4.0.0:
> orte_plm_base_launch_apps+0x318
> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/lib/openmpi/mca_plm_rsh.so:
> orte_plm_rsh_launch+0xac4
> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/bin/orterun:orterun+0x16a8
> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/bin/orterun:main+0x24
> /export2/prog/SunOS_sparc/openmpi-1.6_32_cc/bin/orterun:_start+0xd8
> [tyr:21421] *** End of error message ***
> Segmentation fault
> tyr small_prog 142
>
>
> The funny thing is that I get a segmentation fault on the Linux
> machine as well if I change my rankfile in the following way.
>
> rank 0=tyr.informatik.hs-fulda.de slot=0:0
> rank 1=linpc0.informatik.hs-fulda.de slot=0:0
> #rank 2=linpc1.informatik.hs-fulda.de slot=0:0
> #rank 3=linpc2.informatik.hs-fulda.de slot=0:0
> #rank 4=linpc3.informatik.hs-fulda.de slot=0:0
> rank 5=linpc4.informatik.hs-fulda.de slot=0:0
> rank 6=sunpc0.informatik.hs-fulda.de slot=0:0
> #rank 7=sunpc1.informatik.hs-fulda.de slot=0:0
> #rank 8=sunpc2.informatik.hs-fulda.de slot=0:0
> #rank 9=sunpc3.informatik.hs-fulda.de slot=0:0
> rank 10=sunpc4.informatik.hs-fulda.de slot=0:0
>
>
> linpc4 small_prog 107 mpiexec -report-bindings -rf my_rankfile rank_size
> [linpc4:08402] [[65226,0],0] ORTE_ERROR_LOG: Data unpack would
> read past end of buffer in file
> ../../../../openmpi-1.6/orte/mca/odls/base/odls_base_default_fns.c
> at line 927
> [linpc4:08402] *** Process received signal ***
> [linpc4:08402] Signal: Segmentation fault (11)
> [linpc4:08402] Signal code: Address not mapped (1)
> [linpc4:08402] Failing at address: 0x5f32fffc
> [linpc4:08402] [ 0] [0xffffe410]
> [linpc4:08402] [ 1] /usr/local/openmpi-1.6_32_cc/lib/openmpi/
> mca_odls_default.so(+0x4023) [0xf73ec023]
> [linpc4:08402] [ 2] /usr/local/openmpi-1.6_32_cc/lib/
> libopen-rte.so.4(+0x42b91) [0xf7667b91]
> [linpc4:08402] [ 3] /usr/local/openmpi-1.6_32_cc/lib/
> libopen-rte.so.4(orte_daemon_cmd_processor+0x313) [0xf76655c3]
> [linpc4:08402] [ 4] /usr/local/openmpi-1.6_32_cc/lib/
> libopen-rte.so.4(+0x8f366) [0xf76b4366]
> [linpc4:08402] [ 5] /usr/local/openmpi-1.6_32_cc/lib/
> libopen-rte.so.4(opal_event_base_loop+0x18c) [0xf76b46bc]
> [linpc4:08402] [ 6] /usr/local/openmpi-1.6_32_cc/lib/
> libopen-rte.so.4(opal_event_loop+0x26) [0xf76b4526]
> [linpc4:08402] [ 7] /usr/local/openmpi-1.6_32_cc/lib/
> libopen-rte.so.4(opal_progress+0xba) [0xf769303a]
> [linpc4:08402] [ 8] /usr/local/openmpi-1.6_32_cc/lib/
> libopen-rte.so.4(orte_plm_base_report_launched+0x13f) [0xf767d62f]
> [linpc4:08402] [ 9] /usr/local/openmpi-1.6_32_cc/lib/
> libopen-rte.so.4(orte_plm_base_launch_apps+0x1b7) [0xf767bf27]
> [linpc4:08402] [10] /usr/local/openmpi-1.6_32_cc/lib/openmpi/
> mca_plm_rsh.so(orte_plm_rsh_launch+0xb2d) [0xf74228fd]
> [linpc4:08402] [11] mpiexec(orterun+0x102f) [0x804e7bf]
> [linpc4:08402] [12] mpiexec(main+0x13) [0x804c273]
> [linpc4:08402] [13] /lib/libc.so.6(__libc_start_main+0xf3) [0xf745e003]
> [linpc4:08402] *** End of error message ***
> Segmentation fault
> linpc4 small_prog 107
>
>
> Hopefully this information helps to fix the problem.
>
>
> Kind regards
>
> Siegmar
>
>
>
>
> > On Sep 5, 2012, at 5:50 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
> >
> > > Hi,
> > >
> > > I'm new to rankfiles so that I played a little bit with different
> > > options. I thought that the following entry would be similar to an
> > > entry in an appfile and that MPI could place the process with rank 0
> > > on any core of any processor.
> > >
> > > rank 0=tyr.informatik.hs-fulda.de
> > >
> > > Unfortunately it's not allowed and I got an error. Can somebody add
> > > the missing help to the file?
> > >
> > >
> > > tyr small_prog 126 mpiexec -rf my_rankfile -report-bindings rank_size
> > > --------------------------------------------------------------------------
> > > Sorry! You were supposed to get help about:
> > > no-slot-list
> > > from the file:
> > > help-rmaps_rank_file.txt
> > > But I couldn't find that topic in the file. Sorry!
> > > --------------------------------------------------------------------------
> > >
> > >
> > > As you can see below I could use a rankfile on my old local machine
> > > (Sun Ultra 45) but not on our "new" one (Sun Server M4000). Today I
> > > logged into the machine via ssh and tried the same command once more
> > > as a local user without success. It's more or less the same error as
> > > before when I tried to bind the process to a remote machine.
> > >
> > > rs0 small_prog 118 mpiexec -rf my_rankfile -report-bindings rank_size
> > > [rs0.informatik.hs-fulda.de:13745] [[19734,0],0] odls:default:fork
> > > binding child [[19734,1],0] to slot_list 0:0
> > > --------------------------------------------------------------------------
> > > We were unable to successfully process/set the requested processor
> > > affinity settings:
> > >
> > > Specified slot list: 0:0
> > > Error: Cross-device link
> > >
> > > This could mean that a non-existent processor was specified, or
> > > that the specification had improper syntax.
> > > --------------------------------------------------------------------------
> > > --------------------------------------------------------------------------
> > > mpiexec was unable to start the specified application as it
> > > encountered an error:
> > >
> > > Error name: No such file or directory
> > > Node: rs0.informatik.hs-fulda.de
> > >
> > > when attempting to start process rank 0.
> > > --------------------------------------------------------------------------
> > > rs0 small_prog 119
> > >
> > >
> > > The application is available.
> > >
> > > rs0 small_prog 119 which rank_size
> > > /home/fd1026/SunOS/sparc/bin/rank_size
> > >
> > >
> > > Is it a problem in the Open MPI implementation or in my rankfile?
> > > How can I request which sockets and cores per socket are
> > > available so that I can use correct values in my rankfile?
> > > In lam-mpi I had a command "lamnodes" which I could use to get
> > > such information. Thank you very much for any help in advance.
> > >
> > >
> > > Kind regards
> > >
> > > Siegmar
> > >
> > >
> > >
> > >>> Are *all* the machines Sparc? Or just the 3rd one (rs0)?
> > >>
> > >> Yes, both machines are Sparc. I tried first in a homogeneous
> > >> environment.
> > >>
> > >> tyr fd1026 106 psrinfo -v
> > >> Status of virtual processor 0 as of: 09/04/2012 07:32:14
> > >> on-line since 08/31/2012 15:44:42.
> > >> The sparcv9 processor operates at 1600 MHz,
> > >> and has a sparcv9 floating point processor.
> > >> Status of virtual processor 1 as of: 09/04/2012 07:32:14
> > >> on-line since 08/31/2012 15:44:39.
> > >> The sparcv9 processor operates at 1600 MHz,
> > >> and has a sparcv9 floating point processor.
> > >> tyr fd1026 107
> > >>
> > >> My local machine (tyr) is a dual processor machine and the
> > >> other one is equipped with two quad-core processors each
> > >> capable of running two hardware threads.
> > >>
> > >>
> > >> Kind regards
> > >>
> > >> Siegmar
> > >>
> > >>
> > >>> On Sep 3, 2012, at 12:43 PM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
> > >>>
> > >>>> Hi,
> > >>>>
> > >>>> the man page for "mpiexec" shows the following:
> > >>>>
> > >>>> cat myrankfile
> > >>>> rank 0=aa slot=1:0-2
> > >>>> rank 1=bb slot=0:0,1
> > >>>> rank 2=cc slot=1-2
> > >>>> mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out
> > >>>>
> > >>>> So that:
> > >>>>
> > >>>> Rank 0 runs on node aa, bound to socket 1, cores 0-2.
> > >>>> Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
> > >>>> Rank 2 runs on node cc, bound to cores 1 and 2.
> > >>>>
> > >>>> Does it mean that the process with rank 0 should be bound to
> > >>>> core 0, 1, or 2 of socket 1?
> > >>>>
> > >>>> I tried to use a rankfile and have a problem. My rankfile contains
> > >>>> the following lines.
> > >>>>
> > >>>> rank 0=tyr.informatik.hs-fulda.de slot=0:0
> > >>>> rank 1=tyr.informatik.hs-fulda.de slot=1:0
> > >>>> #rank 2=rs0.informatik.hs-fulda.de slot=0:0
> > >>>>
> > >>>>
> > >>>> Everything is fine if I use the file with just my local machine
> > >>>> (the first two lines).
> > >>>>
> > >>>> tyr small_prog 115 mpiexec -report-bindings -rf my_rankfile rank_size
> > >>>> [tyr.informatik.hs-fulda.de:01133] [[9849,0],0]
> > >>>> odls:default:fork binding child [[9849,1],0] to slot_list 0:0
> > >>>> [tyr.informatik.hs-fulda.de:01133] [[9849,0],0]
> > >>>> odls:default:fork binding child [[9849,1],1] to slot_list 1:0
> > >>>> I'm process 0 of 2 available processes running on
> > >>>> tyr.informatik.hs-fulda.de.
> > >>>> MPI standard 2.1 is supported.
> > >>>> I'm process 1 of 2 available processes running on
> > >>>> tyr.informatik.hs-fulda.de.
> > >>>> MPI standard 2.1 is supported.
> > >>>> tyr small_prog 116
> > >>>>
> > >>>>
> > >>>> I can also change the socket number and the processes will be attached
> > >>>> to the correct cores. Unfortunately it doesn't work if I add one
> > >>>> other machine (third line).
> > >>>>
> > >>>>
> > >>>> tyr small_prog 112 mpiexec -report-bindings -rf my_rankfile rank_size
> > >>>> --------------------------------------------------------------------------
> > >>>> We were unable to successfully process/set the requested processor
> > >>>> affinity settings:
> > >>>>
> > >>>> Specified slot list: 0:0
> > >>>> Error: Cross-device link
> > >>>>
> > >>>> This could mean that a non-existent processor was specified, or
> > >>>> that the specification had improper syntax.
> > >>>> --------------------------------------------------------------------------
> > >>>> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0]
> > >>>> odls:default:fork binding child [[10212,1],0] to slot_list 0:0
> > >>>> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0]
> > >>>> odls:default:fork binding child [[10212,1],1] to slot_list 1:0
> > >>>> [rs0.informatik.hs-fulda.de:12047] [[10212,0],1]
> > >>>> odls:default:fork binding child [[10212,1],2] to slot_list 0:0
> > >>>> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0]
> > >>>> ORTE_ERROR_LOG: A message is attempting to be sent to a process
> > >>>> whose contact information is unknown in file
> > >>>> ../../../../../openmpi-1.6/orte/mca/rml/oob/rml_oob_send.c at line 145
> > >>>> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0] attempted to send
> > >>>> to [[10212,1],0]: tag 20
> > >>>> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0] ORTE_ERROR_LOG:
> > >>>> A message is attempting to be sent to a process whose contact
> > >>>> information is unknown in file
> > >>>> ../../../../openmpi-1.6/orte/mca/odls/base/odls_base_default_fns.c
> > >>>> at line 2501
> > >>>> --------------------------------------------------------------------------
> > >>>> mpiexec was unable to start the specified application as it
> > >>>> encountered an error:
> > >>>>
> > >>>> Error name: Error 0
> > >>>> Node: rs0.informatik.hs-fulda.de
> > >>>>
> > >>>> when attempting to start process rank 2.
> > >>>> --------------------------------------------------------------------------
> > >>>> tyr small_prog 113
> > >>>>
> > >>>>
> > >>>>
> > >>>> The other machine has two 8 core processors.
> > >>>>
> > >>>> tyr small_prog 121 ssh rs0 psrinfo -v
> > >>>> Status of virtual processor 0 as of: 09/03/2012 19:51:15
> > >>>> on-line since 07/26/2012 15:03:14.
> > >>>> The sparcv9 processor operates at 2400 MHz,
> > >>>> and has a sparcv9 floating point processor.
> > >>>> Status of virtual processor 1 as of: 09/03/2012 19:51:15
> > >>>> ...
> > >>>> Status of virtual processor 15 as of: 09/03/2012 19:51:15
> > >>>> on-line since 07/26/2012 15:03:16.
> > >>>> The sparcv9 processor operates at 2400 MHz,
> > >>>> and has a sparcv9 floating point processor.
> > >>>> tyr small_prog 122
> > >>>>
> > >>>>
> > >>>>
> > >>>> Is it necessary to specify another option on the command line or
> > >>>> is my rankfile faulty? Thank you very much for any suggestions in
> > >>>> advance.
> > >>>>
> > >>>>
> > >>>> Kind regards
> > >>>>
> > >>>> Siegmar
> > >>>>
> > >>>>
> > >>>> _______________________________________________
> > >>>> users mailing list
> > >>>> users_at_[hidden]
> > >>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >>>
> > >>>
> > >>
> > >
> >
> >