Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: [OMPI users] problem with rankfile in openmpi-1.7.2rc3r28550
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2013-05-24 03:16:52


Hi

I installed openmpi-1.7.2rc3r28550 on "openSuSE Linux 12.1", "Solaris 10
x86_64", and "Solaris 10 sparc" with "Sun C 5.12" in 32- and 64-bit
versions. Unfortunately, rankfiles don't work as expected.

sunpc1 rankfiles 109 more rf_ex_sunpc_linpc
# mpiexec -report-bindings -rf rf_ex_sunpc_linpc hostname

rank 0=linpc1 slot=0:0-1,1:0-1
rank 1=sunpc1 slot=0:0-1
rank 2=sunpc1 slot=1:0
rank 3=sunpc1 slot=1:1
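
The rankfile above maps each rank to a host and a socket:core slot list. As a minimal sketch for checking which hosts a rankfile refers to (and so which hosts must be part of the allocation), the lines can be parsed as below. The function name parse_rankfile and the regular expression are illustrative assumptions, not part of Open MPI, and the pattern covers only the line format shown in this file.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the rankfile above:
#   rank <N>=<host> slot=<slot list>
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)$")


def parse_rankfile(text):
    """Return {host: [(rank, slot_spec), ...]} for the rank lines in text."""
    hosts = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = RANK_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized rankfile line: {line!r}")
        rank, host, slots = match.groups()
        hosts[host].append((int(rank), slots))
    return dict(hosts)


# Contents of rf_ex_sunpc_linpc as shown above.
RANKFILE = """\
# mpiexec -report-bindings -rf rf_ex_sunpc_linpc hostname

rank 0=linpc1 slot=0:0-1,1:0-1
rank 1=sunpc1 slot=0:0-1
rank 2=sunpc1 slot=1:0
rank 3=sunpc1 slot=1:1
"""

if __name__ == "__main__":
    for host, ranks in parse_rankfile(RANKFILE).items():
        print(host, ranks)
```

The resulting host names (here linpc1 and sunpc1) are the ones that the error message below expects to find in the host allocation.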

sunpc1 rankfiles 110 mpiexec -report-bindings \
  -rf rf_ex_sunpc_linpc hostname
---------------------------------------------------------------------
The rankfile that was used claimed that a host was either not
allocated or oversubscribed its slots. Please review your rank-slot
assignments and your host allocation to ensure a proper match. Also,
some systems may require using full hostnames, such as
"host1.example.com" (instead of just plain "host1").

  Host: linpc1
---------------------------------------------------------------------
sunpc1 rankfiles 111

sunpc1 rankfiles 111 which mpiexec
/usr/local/openmpi-1.7_32_cc/bin/mpiexec

I get the same error for my 64-bit version, but I don't have this
problem with openmpi-1.6.5a1r28554.

sunpc1 rankfiles 105 mpiexec -report-bindings \
  -rf rf_ex_sunpc_linpc hostname
[sunpc1:17968] MCW rank 1 bound to socket 0[core 0-1]:
  [B B][. .] (slot list 0:0-1)
[sunpc1:17968] MCW rank 2 bound to socket 1[core 0]:
  [. .][B .] (slot list 1:0)
[sunpc1:17968] MCW rank 3 bound to socket 1[core 1]:
  [. .][. B] (slot list 1:1)
sunpc1
sunpc1
sunpc1
[linpc1:03246] MCW rank 0 bound to socket 0[core 0-1]
  socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
linpc1

sunpc1 rankfiles 106 which mpiexec
/usr/local/openmpi-1.6.5_32_cc/bin/mpiexec
sunpc1 rankfiles 107

I would be grateful if somebody could fix the problem. Thank you
very much in advance for any help.

Kind regards

Siegmar