Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] more information for my problem with rankfiles
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-02-09 10:23:43


Jeff committed the fix to the branch today at r28039, so it isn't in any earlier build, including the r28022 one you tested. You might try again with the next nightly snapshot of 1.6.4.

On Feb 9, 2013, at 3:09 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:

> Hi
>
> I could successfully use the following rankfile on Linux with
> openmpi-1.6.4rc3r27923, but it doesn't work with a patched
> openmpi-1.6.4rc4r28022 (patch.diff from Eugene). Perhaps this
> information helps to track down the error.
>
> tyr rankfiles 114 cat rf_ex_linpc
> # mpiexec -report-bindings -rf rf_ex_linpc hostname
> rank 0=linpc0 slot=0:0-1,1:0-1
> rank 1=linpc1 slot=0:0-1
> rank 2=linpc1 slot=1:0
> rank 3=linpc1 slot=1:1
>
>
> linpc1 rankfiles 99 mpiexec -report-bindings -rf rf_ex_linpc hostname
> ------------------------------------------------------------------------
> The rankfile that was used claimed that a host was either not
> allocated or oversubscribed its slots. Please review your rank-slot
> assignments and your host allocation to ensure a proper match. Also,
> some systems may require using full hostnames, such as
> "host1.example.com" (instead of just plain "host1").
>
> Host: linpc0
> ------------------------------------------------------------------------
>
> linpc1 rankfiles 100 ompi_info | grep "MPI:"
> Open MPI: 1.6.4rc4r28022
> linpc1 rankfiles 101 exit
>
>
>
> tyr rankfiles 110 ssh linpc1
> linpc1 fd1026 96 cd .../prog/mpi/rankfiles/
> linpc1 rankfiles 97 mpiexec -report-bindings -rf rf_ex_linpc hostname
> [linpc1:21351] MCW rank 1 bound to socket 0[core 0-1]: [B B][. .] (slot list 0:0-1)
> [linpc1:21351] MCW rank 2 bound to socket 1[core 0]: [. .][B .] (slot list 1:0)
> [linpc1:21351] MCW rank 3 bound to socket 1[core 1]: [. .][. B] (slot list 1:1)
> [linpc0:08012] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
>
> linpc1 rankfiles 98 ompi_info | grep "MPI:"
> Open MPI: 1.6.4rc3r27923
> linpc1 rankfiles 99
>
>
> I will build an unpatched openmpi-1.6.4rc4 and check whether the
> above rankfile works with it. Unfortunately I can only check
> tomorrow, because new packages are mirrored to all machines
> overnight, so the build will not be available on both machines
> today. I will let you know the result.
>
>
> Kind regards
>
> Siegmar
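
For readers unfamiliar with the slot syntax in the rankfile quoted above, here is a minimal Python sketch that expands each `slot=socket:cores` entry into explicit (socket, core) pairs, which can then be compared by eye with the `-report-bindings` output. It is not part of Open MPI: the names `expand_slot_list` and `parse_rankfile` are hypothetical, and it assumes only the `rank N=host slot=socket:core[-core],...` form used in this thread. Any other forms the rankfile syntax accepts are not handled.

```python
import re

# Matches lines such as "rank 0=linpc0 slot=0:0-1,1:0-1".
# Assumption: only the socket:core form from this thread is supported.
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)$")


def expand_slot_list(slot_list):
    """Expand "0:0-1,1:0-1" into [(0, 0), (0, 1), (1, 0), (1, 1)]."""
    pairs = []
    for item in slot_list.split(","):
        socket, cores = item.split(":")
        first, _, last = cores.partition("-")
        last = last or first  # a single core such as "1:0" has no range
        for core in range(int(first), int(last) + 1):
            pairs.append((int(socket), core))
    return pairs


def parse_rankfile(text):
    """Return {rank: (host, [(socket, core), ...])}, skipping comments."""
    ranks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = RANK_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognised rankfile line: {line!r}")
        rank, host, slots = match.groups()
        ranks[int(rank)] = (host, expand_slot_list(slots))
    return ranks


# The rankfile from the message above.
RANKFILE = """\
# mpiexec -report-bindings -rf rf_ex_linpc hostname
rank 0=linpc0 slot=0:0-1,1:0-1
rank 1=linpc1 slot=0:0-1
rank 2=linpc1 slot=1:0
rank 3=linpc1 slot=1:1
"""

if __name__ == "__main__":
    for rank, (host, cores) in sorted(parse_rankfile(RANKFILE).items()):
        print(f"rank {rank} on {host}: {cores}")
```

Listing the pairs per host makes it easy to see that this rankfile asks for all four cores of linpc0 and, without overlap, all four cores of linpc1, which matches the bindings reported by the 1.6.4rc3r27923 run.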