Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] questions to some open problems
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-12-15 11:04:53


Hmmm...you shouldn't need to specify a hostfile in addition to the rankfile, so something has gotten messed up in the allocator. I'll take a look at it.
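
That is, once the allocator issue is sorted out, an invocation like

  mpiexec -report-bindings -rf rankfile_1.openmpi -np 4 hostname

should be enough on its own, without an additional -hostfile or -host.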

As for cpus-per-proc, I'm hoping to tackle it over the holiday while I take a break from my regular job. Will let you know when fixed.

Thanks for your patience!

On Dec 15, 2012, at 1:41 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:

> Hi Ralph
>
>>> some weeks ago (mainly at the beginning of October) I reported
>>> several problems, and I would be grateful if you could tell me
>>> whether, and roughly when, somebody will try to solve them.
>>>
>>> 1) I don't get the expected results when I try to send or scatter
>>> the columns of a matrix in Java. The received column values have
>>> nothing to do with the original values if I use a homogeneous
>>> environment, and the program breaks with "An error occurred in
>>> MPI_Comm_dup" and "MPI_ERR_INTERN: internal error" if I use
>>> a heterogeneous environment. I would like to use the Java API
>>> (a C sketch of the intended pattern follows further below).
>>>
>>> 2) I don't get the expected result when I try to scatter an object
>>> in Java.
>>> https://svn.open-mpi.org/trac/ompi/ticket/3351
>>
>> Nothing has happened on these yet
>
> Do you have an idea when somebody will have time to fix these problems?
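>
> To make point 1 more concrete: the pattern I want to reproduce with the
> Java API corresponds to the following C sketch (a minimal example only;
> the 4x4 size, the file name and the variable names are placeholders):
>
> /* column_scatter.c - scatter the columns of a row-major matrix */
> #include <stdio.h>
> #include <mpi.h>
>
> #define N 4
>
> int main(int argc, char **argv)
> {
>     int rank, size;
>     double matrix[N][N];   /* significant only at the root        */
>     double column[N];      /* one received column per process     */
>     MPI_Datatype col, col_resized;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>     if (size != N) {                       /* run with -np 4 */
>         if (rank == 0) fprintf(stderr, "run with -np %d\n", N);
>         MPI_Abort(MPI_COMM_WORLD, 1);
>     }
>
>     if (rank == 0)
>         for (int i = 0; i < N; ++i)
>             for (int j = 0; j < N; ++j)
>                 matrix[i][j] = 10 * i + j;
>
>     /* one column: N elements with stride N (row-major storage) */
>     MPI_Type_vector(N, 1, N, MPI_DOUBLE, &col);
>     /* shrink the extent so consecutive columns start one double apart */
>     MPI_Type_create_resized(col, 0, (MPI_Aint) sizeof(double), &col_resized);
>     MPI_Type_commit(&col_resized);
>
>     /* each process receives one column as N contiguous doubles */
>     MPI_Scatter(matrix, 1, col_resized,
>                 column, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
>
>     printf("rank %d got column:", rank);
>     for (int i = 0; i < N; ++i)
>         printf(" %g", column[i]);
>     printf("\n");
>
>     MPI_Type_free(&col_resized);
>     MPI_Type_free(&col);
>     MPI_Finalize();
>     return 0;
> }
>
> In Java I expect the equivalent scatter to deliver the same column values.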
>
>
>>> 3) When I use a "rankfile", I still only get a message that all
>>> nodes are already filled up, and nothing else happens. I would
>>> like to use a rankfile. You filed a bug fix for it.
>>>
>>
>> I believe rankfile was fixed, at least on the trunk - not sure if it
>> was moved to 1.7. I assume that's the release you are talking about?
>
> I'm using the trunk for my tests. It didn't work for me because I used
> the rankfile without a hostfile or a hostlist (it is not enough to
> specify the hosts in the rankfile). Everything works fine when I provide
> a "correct" hostfile or hostlist and the binding isn't too complicated
> (see my last example below).
>
> My rankfile:
>
> rank 0=sunpc0 slot=0:0
> rank 1=sunpc1 slot=0:0
> rank 2=sunpc0 slot=1:0
> rank 3=sunpc1 slot=1:0
>
>
> My hostfile:
>
> sunpc0 slots=4
> sunpc1 slots=4
>
>
> It will not work without a hostfile or hostlist.
>
> sunpc0 mpi-probleme 128 mpiexec -report-bindings -rf rankfile_1.openmpi \
> -np 4 hostname
> ------------------------------------------------------------------------
> The rankfile that was used claimed that a host was either not
> allocated or oversubscribed its slots. Please review your rank-slot
> assignments and your host allocation to ensure a proper match. Also,
> some systems may require using full hostnames, such as
> "host1.example.com" (instead of just plain "host1").
>
> Host: sunpc1
> ------------------------------------------------------------------------
> sunpc0 mpi-probleme 129
>
>
> I get the expected output, if I add "-hostfile host_sunpc" or
> "-host sunpc0,sunpc1" on the command line.
>
> sunpc0 mpi-probleme 129 mpiexec -report-bindings -rf rankfile_1.openmpi \
> -np 4 -hostfile host_sunpc hostname
> [sunpc0:06954] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/.][./.]
> [sunpc0:06954] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
> sunpc0
> sunpc0
> [sunpc1:12583] MCW rank 1 bound to socket 0[core 0[hwt 0]]: [B/.][./.]
> [sunpc1:12583] MCW rank 3 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
> sunpc1
> sunpc1
> sunpc0 mpi-probleme 130
>
>
> Furthermore, the rankfile and the hostfile must both use either fully
> qualified or unqualified hostnames consistently. Otherwise it will not
> work, as you can see in the following output, where my hostfile contains
> a fully qualified hostname and my rankfile only the hostname without
> the domain name.
>
> sunpc0 mpi-probleme 131 mpiexec -report-bindings -rf rankfile_1.openmpi \
> -np 4 -hostfile host_sunpc_full hostname
> ------------------------------------------------------------------------
> The rankfile that was used claimed that a host was either not
> allocated or oversubscribed its slots. Please review your rank-slot
> assignments and your host allocation to ensure a proper match. Also,
> some systems may require using full hostnames, such as
> "host1.example.com" (instead of just plain "host1").
>
> Host: sunpc1
> ------------------------------------------------------------------------
> sunpc0 mpi-probleme 132
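>
> A consistent pair would have to look something like this (the domain is
> only a placeholder for whatever the hosts are actually called):
>
> rank 0=sunpc0.example.com slot=0:0
> rank 1=sunpc1.example.com slot=0:0
> rank 2=sunpc0.example.com slot=1:0
> rank 3=sunpc1.example.com slot=1:0
>
> sunpc0.example.com slots=4
> sunpc1.example.com slots=4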
>
>
> Unfortunately my complicated rankfile still doesn't work, although
> you told me some weeks ago that it is correct.
>
> rank 0=sunpc0 slot=0:0-1,1:0-1
> rank 1=sunpc1 slot=0:0-1
> rank 2=sunpc1 slot=1:0
> rank 3=sunpc1 slot=1:1
>
> sunpc1 mpi-probleme 103 mpiexec -report-bindings -rf rankfile -np 4 \
> -hostfile host_sunpc hostname
> sunpc1
> sunpc1
> sunpc1
> [sunpc1:12741] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
> [sunpc1:12741] MCW rank 3 bound to socket 1[core 3[hwt 0]]: [./.][./B]
> [sunpc1:12741] MCW rank 1 bound to socket 0[core 0[hwt 0]],
> socket 0[core 1[hwt 0]]: [B/B][./.]
> [sunpc0:07075] MCW rank 0 bound to socket 0[core 0[hwt 0]],
> socket 0[core 1[hwt 0]]: [B/B][./.]
> sunpc0
> sunpc1 mpi-probleme 104
>
> The bindings for ranks 1 to 3 are correct, but rank 0 didn't get the
> cores from the second socket.
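>
> (Going by the rankfile, I would expect rank 0 to be bound to cores 0-1
> on both sockets, i.e. a binding map of [B/B][B/B] instead of [B/B][./.].)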
>
>
>
>>> 4) I would like to have "-cpus-per-proc", "-npersocket", etc. for
>>> every set of machines/applications, and not globally for all
>>> machines/applications, if I specify several colon-separated sets
>>> of machines or applications on the command line. You told me that
>>> it could be done (what I mean is sketched near the end of this
>>> mail).
>>>
>>> 5) By the way, it seems that the option "-cpus-per-proc" is no
>>> longer supported in openmpi-1.7 and openmpi-1.9. How can I bind a
>>> multi-threaded process to more than one core in these versions?
>>
>> I'm afraid I haven't gotten around to working on cpus-per-proc, though
>> I believe npersocket was fixed.
>
> Will you also support "-cpus-per-proc" in openmpi-1.7 and openmpi-1.9?
> At the moment it isn't available.
>
> sunpc1 mpi-probleme 106 mpiexec -report-bindings -np 4 \
> -host linpc0,linpc1,sunpc0,sunpc1 -cpus-per-proc 4 -map-by core \
> -bind-to core hostname
> mpiexec: Error: unknown option "-p"
> Type 'mpiexec --help' for usage.
>
>
> sunpc1 mpi-probleme 110 mpiexec --help | grep cpus
> cpus allocated to this job [default: none]
> -use-hwthread-cpus|--use-hwthread-cpus
> Use hardware threads as independent cpus
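>
> Regarding point 4, what I have in mind is something along these lines
> (only a sketch, with placeholder program names, since per-context
> options are exactly what does not work yet):
>
> mpiexec -np 2 -host linpc0,linpc1 -cpus-per-proc 4 prog_a : \
>         -np 2 -host sunpc0,sunpc1 -cpus-per-proc 2 prog_b
>
> i.e. each colon-separated application context gets its own binding
> options.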
>
>
>
>>> I can provide my small programs once more if you need them. Thank
>>> you very much for any answer in advance.
>
> Thank you very much for all your help and time
>
> Siegmar
>