Fixes #3513: Move orterun.1in rankfile update to v1.7
author jsquyres
Tue Feb 19 22:36:41 2013 +0000
branch v1.7
changeset 21417 c412ef2fcb1a
parent 21416 01bbea512249
child 21418 a30e81e61730

---svn-pre-commit-ignore-below---

r28067
Update documentation for rankfiles in orterun.1:

* Add a little more description of what rankfiles are
* Document that we use logical numbering for socket:core notation
* Mention +nX notation
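As a concrete illustration of the line format this changeset documents, the following is a minimal C sketch, not Open MPI's own rankfile parser. It reads one line of the form "rank <N>=<hostname> slot=<slot list>" and resolves a relative "+n<X>" hostname against an externally supplied host list, such as the one given by "mpirun -H aa,bb,cc,dd". The names struct rank_entry, parse_rank_line() and resolve_host(), and the 63-character field limits, are hypothetical; the slot list is kept as an unparsed string.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one rankfile line (illustration only). */
struct rank_entry {
    int rank;        /* <N>: the process rank (MPI_COMM_WORLD rank for MPI jobs) */
    char host[64];   /* <hostname> as written: absolute ("aa") or relative ("+n0") */
    char slots[64];  /* <slot list>, e.g. "1:0-2"; left unparsed here */
};

/* Parse "rank <N>=<hostname> slot=<slot list>".
 * Returns 0 on success, -1 if the line does not have that shape. */
int parse_rank_line(const char *line, struct rank_entry *out)
{
    assert(line != NULL && out != NULL);
    /* %63[^= \t] stops the hostname at '=' or whitespace;
     * %63s takes the slot list up to the next whitespace. */
    int n = sscanf(line, " rank %d = %63[^= \t] slot = %63s",
                   &out->rank, out->host, out->slots);
    return (n == 3) ? 0 : -1;
}

/* Map a relative "+n<X>" hostname to the Xth (0-based) entry of the
 * externally supplied host list; absolute hostnames pass through unchanged.
 * Returns NULL if X is malformed or out of range. */
const char *resolve_host(const char *host, const char *const hosts[], size_t nhosts)
{
    if (strncmp(host, "+n", 2) != 0) {
        return host;                     /* absolute: use as written */
    }
    const char *digits = host + 2;
    char *end = NULL;
    long x = strtol(digits, &end, 10);
    if (end == digits || *end != '\0' || x < 0 || (size_t)x >= nhosts) {
        return NULL;
    }
    return hosts[x];
}
```

With the host list {"aa", "bb", "cc", "dd"}, "+n2" resolves to "cc", so the relative rankfile in the diff below places ranks on the same nodes as the absolute one. Malformed or out-of-range indexes simply return NULL in this sketch; how mpirun itself reports such errors is not shown.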
orte/tools/orterun/orterun.1in
--- a/orte/tools/orterun/orterun.1in	Tue Feb 19 22:35:08 2013 +0000
+++ b/orte/tools/orterun/orterun.1in	Tue Feb 19 22:36:41 2013 +0000
@@ -900,29 +900,61 @@
 .
 .SS Rankfiles
 .
-Rankfiles provide a means for specifying detailed information about
-how process ranks should be mapped to nodes and how they should be bound.
-Consider the following:
+Rankfiles are text files that specify detailed information about how
+individual processes should be mapped to nodes, and to which
+processor(s) they should be bound.  Each line of a rankfile specifies
+the location of one process (for MPI jobs, the process' "rank" refers
+to its rank in MPI_COMM_WORLD).  The general form of each line in the
+rankfile is:
 .

-    cat myrankfile
+    rank <N>=<hostname> slot=<slot list>
+.
+.PP
+For example:
+.

+    $ cat myrankfile
    rank 0=aa slot=1:0-2
    rank 1=bb slot=0:0,1
    rank 2=cc slot=1-2
-    mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out
+    $ mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out
 .
 .PP
-So that
+This means that
 .

  Rank 0 runs on node aa, bound to socket 1, cores 0-2.
  Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
  Rank 2 runs on node cc, bound to cores 1 and 2.
 .
 .PP
-Note that all slot locations are to be specified as
+The hostnames listed above are "absolute," meaning that actual
+resolvable hostnames are specified.  However, hostnames can also be
+specified as "relative," meaning that they are specified in relation
+to an externally specified list of hostnames (e.g., by mpirun's --host
+argument, a hostfile, or a job scheduler).
+.
+.PP
+The "relative" specification is of the form "+n<X>", where X is an
+integer specifying the Xth hostname in the set of all available
+hostnames, indexed from 0.  For example:
+.

+    $ cat myrankfile
+    rank 0=+n0 slot=1:0-2
+    rank 1=+n1 slot=0:0,1
+    rank 2=+n2 slot=1-2
+    $ mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out
+.
+.PP
+Starting with Open MPI v1.7, all socket/core slot locations are to be
+specified as
+.I logical
+indexes (the Open MPI v1.6 series used
 .I physical
-indexes.  You can use tools such as HWLOC's "lstopo -v" to find the
-physical indexes of socket and cores.
+indexes).  You can use tools such as HWLOC's "lstopo" to find the
+logical indexes of sockets and cores.
 .
 .
 .SS Application Context or Executable Program?
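The slot lists used in the examples above can be expanded mechanically. The C sketch below uses a hypothetical helper, expand_slot_list(), that understands only the forms appearing in this section: "<socket>:<core list>" or a bare "<core list>", where a core list is comma-separated integers or inclusive "a-b" ranges. It assumes every number is already a logical index (the v1.7 convention) and does not query the hardware, which is what a tool such as lstopo is for; a bare list is recorded with socket -1. The full Open MPI slot-list syntax may accept more than this.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One binding target; socket == -1 means the slot list named no socket. */
struct slot {
    int socket;
    int core;
};

/* Expand a slot list such as "1:0-2", "0:0,1" or "1-2" into (socket, core)
 * pairs. Supported forms (an assumption covering the examples only):
 *   "<socket>:<cores>" or "<cores>", where <cores> is a comma-separated
 *   list of non-negative integers or inclusive ranges "a-b".
 * Numbers are taken as logical indexes; no hardware is queried.
 * Returns the number of pairs written to out, or -1 on a parse error or
 * if more than max pairs would be produced. */
int expand_slot_list(const char *spec, struct slot *out, int max)
{
    assert(spec != NULL && out != NULL);
    int sock = -1;
    const char *p = spec;
    const char *colon = strchr(spec, ':');
    if (colon != NULL) {
        char *end;
        long s = strtol(spec, &end, 10);
        if (end != colon || end == spec || s < 0) {
            return -1;                 /* socket part must be one integer */
        }
        sock = (int)s;
        p = colon + 1;
    }
    if (*p == '\0') {
        return -1;                     /* empty core list */
    }
    int n = 0;
    while (*p != '\0') {
        char *end;
        long lo = strtol(p, &end, 10);
        if (end == p || lo < 0) {
            return -1;
        }
        long hi = lo;
        if (*end == '-') {             /* inclusive range "lo-hi" */
            const char *q = end + 1;
            hi = strtol(q, &end, 10);
            if (end == q || hi < lo) {
                return -1;
            }
        }
        for (long c = lo; c <= hi; c++) {
            if (n >= max) {
                return -1;             /* caller's buffer is too small */
            }
            out[n].socket = sock;
            out[n].core = (int)c;
            n++;
        }
        if (*end == ',') {
            end++;
            if (*end == '\0') {
                return -1;             /* trailing comma */
            }
        } else if (*end != '\0') {
            return -1;                 /* unexpected character */
        }
        p = end;
    }
    return n;
}
```

For rank 0's "1:0-2" this yields (1,0), (1,1), (1,2), that is, socket 1, cores 0-2; for rank 2's "1-2" it yields cores 1 and 2 with no socket, matching the descriptions in the man page text.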