Subject: Re: [MTT users] FW: ALPS modifications for MTT
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-08-14 10:46:58


BTW, I committed this patch to the MTT trunk.

I feel a little sheepish; I should have told you to use the trunk
these days, not the release branch (I know the wiki specifically says
otherwise). We really need to finally make a release out of what is
on the trunk -- it's much more advanced than what is on the release
branch (look at the CHANGES file in the top-level dir to see what has
changed since the release branch).

The Cisco MTT files in SVN are for the trunk; it's possible that the
features that the release branch doesn't understand will just be
ignored, but I haven't tried this in a long time.
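
For anyone skimming the thread: the ALPS logic in the patch quoted below
comes down to three checks against the environment. Here is a minimal
standalone sketch of that logic, not the committed code. The function names
(in_alps_job, alps_proc_count, alps_host_list) are hypothetical, and passing
the environment as a hash reference is my assumption so the helpers can be
tried outside a real batch job. Unlike the patch, the sketch also skips empty
lines in the node file and returns "" when PBS_NODEFILE is unset.

```perl
use strict;
use warnings;

# Hypothetical standalone helpers (not MTT's own functions). They take the
# environment as a hash reference instead of reading %ENV directly.

# Return 1 when both variables the patch checks for are present, else 0.
sub in_alps_job {
    my ($env) = @_;
    return (exists $env->{BATCH_PARTITION_ID}
            && exists $env->{PBS_NNODES}) ? 1 : 0;
}

# Process count taken from PBS_NNODES, or 0 outside an ALPS job.
sub alps_proc_count {
    my ($env) = @_;
    return 0 unless in_alps_job($env);
    return $env->{PBS_NNODES};
}

# Sorted, de-duplicated, comma-separated host names from the PBS node
# file, or "" when not in an ALPS job or the file cannot be opened.
sub alps_host_list {
    my ($env) = @_;
    return "" unless in_alps_job($env);

    my $path = $env->{PBS_NODEFILE};
    return "" unless defined $path;

    # Lexical filehandle with three-argument open; closed explicitly below.
    open(my $fh, '<', $path) or return "";
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        $seen{$line} = 1 if length $line;   # assumption: ignore blank lines
    }
    close($fh);

    return join(",", sort keys %seen);
}
```

Inside a real job you would call these as alps_proc_count(\%ENV) and
alps_host_list(\%ENV).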

On Aug 14, 2008, at 10:35 AM, Jeff Squyres wrote:

> This patch looks good to me.
>
> I'll commit. If you want to do any more work on MTT, perhaps ORNL
> can add you to its "Schedule A" for the Open MPI Third Party
> Contribution form (it's very easy to amend Schedule A -- doesn't
> require any authoritative signatures); then we could get you an MTT
> SVN account and you could commit this stuff directly.
>
>
> On Aug 14, 2008, at 10:24 AM, Matney Sr, Kenneth D. wrote:
>
>> Hi,
>>
>> When running MTT on the Cray XT3/XT4 machines, I found that MTT does not
>> contain any support for ALPS. As a result, it always executes mpirun
>> with "-np 1". I patched lib/MTT/Values/Functions.pm with the following
>> to overcome this:
>>
>> -----Original Message-----
>> From: Matney Sr, Kenneth D.
>> Sent: Wednesday, August 13, 2008 5:57 PM
>> To: Shipman, Galen M.
>> Cc: Graham, Richard L.
>> Subject: FW: ALPS modifications for MTT
>>
>> --- Functions-bak.pm 2008-08-06 14:31:26.256538000 -0400
>> +++ Functions.pm 2008-08-13 17:43:40.273641000 -0400
>> @@ -602,6 +602,8 @@
>> # Resource managers
>> return "SLURM"
>> if slurm_job();
>> + return "ALPS"
>> + if alps_job();
>> return "TM"
>> if pbs_job();
>> return "N1GE"
>> @@ -638,6 +640,8 @@
>> # Resource managers
>> return slurm_max_procs()
>> if slurm_job();
>> + return alps_max_procs()
>> + if alps_job();
>> return pbs_max_procs()
>> if pbs_job();
>> return n1ge_max_procs()
>> @@ -670,6 +674,8 @@
>> # Resource managers
>> return slurm_hosts()
>> if slurm_job();
>> + return alps_hosts()
>> + if alps_job();
>> return pbs_hosts()
>> if pbs_job();
>> return n1ge_hosts()
>> @@ -1004,6 +1010,70 @@
>>
>>
>> #--------------------------------------------------------------------------
>>
>> +# Return "1" if we're running in an ALPS job; "0" otherwise.
>> +sub alps_job {
>> + Debug("&alps_job\n");
>> +
>> +# It is true that ALPS can be run in an interactive access mode; however,
>> +# this would not be a true managed environment. Such only can be
>> +# achieved under a batch scheduler.
>> + return ((exists($ENV{BATCH_PARTITION_ID}) &&
>> + exists($ENV{PBS_NNODES})) ? "1" : "0");
>> +}
>> +
>> +
>> +#--------------------------------------------------------------------------
>> +
>> +# If in an ALPS job, return the max number of processes we can run.
>> +# Otherwise, return 0.
>> +sub alps_max_procs {
>> + Debug("&alps_max_procs\n");
>> +
>> + return "0"
>> + if (!alps_job());
>> +
>> +# If we were not running under PBS or some other batch system, we would
>> +# not have the foggiest idea of how many processes mpirun could spawn.
>> + my $ret;
>> + $ret=$ENV{PBS_NNODES};
>> +
>> + Debug("&alps_max_procs returning: $ret\n");
>> + return "$ret";
>> +}
>> +
>> +
>> +#--------------------------------------------------------------------------
>> +
>> +# If in an ALPS job, return the hosts we can run on. Otherwise, return
>> +# "".
>> +sub alps_hosts {
>> + Debug("&alps_hosts\n");
>> +
>> + return ""
>> + if (!alps_job());
>> +
>> +# Again, we need a batch system to achieve management; return the uniq'ed
>> +# contents of $PBS_HOSTFILE. Actually, on the Cray XT, we can return the
>> +# NIDS allocated by ALPS; but, without launching servers to other service
>> +# nodes, all communication is via the launching node and NIDS actually
>> +# have no persistent resource allocated to the user. That is, all file
>> +# resources accessible from a NID are shared with the launching node.
>> +# And, since ALPS is managed by the batch system, only the launching node
>> +# can initiate communication with a NID. In effect, the Cray XT model is
>> +# of a single service node with a varying number of compute processors.
>> + open (FILE, $ENV{PBS_NODEFILE}) || return "";
>> + my $lines;
>> + while (<FILE>) {
>> + chomp;
>> + $lines->{$_} = 1;
>> + }
>> +
>> + my @hosts = sort(keys(%$lines));
>> + my $hosts = join(",", @hosts);
>> + Debug("&alps_hosts returning: $hosts\n");
>> + return "$hosts";
>> +}
>> +
>> +
>> +#--------------------------------------------------------------------------
>> +
>> # Return "1" if we're running in a PBS job; "0" otherwise.
>> sub pbs_job {
>> Debug("&pbs_job\n");
>>
>>
>>
>>
>> --
>> Ken
>>
>> _______________________________________________
>> mtt-users mailing list
>> mtt-users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>
>
> --
> Jeff Squyres
> Cisco Systems
>

-- 
Jeff Squyres
Cisco Systems