Subject: Re: [MTT users] FW: ALPS modifications for MTT
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-08-25 19:49:06


Committed in r1222.

If you want a directory in ompi-tests/ to save your ornl scripts and
ini files (perhaps analogous to ompi-tests/cisco and ompi-tests/iu),
let us know.

On Aug 20, 2008, at 10:38 AM, Matney Sr, Kenneth D. wrote:

> Hi Jeff,
>
> The trunk needs an additional patch to make ALPS work (without
> complaints). I have attached it here. Also, I will send along the
> ornl.ini script when I get it finalized. This will show how we do
> Cray XT builds, runs, etc.
> --
> Ken
>
>
> -----Original Message-----
> From: mtt-users-bounces_at_[hidden]
> [mailto:mtt-users-bounces_at_[hidden]] On Behalf Of Jeff Squyres
> Sent: Thursday, August 14, 2008 10:47 AM
> To: General user list for the MPI Testing Tool
> Subject: Re: [MTT users] FW: ALPS modifications for MTT
>
> BTW, I committed this patch to the MTT trunk.
>
> I feel a little sheepish; I should have told you to use the trunk
> these days, not the release branch (I know the wiki specifically says
> otherwise). We really need to finally make a release out of what is
> on the trunk -- it's much more advanced than what is on the release
> branch (look at the CHANGES file in the top-level dir to see what has
> changed since the release branch).
>
> The Cisco MTT files in SVN are for the trunk; it's possible that the
> release branch will simply ignore the features it doesn't understand,
> but I haven't tried that in a long time.
>
>
>
> On Aug 14, 2008, at 10:35 AM, Jeff Squyres wrote:
>
>> This patch looks good to me.
>>
>> I'll commit. If you want to do any more work on MTT, perhaps ORNL
>> can add you to the "Schedule A" of its Open MPI Third Party
>> Contribution form (it's very easy to amend Schedule A -- it doesn't
>> require any authoritative signatures). Then we could get you an MTT
>> SVN account and you could commit this stuff directly.
>>
>>
>> On Aug 14, 2008, at 10:24 AM, Matney Sr, Kenneth D. wrote:
>>
>>> Hi,
>>>
>>> When running MTT on the Cray XT3/XT4 machines, I found that MTT does
>>> not contain any support for ALPS. As a result, it always executes
>>> mpirun with "-np 1". I patched lib/MTT/Values/Functions.pm with the
>>> following to overcome this:
>>>
>>> -----Original Message-----
>>> From: Matney Sr, Kenneth D.
>>> Sent: Wednesday, August 13, 2008 5:57 PM
>>> To: Shipman, Galen M.
>>> Cc: Graham, Richard L.
>>> Subject: FW: ALPS modifications for MTT
>>>
>>> --- Functions-bak.pm    2008-08-06 14:31:26.256538000 -0400
>>> +++ Functions.pm        2008-08-13 17:43:40.273641000 -0400
>>> @@ -602,6 +602,8 @@
>>>      # Resource managers
>>>      return "SLURM"
>>>          if slurm_job();
>>> +    return "ALPS"
>>> +        if alps_job();
>>>      return "TM"
>>>          if pbs_job();
>>>      return "N1GE"
>>> @@ -638,6 +640,8 @@
>>>      # Resource managers
>>>      return slurm_max_procs()
>>>          if slurm_job();
>>> +    return alps_max_procs()
>>> +        if alps_job();
>>>      return pbs_max_procs()
>>>          if pbs_job();
>>>      return n1ge_max_procs()
>>> @@ -670,6 +674,8 @@
>>>      # Resource managers
>>>      return slurm_hosts()
>>>          if slurm_job();
>>> +    return alps_hosts()
>>> +        if alps_job();
>>>      return pbs_hosts()
>>>          if pbs_job();
>>>      return n1ge_hosts()
>>> @@ -1004,6 +1010,70 @@
>>>  
>>>  #--------------------------------------------------------------------------
>>>  
>>> +# Return "1" if we're running in an ALPS job; "0" otherwise.
>>> +sub alps_job {
>>> +    Debug("&alps_job\n");
>>> +
>>> +# It is true that ALPS can be run in an interactive access mode; however,
>>> +# this would not be a true managed environment.  Such can only be
>>> +# achieved under a batch scheduler.
>>> +    return ((exists($ENV{BATCH_PARTITION_ID}) &&
>>> +             exists($ENV{PBS_NNODES})) ? "1" : "0");
>>> +}
>>> +
>>> +#--------------------------------------------------------------------------
>>> +
>>> +# If in an ALPS job, return the max number of processes we can run.
>>> +# Otherwise, return 0.
>>> +sub alps_max_procs {
>>> +    Debug("&alps_max_procs\n");
>>> +
>>> +    return "0"
>>> +        if (!alps_job());
>>> +
>>> +# If we were not running under PBS or some other batch system, we would
>>> +# not have the foggiest idea of how many processes mpirun could spawn.
>>> +    my $ret;
>>> +    $ret = $ENV{PBS_NNODES};
>>> +
>>> +    Debug("&alps_max_procs returning: $ret\n");
>>> +    return "$ret";
>>> +}
>>> +
>>> +#--------------------------------------------------------------------------
>>> +
>>> +# If in an ALPS job, return the hosts we can run on.  Otherwise, return
>>> +# "".
>>> +sub alps_hosts {
>>> +    Debug("&alps_hosts\n");
>>> +
>>> +    return ""
>>> +        if (!alps_job());
>>> +
>>> +# Again, we need a batch system to achieve management; return the uniq'ed
>>> +# contents of $PBS_NODEFILE.  Actually, on the Cray XT, we can return the
>>> +# NIDS allocated by ALPS; but, without launching servers to other service
>>> +# nodes, all communication is via the launching node and NIDS actually
>>> +# have no persistent resource allocated to the user.  That is, all file
>>> +# resources accessible from a NID are shared with the launching node.
>>> +# And, since ALPS is managed by the batch system, only the launching node
>>> +# can initiate communication with a NID.  In effect, the Cray XT model is
>>> +# of a single service node with a varying number of compute processors.
>>> +    open (FILE, $ENV{PBS_NODEFILE}) || return "";
>>> +    my $lines;
>>> +    while (<FILE>) {
>>> +        chomp;
>>> +        $lines->{$_} = 1;
>>> +    }
>>> +
>>> +    my @hosts = sort(keys(%$lines));
>>> +    my $hosts = join(",", @hosts);
>>> +    Debug("&alps_hosts returning: $hosts\n");
>>> +    return "$hosts";
>>> +}
>>> +
>>> +#--------------------------------------------------------------------------
>>> +
>>>  # Return "1" if we're running in a PBS job; "0" otherwise.
>>>  sub pbs_job {
>>>      Debug("&pbs_job\n");
>>>
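>>> As a quick sanity check of the detection logic, here is a minimal
>>> standalone sketch (the environment values are made up; it mirrors the
>>> alps_job()/alps_max_procs() logic above rather than loading MTT):
>>>
>>>     #!/usr/bin/env perl
>>>     use strict;
>>>     use warnings;
>>>
>>>     # Simulate a PBS-scheduled ALPS allocation (hypothetical values).
>>>     $ENV{BATCH_PARTITION_ID} = "1234";
>>>     $ENV{PBS_NNODES}         = "16";
>>>
>>>     # Same test as alps_job(): both variables must exist.
>>>     my $in_alps = (exists($ENV{BATCH_PARTITION_ID}) &&
>>>                    exists($ENV{PBS_NNODES})) ? "1" : "0";
>>>     print "alps_job:       $in_alps\n";          # expect 1
>>>     print "alps_max_procs: $ENV{PBS_NNODES}\n";  # expect 16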
>>>
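>>> The host dedup in alps_hosts() can be exercised the same way with a
>>> scratch node file (again just a sketch; in a real job PBS writes
>>> $PBS_NODEFILE itself, one line per allocated CPU, so hosts repeat):
>>>
>>>     use File::Temp qw(tempfile);
>>>
>>>     # Write a fake node file with duplicated host entries.
>>>     my ($fh, $file) = tempfile();
>>>     print $fh "nid00004\nnid00004\nnid00005\nnid00005\n";
>>>     close($fh);
>>>     $ENV{PBS_NODEFILE} = $file;
>>>
>>>     # Same uniq-and-join as alps_hosts().
>>>     open(my $nf, '<', $ENV{PBS_NODEFILE}) or die "open: $!";
>>>     my %seen;
>>>     while (<$nf>) { chomp; $seen{$_} = 1; }
>>>     close($nf);
>>>     print join(",", sort(keys(%seen))), "\n";    # nid00004,nid00005
>>>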
>>> --
>>> Ken
>>>
>>> _______________________________________________
>>> mtt-users mailing list
>>> mtt-users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> _______________________________________________
>> mtt-users mailing list
>> mtt-users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> mtt-users mailing list
> mtt-users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
> <kmymtt2.patch>
> _______________________________________________
> mtt-users mailing list
> mtt-users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users

-- 
Jeff Squyres
Cisco Systems