Subject: Re: [MTT users] FW: ALPS modifications for MTT
From: Matney Sr, Kenneth D. (matneykdsr_at_[hidden])
Date: 2008-08-20 10:38:22


Hi Jeff,

The trunk needs an additional patch to make ALPS work (without
complaints). I have attached it hereto. Also, I will send along the
ornl.ini script when I get it finalized. This will show how we do Cray
XT builds, runs, etc.
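
For reference, here is a rough standalone sketch of the detection logic
from the earlier patch (quoted below). It is not the MTT code itself:
the *_from function names, the hash-reference argument, and the sample
node names are made up for illustration, so that the logic can be tried
outside of a real batch job. The actual functions read %ENV directly and
call MTT's Debug().

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical standalone versions of the patch's checks.  They take the
# environment as a hash reference instead of reading %ENV, which is an
# assumption made here purely so the logic is easy to exercise.

# "1" if both variables that the patch tests for exist; "0" otherwise.
sub alps_job_from {
    my ($env) = @_;
    return ((exists($env->{BATCH_PARTITION_ID}) &&
             exists($env->{PBS_NNODES})) ? "1" : "0");
}

# Value of PBS_NNODES inside an ALPS job; "0" otherwise.
sub alps_max_procs_from {
    my ($env) = @_;
    return "0"
        if (!alps_job_from($env));
    return "$env->{PBS_NNODES}";
}

# Sorted, de-duplicated, comma-separated contents of the file named by
# PBS_NODEFILE; "" outside an ALPS job or if the file cannot be read.
sub alps_hosts_from {
    my ($env) = @_;
    return ""
        if (!alps_job_from($env));

    my $path = $env->{PBS_NODEFILE};
    return ""
        if (!defined($path));
    open(my $fh, '<', $path) or return "";

    my %seen;
    while (my $line = <$fh>) {
        chomp($line);
        $seen{$line} = 1;
    }
    close($fh);

    return join(",", sort(keys(%seen)));
}

# Fake node file with a duplicate entry (the node names are invented).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print $tmp_fh "nid00012\nnid00012\nnid00013\n";
close($tmp_fh);

my %fake_env = (
    BATCH_PARTITION_ID => "1",
    PBS_NNODES         => "4",
    PBS_NODEFILE       => $tmp_name,
);

print "alps_job:       ", alps_job_from(\%fake_env), "\n";
print "alps_max_procs: ", alps_max_procs_from(\%fake_env), "\n";
print "alps_hosts:     ", alps_hosts_from(\%fake_env), "\n";
```

With the fake environment above, this should report an ALPS job with 4
processes and the two distinct node names joined by a comma. Drop either
BATCH_PARTITION_ID or PBS_NNODES from the hash and all three functions
fall back to "0", "0", and "".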

-- 
Ken
-----Original Message-----
From: mtt-users-bounces_at_[hidden]
[mailto:mtt-users-bounces_at_[hidden]] On Behalf Of Jeff Squyres
Sent: Thursday, August 14, 2008 10:47 AM
To: General user list for the MPI Testing Tool
Subject: Re: [MTT users] FW: ALPS modifications for MTT

BTW, I committed this patch to the MTT trunk.

I feel a little sheepish; I should have told you to use the trunk
these days, not the release branch (I know the wiki specifically says
otherwise).  We really need to finally make a release out of what is
on the trunk -- it's much more advanced than what is on the release
branch (look at the CHANGES file in the top-level dir to see what has
changed since the release branch).

The Cisco MTT files in SVN are for the trunk; it's possible that the
features that the release branch doesn't understand will just be
ignored, but I haven't tried this in a long time.

On Aug 14, 2008, at 10:35 AM, Jeff Squyres wrote:
> This patch looks good to me.
>
> I'll commit.  If you want to do any more work on MTT, perhaps ORNL
> can add you to its "Schedule A" for the Open MPI Third Party
> Contribution form (it's very easy to amend Schedule A -- it doesn't
> require any authoritative signatures); then we could get you an MTT
> SVN account and you could commit this stuff directly.
>
>
> On Aug 14, 2008, at 10:24 AM, Matney Sr, Kenneth D. wrote:
>
>> Hi,
>>
>> When running MTT on the Cray XT3/XT4 machines, I found that MTT does not
>> contain any support for ALPS.  As a result, it always executes mpirun
>> with "-np 1".  I patched lib/MTT/Values/Functions.pm with the following
>> to overcome this:
>>
>> -----Original Message-----
>> From: Matney Sr, Kenneth D.
>> Sent: Wednesday, August 13, 2008 5:57 PM
>> To: Shipman, Galen M.
>> Cc: Graham, Richard L.
>> Subject: FW: ALPS modifications for MTT
>>
>> --- Functions-bak.pm	2008-08-06 14:31:26.256538000 -0400
>> +++ Functions.pm	2008-08-13 17:43:40.273641000 -0400
>> @@ -602,6 +602,8 @@
>>    # Resource managers
>>    return "SLURM"
>>        if slurm_job();
>> +    return "ALPS"
>> +        if alps_job();
>>    return "TM"
>>        if pbs_job();
>>    return "N1GE"
>> @@ -638,6 +640,8 @@
>>    # Resource managers
>>    return slurm_max_procs()
>>        if slurm_job();
>> +    return alps_max_procs()
>> +        if alps_job();
>>    return pbs_max_procs()
>>        if pbs_job();
>>    return n1ge_max_procs()
>> @@ -670,6 +674,8 @@
>>    # Resource managers
>>    return slurm_hosts()
>>        if slurm_job();
>> +    return alps_hosts()
>> +        if alps_job();
>>    return pbs_hosts()
>>        if pbs_job();
>>    return n1ge_hosts()
>> @@ -1004,6 +1010,70 @@
>>
>> #--------------------------------------------------------------------------
>>
>> +# Return "1" if we're running in an ALPS job; "0" otherwise.
>> +sub alps_job {
>> +    Debug("&alps_job\n");
>> +
>> +#   It is true that ALPS can be run in an interactive access mode; however,
>> +#   this would not be a true managed environment.  Such only can be
>> +#   achieved under a batch scheduler.
>> +    return ((exists($ENV{BATCH_PARTITION_ID}) &&
>> +             exists($ENV{PBS_NNODES})) ? "1" : "0");
>> +}
>> +
>> +#--------------------------------------------------------------------------
>> +
>> +# If in an ALPS job, return the max number of processes we can run.
>> +# Otherwise, return 0.
>> +sub alps_max_procs {
>> +    Debug("&alps_max_procs\n");
>> +
>> +    return "0"
>> +        if (!alps_job());
>> +
>> +#   If we were not running under PBS or some other batch system, we would
>> +#   not have the foggiest idea of how many processes mpirun could spawn.
>> +    my $ret;
>> +    $ret=$ENV{PBS_NNODES};
>> +
>> +    Debug("&alps_max_procs returning: $ret\n");
>> +    return "$ret";
>> +}
>> +
>> +#--------------------------------------------------------------------------
>> +
>> +# If in an ALPS job, return the hosts we can run on.  Otherwise, return
>> +# "".
>> +sub alps_hosts {
>> +    Debug("&alps_hosts\n");
>> +
>> +    return ""
>> +        if (!alps_job());
>> +
>> +#   Again, we need a batch system to achieve management; return the uniq'ed
>> +#   contents of $PBS_NODEFILE.  Actually, on the Cray XT, we can return the
>> +#   NIDS allocated by ALPS; but, without launching servers to other service
>> +#   nodes, all communication is via the launching node and NIDS actually
>> +#   have no persistent resource allocated to the user.  That is, all file
>> +#   resources accessible from a NID are shared with the launching node.
>> +#   And, since ALPS is managed by the batch system, only the launching node
>> +#   can initiate communication with a NID.  In effect, the Cray XT model is
>> +#   of a single service node with a varying number of compute processors.
>> +    open (FILE, $ENV{PBS_NODEFILE}) || return "";
>> +    my $lines;
>> +    while (<FILE>) {
>> +        chomp;
>> +        $lines->{$_} = 1;
>> +    }
>> +
>> +    my @hosts = sort(keys(%$lines));
>> +    my $hosts = join(",", @hosts);
>> +    Debug("&alps_hosts returning: $hosts\n");
>> +    return "$hosts";
>> +}
>> +
>> +#--------------------------------------------------------------------------
>> +
>> # Return "1" if we're running in a PBS job; "0" otherwise.
>> sub pbs_job {
>>    Debug("&pbs_job\n");
>>
>>
>>
>>
>> -- 
>> Ken
>>
>> _______________________________________________
>> mtt-users mailing list
>> mtt-users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>
>
> -- 
> Jeff Squyres
> Cisco Systems
-- 
Jeff Squyres
Cisco Systems