MTT Users Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: [MTT users] FW: ALPS modifications for MTT
From: Matney Sr, Kenneth D. (matneykdsr_at_[hidden])
Date: 2008-08-14 10:24:42


Hi,

When running MTT on the Cray XT3/XT4 machines, I found that MTT does not
contain any support for ALPS. As a result, it always executes mpirun
with "-np 1". I patched lib/MTT/Values/Functions.pm with the following
to overcome this:

-----Original Message-----
From: Matney Sr, Kenneth D.
Sent: Wednesday, August 13, 2008 5:57 PM
To: Shipman, Galen M.
Cc: Graham, Richard L.
Subject: FW: ALPS modifications for MTT

--- Functions-bak.pm 2008-08-06 14:31:26.256538000 -0400
+++ Functions.pm 2008-08-13 17:43:40.273641000 -0400
@@ -602,6 +602,8 @@
     # Resource managers
     return "SLURM"
         if slurm_job();
+    return "ALPS"
+        if alps_job();
     return "TM"
         if pbs_job();
     return "N1GE"
@@ -638,6 +640,8 @@
     # Resource managers
     return slurm_max_procs()
         if slurm_job();
+    return alps_max_procs()
+        if alps_job();
     return pbs_max_procs()
         if pbs_job();
     return n1ge_max_procs()
@@ -670,6 +674,8 @@
     # Resource managers
     return slurm_hosts()
         if slurm_job();
+    return alps_hosts()
+        if alps_job();
     return pbs_hosts()
         if pbs_job();
     return n1ge_hosts()
@@ -1004,6 +1010,70 @@
 
 
 #--------------------------------------------------------------------------
 
+# Return "1" if we're running in an ALPS job; "0" otherwise.
+sub alps_job {
+    Debug("&alps_job\n");
+
+#   It is true that ALPS can be run in an interactive access mode; however,
+#   this would not be a true managed environment.  Such only can be
+#   achieved under a batch scheduler.
+    return ((exists($ENV{BATCH_PARTITION_ID}) &&
+             exists($ENV{PBS_NNODES})) ? "1" : "0");
+}
+
+#--------------------------------------------------------------------------
+
+# If in an ALPS job, return the max number of processes we can run.
+# Otherwise, return 0.
+sub alps_max_procs {
+    Debug("&alps_max_procs\n");
+
+    return "0"
+        if (!alps_job());
+
+#   If we were not running under PBS or some other batch system, we would
+#   not have the foggiest idea of how many processes mpirun could spawn.
+    my $ret;
+    $ret=$ENV{PBS_NNODES};
+
+    Debug("&alps_max_procs returning: $ret\n");
+    return "$ret";
+}
+
+#--------------------------------------------------------------------------
+
+# If in an ALPS job, return the hosts we can run on.  Otherwise, return
+# "".
+sub alps_hosts {
+    Debug("&alps_hosts\n");
+
+    return ""
+        if (!alps_job());
+
+#   Again, we need a batch system to achieve management; return the uniq'ed
+#   contents of $PBS_NODEFILE.  Actually, on the Cray XT, we can return the
+#   NIDS allocated by ALPS; but, without launching servers to other service
+#   nodes, all communication is via the launching node and NIDS actually
+#   have no persistent resource allocated to the user.  That is, all file
+#   resources accessible from a NID are shared with the launching node.
+#   And, since ALPS is managed by the batch system, only the launching node
+#   can initiate communication with a NID.  In effect, the Cray XT model is
+#   of a single service node with a varying number of compute processors.
+    open (FILE, $ENV{PBS_NODEFILE}) || return "";
+    my $lines;
+    while (<FILE>) {
+        chomp;
+        $lines->{$_} = 1;
+    }
+
+    my @hosts = sort(keys(%$lines));
+    my $hosts = join(",", @hosts);
+    Debug("&alps_hosts returning: $hosts\n");
+    return "$hosts";
+}
+
+#--------------------------------------------------------------------------
+
 # Return "1" if we're running in a PBS job; "0" otherwise.
 sub pbs_job {
     Debug("&pbs_job\n");
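
The detection and host-list logic in the patch can be tried outside of MTT. What follows is only a minimal sketch, not part of the patch: the names alps_job_sketch and alps_hosts_sketch are hypothetical stand-ins for alps_job and alps_hosts, MTT's Debug() calls are left out so that the script runs without MTT, and the environment values and node names are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for alps_job(): true only when both batch
# environment variables are present.
sub alps_job_sketch {
    return (exists($ENV{BATCH_PARTITION_ID}) &&
            exists($ENV{PBS_NNODES})) ? 1 : 0;
}

# Hypothetical stand-in for alps_hosts(): the sorted, de-duplicated lines
# of $PBS_NODEFILE joined with commas, or "" if that is not possible.
sub alps_hosts_sketch {
    return "" if !alps_job_sketch();
    return "" if !defined($ENV{PBS_NODEFILE});
    open(my $fh, "<", $ENV{PBS_NODEFILE}) or return "";
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        $seen{$line} = 1;
    }
    close($fh);
    return join(",", sort keys %seen);
}

# Fake a batch environment; every value below is invented for this sketch.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print $tmp_fh "nid00012\nnid00012\nnid00007\n";
close($tmp_fh);

$ENV{BATCH_PARTITION_ID} = "0";
$ENV{PBS_NNODES}         = "2";
$ENV{PBS_NODEFILE}       = $tmp_name;

print alps_job_sketch(), "\n";    # prints 1
print alps_hosts_sketch(), "\n";  # prints nid00007,nid00012
```

The sketch uses a lexical filehandle and closes it explicitly; the patch itself uses a bareword FILE handle and relies on Perl to close it.
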
-- 
Ken