John Hearns wrote:
> On 20 January 2011 16:50, Olivier SANNIER <Olivier.SANNIER_at_[hidden]> wrote:
>> I've started looking at Beowulf clusters, and that led me to PBS. Am I
>> right in assuming that PBS (PBSPro or TORQUE) could be used to do the
>> monitoring and the load balancing I thought of?
> Yes, that is correct. An alternative is Gridengine.
> To be honest, I think you should contact a company which sells
> computational clusters.
> They will send someone to tell you how these clusters work, and give
> you an idea of how a small cluster could help with your work.
> I can suggest some companies off-list.
1) Besides John's suggestions, there are some good and informative
articles on how clusters work, etc., at ClusterMonkey.net.
2) Since clusters != MPI != OpenMPI,
you may find general information about clusters
on the Beowulf and Rocks Clusters web sites
and mailing lists.
BTW, Rocks provides free software to set up a standard cluster with
minimal effort. It is an NSF-supported project at UCSD.
3) Resource managers / job queuing systems:
Torque (which we use here) is free and available for download
from the Adaptive Computing/Cluster Resources web site.
Torque is derived from OpenPBS,
and PBS Pro also exists as a licensed product.
Torque performs resource management, job queuing and control,
and, along with its cousin job scheduler Maui, which is also
available from the same site (one of the links above),
gives you a handle to manage resource optimization and load balancing
in one or more clusters.
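To give a rough idea of what "job queuing" looks like from the user side, here is a minimal sketch in Python that writes out a Torque/PBS batch script. The helper name make_pbs_script, the job name, and the program ./hello_mpi are made up for illustration, and the node/core counts are arbitrary; the #PBS directives and $PBS_O_WORKDIR are standard Torque, but check your site's documentation for queue names and limits.

```python
def make_pbs_script(job_name, nodes, ppn, walltime, command, queue=None):
    """Return the text of a Torque/PBS batch script.

    Hypothetical helper for illustration only, not part of Torque.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",               # job name shown by qstat
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # wall-clock limit, HH:MM:SS
    ]
    if queue is not None:
        lines.append(f"#PBS -q {queue}")     # queue name is site-specific
    lines += [
        "cd $PBS_O_WORKDIR",                 # directory the job was submitted from
        f"mpirun -np {nodes * ppn} {command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed example values: 2 nodes with 4 cores each, 10 minute limit.
    print(make_pbs_script("hello_mpi", 2, 4, "00:10:00", "./hello_mpi"), end="")
```

You would save the output to a file and hand it to qsub; Torque holds the job in the queue, and the scheduler (Maui, in our case) decides when and on which nodes it runs.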
There are other free resource managers, like Sun Grid Engine,
although its future is not completely clear after Sun was
bought by Oracle, and its development/maintenance
has apparently been taken over by Univa.
Lawrence Livermore produces another free scheduler named Slurm,
but my perception is that Slurm doesn't integrate with as many HPC
tools, or as easily, as Torque and SGE do.
Other licensed resource managers/batch systems also exist,
including Moab (Adaptive Computing),
LSF (Platform Computing),
and Tivoli/LoadLeveler (IBM).
There are also "grid" resource managers (Condor, Globus, etc.).
I hope this helps,