On Thu, Jul 08, 2010 at 09:43:48AM -0400, Gus Correa wrote:
> Douglas Guptill wrote:
>> On Wed, Jul 07, 2010 at 12:37:54PM -0600, Ralph Castain wrote:
>>
>>> No... afraid not. Things work pretty well, but there are places
>>> where things just don't mesh. Sub-node allocation in particular is
>>> an issue as it implies binding, and slurm and ompi have conflicting
>>> methods.
>>>
>>> It all can get worked out, but we have limited time and nobody cares
>>> enough to put in the effort. Slurm just isn't used enough to make it
>>> worthwhile (too small an audience).
>>
>> I am about to get my first HPC cluster (128 nodes), and was
>> considering slurm. We do use MPI.
>>
>> Should I be looking at Torque instead for a queue manager?
>>
> Hi Douglas
>
> Yes, it works like a charm with OpenMPI.
> I also have MVAPICH2 and MPICH2, no integration w/ Torque,
> but no conflicts either.
Thanks, Gus.
After some lurking and reading, I plan this:
Debian (lenny)
+ FAI - for compute-node operating system installation
+ Torque - job scheduler/manager
+ MPI (Intel MPI) - for the application
+ MPI (Open MPI) - alternative MPI
Does anyone see holes in this plan?
Thanks,
Douglas
--
Douglas Guptill voice: 902-461-9749
Research Assistant, LSC 4640 email: douglas.guptill_at_[hidden]
Oceanography Department fax: 902-494-3877
Dalhousie University
Halifax, NS, B3H 4J1, Canada