Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Help with some fundamentals
From: David Zhang (solarbikedz_at_[hidden])
Date: 2011-01-20 11:59:32


You would probably want some kind of cluster management software, such as TORQUE.

On Thu, Jan 20, 2011 at 8:50 AM, Olivier SANNIER <
Olivier.SANNIER_at_[hidden]> wrote:

> First of all, thank you for the answers.
>
> I have a few more questions, added below.
>
>
>
> What is the behavior in case a node dies or becomes unreachable?
>
> Your run will be aborted. However there is checkpoint/restart support for
> Linux http://www.open-mpi.org/faq/?category=ft
>
>
>
> As this is a Win32 program, I’ll have to take into account that there is
> only the « abort » behavior.
>
>
>
> What makes any given machine become a node available for tasks?
>
> You define it in a host file, or a batch system tells Open MPI which nodes to use.
>
>
>
> So there is no dynamic discovery of nodes available on the network, unless,
> of course, I were to write a tool that does it before the actual run
> is started.
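
A pre-run discovery tool of the kind described above could be sketched roughly as follows. This is only a minimal illustration, not anything Open MPI ships: the function names `tcp_probe` and `build_hostfile`, the default port 22, and the fixed slot count are all assumptions, and a real tool would need a probe that matches how your nodes are actually reached. The only Open MPI-specific part is the output format, one `hostname slots=N` line per node, which is what a host file expects.

```python
import socket
from typing import Callable, Iterable, List


def tcp_probe(host: str, port: int = 22, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 22 is an assumption; use whatever service your nodes expose.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def build_hostfile(candidates: Iterable[str], slots: int = 1,
                   probe: Callable[[str], bool] = tcp_probe) -> List[str]:
    """Return host file lines for every candidate that answers the probe."""
    return [f"{host} slots={slots}" for host in candidates if probe(host)]
```

The returned lines can be written to a file (say `hosts.txt`, a hypothetical name) and passed to `mpirun --hostfile hosts.txt`. Passing the probe as a parameter keeps the reachability check replaceable, for example by a query to a batch system.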
>
>
>
> Is there a monitoring tool that would give me indications of the status and
> health of the nodes?
>
> This has nothing to do with MPI. Nagios or Ganglia can do that.
>
>
>
> I was thinking more of a tool that would tell me a node is already
> performing a task, so that I can avoid having it oversubscribed.
>
>
>
> I’m quite sure all these are trivial questions for those with more
> experience, but I’m having a hard time finding resources that would answer
> those.
>
> Read an introduction on programming with MPI and another one on Beowulf
> clusters (batch systems, monitoring, shared file systems). This should give
> you enough information on the topic. If you don't mind spending more money
> on software, you can also take a look at Microsoft's HPC Server.
>
> I’ve started looking at Beowulf clusters, and that led me to PBS. Am I
> right in assuming that PBS (PBSPro or TORQUE) could be used to do the
> monitoring and the load balancing I thought of?
>
>
>
> Thanks
>
> Olivier
>

-- 
David Zhang
University of California, San Diego