It sounds like, given the fault tolerance requirements Vasiliy specifically
mentioned (restarting clients and handing the master role to another server
when the master dies), MPI in its current form may not be the simplest choice.
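For what it's worth, the non-MPI route Ralph describes below could cover the master failover with a simple heartbeat scheme. Here is a minimal sketch in Python under my own assumptions, none of which come from Open MPI: every node has a unique integer id, nodes exchange periodic heartbeats over TCP, and the live node with the lowest id acts as master. The class name FailoverTracker, the 5-second timeout, and the lowest-id rule are all hypothetical, and the TCP transport itself is left out.

```python
import time


class FailoverTracker:
    """Hypothetical helper: tracks peer heartbeats and names a master.

    Assumption: the master is simply the live node with the lowest id.
    """

    def __init__(self, node_ids, timeout=5.0, clock=time.monotonic):
        self.timeout = timeout  # seconds of silence before a node counts as dead
        self.clock = clock      # injectable so the logic can be tested without sleeping
        now = clock()
        # Assume every known node is alive at start-up.
        self.last_seen = {node_id: now for node_id in node_ids}

    def heartbeat(self, node_id):
        """Record that a heartbeat message arrived from node_id."""
        self.last_seen[node_id] = self.clock()

    def live_nodes(self):
        """Return the sorted ids of nodes heard from within the timeout."""
        now = self.clock()
        return sorted(
            node_id
            for node_id, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )

    def current_master(self):
        """Return the lowest live node id, or None if no node is alive."""
        live = self.live_nodes()
        return live[0] if live else None


if __name__ == "__main__":
    fake_time = [0.0]  # fake clock so the demo runs instantly
    tracker = FailoverTracker([1, 2, 3], clock=lambda: fake_time[0])
    print("master at start:", tracker.current_master())

    fake_time[0] = 4.0
    tracker.heartbeat(2)  # nodes 2 and 3 keep reporting in,
    tracker.heartbeat(3)  # node 1 (the master) has gone silent
    fake_time[0] = 6.0
    print("master after node 1 times out:", tracker.current_master())
```

Each node would run the same tracker and call heartbeat() whenever a message arrives from a peer. Note that this sketch does nothing about network partitions, where two groups of nodes could each pick their own master.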
On Tue, 2010-03-09 at 18:55 -0700, Ralph Castain wrote:
> Running an orted directly won't work - it is intended solely to be launched when running a job with "mpirun".
> Your application doesn't immediately sound like it -needs- MPI, though you could always use it anyway. The MPI messaging system is fast, but it isn't clear whether your application will benefit from that speed. It depends on how much communication is going on versus computation and idle time.
> If you are more familiar with the non-MPI methods, I would personally do it that way unless I found a need for MPI - for example, a place where MPI collectives such as MPI_Allgather would be helpful.
> On Mar 9, 2010, at 12:10 PM, Vasiliy G Tolstov wrote:
> > Hello.
> > Some time ago I started studying MPI (Open MPI).
> > I need to write a client/server application that runs on 50 servers in
> > parallel. Each instance can communicate with the others over TCP/IP
> > (sending commands, doing some parallel computations).
> > The master controls all the clients (slaves): it sends control commands
> > and restarts clients if needed. If the master machine running the server
> > application dies, some other server needs to take over the master role
> > and control the other slaves.
> > Can I do these things with Open MPI? Or do I need to write a standard
> > TCP/IP client/server application?
> > I tried reading some Google search results, such as this one:
> > http://docs.sun.com/source/819-7480-11/ExecutingPrograms.htmlaopenmpi%
> > 20orted%20persistent%20daemon
> > but orted returns an error:
> > orted --daemonize
> > [mobile:24107] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
> > runtime/orte_init.c at line 125
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> > orte_ess_base_select failed
> > --> Returned value Not found (-13) instead of ORTE_SUCCESS
> > Thank you. Sorry for my poor English.
> > --
> > Vasiliy G Tolstov <v.tolstov_at_[hidden]>
> > Selfip.Ru
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users