Jeff, 

Thanks for the detailed reply. It makes things a lot clearer, and it arrived just as I was giving up hope of getting one.

The app is fairly heavy on communication (~10k messages per minute) and is also embarrassingly parallel. Taking this into account, I think I'll readjust my resilience expectations and go with MPI as it will make communications a breeze to deal with.

It does make sense to have the ability to add/remove processes on the fly. On multi-core hardware, a scheduler could add more processes to an app as cores are freed up from other tasks. Of course, that would be a problem for apps that require some type of data synchronisation (tightly coupled, as you say). It would be nice to have an option like "mpirun -min 4 -max 16" and let the scheduler optimise based on availability.

I'm currently running a test case on two machines with two cores each and, after one day, so far so good. We'll see how it goes.

Thanks again
dok

On Dec 6, 2007 2:06 PM, Jeff Squyres <jsquyres@cisco.com> wrote:
It certainly does make sense to use MPI for such a setup.  But there
are some important things to consider:

1. MPI, at its heart, is a communications system.  There are lots of
other bells and whistles (e.g., starting up a whole bunch of processes
in tandem), but at the core, it's all about passing messages.

2. MPI tends to lend itself to fairly tightly coupled systems.  The
usual model is that you start all of your parallel processes at the
same time (e.g., "mpirun -np 32 my_application").  The current state
of technology is *not* good in terms of fault tolerance -- most MPI
implementations (Open MPI included) will kill the entire job if any
one of those processes dies.  This is an important factor when running
for weeks, months, or years.

(Lots of good research is ongoing about fault tolerance and MPI, but
the existing solutions still emphasize tightly coupled applications
or require a fair amount of involvement from the application.)

3. MPI also emphasizes performance: low latency, high bandwidth, good
concurrency, etc.

If you don't need these things (for example, if communication between
manager and worker is infrequent, and/or the overall application time
is not dominated by communication time), you might be better served
for [extremely] long-running applications by a simple (but resilient)
sockets-based communication layer instead of MPI.  I say this mainly
because of the fault tolerance issues involved and the MTBF values
that we see on today's hardware.

Hope that helps.


On Dec 4, 2007, at 1:15 PM, doktora v wrote:

> Hi, although I did my due diligence on searching for this question,
> I apologise if this is a repeat.
>
> From an architectural point of view does it make sense to use MPI in
> the following scenario (for the purposes of resilience as much as
> parallelization):
>
> Each process is a long-running process (runs uninterrupted for
> weeks, months or even years) that collects and crunches some
> streaming data, for example temperature readings, and the data is
> replicated to R nodes.
>
> Because this is a departure from the normal modus operandi (i.e. all
> data is immediately available), are there any obvious MPI issues that
> I am not considering in designing such an application?
>
> Here is a more detailed description of the app:
>
> A master receives the data and dispatches it according to some
> function such that each tuple is replicated R times to R of the N
> nodes (with R<=N). Suppose that there are K regions from which
> temperature readings stream in, in the form of <k,T> where k is the
> region id and T is the temperature reading. The master sends <k,T>
> to R of the N nodes. These nodes maintain a long-term state of, say,
> the min/max readings. If R=N=2, the system is basically duplicated
> and if one of the two nodes dies unexpectedly, the other one has
> still accounted for all the data.
>
> Here is some pseudo-code:
>
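> int main(argc, argv)
> {
>   int N=10, R=3, K=200;  // assumed: N workers on ranks 1..N
>
>   Init(argc,argv);
>   int rank=COMM_WORLD.Get_rank();
>   if(rank==0) {          // master: never sends to itself
>      int lastnode = 0;
>      while(read <k,T> from socket)
>        for(i = 0; i < R; i++)   // R consecutive workers
>          COMM_WORLD.Send(<k,T>,1,tuple,1+(lastnode++ % N),tag);
>   } else {               // worker: keep receiving, not just once
>      while(true) {
>        COMM_WORLD.Recv(<k,T>,1,tuple,any,tag,Info);
>        process_message(<k,T>);
>      }
>   }
> }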
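
The dispatch rule and the per-node min/max state described above can
be tried out without MPI. The following is only a minimal sketch under
some assumptions: workers are taken to be ranks 1..N with rank 0 as the
master, and the names replica_targets, MinMax and Worker are
hypothetical helpers, not part of the original post or of any MPI API
(process_message is borrowed from the pseudo-code).

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <vector>

// Hypothetical helper: worker ranks that receive the n-th tuple.
// Assumes workers are ranks 1..N and rank 0 is the master.
// Picks R consecutive workers (mod N), so the ranks are distinct
// whenever R <= N.
std::vector<int> replica_targets(long n, int N, int R) {
    assert(N >= 1 && R >= 1 && R <= N);
    std::vector<int> targets;
    for (int i = 0; i < R; ++i) {
        targets.push_back(1 + static_cast<int>((n * R + i) % N));
    }
    return targets;
}

// Long-term state kept for one region: min/max reading seen so far.
struct MinMax {
    double min = std::numeric_limits<double>::infinity();
    double max = -std::numeric_limits<double>::infinity();
};

// One worker node: region id k -> min/max of temperature T.
struct Worker {
    std::map<int, MinMax> regions;

    void process_message(int k, double T) {
        MinMax& s = regions[k];
        if (T < s.min) s.min = T;
        if (T > s.max) s.max = T;
    }
};
```

With N=10 and R=3, tuple 0 goes to ranks 1, 2, 3 and tuple 3 wraps
around to ranks 10, 1, 2. With R=N=2, both workers see every tuple, so
either one alone holds the full min/max state.
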
>
> Many thanks for your time!
> Regards
> Dok
> _______________________________________________
> users mailing list
> users@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems