Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] arch question: long running app
From: doktora v (doktora_at_[hidden])
Date: 2007-12-06 10:58:32


Jeff,
Thanks for the detailed discussion. It certainly makes things a lot clearer,
just as I was giving up my hopes for a reply.

The app is fairly heavy on communication (~10k messages per minute) and is
also embarrassingly parallel. Taking this into account, I think I'll
readjust my resilience expectations and go with MPI as it will make
communications a breeze to deal with.

It does make sense to have the ability to add/remove processes on the go. On
multi-core hardware, a scheduler could add more processes to an app as
cores are freed up from other tasks. Of course that would be a
problem for apps that require some type of data synchronisation (tightly
coupled, as you say). It would be nice to have an option like "mpirun -min 4
-max 16" and let the scheduler optimise based on availability.

I'm currently running a test case on two machines with two cores each and,
after one day, so far so good. We'll see how it goes.

Thanks again
dok

On Dec 6, 2007 2:06 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:

> It certainly does make sense to use MPI for such a setup. But there
> are some important things to consider:
>
> 1. MPI, at its heart, is a communications system. There's lots of
> other bells and whistles (e.g., starting up a whole bunch of processes
> in tandem), but at the core: it's all about passing messages.
>
> 2. MPI tends to lend itself to fairly tightly coupled systems. The
> usual model is that you start all of your parallel processes at the
> same time (e.g., "mpirun -np 32 my_application"). The current state
> of technology is *not* good in terms of fault tolerance -- most MPIs
> (Open MPI included) will kill the entire job if any one of those
> processes dies. This is an important factor for running for weeks,
> months, or years.
>
> (lots of good research is ongoing about fault tolerance and MPI, but
> the existing solutions still emphasize tightly-coupled
> applications or require a lot of involvement from the application)
>
> 3. MPI also emphasizes performance: low latency, high bandwidth, good
> concurrency, etc.
>
> If you don't need these things, for example, if your communication
> between manager and worker is infrequent, and/or the overall
> application time is not dominated by communication time, you might be
> better served for [extremely] long-running applications by using a
> simple (but resilient) sockets-based communication layer and not using
> MPI. I say this mainly because of the fault tolerance issues involved
> and the natural hardware MTBF values that we see on today's hardware.
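
To put the MTBF point in rough terms, here is a minimal sketch. It assumes node failures are independent and exponentially distributed, and that any single failure kills the whole job; that is a common simplification, not something stated in the thread. The function name is made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Probability that an all-or-nothing job on `nodes` machines survives
// `hours` of wall-clock time, ASSUMING independent, exponentially
// distributed failures with the given per-node MTBF (in hours).
// Because any single failure kills the job, the job's effective MTBF
// is node_mtbf_hours / nodes.
double job_survival_probability(int nodes, double node_mtbf_hours, double hours)
{
    assert(nodes > 0 && node_mtbf_hours > 0.0 && hours >= 0.0);
    return std::exp(-static_cast<double>(nodes) * hours / node_mtbf_hours);
}
```

Under these assumptions, doubling the node count costs as much as doubling the run time: job_survival_probability(2, m, t) equals job_survival_probability(1, m, 2*t).
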
>
> Hope that helps.
>
>
> On Dec 4, 2007, at 1:15 PM, doktora v wrote:
>
> > Hi, although I did my due diligence on searching for this question,
> > I apologise if this is a repeat.
> >
> > From an architectural point of view does it make sense to use MPI in
> > the following scenario (for the purposes of resilience as much as
> > parallelization):
> >
> > Each process is a long-running process (runs uninterrupted for
> > weeks, months or even years) that collects and crunches some
> > streaming data, for example temperature readings, and the data is
> > replicated to R nodes.
> >
> > Because this is a departure from the normal modus operandi (i.e. all
> > data is immediately available), are there any obvious MPI issues that
> > I am not considering in designing such an application?
> >
> > Here is a more detailed description of the app:
> >
> > A master receives the data and dispatches it according to some
> > function such that each tuple is replicated to R of the N
> > nodes (with R<=N). Suppose that there are K regions from which
> > temperature readings stream in, in the form of <k,T> where k is the
> > region id and T is the temperature reading. The master sends <k,T>
> > to R of the N nodes. These nodes maintain a long-term state of, say,
> > the min/max readings. If R=N=2, the system is basically duplicated
> > and if one of the two nodes dies unexpectedly, the other one still
> > has accounted for all the data.
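
The per-node "long-term state" described above could look roughly like this. It is a sketch only: the names MinMax and RegionState are invented here, and it assumes region ids are integers and readings are doubles.

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// Running min/max of the temperature readings seen for one region.
struct MinMax {
    double min;
    double max;
};

// Long-term state held by one worker node: region id k -> min/max so far.
// (Hypothetical type; integer ids and double readings are assumptions.)
class RegionState {
public:
    // Fold one <k,T> tuple into the state.
    void process_message(int k, double T)
    {
        auto it = state_.find(k);
        if (it == state_.end()) {
            state_.emplace(k, MinMax{T, T});   // first reading for region k
        } else {
            it->second.min = std::min(it->second.min, T);
            it->second.max = std::max(it->second.max, T);
        }
    }

    // True if at least one reading for region k has been processed.
    bool has(int k) const { return state_.count(k) != 0; }

    // Min/max for region k; requires has(k).
    const MinMax& get(int k) const
    {
        assert(has(k));
        return state_.at(k);
    }

private:
    std::map<int, MinMax> state_;
};
```

The state lives only in the worker's memory, so it is lost if that process dies; that is why each tuple goes to R nodes.
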
> >
> > Here is some pseudo-code:
> >
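> > int main(argc, argv)
> > {
> >   int N=10, R=3, K=200;   // N worker nodes, R replicas, K regions
> >
> >   Init(argc,argv);
> >   int rank=COMM_WORLD.Get_rank();
> >   if(rank==0) {           // master
> >     int lastnode = 0;
> >     while(read <k,T> from socket)
> >       for(i in 0:R)       // R consecutive workers; assumes workers are ranks 1..N
> >         COMM_WORLD.Send(<k,T>,1,tuple,1+(lastnode++%N),tag);
> >   } else {                // worker: keep receiving, not just one message
> >     while(true) {
> >       COMM_WORLD.Recv(<k,T>,1,tuple,any,tag,Info);
> >       process_message(<k,T>);
> >     }
> >   }
> >   Finalize();
> > }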
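
The choice of destinations in the pseudo-code can be isolated as a pure function and checked without MPI. This is a sketch that assumes rank 0 is the master and ranks 1..N are the workers; replica_targets is a made-up name, not an MPI call.

```cpp
#include <cassert>
#include <vector>

// Worker ranks that should receive the tuple with sequence number `seq`
// (0, 1, 2, ...). ASSUMES rank 0 is the master and ranks 1..N are workers.
// Returns R distinct ranks, continuing the round-robin where the previous
// tuple left off, so the load is spread evenly over the N workers.
std::vector<int> replica_targets(long long seq, int R, int N)
{
    assert(seq >= 0 && R >= 1 && R <= N);
    std::vector<int> ranks;
    ranks.reserve(R);
    for (int i = 0; i < R; ++i) {
        // R consecutive values modulo N are distinct because R <= N.
        ranks.push_back(1 + static_cast<int>((seq * R + i) % N));
    }
    return ranks;
}
```

In the master loop, each returned rank would be the destination of one send of the same tuple. With R=N=2 every tuple goes to both workers, which is the fully duplicated case described above.
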
> >
> > Many thanks for your time!
> > Regards
> > Dok
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> Cisco Systems