Open MPI User's Mailing List Archives

From: Miguel Figueiredo Mascarenhas Sousa Filipe (miguel.filipe_at_[hidden])
Date: 2006-09-07 17:10:21


Hi all,

Well, I just wanted to say that from a software engineering (and also
computer science) point of view,

OMP, MPI and threads are completely different models for parallel
computation/concurrent programming.

I do not believe that any capable engineer (or good programmer, for all
I know) can know in advance which one is "best"
to use without knowing the problem space, the design requirements, etc.

It's not just about "portability" or code "readability/maintainability".

Deciding which to use will depend on (and therefore influence) the
application architecture.

Should I use OMP in a web server such as Apache or Tomcat, on the grounds
that it provides better portability and code readability?

Should I use OMP or threading for a massively parallel system such
as Blue Gene/L? What about an SGI Altix 3000?

Should I use threading on a 2-CPU shared-memory system for a
sequential application where I just need to speed up
some highly vectorizable loops?

For instance, my thesis dealt with parallelizing a seismic simulation
application; I did a threaded version and an MPI version.
The threaded version, since "tasks" could share massive amounts of
data with very little lock contention, could work on bigger data sets
than the MPI version (given the same total amount of RAM). But the MPI
version could run on clusters, while with threading I needed a single
system image.
OpenMP was inadequate since it would have had a much bigger sequential
execution time, providing inadequate speedup for an algorithm which
was very parallel.

Speedups measured in the threaded version and the MPI version were about
1.99 in 2-CPU mode (<1% of sequential computation). In MPI, with 16
CPUs (1 gigabit link for 8 x 2-CPU nodes), the measured speedup was
14.8.
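
As a rough cross-check of the speedup figures above, here is a minimal
sketch (in Python, not part of the thesis code) that backs out the serial
fraction implied by each measurement. It assumes Amdahl's law is a fair
model for this application and folds communication cost into the "serial"
part, which is a simplification. The function names are illustrative only.

```python
def amdahl_speedup(serial_fraction, workers):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def karp_flatt(speedup, workers):
    """Serial fraction implied by a measured speedup (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / workers) / (1.0 - 1.0 / workers)


# (CPUs, measured speedup) pairs quoted in the message above.
measurements = [(2, 1.99), (16, 14.8)]

for workers, speedup in measurements:
    fraction = karp_flatt(speedup, workers)
    print(f"{workers:2d} CPUs: speedup {speedup:5.2f} "
          f"-> implied serial fraction {fraction:.4f}")
```

Under these assumptions both measurements imply a serial fraction of
roughly 0.5%, which is in line with the "<1% of sequential computation"
figure.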

My threaded version would never achieve a 14.8 speedup, even on an
"SSI" 8-node cluster.
The effort needed to make the MPI version so scalable was _much_
bigger than for the threaded one: I had to design a new concurrent,
distributed algorithm to replace one that was sequential and that, in
the sequential application, amounted to 1% of the computation time.
The MPI version uses more RAM
per process, but can scale up to 64/128 nodes, depending on the
problem size, and it doesn't require a shared-memory system.
My threaded version, on a shared-memory system with lots of CPUs,
will scale quite a lot, but the aggregate bandwidth will probably be
inferior to that of a cluster with the same amount of CPUs/RAM (normally, big
SMP or NUMA systems have higher RAM latency and bandwidth that does not
grow proportionally).
Basically, I can't predict which performs better.

So, I hope it's understandable that choosing the right parallel
computing model isn't just a matter of "taste".

On 9/6/06, George Bosilca <bosilca_at_[hidden]> wrote:
> From my perspective some (let's say #1 and #2) of the most important
> features of an application that has to last for a while are
> readability and portability. And OMP code is far more readable than
> pthread code. The loops look like loops, the critical sections are
> obvious and the sequential meaning of the program is preserved.
>
> On Sep 5, 2006, at 7:52 PM, Durga Choudhury wrote:
>
> > My opinion would be to use pthreads, for a couple of reasons:
> >
> > 1. You don't need an OMP aware compiler; any old compiler would do.
>
> Compilers can be downloaded for free these days. And most of them
> have now OMP support. And on all operating systems (i.e. even the
> free Microsoft compiler now has OMP support, and Windows was
> definitively not the platform I expect to use for my OMP tasks).
>
> > 2. The pthread library is more well adapted and hence might be more
> > optimized than the code emitted from an OMP compiler.
>
> The pthread library adds a huge overhead for all operations. At this
> level of granularity quite often you need atomic locks and operations,
> not critical sections protected by mutexes. Unfortunately, there is
> no portable library that gives you a common interface to atomic
> operations (there was a BSD one at one point). Moreover, using
> threads instead of OMP directives moves the burden onto the programmer.
> Most people just cannot afford a one-year student who has to
> first understand the code and then add the correct pthread calls inside.
> And for what result ... you don't even know that you will get the
> fastest version. On the other hand, OMP compilers are getting smarter
> and smarter every day. Today the results are quite impressive; just
> imagine what will happen in a few years.
>
> >
> > If your operating system is Linux, you may use the clone() system
> > call directly; this would add further optimization at the expense
> > of portability.
>
> It's always a trade-off between performance and portability. What do
> you want to lose in order to get the 1% performance gain ... And in
> this case the only performance gain you will get is when you start
> the threads, otherwise you will not improve anything. Generally,
> people prefer to use thread pools in order to avoid the overhead of
> creating and destroying threads all the time.
>
> george.
>
> >
> > Durga
> >
> >
> > On 9/5/06, George Bosilca <bosilca_at_[hidden]> wrote:
> > On Sep 5, 2006, at 3:19 AM, Aidaros Dev wrote:
> >
> > > Nowadays we hear about the Intel dual-core processor. An Intel dual-core
> > > processor consists of two complete execution cores in one physical
> > > processor, both running at the same frequency. Both cores share the
> > > same packaging and the same interface with the chipset/memory.
> > > Can I use the MPI library to communicate between these processors? Can we
> > > consider them as separate?
> >
> >
> > Yes and yes. However, these architectures fit better with a different
> > programming model. If you want to get the max performance out of
> > them, an OMP approach (instead of MPI) is more suitable. Using
> > processes on such an architecture is just a waste of performance. One
> > should use a thread model, with locking to ensure the coordination
> > between memory accesses. Or let the underlying libraries do their
> > magic for you. As an example, most of the mathematical codes based on
> > BLAS can use the GOTO BLAS (developed at TACC) to get multi-core (and
> > multi-CPU) support for free, as this library will do all BLAS
> > operations in parallel using multiple threads.
> >
> > george.
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >
> > --
> > Devil wanted omnipresence;
> > He therefore created communists.
>
> "Half of what I say is meaningless; but I say it so that the other
> half may reach you"
> Kahlil Gibran

-- 
Miguel Sousa Filipe