Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Busy waiting [was Re: (no subject)]
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-04-24 11:17:30

Well, blocking or not blocking, that is the question! Unfortunately,
it's more complex than this thread seems to indicate. It's not that we
didn't want to implement it in Open MPI; it's that at one point we had
to make a choice ... and we decided to always go for performance first.

However, there was some experimentation with going into blocking mode,
at least when only TCP was used. Unfortunately, this breaks some other
things in Open MPI because of our progression model. We are component
based, and these components are allowed to register periodically called
callbacks ... and here "periodically" means as often as possible. There
are at least two components that use this mechanism for their own
progression: ROMIO (mca/io/romio) and one-sided communications
(mca/osc/*). Switching into blocking mode would break these two
components completely. That is why we don't block even when only TCP
is used.
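
A rough sketch of that progression model (a toy callback registry with
invented names, not Open MPI's actual code) shows why blocking inside
one transport would starve the other components:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal sketch of a poll-based progress engine: components register
 * callbacks, and every pass through the engine invokes all of them.
 * If the engine instead blocked inside one transport's poll(), the
 * other components' callbacks would never run.  All names here are
 * invented for illustration. */

#define MAX_CALLBACKS 8

typedef int (*progress_cb_t)(void);

static progress_cb_t callbacks[MAX_CALLBACKS];
static size_t num_callbacks = 0;

static void progress_register(progress_cb_t cb) {
    assert(num_callbacks < MAX_CALLBACKS);
    callbacks[num_callbacks++] = cb;
}

/* One pass of the engine: give every component a chance to make
 * progress.  Returns the number of events the components reported. */
static int engine_progress(void) {
    int events = 0;
    for (size_t i = 0; i < num_callbacks; ++i)
        events += callbacks[i]();
    return events;
}

/* Two toy components standing in for the ones named above. */
static int io_calls = 0, osc_calls = 0;
static int io_progress(void)  { io_calls++;  return 0; }
static int osc_progress(void) { osc_calls++; return 1; }
```

A waiting caller then spins `while (!done) engine_progress();`, which
is exactly the busy-wait discussed in this thread.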

Anyway, there is a solution. We have to move these components from
poll-based progress to event-based progress. There were some
discussions, and if I remember well ... everybody's waiting for one of
my patches :) A patch that allows a component to attach a completion
callback to MPI requests ... I don't have a clear deadline for this,
and unfortunately I'm a little busy right now ... but I'll work on it.
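
The idea behind such a patch can be sketched hypothetically: a
component attaches a completion callback to a request, and the engine
fires it once when the request completes, instead of the component
polling. All names here are invented; this is not the actual patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the completion-callback idea: rather than a
 * component noticing completion by polling from its progress callback,
 * it registers a callback that fires exactly once when the request
 * completes.  Invented names, not Open MPI's real API. */

typedef struct request {
    int complete;                          /* set by the transport     */
    void (*on_complete)(struct request *); /* fired once on completion */
} request_t;

static void request_set_callback(request_t *req,
                                 void (*cb)(request_t *)) {
    req->on_complete = cb;
}

/* Called by the transport when the operation finishes; the component
 * no longer needs a poll loop to notice completion. */
static void request_complete(request_t *req) {
    req->complete = 1;
    if (req->on_complete)
        req->on_complete(req);
}

static int notified = 0;
static void my_component_cb(request_t *req) {
    (void)req;
    notified++;   /* the component reacts to the completion event */
}
```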


On Apr 24, 2008, at 9:43 AM, Barry Rountree wrote:

> On Thu, Apr 24, 2008 at 12:56:03PM +0200, Ingo Josopait wrote:
>> I am using one of the nodes as a desktop computer. Therefore it is
>> most important for me that the MPI program is not so greedily
>> acquiring CPU time.
> This is a kernel scheduling issue, not an Open MPI issue. Busy
> waiting in one process should not cause a noticeable loss of
> responsiveness in another process. Have you experimented with the
> "nice" command?
>> But I would imagine that energy consumption is generally a big
>> issue, since energy is a major cost factor in a computer cluster.
> Yup.
>> When a CPU is idle, it uses considerably less energy. Last time I
>> checked, my computer used 180 W when both CPU cores were working and
>> 110 W when both cores were idle.
> What processor is this?
>> I just made a small hack to solve the problem. I inserted a simple
>> sleep call into the function 'opal_condition_wait':
>>
>> --- orig/openmpi-1.2.6/opal/threads/condition.h
>> +++ openmpi-1.2.6/opal/threads/condition.h
>> @@ -78,6 +78,7 @@
>>  #endif
>>      } else {
>>          while (c->c_signaled == 0) {
>> +            usleep(1000);
>>              opal_progress();
>>          }
>>      }
> I expect this would lead to increased execution time for all programs
> and increased energy consumption for most programs. Recall that
> energy is power multiplied by time. You're reducing the power on some
> nodes and increasing time on all nodes.
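
To make that point concrete, here is a small worked example of
E = P × t, reusing the 180 W / 110 W figures quoted earlier in this
thread; the 100 s run time and 15% slowdown are invented for
illustration:

```c
/* Worked example of "energy = power * time" (joules = watts * seconds),
 * using the 180 W busy / 110 W idle figures quoted earlier.  The 100 s
 * run time and the 15% slowdown (100 s -> 115 s) are hypothetical. */

static int energy_joules(int power_watts, int time_seconds) {
    return power_watts * time_seconds;
}

/* A node that used to spin at 180 W for 100 s burns 18000 J.  If it
 * now sleeps at 110 W while the run stretches to 115 s, it burns
 * 12650 J: a saving.  But a node doing real work the whole time stays
 * at 180 W for 115 s and burns 20700 J, MORE than before. */
```

So the waiting nodes save energy while the working nodes pay for the
longer run, which is the tradeoff described above.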
>> The usleep call will let the program sleep for about 4 ms (it won't
>> sleep for a shorter time because of timer granularity). But that is
>> good enough for me. The CPU usage is (almost) zero when the tasks
>> are waiting for one another.
> I think your mistake here is considering CPU load to be a useful
> metric. It isn't. Responsiveness is a useful metric, energy is a
> useful metric, but CPU load isn't a reliable guide to either of
> these.
>> For a proper implementation you would want to actively poll without
>> a sleep call for a few milliseconds, and then use some other method
>> that sleeps not for a fixed time, but until new messages arrive.
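
That two-phase strategy (spin briefly for low latency, then block for
low power) can be sketched with POSIX threads; this is a minimal
illustration under invented names, not an Open MPI patch:

```c
#include <pthread.h>
#include <stdatomic.h>

/* Sketch of a spin-then-block wait: poll aggressively for a bounded
 * number of iterations (standing in for "a few milliseconds"), then
 * fall back to blocking on a condition variable until a signal
 * arrives.  Invented names; not Open MPI code. */

#define SPIN_LIMIT 10000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static atomic_int signaled = 0;

/* Called by the "sender" side when new work (a message) arrives. */
static void post_signal(void) {
    pthread_mutex_lock(&lock);
    atomic_store(&signaled, 1);
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

/* Phase 1: spin (low latency); phase 2: block (low power). */
static void wait_for_signal(void) {
    for (int i = 0; i < SPIN_LIMIT; ++i)
        if (atomic_load(&signaled))
            return;            /* fast path: caught it while spinning */
    pthread_mutex_lock(&lock);
    while (!atomic_load(&signaled))
        pthread_cond_wait(&cond, &lock);  /* slow path: sleep until woken */
    pthread_mutex_unlock(&lock);
}
```

The spin budget trades latency against power: short messages are
caught in the fast path, while long waits put the core to sleep.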
> Well, it sounds like you can get to this before I can. Post your
> patch here and I'll test it on the NAS suite, UMT2K, Paradis, and a
> few synthetic benchmarks I've written. The cluster I use has
> multimeters hooked up, so I can also let you know how much energy is
> being saved.
>
> Barry Rountree
> Ph.D. Candidate, Computer Science
> University of Georgia
> _______________________________________________
> users mailing list
> users_at_[hidden]
