Open MPI logo

Open MPI User's Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Open MPI User's mailing list

Subject: Re: [OMPI users] CPU burning in Wait state
From: Richard Treumann (treumann_at_[hidden])
Date: 2008-09-03 14:17:11


1) Assume you are running an MPI program which has 16 tasks in
MPI_COMM_WORLD, you have 16 dedicated CPUs and each task is single
threaded. (a task is a distinct process, a process can contain one or more
threads) The is the most common traditional model. In this model, when a
task makes a blocking call, the CPU is used to poll the communication
layer. With only one thread per task, there is no way the CPU can be given
other useful work because the only thread is in the MPI_Bast and not
available to compute. With nothing else for the CPU to do anyway, it may
as well poll because that is likely to complete the blocking operation in
shortest time. Polling is the right choice. You should not worry that the
CPU is being "burned". It will not wear out.

2) Now assume you have the same number of tasks and CPUs but you have
provided a compute thread and a communication thread in each task. At the
moment you make an MPI_Bcast call on each task's communication thread you
have unfinished computation that the CPUs could process on the compute
threads. In this case you want the CPU to be released by the blocked
MPI_Bcast so it can be used by the compute thread. The MPI_Bcast may take
longer to complete because it is not burning the CPU but if useful
computation is going forward you come out ahead. A non-polling mode for the
blocking MPI_Bcast is the better option.

3) Take a third case - the CPUs are not dedicated to your MPI job. You
have only one thread per task but when that thread is blocked in an
MPI_Bcast you want other processes to be able to run. This is not a common
situation in production environments but may be common in learning or
development situations. Perhaps your MPI homework problem is running at the
same time someone else is trying to compile theirs on the same nodes. In
this case you really do not need the MPI_Bcast to finish in the shortest
possible time and you do want the people who share the node with you to
quit complaining. Again. a non-polling mode than gives up the CPU and lets
your neighbors compilation run is best.

Which of these is closest to your situation? If it is situation 1, why
would you care that CPU is burning? If it is situation 2 or 3 then you do
have reason to care.


Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363

users-bounces_at_[hidden] wrote on 09/03/2008 01:11:00 PM:

> [image removed]
> Re: [OMPI users] CPU burning in Wait state
> Vincent Rotival
> to:
> Open MPI Users
> 09/03/2008 01:15 PM
> Sent by:
> users-bounces_at_[hidden]
> Please respond to Open MPI Users
> Eugene,
> No what I'd like is that when doing something like
> call mpi_bcast(data, 1, MPI_INTEGER, 0, .....)
> the program continues AFTER the Bcast is completed (so no control
> returned to user), but while threads with rank > 0 are waiting in Bcast
> they are not taking CPU resources
> I hope it is more clear, I apologize for not being clear in the first
> Vincent
> Eugene Loh wrote:
> >
> > Vincent Rotival wrote:
> >
> >> The solution I retained was for the main thread to isend data
> >> separately to each other threads that are using Irecv + loop on
> >> mpi_test to test the finish of the Irecv. It mught be dirty but
> >> works much better than using Bcast
> >
> > Thanks for the clarification.
> >
> > But this strikes me more as a question about the MPI standard than
> > about the Open MPI implementation. That is, what you really want is
> > for the MPI API to support a non-blocking form of collectives. You
> > want control to return to the user program before the
> > barrier/bcast/etc. operation has completed. That's an API change.
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> >
> >
> >
> _______________________________________________
> users mailing list
> users_at_[hidden]