On Mar 21, 2010, at 12:58 PM, Addepalli, Srirangam V wrote:
> Yes We have seen this behavior too.
>>> Another behavior I have seen is that one MPI process starts to
>>> show different elapsed time than its peers. Is it because
>>> checkpoint happened on behalf of this process?
> From: users-bounces_at_[hidden] [users-bounces_at_[hidden]] On
> Behalf Of ananda.mudar_at_[hidden] [ananda.mudar_at_[hidden]]
> Sent: Saturday, March 20, 2010 10:18 PM
> To: users_at_[hidden]
> Subject: [OMPI users] top command output shows huge CPU utilization
> when openmpi processes resume after the checkpoint
> When I checkpoint my openmpi application using ompi_checkpoint, I
> see that top command suddenly shows some really huge numbers in "CPU
> %" field such as 150% 200% etc. After sometime, these numbers do
> come back to the normal numbers under 100%. This happens exactly
> around the time checkpoint is completed and when the processes are
> resuming the execution.
One cause of this type of CPU utilization is the C/R thread. During
normal (non-checkpoint) processing, the thread polls for a checkpoint
fairly aggressively. You can change how aggressive the thread is by
adjusting the two parameters below:
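As a hedged sketch of what that tuning might look like on the command
line: opal_cr_thread_sleep_wait is named later in this message, its
companion opal_cr_thread_sleep_check is assumed, and the values shown
are illustrative, not recommendations:

```shell
# Hedged example: make the C/R thread poll less aggressively.
# opal_cr_thread_sleep_wait is named later in this message;
# opal_cr_thread_sleep_check is assumed as its companion, and the
# values are illustrative only -- check `ompi_info` for your build.
mpirun -np 4 \
    -mca opal_cr_thread_sleep_check 1000 \
    -mca opal_cr_thread_sleep_wait 1000 \
    ./my_mpi_app
```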
I usually set the latter to a less aggressive value. You can also turn
off the C/R thread entirely, either by configure'ing Open MPI without
it, or by disabling it at runtime by setting the 'opal_cr_use_thread'
MCA parameter to '0':
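A hedged example of the runtime form (the parameter name is from this
message; the mpirun invocation and application name are illustrative):

```shell
# Disable the C/R thread at runtime via the MCA parameter named above.
# (mpirun invocation is illustrative; adjust -np and the app name.)
mpirun -np 4 -mca opal_cr_use_thread 0 ./my_mpi_app
```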
The CPU increase during the checkpoint may be due to both the Open MPI
C/R thread and the BLCR thread becoming active on the machine. You
might try to determine whether this is BLCR's CPU utilization or Open
MPI's by creating a single-process application and watching the CPU
utilization when checkpointing it with BLCR alone. You may also want to
look at the memory consumption of the process to make sure that there
is enough for BLCR to run efficiently.
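A minimal sketch of such a test, assuming a serial program
./serial_app and BLCR's standard cr_run/cr_checkpoint utilities:

```shell
# Run a non-MPI process under BLCR, then checkpoint it while watching
# `top`: any CPU spike here is BLCR's alone, not Open MPI's C/R thread.
cr_run ./serial_app &
APP_PID=$!
sleep 10                 # let the process reach a steady state
cr_checkpoint $APP_PID   # writes a context file for the process
wait $APP_PID
```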
This may also be due to processes that have finished their checkpoint
waiting on peer processes to finish theirs. I don't think we have a
good way to control how aggressively these waiting processes poll for
completion of their peers. If this becomes a problem, we can look into
adding a parameter similar to opal_cr_thread_sleep_wait to throttle
that polling on the machine.
The disadvantage of making the various polling-for-completion loops
less aggressive is that the checkpoint and/or application may stall
for a little longer than necessary. But if this is acceptable to the
user, then they can adjust the MCA parameters as they see fit.
> Another behavior I have seen is that one MPI process starts to show
> different elapsed time than its peers. Is it because checkpoint
> happened on behalf of this process?
Can you explain a bit more about what you mean by this? Neither Open
MPI nor BLCR messes with the timer on the machine, so we are not
changing it in any way. The process is 'stopped' briefly while BLCR
takes the checkpoint, so this will extend the running time of the
process. How much the running time is extended (a.k.a. checkpoint
overhead) is determined by a bunch of things, but primarily by the
storage device(s) that the checkpoint is being written to.
> For your reference, I am using open mpi 1.3.4 and BLCR 0.8.2 for
It would be interesting to know if you see the same behavior with the
trunk or v1.5 series of Open MPI.
Hope that helps,