Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] CP2K mpi hang
From: Ashley Pittman (ashley_at_[hidden])
Date: 2009-05-19 09:32:30

On Mon, 2009-05-18 at 17:05 -0400, Noam Bernstein wrote:
> The code is complicated, the input files are big and lead to long
> computation times, so I don't think I'll be able to make a simple test
> case. Instead I attached to the hanging processes (all 8 of them) with
> gdb during the hang. The stack trace is below. Nodes seem to spend most
> of their time in the btl_openib_component_progress(), and occasionally
> in mca_pml_ob1_progress(). I.e. not completely stuck, but not making
> progress.

Can you confirm that *all* processes are in PMPI_Allreduce at some
point? The collectives commonly get blamed for a lot of hangs, and they
are not always the correct place to look.
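
One way to check this without reading eight stack traces by eye is to
group the ranks by the innermost MPI call in each backtrace. The
following is only a rough sketch: it assumes you have saved the gdb
"bt" output of each process as text, and the helper names and the
frame-matching regex are my own invention, not part of gdb or Open MPI.

```python
import re
from collections import defaultdict

# Matches the start of a gdb frame line, with or without an address, e.g.
#   "#2  0x00002b1c in PMPI_Allreduce () from libmpi.so"
#   "#5  main () at prog.c:12"
# and captures the function name. Frames shown as "??" do not match.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)")


def mpi_calls_in_backtrace(backtrace):
    """Return the MPI_/PMPI_ function names found in one backtrace, innermost first."""
    calls = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(1).startswith(("MPI_", "PMPI_")):
            calls.append(match.group(1))
    return calls


def group_ranks_by_mpi_call(backtraces):
    """Map the innermost MPI call of each rank to the sorted list of ranks in it.

    `backtraces` is a dict of {rank: backtrace text}. Ranks whose backtrace
    shows no MPI_/PMPI_ frame are grouped under "(no MPI frame)".
    """
    groups = defaultdict(list)
    for rank, text in backtraces.items():
        calls = mpi_calls_in_backtrace(text)
        key = calls[0] if calls else "(no MPI frame)"
        groups[key].append(rank)
    return {call: sorted(ranks) for call, ranks in groups.items()}


if __name__ == "__main__":
    # Hypothetical, shortened backtraces for illustration only.
    sample = {
        0: "#0  0x01 in btl_openib_component_progress ()\n"
           "#1  0x02 in opal_progress ()\n"
           "#2  0x03 in PMPI_Allreduce ()",
        1: "#0  0x01 in mca_pml_ob1_progress ()\n"
           "#1  0x02 in opal_progress ()\n"
           "#2  0x03 in PMPI_Recv ()",
    }
    for call, ranks in group_ranks_by_mpi_call(sample).items():
        print(call, ranks)
```

If every rank ends up under the same collective, the collective itself
is the place to look. If one or more ranks show up somewhere else (a
point-to-point call, or no MPI frame at all), the others are most
likely just waiting for them.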

> P.S. I get a similar hang with MVAPICH, in a nearby but different part
> of the code (on an MPI_Bcast, specifically), increasing my tendency to
> believe that it's OFED's fault. But maybe the stack trace will suggest
> to someone where it might be stuck, and therefore perhaps an mca flag
> to try?

This strikes me as a filesystem problem more than MPI per se. Again,
with MVAPICH, are all your processes in MPI_Bcast or just some of them?