Open MPI Development Mailing List Archives

From: Matt Leininger (mlleinin_at_[hidden])
Date: 2006-11-27 18:54:10


On Mon, 2006-11-27 at 16:29 -0700, Brian W Barrett wrote:
> On Nov 27, 2006, at 4:19 PM, Matt Leininger wrote:
>
> > I've been running more tests of OpenMPI v1.2b. I've run into several
> > cases where the app+MPI use too much memory and the OOM handler kills
> > off tasks. Sometimes the ompi mpirun shuts down gracefully, but other
> > times the OOM handler may kill off 1 to 4 MPI tasks per node (when I'm
> > using 8 MPI tasks per node). The remaining MPI tasks keep
> > running/polling and have to be killed off by hand. Has anyone seen
> > this behavior before?
>
> Are the orteds also getting killed?

  Not sure. I'll check the next time I see this.

> It's a known problem that if the
> orted is killed by outside forces, everything kind of hangs. We're
> working on this one, and hope to have it fixed by the time 1.2 ships.

  That could be the problem.

>
> I'm not really familiar with the OOM killer -- does it cause the
> parent of the killed process to get a SIGCHLD? If not, that could be
> a fairly serious problem for us, as we rely on SIGCHLDs being
> received by the orteds when things die...

  Mark Grondona could answer this. His reply to devel-core bounced so
I'm including devel_at_[hidden] on this thread.

  - Matt

>
> Brian