Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Calculation stuck in MPI
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-03-04 09:39:54


No, it is not obvious, unfortunately. Can you send all the
information listed here:

     http://www.open-mpi.org/community/help/

On Mar 3, 2009, at 5:22 AM, Ondrej Marsalek wrote:

> Dear everyone,
>
> I have a calculation (the CP2K program) using MPI over Infiniband and
> it is stuck. All processes (16 on 4 nodes) are running, taking 100%
> CPU. Attaching a debugger reveals this (only the end of the stack
> shown here):
>
> (gdb) backtrace
> #0 0x00002b3460916dbf in btl_openib_component_progress () from
> /home/marsalek/opt/openmpi-1.3-intel/lib/openmpi/mca_btl_openib.so
> #1 0x00002b345c22c778 in opal_progress () from
> /home/marsalek/opt/openmpi-1.3-intel/lib/libopen-pal.so.0
> #2 0x00002b345bd2d66d in ompi_request_default_wait_any () from
> /home/marsalek/opt/openmpi-1.3-intel/lib/libmpi.so.0
> #3 0x00002b345bd6021a in PMPI_Waitany () from
> /home/marsalek/opt/openmpi-1.3-intel/lib/libmpi.so.0
> #4 0x00002b345bae77f1 in pmpi_waitany__ () from
> /home/marsalek/opt/openmpi-1.3-intel/lib/libmpi_f77.so.0
>
> It has survived a restart of the IB switch, unlike "healthy" runs. My
> question is - is it obvious at what level the problem is? IB, Open
> MPI, application? I would be glad to provide detailed information, if
> anyone was willing to help. I want to work on this, but unfortunately
> I am not sure where to begin.
>
> Best regards,
> Ondrej Marsalek

-- 
Jeff Squyres
Cisco Systems