Open MPI User's Mailing List Archives

Subject: [OMPI users] Calculation stuck in MPI
From: Ondrej Marsalek (ondrej.marsalek_at_[hidden])
Date: 2009-03-03 05:22:06


Dear everyone,

I have a calculation (the CP2K program) running with MPI over InfiniBand,
and it is stuck. All processes (16 on 4 nodes) are still running, each
taking 100% CPU. Attaching a debugger reveals the following (only the
innermost frames of the stack are shown):

(gdb) backtrace
#0 0x00002b3460916dbf in btl_openib_component_progress () from
/home/marsalek/opt/openmpi-1.3-intel/lib/openmpi/mca_btl_openib.so
#1 0x00002b345c22c778 in opal_progress () from
/home/marsalek/opt/openmpi-1.3-intel/lib/libopen-pal.so.0
#2 0x00002b345bd2d66d in ompi_request_default_wait_any () from
/home/marsalek/opt/openmpi-1.3-intel/lib/libmpi.so.0
#3 0x00002b345bd6021a in PMPI_Waitany () from
/home/marsalek/opt/openmpi-1.3-intel/lib/libmpi.so.0
#4 0x00002b345bae77f1 in pmpi_waitany__ () from
/home/marsalek/opt/openmpi-1.3-intel/lib/libmpi_f77.so.0

The stuck job has survived a restart of the IB switch, unlike "healthy"
runs. My question is: is it obvious at which level the problem lies?
IB, Open MPI, or the application? I would be glad to provide detailed
information if anyone is willing to help. I want to work on this, but
unfortunately I am not sure where to begin.

Best regards,
Ondrej Marsalek