
Open MPI User's Mailing List Archives


Subject: [OMPI users] Call stack upon MPI routine error
From: Vince Grimes (tom.grimes_at_[hidden])
Date: 2014-03-21 15:50:38

Open MPI folks:

        I have previously mentioned a problem with an in-house code (ScalIT)
that generates the error message

[[31552,1],84][btl_openib_component.c:3492:handle_wc] from
compute-4-5.local to: compute-4-13 error polling LP CQ with status LOCAL
QP OPERATION ERROR status number 2 for wr_id 246f300 opcode 128 vendor
error 107 qp_idx 0

at a specific, reproducible point. It was suggested that the error could
be due to memory problems, such as insufficient registered memory. I
have already adjusted the registered-memory settings per the URLs that
were given to me. My question today is twofold:

First, is it possible that ScalIT uses so much memory that there is none
left to register for IB communications? ScalIT is very memory-intensive
and has to run distributed just to fit a large matrix in memory (split
across nodes).

Second, is there a way to trap that error so I can see the call stack,
showing the MPI function called and exactly where in the code the error
was generated?
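
For the second question, one approach (a minimal sketch, not part of ScalIT
or Open MPI) is to replace the default MPI_ERRORS_ARE_FATAL handler on the
communicator with a custom error handler that prints a glibc backtrace
before aborting. The handler name (print_backtrace_on_error) is illustrative.
Note that this only catches errors reported back through MPI calls; a
low-level openib completion-queue error like the one above may be raised
asynchronously inside the BTL and might never reach the handler. Compiling
with -g -rdynamic helps backtrace_symbols_fd resolve function names.

/* Sketch: custom MPI error handler that dumps a backtrace (glibc/Linux). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <execinfo.h>

static void print_backtrace_on_error(MPI_Comm *comm, int *errcode, ...)
{
    char msg[MPI_MAX_ERROR_STRING];
    int len = 0;
    void *frames[64];
    int nframes;

    /* Translate the MPI error code into a readable string. */
    MPI_Error_string(*errcode, msg, &len);
    fprintf(stderr, "MPI error: %s\n", msg);

    /* Print the call stack of the failing MPI call. */
    nframes = backtrace(frames, 64);
    backtrace_symbols_fd(frames, nframes, fileno(stderr));

    MPI_Abort(*comm, *errcode);
}

int main(int argc, char **argv)
{
    MPI_Errhandler eh;

    MPI_Init(&argc, &argv);

    /* Install the handler so it runs instead of the default fatal abort. */
    MPI_Comm_create_errhandler(print_backtrace_on_error, &eh);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, eh);

    /* ... application code (e.g., the ScalIT communication calls) ... */

    MPI_Errhandler_free(&eh);
    MPI_Finalize();
    return 0;
}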

T. Vince Grimes, Ph.D.
CCC System Administrator
Texas Tech University
Dept. of Chemistry and Biochemistry (10A)
Box 41061
Lubbock, TX 79409-1061
(806) 834-0813 (voice);     (806) 742-1289 (fax)