Open MPI User's Mailing List Archives

Subject: [OMPI users] Call stack upon MPI routine error
From: Vince Grimes (tom.grimes_at_[hidden])
Date: 2014-03-21 15:50:38


OpenMPI folks:

        I have previously mentioned a problem with an in-house code (ScalIT)
that generates the error message

[[31552,1],84][btl_openib_component.c:3492:handle_wc] from
compute-4-5.local to: compute-4-13 error polling LP CQ with status LOCAL
QP OPERATION ERROR status number 2 for wr_id 246f300 opcode 128 vendor
error 107 qp_idx 0

at a specific, reproducible point. It was suggested that the error could
be due to memory problems, such as the amount of registered memory
available. I have already adjusted the registered-memory limits per the
URLs that were given to me. My question today is twofold:

First, is it possible that ScalIT uses so much memory that there is none
left to register for IB communications? ScalIT is very memory-intensive
and has to run distributed just to fit a large matrix in memory (split
across nodes).
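For reference, here is a minimal sketch (plain C, not ScalIT code) of the
kind of check I mean: printing the locked-memory limit on a node, which
caps how much memory the IB stack can register for a process.

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    /* RLIMIT_MEMLOCK bounds how much memory this process can lock
       (and hence register for IB). */
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("RLIMIT_MEMLOCK: unlimited\n");
    else
        printf("RLIMIT_MEMLOCK: %llu bytes\n",
               (unsigned long long) rl.rlim_cur);
    return 0;
}

(The same limit is visible from the shell as "ulimit -l".)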

Second, is there a way to trap that error so I can see the call stack,
showing the MPI function that was called and exactly where in the code
the error was generated? A sketch of what I have in mind follows.
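Something like this minimal sketch (C, hypothetical; it assumes glibc's
backtrace() and that the failure is actually delivered to an MPI error
handler rather than aborting inside the openib BTL):

#include <execinfo.h>
#include <stdio.h>
#include <mpi.h>

/* Custom error handler: print the MPI error string and the call stack,
   then abort. */
static void errhandler_fn(MPI_Comm *comm, int *errcode, ...)
{
    char msg[MPI_MAX_ERROR_STRING];
    int len;
    void *frames[64];
    int n = backtrace(frames, 64);

    MPI_Error_string(*errcode, msg, &len);
    fprintf(stderr, "MPI error: %s\n", msg);
    backtrace_symbols_fd(frames, n, 2);   /* stack trace to stderr */
    MPI_Abort(*comm, *errcode);
}

int main(int argc, char **argv)
{
    MPI_Errhandler eh;

    MPI_Init(&argc, &argv);
    MPI_Comm_create_errhandler(errhandler_fn, &eh);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, eh);

    /* ... application MPI calls ... */

    MPI_Finalize();
    return 0;
}

(Compiling with -g -rdynamic makes the backtrace symbols readable.)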

-- 
T. Vince Grimes, Ph.D.
CCC System Administrator
Texas Tech University
Dept. of Chemistry and Biochemistry (10A)
Box 41061
Lubbock, TX 79409-1061
(806) 834-0813 (voice);     (806) 742-1289 (fax)