Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-01-19 20:15:46


On Jan 19, 2007, at 6:19 PM, Arif Ali wrote:

> > [0,1,59][btl_openib_component.c:1153:btl_openib_component_progress] from
> > node16 to: node02 error polling HP CQ with status REMOTE ACCESS ERROR
> > status number 10 for wr_id 268919352 opcode 256614836
> > mpirun noticed that job rank 0 with PID 0 on node node02 exited on
> > signal 15 (Terminated).
> > 55 additional processes aborted (not shown)
> does this happen with btl_openib_flags=1? Does this also happen without
> this setting. This doesn't happen with OpenMPI-1.2b3 right?
>
>
> That's correct. I tried all the flags that were suggested, and a few
> more, which I listed in previous mails

I can parse your text either way, so forgive me for belaboring the
point:

- Does this happen with btl_openib_flags=1 on the nightly snapshot of
OMPI v1.2?
- Does this happen without setting btl_openib_flags on the nightly
snapshot of OMPI v1.2?
- What is the exact version of the nightly snapshot for OMPI v1.2
that you are using?
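
To make the three cases above unambiguous, here is a minimal sketch of how
they could be checked from a shell. The application path and process count
are hypothetical placeholders, and the function names are invented for
illustration; only mpirun's --mca option, the btl_openib_flags parameter,
and ompi_info come from Open MPI itself.

```shell
# Hypothetical placeholders: substitute your own binary and process count.
APP=./my_mpi_app
NP=4

# Case 1: run with btl_openib_flags explicitly set to 1.
run_with_flags() { mpirun -np "$NP" --mca btl_openib_flags 1 "$APP"; }

# Case 2: run without setting btl_openib_flags at all (built-in default).
run_default() { mpirun -np "$NP" "$APP"; }

# Case 3: print the version lines of the Open MPI installation found in PATH.
show_version() { ompi_info | grep "Open MPI"; }
```

Calling show_version first, then run_with_flags and run_default in turn,
gives one answer per question, all against the same install.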

> Yes, correct, this doesn't happen with 1.2b3

Good to know.

Were you able to experiment with the various MCA parameters that I
described in the other mail to see if such problems went away?
(i.e., ensure that you're not running out of DMA-able memory)
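
The specific parameters are in the other mail and are not repeated here.
As a rough sketch of the mechanics only: ompi_info can list the openib
BTL's MCA parameters with their current values, and any one of them can be
overridden per run with mpirun's --mca option. The helper names, the
application path, and the process count below are hypothetical.

```shell
# Hypothetical placeholders: substitute your own binary and process count.
APP=./my_mpi_app
NP=4

# List the MCA parameters of the openib BTL along with their current values.
list_openib_params() { ompi_info --param btl openib; }

# Run once with a single MCA parameter overridden.
# Usage: run_with_param <parameter-name> <value>
run_with_param() { mpirun -np "$NP" --mca "$1" "$2" "$APP"; }
```

Changing one parameter per run makes it easier to tell which setting, if
any, makes the error go away.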

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems