
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] openib issues
From: Mike Dubman (mike.ompi_at_[hidden])
Date: 2010-08-10 07:19:45


Hey Eloi,

What HCA card do you have? Can you post code or instructions on how to reproduce
it?
Thanks,
Mike

On Mon, Aug 9, 2010 at 5:22 PM, Eloi Gaudry <eg_at_[hidden]> wrote:

> Hi,
>
> Could someone have a look at these two error messages? I'd like
> to know why they were displayed and what they actually mean.
>
> Thanks,
> Eloi
>
> On Monday 19 July 2010 16:38:57 Eloi Gaudry wrote:
> > Hi,
> >
> > I've been working on a random segmentation fault that seems to occur
> > during a collective communication when using the openib btl (see
> > [OMPI users] [openib] segfault when using openib btl).
> >
> > During my tests, I've come across several different errors reported by
> > Open MPI 1.4.2:
> >
> > 1/
> > [[12770,1],0][btl_openib_component.c:3227:handle_wc] from bn0103 to: bn0122
> > error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for
> > wr_id 560618664 opcode 1 vendor error 105 qp_idx 3
> >
> > 2/
> > [[992,1],6][btl_openib_component.c:3227:handle_wc] from pbn04 to: pbn05
> > error polling LP CQ with status REMOTE ACCESS ERROR status number 10 for
> > wr_id 162858496 opcode 1 vendor error 136 qp_idx 0
> > [[992,1],5][btl_openib_component.c:3227:handle_wc] from pbn05 to: pbn04
> > error polling HP CQ with status WORK REQUEST FLUSHED ERROR status number 5
> > for wr_id 485900928 opcode 0 vendor error 249 qp_idx 0
> >
> >
> > --------------------------------------------------------------------------
> > The OpenFabrics stack has reported a network error event. Open MPI will
> > try to continue, but your job may end up failing.
> >
> > Local host: p'"
> > MPI process PID: 20743
> > Error number: 3 (IBV_EVENT_QP_ACCESS_ERR)
> >
> > This error may indicate connectivity problems within the fabric; please
> > contact your system administrator.
> >
> > --------------------------------------------------------------------------
> >
> > I'd like to know what these two errors mean and where they come from.
> >
> > Thanks for your help,
> > Eloi
>
> --
>
>
> Eloi Gaudry
>
> Free Field Technologies
> Company Website: http://www.fft.be
> Company Phone: +32 10 487 959
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
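The status numbers in the quoted messages are values of libibverbs' `enum ibv_wc_status`: 1 is `IBV_WC_LOC_LEN_ERR`, 5 is `IBV_WC_WR_FLUSH_ERR` and 10 is `IBV_WC_REM_ACCESS_ERR`. The short Python sketch below extracts those fields from an Open MPI `handle_wc` error line so the failures can be compared across runs. The regular expression, the `decode_wc_error` helper and the returned dictionary keys are hypothetical and were written only against the messages quoted above; other Open MPI versions may format this line differently. The `ibv_poll_cq` documentation states that for a failed completion only `wr_id`, `status`, `qp_num` and `vendor_err` are guaranteed valid, so the sketch does not decode `opcode`. The vendor error is vendor-specific, so interpreting it depends on the HCA model.

```python
import re

# Subset of enum ibv_wc_status from libibverbs <infiniband/verbs.h>;
# only the codes that appear in this thread are listed.
IBV_WC_STATUS = {
    1: "IBV_WC_LOC_LEN_ERR",      # printed by Open MPI as LOCAL LENGTH ERROR
    5: "IBV_WC_WR_FLUSH_ERR",     # printed as WORK REQUEST FLUSHED ERROR
    10: "IBV_WC_REM_ACCESS_ERR",  # printed as REMOTE ACCESS ERROR
}

# Hypothetical pattern, matched against the quoted 1.4.2 messages only.
WC_ERROR_RE = re.compile(
    r"error polling (?P<cq>[LH]P) CQ with status (?P<text>[A-Z ]+?) "
    r"status number (?P<status>\d+) for wr_id (?P<wr_id>\d+) "
    r"opcode (?P<opcode>\d+) vendor error (?P<vendor>\d+) "
    r"qp_idx (?P<qp_idx>\d+)"
)


def decode_wc_error(line):
    """Return the fields of one handle_wc error line, or None if it does not match."""
    m = WC_ERROR_RE.search(line)
    if m is None:
        return None
    status = int(m.group("status"))
    # opcode is matched but not returned: it is not guaranteed valid
    # when the completion status is an error.
    return {
        "cq": m.group("cq"),
        "status": status,
        "status_text": m.group("text"),
        "status_name": IBV_WC_STATUS.get(status, "unknown"),
        "wr_id": int(m.group("wr_id")),
        "vendor_error": int(m.group("vendor")),  # HCA-specific syndrome
        "qp_idx": int(m.group("qp_idx")),
    }


if __name__ == "__main__":
    # First message from the thread, with the wrapped lines joined.
    sample = (
        "[[12770,1],0][btl_openib_component.c:3227:handle_wc] from bn0103 "
        "to: bn0122 error polling LP CQ with status LOCAL LENGTH ERROR "
        "status number 1 for wr_id 560618664 opcode 1 vendor error 105 qp_idx 3"
    )
    print(decode_wc_error(sample))
```

The pattern expects each message on a single line, so output that has been wrapped by a mail client (as in the quotes above) needs its lines joined first.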