
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] 32-bit openib is broken on the trunk as of Nov 27th, r16799
From: Gleb Natapov (glebn_at_[hidden])
Date: 2007-12-09 05:02:33


On Wed, Dec 05, 2007 at 02:45:17PM -0500, Tim Mattox wrote:
> Hello,
> It appears that sometime after r16777, and by r16799, something
> was broken in the trunk's openib support for 32-bit builds.
> The 64-bit tests all seem normal, as well as the 32-bit & 64-bit tests on
> the 1.2 branch on the same machine (odin).
>
> See this MTT results page permalink showing the 32-bit odin runs:
> http://www.open-mpi.org/mtt/index.php?do_redir=468
>
> Pasha & Gleb, you both did a variety of checkins in that svn r# range.
> Do either of you have time to investigate this?
>
> Here is a snippet from one randomly picked failed test (out of thousands):
> [1,1][btl_openib_component.c:1665:btl_openib_module_progress] from
> odin001 to: odin001 error
> polling LP CQ with status LOCAL PROTOCOL ERROR status number 4 for
> wr_id 141733120 opcode 128
> qp_idx 3
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 29761 on
> node odin001 calling "abort". This will have caused other processes
> in the application to be terminated by signals sent by mpirun
> (as reported here).
> --------------------------------------------------------------------------
>
> Thanks, and happy bug hunting!
I know where the problem is. Will fix this week.

--
			Gleb.