Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] 32-bit openib is broken on the trunk as of Nov 27th, r16799
From: Gleb Natapov (glebn_at_[hidden])
Date: 2007-12-09 05:02:33


On Wed, Dec 05, 2007 at 02:45:17PM -0500, Tim Mattox wrote:
> Hello,
> It appears that sometime after r16777, and by r16799, that something
> was broken on the trunk's openib support for 32-bit builds.
> The 64-bit tests all seem normal, as well as the 32-bit & 64-bit tests on
> the 1.2 branch on the same machine (odin).
>
> See this MTT results page permalink showing the 32-bit odin runs:
> http://www.open-mpi.org/mtt/index.php?do_redir=468
>
> Pasha & Gleb, you both did a variety of checkins in that svn r# range.
> Do either of you have time to investigate this?
>
> Here is a snippet from one randomly picked failed test (out of thousands):
> [1,1][btl_openib_component.c:1665:btl_openib_module_progress] from
> odin001 to: odin001 error
> polling LP CQ with status LOCAL PROTOCOL ERROR status number 4 for
> wr_id 141733120 opcode 128
> qp_idx 3
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 29761 on
> node odin001 calling "abort". This will have caused other processes
> in the application to be terminated by signals sent by mpirun
> (as reported here).
> --------------------------------------------------------------------------
>
> Thanks, and happy bug hunting!
I know where the problem is. Will fix this week.

--
			Gleb.