Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenIB problems
From: Brian Dobbins (brian.dobbins_at_[hidden])
Date: 2007-11-21 15:23:29

Hi Brock,
> We have a user whose code keeps failing at a similar point in the
> code. The errors (below) would make me think it's a fabric problem,
> but ibcheckerrors is not returning any issues. He is using
> openmpi-1.2.0 with OFED on RHEL4,
  Strangely enough, I hit this exact problem about half an hour ago...
What compilers is he using for the code / OpenMPI? I haven't narrowed
down the cause yet because the system I'm on is a tad, uh, disheveled,
but it'd be good to find any commonality. I'm using PGI-7.1-2
(pgf77/pgf90) with OpenMPI-1.2.4. The system also happens to be RHEL 4
(Update 3).

  Also, the code I'm running is CCSM, and it gave an error message
about being unable to read a file correctly right before my
synchronization. This code has worked on other systems in the past
(non-IB, non-IBRIX), but something as basic as a file write shouldn't be
adversely affected by such things, hence I'm going to try backing the
compiler down to a 'known-good' one first, since perhaps that's my
problem. I don't suppose you saw any messages of that sort? I did
already try setting the retry count parameter up to 20 (from 7), but
that didn't fix it.

  - Brian

Brian Dobbins
Yale University HPC