> Hi Brock
>> We have a user whose code keeps failing at a similar point in the
>> code. The errors (below) would make me think it's a fabric problem,
>> but ibcheckerrors is not returning any issues. He is using
>> openmpi-1.2.0 with OFED on RHEL4,
> Strangely enough, I hit this exact problem about half an hour ago...
> what compilers is he using for the code / OpenMPI? I haven't narrowed
> down the cause yet because the system I'm on is a tad, uh, disheveled,
> but it'd be good to find any commonality. I'm using PGI-7.1-2
> (pgf77/pgf90) with OpenMPI-1.2.4. The system also happens to be
> RHEL 4 (Update 3).
We are also running PGI compilers, version 6.2. We have Cisco
(Topspin) IB hardware and are using stock OFED 1.1 with Red Hat.
Is this the same as what you are using?
> .. Also, the code I'm running is CCSM, and it gave an error message
> about being unable to read a file correctly right before my
> synchronization. This code has worked on other systems in the past
> (non-IB, non-IBRIX), but something as basic as a file write
> shouldn't be adversely affected by such things, hence I'm going to
> try backing the compiler down to a 'known-good' one first, since
> perhaps that's my problem. I don't suppose you saw any messages of
> that sort? I did already try raising the retry count parameter to 20
> (from 7), but that didn't fix it.
> - Brian
> Brian Dobbins
> Yale University HPC