
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] possible bug in 1.3.2 sm transport
From: Bryan Lally (lally_at_[hidden])
Date: 2009-05-18 22:42:26


Eugene Loh wrote:
> Ralph Castain wrote:
>
>> Hi Bryan
>>
>> I have seen similar issues on LANL clusters when message sizes were
>> fairly large. How big are your buffers when you call Allreduce? Can
>> you send us your Allreduce call params (e.g., the reduce operation,
>> datatype, num elements)?
>>
>> If you don't want to send that to the list, you can send it to me at
>> LANL.
>
> I haven't seen any updates on this. Please tell me Bryan sent info to
> Ralph at LANL and Ralph nailed this one. Please! :^)

Ralph and I took this off line.

So far I've been unable to reproduce the problem on a node of Roadrunner,
which has 4 x86_64 cores and runs Open MPI 1.3.2 with sm for transport.
That Open MPI was built with special platform files rather than with a
plain configure run. Ralph sent me the platform files, and I'm about to
build my own version on the small 8-core machine where the problem first
showed up.

I'll report more as soon as I know more. Hopefully in the morning.

        - Bryan

-- 
Bryan Lally, lally_at_[hidden]
505.667.9954
CCS-2
Los Alamos National Laboratory
Los Alamos, New Mexico