Eugene Loh wrote:
> Ralph Castain wrote:
>> Hi Bryan
>> I have seen similar issues on LANL clusters when message sizes were
>> fairly large. How big are your buffers when you call Allreduce? Can
>> you send us your Allreduce call params (e.g., the reduce operation,
>> datatype, num elements)?
>> If you don't want to send that to the list, you can send it to me at
> I haven't seen any updates on this. Please tell me Bryan sent info to
> Ralph at LANL and Ralph nailed this one. Please! :^)
Ralph and I took this offline.
So far I've been unable to reproduce the problem on a Roadrunner node
(4 x86_64 cores, Open MPI 1.3.2, sm for transport). That Open MPI was
built with some special platform files rather than a plain configure
run. Ralph sent me the platform files, and I'm about to build my own
version on the small 8-core machine where the problem first showed up.
I'll report more as soon as I know more. Hopefully in the morning.
Bryan Lally, lally_at_[hidden]
Los Alamos National Laboratory
Los Alamos, New Mexico