I know a few national labs that run OMPI with Fedora 9, but not on Nehalem hardware, and they use gcc 4.3.x.
However, I think the key issue really is the compiler. I have seen similar problems on multiple platforms and OSes whenever I use GCC 4.4.x. I -think- it has to do with the automatic vectorization in that compiler, but I can't swear to it.
You can always install a personal copy of gcc for your own use on the system and see if that solves the problem. Just download a version like 4.3.x from the GNU site.
I haven't seen this problem with 4.3.x, though again I haven't tried it on Nehalem.
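If you do install a personal gcc, it is easy to end up still building with the system one. Here is a minimal sketch, assuming Python 3.7+ and that both `gcc` and Open MPI's `mpicc` wrapper are on your PATH, that reports which compiler version each one resolves to and flags the 4.4.x series. The helper names (`parse_gcc_version`, `is_suspect_series`, `report`) are hypothetical, and flagging 4.4.x only reflects the unconfirmed suspicion above, not a known bug.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_gcc_version(text: str) -> Optional[Tuple[int, ...]]:
    """Parse `-dumpversion` output such as "4.4.3" (newer GCCs may print just "7")."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_suspect_series(version: Tuple[int, ...]) -> bool:
    """True for GCC 4.4.x, the series suspected (not proven) in this thread."""
    return version[:2] == (4, 4)


def report(compiler: str) -> str:
    """Describe which executable `compiler` resolves to and its version."""
    path = shutil.which(compiler)
    if path is None:
        return f"{compiler}: not found on PATH"
    # Open MPI's wrapper compilers pass options they don't recognize,
    # such as -dumpversion, through to the underlying compiler.
    result = subprocess.run([path, "-dumpversion"], capture_output=True, text=True)
    version = parse_gcc_version(result.stdout)
    if version is None:
        return f"{compiler} ({path}): could not parse {result.stdout!r}"
    label = ".".join(str(part) for part in version)
    note = "  <- 4.4.x series (suspect)" if is_suspect_series(version) else ""
    return f"{compiler} ({path}): {label}{note}"


if __name__ == "__main__":
    for name in ("gcc", "mpicc"):
        print(report(name))
```

Checking `mpicc` as well as `gcc` matters because Open MPI has to be rebuilt with the new compiler too; otherwise the wrapper keeps invoking whatever compiler OMPI was configured with.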
On May 6, 2010, at 12:10 PM, Gus Correa wrote:
> Hi Jeff
> Thank you for your testimony.
> So now I have two important data points (you and Douglas Guptill)
> to support the argument here that installing Fedora
> on machines meant to do scientific and parallel computation
> is asking for trouble.
> I use CentOS in our cluster, but this is a standalone machine
> I don't have control of.
> Anybody out there using Open MPI + Fedora Core + Nehalem ?
> Gus Correa
> Jeff Squyres wrote:
>> On May 6, 2010, at 1:11 PM, Gus Correa wrote:
>>> Just for the record, I am using:
>>> Open MPI 1.4.2 (released 2 days ago), gcc 4.4.3 (g++, gfortran).
>>> All on Fedora Core 12, kernel 188.8.131.52-99.fc12.x86_64 #1 SMP.
>> Someone asked earlier in this thread -- I've used RHEL4.4 and RHEL5.4 on my Nehalem EP boxen. I used the default gcc on those RHELs for compiling everything (OMPI + apps). I don't remember what it was on RHEL 4.4, but on RHEL 5.4, it's GCC 4.1.2.
>>> You and Jeff reported that your
>>> Nehalems get along with Open MPI.
>>> I would guess other people have functional Open MPI + Nehalem systems.
>>> All I can think of is that some mess with the OS/gcc is causing
>>> the trouble here.
>> I don't have much experience with kernels outside of the RHEL kernels,
>> so I don't know if 2.6.32 is problematic or not. :-(