Yeah bummers, but something tells me it might not be OpenMPI's fault. Here's why:

1- The tech who takes care of these machines told me that he gets RTC errors on bootup (the CPU boards are apparently "out of sync" since the clocks aren't set correctly).

2- There is also a possibility that the prior admin did not put in a "stable" firmware version.

So if any Sun guru can help out by telling me which command to use, or by pointing me to a quick HOWTO for resolving these clock issues, it would be greatly appreciated (our analyst is overloaded and he would not be able to justify the 3 days of reading up on docs just to solve my problems running parallel code ;P)

3- I realised that the OS is not booted in 64-bit mode O_o!! (not that this has anything to do with OpenMPI bombing):

Jun 21 07:45:15 unknown genunix: [ID 540533 kern.notice] ^MSunOS Release 5.8 Version Generic_108528-29 32-bit

Jun 21 07:45:15 unknown NOTICE: 64-bit OS installed, but the 32-bit OS is the default

Jun 21 07:45:15 unknown Booting the 32-bit OS ...
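
In case anyone wants to check their own boxes the same way, here is a rough Python sketch that scans syslog lines like the ones above and reports which kernel was booted. The function name `booted_bitness` and the regex are my own assumptions, based only on the three lines shown here; other Solaris releases may format the banner differently.

```python
import re

# Matches the kernel banner, e.g.
#   "SunOS Release 5.8 Version Generic_108528-29 32-bit"
# The pattern is an assumption derived from the log excerpt above.
BANNER = re.compile(r"SunOS Release (\S+) Version (\S+) (32|64)-bit")


def booted_bitness(lines):
    """Return (release, version, bits) from the last kernel banner, or None."""
    result = None
    for line in lines:
        match = BANNER.search(line)
        if match:
            result = (match.group(1), match.group(2), int(match.group(3)))
    return result


sample = [
    "Jun 21 07:45:15 unknown genunix: [ID 540533 kern.notice] "
    "^MSunOS Release 5.8 Version Generic_108528-29 32-bit",
    "Jun 21 07:45:15 unknown NOTICE: 64-bit OS installed, "
    "but the 32-bit OS is the default",
    "Jun 21 07:45:15 unknown Booting the 32-bit OS ...",
]

info = booted_bitness(sample)
if info is not None:
    release, version, bits = info
    print(f"SunOS {release} ({version}) booted as {bits}-bit")
```

Only the banner line is matched; the NOTICE line is ignored on purpose, since it describes what is installed rather than what is running.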

4- LAM-MPI 7.1.1 also bombs, but it does so at a much higher processor count (OpenMPI bombs at 5, LAM-MPI bombs around 10, but it varies).

As for the questions regarding the OpenMPI build, I just recently built 1.1 with the same basic configure options and got the exact same results (clean cache).

So, I guess this one is on pause until I have confirmation that the clocks on the processor boards are set correctly. There is one thing that bothers me though: one of the machines has only 1 processor board (4 procs) and I still get the error on that machine if I go over 4. Can a board be out of sync with itself??


PS: I am at liberty to provide the source code if anyone wants it.

