On Mar 13, 2012, at 2:54 PM, Joshua Baker-LePain wrote:
> On Tue, 13 Mar 2012 at 7:53pm, Gutierrez, Samuel K wrote
>> The failure signature isn't exactly what we were seeing here at LANL, but there were misplaced memory barriers in Open MPI 1.4.3. Ticket 2619 talks about this issue (https://svn.open-mpi.org/trac/ompi/ticket/2619). This doesn't explain, however, the failures that you are experiencing within Open MPI 1.5.4. Can you give 1.4.4 a whirl and see if this fixes the issue?
> Would it be best to use 1.4.4 specifically, or simply the most recent 1.4.x (which appears to be 1.4.5 at this point)?
Good point - please do use Open MPI 1.4.5.
>> Any more information surrounding your failures in 1.5.4 is greatly appreciated.
> I'm happy to provide, but what exactly are you looking for? The test code I'm running is *very* simple:
If you experience this type of failure with 1.4.5, can you send another backtrace? We'll go from there.
Another question. How reproducible is this on your system?
> #include <stdio.h>
> #include <mpi.h>
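> int main(int argc, char **argv)
> {
>     int node;
>     long long i;  /* an int cannot count up to 1000000000000 */
>     int j;
>     float f;
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &node);
>     printf("Hello World from Node %d.\n", node);
>     for (i = 0; i <= 1000000000000LL; i++) {
>         /* loop body not shown in the original message */
>     }
>     MPI_Finalize();
>     return 0;
> }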
> And my environment is a pretty standard CentOS-6.2 install.
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> users mailing list