Open MPI Development Mailing List Archives

Subject: [OMPI devel] Allgather failures?
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-07-28 21:06:54


Hi folks

I was reviewing the trunk MTT results tonight and found a ton of
failures in the Intel test suite on IU's odin cluster. That cluster
*usually* runs pretty clean, so I took a closer look.

The failures all looked like the following:

  MPITEST_INFO ( 0): Starting test MPI_Allgather()
[odin001:31038] *** Process received signal ***
[odin001:31038] Signal: Floating point exception (8)
[odin001:31038] Signal code: Integer divide-by-zero (1)
[odin001:31038] Failing at address: 0x804c8c9
[odin001:31039] *** Process received signal ***
[odin001:31039] Signal: Floating point exception (8)
[odin001:31039] Signal code: Integer divide-by-zero (1)
[odin001:31039] Failing at address: 0x804c8c9
[odin001:31040] *** Process received signal ***
[odin001:31040] Signal: Floating point exception (8)
[odin001:31040] Signal code: Integer divide-by-zero (1)
[odin001:31040] Failing at address: 0x804c8c9
[odin001:31038] [ 0] [0xffffe600]
[odin001:31038] [ 1] src/MPI_Allgather_f(MAIN__+0x2db) [0x804b30f]
[odin001:31038] [ 2] src/MPI_Allgather_f(main+0x27) [0x805aa57]
[odin001:31038] [ 3] /lib/libc.so.6(__libc_start_main+0xdc) [0xf7c32dec]
[odin001:31038] [ 4] src/MPI_Allgather_f [0x804af81]
[odin001:31038] *** End of error message ***

In other words, every process shown dies with an integer
divide-by-zero, reported as a floating point exception (signal 8), at
the same address during a collective test.
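The "Floating point exception" label can look odd next to a signal code that says *integer* divide-by-zero. On POSIX systems SIGFPE (signal 8 on Linux) is raised for integer as well as floating-point arithmetic faults, and only the si_code field of siginfo_t tells them apart; on Linux FPE_INTDIV has the value 1, which matches the "(1)" in the trace. The sketch below is a hypothetical decoding helper (the names sigfpe_is_integer_fault and describe_sigfpe_code are made up for illustration, not Open MPI's actual signal handler code):

```c
#define _POSIX_C_SOURCE 200809L /* expose the FPE_* si_code constants */
#include <signal.h>

/* Hypothetical helper (not Open MPI's handler): returns 1 if a SIGFPE
 * si_code denotes an integer arithmetic fault (divide-by-zero or
 * overflow), 0 for floating-point or unrecognized codes. */
int sigfpe_is_integer_fault(int si_code)
{
    return si_code == FPE_INTDIV || si_code == FPE_INTOVF;
}

/* Hypothetical helper: map a SIGFPE si_code to a short description,
 * in the spirit of the "Signal code: ..." lines in the trace above. */
const char *describe_sigfpe_code(int si_code)
{
    switch (si_code) {
    case FPE_INTDIV: return "Integer divide-by-zero";
    case FPE_INTOVF: return "Integer overflow";
    case FPE_FLTDIV: return "Floating point divide-by-zero";
    case FPE_FLTOVF: return "Floating point overflow";
    case FPE_FLTUND: return "Floating point underflow";
    case FPE_FLTRES: return "Floating point inexact result";
    case FPE_FLTINV: return "Invalid floating point operation";
    case FPE_FLTSUB: return "Subscript out of range";
    default:         return "Unknown SIGFPE code";
    }
}
```

In a handler installed with sigaction() and SA_SIGINFO, these would be called with info->si_code; keep in mind that printf() is not async-signal-safe, so a real handler would emit the text with write(2) instead.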

Any ideas what might be causing this?

Ralph