Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Allgather failures?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-07-29 00:26:58


These are the MPI_COMPLEX failures that I reported to George last week.

On Jul 28, 2009, at 8:06 PM, Ralph Castain wrote:

> Hi folks
>
> I was reviewing the trunk MTT results tonight and found a ton of
> failures in the Intel test suite on IU's odin cluster. That cluster
> -usually- runs pretty clean, so I took a closer look.
>
> What I found was that the errors were all typified by the following:
>
> MPITEST_INFO ( 0): Starting test MPI_Allgather()
> [odin001:31038] *** Process received signal ***
> [odin001:31038] Signal: Floating point exception (8)
> [odin001:31038] Signal code: Integer divide-by-zero (1)
> [odin001:31038] Failing at address: 0x804c8c9
> [odin001:31039] *** Process received signal ***
> [odin001:31039] Signal: Floating point exception (8)
> [odin001:31039] Signal code: Integer divide-by-zero (1)
> [odin001:31039] Failing at address: 0x804c8c9
> [odin001:31040] *** Process received signal ***
> [odin001:31040] Signal: Floating point exception (8)
> [odin001:31040] Signal code: Integer divide-by-zero (1)
> [odin001:31040] Failing at address: 0x804c8c9
> [odin001:31038] [ 0] [0xffffe600]
> [odin001:31038] [ 1] src/MPI_Allgather_f(MAIN__+0x2db) [0x804b30f]
> [odin001:31038] [ 2] src/MPI_Allgather_f(main+0x27) [0x805aa57]
> [odin001:31038] [ 3] /lib/libc.so.6(__libc_start_main+0xdc) [0xf7c32dec]
> [odin001:31038] [ 4] src/MPI_Allgather_f [0x804af81]
> [odin001:31038] *** End of error message ***
>
>
> In other words, an integer divide-by-zero (delivered as a floating
> point exception, SIGFPE) in a collective test.
>
> Any ideas what might be causing this?
>
> Ralph

-- 
Jeff Squyres
jsquyres_at_[hidden]
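
For readers decoding the quoted trace: on Linux, signal 8 is SIGFPE and signal code 1 is FPE_INTDIV, which is why an integer division by zero shows up under the name "Floating point exception". The sketch below maps a SIGFPE si_code to a label in the style of the trace. The helper names fpe_code_name and describe_signal are hypothetical and are not part of Open MPI or the Intel test suite. The label strings are illustrative and are not guaranteed to match Open MPI's own output for every code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Hypothetical helper: return a readable label for a SIGFPE si_code.
 * The FPE_* constants are the POSIX names declared in <signal.h>. */
static const char *fpe_code_name(int code)
{
    switch (code) {
    case FPE_INTDIV: return "Integer divide-by-zero";
    case FPE_INTOVF: return "Integer overflow";
    case FPE_FLTDIV: return "Floating point divide-by-zero";
    case FPE_FLTOVF: return "Floating point overflow";
    case FPE_FLTUND: return "Floating point underflow";
    case FPE_FLTRES: return "Floating point inexact result";
    case FPE_FLTINV: return "Invalid floating point operation";
    case FPE_FLTSUB: return "Subscript out of range";
    default:         return "Unknown SIGFPE code";
    }
}

/* Hypothetical helper: format a signal number and code as one line.
 * Returns the value of snprintf (characters that would be written). */
static int describe_signal(int signo, int code, char *buf, size_t len)
{
    if (signo != SIGFPE) {
        /* Only SIGFPE codes are decoded in this sketch. */
        return snprintf(buf, len, "signal %d, code %d", signo, code);
    }
    return snprintf(buf, len, "SIGFPE (%d): %s (%d)",
                    signo, fpe_code_name(code), code);
}
```

Calling describe_signal(SIGFPE, FPE_INTDIV, buf, sizeof buf) fills buf with a line naming the integer divide-by-zero case. The same pair of values could be read from siginfo_t (si_signo and si_code) inside a handler installed with sigaction and SA_SIGINFO.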