
Open MPI Development Mailing List Archives


Subject: [OMPI devel] 1.7.5 and trunk failures
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-03-18 23:22:02


Just to be safe, I blew away my existing installations and got completely fresh checkouts. I did a vanilla configure; the only options besides the prefix were --enable-orterun-prefix-by-default and --enable-mpi-java (so I can test the Java bindings).
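For reference, the build described above would look roughly like the following. This is a hypothetical reconstruction, not the exact commands used; the install prefix and -j value are placeholders:

```shell
# Hypothetical sketch of the vanilla build described above.
# The prefix path is a placeholder; only the two --enable flags are from the report.
./configure --prefix=$HOME/ompi-install \
            --enable-orterun-prefix-by-default \
            --enable-mpi-java
make -j8 all
make install
```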

For 1.7.5, running the IBM test suite on my 2-node cluster with --map-by node, I get the following failures:

*** WARNING: Test: ineighbor_allgatherv, np=2, variant=1: FAILED
*** WARNING: Test: neighbor_allgatherv, np=2, variant=1: FAILED
*** WARNING: Test: ineighbor_alltoallv, np=2, variant=1: FAILED
*** WARNING: Test: ineighbor_alltoall, np=2, variant=1: FAILED
*** WARNING: Test: neighbor_alltoallw, np=2, variant=1: FAILED
*** WARNING: Test: neighbor_alltoallv, np=2, variant=1: FAILED
*** WARNING: Test: neighbor_alltoall, np=2, variant=1: FAILED
*** WARNING: Test: ineighbor_alltoallw, np=2, variant=1: FAILED
*** WARNING: Test: ineighbor_allgather, np=2, variant=1: FAILED
*** WARNING: Test: neighbor_allgather, np=2, variant=1: FAILED
*** WARNING: Test: create_group_usempi, np=2, variant=1: FAILED
*** WARNING: Test: create_group_mpifh, np=2, variant=1: FAILED
*** WARNING: Test: create_group, np=2, variant=1: FAILED
*** WARNING: Test: idx_null, np=2, variant=1: FAILED

From the Intel test suite:

*** WARNING: Test: MPI_Keyval3_c, np=6, variant=1: FAILED
*** WARNING: Test: MPI_Allgatherv_c, np=6, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Graph_create_undef_c, np=6, variant=1: FAILED
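The runs above were presumably launched along these lines. This is an illustrative sketch only; the test binary names are placeholders, and the np values match the failure reports:

```shell
# Hypothetical invocations matching the reported runs; binary names are placeholders.
mpirun -np 2 --map-by node ./neighbor_allgather   # IBM suite, 2-node cluster
mpirun -np 6 --map-by node ./MPI_Allgatherv_c     # Intel suite
```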

I subsequently removed the --map-by node directive so that everything basically ran on the head node under mpirun, just in case having the procs on separate nodes was the cause of the problem. However, the exact same failures were observed.

Note that the 1.7.5 branch ran clean yesterday (except for idx_null, which we understand), so this was caused by something committed today. I then tested the trunk and got identical errors.

I don't see how we can release in this situation, so we appear to be stuck until someone can figure out what happened and fix it.
Ralph