Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] 1.7.5 and trunk failures
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-03-19 16:59:00


For the list: we figured this out. The neighbor tests require np>=4 (whew!). I added minimum-np checks to those tests so that they skip (exit 77) when np<4; Nathan and I worked through the other three tests.
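For anyone following along, here is a minimal sketch of what such a guard can look like. This is illustrative only, not the actual patch: the MIN_NP constant, the skip message, and the 2x2 grid in the body are assumptions made up for the example. The real point is the exit status: 77 is the Automake convention for "test skipped", which the harness counts separately from failures.

#include <mpi.h>
#include <stdio.h>

/* Hypothetical minimum-np guard; the real checks live in the test
 * suite itself, and the topology below is just an example. */
#define MIN_NP 4

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < MIN_NP) {
        if (0 == rank) {
            fprintf(stderr, "This test requires np>=%d; skipping\n", MIN_NP);
        }
        MPI_Finalize();
        return 77;  /* 77 == "skipped" in the Automake exit-status convention */
    }

    /* Illustrative body: a 2x2 periodic cartesian grid, where every rank
     * has 2*ndims = 4 neighbors (hence the need for at least 4 procs). */
    int dims[2] = { 2, 2 }, periods[2] = { 1, 1 };
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

    if (MPI_COMM_NULL != cart) {
        int sendval = rank, recvbuf[4];
        MPI_Neighbor_allgather(&sendval, 1, MPI_INT,
                               recvbuf, 1, MPI_INT, cart);
        MPI_Comm_free(&cart);
    }

    MPI_Finalize();
    return 0;
}

Run with fewer than 4 processes (e.g., "mpirun -np 2 ./neighbor_skip", where neighbor_skip is a made-up binary name), it exits 77 and gets reported as skipped rather than failed.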

On Mar 18, 2014, at 11:22 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> Just to be safe, I blew away my existing installations and got completely fresh checkouts. I am doing a vanilla configure; the only configure options besides the prefix are --enable-orterun-prefix-by-default and --enable-mpi-java (so I can test the Java bindings).
>
> For 1.7.5, running the IBM test suite, I get the following failures on my 2-node cluster, running map-by node:
>
> *** WARNING: Test: ineighbor_allgatherv, np=2, variant=1: FAILED
> *** WARNING: Test: neighbor_allgatherv, np=2, variant=1: FAILED
> *** WARNING: Test: ineighbor_alltoallv, np=2, variant=1: FAILED
> *** WARNING: Test: ineighbor_alltoall, np=2, variant=1: FAILED
> *** WARNING: Test: neighbor_alltoallw, np=2, variant=1: FAILED
> *** WARNING: Test: neighbor_alltoallv, np=2, variant=1: FAILED
> *** WARNING: Test: neighbor_alltoall, np=2, variant=1: FAILED
> *** WARNING: Test: ineighbor_alltoallw, np=2, variant=1: FAILED
> *** WARNING: Test: ineighbor_allgather, np=2, variant=1: FAILED
> *** WARNING: Test: neighbor_allgather, np=2, variant=1: FAILED
> *** WARNING: Test: create_group_usempi, np=2, variant=1: FAILED
> *** WARNING: Test: create_group_mpifh, np=2, variant=1: FAILED
> *** WARNING: Test: create_group, np=2, variant=1: FAILED
> *** WARNING: Test: idx_null, np=2, variant=1: FAILED
>
>
> From the Intel test suite:
>
> *** WARNING: Test: MPI_Keyval3_c, np=6, variant=1: FAILED
> *** WARNING: Test: MPI_Allgatherv_c, np=6, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Graph_create_undef_c, np=6, variant=1: FAILED
>
> I subsequently removed the map-by node directive so that everything ran on the head node under mpirun, in case having the procs on separate nodes was the cause of the problem. However, I saw exactly the same failures.
>
> Note that the 1.7.5 branch ran clean yesterday (except for idx_null, which we understand), so this regression was introduced by something new today. I then tested the trunk and got identical errors.
>
> I don't see how we can release with this situation, so we appear to be stuck until someone can figure out what happened and fix it.
> Ralph

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/