
Open MPI Development Mailing List Archives


From: Sven Stork (stork_at_[hidden])
Date: 2007-08-01 05:08:01


Hi,

Since yesterday I have noticed that NetPIPE and sometimes IMB are hanging. As far
as I can see, both processes are stuck in a receive. The weird thing is that if I
run it in a debugger, everything works fine.

Cheers,
  Sven

On Tuesday 31 July 2007 23:47, Jeff Squyres wrote:
> I'm getting a pile of test failures when running with the openib and
> tcp BTLs on the trunk. Gleb is getting some failures, too, but his
> seem to be different than mine.
>
> Here's what I'm seeing from manual MTT runs on my SVN/development
> install -- did you know that MTT could do that? :-)
>
> +----------+---------+------+------+----------+------+
> | Phase    | Section | Pass | Fail | Time out | Skip |
> +----------+---------+------+------+----------+------+
> | Test Run | intel   |  442 |    0 |       26 |    0 |
> | Test Run | ibm     |  173 |    3 |        1 |    3 |
> +----------+---------+------+------+----------+------+
>
> The tests that are failing are:
>
> *** WARNING: Test: MPI_Recv_pack_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Ssend_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Irecv_pack_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Isend_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Irsend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Ssend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Send_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Send_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Rsend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Reduce_loc_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Isend_ator2_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Issend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Isend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Send_ator2_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Issend_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: comm_join, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: getcount, np=16, variant=1: FAILED
> *** WARNING: Test: spawn, np=3, variant=1: FAILED
> *** WARNING: Test: spawn_multiple, np=3, variant=1: FAILED
>
> I'm not too worried about the comm spawn/join tests because I think
> they're heavily oversubscribing the nodes and therefore timing out.
> These were all from a default trunk build running with "mpirun --mca
> btl openib,self".
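> For reference, the BTL selection on the command line looks like the
> sketch below; the test binary name and process count are placeholders,
> not the exact MTT invocation:
>
>   # Run using only the openib (InfiniBand) and self (loopback) BTLs,
>   # excluding tcp entirely:
>   mpirun --mca btl openib,self -np 16 ./MPI_Send_rtoa_c
>
>   # The same test restricted to TCP, useful for checking whether a
>   # hang is specific to the openib BTL:
>   mpirun --mca btl tcp,self -np 16 ./MPI_Send_rtoa_c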
>
> For all of these tests, I'm running on 4 nodes, 4 cores each, but
> they have varying numbers of network interfaces:
>
>             nodes 1,2           nodes 3,4
>   openib    3 active ports      2 active ports
>   tcp       4 tcp interfaces    3 tcp interfaces
>
> Is anyone else seeing these kinds of failures?
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>