Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] oshmem test suite errors
From: Brian Barrett (brian_at_[hidden])
Date: 2014-02-20 11:48:14

On Feb 20, 2014, at 7:10 AM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:

> For all of these, I'm using the openshmem test suite that is now committed to the ompi-svn SVN repo. I don't know if the errors are with the tests or with oshmem itself.
> 1. I'm running the oshmem test suite at 32 processes across 2 16-core servers. I'm seeing a segv in "examples/shmem_2dheat.x 10 10". It seems to run fine at lower np values such as 2, 4, and 8; I didn't try to determine where the crossover to badness occurs.

My memory is bad and my notes are on a machine I no longer have access to, but I did this to the test suite run for Portals SHMEM:

Index: shmem_2dheat.c
--- shmem_2dheat.c (revision 270)
+++ shmem_2dheat.c (revision 271)
@@ -129,6 +129,11 @@
   p = _num_pes ();
   my_rank = _my_pe ();
+ if (p > 8) {
+   fprintf(stderr, "Ignoring test when run with more than 8 pes\n");
+   return 77;
+ }
   /* argument processing done by everyone */
   int c, errflg;
   extern char *optarg;

The commit comment said there was a scaling issue in the test code itself; I just wish I could remember exactly what it was.
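The guard in that patch can be read as a "skip" rather than a failure. Automake-style test harnesses report an exit status of 77 as SKIP, and that is presumably why 77 was chosen here. A minimal sketch of the same check as a reusable helper follows; the names `TEST_EXIT_SKIP` and `skip_if_too_many_pes` are hypothetical and do not appear in the test suite.

```c
#include <stdio.h>

/* Exit status that Automake-style test harnesses report as SKIP.
 * (Assumption: the harness running the suite follows this convention.) */
#define TEST_EXIT_SKIP 77

/* Hypothetical helper, not part of the test suite.
 * Returns TEST_EXIT_SKIP when npes exceeds max_pes, otherwise 0. */
static int skip_if_too_many_pes(int npes, int max_pes)
{
    if (npes > max_pes) {
        fprintf(stderr,
                "Ignoring test when run with more than %d pes\n", max_pes);
        return TEST_EXIT_SKIP;
    }
    return 0;
}
```

In a test's `main`, the result would be checked right after the PE count is known, for example `int rc = skip_if_too_many_pes(p, 8); if (rc != 0) return rc;`.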

> 2. "examples/adjacent_32bit_amo.x 10 10" seems to hang with both tcp and usnic BTLs, even when running at np=2 (I let it run for several minutes before killing it).

If atomics aren't fast, this test can run for a very long time (also, it takes no arguments, so the "10 10" is being ignored). It's essentially looking for a race by blasting 32-bit atomic ops at both halves of a 64-bit word.
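To illustrate the idea only, here is a single-process analogue that uses C11 atomics and POSIX threads instead of SHMEM. It is a sketch, not the code of adjacent_32bit_amo: two threads each hammer one 32-bit half of an 8-byte-aligned word with atomic increments, and the final check confirms that neither half disturbed the other. The names `word64`, `hammer`, and `run_adjacent_amo` are made up for this example, and it assumes `_Atomic uint32_t` occupies 4 bytes, as it does on common platforms.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Two adjacent 32-bit atomics sharing one 8-byte-aligned 64-bit word. */
struct word64 {
    _Alignas(8) _Atomic uint32_t half[2];
};

struct worker_arg {
    struct word64 *w;   /* shared word */
    int idx;            /* which half this thread updates: 0 or 1 */
    uint32_t iters;     /* number of atomic increments to issue */
};

/* Thread body: issue `iters` atomic increments on one half only. */
static void *hammer(void *p)
{
    struct worker_arg *a = p;
    for (uint32_t i = 0; i < a->iters; i++)
        atomic_fetch_add(&a->w->half[a->idx], 1u);
    return NULL;
}

/* Returns 0 if both halves end up equal to iters, -1 if an update was
 * lost or leaked into the neighbouring half, -2 if a thread could not
 * be created. */
static int run_adjacent_amo(uint32_t iters)
{
    struct word64 w;
    atomic_init(&w.half[0], 0);
    atomic_init(&w.half[1], 0);

    struct worker_arg args[2] = { { &w, 0, iters }, { &w, 1, iters } };
    pthread_t t[2];
    int started = 0;

    for (int i = 0; i < 2; i++) {
        if (pthread_create(&t[i], NULL, hammer, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    if (started != 2)
        return -2;

    return (atomic_load(&w.half[0]) == iters &&
            atomic_load(&w.half[1]) == iters) ? 0 : -1;
}
```

Calling `run_adjacent_amo(1000000)` returns 0 when the 32-bit atomics are correct. An implementation that emulated a 32-bit atomic with a non-atomic read-modify-write of the whole 64-bit word could lose updates and return -1, which is the kind of race the test goes looking for. The run time scales directly with the cost of each atomic operation, which is why slow atomics make the real test look like a hang.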

> 3. Ditto for "examples/ptp.x 10 10".
> 4. "examples/shmem_matrix.x 10 10" seems to run fine at np=32 on usnic, but hangs with TCP (i.e., I let it run for 8+ minutes before killing it -- perhaps it would have finished eventually?).
> ...there are more results (more timeouts and more failures), but they're not yet complete, and I've got to keep working on my own features for v1.7.5, so I need to move to other things right now.

These start to sound like issues in oshmem itself; those last two are pretty decent tests.

> I think I have oshmem running well enough to add these to Cisco's nightly MTT runs now, so the results will start showing up there without needing my manual attention.



 Brian Barrett
 There is an art . . . to flying. The knack lies in learning how to
 throw yourself at the ground and miss.
     Douglas Adams, 'The Hitchhiker's Guide to the Galaxy'