
Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] oshmem test suite errors
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-02-20 10:44:03

Yes, I've added them to my Cisco MTT ini files in the ompi-svn repo. Look in cisco/mtt/usnic/usnic-trunk.ini and usnic-v1.7.ini.

All relevant sections have "oshmem" in them.

Most are copied from the Mellanox examples, but I made a few tweaks/improvements here and there. I also expect to adjust the timeouts in some sections for the longer-running tests (at np=32 and possibly 64) once we have a few MTT oshmem runs done.

On Feb 20, 2014, at 10:34 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> Could you send along the relevant mtt .ini sections?
> On Feb 20, 2014, at 7:10 AM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
>> For all of these, I'm using the openshmem test suite that is now committed to the ompi-svn SVN repo. I don't know if the errors are with the tests or with oshmem itself.
>> 1. I'm running the oshmem test suite at 32 processes across two 16-core servers. I'm seeing a segv in "examples/shmem_2dheat.x 10 10". It seems to run fine at lower np values such as 2, 4, and 8; I didn't try to determine where the crossover to badness occurs.
>> 2. "examples/adjacent_32bit_amo.x 10 10" seems to hang with both tcp and usnic BTLs, even when running at np=2 (I let it run for several minutes before killing it).
>> 3. Ditto for "examples/ptp.x 10 10".
>> 4. "examples/shmem_matrix.x 10 10" seems to run fine at np=32 on usnic, but hangs with TCP (i.e., I let it run for 8+ minutes before killing it -- perhaps it would have finished eventually?).
>> ...there are more results (more timeouts and more failures), but they're not yet complete, and I have to keep working on my own features for v1.7.5, so I need to move on to other things right now.
>> I think I have oshmem running well enough to add these to Cisco's nightly MTT runs now, so the results will start showing up there without needing my manual attention.
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]

Jeff Squyres