From: Josh Hursey (jjhursey_at_[hidden])
Date: 2006-11-13 10:56:06


On Nov 13, 2006, at 10:27 AM, Ethan Mallove wrote:

> I can infer that you have an MPI Install section labeled
> "odin 64 bit gcc". A few questions:
>
> * What is the mpi_get for that section (or how does that
> parameter get filled in by your automated scripts)?

I attached the generated INI file for you to look at. The mpi_get value
is the same for all parallel runs of GCC+64bit (the same value for all
branches).

> * Do you start with a fresh scratch tree every run?

Yep. Every run, and all of the parallel runs.

> * Could you email me your scratch/installs/mpi_installs.xml
> files?

The attached mpi_installs.xml is from the trunk+gcc+64bit parallel
scratch directory.

>
> I checked on how widespread this issue is, and found that
> 18,700 out of 474,000 Test Run rows in the past month have a
> mpi_version/command (v1.2-trunk) mismatch, occurring in both
> directions (version=1.2, command=trunk and vice versa).
> They occur on these clusters:
>
> Cisco MPI development cluster
> IU Odin
> IU - Thor - TESTING
>

Interesting...

> There *is* that race condition in which one MTT client submitting
> could overwrite another's index. Do you have "trunk" and
> "1.2" runs submitting to the database at the same time?

Yes, we do. :(

The parallel blocks, as we call them, are separate scratch directories
in which MTT runs concurrently: we have N parallel-block scratch
directories, each running one instance of MTT. So it is possible (and
highly likely) that when the reporter phase fires, all N parallel
blocks are firing it at about the same time.

Without knowing how the reporter does the inserts into the
database, I don't think I can help much more than that with debugging.
When the reporter fires for the DB:
  - Does it start a transaction for the connection, do the inserts,
    then commit?
  - Does it ship the inserts to the server and let the server run them,
    or does the client do all of the individual inserts?

-- Josh

>
>
> On Sun, Nov/12/2006 06:04:17PM, Jeff Squyres (jsquyres) wrote:
>>
>> I feel somewhat better now. Ethan - can you fix?
>> -----Original Message-----
>> From: Tim Mattox [[1]mailto:timattox_at_[hidden]]
>> Sent: Sunday, November 12, 2006 05:34 PM Eastern Standard Time
>> To: General user list for the MPI Testing Tool
>> Subject: [MTT users] Corrupted MTT database or
>> incorrect query
>> Hello,
>> I just noticed that the MTT summary page is presenting
>> incorrect information for our recent runs at IU. It is
>> showing failures for 1.2b1 that actually came from
>> the trunk! See the first entry in this table:
>> http://www.open-mpi.org/mtt/reporter.php?&maf_start_test_timestamp=2006-11-12%2019:12:02%20through%202006-11-12%2022:12:02&ft_platform_id=contains&tf_platform_id=IU&maf_phase=runs&maf_success=fail&by_atom=*by_test_case&go=Table&maf_agg_timestamp=-&mef_mpi_name=All&mef_mpi_version=All&mef_os_name=All&mef_os_version=All&mef_platform_hardware=All&mef_platform_id=All&agg_platform_id=off&1-page=off&no_bookmarks&no_bookmarks
>> Click on the [i] in the upper right (the first entry)
>> to get the popup window, which shows the mpirun command as:
>>
>>   mpirun -mca btl tcp,sm,self -np 6 --prefix /san/homedirs/mpiteam/mtt-runs/odin/20061112-Testing-NOCLN/parallel-block-3/installs/ompi-nightly-trunk/odin_64_bit_gcc/1.3a1r12559/install dynamic/spawn
>>
>> Note the path has "1.3a1r12559" in the name... it's a
>> run from the trunk, yet the table showed this as a 1.2b1
>> run. There are several of these misattributed errors.
>> This would explain why Jeff saw some ddt errors on the
>> 1.2 branch yesterday, but was unable to reproduce them.
>> They were from the trunk!
>> --
>> Tim Mattox - [2]http://homepage.mac.com/tmattox/
>> tmattox_at_[hidden] || timattox_at_[hidden]
>> I'm a bright... [3]http://www.the-brights.net/
>> _______________________________________________
>> mtt-users mailing list
>> mtt-users_at_[hidden]
>> [4]http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>>
>> References
>>
>> 1. mailto:timattox_at_[hidden]
>> 2. http://homepage.mac.com/tmattox/
>> 3. http://www.the-brights.net/
>> 4. http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>
>
>
> --
> -Ethan

----
Josh Hursey
jjhursey_at_[hidden]
http://www.open-mpi.org/