On Nov 13, 2006, at 10:27 AM, Ethan Mallove wrote:
> I can infer that you have an MPI Install section labeled
> "odin 64 bit gcc". A few questions:
> * What is the mpi_get for that section (or how does that
> parameter get filled in by your automated scripts)?
I attached the generated INI file for you to look at.
It is the same mpi_get value for all parallel runs of GCC+64bit (and
the same value across all branches).
> * Do you start with a fresh scratch tree every run?
Yep. Every run, and all of the parallel runs.
> * Could you email me your scratch/installs/mpi_installs.xml
The attached mpi_installs.xml is from the trunk+gcc+64bit parallel block.
> I checked on how widespread this issue is, and found that
> 18,700 out of 474,000 Test Run rows in the past month have a
> mpi_version/command (v1.2 vs. trunk) mismatch, occurring in both
> directions (version=1.2 with a trunk command, and vice versa).
> They occur on these clusters:
> Cisco MPI development cluster
> IU Odin
> IU - Thor - TESTING
> There *is* that race condition in which one mtt submitting
> could overwrite another's index. Do you have "trunk" and
> "1.2" runs submitting to the database at the same time?
Yes we do. :(
The parallel blocks, as we call them, are separate scratch directories
in which MTT runs concurrently: we have N parallel-block scratch
directories, each running one instance of MTT. So it is possible (and
highly likely) that when the reporter phase fires, all N parallel
blocks are firing it at about the same time.
Without knowing how the reporter does its inserts into the database,
I don't think I can help much more than that with debugging.
When the reporter fires for the DB:
- Does it start a transaction on the connection, do the inserts, and
then commit?
- Does it ship the inserts to the server in one batch and let the
server run them, or does the client perform each insert individually?
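To illustrate the kind of race I'm worried about: if each submitter reads the current max index and then writes, two concurrent reporters can pick the same index and one overwrites the other. Letting the database assign the key atomically inside a transaction avoids that. A minimal sketch in Python with sqlite3 (not MTT's actual Perl reporter; the table and column names here are made up):

```python
import sqlite3
import threading

# In-memory DB standing in for the MTT results database.
# check_same_thread=False lets multiple threads share the connection;
# the lock serializes access to it, as sqlite3 requires.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("""CREATE TABLE test_run (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    mpi_version TEXT,
                    command TEXT)""")
conn.commit()
lock = threading.Lock()

def submit(version, command):
    # One transaction per submission ("with conn" commits on success).
    # The id comes from the database's own counter, so concurrent
    # submitters can never hand out the same index.
    with lock, conn:
        cur = conn.execute(
            "INSERT INTO test_run (mpi_version, command) VALUES (?, ?)",
            (version, command))
        return cur.lastrowid

# Simulate "1.2" and "trunk" reporters submitting at the same time.
ids = []
threads = [threading.Thread(target=lambda v, c: ids.append(submit(v, c)),
                            args=(v, c))
           for v, c in [("1.2", "mpirun-1.2"),
                        ("trunk", "mpirun-trunk")] * 4]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every submission got a distinct id, and each row kept its own
# version/command pairing.
assert len(set(ids)) == len(ids) == 8
```

If the reporter instead does a client-side "SELECT max(index)" followed by an insert, with no transaction spanning the two, that would explain rows from one branch landing under another branch's index.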
> On Sun, Nov/12/2006 06:04:17PM, Jeff Squyres (jsquyres) wrote:
>> I feel somewhat better now. Ethan - can you fix?
>> -----Original Message-----
>> From: Tim Mattox [mailto:timattox_at_[hidden]]
>> Sent: Sunday, November 12, 2006 05:34 PM Eastern Standard Time
>> To: General user list for the MPI Testing Tool
>> Subject: [MTT users] Corrupted MTT database or
>> incorrect query
>> I just noticed that the MTT summary page is presenting
>> incorrect information for our recent runs at IU. It is
>> showing failures for the 1.2b1 that actually came from
>> the trunk! See the first entry in this table:
>> Click on the [i] in the upper right (the first entry)
>> to get the popup window, which shows the mpirun cmd as:
>> mpirun -mca btl tcp,sm,self -np 6 --prefix
>> dynamic/spawn. Note the path has "1.3a1r12559" in the
>> name... it's a run from the trunk, yet the table showed
>> this as a 1.2b1 run. There are several of these
>> misattributed errors. This would explain why Jeff saw
>> some ddt errors on the 1.2 branch yesterday, but was
>> unable to reproduce them. They were from the trunk!
>> Tim Mattox - http://homepage.mac.com/tmattox/
>> tmattox_at_[hidden] || timattox_at_[hidden]
>> I'm a bright... http://www.the-brights.net/
>> mtt-users mailing list