On Dec 5, 2006, at 6:15 PM, Galen M. Shipman wrote:
> Brock Palen wrote:
>> I was asked by Myricom to run a test using the data reliability PML
>> (dr). I ran it like so:
>> $ mpirun --mca pml dr -np 4 ./xhpl
>> Is this the right format for running the dr pml?
> This should be fine, yes.
> I can try running HPL on our test cluster to see if something is wrong
> with DR.
> I assume you are using GM and not MX?
He is running GM.
> Can you try running a simple ping-pong to make sure we have the basics
> on this platform?
> If you have access to it, running the Intel test suite would also be
> helpful in determining if/where we have an issue.
He has run IMB compiled with -DCHECK and it did not report any errors.
>> Are there any gotchas in using the dr PML?
>> Also, if the dr PML is finding errors and resending data, can I
>> have it tell me when that happens? Like a verbose mode?
> Unfortunately you will need to update the source and recompile. Try
> updating this file: topdir/ompi/mca/pml/dr/pml_dr.h:245:#define
> MCA_PML_DR_DEBUG_LEVEL -1
> and change MCA_PML_DR_DEBUG_LEVEL to 0.
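For anyone who would rather script that change than edit the header by hand, here is a minimal sketch. The helper name set_dr_debug_level is hypothetical (it is not part of Open MPI), and it assumes the macro sits on a single line of the form "#define MCA_PML_DR_DEBUG_LEVEL -1", which is how the grep-style output quoted above reads once unwrapped. It goes through a temporary file because "sed -i" behaves differently on GNU and BSD (OS X) sed.

```shell
# Hypothetical helper, not part of Open MPI: rewrite the value of
# MCA_PML_DR_DEBUG_LEVEL in the given header file.
# Assumes the #define is on one line, e.g. "#define MCA_PML_DR_DEBUG_LEVEL -1".
set_dr_debug_level() {
    hdr=$1      # path to pml_dr.h
    level=$2    # new debug level, e.g. 0
    tmp="$hdr.tmp.$$"
    # Keep "#define MCA_PML_DR_DEBUG_LEVEL " and replace whatever follows it.
    sed "s/^\(#define[[:space:]][[:space:]]*MCA_PML_DR_DEBUG_LEVEL[[:space:]][[:space:]]*\).*/\1$level/" \
        "$hdr" > "$tmp" && mv "$tmp" "$hdr"
}

# Usage (from the top of the source tree), followed by a rebuild:
#   set_dr_debug_level ompi/mca/pml/dr/pml_dr.h 0
```

The header path in the usage comment is the one given in the mail, with "topdir" standing for wherever the source tree was unpacked. Open MPI still has to be recompiled and reinstalled afterwards for the new level to take effect.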
The problem is that, when running HPL, he sees failed residuals. When
running HPL under MPICH-GM, he does not.
I have tried running HPCC (HPL plus other benchmarks) using OMPI with
GM on 32-bit Xeons and 64-bit Opterons. I do not see any failed
residuals. I am trying to get access to a couple of OSX machines to
replicate Brock's setup.