On Dec 5, 2006, at 6:15 PM, Galen M. Shipman wrote:
> Brock Palen wrote:
>
>> I was asked by Myricom to run a test using the data reliability (dr)
>> pml. I ran it like so:
>>
>> $ mpirun --mca pml dr -np 4 ./xhpl
>>
>> Is this the right format for running the dr pml?
>>
> This should be fine, yes.
> I can try running HPL on our test cluster to see if something is wrong
> with DR.
> I assume you are using GM and not MX?
He is running GM.
> Can you try running a simple ping-pong to make sure we have the basics
> on this platform?
> If you have access to them, running the Intel test suite would also be
> helpful in determining if/where we have an issue.
He has run IMB compiled with -DCHECK and it did not report any errors.
>> Are there any gotchas on using the dr pml?
>> Also, if the dr pml is finding errors and is resending data, can I
>> have it tell me when that happens? Like a verbose mode?
>>
> Unfortunately you will need to update the source and recompile. Try:
>
> Updating this file: topdir/ompi/mca/pml/dr/pml_dr.h:245:#define
> MCA_PML_DR_DEBUG_LEVEL -1
> And change MCA_PML_DR_DEBUG_LEVEL to 0.
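For reference, a minimal sketch of that edit as a shell function. The
function name set_dr_debug_level is hypothetical, and the sketch assumes
the define sits on a single line in pml_dr.h, as in the grep output quoted
above. Open MPI still has to be rebuilt and reinstalled afterwards for the
change to take effect.

```shell
# Hypothetical helper (not part of Open MPI): rewrite the value of
# MCA_PML_DR_DEBUG_LEVEL in the given header file.
#   $1 = path to pml_dr.h (topdir/ompi/mca/pml/dr/pml_dr.h in the source tree)
#   $2 = new debug level (0, per the suggestion above)
set_dr_debug_level() {
    header="$1"
    level="$2"
    # Match the whole "#define MCA_PML_DR_DEBUG_LEVEL <value>" line and
    # replace only the value; every other line passes through unchanged.
    sed "s/^\(#define[[:space:]][[:space:]]*MCA_PML_DR_DEBUG_LEVEL[[:space:]][[:space:]]*\).*/\1${level}/" \
        "$header" > "$header.tmp" && mv "$header.tmp" "$header"
}
```

Usage, from the top of the source tree:
set_dr_debug_level ompi/mca/pml/dr/pml_dr.h 0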
The problem is that, when running HPL, he sees failed residuals. When
running HPL under MPICH-GM, he does not.
I have tried running HPCC (HPL plus other benchmarks) using OMPI with
GM on 32-bit Xeons and 64-bit Opterons. I do not see any failed
residuals. I am trying to get access to a couple of OSX machines to
replicate Brock's setup.
Scott