I assume that you have the OFED driver installed on your machines. You can
do basic network verification with the ibdiagnet utility
(http://linux.die.net/man/1/ibdiagnet), which is part of the OFED installation.
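
As a rough illustration of the "checking various error counters" step that
Jeff describes below, here is a minimal Python sketch that scans a saved
port-counter report and lists the error counters that are not zero. It
assumes the report has lines of the form "CounterName:....value", which is
how I remember tools such as perfquery printing them; please check this
against what your tools actually print. The function name, the set of
counters treated as errors, and the sample text are my own illustrative
choices, not real output from any machine.

```python
import re

# Assumed line format: "CounterName:.....value" (dots used as padding).
COUNTER_LINE = re.compile(r"^(\w+):\.*(\d+)\s*$")

# Hypothetical selection of counters to treat as error indicators.
ERROR_COUNTERS = {
    "SymbolErrorCounter",
    "LinkErrorRecoveryCounter",
    "LinkDownedCounter",
    "PortRcvErrors",
    "PortXmitDiscards",
}


def nonzero_error_counters(report: str) -> dict:
    """Return {counter_name: value} for every error counter above zero."""
    found = {}
    for line in report.splitlines():
        match = COUNTER_LINE.match(line.strip())
        if match is None:
            continue  # header, hex field, or any line in another format
        name, value = match.group(1), int(match.group(2))
        if name in ERROR_COUNTERS and value > 0:
            found[name] = value
    return found


# Made-up sample text in the assumed format (not real tool output).
SAMPLE = """\
SymbolErrorCounter:..............12
LinkDownedCounter:...............0
PortRcvErrors:...................3
PortXmitData:....................123456
"""

if __name__ == "__main__":
    for name, value in sorted(nonzero_error_counters(SAMPLE).items()):
        print(name, value)
```

A counter that keeps growing between two runs on the same port is usually
more telling than a single nonzero reading, so it is worth saving the
reports and comparing them over time.
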
Jeff Squyres wrote:
> On May 4, 2009, at 9:50 AM, jan wrote:
>> Thank you, Jeff. I have passed the mail to our IB vendor, Dell (the
>> blade was ordered from Dell Taiwan), but he told me that he didn't
>> understand "layer 0 diagnostics". Could you help us get more
>> information about "layer 0 diagnostics"? Thanks again.
> Layer 0 = your physical network layer. Specifically: ensure that your
> IB network is actually functioning properly at both the physical and
> driver layers. Cisco was an IB vendor for several years; I can tell
> you from experience that it is *not* enough to just plug everything in
> and run a few trivial tests to ensure that network traffic seems to be
> passed properly. You need to have your vendor run a full set of layer
> 0 diagnostics to ensure that all the cables are good, all the HCAs are
> good, all the drivers are functioning properly, etc. This involves
> running diagnostic network testing patterns, checking various error
> counters on the HCAs and IB switches, etc.
> This is something that Dell should know how to do.
> I say all this because the problem that you are seeing *seems* to be a
> network-related problem, not an OMPI-related problem. One can never
> know for sure, but it is fairly clear that the very first step in your
> case is to verify that the network is functioning 100% properly.
> FWIW: this was standard operating procedure when Cisco was selling IB