On Apr 12, 2006, at 8:59 PM, Jeff Squyres (jsquyres) wrote:
> FWIW, the "has a different size..." errors mean that you may not
> have been linking against the shared libraries that you thought you
> were. This typically means that the executable expected to find an
> object in a library of a given size, but the actual size of the
> object was different. So some kind of mismatch was occurring, and
> the segv at the end was therefore not surprising.
Yeah; I wasn't surprised either. That's why I just re-compiled the
app & ran it. Then it worked.
I'm suspicious (but can't prove it) that the opensm subnet manager
(running on another node, and on the Mellanox 'ib gold' stack) wasn't
working properly. The problem is that I have nothing to back up the
suspicion. But the behavior was consistent with what I'd see if there
was no subnet manager on the IB fabric (which may well have been the
case, actually). It's working now, though...