On Oct 23, 2007, at 6:33 AM, Bogdan Costescu wrote:
>> There is in the openib BTL.
>
> Bug #1025 has the following phrase in one of the answers:
>
> "It looks like this will affect many threading issues with the
> pathscale compiler -- the openib BTL is simply the first place we
> tripped it."
>
> which along with the rest of the data (failure dependency on TLS
> usage) led me to wonder about threading issues.

FWIW, these problems even affect non-threaded builds, so I'm not
entirely sure what the problem is. All indications point to a
problem in the PathScale compiler, but who knows -- perhaps we're
doing something stupid that doesn't show up in any other compiler.

>> To be honest, I removed the pathscale suite from my regular
>> regression testing
>
> So, is anyone else testing PathScale 3.0 with stable versions of Open
> MPI? Or with development versions?

I don't know; Cisco is not. I removed it from my normal testing set
because all IB testing would fail -- so it wasn't worth testing.

>> I just recompiled the OMPI 1.2 branch with pathscale 3.0 on RHEL4U4
>> and I do not see the problems that you are seeing. :-\ Is Debian
>> etch a supported pathscale platform?
>
> Seems like it's not... And indeed the older RHEL4 is a supported
> platform, which might explain the different results.

You might want to ask them if Debian etch is supported.

> I made some progress: if I configure with "--without-memory-manager"
> (along with all other options that I mentioned before), then it works.
> This was inspired by the fact that the segmentation fault occurred in
> ptmalloc2. I have previously tried to remove the MX support without
> any effect; with ptmalloc2 out of the picture I have had test runs
> over MX and TCP without problems.

This is ringing a [very] distant bell in my memory, but I don't
remember the details. Brian: do you remember any specific issues
with the memory manager and the PathScale compiler?

--
Jeff Squyres
Cisco Systems