I am pretty sure MTLs and BTLs are very different, but just as a note:
this user's code (CRASH) hangs at MPI_Allreduce() in
but runs on:
PSM (an MTL, different hardware)
Putting it out there in case it has any bearing; otherwise, ignore.
Center for Advanced Computing
On May 12, 2011, at 10:20 AM, Brock Palen wrote:
> On May 12, 2011, at 10:13 AM, Jeff Squyres wrote:
>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>>> We can reproduce it with IMB. We could provide access, but we'd have to
>>> negotiate with the owners of the relevant nodes to give you interactive
>>> access to them. Maybe Brock's would be more accessible? (If you
>>> contact me, I may not be able to respond for a few days.)
>> Brock has replied off-list that he, too, is able to reliably reproduce the issue with IMB, and is working to get access for us. Many thanks for your offer; let's see where Brock's access takes us.
> I should also note that, as far as I know, I have three codes (CRASH, NAMD in some cases, and another user code) that lock up on a collective over OpenIB but run with the same library over Gig-E.
> So I am not sure it is limited to IMB, or I could be conflating separate errors. Normally I would assume unmatched eager receives for this sort of problem.
>>>> -- we have not closed this issue,
>>> Which issue? I couldn't find a relevant-looking one.
>> Jeff Squyres
>> users mailing list