I ran IMB on 60 procs with the openib and self btls, and all ran fine. The tests that were run
were ping-pong, ping-ping, SendRecv, Exchange, Allreduce, Reduce, Reduce_scatter, Allgather,
Allgatherv, Alltoall, Bcast, and Barrier. I also ran on 40 procs, and several smaller runs. If you
can reproduce and provide more details (I realize you ran out of time), I can take another look.
I would expect a bug in the changes would cause one to walk over memory, rather than change
the memory usage, but who knows. I will be off line until late Sunday...
On 11/2/07 3:26 PM, "Jeff Squyres (jsquyres)" <email@example.com> wrote:
I'm not 100% sure it's fixed - I'm still seeing "out of memory" errors when running IMB on about 40 procs over openib. But I ran out of time to investigate deeply...
Could you try running a nontrivial omb to check?
Sent from my PDA
From: Richard Graham [mailto:firstname.lastname@example.org]
Sent: Friday, November 02, 2007 02:07 PM Eastern Standard Time
To: Open MPI Developers
Subject: Re: [OMPI devel] openib currently broken
R16641 should have fixed the regression. Anyone using ompi_free_list_t_ex() with
a memory allocator would have been bitten by this, since I did not update it
(it will be deprecated in favor of a version parallel to
ompi_free_list_t_new) to initialize the new fields defined. From looking
through the btls, this seems to affect only the openib btl.
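To illustrate this class of bug (this is not the actual Open MPI code; the struct, its fields, and every function name below are hypothetical), here is a minimal sketch of one way to keep two init entry points from drifting apart: route both through a single helper that sets every field, so a newly added field cannot be left uninitialized by the older path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a free list; NOT the real ompi_free_list_t. */
typedef struct {
    size_t elem_size;      /* field that existed before the change */
    size_t max_elems;      /* field that existed before the change */
    size_t payload_size;   /* newly added field (assumed for illustration) */
    size_t payload_align;  /* newly added field (assumed for illustration) */
} toy_free_list_t;

/* The only place that assigns fields. Both entry points call this. */
static void toy_free_list_init_common(toy_free_list_t *fl,
                                      size_t elem_size, size_t max_elems,
                                      size_t payload_size, size_t payload_align)
{
    assert(fl != NULL);
    memset(fl, 0, sizeof(*fl));   /* fields added later start out zeroed */
    fl->elem_size     = elem_size;
    fl->max_elems     = max_elems;
    fl->payload_size  = payload_size;
    fl->payload_align = payload_align;
}

/* Older entry point: callers know nothing about the new fields,
 * so it forwards explicit defaults instead of leaving them unset. */
void toy_free_list_init_ex(toy_free_list_t *fl,
                           size_t elem_size, size_t max_elems)
{
    toy_free_list_init_common(fl, elem_size, max_elems, 0, 1);
}

/* Newer entry point: callers supply the new fields themselves. */
void toy_free_list_init_new(toy_free_list_t *fl,
                            size_t elem_size, size_t max_elems,
                            size_t payload_size, size_t payload_align)
{
    toy_free_list_init_common(fl, elem_size, max_elems,
                              payload_size, payload_align);
}
```

With this layout, adding another field means touching only the common helper and choosing a default for the older entry point.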
On 11/2/07 12:31 PM, "Richard Graham" <email@example.com> wrote:
> On 11/2/07 12:21 PM, "Jeff Squyres" <firstname.lastname@example.org> wrote:
>> The freelist changes from yesterday appear to have broken the openib
>> btl. We didn't get lots of test failures in MTT last night only
>> because there was a separate (unrelated) typo in the ofud BTL that
>> prevented the nightly tarball from building on any IB-capable
>> machines. :-)
>> Rich hopes to look into fixing the openib BTL problem today; he
>> thinks it's a case of a simple oversight: the openib BTL is not using
>> the new freelist init functions.
>> Rich: are there other places that are not using the new init
>> functions that need to?
> the ompi free list has two init functions, I changed just one. The IB
> btl uses the one I have not yet changed, but the pml uses the one I did
> change.
> rich
>> Jeff Squyres
>> Cisco Systems
>> devel mailing list