Jeff,
  I ran IMB on 60 procs with the openib and self BTLs, and all tests ran fine.  The tests that were run
 were ping-pong, ping-ping, SendRecv, Exchange, Allreduce, Reduce, Reduce_scatter, Allgather,
 Allgatherv, Alltoall, Bcast, and Barrier.  I also ran on 40 procs, along with several smaller runs.  If you
 can reproduce the problem and provide more details (I realize you ran out of time), I can take another look.
 I would expect a bug in these changes to cause memory corruption (walking over memory) rather than a change
 in memory usage, but who knows.  I will be offline until late Sunday...

Rich
 


On 11/2/07 3:26 PM, "Jeff Squyres (jsquyres)" <jsquyres@cisco.com> wrote:

Rich -

I'm not 100% sure it's fixed - I'm still seeing "out of memory" errors when running IMB on about 40 procs over openib.  But I ran out of time to investigate deeply...

Could you try a nontrivial IMB run to check?

-jms
Sent from my PDA

 -----Original Message-----
From:   Richard Graham [mailto:rlgraham@ornl.gov]
Sent:   Friday, November 02, 2007 02:07 PM Eastern Standard Time
To:     Open MPI Developers
Subject:        Re: [OMPI devel] openib currently broken

R16641 should have fixed the regression.  Anyone using ompi_free_list_t_ex()
and providing a memory allocator would have been bitten by this, since I did
not update this function (which will be deprecated in favor of a version
parallel to ompi_free_list_t_new) to initialize the newly defined fields.
From looking through the BTLs, this seems to affect only the openib BTL.
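One plausible way to picture this class of regression, as a minimal C sketch: everything below (free_list_t, free_list_init_new(), free_list_init_ex(), grow_request_bytes(), and the payload_size/num_per_alloc fields) is hypothetical and is not the actual ompi_free_list_t API. It assumes a struct that gained new fields, one init function that was updated to set them, and an older init function that was not. Any garbage left in the new fields then feeds straight into the size of the next allocation request.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hook, matching malloc's signature. */
typedef void *(*alloc_fn_t)(size_t);

/* Hypothetical stand-in for a free list; NOT the real ompi_free_list_t. */
typedef struct {
    /* fields that existed before the change */
    size_t elem_size;
    alloc_fn_t alloc;          /* caller-provided memory allocator */
    /* fields added by the change (hypothetical) */
    size_t payload_size;
    size_t num_per_alloc;
} free_list_t;

/* Updated init: sets every field, old and new. */
void free_list_init_new(free_list_t *fl, size_t elem_size, alloc_fn_t alloc,
                        size_t payload_size, size_t num_per_alloc) {
    fl->elem_size = elem_size;
    fl->alloc = alloc;
    fl->payload_size = payload_size;
    fl->num_per_alloc = num_per_alloc;
}

/* Older init: without the two lines marked "hypothetical fix", the new
 * fields keep whatever bytes were already in the caller's struct. */
void free_list_init_ex(free_list_t *fl, size_t elem_size, alloc_fn_t alloc) {
    fl->elem_size = elem_size;
    fl->alloc = alloc;
    fl->payload_size = 0;      /* hypothetical fix: defined default */
    fl->num_per_alloc = 1;     /* hypothetical fix: defined default */
}

/* Bytes the list would request from fl->alloc when it grows. */
size_t grow_request_bytes(const free_list_t *fl) {
    return fl->num_per_alloc * (fl->elem_size + fl->payload_size);
}
```

Removing the two "hypothetical fix" lines reproduces the bug class: after the struct is pre-filled with garbage, grow_request_bytes() multiplies arbitrary values and can ask the allocator for an enormous block.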

Rich


On 11/2/07 12:31 PM, "Richard Graham" <rlgraham@ornl.gov> wrote:

>
> On 11/2/07 12:21 PM, "Jeff Squyres" <jsquyres@cisco.com> wrote:
>
>> The freelist changes from yesterday appear to have broken the openib
>> btl.  We didn't get lots of test failures in MTT last night only
>> because there was a separate (unrelated) typo in the ofud BTL that
>> prevented the nightly tarball from building on any IB-capable
>> machines.  :-)
>>
>> Rich hopes to look into fixing the openib BTL problem today; he
>> thinks it's a case of a simple oversight: the openib BTL is not using
>> the new freelist init functions.
>>
>> Rich: are there other places that are not using the new init
>> functions that need to?
>>
> the ompi free list has two init functions; I changed just one.  The IB
> BTL uses the one I have not yet changed, but the PML uses the one I did change.
>
> rich
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> _______________________________________________
>> devel mailing list
>> devel@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>