Ok.  I'll dig a bit over the weekend.  Thanks!

-jms
Sent from my PDA

 -----Original Message-----
From:   Richard Graham [mailto:rlgraham@ornl.gov]
Sent:   Friday, November 02, 2007 05:50 PM Eastern Standard Time
To:     Open MPI Developers
Subject:        Re: [OMPI devel] openib currently broken

Jeff,
  I ran IMB on 60 procs with the openib and self btls, and all ran fine.
The tests that were run were ping-pong, ping-ping, SendRecv, Exchange,
Allreduce, Reduce, Reduce_scatter, Allgather, Allgatherv, Alltoall, Bcast,
and Barrier.  I also ran on 40 procs, and several smaller runs.  If you can
reproduce and provide more details (I realize you ran out of time), I can
take another look.  I would expect a bug in the changes would cause one to
walk over memory, rather than change the memory usage, but who knows.  I
will be off line until late Sunday...

Rich



On 11/2/07 3:26 PM, "Jeff Squyres (jsquyres)" <jsquyres@cisco.com> wrote:

> Rich -
>
> I'm not 100% sure it's fixed - I'm still seeing "out of memory" errors when
> running IMB on about 40 procs over openib.  But I ran out of time to
> investigate deeply...
>
> Could you try running a nontrivial omb to check?
>
> -jms
> Sent from my PDA
>
>  -----Original Message-----
> From:   Richard Graham [mailto:rlgraham@ornl.gov]
> Sent:   Friday, November 02, 2007 02:07 PM Eastern Standard Time
> To:     Open MPI Developers
> Subject:        Re: [OMPI devel] openib currently broken
>
> R16641 should have fixed the regression.  Anyone using ompi_free_list_t_ex()
> and providing a memory allocator would have been bitten by this, since I did
> not update this function (which will be deprecated in favor of a version
> parallel to ompi_free_list_t_new) to initialize the new fields defined.  From
> looking through the btls, this seems to be only the openib btl.
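
To make the failure mode above concrete, here is a minimal sketch in C.  It
is not the Open MPI code: the struct, its fields, and every demo_* name are
hypothetical, chosen only to show how an older init entry point that is not
updated can leave newly added fields holding stale values.  The "poison"
helper stands in for whatever happened to be in memory beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical free list descriptor; field names are illustrative only. */
typedef struct {
    size_t elem_size;      /* field that existed before the change */
    size_t max_elems;      /* field that existed before the change */
    size_t payload_size;   /* new field added by the change */
    size_t payload_align;  /* new field added by the change */
} demo_free_list_t;

/* Updated init path: sets every field, including the new ones. */
static void demo_init_new(demo_free_list_t *fl, size_t elem_size,
                          size_t max_elems, size_t payload_size,
                          size_t payload_align)
{
    fl->elem_size = elem_size;
    fl->max_elems = max_elems;
    fl->payload_size = payload_size;
    fl->payload_align = payload_align;
}

/* Older init path before the fix: the new fields are never written. */
static void demo_init_ex_buggy(demo_free_list_t *fl, size_t elem_size,
                               size_t max_elems)
{
    fl->elem_size = elem_size;
    fl->max_elems = max_elems;
}

/* Older init path after the fix: forward to the updated path with
 * explicit defaults so no field is left untouched. */
static void demo_init_ex_fixed(demo_free_list_t *fl, size_t elem_size,
                               size_t max_elems)
{
    demo_init_new(fl, elem_size, max_elems, 0, 0);
}

/* Fill the struct with a recognizable byte pattern to model stale memory. */
static void demo_poison(demo_free_list_t *fl)
{
    memset(fl, 0xAB, sizeof *fl);
}

/* Returns 1 when both new fields hold the expected default of zero. */
static int demo_new_fields_clean(const demo_free_list_t *fl)
{
    return fl->payload_size == 0 && fl->payload_align == 0;
}
```

Any code that later sizes an allocation from such a stale field could ask
for an arbitrary amount of memory, which is one plausible way to end up with
allocation failures rather than an immediate crash.  That link is a guess
for illustration, not a diagnosis of the openib case.
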
>
> Rich
>
>
> On 11/2/07 12:31 PM, "Richard Graham" <rlgraham@ornl.gov> wrote:
>
>> >
>> >
>> >
>> > On 11/2/07 12:21 PM, "Jeff Squyres" <jsquyres@cisco.com> wrote:
>> >
>>> >> The freelist changes from yesterday appear to have broken the openib
>>> >> btl.  We didn't get lots of test failures in MTT last night only
>>> >> because there was a separate (unrelated) typo in the ofud BTL that
>>> >> prevented the nightly tarball from building on any IB-capable
>>> >> machines.  :-)
>>> >>
>>> >> Rich hopes to look into fixing the openib BTL problem today; he
>>> >> thinks it's a case of a simple oversight: the openib BTL is not using
>>> >> the new freelist init functions.
>>> >>
>>> >> Rich: are there other places that are not using the new init
>>> >> functions that need to?
>>> >>
>> > The ompi free list has two init functions; I changed just one.  The IB
>> > btl uses the one I have not yet changed, but the pml uses the one I did
>> > change.
>> >
>> > rich
>>> >>
>>> >> --
>>> >> Jeff Squyres
>>> >> Cisco Systems
>>> >>
>>> >> _______________________________________________
>>> >> devel mailing list
>>> >> devel@open-mpi.org
>>> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> >>