
Open MPI Development Mailing List Archives


From: Richard Graham (rlgraham_at_[hidden])
Date: 2007-11-02 17:50:12


Jeff,
  I ran IMB on 60 procs with the openib and self BTLs, and everything ran
fine. The tests run were ping-pong, ping-ping, SendRecv, Exchange,
Allreduce, Reduce, Reduce_scatter, Allgather, Allgatherv, Alltoall, Bcast,
and Barrier. I also ran on 40 procs, plus several smaller runs. If you can
reproduce the problem and provide more details (I realize you ran out of
time), I can take another look. I would expect a bug in these changes to
cause one to walk over memory, rather than to change the memory usage, but
who knows. I will be offline until late Sunday...

Rich
 

On 11/2/07 3:26 PM, "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]> wrote:

> Rich -
>
> I'm not 100% sure it's fixed - I'm still seeing "out of memory" errors when
> running about 40 proc IMB over openib. But I ran out of time to investigate
> deeply...
>
> Could you try running a nontrivial omb to check?
>
> -jms
> Sent from my PDA
>
> -----Original Message-----
> From: Richard Graham [mailto:rlgraham_at_[hidden]]
> Sent: Friday, November 02, 2007 02:07 PM Eastern Standard Time
> To: Open MPI Developers
> Subject: Re: [OMPI devel] openib currently broken
>
> R16641 should have fixed the regression. Anyone using
> ompi_free_list_t_ex() and providing a memory allocator would have been
> bitten by this, since I did not update this function (which will be
> deprecated in favor of a version parallel to ompi_free_list_t_new) to
> initialize the newly defined fields. From looking through the btls, this
> seems to affect only the openib btl.
>
> Rich
>
>
> On 11/2/07 12:31 PM, "Richard Graham" <rlgraham_at_[hidden]> wrote:
>
>> >
>> >
>> >
>> > On 11/2/07 12:21 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>> >
>>> >> The freelist changes from yesterday appear to have broken the openib
>>> >> btl. We didn't get lots of test failures in MTT last night only
>>> >> because there was a separate (unrelated) typo in the ofud BTL that
>>> >> prevented the nightly tarball from building on any IB-capable
>>> >> machines. :-)
>>> >>
>>> >> Rich hopes to look into fixing the openib BTL problem today; he
>>> >> thinks it's a case of a simple oversight: the openib BTL is not using
>>> >> the new freelist init functions.
>>> >>
>>> >> Rich: are there other places that are not using the new init
>>> >> functions that need to?
>>> >>
>> > The ompi free list has two init functions; I changed just one. The IB
>> > btl uses the one I have not yet changed, but the pml uses the one I
>> > did change.
>> >
>> > Rich
>>> >>
>>> >> --
>>> >> Jeff Squyres
>>> >> Cisco Systems
>>> >>
>>> >> _______________________________________________
>>> >> devel mailing list
>>> >> devel_at_[hidden]
>>> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> >>
>> >
>> >
>> >
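
The regression described in the thread (one of two free-list init functions was updated for newly added fields, while the other was not) can be illustrated with a minimal sketch. Everything below is hypothetical: the struct, its field names, and the sketch_* functions are stand-ins invented for illustration and are not the actual Open MPI free-list code or the actual change in r16641.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a free-list descriptor (not ompi_free_list_t). */
typedef struct {
    size_t elem_size;      /* field that existed before the change */
    size_t num_elems;      /* field that existed before the change */
    size_t payload_size;   /* assumed "new" field added by the change */
    size_t payload_align;  /* assumed "new" field added by the change */
} sketch_free_list_t;

/* Updated initializer: sets every field, including the new ones. */
static void sketch_init_new(sketch_free_list_t *fl,
                            size_t elem_size, size_t num_elems,
                            size_t payload_size, size_t payload_align)
{
    fl->elem_size     = elem_size;
    fl->num_elems     = num_elems;
    fl->payload_size  = payload_size;
    fl->payload_align = payload_align;
}

/* Older initializer before the fix: it predates the new fields and
 * leaves them holding whatever was already in memory. */
static void sketch_init_ex_stale(sketch_free_list_t *fl,
                                 size_t elem_size, size_t num_elems)
{
    fl->elem_size = elem_size;
    fl->num_elems = num_elems;
}

/* Older initializer after the fix: delegates to the updated one with
 * neutral defaults, so no field is left uninitialized. */
static void sketch_init_ex_fixed(sketch_free_list_t *fl,
                                 size_t elem_size, size_t num_elems)
{
    sketch_init_new(fl, elem_size, num_elems, 0, 1);
}

/* Bytes a caller would request for payload storage; garbage in
 * payload_size turns directly into a garbage allocation request. */
static size_t sketch_payload_bytes(const sketch_free_list_t *fl)
{
    return fl->num_elems * fl->payload_size;
}
```

To see the effect, fill a sketch_free_list_t with a poison byte pattern via memset() before calling each initializer: after sketch_init_ex_stale() the two new fields still hold the poison value, while after sketch_init_ex_fixed() they hold the defaults and sketch_payload_bytes() returns 0. Whether stale fields show up as memory corruption or as an oversized allocation depends on how the caller uses them, which this sketch does not try to model.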