Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] coll_ml_priority in openmpi-1.7.5
From: tmishima_at_[hidden]
Date: 2014-03-21 19:00:17


Thanks, I now roughly understand what coll_ml is and how you
plan to handle it.

As Ralph pointed out, I never confirmed that coll_ml was actually
used; I only assumed so because of the slowdown. I'll check
later. The slowdown might instead come from the expensive connectivity
computation performed during setup.
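
To make that guess concrete, here is a minimal sketch in Python of how a
fixed one-time setup cost would be visible only on short runs. All timings
are hypothetical values picked for illustration; nothing here was measured
with openmpi-1.7.5, and relative_overhead and SETUP_S are names invented
for this example.

```python
# Hypothetical illustration only: none of these timings come from real runs.

def relative_overhead(setup_s, compute_s):
    """Return the fraction of total wall time spent in one-time setup."""
    return setup_s / (setup_s + compute_s)


# Assumed fixed cost of collective initialization, in seconds (made up).
SETUP_S = 0.5

# Assumed compute times for small, medium, and large inputs (made up).
CASES = [("small", 2.0), ("medium", 60.0), ("large", 1800.0)]

for label, compute_s in CASES:
    share = relative_overhead(SETUP_S, compute_s)
    print(f"{label:>6}: setup is {share:.2%} of total run time")
```

Under these assumed numbers the setup share shrinks as the compute time
grows, which would match a slowdown that shows up only with small test data.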

Tetsuya

> One of the authors of ML mentioned to me off-list that he has an idea
> what might have been causing the slowdown. They're actively working on
> tweaking and making things better.
>
> I told them to ping you -- the whole point is that ml is supposed to be
> *better* than our existing collectives, so if it's not, we should fix that
> before we make ml be the default. :-)
>
>
> On Mar 21, 2014, at 9:04 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>
> >
> > On Mar 20, 2014, at 5:56 PM, tmishima_at_[hidden] wrote:
> >
> >>
> >> Hi Ralph, congratulations on releasing openmpi-1.7.5.
> >>
> >> By the way, openmpi-1.7.5rc3 slowed down our application on
> >> smaller test data sets, where the time-consuming part of our
> >> application is a so-called sparse solver. The slowdown is negligible
> >> with medium or large data sets, which are the more practical ones,
> >> so I have been deferring this problem.
> >>
> >> However, this slowdown disappears in the final version of
> >> openmpi-1.7.5. After some investigation, I found that coll_ml caused
> >> it. The final version seems to set coll_ml_priority to zero
> >> again.
> >>
> >> Could you briefly explain the advantages of coll_ml? In what kinds
> >> of situations is it effective, and so on ...
> >
> > I'm not really the one to speak about coll/ml as I wasn't involved in
> > it - Nathan would be the one to ask. It is supposed to be significantly
> > faster for most collectives, but I imagine it would depend on the
> > precise collective being used and the size of the data. We did find and
> > fix a number of problems right at the end (which is why we dropped the
> > priority until we can better test/debug it), and so we might have hit
> > something that was causing your slowdown.
> >
> >
> >>
> >> In addition, I'm not sure why coll_ml is activated in openmpi-1.7.5rc3,
> >> although its priority is lower than that of tuned, as described in the
> >> message of changeset 30790:
> >> We are initially setting the priority lower than
> >> tuned until this has had some time to soak in the trunk.
> >
> > Were you actually seeing coll/ml being used? It shouldn't have been.
> > However, coll/ml was getting called during the collective initialization
> > phase so it could set itself up, even if it wasn't being used. One part
> > of its setup is a somewhat expensive connectivity computation - one of
> > our last-minute cleanups was removal of a static 1MB array in that
> > procedure. Changing the priority to 0 completely disables the coll/ml
> > component, thus removing it from even the initialization phase. My guess
> > is that you were seeing a measurable "hit" by that procedure on your
> > small data tests, which probably ran fairly quickly - and not seeing it
> > on the other tests because the setup time was swamped by the computation
> > time.
> >
> >
> >>
> >> Tetsuya
> >>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/