Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] coll_ml_priority in openmpi-1.7.5
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-03-25 10:11:42


Yes, Nathan has a few coll/ml fixes queued up for 1.8.

On Mar 24, 2014, at 10:11 PM, tmishima_at_[hidden] wrote:

>
>
> I ran our application using the final version of openmpi-1.7.5 again
> with coll_ml_priority = 90.
>
> This time, coll/ml was actually activated, and I got the error messages
> shown below:
> [manage][[11217,1],0][coll_ml_lmngr.c:265:mca_coll_ml_lmngr_alloc] COLL-ML List manager is empty.
> [manage][[11217,1],0][coll_ml_allocation.c:47:mca_coll_ml_allocate_block] COLL-ML lmngr failed.
> [manage][[11217,1],0][coll_ml_module.c:532:ml_module_memory_initialization] COLL-ML mca_coll_ml_allocate_block exited with error.
>
> Unfortunately, coll/ml still seems to have some problems...
>
> This also means that coll/ml was not activated in my test run with
> coll_ml_priority = 27. So, as you pointed out, I guess the slowdown was
> due to the expensive connectivity computation.
>
> Tetsuya
>
>> On Mar 20, 2014, at 5:56 PM, tmishima_at_[hidden] wrote:
>>
>>>
>>> Hi Ralph, congratulations on releasing the new openmpi-1.7.5.
>>>
>>> By the way, openmpi-1.7.5rc3 has been slowing down our application
>>> with smaller test data sets, where the time-consuming part of our
>>> application is the so-called sparse solver. The slowdown is negligible
>>> with medium or large data sets (the more practical ones), so I have
>>> been deferring this problem.
>>>
>>> However, this slowdown disappears in the final version of
>>> openmpi-1.7.5. After some investigation, I found that coll_ml caused
>>> this slowdown. The final version seems to set coll_ml_priority to zero
>>> again.
>>>
>>> Could you briefly explain the advantages of coll_ml? In what kinds
>>> of situations is it effective, and so on?
>>
>> I'm not really the one to speak about coll/ml, as I wasn't involved in it;
>> Nathan would be the one to ask. It is supposed to be significantly faster
>> for most collectives, but I imagine that would depend on the precise
>> collective being used and the size of the data. We did find and fix a
>> number of problems right at the end (which is why we dropped the priority
>> until we can better test/debug it), so we might have hit something that
>> was causing your slowdown.
>>
>>
>>>
>>> In addition, I'm not sure why coll_ml was activated in openmpi-1.7.5rc3,
>>> even though its priority is lower than tuned's, as described in the
>>> message of changeset 30790:
>>> We are initially setting the priority lower than
>>> tuned until this has had some time to soak in the trunk.
>>
>> Were you actually seeing coll/ml being used? It shouldn't have been.
>> However, coll/ml was getting called during the collective initialization
>> phase so it could set itself up, even if it wasn't being used. One part
>> of its setup is a somewhat expensive connectivity computation; one of our
>> last-minute cleanups was the removal of a static 1MB array in that
>> procedure. Changing the priority to 0 completely disables the coll/ml
>> component, thus removing it from even the initialization phase. My guess
>> is that you were seeing a measurable "hit" from that procedure on your
>> small data tests, which probably ran fairly quickly, and not seeing it on
>> the other tests because the setup time was swamped by the computation
>> time.
>>
>>
>>>
>>> Tetsuya
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]