With the increasing gap between network bandwidth and processor computing power, the current trend in linear algebra is toward communication-avoiding algorithms (i.e., replacing communication with redundant computation). You're taking the exact opposite path; I wonder if you can get any benefit ...
Moreover, your proposed approach only makes sense if you expect the LAPACK operation to be faster when the other cores are silent (so that they can be used for the computation itself). This is very tricky to do for a single LAPACK call, as OMP_NUM_THREADS & co. usually affect the entire application. I remember reading somewhere that MKL provides a function to change the number of threads at runtime; maybe you should look in that direction.
On Nov 2, 2010, at 06:33, Ashley Pittman wrote:
> On 2 Nov 2010, at 10:21, Jerome Reybert wrote:
>> - in my implementation, is MPI_Bcast aware that it should use shared-memory
>> communication? Does the data go through the network? It seems to be the case,
>> considering the first results.
>> - is there any other method to group tasks by machine, with OpenMPI being aware
>> that it is grouping tasks by shared memory?
>> - is it possible to assign a policy (in this case, a shared memory policy) to
>> a Bcast or a Barrier call?
>> - do you have any better idea for this problem? :)
> Interesting stuff, two points quickly spring to mind from the above:
> MPI_Comm_split() is an expensive operation. Sure, the manual says it's low cost, but it shouldn't be used inside any critical loop, so be sure you do the MPI_Comm_split() at startup and then re-use the resulting communicator as and when needed.
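To make the "split once at startup" advice concrete, here is a minimal sketch of the colour computation for a per-node MPI_Comm_split(). The helper name node_color and the fixed-width name buffer are my own assumptions for illustration, not something from the thread; the MPI calls appear only as comments so the helper stays plain C.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not part of MPI): returns the colour to pass to
 * MPI_Comm_split() so that all ranks on the same host share a communicator.
 * `names` holds `nranks` NUL-terminated host names, each stored in a
 * fixed-width slot of `name_len` bytes, in rank order.
 * The colour is the lowest rank that reports the same host name.
 *
 * Intended use, once at startup (MPI calls shown only as comments):
 *   char name[MPI_MAX_PROCESSOR_NAME] = {0};
 *   int len;
 *   MPI_Get_processor_name(name, &len);
 *   MPI_Allgather(name,  MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
 *                 names, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, MPI_COMM_WORLD);
 *   MPI_Comm_split(MPI_COMM_WORLD,
 *                  node_color(names, MPI_MAX_PROCESSOR_NAME, size, rank),
 *                  rank, &node_comm);
 */
int node_color(const char *names, int name_len, int nranks, int my_rank)
{
    assert(names != NULL && name_len > 0);
    assert(my_rank >= 0 && my_rank < nranks);

    const char *mine = names + (size_t)my_rank * (size_t)name_len;
    for (int r = 0; r < nranks; ++r) {
        const char *other = names + (size_t)r * (size_t)name_len;
        if (strncmp(other, mine, (size_t)name_len) == 0)
            return r;   /* first (lowest) rank on the same host */
    }
    return my_rank;     /* unreachable for valid input: a rank matches itself */
}
```

The resulting node_comm would be created once and kept for the lifetime of the job. Note that MPI-3, which postdates this thread, added MPI_Comm_split_type() with MPI_COMM_TYPE_SHARED, which performs this grouping directly.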
> Any blocking call into OpenMPI will poll, consuming CPU cycles until the call is complete. You can mitigate this by telling OpenMPI to aggressively call yield whilst polling, which would mean that your parallel LAPACK function could get the CPU resources it requires. Have a look at this FAQ entry for details of the option and what you can expect it to do.
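As a rough illustration of what "yield whilst polling" means, the sketch below spins on a completion test and gives up the CPU after every unsuccessful check. It only shows the general idea using POSIX sched_yield(); it is not Open MPI's actual progress loop, and poll_with_yield and countdown_done are hypothetical names.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Hypothetical helper: polls done(ctx) until it returns non-zero.
 * After each unsuccessful check it calls sched_yield(), so other runnable
 * threads (e.g. a threaded LAPACK call) can use the core.
 * Returns the number of times the CPU was yielded.
 */
long poll_with_yield(int (*done)(void *), void *ctx)
{
    assert(done != NULL);

    long yields = 0;
    while (!done(ctx)) {
        sched_yield();
        ++yields;
    }
    return yields;
}

/*
 * Stand-in completion test for demonstration: reports "not done" until the
 * counter pointed to by ctx has been decremented to zero.
 */
int countdown_done(void *ctx)
{
    int *remaining = ctx;
    if (*remaining == 0)
        return 1;
    --*remaining;
    return 0;
}
```

Yielding only helps if another thread is runnable on that core; the polling rank still wakes up repeatedly, so this trades some latency for fairness rather than freeing the core entirely.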
> Ashley Pittman, Bath, UK.
> Padb - A parallel job inspection tool for cluster computing