Could someone, who is more familiar with the architecture of the sm BTL, comment on the technical feasibility of the following: is it possible to easily extend the BTL (i.e. without having to rewrite it completely from scratch) so as to be able to perform transfers using both KNEM (or other kernel-assisted copying mechanism) for messages over a given size and the normal user-space mechanism for smaller messages with the switch-over point being a user-tunable parameter?
From what I’ve seen, both implementations have something in common, e.g. both use FIFOs to communicate controlling information.
The motivation behind this are our efforts to become greener by extracting the best possible out of the box performance on our systems without having to profile each and every user application that runs on them. We’ve already determined that activating KNEM really benefits some collective operations on big shared-memory systems, but the increased latency significantly slows down small message transfers, which also hits the pipelined implementations.
sm’s code doesn’t seem to be very complex but still I’ve decided to ask first before diving any deeper.
Hristo Iliev, PhD – High Performance Computing Team
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)