Congratulations on your machine, this is a stunning achievement!
> Kawashima <t-kawashima_at_[hidden]> wrote:
> Also, we modified tuned COLL to implement interconnect-and-topology-
> specific bcast/allgather/alltoall/allreduce algorithm. These algorithm
> implementations also bypass PML/BML/BTL to eliminate protocol and
This seems perfectly valid to me. The current coll components use normal
MPI_Send/Recv semantics, hence the PML/BML/BTL chain, but I have always seen
the coll framework as a way to smoothly integrate "custom" collective
components for a specific interconnect. I think Mellanox also wrote a
collective component that uses their ConnectX HCAs directly.
However, modifying the "tuned" component may not be the best way to
integrate your collective work. You might consider creating a "tofu" coll
component that provides only the collectives you optimized (the coll
framework will fall back on tuned for the ones you didn't optimize).
> To achieve above, we created 'tofu COMMON', like sm
> Is there interesting one?
It may be interesting, yes. I don't know the tofu model, but if it is not
secret, contributing it is usually a good thing.
Your communication model may be similar to others, and portions of code
might be shared with other technologies (I'm thinking of IB, MX, PSM, ...).
People writing new code would also take your model into account, letting you
benefit from their work. Knowing how tofu is integrated into Open MPI may
also influence major decisions the open-source community makes.