Hi,
Pooja and I are working on a course project whose main aim is to schedule MPI
and non-MPI calls, giving the MPI calls priority over the non-MPI ones.
To keep things simple, we are making this scheduling static to some extent. By
static I mean that we know our clusters use InfiniBand for MPI (from our study of
the Open MPI source code, this specifically uses 'mca_btl_openib_send()' from
the ompi/mca/btl/openib/btl_openib.c file), so all the non-MPI communication can
be assumed to be TCP communication using 'mca_btl_tcp_send()' from the
ompi/mca/btl/tcp/btl_tcp.c file.
To implement this, we plan to use the following simple algorithm:
- before calling 'mca_btl_openib_send()', call lock0(X);
- before calling 'mca_btl_tcp_send()', call lock1(X);
Algo:
1. Allow lock0(X) -> lock0(X), meaning lock0(X) may be followed by lock0(X).
2. Allow lock1(X) -> lock1(X).
3. Do not allow lock0(X) -> lock1(X).
4. If lock1(X) -> lock0(X): since MPI calls are to have higher priority than the
non-MPI ones, the non-MPI communication should be paused and its state
saved in a queue. All other non-MPI communications newer than this one
should also be added to the same queue. The MPI process trying to
perform lock0(X) should then be allowed to complete, and only when all the MPI
communications are complete should the non-MPI communication be allowed to
resume.
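To make rules 1-4 concrete, here is a minimal sketch of the kind of gate we have
in mind, written with POSIX threads. It is only an illustration under
assumptions: prio_gate_t, gate_init(), unlock0(), try_lock1() and unlock1() are
hypothetical names of ours and are not part of Open MPI, and the gate only checks
priority at send boundaries. It does not interrupt a TCP send that is already in
flight.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical priority gate (our own names, not an Open MPI structure). */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  mpi_idle;  /* broadcast when mpi_pending drops to 0 */
    int mpi_pending;           /* MPI sends between lock0() and unlock0() */
    int tcp_active;            /* non-MPI sends between lock1() and unlock1() */
} prio_gate_t;

void gate_init(prio_gate_t *g)
{
    pthread_mutex_init(&g->mu, NULL);
    pthread_cond_init(&g->mpi_idle, NULL);
    g->mpi_pending = 0;
    g->tcp_active = 0;
}

/* MPI side: never waits, even if a non-MPI send holds the gate (rules 1, 4). */
void lock0(prio_gate_t *g)
{
    pthread_mutex_lock(&g->mu);
    g->mpi_pending++;
    pthread_mutex_unlock(&g->mu);
}

void unlock0(prio_gate_t *g)
{
    pthread_mutex_lock(&g->mu);
    assert(g->mpi_pending > 0);
    if (--g->mpi_pending == 0)
        pthread_cond_broadcast(&g->mpi_idle);  /* wake blocked non-MPI senders */
    pthread_mutex_unlock(&g->mu);
}

/* Non-MPI side, non-blocking: returns false while any MPI send is pending
 * (rule 3), so the caller can put the send on a queue instead. */
bool try_lock1(prio_gate_t *g)
{
    bool ok;
    pthread_mutex_lock(&g->mu);
    ok = (g->mpi_pending == 0);
    if (ok)
        g->tcp_active++;                       /* rule 2: several may hold it */
    pthread_mutex_unlock(&g->mu);
    return ok;
}

/* Non-MPI side, blocking variant: sleeps until all MPI sends are complete. */
void lock1(prio_gate_t *g)
{
    pthread_mutex_lock(&g->mu);
    while (g->mpi_pending > 0)
        pthread_cond_wait(&g->mpi_idle, &g->mu);
    g->tcp_active++;
    pthread_mutex_unlock(&g->mu);
}

void unlock1(prio_gate_t *g)
{
    pthread_mutex_lock(&g->mu);
    assert(g->tcp_active > 0);
    g->tcp_active--;
    pthread_mutex_unlock(&g->mu);
}
```

The idea would be to call try_lock1() before each non-MPI send and, when it
returns false, defer that send rather than issue it. Whether a send that has
already started can be paused is exactly what we are unsure about.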
Currently we are working on a simple scheduling algorithm that does not yet give
any priority to the 'MPI_Send' calls.
However, to implement the project fully, we have the following questions:
- Can we abort or pause the non-MPI (TCP) communication in any way?
- Given the assumption that the non-MPI communication is TCP, can we
make use of the built-in structures (I mean the buffer already used) in
mca_btl_tcp_send() to implement point 4 of the algorithm above? And, more
importantly, how?
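If reusing the existing structures is not possible, the fallback we are
considering for point 4 is a separate FIFO of deferred non-MPI sends, roughly as
sketched below. This is only a sketch under assumptions: all names
(deferred_queue_t, dq_push(), dq_pop(), dq_drain()) are hypothetical and ours,
and the 'void *desc' field merely stands in for whatever state would have to be
saved for a paused send. It is not the actual argument type of
mca_btl_tcp_send().

```c
#include <stddef.h>
#include <stdlib.h>

/* One deferred non-MPI send; desc is an opaque placeholder for its state. */
typedef struct deferred_send {
    void *desc;
    struct deferred_send *next;
} deferred_send_t;

/* FIFO queue, so deferred sends resume in the order they were paused. */
typedef struct {
    deferred_send_t *head;
    deferred_send_t *tail;
    size_t len;
} deferred_queue_t;

void dq_init(deferred_queue_t *q)
{
    q->head = NULL;
    q->tail = NULL;
    q->len = 0;
}

/* Append a send to the tail. Returns 0 on success, -1 if malloc fails. */
int dq_push(deferred_queue_t *q, void *desc)
{
    deferred_send_t *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->desc = desc;
    n->next = NULL;
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    q->len++;
    return 0;
}

/* Remove and return the oldest entry, or NULL if the queue is empty.
 * (A NULL desc is therefore not distinguishable from "empty".) */
void *dq_pop(deferred_queue_t *q)
{
    deferred_send_t *n = q->head;
    void *desc;
    if (n == NULL)
        return NULL;
    desc = n->desc;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;
    q->len--;
    free(n);
    return desc;
}

/* Resume every deferred send in FIFO order; returns how many were resumed.
 * Meant to be called once all MPI communication has completed. */
size_t dq_drain(deferred_queue_t *q, void (*resume)(void *desc))
{
    size_t count = 0;
    void *d;
    while ((d = dq_pop(q)) != NULL) {
        resume(d);
        count++;
    }
    return count;
}
```

This keeps the ordering required by point 4 (older paused sends resume first),
but it does not answer how to capture the state of a send that is already
partly on the wire.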
Regards,
Chaitali