Open MPI Development Mailing List Archives

From: chaitali dherange (chaitali.dherange_at_[hidden])
Date: 2007-04-15 22:25:06


Pooja and I are working on a course project whose main aim is to schedule
MPI and non-MPI calls, giving MPI calls priority over non-MPI ones.

To keep things simple, we are making this scheduling static to some extent.
By static I mean that we know our clusters use InfiniBand for MPI (from our
study of the Open MPI source code, this specifically uses
'mca_btl_openib_send()' from the ompi/mca/btl/openib/btl_openib.c file), so
all non-MPI communication can be assumed to be TCP communication using
'mca_btl_tcp_send()' from the ompi/mca/btl/tcp/btl_tcp.c file.

To implement this, we plan to use the following simple algorithm:

- Before calling 'mca_btl_openib_send()': Lock0(x);
- Before calling 'mca_btl_tcp_send()': Lock1(x);


1. Allow Lock0(x) -> Lock0(x), meaning Lock0(x) may be followed by Lock0(x).
2. Allow Lock1(x) -> Lock1(x).
3. Do not allow Lock0(x) -> Lock1(x).
4. If Lock1(x) -> Lock0(x): since MPI calls have higher priority than
non-MPI ones, the non-MPI communication should be paused in this case, and
all of its related data of course needs to be put into a queue (meaning the
status of the paused communication should be saved in a queue). All other
non-MPI communications newer than the paused one should also be added to
this same queue. The MPI process trying to perform Lock0(x) should then be
allowed to complete, and only when all the MPI communications are complete
should the non-MPI communication be allowed to proceed.
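
To make the four rules concrete, here is a minimal bookkeeping sketch in C. It only models which sends may proceed and which are queued; it does not call any Open MPI function and does not actually pause a TCP transfer. All names (sched_state, sched_begin, sched_end_mpi, sched_end_other, MAX_PENDING) are hypothetical and are not part of Open MPI, and the model is single-threaded, so a real version would presumably need a mutex around the state.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 32   /* hypothetical fixed capacity for this sketch */

typedef enum { TRAFFIC_MPI, TRAFFIC_OTHER } traffic_class;

typedef struct {
    int    mpi_active;            /* Lock0 holders: MPI sends in progress      */
    int    running[MAX_PENDING];  /* Lock1 holders: non-MPI sends in progress  */
    size_t n_running;
    int    paused[MAX_PENDING];   /* FIFO of paused or deferred non-MPI sends  */
    size_t n_paused;
} sched_state;

void sched_init(sched_state *s) {
    s->mpi_active = 0;
    s->n_running = 0;
    s->n_paused = 0;
}

/* Request to start a send. Returns 1 if it may proceed now, 0 if queued. */
int sched_begin(sched_state *s, traffic_class c, int id) {
    if (c == TRAFFIC_MPI) {
        /* Rule 4: pause every running non-MPI send, oldest first. */
        for (size_t i = 0; i < s->n_running; i++) {
            assert(s->n_paused < MAX_PENDING);
            s->paused[s->n_paused++] = s->running[i];
        }
        s->n_running = 0;
        s->mpi_active++;          /* Rule 1: MPI may always follow MPI */
        return 1;
    }
    if (s->mpi_active > 0) {
        /* Rule 3: non-MPI may not start while MPI is active; queue it. */
        assert(s->n_paused < MAX_PENDING);
        s->paused[s->n_paused++] = id;
        return 0;
    }
    /* Rule 2: non-MPI may follow non-MPI. */
    assert(s->n_running < MAX_PENDING);
    s->running[s->n_running++] = id;
    return 1;
}

/* One MPI send finished. Returns how many non-MPI sends were resumed. */
size_t sched_end_mpi(sched_state *s) {
    assert(s->mpi_active > 0);
    if (--s->mpi_active > 0)
        return 0;                 /* other MPI sends are still in progress */
    size_t resumed = s->n_paused;
    for (size_t i = 0; i < s->n_paused; i++)
        s->running[i] = s->paused[i];   /* resume in FIFO order */
    s->n_running = s->n_paused;
    s->n_paused = 0;
    return resumed;
}

/* One non-MPI send finished: drop it from the running list if present. */
void sched_end_other(sched_state *s, int id) {
    for (size_t i = 0; i < s->n_running; i++) {
        if (s->running[i] == id) {
            for (size_t j = i + 1; j < s->n_running; j++)
                s->running[j - 1] = s->running[j];
            s->n_running--;
            return;
        }
    }
}
```

In this sketch, sched_begin() would be called where Lock0(x) or Lock1(x) is taken, and sched_end_mpi() or sched_end_other() where the matching send completes. A non-MPI send that gets 0 back, or that is moved to the paused list, has to be held by the caller until sched_end_mpi() reports it as resumed; how to actually suspend the transfer is exactly the open question below.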

Currently we are working on a simple scheduling algorithm that does not give
any priority to the 'MPI_Send' calls.

However, to implement the project fully, we have the following questions:
- Can we abort or pause the non-MPI (TCP) communication in any way?
- Given the assumption that the non-MPI communication is TCP, can we make
use of the built-in structures (I mean the buffer already used) in
mca_btl_tcp_send() to implement point 4 of the algorithm above? And, more
importantly, how?