Hi,
I recently started working on an MPI-based, 'real-time',
pipelined-processing application, and the application
fails due to large time jitter in sending and receiving
messages. Here is the relevant information:
1) Platform:
a) Intel box: two hex-core Intel Xeon CPUs, 2.668 GHz
(12 cores in total),
b) OS: SUSE Linux Enterprise Server 11 (x86_64),
c) MPI Rev: (OpenRTE) 1.4 (OFED package installed),
d) HCA: InfiniBand: Mellanox Technologies MT26428
[ConnectX IB QDR, PCIe 2.0 5GT/s] (rev a0)
2) Application detail
a) The job launches 7 processes for pipelined processing.
Each process waits for a message (sizes vary between
1 KB and 26 KB), processes the data, and then outputs a
message (again between 1 KB and 26 KB) to the next
process.
b) MPI transport functions used: MPI_Isend, MPI_Irecv,
MPI_Test.
i) For receiving messages, I first make an MPI_Irecv
call, followed by a busy-loop on MPI_Test, waiting for
the message.
ii) For sending messages, there is a busy-loop on
MPI_Test to ensure the prior buffer was sent, followed
by an MPI_Isend call.
c) When the job starts, all 7 processes are put in
high-priority mode (SCHED_FIFO policy, with a priority
setting of 99).
The job takes an input data packet stream (and a
corresponding series of MPI messages) arriving
continually, one packet every 40 microseconds, for a few
minutes.
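For reference, the priority setup in (c) amounts to
something like the sketch below. This is a minimal
illustration, not the application's actual code: the
helper names clamp_fifo_priority and set_fifo_priority
are made up for this post, and the call only succeeds
with sufficient privileges (root or CAP_SYS_NICE on
Linux).

```c
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: keep the requested priority inside the
 * range the OS allows for SCHED_FIFO. */
static int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    if (prio < lo) return lo;
    if (prio > hi) return hi;
    return prio;
}

/* Hypothetical helper: switch the calling process to SCHED_FIFO.
 * Returns 0 on success, or the errno value on failure
 * (typically EPERM when run without privileges). */
static int set_fifo_priority(int prio)
{
    struct sched_param sp;
    sp.sched_priority = clamp_fifo_priority(prio);
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return errno;
    return 0;
}
```

Each of the 7 processes would call set_fifo_priority(99)
once at start-up and check the return value.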
3) The Problem:
Most calls to MPI_Test (which is non-blocking) take a
few microseconds, but for around 10% of the job the calls
show large jitter, varying from 1 to 100-odd
milliseconds. At one input packet every 40 microseconds,
a 100 ms stall corresponds to about 2,500 packets of
backlog, so some of the application input queues fill up
and cause a failure.
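In case anyone wants to reproduce the measurement, here
is a minimal sketch of how the per-call latency of a
polling loop could be recorded. It is an assumption-laden
illustration, not my application code: poll_fn,
poll_stats, timed_poll and countdown_poll are
hypothetical names, and countdown_poll is only a stand-in
for a callback that would wrap MPI_Test.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: returns nonzero once the
 * operation being polled has completed. */
typedef int (*poll_fn)(void *ctx);

typedef struct {
    uint64_t calls;      /* number of timed calls            */
    uint64_t max_ns;     /* slowest single call, in ns       */
    uint64_t over_limit; /* calls that took longer than limit */
    uint64_t limit_ns;   /* threshold for "slow", in ns      */
} poll_stats;

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Busy-poll until the callback reports completion,
 * timing every individual call. */
static void timed_poll(poll_fn poll, void *ctx, poll_stats *st)
{
    int done = 0;
    while (!done) {
        uint64_t t0 = now_ns();
        done = poll(ctx);
        uint64_t dt = now_ns() - t0;
        st->calls++;
        if (dt > st->max_ns) st->max_ns = dt;
        if (dt > st->limit_ns) st->over_limit++;
    }
}

/* Stand-in callback: completes once the counter reaches zero. */
static int countdown_poll(void *ctx)
{
    int *remaining = ctx;
    return --(*remaining) <= 0;
}
```

In the real application the callback would call
MPI_Test(&request, &flag, &status) and return flag. With
limit_ns set to 1000000, over_limit then counts the calls
that took longer than 1 ms.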
Any suggestions on MPI settings or OS configuration
issues to look at would be much appreciated.