
Subject: Re: [OMPI users] gathering problem
From: Hodge, Gary C (gary.c.hodge_at_[hidden])
Date: 2012-11-09 12:36:43


Answering my own question: I have downloaded Open MPI 1.6.2 and still see the delay in MPI_Send. Previously, I was using Open MPI 1.4.1.

I configured 1.6.2 with --enable-peruse and have implemented PERUSE callbacks.
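For reference, the callback registration boils down to roughly the following (a minimal sketch against the peruse.h that an --enable-peruse build installs; my real callbacks also log the timestamps shown in the trace below, and the other events are hooked the same way):

#include <stdio.h>
#include <mpi.h>
#include <peruse.h>

/* Fired each time a request on MPI_COMM_WORLD begins its data transfer;
 * spec->count and spec->peer identify the message. */
static int xfer_begin_cb(peruse_event_h event_h, MPI_Aint unique_id,
                         peruse_comm_spec_t *spec, void *param)
{
    int rank = *(int *)param;
    printf("PROC%d: %.10f; REQ_XFER_BEGIN, count=%d, peer=%d\n",
           rank, MPI_Wtime(), spec->count, spec->peer);
    return MPI_SUCCESS;
}

/* Call after MPI_Init; registers and activates one event on MPI_COMM_WORLD. */
static void install_peruse(int *rank)
{
    int event;
    peruse_event_h eh;

    PERUSE_Init();
    PERUSE_Query_event("PERUSE_COMM_REQ_XFER_BEGIN", &event);
    PERUSE_Event_comm_register(event, MPI_COMM_WORLD, xfer_begin_cb, rank, &eh);
    PERUSE_Event_activate(eh);
}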
Here is a trace of gb2 (PROC39) sending messages to ob (PROC28):

PROC39: 59.2268409810; SEND_REQ_ACTIVATE, count=81536, peer=28
PROC39: 59.2268649811; SEND_REQ_XFER_BEGIN, count=81536, peer=28
PROC39: 59.2270019811; SEND_REQ_XFER_CONTINUE, count=27178, peer=28
PROC39: 59.2270509811; SEND_REQ_XFER_CONTINUE, count=54358, peer=28
PROC39: 59.2270809811; SEND_REQ_XFER_END, count=81536, peer=28
PROC39: 59.2270869811; SEND_REQ_COMPLETE, count=81536, peer=28
PROC39: 59.2270939811; SEND_REQ_NOTIFY, count=81536, peer=28
PROC28: 59.2271299611; RECV_MSG_ARRIVED, count=0, peer=39
PROC28: 59.2271389612; RECV_SEARCH_POSTED_Q_BEGIN, count=0, peer=39
PROC28: 59.2271439611; RECV_REQ_REMOVE_FROM_POSTED_Q, count=172084, peer=39
PROC28: 59.2271479612; RECV_MSG_MATCH_POSTED_REQ, count=172084, peer=39
PROC28: 59.2271519612; RECV_SEARCH_POSTED_Q_END, count=0, peer=39
PROC28: 59.2271559612; RECV_REQ_XFER_BEGIN, count=172084, peer=39
PROC28: 59.2271609611; RECV_REQ_XFER_CONTINUE, count=27178, peer=39
PROC28: 59.2271759611; RECV_REQ_XFER_CONTINUE, count=54358, peer=39
PROC39: 59.2283769811; SEND_REQ_ACTIVATE, count=81536, peer=28
PROC39: 59.2283959811; SEND_REQ_XFER_BEGIN, count=81536, peer=28
PROC39: 59.4331379811; SEND_REQ_XFER_CONTINUE, count=27178, peer=28
PROC28: 59.4331819612; RECV_REQ_XFER_END, count=172084, peer=39
PROC28: 59.4332009612; RECV_REQ_COMPLETE, count=172084, peer=39
PROC39: 59.4332049811; SEND_REQ_XFER_CONTINUE, count=54358, peer=28
PROC28: 59.4332059611; RECV_MSG_ARRIVED, count=0, peer=39
PROC28: 59.4332109612; RECV_SEARCH_POSTED_Q_BEGIN, count=0, peer=39
PROC28: 59.4332149612; RECV_MSG_INSERT_IN_UNEX_Q, count=0, peer=39
PROC28: 59.4332199611; RECV_SEARCH_POSTED_Q_END, count=0, peer=39
PROC28: 59.4332269612; RECV_REQ_NOTIFY, count=172084, peer=39
PROC28: 59.4332539611; RECV_REQ_ACTIVATE, count=172084, peer=39
PROC39: 59.4332569811; SEND_REQ_XFER_END, count=81536, peer=28
PROC28: 59.4332589612; RECV_SEARCH_UNEX_Q_BEGIN, count=172084, peer=39
PROC28: 59.4332629612; RECV_REQ_MATCH_UNEX, count=172084, peer=39
PROC39: 59.4332659811; SEND_REQ_COMPLETE, count=81536, peer=28
PROC28: 59.4332669611; RECV_MSG_REMOVE_FROM_UNEX_Q, count=0, peer=39
PROC28: 59.4332719612; RECV_SEARCH_UNEX_Q_END, count=172084, peer=39
PROC28: 59.4332759612; RECV_REQ_XFER_BEGIN, count=172084, peer=39
PROC39: 59.4332769811; SEND_REQ_NOTIFY, count=81536, peer=28
PROC28: 59.4332809611; RECV_REQ_XFER_CONTINUE, count=27178, peer=39
PROC39: 59.4332849811; ERROR: component gb2 exceeded send time limit by 0.104915 seconds.
PROC39: 59.4332849811; WARNING: component gb2 increased page faults R=130898,0; P=130898,0; S=132207,0

Notice the roughly 205 ms delay between the second SEND_REQ_XFER_BEGIN (at 59.2284) and the subsequent SEND_REQ_XFER_CONTINUE (at 59.4331).
Also notice that there was no such delay in the previous send request.

The last two lines are my own debug output, which monitors excessive time spent in MPI_Send and increases in the number of page faults.
After the delayed MPI_Send, the page fault count has increased by 1309 (132207 - 130898)!
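For what it's worth, the check behind those two lines is essentially the following (a simplified sketch using getrusage; the R/P/S bookkeeping in my real output is a bit more involved):

#include <stdio.h>
#include <sys/resource.h>
#include <mpi.h>

/* Wrap MPI_Send and report when the call blows past a time budget or
 * triggers new page faults (minor faults here usually mean memory
 * registration or copy-in touched unmapped pages). */
static int timed_send(void *buf, int count, MPI_Datatype type, int dest,
                      int tag, MPI_Comm comm, double limit_sec)
{
    struct rusage before, after;
    double t0, elapsed;
    int rc;

    getrusage(RUSAGE_SELF, &before);
    t0 = MPI_Wtime();

    rc = MPI_Send(buf, count, type, dest, tag, comm);

    elapsed = MPI_Wtime() - t0;
    getrusage(RUSAGE_SELF, &after);

    if (elapsed > limit_sec)
        fprintf(stderr, "ERROR: send exceeded time limit by %f seconds.\n",
                elapsed - limit_sec);
    if (after.ru_minflt > before.ru_minflt || after.ru_majflt > before.ru_majflt)
        fprintf(stderr, "WARNING: send added %ld minor / %ld major page faults\n",
                after.ru_minflt - before.ru_minflt,
                after.ru_majflt - before.ru_majflt);
    return rc;
}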

I looked at the PML code (pml_ob1_sendreq.c) and I suspect that something is going awry in the mca_bml_base_prepare_src function.
I believe that this is the function in which the openib BTL makes calls to register the user memory.

This delay in the MPI_Send kills any hope of meeting our real-time requirements, so any help is greatly appreciated.

From: Hodge, Gary C
Sent: Monday, November 05, 2012 12:27 PM
To: users_at_[hidden]
Subject: gathering problem

I continue to have a problem where two processes are sending to the same process and one of the sending processes hangs for 150 to 550 ms in the call to MPI_Send.

Each process runs on a different node and the receiving process has posted an MPI_Irecv 17 ms before the hanging send.
The posted receives are for 172K buffers and the sending processes are sending 81K messages.
I have set mpi_leave_pinned to 1 and have increased the btl_openib_receive_queues to ...:S,65536,512,256,64
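In outline, the pattern looks like this (a stripped-down sketch; the ranks, tag, and the exact buffer handling are placeholders for the real component topology):

#include <mpi.h>

/* Two sender ranks each MPI_Send an ~81K message to one gathering rank
 * that has already posted a larger ~172K MPI_Irecv, as described above. */
#define RECV_BYTES 172084
#define SEND_BYTES 81536

static void gather_step(int rank, char *sendbuf, char *recvbuf)
{
    if (rank == 0) {                      /* the receiving (gathering) process */
        MPI_Request req;
        MPI_Status  status;
        MPI_Irecv(recvbuf, RECV_BYTES, MPI_BYTE, MPI_ANY_SOURCE, 0,
                  MPI_COMM_WORLD, &req);
        /* ... roughly 17 ms of other work happens here ... */
        MPI_Wait(&req, &status);
    } else {                              /* one of the sending processes */
        MPI_Send(sendbuf, SEND_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
    }
}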

How do I trace the various phases of message passing to diagnose where the send is hanging up?