Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Why might MPI_Recv trip PSM_MQ_RECVREQS_MAX ?
From: Rainer Keller (keller_at_[hidden])
Date: 2010-03-08 09:22:10


Hello Jonathan,
You are using InfiniPath's PSM library and the corresponding MTL/psm, and
therefore the corresponding upper-layer PML/cm.
With this stack, MPI_Recv _is_ calling into PSM's irecv() function, which
explains why the error is triggered inside the PSM library.

Without knowing more about the degree of parallelism of your application, I
can suggest two things: increase the maximum number of receive requests using
the environment variable (PSM_MQ_RECVREQS_MAX), or change some of the master's
sends to synchronous MPI_Ssend() calls.

On the other hand, the example code you posted could be written differently,
e.g. by collecting multiple random numbers into one communication, or by using
collective communication, here with sub-communicators containing the master
and the sources, and the master and the targets. Either approach would reduce
pressure on the master.

Hope this helps.

Best regards,
Rainer

On Sunday 07 March 2010 04:17:33 pm Jonathan Wesley Stone wrote:
> Hi,
>
> My supercomputer has OpenMPI 1.4. I am running into a frustrating
> problem with my MPI program. I am using only the following calls,
> which I expect to be blocking:
> MPI_Wtime
> MPI_Error_string
> MPI_Abort
> MPI_Send
> MPI_Get_count
> MPI_Recv
> MPI_Probe
> MPI_Init
> MPI_Comm_rank
> MPI_Comm_size
> MPI_Finalize
>
> Somehow I am getting this error when I do a large number of sequential
> communications: "c002:2.0.Exhausted 1048576 MQ irecv request
> descriptors, which usually indicates a user program error or
> insufficient request descriptors (PSM_MQ_RECVREQS_MAX=1048576)"
>
> This seems counter-intuitive to me because I don't think I should be
> using irecvs since I am wanting specifically to rely on the documented
> blocking behavior of MPI_Recv (not MPI_Irecv, which I am not using).
>
> My main program is quite large, however I have managed to replicate
> the irritating behavior in this much smaller program, which executes a
> number of MPI_Send or MPI_Recv calls in a loop. The program's default
> behaviour is to run 2,000,000 iterations. When I turn it up to
> 20,000,000, after a short time it generates the PSM_MQ_RECVREQS_MAX
> exception.
>
> I would appreciate if anyone could advise why it might be happening in
> this "test" case -- basically what is going on that causes my
> presumably blocking MPI_Recv calls to "accumulate" such a large number
> of "irecv request descriptors", when I expect they should be blocking
> and get immediately resolved and the count should go down when the
> matching MPI_Send is posted.
>
> I appreciate your assistance. Thank you!
>
> Jonathan Stone
> Research Assistant, U. Oklahoma
>

-- 
------------------------------------------------------------------------
Rainer Keller, PhD                  Tel: +1 (865) 241-6293
Oak Ridge National Lab          Fax: +1 (865) 241-4811
PO Box 2008 MS 6164           Email: keller_at_[hidden]
Oak Ridge, TN 37831-2008    AIM/Skype: rusraink