>> Are you overwhelming the receiver with short, unexpected messages such that MPI keeps mallocing and mallocing and mallocing in an attempt to eagerly receive all the messages? I ask because Open MPI only eagerly sends short messages -- long messages are queued up at the sender and not actually transferred until the receiver starts to receive (aka a "rendezvous protocol").
This is probably what is happening. In general, my processes send a massive number of short messages, which overwhelms the receivers. Because some stages of the computation (processes) are much slower than others, the slower ones cannot handle incoming messages at the rate they are delivered.
>> Are you sure that you don't have some other kind of memory error in your application?
I have checked, and there are no memory problems within the application.
>> FWIW, you can use MPI_SSEND to do a "synchronous" send, which means that it won't complete until the receiver has started to receive the message. This may slow your sender down dramatically, however. If it slows down your sender too much, you may have to implement your own flow control.
MPI_SSEND worked for my application and the problem went away, but as you said, it slows the senders down. A better solution was to implement my own flow control, as suggested. I implemented a simple credit-based flow control scheme and it solved my problem.
Thanks a lot for the explanation and suggestions.
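For readers wondering what a credit-based scheme might look like, here is a minimal sketch of the bookkeeping only. It is not the code used in this application, which was not posted. It leaves MPI out entirely so it can run anywhere, and every name in it (WINDOW, credit_sender, credit_receiver, simulate, and so on) is hypothetical. The assumption is that each sender starts with a fixed number of credits, spends one per message, and gets one back for each message the receiver has finished processing. In an MPI program the payload would presumably travel via MPI_Send and the credits would come back as small messages that the sender waits for with MPI_Recv.

```c
#include <assert.h>

/* All names here are hypothetical; none come from Open MPI or the thread. */
enum { WINDOW = 4 };   /* assumed cap on unprocessed messages per sender */

typedef struct {
    int credits;       /* sends the sender may still issue without waiting */
} credit_sender;

typedef struct {
    int queued;        /* messages delivered but not yet processed */
    int owed;          /* processed messages whose credits are not yet returned */
} credit_receiver;

/* Sender side: spend one credit per message, or refuse when none are left.
 * In an MPI program the delivery would be an MPI_Send of the payload, and a
 * refusal would mean blocking in MPI_Recv until a credit message arrives. */
int sender_try_send(credit_sender *s, credit_receiver *r)
{
    if (s->credits == 0)
        return 0;                  /* out of credits: the sender must wait */
    s->credits--;
    r->queued++;                   /* stands in for the message arriving */
    assert(r->queued <= WINDOW);   /* the bound the scheme is meant to keep */
    return 1;
}

/* Receiver side: consume one queued message and note that a credit is owed. */
int receiver_process_one(credit_receiver *r)
{
    if (r->queued == 0)
        return 0;
    r->queued--;
    r->owed++;
    return 1;
}

/* Receiver side: hand all owed credits back (a small reply message in MPI). */
void receiver_return_credits(credit_receiver *r, credit_sender *s)
{
    s->credits += r->owed;
    r->owed = 0;
}

/* Drive a sender against a receiver for `total` messages, with per-tick
 * rates, and report the deepest receive queue that was ever observed. */
int simulate(int total, int send_rate, int recv_rate)
{
    credit_sender s = { WINDOW };
    credit_receiver r = { 0, 0 };
    int sent = 0, processed = 0, peak = 0;

    assert(send_rate > 0 && recv_rate > 0);   /* otherwise no progress */

    while (processed < total) {
        /* The sender pushes as fast as it can until its credits run out. */
        for (int i = 0; i < send_rate && sent < total; i++) {
            if (!sender_try_send(&s, &r))
                break;
            sent++;
        }
        if (r.queued > peak)
            peak = r.queued;
        /* The receiver drains at its own pace, then returns credits. */
        for (int i = 0; i < recv_rate; i++)
            processed += receiver_process_one(&r);
        receiver_return_credits(&r, &s);
    }
    return peak;
}
```

Calling simulate(1000, 10, 1) models a sender ten times faster than its receiver. Credits, queued messages, and owed credits always sum to WINDOW, so the receive queue never grows past WINDOW however many messages are sent. A larger WINDOW trades receiver memory for fewer stalls on the sender side.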
On Tue, Sep 6, 2011 at 9:43 AM, Jeff Squyres
<jsquyres@cisco.com> wrote:
Are you overwhelming the receiver with short, unexpected messages such that MPI keeps mallocing and mallocing and mallocing in an attempt to eagerly receive all the messages? I ask because Open MPI only eagerly sends short messages -- long messages are queued up at the sender and not actually transferred until the receiver starts to receive (aka a "rendezvous protocol").
While that *can* happen, I'd be a little surprised if it did. Indeed, it would probably take a little while for that to happen (i.e., the time necessary for the receiver to malloc a small amount N times, where N is large enough to exhaust the virtual memory on your machine, coupled with all the time delay to page out all the old memory and page in on-demand as Open MPI scans for new incoming matches... this could be pretty darn slow). Is that what is happening?
Are you sure that you don't have some other kind of memory error in your application?
FWIW, you can use MPI_SSEND to do a "synchronous" send, which means that it won't complete until the receiver has started to receive the message. This may slow your sender down dramatically, however. If it slows down your sender too much, you may have to implement your own flow control.
On Aug 25, 2011, at 10:58 PM, Rodrigo Oliveira wrote:
> Hi there,
>
> I am facing some problems in an Open MPI application. Part of the application is composed of a sender and a receiver. The problem is that the sender is much faster than the receiver, which causes the receiver's memory to be completely used up, aborting the application.
>
> I would like to know whether there is a flow control scheme implemented in Open MPI or whether this issue has to be handled at the user application layer. If one exists, how does it work and how can I use it in my application?
>
> I did some research on this subject, but I did not find a conclusive explanation.
>
> Thanks a lot.
> _______________________________________________
> users mailing list
> users@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
jsquyres@cisco.com