Eugene Loh wrote:
> ompi_info -a | grep eager
> depends on the BTL. E.g., sm=4K but tcp is 64K. self is 128K.
>>> On the other hand, I assume the memory imbalance we're talking about
>>> is rather severe. Much more than 2500 bytes to be noticeable, I
>>> would think. Is that really the situation you're imagining?
>> The memory imbalance is drastic. I'm expecting 2 GB of memory use per
>> process. The well-behaved processes (13/16) use the expected amount of
>> memory; the remaining, misbehaving processes (3/16) use more than twice
>> as much memory. The specifics vary from run to run, of course. So, yes,
>> there are gigabytes of unexpected memory use to track down.
> Umm, how big of a message imbalance do you think you might have? (The
> inflection in my voice doesn't come out well in e-mail.) Anyhow, that
> sounds like, um, "lots" of 2500-byte messages.
The message imbalance could be very large. Each process is running
pretty close to its memory capacity. If a backlog of messages causes a
buffer to grow to the point where the process starts swapping, it will
very quickly fall very far behind. In total, some billion 25-byte
operations are sent, or tens of millions of MPI_Send messages
(at 100 operations, i.e., 2500 bytes, per MPI_Send).