On 4/7/08 7:15 AM, "Gleb Natapov" <glebn_at_[hidden]> wrote:
> On Mon, Apr 07, 2008 at 07:07:38AM -0600, Ralph H Castain wrote:
>> On 4/7/08 7:04 AM, "Gleb Natapov" <glebn_at_[hidden]> wrote:
>>> On Fri, Apr 04, 2008 at 10:52:38AM -0600, Ralph H Castain wrote:
>>>> With compression "on", you will get output telling you the original size of
>>>> the message and its compressed size so you can see what was done.
>>> I see this output:
>>> uncompressed allgather msg orig size 67521 compressed size 4162.
>>> What is "allgather msg"?
>> It is the modex message - it is "shared" across all the procs via an
>> allgather procedure.
> If I divide the allgather msg size by the number of processes, I should get
> the modex size of one process. Is this correct?
Pretty much - there is some slight overhead added so orte knows what to do
with the message, but that is only a few bytes.
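
As a rough illustration of that arithmetic, here is a minimal Python sketch.
The two sizes are taken from the output quoted above; the process count of 64
is purely hypothetical (the job size is not stated in this thread), and the
function names are made up for the example. The estimate ignores the few
bytes of overhead and assumes every proc contributes about the same amount.

```python
def per_proc_modex_estimate(allgather_bytes: int, num_procs: int) -> float:
    """Approximate modex size of one process: total uncompressed size / procs."""
    if num_procs <= 0:
        raise ValueError("num_procs must be positive")
    return allgather_bytes / num_procs


def compression_ratio(orig_bytes: int, compressed_bytes: int) -> float:
    """How many times smaller the compressed message is than the original."""
    if compressed_bytes <= 0:
        raise ValueError("compressed_bytes must be positive")
    return orig_bytes / compressed_bytes


if __name__ == "__main__":
    orig_size, compressed_size = 67521, 4162  # from the quoted output
    num_procs = 64  # hypothetical job size, not given in the thread

    print("approx. modex bytes per proc:",
          round(per_proc_modex_estimate(orig_size, num_procs), 1))
    print("compression ratio:",
          round(compression_ratio(orig_size, compressed_size), 2))
```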
> Also can you explain how
> allgather is implemented in orte (sorry if you already explained this once
> and I missed it).
The default method is for each proc to send its modex data to its local
daemon. The local daemon collects the messages until all of its local procs
have contributed, then sends the collected data to the rank=0 application
proc. Once rank=0 has received a message from every daemon, it xcasts the
collected result to all procs in its job.
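
The three steps above can be sketched as a small Python simulation. This is
only a model of the message flow, not ORTE code: the function name, the
dict-based "messages", and the daemon ids are all assumptions made for the
illustration.

```python
from collections import defaultdict


def default_allgather(proc_to_daemon, modex):
    """Simulate the default flow: procs -> local daemon -> rank 0 -> xcast.

    proc_to_daemon: maps each proc rank to the id of its local daemon.
    modex: maps each proc rank to that proc's modex bytes.
    Returns a dict mapping each rank to the full collected result it receives.
    """
    # Step 1: each proc sends its modex data to its local daemon.
    per_daemon = defaultdict(dict)
    for rank, daemon in proc_to_daemon.items():
        per_daemon[daemon][rank] = modex[rank]

    # Step 2: once a daemon has heard from all of its local procs, it sends
    # the collected data to the rank=0 application proc.
    at_rank0 = {}
    for collected in per_daemon.values():
        at_rank0.update(collected)

    # Step 3: rank=0 has a message from every daemon, so it xcasts the
    # collected result to all procs in the job.
    return {rank: dict(at_rank0) for rank in proc_to_daemon}


if __name__ == "__main__":
    # Hypothetical layout: four procs spread over two daemons.
    layout = {0: "d0", 1: "d0", 2: "d1", 3: "d1"}
    data = {rank: f"modex-{rank}".encode() for rank in layout}
    result = default_allgather(layout, data)
    print(sorted(result[3]))  # ranks whose modex data proc 3 now holds
```

Note that in this scheme rank=0 receives one message per daemon, which is the
scaling concern with sending everything to a single proc.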
I am currently working on a more scalable version of this that has the
daemons do a tree-like gather instead of just sending everything to rank=0.
Probably about a week from completion.
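
To give a feel for why a tree-like gather scales better, here is a hedged
Python sketch. The actual tree shape used in the in-progress work is not
described in this thread; the binary tree below (daemon i forwards to daemon
(i - 1) // 2) is an assumption chosen only for illustration, as are the
function and variable names.

```python
def tree_gather(daemon_data):
    """Gather per-daemon dicts up a binary tree rooted at daemon 0.

    daemon_data: list where index i holds the dict collected by daemon i.
    Returns (data at the root, number of messages each daemon received).
    """
    n = len(daemon_data)
    buffers = [dict(d) for d in daemon_data]
    received = [0] * n

    # Walk from the highest-numbered daemon down to 1, so every daemon has
    # already merged its children's data before forwarding to its parent.
    for i in range(n - 1, 0, -1):
        parent = (i - 1) // 2  # assumed binary-tree layout
        buffers[parent].update(buffers[i])
        received[parent] += 1

    return buffers[0], received


if __name__ == "__main__":
    # Hypothetical job: seven daemons, each holding one proc's modex data.
    per_daemon = [{i: f"modex-{i}".encode()} for i in range(7)]
    root_data, received = tree_gather(per_daemon)
    print("ranks gathered at root:", sorted(root_data))
    print("messages received per daemon:", received)
```

In this model no daemon receives more than two messages, whereas in the
default scheme a single proc receives one message from every daemon.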