Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] --bynode vs --byslot
From: Cally K (kalpana0611_at_[hidden])
Date: 2008-06-04 20:39:33


Thanks, that was actually a lot of help. I had very little understanding of
the --bynode and --byslot options.

On 6/5/08, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>
> On May 23, 2008, at 9:07 PM, Cally K wrote:
>
> > Hi, I have a question about --bynode and --byslot that I would like
> > to clarify.
> >
> > Say, for example, I have a hostfile
> >
> > #Hostfile
> >
> > __________________________
> > node0
> > node1 slots=2 max_slots=2
> > node2 slots=2 max_slots=2
> > node3 slots=4 max_slots=4
> > ___________________________
> >
> > There are 4 nodes and 9 slots. How should I run mpirun? For now I use
> >
> > a) mpirun -np --bynode 4 ./abcd
>
> I assume you mean "... -np 4 --bynode ..."
>
> > I know that slots are meant for SMPs, and I have tried running
> > mpirun -np --byslot 9 ./abcd
> >
> > and I noticed that it takes longer with --byslot than with
> > --bynode
>
> According to your text, you're running 9 processes when using --byslot
> and 4 when using --bynode. Is that a typo? I'll assume that it is --
> that you meant to use 9 in both cases.
>
> > and I just read the FAQ, which says that the --byslot option is used
> > by default, so I don't have to specify it, right?
>
> I'm not sure what your question is. The actual performance may depend
> on your application and what its communication and computation
> patterns are. It gets more difficult to model when you have a
> heterogeneous setup (like it looks like you have, per your hostfile).
>
> Let's take your example of 9 processes.
>
> - With --bynode, the MPI_COMM_WORLD ranks will be laid out as follows
> (MCWR = "MPI_COMM_WORLD rank"):
>
> node0: MCWR 0
> node1: MCWR 1, MCWR 4
> node2: MCWR 2, MCWR 5
> node3: MCWR 3, MCWR 6, MCWR 7, MCWR 8
>
> - With --byslot, it'll look like this:
>
> node0: MCWR 0
> node1: MCWR 1, MCWR 2
> node2: MCWR 3, MCWR 4
> node3: MCWR 5, MCWR 6, MCWR 7, MCWR 8
>
> In short, OMPI is doing round-robin placement of your processes; the
> only difference is in which dimension is traversed first: by node or
> by slot.
>
> As to why there's such a performance difference, it could depend on a
> lot of things: the difference in computational speed and/or RAM on
> your 4 nodes, the changing communication patterns between the two
> (shared memory is usually used for on-node communication, which is
> usually faster than most networks), etc. It really depends on what
> your application is *doing*.
>
> Sorry I can't be of more help...
>
> --
> Jeff Squyres
> Cisco Systems
>
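For anyone who wants to check the two layouts quoted above, here is a minimal Python sketch of the round-robin placement Jeff describes. It is only a model, not Open MPI's actual mapper: the function name place_ranks and the (node, slots) list are made up for illustration, node0 is assumed to have 1 slot (which gives the 9 slots mentioned in the question), and oversubscription is simply rejected.

```python
def place_ranks(slots, nprocs, policy):
    """Toy model of rank placement; NOT Open MPI's real mapping code.

    slots  -- list of (node_name, slot_count) in hostfile order
    nprocs -- number of MPI processes to place
    policy -- "byslot" or "bynode"
    Returns a dict mapping node_name -> list of MPI_COMM_WORLD ranks.
    """
    if policy not in ("byslot", "bynode"):
        raise ValueError("policy must be 'byslot' or 'bynode'")
    # Assumption: this model refuses to oversubscribe.
    if nprocs > sum(count for _, count in slots):
        raise ValueError("more processes than available slots")

    layout = {node: [] for node, _ in slots}
    rank = 0

    if policy == "byslot":
        # Fill every slot on a node before moving to the next node.
        for node, count in slots:
            for _ in range(count):
                if rank == nprocs:
                    return layout
                layout[node].append(rank)
                rank += 1
        return layout

    # bynode: hand out one rank per node per pass, skipping full nodes.
    while rank < nprocs:
        for node, count in slots:
            if rank == nprocs:
                break
            if len(layout[node]) < count:
                layout[node].append(rank)
                rank += 1
    return layout


# Hostfile from the question; node0 is assumed to default to 1 slot.
HOSTFILE = [("node0", 1), ("node1", 2), ("node2", 2), ("node3", 4)]

if __name__ == "__main__":
    for policy in ("bynode", "byslot"):
        print("--" + policy)
        for node, ranks in place_ranks(HOSTFILE, 9, policy).items():
            print("  %s: %s" % (node, ", ".join("MCWR %d" % r for r in ranks)))
```

With the corrected argument order Jeff suggests, and 9 processes in both cases, the two runs being compared would presumably be "mpirun -np 9 --bynode ./abcd" and "mpirun -np 9 --byslot ./abcd". The sketch only shows where the ranks land; it says nothing about which layout will be faster for a given application.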