On Nov 10, 2011, at 11:30 AM, Mudassar Majeed wrote:
> For example, suppose there are 10 nodes, each with 20 cores, for 200 cores in total, and let's say there are 2000 MPI processes. We start the application with 10 MPI processes on each core.
Is this just to be able to simulate very large MPI jobs, or are you thinking that people will actually run that way (heavily over-subscribing cores)?
> Let's say Comm(Pi, Pj) denotes how much Pi and Pj communicate with each other, and that each process Pi has to communicate with a few other processes Pj, Pk, Pl, Pm, ..., Pz. Secondly, let's say Load(Pi) denotes the computational load of process Pi.
Depending on how you define Load(Pi), this really only matters if you're over-subscribing processors. Meaning: if you have only one MPI process per processor core, then Load(Pi) is probably irrelevant (excluding other effects, like cache thrashing, memory and PCI bandwidth usage, etc.).
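To make the two quantities concrete, here is a minimal Python sketch that evaluates a given placement: it sums Load(Pi) per core and adds up the Comm(Pi, Pj) volume that crosses node boundaries. The function name `evaluate_mapping`, the `core_of` list, and the assumption that cores are numbered contiguously within each node are illustrative choices, not anything taken from the thread or from Open MPI.

```python
from collections import defaultdict

def evaluate_mapping(load, comm, core_of, cores_per_node):
    """Return (per-core load, inter-node communication volume) for a mapping.

    load[i]      -- computational load of process i, i.e. Load(Pi)
    comm[(i, j)] -- communication volume between i and j, i.e. Comm(Pi, Pj)
    core_of[i]   -- global core index that process i is placed on
    Assumption: cores are numbered contiguously, node by node.
    """
    core_load = defaultdict(float)
    for proc, core in enumerate(core_of):
        core_load[core] += load[proc]

    inter_node = 0.0
    for (i, j), volume in comm.items():
        # Two processes share a node when their cores fall in the same block.
        if core_of[i] // cores_per_node != core_of[j] // cores_per_node:
            inter_node += volume
    return dict(core_load), inter_node

# Toy data: 4 processes, 2 nodes with 2 cores each (cores 0-1 and 2-3).
load = [3.0, 1.0, 2.0, 2.0]
comm = {(0, 1): 10.0, (0, 2): 1.0, (2, 3): 5.0}

core_load, inter_node = evaluate_mapping(load, comm, [0, 1, 2, 3], cores_per_node=2)
_, inter_node_bad = evaluate_mapping(load, comm, [0, 2, 1, 3], cores_per_node=2)
```

With one process per core, the per-core load is simply each process's own load, which is why the per-core balance only becomes a real placement question once cores are shared.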
> Now, we know that sending a message between two nodes is more expensive than sending a message within a node (where the two communicating processes reside on cores of the same node). This is true at least in the supercomputing centers that I use. In my previous work I considered only Load[ ] and not Comm[ ]. In that work, all the MPI processes calculate their new ranks and then call MPI_Comm_split with key = new_rank and color = 0. Every process thus gets its new rank, and then the actual data is provided to each process for computation. We found that the total execution time decreases.
In an oversubscribed case, I'm still not sure how this works. Do you have some MPI processes doing work and some not? (e.g., blocking in sleep() or something)
I think the reason for my confusion is that MPI processes are generally designed to run 1 per core (or perhaps 1 MPI process per more-than-1-core, if the MPI process is multi-threaded). MPI processes are generally assumed to aggressively use the entire computational resource that is given to them -- sharing computational resources (e.g., cores) between multiple MPI processes would seem to violate that assumption, and therefore result in bad overall performance.
I feel like I must be missing something in what you're trying to describe...
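For reference, the rank reassignment described in the quoted text can be modelled in a few lines of plain Python. MPI_Comm_split orders the members of each color by key and breaks ties by rank in the old communicator; the sketch below reproduces only that ordering rule. The function `comm_split_ranks` is a hypothetical stand-in rather than an MPI binding, and it does not model MPI_UNDEFINED.

```python
def comm_split_ranks(colors, keys):
    """Model the rank assignment of MPI_Comm_split in plain Python.

    colors[r] and keys[r] are the arguments that old rank r would pass.
    Returns {old_rank: (color, new_rank)}.
    """
    groups = {}
    for old_rank, color in enumerate(colors):
        groups.setdefault(color, []).append(old_rank)

    result = {}
    for color, members in groups.items():
        # Order by key; ties are broken by rank in the old communicator.
        ordered = sorted(members, key=lambda r: (keys[r], r))
        for new_rank, old_rank in enumerate(ordered):
            result[old_rank] = (color, new_rank)
    return result

# Every process passes color 0 and its desired new rank as the key.
desired = [2, 0, 3, 1]
new_ranks = comm_split_ranks(colors=[0, 0, 0, 0], keys=desired)
```

Note that the split only renumbers ranks inside a new communicator. Each process stays on the core where it was launched, so the data follows the new rank while the placement does not change.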
> Now we need to consider communication as well. We will still balance the computational load, but those MPI processes that communicate more will be mapped to the same node (not necessarily the same core). I have solved this optimization problem using ILP, and it shows good results. However, after applying ILP or my heuristic, the cores (on all nodes) no longer contain the same number of MPI processes, because load and communication are balanced rather than the count of MPI processes per core. So either I use process migration for a few processes, or I start more than 2000 processes (that is, a few extra on every core) so that the required imbalance in the number of MPI processes per core can be achieved at the end. I would appreciate your suggestions on this.
> thanks and best regards,
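The thread does not show the ILP formulation or the heuristic, so the following is only an illustrative greedy sketch of the kind of placement being discussed, not the author's method. It places processes in order of decreasing load and, for each one, picks the core with the lowest resulting load minus a bonus for communication partners already on that node. The names `greedy_place` and `comm_weight`, the tie-break on node load, and the toy data are all assumptions.

```python
def greedy_place(load, comm, num_nodes, cores_per_node, comm_weight=1.0):
    """Greedy placement: heavy processes first, favouring nodes of partners."""
    num_cores = num_nodes * cores_per_node
    core_load = [0.0] * num_cores
    node_load = [0.0] * num_nodes
    core_of = [None] * len(load)

    # Symmetric view of the communication volumes.
    partners = {p: {} for p in range(len(load))}
    for (i, j), volume in comm.items():
        partners[i][j] = partners[i].get(j, 0.0) + volume
        partners[j][i] = partners[j].get(i, 0.0) + volume

    for proc in sorted(range(len(load)), key=lambda p: -load[p]):
        def score(core):
            node = core // cores_per_node
            # Volume exchanged with partners already placed on this node.
            local = sum(v for q, v in partners[proc].items()
                        if core_of[q] is not None
                        and core_of[q] // cores_per_node == node)
            # Lower is better; ties go to the less loaded node.
            return (core_load[core] + load[proc] - comm_weight * local,
                    node_load[node])
        best = min(range(num_cores), key=score)
        core_of[proc] = best
        core_load[best] += load[proc]
        node_load[best // cores_per_node] += load[proc]
    return core_of

# Toy data: two heavy processes, each with two light communication partners.
load = [4.0, 4.0, 1.0, 1.0, 1.0, 1.0]
comm = {(0, 2): 5.0, (0, 3): 5.0, (1, 4): 5.0, (1, 5): 5.0}
core_of = greedy_place(load, comm, num_nodes=2, cores_per_node=2)
procs_per_core = [core_of.count(core) for core in range(4)]
```

On this toy input the heavy processes each get a core to themselves and their partners share the other core of the same node, so the number of processes per core ends up unequal, which is the situation described in the quoted message.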
> From: Josh Hursey <jjhursey_at_[hidden]>
> To: Open MPI Users <users_at_[hidden]>
> Cc: Mudassar Majeed <mudassarm30_at_[hidden]>
> Sent: Thursday, November 10, 2011 5:11 PM
> Subject: Re: [OMPI users] Process Migration
> Note that the "migrate me from my current node to node <foo>" scenario
> is covered by the migration API exported by the C/R infrastructure, as
> I noted earlier.
> The "move rank N to node <foo>" scenario could probably be added as an
> extension of this interface (since you can do that via the command
> line now) if that is what you are looking for.
> -- Josh
> On Thu, Nov 10, 2011 at 11:03 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> > So what you are looking for is an MPI extension API that lets you say
> > "migrate me from my current node to node <foo>"? Or do you have a rank that
> > is the "master" that would order "move rank N to node <foo>"?
> > Either could be provided, I imagine - just want to ensure I understand what
> > you need. Can you pass along a brief description of the syntax and
> > functionality you would need?
> > On Nov 10, 2011, at 8:27 AM, Mudassar Majeed wrote:
> > Thank you for your reply. In our previous publication, we found that
> > running more than one process per core and balancing the computational
> > load considerably reduces the total execution time. You know the
> > MPI_Graph_create function; we created another function, MPI_Load_create,
> > that maps processes onto cores so that the computational load is
> > balanced across cores. We had some issues with increased communication
> > cost due to the rank rearrangement (done with MPI_Comm_split, with
> > color=0), so in this research work we will see how we can balance both
> > the computational load on each core and the communication load on each
> > node. Processes that communicate more will reside on the same node while
> > the computational load stays balanced over the cores. I solved this
> > problem using ILP, but ILP takes time and can't be used at run time, so
> > I am thinking about a heuristic. That's why I want to see whether it is
> > possible to migrate a process from one core to another. Then I will see
> > how good my heuristic is.
> > thanks
> > Mudassar
> > ________________________________
> > From: Jeff Squyres <jsquyres_at_[hidden]>
> > To: Mudassar Majeed <mudassarm30_at_[hidden]>; Open MPI Users
> > <users_at_[hidden]>
> > Cc: Ralph Castain <rhc_at_[hidden]>
> > Sent: Thursday, November 10, 2011 2:19 PM
> > Subject: Re: [OMPI users] Process Migration
> > On Nov 10, 2011, at 8:11 AM, Mudassar Majeed wrote:
> >> Thank you for your reply. I am implementing a load balancing function for
> >> MPI that will balance the computational load and the communication at the
> >> same time. So my algorithm assumes that the cores may end up with
> >> different numbers of processes to run.
> > Are you talking about over-subscribing cores? I.e., putting more than 1 MPI
> > process on each core?
> > In general, that's not a good idea.
> >> In the beginning (before that function is called), each core will
> >> have an equal number of processes. So I am thinking of starting more
> >> processes on each core than needed, running my load balancing function,
> >> and then blocking the remaining processes on each core. In this way I will
> >> be able to achieve different numbers of processes per core.
> > Open MPI spins aggressively looking for network progress. For example, if
> > you block in an MPI_RECV waiting for a message, Open MPI is actively banging
> > on the CPU looking for network progress. Because of this (and other
> > reasons), you probably do not want to over-subscribe your processors
> > (meaning: you probably don't want to put more than 1 process per core).
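The cost of spinning can be illustrated with a small Python analogy. This is not Open MPI's progress engine; it only contrasts a wait that polls in a tight loop with one that blocks in the operating system, measuring the CPU time each consumes. The helper names and the 0.2 second duration are arbitrary choices for the sketch.

```python
import time

def cpu_seconds_used(wait):
    """Return the CPU time this process consumes while running wait()."""
    start = time.process_time()
    wait()
    return time.process_time() - start

def busy_poll(duration=0.2):
    # Spin until the deadline, checking the clock as fast as possible.
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        pass

def blocking_wait(duration=0.2):
    # Hand the core back to the operating system until the deadline.
    time.sleep(duration)

spin_cpu = cpu_seconds_used(busy_poll)
sleep_cpu = cpu_seconds_used(blocking_wait)
print(f"busy poll: {spin_cpu:.3f} s CPU, blocking wait: {sleep_cpu:.3f} s CPU")
```

A polling waiter keeps its core busy for the whole wait, so a second process sharing that core competes with it even though the waiter is doing no useful work.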
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory