I am getting interested in this thread.
I'm looking for a way to redirect a task/message (sent with MPI_Send) that is sitting in the queue at one process (say rank 1) to another process (say rank 2) when the queue at rank 1 grows long.
How can I do it?
First of all, I need to know the queue length at a particular process (rank 1) at a particular instant. How can I use padb to get that information?
Then, on the basis of that information, I would like to 'send' some of the queued-up messages from rank 1 to another process (say rank 2) that is relatively free. Is that possible?
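
For reference, the redirect rule described above can be sketched in plain C. This is only a model of the decision, not an MPI or padb feature: the MPI standard has no call that returns another rank's queue length, so the sketch assumes the sending side keeps its own per-rank count of outstanding tasks. The names choose_target, backlog and threshold are hypothetical, and rank 0 is assumed to be the dispatcher.

```c
/* Hypothetical helper: decide which rank a new task should be sent to.
 *
 * backlog[r] is an application-maintained count of tasks already sent
 * to rank r that rank r has not yet reported as finished. It is NOT
 * obtained from MPI or padb; the application has to track it itself.
 *
 * Assumption: rank 0 is the dispatcher and never receives tasks. */
int choose_target(const int *backlog, int nranks, int preferred, int threshold)
{
    /* Keep the preferred rank while its backlog is acceptable. */
    if (backlog[preferred] <= threshold)
        return preferred;

    /* Otherwise pick the worker with the strictly smallest backlog. */
    int best = preferred;
    for (int r = 1; r < nranks; r++) {
        if (backlog[r] < backlog[best])
            best = r;
    }

    /* The caller would then do something like
     * MPI_Send(buf, count, type, best, tag, comm) and increment
     * backlog[best]. */
    return best;
}
```

With backlog = {0, 5, 1} and a threshold of 3, a task preferred for rank 1 would go to rank 2 instead. Note that this only steers new sends; a message already delivered to rank 1 would have to be received there and forwarded by rank 1 itself.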
On 02/09/2010, at 7:32 AM, Ashley Pittman wrote:
> On 1 Sep 2010, at 21:13, Brock Palen wrote:
>> I have a code for a user (NAMD, if anyone cares) that locks up in one specific case. A quick ltrace shows the processes doing Iprobes over and over, which makes me think that a process somewhere is blocking on communication.
>> What is the best way to look at message queues, to see which process is stuck and to drill into it?
> The only three programs I know of that can do this are TotalView, DDT and Padb. TotalView and DDT are graphical parallel debuggers and are commercial products; Padb is a command-line tool and is open source.
> Ashley (padb developer)
> Ashley Pittman, Bath, UK.
> Padb - A parallel job inspection tool for cluster computing