MTT Devel Mailing List Archives

Subject: Re: [MTT devel] Analysis of hung jobs.
From: Ethan Mallove (ethan.mallove_at_[hidden])
Date: 2009-10-07 16:21:39


On Wed, Oct/07/2009 09:04:22PM, Ashley Pittman wrote:
> On Wed, 2009-10-07 at 15:41 -0400, Ethan Mallove wrote:
>
> > I got the following error doing a simple test:
>
> As it happens, I saw this error earlier on FC8; r279 should fix this
> problem.

Thanks. That eliminates the Perl regex error.

>
> > $ perl --version
> > This is perl, v5.8.4 built for sun4-solaris-64int
>
> I had wondered if you'd be using Solaris. This is not something I've
> tested and not something I'd expect to work. The stack trace code
> should all be fine, but there might be some problems reading data
> from /proc. In the past padb has worked on Tru64; possibly all that
> needs porting is getting the parent pid and process name from ps
> rather than /proc/status.
>
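
For illustration only, the ps-based fallback described above might look
roughly like the sketch below. It is written in Python rather than padb's
Perl and is not taken from padb; the function names are hypothetical. It
assumes a POSIX-style ps that accepts -p and -o, where an empty header
("ppid=", "comm=") suppresses the header line.

```python
import subprocess


def parse_ps_line(line):
    """Split one line of `ps -o ppid= -o comm=` output into (ppid, name)."""
    # The parent pid comes first; the rest of the line is the command name,
    # which may itself contain spaces, so split only once.
    fields = line.strip().split(None, 1)
    if len(fields) != 2:
        raise ValueError("unexpected ps output: %r" % line)
    return int(fields[0]), fields[1]


def parent_and_name(pid):
    """Ask ps (instead of /proc) for the parent pid and name of `pid`."""
    # Assumption: ps is on PATH and supports the POSIX -p and -o options.
    out = subprocess.check_output(
        ["ps", "-o", "ppid=", "-o", "comm=", "-p", str(pid)]
    )
    return parse_ps_line(out.decode(errors="replace"))
```

Calling parent_and_name(os.getpid()) should return the same parent pid as
os.getppid() on systems where ps behaves as assumed; I have not checked
this on Solaris or Tru64.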

Okay. I've moved to Linux for testing:

  $ padb --debug=all --verbose --config-option rmgr=mpirun --full-report=29713
  Loading config from "/etc/padb.conf"
  Loading config from "/home/em162155/.padbrc"
  Loading config from environment
  Loading config from command line
  Setting 'rmgr' to 'mpirun'
  DEBUG (config): 0: Finished setting configuration options
  padb version 3.n (Revision 279)
  full job report for job 29713

  No secret file (/home/em162155/.padb-secret)
  Error: Could not load secret file on this node

-Ethan

> Ashley,
>
> --
>
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
>
> _______________________________________________
> mtt-devel mailing list
> mtt-devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel