Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: Add ompi-top tool
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-12-13 08:20:21

This works for me. LAM had a similar tool to query daemons and find
the current state of running MPI procs (although it didn't get
top-like statistics of the apps).

On Dec 12, 2008, at 3:20 PM, Ralph Castain wrote:

> ----------------------------------------------------------------------------
> WHAT: Add new tool to retrieve/monitor process stats
> ----------------------------------------------------------------------------
> WHY: Several of us have had user requests to provide a
> convenient way of obtaining reports on memory usage and
> other typical stats from their MPI procs. The notion was to
> create a tool that would allow a user to specify multiple ranks
> (which could be on any number of nodes), and have the tool
> query mpirun to get the info. This would spare users from
> remotely logging into multiple nodes to run top, ps, or
> other stat tools - and from having to use something heavy
> like TotalView for such a small purpose.
> ----------------------------------------------------------------------------
> WHERE: Involves the following:
> 1. new opal framework "opal/mca/pstat" with components
> to support obtaining process stats from the different OS's.
> Note that application procs do -not- open this framework.
> The open/select functions are -only- in the orte_init procedures
> for the HNP and orteds. This is because an app would never
> have any reason to call this framework, so there is no reason
> to open it.
> 2. new "orte-top" tool (also avail as ompi-top) that sends
> the top request to the specified mpirun and prints out
> the returned data. No fancy screen handling - just basic
> printout
> 3. slight mods to orted_comm to receive and process the
> new cmd
> 4. added new cmd flag define to orte/mca/odls/odls_types.h
> 5. added new base function to
> orte/mca/odls/base/odls_base_default_fns.c
> to look up the specified child and call opal_pstat to get
> the info
> ----------------------------------------------------------------------------
> WHEN: I would like to do this before the holiday break, if
> possible, given that Sun, Cisco, and IU are all aware and
> supportive of this change. However, since a number of
> community members are tied up with the MPI Forum next week,
> I propose to see if there are any immediate concerns and, if so,
> wait until after the holiday to more thoroughly discuss them.
> ----------------------------------------------------------------------------
> TIMEOUT: Dec 23
> ----------------------------------------------------------------------------
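
For illustration only, here is a minimal sketch of the kind of per-process
query a Linux pstat component might make once the child has been looked up
(item 5 above). It is not the code from the RFC. It is written in Python
rather than C for brevity, the function names parse_proc_status and
query_pid are hypothetical, and it assumes the stats come from
/proc/<pid>/status, which the RFC does not specify.

```python
import os


def parse_proc_status(text):
    """Pick a few top-like fields out of /proc/<pid>/status text.

    Hypothetical helper: the field selection is an assumption, not
    something the RFC defines.
    """
    wanted = {
        "Name": "name",
        "State": "state",
        "VmSize": "vsize_kb",
        "VmRSS": "rss_kb",
    }
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key not in wanted:
            continue
        value = value.strip()
        if key.startswith("Vm"):
            # Memory fields look like "1234 kB"; keep the number only.
            value = int(value.split()[0])
        elif key == "State":
            # "S (sleeping)" becomes "S".
            value = value.split()[0]
        stats[wanted[key]] = value
    return stats


def query_pid(pid):
    """Read stats for one local process; return None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_proc_status(f.read())
    except OSError:
        # Process has exited, or this OS has no /proc/<pid>/status.
        return None


if __name__ == "__main__":
    # Query the current process as a stand-in for a local MPI child.
    print(query_pid(os.getpid()))
```

In the design described above, a call like this would run only in the HNP
and orteds, with the results sent back to the tool for a basic printout.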

Jeff Squyres
Cisco Systems