Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OMPI monitor each process behavior
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-04-13 12:32:51


On Apr 13, 2011, at 10:29 AM, Jack Bryan wrote:

> Hi,
>
> If I cannot ssh to a worker node, does that mean my program cannot work correctly?

No, that's not true. People assumed you were on a cluster that uses ssh as the launcher. From your earlier notes, you are using Torque, so not being allowed to ssh to the nodes is just an administrative policy.

>
> I can run it as 32 nodes * 4 cores/node parallel processes. But for a larger run,
> 128 nodes * 1 CPU/node, it is killed by signal 9.
>
> Is this the reason?

No, it isn't.
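
On the monitoring question quoted further down: if logging in to the compute nodes is not possible, one option is to have each process report its own memory usage from inside the job. The following is only a minimal sketch under stated assumptions, not something proposed in this thread. It assumes Linux nodes where /proc/self/status is readable, and it assumes the launcher exports OMPI_COMM_WORLD_RANK to each process (Open MPI's mpirun does). The function names parse_status_kb and memory_report are made up for illustration; a C or Fortran MPI program would do the equivalent in its own language.

```python
import os
import socket


def parse_status_kb(status_text, field="VmRSS"):
    """Return the kB value of a field from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            parts = line.split()
            # Expected shape: "VmRSS:    123456 kB"
            if len(parts) >= 2 and parts[1].isdigit():
                return int(parts[1])
    return None


def memory_report(status_path="/proc/self/status"):
    """Build a one-line report for the calling process (hypothetical helper)."""
    try:
        with open(status_path) as f:
            text = f.read()
    except OSError:
        # Not on Linux, or /proc is unavailable: report None for both values.
        text = ""
    # Assumption: OMPI_COMM_WORLD_RANK is set by the launcher; "?" if it is not.
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    return "host=%s rank=%s VmRSS_kB=%s VmHWM_kB=%s" % (
        socket.gethostname(),
        rank,
        parse_status_kb(text, "VmRSS"),  # current resident set size
        parse_status_kb(text, "VmHWM"),  # peak resident set size
    )


if __name__ == "__main__":
    print(memory_report())
```

Calling memory_report() periodically from each process and writing the line to stdout or a per-rank file gives a per-node, per-process record without needing ssh, top, or ps on the compute nodes.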

>
> thanks
>
> > Date: Wed, 13 Apr 2011 05:59:10 -0700
> > From: n8tm_at_[hidden]
> > To: users_at_[hidden]
> > Subject: Re: [OMPI users] OMPI monitor each process behavior
> >
> > On 4/12/2011 8:55 PM, Jack Bryan wrote:
> >
> > >
> > > I need to monitor the memory usage of each parallel process on a Linux
> > > Open MPI cluster.
> > >
> > > But the top and ps commands cannot help here because they only show the head
> > > node information.
> > >
> > > I need to follow the behavior of each process on each cluster node.
> > Did you consider ganglia et al?
> > >
> > > I cannot use ssh to access each node.
> > How can MPI run?
> > >
> > > The program takes 8 hours to finish.
> >
> >
> >
> > --
> > Tim Prince
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users