Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OMPI monitor each process behavior
From: Jack Bryan (dtustudy68_at_[hidden])
Date: 2011-04-13 12:29:22


Hi,
If I cannot ssh to a worker node, does that mean my program cannot run correctly?
I can run it on 32 nodes * 4 cores/node. But for a larger run, 128 nodes * 1 CPU/node, it is killed by signal 9.
Could this be the reason?
Thanks

> Date: Wed, 13 Apr 2011 05:59:10 -0700
> From: n8tm_at_[hidden]
> To: users_at_[hidden]
> Subject: Re: [OMPI users] OMPI monitor each process behavior
>
> On 4/12/2011 8:55 PM, Jack Bryan wrote:
>
> >
> > I need to monitor the memory usage of each parallel process on a linux
> > Open MPI cluster.
> >
> > But the top and ps commands cannot help here because they only show the
> > head node's information.
> >
> > I need to follow the behavior of each process on each cluster node.
> Did you consider ganglia et al?
> >
> > I cannot use ssh to access each node.
> How can MPI run?
> >
> > The program takes 8 hours to finish.
>
>
>
> --
> Tim Prince
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
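
On the per-process memory question in the quoted thread: one option that needs no ssh access to the worker nodes is to have each process report its own memory use from inside the program. The sketch below is only an illustration, not something proposed in the thread. It assumes Linux, where /proc/self/status carries a VmRSS line giving the resident set size in kB. The function names read_vmrss_kb and report_memory are hypothetical, and the code is kept free of MPI calls so it compiles on its own.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: return this process's resident set size in kB,
 * read from /proc/self/status (Linux only), or -1 on failure. */
long read_vmrss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    char line[256];
    long kb = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* The line of interest looks like "VmRSS:     1234 kB". */
        if (strncmp(line, "VmRSS:", 6) == 0) {
            if (sscanf(line + 6, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }
    fclose(f);
    return kb;
}

/* Hypothetical helper: print one line tagged with the host name and a
 * caller-supplied rank, so output from many nodes can be told apart. */
void report_memory(int rank)
{
    char host[256];
    if (gethostname(host, sizeof host) != 0)
        strcpy(host, "unknown");
    host[sizeof host - 1] = '\0';  /* gethostname may not terminate on truncation */

    printf("rank %d on %s: VmRSS = %ld kB\n", rank, host, read_vmrss_kb());
    fflush(stdout);
}
```

In an MPI program the rank argument would come from MPI_Comm_rank, and report_memory could be called at intervals during the 8-hour run. Since mpirun normally forwards each process's stdout to the terminal where it was started, the lines should be visible on the head node.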