Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [torqueusers] Job dies randomly, but only through torque
From: Jim Kusznir (jkusznir_at_[hidden])
Date: 2008-05-27 19:02:17

Yep. Wall time is nowhere near violation (the job dies about 2 minutes
into a 30-minute allocation). I ran ulimit -a both through qsub and
directly on the node (as the same user in both cases), and the results
were identical (most items were unlimited).
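
For anyone who wants to make the same comparison mechanically, here is
a minimal sketch that dumps the soft and hard rlimits from inside a
process, so the output of a run submitted through qsub can be diffed
against a run started by hand. It is not what was run above (that was
plain ulimit -a); the function names are made up for illustration, and
it assumes a Unix-like node with Python 3 available.

```python
import resource


def collect_rlimits():
    """Return {name: (soft, hard)} for each RLIMIT_* this platform exposes."""
    limits = {}
    for name in sorted(dir(resource)):
        if not name.startswith("RLIMIT_"):
            continue
        try:
            limits[name] = resource.getrlimit(getattr(resource, name))
        except (ValueError, OSError):
            # Constant is defined but the running kernel rejects it; skip it.
            continue
    return limits


def fmt(value):
    """Render a limit value the way ulimit does for unlimited entries."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def format_rlimits(limits):
    """One line per limit, sorted by name, so two dumps diff cleanly."""
    return "\n".join(
        f"{name} soft={fmt(soft)} hard={fmt(hard)}"
        for name, (soft, hard) in sorted(limits.items())
    )


if __name__ == "__main__":
    print(format_rlimits(collect_rlimits()))
```

Saving this under a name such as rlimits.py (hypothetical), running it
once from a job script submitted with qsub and once directly on the
node, and diffing the two outputs shows any limit that differs. Starting
it with mpirun inside the job may also be worthwhile, since that reports
the limits the launched processes themselves see, which are not
necessarily those of the job script's shell.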

Any other ideas?


On Tue, May 27, 2008 at 9:25 AM, Jan Ploski <Jan.Ploski_at_[hidden]> wrote:
> This suggestion is rather trivial, but since you have not mentioned
> anything in this area:
> Are you sure that the job is not exceeding resource limits (walltime,
> enforced by TORQUE, or rlimits such as memory, enforced by the kernel,
> which could be set differently in TORQUE and in your manual invocations
> of mpirun)?
> Regards,
> Jan Ploski