Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [torqueusers] Job dies randomly, but only through torque
From: Jim Kusznir (jkusznir_at_[hidden])
Date: 2008-05-27 19:02:17


Yep. Wall time is nowhere near violation (the job dies about 2 minutes
into a 30-minute allocation). I ran ulimit -a both through qsub and
directly on the node (as the same user in both cases), and the results
were identical (most items were unlimited).
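
For anyone who wants to repeat that comparison, here is a rough sketch of
the idea in Python rather than plain ulimit -a. It is only an illustration
and not what was actually run: the list of limits and the function names
(collect_limits, report) are assumptions made for this example. It prints
the soft and hard value of each rlimit, one per line, so that the output
from a qsub job and from a direct login can be diffed.

```python
import resource

# Subset of rlimits to report. This list is an illustrative assumption;
# names that the platform does not define are skipped.
LIMIT_NAMES = [
    "RLIMIT_AS", "RLIMIT_CORE", "RLIMIT_CPU", "RLIMIT_DATA",
    "RLIMIT_FSIZE", "RLIMIT_MEMLOCK", "RLIMIT_NOFILE",
    "RLIMIT_NPROC", "RLIMIT_RSS", "RLIMIT_STACK",
]


def fmt(value):
    """Render a limit value the way ulimit does for 'no limit'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def collect_limits():
    """Return {name: (soft, hard)} for every listed limit that exists here."""
    limits = {}
    for name in LIMIT_NAMES:
        const = getattr(resource, name, None)
        if const is None:
            continue  # not available on this platform
        soft, hard = resource.getrlimit(const)
        limits[name] = (fmt(soft), fmt(hard))
    return limits


def report():
    """Build a sorted, one-limit-per-line report suitable for diff."""
    lines = []
    for name, (soft, hard) in sorted(collect_limits().items()):
        lines.append(f"{name} soft={soft} hard={hard}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

The intent would be to run the script once from inside a job submitted
with qsub and once from a direct login on the same node as the same user,
save each output to a file, and diff the two files.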

Any other ideas?

--Jim

On Tue, May 27, 2008 at 9:25 AM, Jan Ploski <Jan.Ploski_at_[hidden]> wrote:
>
> This suggestion is rather trivial, but since you have not mentioned
> anything in this area:
>
> Are you sure that the job is not exceeding resource limits (walltime -
> enforced by TORQUE, or rlimits such as memory - enforced by the kernel,
> but they could be set differently in TORQUE and your manual invocations of
> mpirun)?
>
> Regards,
> Jan Ploski
>