
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] OpenMPI job initializing problem
From: Beichuan Yan (beichuan.yan_at_[hidden])
Date: 2014-03-03 18:43:08

I agree there are two cases for pure-MPI mode: 1. The job fails for no apparent reason; 2. The job complains about a shared-memory file on a network filesystem, which can be resolved by "export TMPDIR=/home/yanb/tmp" (/home/yanb/tmp is a local directory; the default TMPDIR points to a Lustre directory).
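A quick way to confirm where TMPDIR actually points is to query the filesystem type behind it; a minimal sketch (assumes Linux with GNU coreutils stat):

```shell
# Print the filesystem type backing TMPDIR (falls back to /tmp if unset).
# On a Lustre mount this reports "lustre"; a node-local disk shows
# something like "ext2/ext3" or "xfs" instead.
fstype=$(stat -f -c %T "${TMPDIR:-/tmp}")
echo "TMPDIR filesystem: $fstype"
```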

There is no other output. I checked my job with "qstat -n" and found that the processes were never actually started on the compute nodes, even though PBS Pro had "started" my job.
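For reference, the TMPDIR workaround sits in the job script ahead of the launch; a sketch assembled from the commands quoted below (the #PBS directives and node count are assumptions, the mpirun line is as in the quoted run):

```shell
#!/bin/bash
#PBS -N paraEllip3d                     # assumed job name
#PBS -l select=4:ncpus=16:mpiprocs=16   # assumed request: 4 nodes x 16 ranks
cd "$PBS_O_WORKDIR"

# Point Open MPI's session directory at node-local storage instead of
# Lustre, so the shared-memory backing file is not created on a
# networked filesystem.
export TMPDIR=/home/yanb/tmp

mpirun -np 64 -npernode 16 -hostfile "$PBS_NODEFILE" ./paraEllip3d input.txt
```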


> 3. Then I test pure-MPI mode: OPENMP is turned off, and each compute node runs 16 processes (clearly shared-memory of MPI is used). Four combinations of "TMPDIR" and "TCP" are tested:
> case 1:
> #export TMPDIR=/home/yanb/tmp
> TCP="--mca btl_tcp_if_include"
> mpirun $TCP -np 64 -npernode 16 -hostfile $PBS_NODEFILE ./paraEllip3d
> input.txt
> output:
> Start Prologue v2.5 Mon Mar 3 15:47:16 EST 2014 End Prologue v2.5 Mon
> Mar 3 15:47:16 EST 2014
> -bash: line 1: 448597 Terminated /var/spool/PBS/mom_priv/jobs/602244.service12.SC
> Start Epilogue v2.5 Mon Mar 3 15:50:51 EST 2014 Statistics
> cpupercent=0,cput=00:00:00,mem=7028kb,ncpus=128,vmem=495768kb,walltime
> =00:03:24 End Epilogue v2.5 Mon Mar 3 15:50:52 EST 2014

It looks like you have two general cases:

1. The job fails for no apparent reason (like above), or
2. The job complains that your TMPDIR is on a shared filesystem


I think the real issue, then, is to figure out why your jobs are failing with no output.

Is there anything in the stderr output?
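If the batch system is discarding stderr, it can be merged into the job's stdout file or captured at the launch itself; a sketch (the PBS directive and shell redirection are standard, the log filename is arbitrary):

```shell
#PBS -j oe    # join stderr into the job's stdout (.o) file
# Alternatively, capture both streams at the mpirun call; note the pipe
# hides mpirun's exit status unless "set -o pipefail" is in effect.
mpirun -np 64 -hostfile "$PBS_NODEFILE" ./paraEllip3d input.txt 2>&1 | tee run.log
```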

Jeff Squyres