
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] OpenMPI job initializing problem
From: Beichuan Yan (beichuan.yan_at_[hidden])
Date: 2014-03-03 19:10:19


How do I set TMPDIR to a local filesystem? Is /home/yanb/tmp a local filesystem? I don't know how to tell whether a directory is on a local filesystem or a network filesystem.
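
One way to check which kind of filesystem holds a directory is to ask for its filesystem type. The following is a minimal sketch, assuming a Linux node with GNU coreutils `stat`. The helper names `fs_type` and `is_network_fs_type` are made up for illustration, and the list of network filesystem types is not exhaustive. Older versions of `stat` may report a type they do not recognize as `UNKNOWN (0x...)`, in which case `df -T` or `/proc/mounts` can be consulted instead.

```shell
#!/usr/bin/env bash

# Print the type of the filesystem that holds the given directory.
# Relies on GNU coreutils: "stat -f" queries the filesystem, "%T" is its type name.
fs_type() {
    stat -f -c %T "$1"
}

# Succeed (exit status 0) if the type name looks like a network or parallel
# filesystem. Illustrative list only; extend it for your site.
is_network_fs_type() {
    case "$1" in
        nfs*|lustre|gpfs|cifs|smb*|afs|panfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the directory given as the first argument, else $TMPDIR, else /tmp.
dir="${1:-${TMPDIR:-/tmp}}"
fstype="$(fs_type "$dir")"

if is_network_fs_type "$fstype"; then
    echo "$dir: $fstype (network filesystem)"
else
    echo "$dir: $fstype (not in the network list, probably local)"
fi
```

Saved under a name of your choice, the script can be run on a compute node with the directory in question as its argument, for example /home/yanb/tmp. On many clusters /home is itself network-mounted, so it is worth checking rather than assuming.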

-----Original Message-----
From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Jeff Squyres (jsquyres)
Sent: Monday, March 03, 2014 16:57
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI job initializing problem

How about setting TMPDIR to a local filesystem?
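
As a rough sketch of what that could look like in a job script: the lines below assume that /tmp is node-local on every compute node (verify this for your cluster) and reuse the mpirun arguments quoted further down in this thread. The variable name `LOCAL_TMP` is hypothetical. The `orte_tmpdir_base` MCA parameter is included on the assumption that your Open MPI version provides it; check with `ompi_info --param orte all` before relying on it. The script only prints the command, so it can be tried without launching anything.

```shell
#!/usr/bin/env bash

# Assumption: /tmp exists and is node-local on every compute node.
LOCAL_TMP="${LOCAL_TMP:-/tmp}"

# Point TMPDIR at the local directory instead of the Lustre default.
export TMPDIR="$LOCAL_TMP"

# Build the mpirun command line as an array so that quoting stays intact.
# orte_tmpdir_base (assumed available in this Open MPI version) asks the
# runtime to place its session directory under the same local path.
cmd=(mpirun
     --mca orte_tmpdir_base "$LOCAL_TMP"
     --mca btl_tcp_if_include 10.148.0.0/16
     -np 64 -npernode 16
     -hostfile "${PBS_NODEFILE:-hostfile}"
     ./paraEllip3d input.txt)

# Dry run: show the command that would be executed.
printf '%s ' "${cmd[@]}"
echo

# To actually launch the job, replace the dry run above with:
#   "${cmd[@]}"
```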

On Mar 3, 2014, at 3:43 PM, Beichuan Yan <beichuan.yan_at_[hidden]> wrote:

> I agree there are two cases for pure-MPI mode: 1. the job fails for no apparent reason; 2. the job complains that the shared-memory file is on a network filesystem, which can be resolved by "export TMPDIR=/home/yanb/tmp" (/home/yanb/tmp is my local directory). The default TMPDIR points to a Lustre directory.
>
> There is no other output. I checked my job with "qstat -n" and found that the processes were actually not started on the compute nodes, even though PBS Pro has "started" my job.
>
> Beichuan
>
>> 3. Then I tested pure-MPI mode: OpenMP is turned off, and each compute node runs 16 processes (so MPI shared memory is clearly used). Four combinations of "TMPDIR" and "TCP" were tested:
>> case 1:
>> #export TMPDIR=/home/yanb/tmp
>> TCP="--mca btl_tcp_if_include 10.148.0.0/16"
>> mpirun $TCP -np 64 -npernode 16 -hostfile $PBS_NODEFILE ./paraEllip3d input.txt
>> output:
>> Start Prologue v2.5 Mon Mar 3 15:47:16 EST 2014
>> End Prologue v2.5 Mon Mar 3 15:47:16 EST 2014
>> -bash: line 1: 448597 Terminated /var/spool/PBS/mom_priv/jobs/602244.service12.SC
>> Start Epilogue v2.5 Mon Mar 3 15:50:51 EST 2014
>> Statistics cpupercent=0,cput=00:00:00,mem=7028kb,ncpus=128,vmem=495768kb,walltime=00:03:24
>> End Epilogue v2.5 Mon Mar 3 15:50:52 EST 2014
>
> It looks like you have two general cases:
>
> 1. The job fails for no apparent reason (like above), or
> 2. The job complains that your TMPDIR is on a shared filesystem
>
> Right?
>
> I think the real issue, then, is to figure out why your jobs are failing with no output.
>
> Is there anything in the stderr output?
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/