Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Error launching single-node tasks from multiple-node job.
From: Lee-Ping Wang (leeping_at_[hidden])
Date: 2013-08-10 19:40:59


Hi Ralph,

Thank you. I didn't know that "--without-tm" was the correct configure
option. I built and reinstalled OpenMPI 1.4.2, and now I no longer need to
set PBS_JOBID for it to recognize the correct machine file. My current
workflow is:

1) Submit a multiple-node batch job.
2) Launch a separate process on each node with "pbsdsh".
3) On each node, create a file called
/scratch/leeping/pbs_nodefile.$HOSTNAME which contains 24 instances of the
hostname (since there are 24 cores).
4) Set PBS_NODEFILE=/scratch/leeping/pbs_nodefile.$HOSTNAME.
5) In the Q-Chem wrapper script, make sure mpirun is called with the command
line argument -machinefile $PBS_NODEFILE (a rough sketch of steps 2-4 is
below).
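
Roughly, each pbsdsh-launched process does something like this on its node
(an untested sketch; the path and the hard-coded 24 are just what I use):

#!/bin/bash
# Build a per-node machine file with one line per core.
NODEFILE=/scratch/leeping/pbs_nodefile.$HOSTNAME
rm -f $NODEFILE
for i in $(seq 1 24); do
    echo $HOSTNAME >> $NODEFILE
done
export PBS_NODEFILE=$NODEFILE
# The Q-Chem wrapper script then ends up running something like:
#   mpirun -machinefile $PBS_NODEFILE -np 24 <qcprog.exe and its arguments>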

Everything seems to work, thanks to your help and Gus's. I might report back
if the jobs fail halfway through or if there is no speedup, but for now
everything seems to be in place.

- Lee-Ping

-----Original Message-----
From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph Castain
Sent: Saturday, August 10, 2013 4:28 PM
To: Open MPI Users
Subject: Re: [OMPI users] Error launching single-node tasks from
multiple-node job.

It helps if you use the correct configure option: --without-tm

Regardless, you can always deselect Torque support at runtime. Just put the
following in your environment:

OMPI_MCA_ras=^tm

That will tell ORTE to ignore the Torque allocation module and it should
then look at the machinefile.
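
For example (assuming a bash-type shell), either of the following should work:

export OMPI_MCA_ras='^tm'

mpirun -mca ras '^tm' -machinefile <your machinefile> ...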

On Aug 10, 2013, at 4:18 PM, "Lee-Ping Wang" <leeping_at_[hidden]> wrote:

> Hi Gus,
>
> I agree that $PBS_JOBID should not point to a file in normal
> situations, because it is the job identifier given by the scheduler.
> However, ras_tm_module.c actually does search for a file named
> $PBS_JOBID, and that seems to be why it was failing. You can see this
> in the source code as well (look at ras_tm_module.c, I uploaded it to
> https://dl.dropboxusercontent.com/u/5381783/ras_tm_module.c ). Once I
> changed the $PBS_JOBID environment variable to the name of the node
> file, things seemed to work - though I agree, it's not very logical.
>
> I doubt Q-Chem is causing the issue, because I was able to "fix"
> things by changing $PBS_JOBID before Q-Chem is called. Also, I
> provided the command line to mpirun in a previous email, where the
> -machinefile argument correctly points to the custom machine file that
> I created. The missing environment variables should not matter.
>
> The PBS_NODEFILE created by Torque is
> /opt/torque/aux//272139.certainty.stanford.edu and it never gets
> touched. I followed the advice in your earlier email and I created my
> own node file on each node called
> /scratch/leeping/pbs_nodefile.$HOSTNAME, and I set PBS_NODEFILE to
> point to this file. However, this file does not get used either, even
> if I include it on the mpirun command line, unless I set PBS_JOBID to the
> file name.
>
> Finally, I was not able to build OpenMPI 1.4.2 without pbs support. I
> used the configure flag --without-rte-support, but the build failed
> halfway through.
>
> Thanks,
>
> - Lee-Ping
>
> leeping_at_certainty-a:~/temp$ qsub -I -q debug -l walltime=1:00:00 -l
> nodes=1:ppn=12
> qsub: waiting for job 272139.certainty.stanford.edu to start
> qsub: job 272139.certainty.stanford.edu ready
>
> leeping_at_compute-140-4:~$ echo $PBS_NODEFILE
> /opt/torque/aux//272139.certainty.stanford.edu
>
> leeping_at_compute-140-4:~$ cat $PBS_NODEFILE
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
> compute-140-4
>
> leeping_at_compute-140-4:~$ echo $PBS_JOBID
> 272139.certainty.stanford.edu
>
> leeping_at_compute-140-4:~$ cat $PBS_JOBID
> cat: 272139.certainty.stanford.edu: No such file or directory
>
> leeping_at_compute-140-4:~$ env | grep PBS
> PBS_VERSION=TORQUE-2.5.3
> PBS_JOBNAME=STDIN
> PBS_ENVIRONMENT=PBS_INTERACTIVE
> PBS_O_WORKDIR=/home/leeping/temp
> PBS_TASKNUM=1
> PBS_O_HOME=/home/leeping
> PBS_MOMPORT=15003
> PBS_O_QUEUE=debug
> PBS_O_LOGNAME=leeping
> PBS_O_LANG=en_US.iso885915
> PBS_JOBCOOKIE=A27B00DAF72024CBEBB7CD3752BDBADC
> PBS_NODENUM=0
> PBS_NUM_NODES=1
> PBS_O_SHELL=/bin/bash
> PBS_SERVER=certainty.stanford.edu
> PBS_JOBID=272139.certainty.stanford.edu
> PBS_O_HOST=certainty-a.local
> PBS_VNODENUM=0
> PBS_QUEUE=debug
> PBS_O_MAIL=/var/spool/mail/leeping
> PBS_NUM_PPN=12
> PBS_NODEFILE=/opt/torque/aux//272139.certainty.stanford.edu
> PBS_O_PATH=/opt/intel/Compiler/11.1/064/bin/intel64:/opt/intel/Compiler/11.1/064/bin/intel64:/usr/local/cuda/bin:/home/leeping/opt/psi-4.0b5/bin:/home/leeping/opt/tinker/bin:/home/leeping/opt/cctools/bin:/home/leeping/bin:/home/leeping/local/bin:/home/leeping/opt/bin:/usr/kerberos/bin:/usr/java/latest/bin:/usr/local/bin:/bin:/usr/bin:/opt/ganglia/bin:/opt/ganglia/sbin:/opt/openmpi/bin/:/opt/maui/bin:/opt/torque/bin:/opt/torque/sbin:/opt/rocks/bin:/opt/rocks/sbin:/opt/sun-ct/bin:/home/leeping/bin
>
> -----Original Message-----
> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Gustavo
> Correa
> Sent: Saturday, August 10, 2013 3:58 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Error launching single-node tasks from
> multiple-node job.
>
> Lee-Ping
>
> Something looks amiss.
> PBS_JOBID contains the job identifier.
> PBS_NODEFILE contains a list (with repetitions up to the number of
> cores) of the nodes that Torque assigned to the job.
>
> Why things get twisted is hard to tell; it may be something in the
> Q-Chem scripts (could it be mixing up PBS_JOBID and PBS_NODEFILE?), or it
> may be something else.
> A more remote possibility is that the cluster has a Torque qsub wrapper
> that produces the aforementioned confusion. Unlikely, but
> possible.
>
> To sort this out, run any simple job (mpiexec -np 32 hostname), or even
> your very Q-Chem job, but precede it with a bunch of printouts of the
> PBS environment variables:
> echo $PBS_JOBID
> echo $PBS_NODEFILE
> ls -l $PBS_NODEFILE
> cat $PBS_NODEFILE
> cat $PBS_JOBID [this one should fail, because that is not a file, but
> may work if the PBS variables were messed up along the way]
>
> I hope this helps,
> Gus Correa
>
> On Aug 10, 2013, at 6:39 PM, Lee-Ping Wang wrote:
>
>> Hi Gus,
>>
>> It seems the calculation is now working, or at least it didn't crash.
>> I set the PBS_JOBID environment variable to the name of my custom
>> node file. That is to say, I set
>> PBS_JOBID=pbs_nodefile.compute-3-3.local.
>> It appears that ras_tm_module.c is trying to open the file located at
>> /scratch/leeping/$PBS_JOBID for some reason, and it is disregarding
>> the machinefile argument on the command line.
>>
>> It'll be a few hours before I know for sure whether the job actually
>> worked.
>> I still don't know why things are structured this way, however.
>>
>> Thanks,
>>
>> - Lee-Ping
>>
>> -----Original Message-----
>> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Lee-Ping
>> Wang
>> Sent: Saturday, August 10, 2013 3:07 PM
>> To: 'Open MPI Users'
>> Subject: Re: [OMPI users] Error launching single-node tasks from
>> multiple-node job.
>>
>> Hi Gus,
>>
>> I tried your suggestions. Here is the command line which executes mpirun.
>> I was puzzled because it still reported a file open failure, so I
>> inserted a print statement into ras_tm_module.c and recompiled. The
>> results are below.
>> As you can see, it tries to open a different file
>> (/scratch/leeping/272055.certainty.stanford.edu) than the one I
>> specified (/scratch/leeping/pbs_nodefile.compute-3-3.local).
>>
>> - Lee-Ping
>>
>> === mpirun command line ===
>> /home/leeping/opt/openmpi-1.4.2-intel11-dbg/bin/mpirun -machinefile
>> /scratch/leeping/pbs_nodefile.compute-3-3.local -x HOME -x PWD -x QC
>> -x QCAUX -x QCCLEAN -x QCFILEPREF -x QCLOCALSCR -x QCPLATFORM -x
>> QCREF -x QCRSH -x QCRUNNAME -x QCSCRATCH
>> -np 24 /home/leeping/opt/qchem40/exe/qcprog.exe
>> .B.in.28642.qcin.1 ./qchem28642/ >>B.out
>>
>> === Error message from compute node ===
>> [compute-3-3.local:28666] Warning: could not find environment
>> variable "QCLOCALSCR"
>> [compute-3-3.local:28666] Warning: could not find environment
>> variable "QCREF"
>> [compute-3-3.local:28666] Warning: could not find environment
>> variable "QCRUNNAME"
>> Attempting to open /scratch/leeping/272055.certainty.stanford.edu
>> [compute-3-3.local:28666] [[56726,0],0] ORTE_ERROR_LOG: File open
>> failure in file ras_tm_module.c at line 155 [compute-3-3.local:28666]
>> [[56726,0],0]
>> ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 87
>> [compute-3-3.local:28666] [[56726,0],0] ORTE_ERROR_LOG: File open
>> failure in file base/ras_base_allocate.c at line 133
>> [compute-3-3.local:28666] [[56726,0],0] ORTE_ERROR_LOG: File open
>> failure in file base/plm_base_launch_support.c at line 72
>> [compute-3-3.local:28666] [[56726,0],0] ORTE_ERROR_LOG: File open
>> failure in file plm_tm_module.c at line 167
>>
>> -----Original Message-----
>> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Lee-Ping
>> Wang
>> Sent: Saturday, August 10, 2013 12:51 PM
>> To: 'Open MPI Users'
>> Subject: Re: [OMPI users] Error launching single-node tasks from
>> multiple-node job.
>>
>> Hi Gus,
>>
>> Thank you. You gave me many helpful suggestions, which I will try
>> out and get back to you. I will provide more specifics (e.g. how my
>> jobs were
>> submitted) in a future email.
>>
>> As for the queue policy, that is a highly political issue because the
>> cluster is a shared resource. My usual recourse is to use the batch
>> system as effectively as possible within the confines of their
>> policies. This is why it makes sense to submit a single
>> multiple-node batch job, which then executes several independent
>> single-node tasks.
>>
>> - Lee-Ping
>>
>> -----Original Message-----
>> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Gustavo
>> Correa
>> Sent: Saturday, August 10, 2013 12:39 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Error launching single-node tasks from
>> multiple-node job.
>>
>> Hi Lee-Ping
>> On Aug 10, 2013, at 3:15 PM, Lee-Ping Wang wrote:
>>
>>> Hi Gus,
>>>
>>> Thank you for your reply. I want to run MPI jobs inside a single
>>> node, but due to the resource allocation policies on the clusters, I
>>> could get many more resources if I submit multiple-node "batch jobs".
>>> Once I have a multiple-node batch job, then I can use a command like
>>> "pbsdsh" to run single node MPI jobs on each node that is allocated
>>> to me. Thus, the MPI jobs on each node are running independently of
>>> each other and unaware of one another.
>>
>> Even if you use pbsdsh to launch separate MPI jobs on individual
>> nodes, you probably (not 100% sure about that) still need to
>> specify the -hostfile naming the specific node that each job will run on.
>>
>> I'm still quite confused because you didn't say what your "qsub" command
>> looks like, what Torque script (if any) it launches, etc.
>>
>>>
>>> The actual call to mpirun is nontrivial to get, because Q-Chem has a
>>> complicated series of wrapper scripts which ultimately calls mpirun.
>>
>> Yes, I just found this out on the Web. See my previous email.
>>
>>> If the
>>> jobs are failing immediately, then I only have a small window to
>>> view the actual command through "ps" or something.
>>>
>>
>> Are you launching the jobs interactively?
>> I.e., with the -I switch to qsub?
>>
>>
>>> Another option is for me to compile OpenMPI without Torque / PBS
>>> support.
>>> If I do that, then it won't look for the node file anymore. Is this
>>> correct?
>>
>> You will need to tell mpiexec where to launch the jobs.
>> If I understand what you are trying to achieve (and I am not sure I
>> do), one way to do it would be to programmatically split the
>> $PBS_NODEFILE into several hostfiles, one per MPI job (so to speak)
>> that you want to launch.
>> Then use each of these nodefiles for each of the MPI jobs.
>> Note that the PBS_NODEFILE has one line per-node-per-core, *not* one
>> line per node.
>> I have no idea how the trick above could be reconciled with the
>> Q-Chem scripts, though.
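>>
>> An untested sketch of that split (assuming bash, and that $HOSTNAME on
>> each node matches the names written in $PBS_NODEFILE):
>>
>> for node in $(sort -u $PBS_NODEFILE); do
>>     grep -Fx "$node" $PBS_NODEFILE > /scratch/$USER/hostfile.$node
>> done
>>
>> # Then launch each per-node MPI job (e.g. via pbsdsh) with something like:
>> #   mpiexec -hostfile /scratch/$USER/hostfile.$HOSTNAME -np <cores> <exe>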
>>
>> Overall, I don't understand why you would benefit from such a
>> complicated scheme, rather than launching either a big MPI job across
>> all nodes that you requested (if the problem is large enough to
>> benefit from this many cores), or launching several small single-node
>> jobs (if the problem is small enough to fit well on a single node).
>>
>> You may want to talk to the cluster managers, because there must be a
>> way to reconcile their queue policies with your needs (if this is not
>> already in place).
>> We run tons of parallel single-node jobs here, for problems that fit
>> well on a single node.
>>
>>
>> My two cents
>> Gus Correa
>>
>>>
>>> I will try your suggestions and get back to you. Thanks!
>>>
>>> - Lee-Ping
>>>
>>> -----Original Message-----
>>> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Gustavo
>>> Correa
>>> Sent: Saturday, August 10, 2013 12:04 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] Error launching single-node tasks from
>>> multiple-node job.
>>>
>>> Hi Lee-Ping
>>>
>>> I know nothing about Q-Chem, but I was confused by these sentences:
>>>
>>> "That is to say, these tasks are intended to use OpenMPI parallelism
>>> on
>>> each
>>> node, but no parallelism across nodes. "
>>>
>>> "I do not observe this error when submitting single-node jobs."
>>>
>>> "Since my jobs are only parallel over the node they're running on, I
>>> believe
>>> that a node file of any kind is unnecessary. "
>>>
>>> Are you trying to run MPI jobs across several nodes or inside a
>>> single
>>> node?
>>>
>>> ***
>>>
>>> Anyway, as far as I know,
>>> if your OpenMPI was compiled with Torque/PBS support, the
>>> mpiexec/mpirun command will look for the $PBS_NODEFILE to learn in
>>> which node(s) it
>>> should
>>> launch the MPI processes, regardless of whether you are using one
>>> node or more than one node.
>>>
>>> You didn't send your mpiexec command line (which would help), but
>>> assuming that Q-Chem allows some level of standard mpiexec command
>>> options, you
>>> could
>>> force passing the $PBS_NODEFILE to it.
>>>
>>> Something like this (for two nodes with 8 cores each):
>>>
>>> #PBS -q myqueue
>>> #PBS -l nodes=2:ppn=8
>>> #PBS -N myjob
>>> cd $PBS_O_WORKDIR
>>> ls -l $PBS_NODEFILE
>>> cat $PBS_NODEFILE
>>>
>>> mpiexec -hostfile $PBS_NODEFILE -np 16 ./my-Q-chem-executable
>>> <parameters
>>> to
>>> Q-chem>
>>>
>>> I hope this helps,
>>> Gus Correa
>>>
>>> On Aug 10, 2013, at 1:51 PM, Lee-Ping Wang wrote:
>>>
>>>> Hi there,
>>>>
>>>> Recently, I've begun some calculations on a cluster where I submit
>>>> a
>>> multiple node job to the Torque batch system, and the job executes
>>> multiple
>>> single-node parallel tasks. That is to say, these tasks are
>>> intended to
>>> use
>>> OpenMPI parallelism on each node, but no parallelism across nodes.
>>>>
>>>> Some background: The actual program being executed is Q-Chem 4.0.
>>>> I use
>>> OpenMPI 1.4.2 for this, because Q-Chem is notoriously difficult to
>>> compile and this is the last known version of OpenMPI that this
>>> version of Q-Chem
>>> is
>>> known to work with.
>>>>
>>>> My jobs are failing with the error message below; I do not observe
>>>> this
>>> error when submitting single-node jobs. From reading the mailing
>>> list archives
>>> (http://www.open-mpi.org/community/lists/users/2010/03/12348.php),
>>> I believe it is looking for a PBS node file somewhere. Since my
>>> jobs are only parallel over the node they're running on, I believe
>>> that a node file of any kind is unnecessary.
>>>>
>>>> My question is: Why is OpenMPI behaving differently when I submit a
>>> multi-node job compared to a single-node job? How does OpenMPI
>>> detect
>>> that
>>> it is running under a multi-node allocation? Is there a way I can
>>> change OpenMPI's behavior so it always thinks it's running on a
>>> single node, regardless of the type of job I submit to the batch system?
>>>>
>>>> Thank you,
>>>>
>>>> - Lee-Ping Wang (Postdoc in Dept. of Chemistry, Stanford
>>>> University)
>>>>
>>>> [compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open
>>>> failure in file ras_tm_module.c at line 153
>>>> [compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open
>>>> failure in file ras_tm_module.c at line 153
>>>> [compute-1-1.local:10911] [[42011,0],0]
>>>> ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line
>>>> 153 [compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File
>>>> open failure in file ras_tm_module.c at line 87
>>>> [compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open
>>>> failure in file ras_tm_module.c at line 87
>>>> [compute-1-1.local:10911] [[42011,0],0]
>>>> ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line
>>>> 87 [compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File
>>>> open failure in file base/ras_base_allocate.c at line 133
>>>> [compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open
>>>> failure in file base/ras_base_allocate.c at line 133
>>>> [compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open
>>>> failure in file base/ras_base_allocate.c at line 133
>>>> [compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open
>>>> failure in file base/plm_base_launch_support.c at line 72
>>>> [compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open
>>>> failure in file base/plm_base_launch_support.c at line 72
>>>> [compute-1-1.local:10911] [[42011,0],0] ORTE_ERROR_LOG: File open
>>>> failure in file base/plm_base_launch_support.c at line 72
>>>> [compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open
>>>> failure in file plm_tm_module.c at line 167
>>>> [compute-1-1.local:10909] [[42009,0],0] ORTE_ERROR_LOG: File open
>>>> failure in file plm_tm_module.c at line 167
>>>> [compute-1-1.local:10911] [[42011,0],0]
>>>> ORTE_ERROR_LOG: File open failure in file plm_tm_module.c at line
>>>> 167

_______________________________________________
users mailing list
users_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/users