Subject: Re: [OMPI users] qsub - mpirun problem
From: Zhiliang Hu (zhu_at_[hidden])
Date: 2008-09-28 22:07:38


Thank you for your quick response.

Indeed, as you expected, "printenv | grep PBS" produced nothing.
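A minimal sketch of a diagnostic job script, in case the check needs to run inside a job rather than in a login shell. It assumes a standard Torque setup in which variables such as PBS_JOBID, PBS_ENVIRONMENT, PBS_NODEFILE and PBS_O_WORKDIR are exported only to running jobs. The script name pbs_check.sh and the function report_pbs_env are hypothetical.

```shell
#!/bin/sh
# Hypothetical diagnostic job script (pbs_check.sh). Submit it with e.g.
#   qsub -l nodes=2:ppn=2 pbs_check.sh
# Assumption: Torque exports PBS_* variables only inside a running job,
# so running this in a login shell is expected to report "not set".

report_pbs_env() {
    for var in PBS_JOBID PBS_ENVIRONMENT PBS_NODEFILE PBS_O_WORKDIR; do
        # Indirect expansion via eval keeps this POSIX sh compatible.
        eval "val=\${$var-}"
        if [ -n "$val" ]; then
            echo "$var=$val"
        else
            echo "$var is not set"
        fi
    done
    if [ -n "${PBS_NODEFILE-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # Assumption: the node file lists each host once per allocated slot.
        echo "slots in node file: $(wc -l < "$PBS_NODEFILE" | tr -d ' ')"
        # Show how many slots were allocated on each host.
        sort "$PBS_NODEFILE" | uniq -c
    fi
}

report_pbs_env
```

If the job's output file still shows every variable as "not set", that would point at the Torque side rather than at mpirun.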

BTW, here is my Torque server configuration:

> qmgr -c 'p s'

# Create queues and set their attributes.
# Create and define queue default
create queue default
set queue default queue_type = Execution
set queue default resources_default.nodes = 7
set queue default enabled = True
set queue default started = True
# Set server attributes.
set server scheduling = True
set server acl_hosts = nagrp2
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_available.nodect = 6
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 793

- I am not sure what is missing from my configuration. Do you mean the installation "configure" step with optional directives, or something else?
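In the meantime, a minimal sketch of how the "mpi_program" script could keep using a hand-maintained machinefile, as suggested in your reply, while still preferring Torque's node file whenever it is present. Assumptions: Torque exports PBS_NODEFILE with one line per allocated slot, and the fallback machinefile uses the same one-line-per-slot format; pick_hostfile and count_slots are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers for the "mpi_program" job script.

# Print the host file to use: $PBS_NODEFILE if it is set and readable,
# otherwise the hand-maintained fallback passed as $1.
pick_hostfile() {
    if [ -n "${PBS_NODEFILE-}" ] && [ -r "$PBS_NODEFILE" ]; then
        echo "$PBS_NODEFILE"
    else
        echo "$1"
    fi
}

# Count slots as the number of non-empty lines in a host file.
# Assumption: one line per slot (not the "host slots=N" style).
count_slots() {
    grep -c . "$1"
}

# Example use inside mpi_program (paths as in the original post):
#   hf=$(pick_hostfile /path/to/machinefile)
#   /path/to/mpirun -np "$(count_slots "$hf")" -machinefile "$hf" /path/to/my_program
```

Deriving -np from the host file keeps the process count in step with whatever "-l nodes=...:ppn=..." request was actually granted, instead of hard-coding 12.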

Thank you,


At 07:16 PM 9/28/2008 -0600, you wrote:
>Hi Zhiliang
>The first thing to check is that your Torque system is defining and
>setting the environment variables we expect in a Torque
>system. It is quite possible that your Torque system isn't configured
>as we expect.
>Can you run a job and send us the output from "printenv | grep PBS"?
>We should see a PBS jobid, the name of the file containing the names
>of the allocated nodes, etc.
>Since you are able to run with -machinefile, my guess is that your
>system isn't setting those environment variables as we expect. In
>that case, you will have to keep specifying the machinefile by hand.
>On Sep 28, 2008, at 7:02 PM, Zhiliang Hu wrote:
>>I have asked this question on the TorqueUsers list. Responses from that
>>list suggested that the question be asked on this list:
>>The situation is:
>>I can submit my jobs as in:
>>>qsub -l nodes=6:ppn=2 /path/to/mpi_program
>>where "mpi_program" is:
>>/path/to/mpirun -np 12 /path/to/my_program
>>However, everything ran on the head node (on one occasion, on the
>>first compute node instead). The jobs do complete anyway.
>>While mpirun can run on its own when given a "-machinefile",
>>it has been pointed out by Glen, among others, and also on this web site (I got the same error as the last example on that web page), that
>>it's not a good idea to provide a machinefile since this is "already
>>handled by OpenMPI and Torque".
>>My question is: why are OpenMPI and Torque not distributing the jobs
>>to all nodes?
>>ps 1:
>>OpenMPI is configured and installed with the "--with-tm" option,
>>and the "ompi_info" does show lines:
>>MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.7)
>>MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.7)
>>ps 2:
>>"/path/to/mpirun -np 12 -machinefile /path/to/machinefile /path/to/my_program"
>>works normally (sends jobs to all nodes).
>>users mailing list