It sounds like you may not have set up passwordless ssh between all
On Sep 29, 2008, at 2:12 PM, Zhiliang Hu wrote:
> At 10:45 PM 9/29/2008 +0200, you wrote:
>> On 29.09.2008 at 22:33, Zhiliang Hu wrote:
>>> At 07:37 PM 9/29/2008 +0200, Reuti wrote:
>>>>> "-l nodes=6:ppn=2" is all I have to specify the node requests:
>>>> this might help: http://www.open-mpi.org/faq/?category=tm
>>> Essentially, the examples given on that web page are no different from
>>> what I did.
>>> The only new thing is "qsub -I", which I suppose is for interactive mode.
>>> When I did this:
>>> qsub -I -l nodes=7 mpiblastn.sh
>>> It hangs on "qsub: waiting for job 798.nagrp2.ansci.iastate.edu to
>>>>> UNIX_PROMPT> qsub -l nodes=6:ppn=2 /path/to/mpi_program
>>>>> where "mpi_program" is a file with one line:
>>>>> /path/to/mpirun -np 12 /path/to/my_program
>>>> Can you please try this jobscript instead:
>>>> set | grep PBS
>>>> /path/to/mpirun /path/to/my_program
>>>> All should be handled by Open MPI automatically. With the "set"
>>>> command you will get a list with all defined variables for further
>>>> analysis; and where you can check for the variables set by Torque.
>>>> -- Reuti
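For the jobscript suggested above, a slightly more talkative variant might help tell "the job never started" apart from "the job started but Torque passed nothing along". This is only a sketch: pbs_summary is a made-up helper name, it assumes the usual Torque variables PBS_NODEFILE and PBS_JOBID are what the job sees, and the paths are the placeholders used in this thread.

```shell
#!/bin/sh
#PBS -l nodes=6:ppn=2

# Hypothetical helper (not part of Torque or Open MPI): summarize what the
# batch system handed to the job. Returns non-zero when no node file is
# available, which usually means the script is not running under Torque.
pbs_summary() {
    if [ -z "${PBS_NODEFILE:-}" ] || [ ! -r "$PBS_NODEFILE" ]; then
        echo "PBS_NODEFILE unset or unreadable"
        return 1
    fi
    echo "job id: ${PBS_JOBID:-unknown}"
    # The node file holds one line per allocated slot.
    echo "slots: $(wc -l < "$PBS_NODEFILE" | tr -d ' ')"
    echo "hosts: $(sort -u "$PBS_NODEFILE" | wc -l | tr -d ' ')"
}

# In the real job script these two lines would follow:
#   pbs_summary
#   /path/to/mpirun /path/to/my_program
```

With "-l nodes=6:ppn=2" one would expect 12 slots on 6 hosts. If the summary never shows up in the .o file at all, the problem is more likely on the Torque side than in Open MPI.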
>>> "set | grep PBS" part had nothing in output.
>> Strange - did you check the .o and .e files of the job? - Reuti
> There is nothing in either the -o or the -e output. I had to kill the job.
> I checked the Torque log (/var/spool/torque/server_logs), which shows:
> 09/29/2008 15:52:16;0100;PBS_Server;Job;799.xxx.xxx.xxx;enqueuing into default, state 1 hop 1
> 09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Queued at request of zhu_at_xxx.xxx.xxx, owner = zhu_at_xxx.xxx.xxx, job name = mpiblastn.sh, queue = default
> 09/29/2008 15:52:16;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent command new
> 09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Modified at request of Scheduler_at_xxx.xxx.xxx
> 09/29/2008 15:52:27;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job deleted at request of zhu_at_xxx.xxx.xxx
> 09/29/2008 15:52:27;0100;PBS_Server;Job;799.xxx.xxx.xxx;dequeuing from default, state EXITING
> 09/29/2008 15:52:27;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent command term
> 09/29/2008 15:52:47;0001;PBS_Server;Svr;PBS_Server;is_request, bad attempt to connect from 172.16.100.1:1021 (address not trusted - check entry in server_priv/nodes)
> where the server_priv/nodes has:
> node001 np=4
> node002 np=4
> node003 np=4
> node004 np=4
> node005 np=4
> node006 np=4
> node007 np=4
> which was set up by the vendor.
> What is "address not trusted"?
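On the "address not trusted" question: as far as I can tell, pbs_server logs this when a request arrives from an address it cannot match to any node listed in server_priv/nodes. A rough sketch for narrowing it down, assuming a Linux host where getent is available: pull the rejected address out of the server log and compare it with what each node name resolves to. The function names are made up for illustration and are not Torque commands.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of Torque.

# Read pbs_server log lines on stdin and print the source address of every
# rejected connection attempt (the port is dropped). Expects unwrapped
# lines, as they appear in the log file itself rather than in a mail.
untrusted_addr() {
    sed -n 's/.*bad attempt to connect from \([0-9.]*\):[0-9]*.*/\1/p'
}

# Print the node names (first field) from a server_priv/nodes style file,
# skipping blank lines and comments.
node_names() {
    awk '!/^#/ && NF { print $1 }' "$1"
}

# Print "name address" for every node, as resolved on this host.
# Assumes getent exists (glibc-based Linux); names that do not resolve
# are reported as "unresolved".
resolve_nodes() {
    node_names "$1" | while read -r name; do
        addr=$(getent hosts "$name" | awk 'NR == 1 { print $1 }')
        echo "$name ${addr:-unresolved}"
    done
}
```

On the server, feeding the log file to untrusted_addr and running resolve_nodes on the server_priv/nodes file gives two lists to compare. If 172.16.100.1 is not among the resolved addresses, one possibility (a guess, not something the log confirms) is that a node or the head node is talking to pbs_server over a different interface than the one its name resolves to.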