Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] qsub - mpirun problem
From: Doug Reeder (dlr_at_[hidden])
Date: 2008-09-29 17:15:52


It sounds like you may not have set up passwordless ssh between all
your nodes.
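
One rough way to check this is sketched below. It assumes you have a
readable list of node names in the same format as your server_priv/nodes
file (one "nodeNNN np=4" entry per line). The function names list_nodes
and check_ssh are made up for illustration, not part of Torque or Open MPI.

```shell
#!/bin/sh
# Sketch only: list_nodes and check_ssh are hypothetical helper names.

# Print the hostname (first field) of each entry in a Torque-style
# nodes file, skipping blank lines and comment lines.
list_nodes() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Try a non-interactive login to every node listed in the file.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# so a node that still needs a password shows up as FAILED.
check_ssh() {
    for node in $(list_nodes "$1"); do
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
        fi
    done
}
```

After sourcing the file, run it as "check_ssh nodes.txt", where nodes.txt
is a hypothetical copy of the node list. Any node reported as FAILED
either cannot be reached or still asks for a password. To cover "between
all your nodes", repeat the check from each compute node, not only from
the head node.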

Doug Reeder
On Sep 29, 2008, at 2:12 PM, Zhiliang Hu wrote:

> At 10:45 PM 9/29/2008 +0200, you wrote:
>> On 29.09.2008 at 22:33, Zhiliang Hu wrote:
>>
>>> At 07:37 PM 9/29/2008 +0200, Reuti wrote:
>>>
>>>>> "-l nodes=6:ppn=2" is all I have to specify the node requests:
>>>>
>>>> this might help: http://www.open-mpi.org/faq/?category=tm
>>>
>>> Essentially, the examples given on that page are no different from
>>> what I did.
>>> The only new thing is "qsub -I", which I suppose is for interactive mode.
>>> When I did this:
>>>
>>> qsub -I -l nodes=7 mpiblastn.sh
>>>
>>> It hangs on "qsub: waiting for job 798.nagrp2.ansci.iastate.edu to
>>> start".
>>>
>>>
>>>>> UNIX_PROMPT> qsub -l nodes=6:ppn=2 /path/to/mpi_program
>>>>> where "mpi_program" is a file with one line:
>>>>> /path/to/mpirun -np 12 /path/to/my_program
>>>>
>>>> Can you please try this jobscript instead:
>>>>
>>>> #!/bin/sh
>>>> set | grep PBS
>>>> /path/to/mpirun /path/to/my_program
>>>>
>>>> All should be handled by Open MPI automatically. With the "set" bash
>>>> command you will get a list of all defined variables for further
>>>> analysis, where you can check for the variables set by Torque.
>>>>
>>>> -- Reuti
>>>
>>> The "set | grep PBS" part produced no output.
>>
>> Strange - you checked the .o and .e files of the job? - Reuti
>
> There is nothing in either the -o or the -e output. I had to kill the job.
> I checked the Torque log (/var/spool/torque/server_logs), which shows:
>
> 09/29/2008 15:52:16;0100;PBS_Server;Job;799.xxx.xxx.xxx;enqueuing
> into default, state 1 hop 1
> 09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Queued
> at request of zhu_at_xxx.xxx.xxx, owner = zhu_at_xxx.xxx.xxx, job name =
> mpiblastn.sh, queue = default
> 09/29/2008 15:52:16;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent
> command new
> 09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job
> Modified at request of Scheduler_at_xxx.xxx.xxx
> 09/29/2008 15:52:27;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job deleted
> at request of zhu_at_xxx.xxx.xxx
> 09/29/2008 15:52:27;0100;PBS_Server;Job;799.xxx.xxx.xxx;dequeuing
> from default, state EXITING
> 09/29/2008 15:52:27;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent
> command term
> 09/29/2008 15:52:47;0001;PBS_Server;Svr;PBS_Server;is_request, bad
> attempt to connect from 172.16.100.1:1021 (address not trusted -
> check entry in server_priv/nodes)
>
> where the server_priv/nodes has:
> node001 np=4
> node002 np=4
> node003 np=4
> node004 np=4
> node005 np=4
> node006 np=4
> node007 np=4
>
> which was set up by the vendor.
>
> What is "address not trusted"?
>
> Zhiliang