
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] qsub - mpirun problem
From: Reuti (reuti_at_[hidden])
Date: 2008-09-29 18:10:32


On 29.09.2008, at 23:15, Doug Reeder wrote:

> It sounds like you may not have set up passwordless ssh between all
> your nodes.

If you have a tight integration of Open MPI with the task manager (TM)
from Torque, this shouldn't be necessary.
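A quick way to check this (just a sketch; the exact component names
vary by Open MPI version) is to look for the tm components in your
build:

  ompi_info | grep tm

If the Torque/TM support was compiled in, this should list tm entries
for the ras and pls/plm frameworks.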

Continued below...

> Doug Reeder
> On Sep 29, 2008, at 2:12 PM, Zhiliang Hu wrote:
>
>> At 10:45 PM 9/29/2008 +0200, you wrote:
>>> On 29.09.2008, at 22:33, Zhiliang Hu wrote:
>>>
>>>> At 07:37 PM 9/29/2008 +0200, Reuti wrote:
>>>>
>>>>>> "-l nodes=6:ppn=2" is all I have to specify the node requests:
>>>>>
>>>>> this might help: http://www.open-mpi.org/faq/?category=tm
>>>>
>>>> Essentially, the examples given on that web page are no different
>>>> from what I did.
>>>> The only thing new is "qsub -I", which I suppose is for interactive
>>>> mode.
>>>> When I did this:
>>>>
>>>> qsub -I -l nodes=7 mpiblastn.sh
>>>>
>>>> It hangs on "qsub: waiting for job 798.nagrp2.ansci.iastate.edu to
>>>> start".
>>>>
>>>>
>>>>>> UNIX_PROMPT> qsub -l nodes=6:ppn=2 /path/to/mpi_program
>>>>>> where "mpi_program" is a file with one line:
>>>>>> /path/to/mpirun -np 12 /path/to/my_program
>>>>>
>>>>> Can you please try this jobscript instead:
>>>>>
>>>>> #!/bin/sh
>>>>> set | grep PBS
>>>>> /path/to/mpirun /path/to/my_program
>>>>>
>>>>> All should be handled by Open MPI automatically. With the "set"
>>>>> bash command you will get a list of all defined variables for
>>>>> further analysis, where you can check for the variables set by
>>>>> Torque.
>>>>>
>>>>> -- Reuti
>>>>
>>>> "set | grep PBS" part had nothing in output.
>>>
>>> Strange - did you check the .o and .e files of the job? - Reuti
>>
>> There is nothing in the -o or -e output. I had to kill the job.
>> I checked the Torque log, which shows (/var/spool/torque/server_logs):
>>
>> 09/29/2008 15:52:16;0100;PBS_Server;Job;799.xxx.xxx.xxx;enqueuing
>> into default, state 1 hop 1
>> 09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Queued
>> at request of zhu_at_xxx.xxx.xxx, owner = zhu_at_xxx.xxx.xxx, job name =
>> mpiblastn.sh, queue = default
>> 09/29/2008 15:52:16;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent
>> command new
>> 09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job
>> Modified at request of Scheduler_at_xxx.xxx.xxx
>> 09/29/2008 15:52:27;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job
>> deleted at request of zhu_at_xxx.xxx.xxx
>> 09/29/2008 15:52:27;0100;PBS_Server;Job;799.xxx.xxx.xxx;dequeuing
>> from default, state EXITING
>> 09/29/2008 15:52:27;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent
>> command term
>> 09/29/2008 15:52:47;0001;PBS_Server;Svr;PBS_Server;is_request, bad
>> attempt to connect from 172.16.100.1:1021 (address not trusted -
>> check entry in server_priv/nodes)

As you blanked out some addresses: do the nodes and the headnode have
one or two network cards installed? Are all the names like node001 et
al. known on each node by the correct address, i.e. 172.16.100.1 =
node001?
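For example (just a sketch, assuming the names are kept in /etc/hosts
or DNS), on each node you could run:

  getent hosts node001
  getent hosts 172.16.100.1

and verify that the name/address pairs match the entries in
server_priv/nodes.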

-- Reuti

>> where the server_priv/nodes has:
>> node001 np=4
>> node002 np=4
>> node003 np=4
>> node004 np=4
>> node005 np=4
>> node006 np=4
>> node007 np=4
>>
>> which was set up by the vendor.
>>
>> What is "address not trusted"?
>>
>> Zhiliang