Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] submitted job stops
From: Reuti (reuti_at_[hidden])
Date: 2008-04-10 05:18:04


Hi,

On 09.04.2008 at 22:17, Danesh Daroui wrote:
> Mark Kosmowski wrote:
>> Danesh:
>>
>> Have you tried "mpirun -np 4 --hostfile hosts hostname" to verify
>> that ompi is working?
>>
>
> When I run "mpirun -np 4 --hostfile hosts hostname", the same thing
> happens: it just hangs. Could that be a clue?
>
>> Can you remote access from each node to each other node?
>>
> Yes, all nodes can reach each other via SSH and can log in
> without being prompted for a password.
>
>> If any node has more than 1 network device, are you using the ompi
>> options to specify which device to use?
>>
>
> Each node has one network interface, which works properly.

Do you have a firewall on the machines that blocks certain ports?
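
One way to test this is a small TCP probe between the nodes. The following is only a minimal sketch and not part of Open MPI: the function name `can_connect` and the command-line layout are assumptions for illustration. As far as I know, Open MPI chooses its TCP ports dynamically by default, so something has to be listening on the chosen port on the remote node before probing, and one reachable port does not prove that all ports are open.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. silently dropped
        # packets behind a firewall), and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical usage: python3 probe.py PORT HOST [HOST ...]
    if len(sys.argv) >= 3:
        port = int(sys.argv[1])
        for host in sys.argv[2:]:
            state = "reachable" if can_connect(host, port) else "blocked or closed"
            print(f"{host}:{port} {state}")
```

If a high port is reported as "blocked or closed" even though a listener is running on it on the remote node, while the SSH port is reachable, a packet filter between the nodes is a likely suspect.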

-- Reuti

> Regards,
>
> Danesh
>
>
>> Good luck,
>>
>> Mark
>>
>>
>>> Message: 5
>>> Date: Wed, 9 Apr 2008 14:15:34 +0200 (CEST)
>>> From: "danesh.d_at_[hidden]" <danesh.d_at_[hidden]>
>>> Subject: [OMPI users] Ang: Re: submitted job stops
>>> To: <users_at_[hidden]>
>>> Message-ID:
>>> <24351656.56761207743334738.JavaMail.defaultUser_at_defaultHost>
>>> Content-Type: text/plain;charset="ISO-8859-15"
>>>
>>>
>>> Actually my program is a very simple MPI "Hello World" program which
>>> just prints the rank of each process and then terminates. When I run
>>> my program on a single-processor machine with e.g. 4 processes
>>> (oversubscribing), it shows:
>>>
>>> Hello world from processor with rank 0
>>> Hello world from processor with rank 3
>>> Hello world from processor with rank 1
>>> Hello world from processor with rank 2
>>>
>>> but when I use my remote machines everything just stops when
>>> I run the program.
>>>
>>> No, I do not use any queuing system. I simply run it like this:
>>>
>>> mpirun -np 4 --hostfile hosts ./hw
>>>
>>> and then it just stops until I terminate it manually. As I said,
>>> I monitored all machines (master + 2 slaves) and found that on
>>> all machines the "orted" daemon starts when I run the program, but
>>> after a few seconds the daemon terminates. What could be the reason?
>>>
>>> Thanks,
>>>
>>> Danesh
>>>
>>>
>>>
>>>
>>>> ----Original message----
>>>> From: reuti_at_[hidden]
>>>> Date: 09-04-2008 13:26
>>>> To: "Open MPI Users"<users_at_[hidden]>
>>>> Subject: Re: [OMPI users] submitted job stops
>>>>
>>>> Hi,
>>>>
>>>> On 08.04.2008 at 21:58, Danesh Daroui wrote:
>>>>
>>>>> I had posted a message about my problem and tried all the suggested
>>>>> solutions, but the problem is still not solved. I have installed
>>>>> Open MPI on three machines (1 master + 2 slaves). When I submit a
>>>>> job to the master, I can see that the "orted" daemon is launched on
>>>>> all machines (by running "top" on each of them), but all "orted"
>>>>> daemons terminate after a few seconds and nothing happens. At first
>>>>> I thought the remote machines could not launch "orted", but now I
>>>>> am sure that it can be run on all machines without problems. Any
>>>>> suggestions?
>>>>>
>>>> The question is rather: is your MPI program running successfully,
>>>> or is there simply no output from mpiexec/mpirun? And by "submit",
>>>> do you mean that you use a queuing system?
>>>>
>>>> -- Reuti
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> End of users Digest, Vol 863, Issue 1
>>> *************************************
>>>
>>>