Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] submitted job stops
From: Danesh Daroui (Danesh.D_at_[hidden])
Date: 2008-04-09 16:17:59


Mark Kosmowski wrote:
> Danesh:
>
> Have you tried "mpirun -np 4 --hostfile hosts hostname" to verify that
> ompi is working?
>

When I run "mpirun -np 4 --hostfile hosts hostname", the same thing
happens: it just hangs. Could that be a clue?

> Can you remote access from each node to each other node?
>
Yes, all nodes can reach each other via SSH and can log in
without being prompted for a password.
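
A minimal sketch of how the node-to-node check could be scripted, so that every ordered pair of hosts is tested rather than only master-to-slave. It assumes the hostfile is the same "hosts" file passed to mpirun and that each line starts with a host name; the function names are hypothetical, and only standard ssh options are used.

```python
import subprocess


def parse_hostfile(text):
    """Return the host names listed in hostfile text.

    Blank lines and '#' comments are skipped; anything after the host
    name on a line (for example "slots=2") is ignored.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0])
    return hosts


def ssh_check_command(src, dst):
    """Build a command that logs in to src and, from there, to dst.

    BatchMode=yes makes ssh fail instead of prompting for a password,
    so a hidden prompt shows up as an error rather than a hang.
    """
    return ["ssh", "-o", "BatchMode=yes", src,
            "ssh", "-o", "BatchMode=yes", dst, "hostname"]


def check_all_pairs(hosts, timeout=15):
    """Return the (src, dst) pairs for which the non-interactive login failed."""
    failures = []
    for src in hosts:
        for dst in hosts:
            if src == dst:
                continue
            try:
                result = subprocess.run(ssh_check_command(src, dst),
                                        capture_output=True, timeout=timeout)
                ok = result.returncode == 0
            except (subprocess.TimeoutExpired, OSError):
                ok = False
            if not ok:
                failures.append((src, dst))
    return failures


# Example use (assumes a hostfile named "hosts" in the current directory):
#   with open("hosts") as f:
#       print(check_all_pairs(parse_hostfile(f.read())))
```

An empty list from check_all_pairs means every node could reach every other node without a prompt; any pair it returns is worth testing by hand.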

> If any node has more than 1 network device, are you using the ompi
> options to specify which device to use?
>

Each node has one network interface, which works properly.
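
A rough sketch of how the interface question could be double-checked on each node. It lists the local interfaces, drops the loopback device, and builds an mpirun command line that restricts the TCP transport to one interface. The MCA parameter "btl_tcp_if_include" is my assumption about the kind of option Mark is referring to, and the helper names are hypothetical.

```python
import socket


def non_loopback(names):
    """Drop loopback devices such as "lo" (Linux) or "lo0" (BSD).

    This is a name-based heuristic, not an inspection of interface flags.
    """
    return [n for n in names if not n.startswith("lo")]


def local_interfaces():
    """Return interface names on this machine (Unix-like systems only)."""
    # socket.if_nameindex() yields (index, name) pairs.
    return [name for _index, name in socket.if_nameindex()]


def mpirun_pinned(iface, hostfile="hosts", nprocs=4):
    """Build the 'hostname' test command, limited to a single TCP interface."""
    return ["mpirun", "--mca", "btl_tcp_if_include", iface,
            "-np", str(nprocs), "--hostfile", hostfile, "hostname"]


# Example use on one node:
#   candidates = non_loopback(local_interfaces())
#   print(candidates)
#   if len(candidates) == 1:
#       print(" ".join(mpirun_pinned(candidates[0])))
```

If more than one name is printed on any node, the node has more interfaces than expected, and pinning the interface explicitly is a cheap thing to try.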

Regards,

Danesh

> Good luck,
>
> Mark
>
>
>> Message: 5
>> Date: Wed, 9 Apr 2008 14:15:34 +0200 (CEST)
>> From: "danesh.d_at_[hidden]" <danesh.d_at_[hidden]>
>> Subject: [OMPI users] Ang: Re: submitted job stops
>> To: <users_at_[hidden]>
>> Message-ID:
>> <24351656.56761207743334738.JavaMail.defaultUser_at_defaultHost>
>> Content-Type: text/plain;charset="ISO-8859-15"
>>
>>
>> Actually, my program is a very simple MPI "Hello World" program that
>> just prints the rank of each process and then terminates. When I run
>> it on a single-processor machine with, e.g., 4 processes
>> (oversubscribing), it shows:
>>
>> Hello world from processor with rank 0
>> Hello world from processor with rank 3
>> Hello world from processor with rank 1
>> Hello world from processor with rank 2
>>
>> but when I use my remote machines everything just stops when
>> I run the program.
>>
>> No, I do not use any queuing system. I simply run it like this:
>>
>> mpirun -np 4 --hostfile hosts ./hw
>>
>> and then it just stops until I terminate it manually. As I said,
>> I monitored all machines (master + 2 slaves) and found that on
>> all of them the "orted" daemon starts when I run the program, but
>> after a few seconds the daemon terminates. What could be the reason?
>>
>> Thanks,
>>
>> Danesh
>>
>>
>>
>>
>>> ----Original message----
>>> From: reuti_at_[hidden]
>>> Date: 09-04-2008 13:26
>>> To: "Open MPI Users"<users_at_[hidden]>
>>> Subject: Re: [OMPI users] submitted job stops
>>>
>>> Hi,
>>>
>>> On 08.04.2008 at 21:58, Danesh Daroui wrote:
>>>
>>>> I posted a message about my problem earlier and tried all the
>>>> suggested solutions, but the problem is still not solved. I have
>>>> installed Open MPI on three machines (1 master + 2 slaves). When I
>>>> submit a job to the master, I can see (by running "top" on all
>>>> machines) that the "orted" daemon is launched on every machine, but
>>>> all "orted" daemons terminate after a few seconds and nothing
>>>> happens. At first I thought the remote machines could not launch
>>>> "orted", but now I am sure that it can run on all machines without
>>>> a problem. Any suggestions?
>>>>
>>> The question is rather: is your MPI program running successfully, or
>>> is there simply no output from mpiexec/mpirun? And by "submit", do
>>> you mean that you use a queuing system?
>>>
>>> -- Reuti
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> End of users Digest, Vol 863, Issue 1
>> *************************************