Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi hangs when running on more than one node (unless i use --debug-daemons )
From: Advanced Computing Group University of Padova (acg.unipd_at_[hidden])
Date: 2010-12-30 04:13:38


Thank you, Ralph.
It works!!!!
:)
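
For the archive, a minimal sketch of applying the fix quoted below. It assumes the per-user MCA parameter file lives at $HOME/.openmpi/mca-params.conf and that MCA parameters can also be set through OMPI_MCA_-prefixed environment variables; neither the path nor the variable name appears in this thread, so check them against your own installation.

```shell
#!/bin/sh
# Sketch only: persist the parameter Ralph suggested for the mpiuser account.
# Assumption: Open MPI reads the per-user file $HOME/.openmpi/mca-params.conf.
param_file="$HOME/.openmpi/mca-params.conf"

# Create the directory if it does not exist yet.
mkdir -p "$HOME/.openmpi"

# Append the setting only once, so re-running the script is harmless.
if ! grep -q '^orte_leave_session_attached' "$param_file" 2>/dev/null; then
    echo 'orte_leave_session_attached = 1' >> "$param_file"
fi

# Alternative for the current shell only (assumed OMPI_MCA_ naming convention):
export OMPI_MCA_orte_leave_session_attached=1
```

Because /home/mpiuser is an NFS share here, a file written this way is visible from every node.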

On Wed, Dec 29, 2010 at 4:23 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> Both look perfectly right to me. The difference is only because your
> "success" one still has the ssh session active.
>
> It looks to me like something is preventing communication when the ssh
> session is terminated, but I have no clue why.
>
> Given the small cluster size, I would just add this to your default param
> file and not worry about it:
>
> orte_leave_session_attached = 1
>
>
> On Dec 29, 2010, at 2:10 AM, Advanced Computing Group University of Padova
> wrote:
>
>
>
> On Wed, Dec 29, 2010 at 10:10 AM, Advanced Computing Group University of
> Padova <acg.unipd_at_[hidden]> wrote:
>
>> Thank you Ralph,
>> Your suspicion seems quite interesting :)
>> I ran the same program from node 192.168.1/2.11, also using
>> 192.168.2.12, while tracing the activity on .12.
>> I attach the two files (_succ: with --debug-daemons, _fail: without
>> --debug-daemons).
>> I notice that the orted daemon on the second node is invoked in a
>> different way.....
>> Moreover, when I launch without --debug-daemons, a process called
>> orted...... remains active on the second node after I kill (Ctrl+C) the
>> command on the first node.
>>
>> Can you continue to help me?
>>
>>
>> On Tue, Dec 28, 2010 at 8:51 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>> All --debug-daemons really does is keep the ssh session open after
>>> launching the remote daemon and turn on some output. Otherwise, we close
>>> that session as most systems only allow a limited number of concurrent ssh
>>> sessions to be open.
>>>
>>> I suspect you have a system setting that kills any running job upon ssh
>>> close. It would be best if you removed that restriction. If you cannot, then
>>> you can always run your MPI jobs with --no-daemonize. This will keep the ssh
>>> session open, but without all the debug output.
>>>
>>> That flag is just shorthand for an MCA param, so you can set it in your
>>> environ or put it in your default MCA param file.
>>>
>>>
>>> On Dec 28, 2010, at 3:31 AM, Advanced Computing Group University of
>>> Padova wrote:
>>>
>>> Yes, I've tested them.
>>> In fact, with the --debug-daemons switch everything works fine! (And I
>>> see that a process called orted... is started on the nodes whenever I
>>> launch a test application.)
>>> I believe this is an environment variable problem....
>>>
>>> On Mon, Dec 27, 2010 at 10:16 PM, David Zhang <solarbikedz_at_[hidden]> wrote:
>>>
>>>> Have you tested your ssh key setup, firewall, and switch settings to
>>>> ensure all nodes are talking to each other?
>>>>
>>>> On Mon, Dec 27, 2010 at 1:07 AM, Advanced Computing Group University of
>>>> Padova <acg.unipd_at_[hidden]> wrote:
>>>>
>>>>> using openmpi 1.4.2
>>>>>
>>>>>
>>>>> On Fri, Dec 24, 2010 at 11:17 AM, Advanced Computing Group University
>>>>> of Padova <acg.unipd_at_[hidden]> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am building a small 16-node Gentoo-based cluster.
>>>>>> I successfully installed openmpi and successfully ran some small,
>>>>>> simple parallel test programs on a single host, but
>>>>>> I can't run a parallel program on more than one node.
>>>>>>
>>>>>>
>>>>>> The nodes are cloned (so they are identical).
>>>>>> The mpiuser account (and its ssh certificates) uses /home/mpiuser,
>>>>>> which is an NFS share.
>>>>>> I modified .bashrc:
>>>>>>
>>>>>> -------------------------
>>>>>> PATH=/usr/bin:$PATH ; export PATH ;
>>>>>> LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ;
>>>>>>
>>>>>> # already present below
>>>>>> if [[ $- != *i* ]] ; then
>>>>>> # Shell is non-interactive. Be done now!
>>>>>> return
>>>>>> fi
>>>>>> ---------------------
>>>>>>
>>>>>> The very strange behaviour is that using --debug-daemons lets
>>>>>> my program run successfully.....
>>>>>>
>>>>>> Thank you in advance and sorry for my bad English.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> David Zhang
>>>> University of California, San Diego
>>>>
>>>>
>>>
>>>
>>
>>
> <dump_succ.txt><dump_fail.txt>
>