Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] openmpi hangs when running on more than one node (unless i use --debug-daemons )
From: Advanced Computing Group University of Padova (acg.unipd_at_[hidden])
Date: 2010-12-30 04:13:38


Thank you, Ralph.
It works!!!!
:)
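
For anyone who finds this thread later, here is a minimal sketch of how the setting Ralph suggests in the quoted message below might be made persistent for one user. It assumes the per-user MCA parameter file lives at $HOME/.openmpi/mca-params.conf; the PARAM_FILE and PARAM_LINE variables are names made up for this illustration, and a system-wide parameter file would work the same way under whatever prefix Open MPI was installed to.

```shell
#!/bin/sh
# Sketch: persist the MCA parameter suggested in this thread for one user.
# Assumption: the per-user MCA parameter file is $HOME/.openmpi/mca-params.conf.
# PARAM_FILE and PARAM_LINE are hypothetical helper names for this example.
PARAM_FILE="${PARAM_FILE:-$HOME/.openmpi/mca-params.conf}"
PARAM_LINE="orte_leave_session_attached = 1"

# Create the directory if it does not exist yet.
mkdir -p "$(dirname "$PARAM_FILE")"

# Append the setting only if that exact line is not already present,
# so running the script twice does not duplicate it.
if ! grep -qsxF "$PARAM_LINE" "$PARAM_FILE"; then
    printf '%s\n' "$PARAM_LINE" >> "$PARAM_FILE"
fi

# One-off alternatives that do not touch the file:
#   export OMPI_MCA_orte_leave_session_attached=1
#   mpirun --mca orte_leave_session_attached 1 ...
```

Since /home/mpiuser is an NFS share in the setup described in this thread, writing the file once on any node should make it visible on all of them.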

On Wed, Dec 29, 2010 at 4:23 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> Both look perfectly right to me. The difference is only because your
> "success" one still has the ssh session active.
>
> It looks to me like something is preventing communication when the ssh
> session is terminated, but I have no clue why.
>
> Given the small cluster size, I would just add this to your default param
> file and not worry about it:
>
> orte_leave_session_attached = 1
>
>
> On Dec 29, 2010, at 2:10 AM, Advanced Computing Group University of Padova
> wrote:
>
>
>
> On Wed, Dec 29, 2010 at 10:10 AM, Advanced Computing Group University of
> Padova <acg.unipd_at_[hidden]> wrote:
>
>> Thank you Ralph,
>> Your suspicion seems quite interesting :)
>> I tried running the same program from node 192.168.1/2.11, also using
>> 192.168.2.12, while tracing the activity on .12.
>> I attach the two files (_succ: using --debug-daemons, _fail: without
>> --debug-daemons).
>> I notice that the orted daemon on the second node is invoked in a
>> different way.....
>> Moreover, when I launch without --debug-daemons, a process called
>> orted...... remains active on the second node after I kill (ctrl+c) the
>> command on the first node.
>>
>> Can you continue to help me?
>>
>>
>> On Tue, Dec 28, 2010 at 8:51 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>> All --debug-daemons really does is keep the ssh session open after
>>> launching the remote daemon and turn on some output. Otherwise, we close
>>> that session as most systems only allow a limited number of concurrent ssh
>>> sessions to be open.
>>>
>>> I suspect you have a system setting that kills any running job upon ssh
>>> close. It would be best if you removed that restriction. If you cannot, then
>>> you can always run your MPI jobs with --no-daemonize. This will keep the ssh
>>> session open, but without all the debug output.
>>>
>>> That flag is just shorthand for an MCA param, so you can set it in your
>>> environ or put it in your default MCA param file.
>>>
>>>
>>> On Dec 28, 2010, at 3:31 AM, Advanced Computing Group University of
>>> Padova wrote:
>>>
>>> Yes, I've tested them.
>>> In fact, with the --debug-daemons switch everything works fine! (And I
>>> see that a process called orted... is started on the nodes whenever I
>>> launch a test application.)
>>> I believe this is an environment variable problem....
>>>
>>> On Mon, Dec 27, 2010 at 10:16 PM, David Zhang <solarbikedz_at_[hidden]> wrote:
>>>
>>>> Have you tested your ssh key setup, firewall, and switch settings to
>>>> ensure all nodes are talking to each other?
>>>>
>>>> On Mon, Dec 27, 2010 at 1:07 AM, Advanced Computing Group University of
>>>> Padova <acg.unipd_at_[hidden]> wrote:
>>>>
>>>>> I am using Open MPI 1.4.2.
>>>>>
>>>>>
>>>>> On Fri, Dec 24, 2010 at 11:17 AM, Advanced Computing Group University
>>>>> of Padova <acg.unipd_at_[hidden]> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am building a small 16-node Gentoo-based cluster.
>>>>>> I successfully installed Open MPI and successfully ran some simple
>>>>>> small parallel test programs on a single host, but...
>>>>>> I can't run a parallel program on more than one node.
>>>>>>
>>>>>>
>>>>>> The nodes are cloned (so they are identical).
>>>>>> The mpiuser account (and its ssh certificates) uses /home/mpiuser,
>>>>>> which is an NFS share.
>>>>>> I modified .bashrc:
>>>>>>
>>>>>> -------------------------
>>>>>> PATH=/usr/bin:$PATH ; export PATH ;
>>>>>> LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ;
>>>>>>
>>>>>> # already present below
>>>>>> if [[ $- != *i* ]] ; then
>>>>>> # Shell is non-interactive. Be done now!
>>>>>> return
>>>>>> fi
>>>>>> ---------------------
>>>>>>
>>>>>> The very strange behaviour is that using --debug-daemons lets
>>>>>> my program run successfully.....
>>>>>>
>>>>>> Thank you in advance, and sorry for my bad English.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> David Zhang
>>>> University of California, San Diego
>>>>
>>>
>>
>>
> <dump_succ.txt><dump_fail.txt>