
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Problem in remote nodes
From: uriz.49949_at_[hidden]
Date: 2010-03-30 06:27:01


I've been investigating, and there is no firewall in the cluster that could
block TCP traffic. With the option --mca plm_base_verbose 30 I get the
following output:

[itanium1] /home/otro > mpirun --mca plm_base_verbose 30 --host itanium2
helloworld.out
[itanium1:08311] mca: base: components_open: Looking for plm components
[itanium1:08311] mca: base: components_open: opening plm components
[itanium1:08311] mca: base: components_open: found loaded component rsh
[itanium1:08311] mca: base: components_open: component rsh has no register
function
[itanium1:08311] mca: base: components_open: component rsh open function
successful
[itanium1:08311] mca: base: components_open: found loaded component slurm
[itanium1:08311] mca: base: components_open: component slurm has no
register function
[itanium1:08311] mca: base: components_open: component slurm open function
successful
[itanium1:08311] mca:base:select: Auto-selecting plm components
[itanium1:08311] mca:base:select:( plm) Querying component [rsh]
[itanium1:08311] mca:base:select:( plm) Query of component [rsh] set
priority to 10
[itanium1:08311] mca:base:select:( plm) Querying component [slurm]
[itanium1:08311] mca:base:select:( plm) Skipping component [slurm]. Query
failed to return a module
[itanium1:08311] mca:base:select:( plm) Selected component [rsh]
[itanium1:08311] mca: base: close: component slurm closed
[itanium1:08311] mca: base: close: unloading component slurm

--Hangs here
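Since the rsh launcher is the one that gets selected before the hang, one sanity check (a sketch, using the itanium2 hostname from the transcript above) is to confirm that a non-interactive ssh session on the remote node can actually find orted and its shared libraries, since non-interactive shells often load a different PATH and LD_LIBRARY_PATH than a login shell:

```shell
# Check that the Open MPI daemon is on the PATH of a non-interactive shell
# (itanium2 is the remote host from the transcript above)
ssh itanium2 which orted

# Check that the Open MPI libraries are on the remote library path
ssh itanium2 'echo $LD_LIBRARY_PATH'
```

If `which orted` comes back empty, the daemon can never start on the remote side and mpirun will hang exactly as shown above.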

Does this look like a Slurm problem?
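The log above shows Slurm being skipped and rsh being selected, but one way to rule Slurm out entirely (a sketch, reusing the host and binary names from the transcript; the `^` prefix is Open MPI's MCA syntax for excluding a component) is to restrict launcher selection on the command line:

```shell
# Exclude the slurm launcher so only the remaining components (rsh) are queried
mpirun --mca plm ^slurm --mca plm_base_verbose 30 \
       --host itanium2 helloworld.out

# Equivalently, select the rsh launcher explicitly
mpirun --mca plm rsh --host itanium2 helloworld.out
```

If the job still hangs with Slurm excluded, the problem is in the rsh launch path itself rather than in component selection.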

Thanks for any ideas.

On Fri, 19 March 2010, at 17:57, Ralph Castain wrote:
> Did you configure OMPI with --enable-debug? You should do this so that
> more diagnostic output is available.
>
> You can also add the following to your cmd line to get more info:
>
> --debug --debug-daemons --leave-session-attached
>
> Something is likely blocking proper launch of the daemons and processes so
> you aren't getting to the btl's at all.
>
>
> On Mar 19, 2010, at 9:42 AM, uriz.49949_at_[hidden] wrote:
>
>> The processes are running on the remote nodes, but they never return a
>> response to the origin node. I don't know why. With the option
>> --mca btl_base_verbose 30 I have the same problem, and it doesn't print
>> any messages.
>>
>> Thanks
>>
>>> On Wed, Mar 17, 2010 at 1:41 PM, Jeff Squyres <jsquyres_at_[hidden]>
>>> wrote:
>>>> On Mar 17, 2010, at 4:39 AM, <uriz.49949_at_[hidden]> wrote:
>>>>
>>>>> Hi everyone, I'm a new Open MPI user and I have just installed Open
>>>>> MPI on a 6-node cluster running Scientific Linux. When I execute it
>>>>> locally it works perfectly, but when I try to execute it on the
>>>>> remote nodes with the --host option, it hangs and gives no message.
>>>>> I think the problem could be with the shared libraries, but I'm not
>>>>> sure. In my opinion the problem is not ssh, because I can access the
>>>>> nodes with no password.
>>>>
>>>> You might want to check that Open MPI processes are actually running
>>>> on
>>>> the remote nodes -- check with ps if you see any "orted" or other
>>>> MPI-related processes (e.g., your processes).
>>>>
>>>> Do you have any TCP firewall software running between the nodes? If
>>>> so,
>>>> you'll need to disable it (at least for Open MPI jobs).
>>>
>>> I also recommend running mpirun with the option --mca btl_base_verbose
>>> 30 to troubleshoot tcp issues.
>>>
>>> In some environments, you need to explicitly tell mpirun what network
>>> interfaces it can use to reach the hosts. Read the following FAQ
>>> section for more information:
>>>
>>> http://www.open-mpi.org/faq/?category=tcp
>>>
>>> Item 7 of the FAQ might be of special interest.
>>>
>>> Regards,
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>