Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-12-04 08:35:17


One thing that I note is that you are using a fairly ancient
development version -- the development snapshots tend to change
pretty quickly (usually nightly). The version you cited is
1.2a1r10111 (which I think is from about 5-6 months ago), but the
current development head is r12737.

Indeed, we've had some fairly important run-time changes over the
past 2-3 weeks. Can you update to a more recent copy and try again?
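
Something like the following should get you onto a current snapshot
(the exact tarball name below is hypothetical -- pick the newest one
from the nightly snapshot page):

    # download and unpack a nightly trunk snapshot:
    wget http://www.open-mpi.org/nightly/trunk/openmpi-1.2a1r12737.tar.gz
    tar xzf openmpi-1.2a1r12737.tar.gz
    cd openmpi-1.2a1r12737

    # standard autotools build; installing into the NFS-shared home
    # directory makes the same installation visible on every node:
    ./configure --prefix=$HOME/openmpi-dev
    make all && make install

    # confirm which version you now have:
    $HOME/openmpi-dev/bin/ompi_info | head -5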

On Dec 4, 2006, at 3:44 AM, Jens Klostermann wrote:

>> What the system is saying is that (a) you don't have transparent ssh
>> authority on one or more of your nodes, and/or (b) the system was
>> unable to locate the Open MPI code libraries on the remote node. For
>> the first problem, please see the FAQ at:
>>
>> http://www.open-mpi.org/faq/?category=rsh#ssh-keys
>>
>> Once you have that fixed, then you should check the remote nodes to
>> ensure that the Open MPI code libraries are available - you may need
>> to provide a prefix directory to mpirun to tell us where they are.
>> Please see the FAQ at:
>>
>> http://www.open-mpi.org/faq/?category=running
>>
>> for some advice in that area.
>>
>> Hope that helps
>> Ralph
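
To make Ralph's two suggestions concrete, the checks could look
something like this -- node01, the install path, and the hostfile
name are placeholders for your site:

    # (a) passwordless ssh: generate a key with an empty passphrase and,
    # since $HOME is NFS-shared, append it to your own authorized_keys:
    ssh-keygen -t dsa
    cat $HOME/.ssh/id_dsa.pub >> $HOME/.ssh/authorized_keys
    ssh node01 hostname        # must print the name without prompting

    # (b) if the Open MPI installation is not on the remote nodes'
    # default PATH/LD_LIBRARY_PATH, point mpirun at it explicitly:
    mpirun --prefix $HOME/openmpi-dev -np 16 -hostfile myhosts ./my_app
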
>
> I think these suggestions -- (a) nontransparent ssh authority and (b)
> being unable to locate the Open MPI code libraries on the remote node
> -- are not the problem.
> (a) Passwordless ssh is set up and all nodes see the same home!
> (b) The Open MPI code libraries are located in my home directory,
> which is visible from every node.
>
> mpirun sometimes works with all cpus/nodes of the cluster, but
> sometimes it won't, and the error described below will occur.
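
One quick way to verify both of those claims from the head node is a
loop over the hostfile (myhosts here is assumed to be a flat list of
hostnames, and the install path is a placeholder):

    # each line should complete with no password prompt and no "problem":
    for h in $(cat myhosts); do
        ssh $h "ls $HOME/openmpi-dev/lib/libmpi.so" || echo "problem on $h"
    done
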
>>
>> On 12/1/06 8:17 AM, "Jens Klostermann"
>> <jens.klostermann_at_[hidden]> wrote:
>>
>>> I've got the same problem as described in:
>>> http://www.open-mpi.org/community/lists/users/2006/07/1537.php
>>>
>>> From: Chengwen Chen (chenchengwen_at_[hidden])
>>> Date: 2006-07-04 03:53:26
>>>
>>> The problem seems to occur randomly! It occurs more often if I use a
>>> larger number of CPUs, but almost never if I use a small number of
>>> CPUs. So far my cure for the problem has been to kill and restart my
>>> application (mpirun ...) until the error does not occur and mpirun
>>> runs.
>>>
>>> So the problem is not really resolved. Can anybody give me a hint?
>>>
>>> I am using an amd64 Linux (SuSE 10) cluster with an InfiniBand
>>> connection and openmpi-1.2a1r10111.
>>>
>>> I attach the ompi_info --param all all output; I hope it helps.
>>>
>>> Regards, Jens
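
As a stopgap, the kill-and-restart workaround described above can be
scripted; this is only a sketch, with my_app and myhosts as
placeholder names:

    # rerun mpirun until it exits successfully; note this also retries
    # if the application itself fails mid-run, so it is no real fix:
    until mpirun -np 16 -hostfile myhosts ./my_app; do
        echo "mpirun failed, retrying..." >&2
        sleep 5
    done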

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems