The --prefix directory in that command line was a typo; it no longer exists on our system.
We are running version 1.4-4 of OpenMPI:
% /opt/openmpi/x86_64/bin/ompi_info
Package: Open MPI mockbuild_at_[hidden] Distribution
Open MPI: 1.4
Open MPI SVN revision: r22285
Open MPI release date: Dec 08, 2009
Open RTE: 1.4
Sincerely,
Waris Sindhi
High Performance Computing, TechApps
Pratt & Whitney, UTC
(860)-565-8486
-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
Behalf Of Ralph Castain
Sent: Thursday, April 28, 2011 9:02 AM
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI out of band TCP retry exceeded
On Apr 28, 2011, at 6:49 AM, Jeff Squyres wrote:
> On Apr 28, 2011, at 8:45 AM, Ralph Castain wrote:
>
>> What led you to conclude 1.2.8?
>>
>>>>>> /opt/openmpi/i386/bin/mpirun -mca btl_openib_verbose 1 --mca btl ^tcp
>>>>>> --mca pls_ssh_agent ssh -mca oob_tcp_peer_retries 1000 --prefix
>>>>>> /usr/lib/openmpi/1.2.8-gcc/bin -np 239 --app procgroup
>
> His command line has "1.2.8" in it.
Actually, that isn't totally correct and may point to the problem. The
mpirun cmd itself points to a version of OMPI located in /opt/openmpi.
The error messages are clearly from a 1.3+ version - they look totally
different for 1.2.
However, the prefix passed to the backend nodes points to /usr/lib, and
indeed looks like a 1.2.8 version.
Waris: is this a mistype? Are these two versions actually the same?
If not, that would explain the problem - you can't mix OMPI versions. As
written, the cmd line has the potential to mix one version of mpirun
with another version of the daemons.
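
One way to check for this kind of mismatch is to compare the "Open MPI:" line that ompi_info prints for each installation involved (the one mpirun comes from and the one the backend nodes use). The following is only a rough sketch: the helper names parse_ompi_version and versions_match are made up for illustration, and it assumes you have already captured the ompi_info text from each install yourself (for example by running it over ssh on each node).

```python
import re


def parse_ompi_version(ompi_info_output):
    """Return the version on the 'Open MPI:' line of ompi_info output, or None.

    Hypothetical helper: it only looks for a line starting with 'Open MPI:',
    so lines such as 'Open MPI SVN revision:' are ignored.
    """
    match = re.search(r"^\s*Open MPI:\s*(\S+)", ompi_info_output, re.MULTILINE)
    return match.group(1) if match else None


def versions_match(outputs_by_install):
    """True if every install reports the same, non-missing Open MPI version.

    outputs_by_install maps a label (e.g. an install path) to the captured
    ompi_info text for that install.
    """
    versions = {parse_ompi_version(text) for text in outputs_by_install.values()}
    return len(versions) == 1 and None not in versions


# Sample text taken from the ompi_info output quoted earlier in this thread.
sample = """Package: Open MPI mockbuild_at_[hidden] Distribution
Open MPI: 1.4
Open MPI SVN revision: r22285
Open RTE: 1.4
"""
```

If versions_match returns False for the set of installs a job touches, that would be consistent with the mixed-version situation described above.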
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users