Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] OpenMPI out of band TCP retry exceeded
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-04-28 16:19:38


We figured out that when you provide both the full path to mpirun and the --prefix option, we ignore the latter anyway. :-/

I'm working on a patch to at least warn you we are ignoring it.
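
As an illustration of the situation described above, here is a minimal sketch that inspects an mpirun command line and reports when an absolute mpirun path and an explicit prefix are both present. The helper name check_prefix is hypothetical and not part of Open MPI. It assumes the install prefix is the directory two levels above the mpirun binary, it only looks at the text of the command line, and it never launches anything.

```python
import posixpath
import shlex


def check_prefix(command_line):
    """Return (mpirun_prefix, explicit_prefix, explicit_prefix_ignored).

    Hypothetical helper for illustration only; it is not Open MPI code.
    """
    args = shlex.split(command_line)
    launcher = args[0]

    # Assumption: an absolute path to mpirun implies an install prefix two
    # levels up, e.g. /opt/openmpi/i386/bin/mpirun -> /opt/openmpi/i386.
    mpirun_prefix = None
    if posixpath.isabs(launcher):
        mpirun_prefix = posixpath.dirname(posixpath.dirname(launcher))

    # Look for the prefix option in either spelling used in this thread.
    explicit_prefix = None
    for i, arg in enumerate(args[:-1]):
        if arg in ("--prefix", "-prefix"):
            explicit_prefix = args[i + 1]
            break

    # Per the behaviour described in this thread, the explicit prefix is
    # ignored whenever mpirun itself was given by its full path.
    ignored = mpirun_prefix is not None and explicit_prefix is not None
    return mpirun_prefix, explicit_prefix, ignored


# Shortened form of the command line quoted later in this thread.
cmd = ("/opt/openmpi/i386/bin/mpirun --mca btl ^tcp "
       "--prefix /usr/lib/openmpi/1.2.8-gcc/bin -np 239 --app procgroup")

mpirun_prefix, explicit_prefix, ignored = check_prefix(cmd)
if ignored:
    print("warning: --prefix %s is ignored; mpirun path implies %s"
          % (explicit_prefix, mpirun_prefix))
```

A check like this could be run from a job-submission wrapper before launching, so that a stale or mistyped prefix is noticed up front.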

On Apr 28, 2011, at 2:03 PM, Sindhi, Waris PW wrote:

> The --prefix directory is a typo and no longer exists on our system.
>
> We are running version 1.4-4 of Open MPI.
>
> % /opt/openmpi/x86_64/bin/ompi_info
>
> Package: Open MPI mockbuild_at_[hidden] Distribution
> Open MPI: 1.4
> Open MPI SVN revision: r22285
> Open MPI release date: Dec 08, 2009
> Open RTE: 1.4
>
>
> Sincerely,
>
> Waris Sindhi
> High Performance Computing, TechApps
> Pratt & Whitney, UTC
> (860)-565-8486
>
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
> Behalf Of Ralph Castain
> Sent: Thursday, April 28, 2011 9:02 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI out of band TCP retry exceeded
>
>
> On Apr 28, 2011, at 6:49 AM, Jeff Squyres wrote:
>
>> On Apr 28, 2011, at 8:45 AM, Ralph Castain wrote:
>>
>>> What led you to conclude 1.2.8?
>>>
>>>>>>> /opt/openmpi/i386/bin/mpirun -mca btl_openib_verbose 1 --mca btl ^tcp
>>>>>>> --mca pls_ssh_agent ssh -mca oob_tcp_peer_retries 1000 --prefix
>>>>>>> /usr/lib/openmpi/1.2.8-gcc/bin -np 239 --app procgroup
>>
>> His command line has "1.2.8" in it.
>
> Actually, that isn't totally correct and may point to the problem. The
> mpirun cmd itself points to a version of OMPI located in /opt/openmpi.
> The error messages are clearly from a 1.3+ version - they look totally
> different in 1.2.
>
> However, the prefix passed to the backend nodes points to /usr/lib, and
> indeed looks like a 1.2.8 version.
>
> Waris: is this a mistype? Are these two versions actually the same?
>
> If not, that would explain the problem - you can't mix OMPI versions. As
> written, the cmd line has the potential to mix one version of mpirun
> with another version of the daemons.
>
>
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users