Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI out of band TCP retry exceeded
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-04-27 14:18:04


Perhaps a firewall? All the error is telling you is that mpirun couldn't establish TCP communications with the daemon on ln10.
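One rough way to test the firewall theory is to check whether a plain TCP connection between the two nodes can be opened at all. The sketch below is only an illustration and is not part of Open MPI: the helper name `can_connect` is made up here, and the port is an assumption you have to supply yourself, since the out-of-band ports are normally chosen dynamically. Point it at a port where you know something is listening on the remote node.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) opens within timeout.

    Hypothetical helper for illustration only; not an Open MPI API.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, running `can_connect("ln10", 22)` from ln13 would show whether ln10 is reachable on the ssh port (22 is only an example). A success there does not prove that the ports used by the daemon are open, but a failure on a port with a known listener would point toward a firewall or routing problem.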

On Apr 27, 2011, at 11:58 AM, Sindhi, Waris PW wrote:

> Hi,
> I am getting an "oob-tcp: Communication retries exceeded" error
> message when I run a code with 238 MPI slave processes:
>
>
> /opt/openmpi/i386/bin/mpirun -mca btl_openib_verbose 1 --mca btl ^tcp
> --mca pls_ssh_agent ssh -mca oob_tcp_peer_retries 1000 --prefix
> /usr/lib/openmpi/1.2.8-gcc/bin -np 239 --app procgroup
> --------------------------------------------------------------------------
> mpirun was unable to start the specified application as it encountered
> an error:
>
> Error name: Unknown error: 1
> Node: ln10
>
> when attempting to start process rank 234.
> --------------------------------------------------------------------------
> [ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
> exceeded. Can not communicate with peer
> [ln13:27867] [[61748,0],0] ORTE_ERROR_LOG: Unreachable in file
> orted/orted_comm.c at line 130
> [ln13:27867] [[61748,0],0] ORTE_ERROR_LOG: Unreachable in file
> orted/orted_comm.c at line 130
> [ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
> exceeded. Can not communicate with peer
> [ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
> exceeded. Can not communicate with peer
> [ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
> exceeded. Can not communicate with peer
> [ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
> exceeded. Can not communicate with peer
> [ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
> exceeded. Can not communicate with peer
> [ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
> exceeded. Can not communicate with peer
> [ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
> exceeded. Can not communicate with peer
> [ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
> exceeded. Can not communicate with peer
> [ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
> exceeded. Can not communicate with peer
> [ln13:27867] [[61748,0],0]-[[61748,0],32] oob-tcp: Communication retries
> exceeded. Can not communicate with peer
>
> Any help would be greatly appreciated.
>
> Sincerely,
>
> Waris Sindhi
> High Performance Computing, TechApps
> Pratt & Whitney, UTC
> (860)-565-8486