I was using the v1.2 branch. Gleb's fix has resolved the problem.
What version of Open MPI are you using?
We had a bug with this on the trunk and [unreleased] v1.2 branch; it was
just fixed within the last few hours in both places. It should not be a
problem in the released v1.1 series.
Can you confirm that you were using the OMPI trunk or the v1.2 branch? If
you're seeing this in the v1.1 series, then we need to look at this a bit
more closely.
On 9/22/06 1:25 PM, "Nysal Jan" <firstname.lastname@example.org> wrote:
> The ompi_info command shows the following description for the
> "btl_openib_max_btls" parameter:
> MCA btl: parameter "btl_openib_max_btls" (current value: "-1") Maximum
> number of HCA ports to use (-1 = use all available, otherwise must be >= 1)
> Even though I specify "mpirun --mca btl_openib_max_btls 1 .....", 2 openib
> BTLs are created (the HCA has 2 ports).
> When I try to run Open MPI across 2 nodes (one node has an HCA with 2 ports
> and the other has only one port), both endpoints send the QP information
> over to the peer. Only one endpoint exists at the peer, so it prints the
> following error message:
> [0,1,1][btl_openib_endpoint.c:706:mca_btl_openib_endpoint_recv] can't find
> suitable endpoint for this peer
> [0,1,0][btl_openib_endpoint.c:913:mca_btl_openib_endpoint_connect] error
> posting receive errno says Operation now in progress
> [0,1,0][btl_openib_endpoint.c:737:mca_btl_openib_endpoint_recv] endpoint
> connect error: -1
> Is "btl_openib_max_btls" the maximum number of BTLs overall, or the maximum
> number of BTLs per port (which is what the current implementation in
> "init_one_hca()" appears to do)?
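To make the two readings of the limit in Nysal's question concrete, here is a minimal sketch in C. It is not the Open MPI source: count_btls() and its arguments are hypothetical names invented for illustration, and the sketch assumes one BTL module per port and applies the limit per HCA as a stand-in for the "per unit" reading. It only counts how many modules would be created if the limit applied once across all HCAs versus once per HCA, with -1 meaning "use all available" as in the ompi_info description.

```c
/* Illustrative only: NOT Open MPI code. count_btls() is a hypothetical
 * helper that counts how many BTL modules would be created, assuming one
 * module per HCA port.
 *
 * ports_per_hca: number of ports on each HCA
 * num_hcas:      number of entries in ports_per_hca
 * max_btls:      the limit; -1 means "use all available"
 * global:        nonzero = limit applies once across all HCAs,
 *                zero    = limit applies separately to each HCA
 */
int count_btls(const int *ports_per_hca, int num_hcas, int max_btls, int global)
{
    int total = 0;

    for (int h = 0; h < num_hcas; h++) {
        int created_here = 0;  /* modules created on this HCA so far */

        for (int p = 0; p < ports_per_hca[h]; p++) {
            /* Pick the counter that the limit is compared against. */
            int used = global ? total : created_here;

            if (max_btls != -1 && used >= max_btls) {
                break;  /* limit reached for this HCA (or for everything) */
            }
            created_here++;
            total++;
        }
    }
    return total;
}
```

For a single two-port HCA with a limit of 1, both readings give one module in this sketch, so under these assumptions seeing two modules suggests the limit was not being applied at all. The readings only diverge with several HCAs: for two two-port HCAs and a limit of 1, the global reading gives 1 and the per-HCA reading gives 2.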