On May 29, 2006, at 5:46 AM, Francoise Roch wrote:
> I still have a problem selecting an interface with openmpi-1.1a7 on
> Solaris Opteron.
> I compile in 64-bit mode with the Studio11 compilers.
> I attempted to force interface exclusion, without success.
> This problem is critical for us because we'll soon have InfiniBand
> interfaces for MPI traffic.
> roch_at_n15 ~/MPI > mpirun --mca btl_tcp_if_exclude bge1 -np 2 -host p15,p27 all2all
> Process 0 is alive on n15
> Process 1 is alive on n27
> [n27:05110] *** An error occurred in MPI_Barrier
> [n27:05110] *** on communicator MPI_COMM_WORLD
> [n27:05110] *** MPI_ERR_INTERN: internal error
> [n27:05110] *** MPI_ERRORS_ARE_FATAL (goodbye)
> 1 process killed (possibly by Open MPI)
> The code works without the --mca btl_tcp_if_exclude option.
It took me a while to realize what was going on. By default,
btl_tcp_if_exclude is set to exclude the loopback (lo) devices so that
they won't be used for the BTL transport. When you set
btl_tcp_if_exclude explicitly, your value replaces that default rather
than adding to it, so you have to include lo0 (on Solaris) in the list
or things go downhill. I can replicate Françoise's problem on the same
cluster. However, if I instead do:
mpirun --mca btl_tcp_if_exclude bge0,lo0 -np 2 --host n15,n27 ./
the routing issues are resolved and everything runs to completion.
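If you build the exclude list in a script, here is a small sketch, assuming a POSIX sh. tcp_exclude_list is a hypothetical helper, not part of Open MPI: it appends lo0 to whatever interfaces you want to exclude, so the loopback device stays excluded even though your value replaces the default list.

```shell
# Hypothetical helper (not part of Open MPI): print a value for
# btl_tcp_if_exclude that always contains the Solaris loopback device lo0.
# $1 is a comma-separated list of interfaces you want to exclude.
tcp_exclude_list() {
  _excl="$1"
  case ",$_excl," in
    *,lo0,*) ;;                          # lo0 already listed; keep as-is
    *) _excl="${_excl:+$_excl,}lo0" ;;   # append lo0 (no leading comma if empty)
  esac
  printf '%s\n' "$_excl"
}
```

For example, tcp_exclude_list bge1 prints bge1,lo0, which you could pass as
mpirun --mca btl_tcp_if_exclude "$(tcp_exclude_list bge1)" -np 2 --host n15,n27 all2all
A list that already contains lo0 is printed unchanged.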
I'll make sure to update the documentation for 1.1 so that this
hopefully doesn't confuse too many more people.
Open MPI developer