Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi/ib noob question
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-02-11 19:54:45


The ^ applies to everything that follows, so you just turned off all
of the tcp, self, and openib comm paths. :-)

If you just wanted to drop tcp from that list, you should just use
--mca btl self,openib.
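
For example (assuming your build actually includes the self, sm, and openib BTLs), either of these forms should give you an IB-only run:

  /usr/local/bin/mpirun --mca btl self,openib --hostfile ibnodes -np 4 hello_c
  /usr/local/bin/mpirun --mca btl ^tcp --hostfile ibnodes -np 4 hello_c

The first names exactly the components to use; the second excludes only tcp and lets Open MPI pick from whatever else is available.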

Ralph

On Feb 11, 2009, at 2:01 PM, Gary Draving wrote:

> Hello,
>
> When running the following program on 4 of my nodes I get the
> expected response:
>
> "/usr/local/bin/mpirun --mca btl tcp,self,openib --hostfile ibnodes -
> np 4 hello_c"
> Hello, world, I am 0 of 4
> Hello, world, I am 2 of 4
> Hello, world, I am 1 of 4
> Hello, world, I am 3 of 4
>
> But when I run it with ^TCP "/usr/local/bin/mpirun --mca btl ^tcp,self,openib --hostfile ibnodes -np 4 hello_c"
>
> I get the following. Does this mean my mpi (openmpi 1.3) is not
> configured properly w/ ib support?
>
> Thanks for any help you can give me.
> Gary
>
>
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications. This means that no Open MPI device has indicated
> that it can be used to communicate between these processes. This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other. This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>
> Process 1 ([[7579,1],1]) is on host: compute-0-1.local
> Process 2 ([[7579,1],0]) is on host: 11
> BTLs attempted: sm
>
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process
> is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or
> environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> [compute-0-1.local:1763] Abort before MPI_INIT completed
> successfully; not able to guarantee that all other processes were
> killed!
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> [compute-0-0.local:12308] Abort before MPI_INIT completed
> successfully; not able to guarantee that all other processes were
> killed!
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> [compute-0-3.local:14123] Abort before MPI_INIT completed
> successfully; not able to guarantee that all other processes were
> killed!
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> [compute-0-2.local:19185] Abort before MPI_INIT completed
> successfully; not able to guarantee that all other processes were
> killed!
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 1763 on
> node 11.2.0.1 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> [dahl.calvin.edu:21712] 3 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
> [dahl.calvin.edu:21712] Set MCA parameter "orte_base_help_aggregate"
> to 0 to see all help / error messages
> [dahl.calvin.edu:21712] 3 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users