Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi/ib noob question
From: Gary Draving (gbd2_at_[hidden])
Date: 2009-02-12 09:12:11


Yes, thanks, that seems to have been the problem,
Gary

Ralph Castain wrote:
> The ^ applies to everything that follows, so you just turned off all
> of the tcp, self, and openib comm paths. :-)
>
> If you just wanted to drop tcp from that list, you should just use
> -mca btl self,openib.
>
> Ralph
>
> On Feb 11, 2009, at 2:01 PM, Gary Draving wrote:
>
>> Hello,
>>
>> When running the following program on 4 of my nodes I get the expected
>> response:
>>
>> "/usr/local/bin/mpirun --mca btl tcp,self,openib --hostfile ibnodes
>> -np 4 hello_c"
>> Hello, world, I am 0 of 4
>> Hello, world, I am 2 of 4
>> Hello, world, I am 1 of 4
>> Hello, world, I am 3 of 4
>>
>> But when I run it with ^TCP "/usr/local/bin/mpirun --mca btl
>> ^tcp,self,openib --hostfile ibnodes -np 4 hello_c"
>>
>> I get the following. Does this mean my MPI (Open MPI 1.3) is not
>> configured properly with IB support?
>>
>> Thanks for any help you can give me.
>> Gary
>>
>>
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> --------------------------------------------------------------------------
>>
>> At least one pair of MPI processes are unable to reach each other for
>> MPI communications. This means that no Open MPI device has indicated
>> that it can be used to communicate between these processes. This is
>> an error; Open MPI requires that all MPI processes be able to reach
>> each other. This error can sometimes be the result of forgetting to
>> specify the "self" BTL.
>>
>> Process 1 ([[7579,1],1]) is on host: compute-0-1.local
>> Process 2 ([[7579,1],0]) is on host: 11
>> BTLs attempted: sm
>>
>> Your MPI job is now going to abort; sorry.
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>>
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or
>> environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>>
>> [compute-0-1.local:1763] Abort before MPI_INIT completed
>> successfully; not able to guarantee that all other processes were
>> killed!
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> [compute-0-0.local:12308] Abort before MPI_INIT completed
>> successfully; not able to guarantee that all other processes were
>> killed!
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> [compute-0-3.local:14123] Abort before MPI_INIT completed
>> successfully; not able to guarantee that all other processes were
>> killed!
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> [compute-0-2.local:19185] Abort before MPI_INIT completed
>> successfully; not able to guarantee that all other processes were
>> killed!
>> --------------------------------------------------------------------------
>>
>> mpirun has exited due to process rank 1 with PID 1763 on
>> node 11.2.0.1 exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>>
>> [dahl.calvin.edu:21712] 3 more processes have sent help message
>> help-mca-bml-r2.txt / unreachable proc
>> [dahl.calvin.edu:21712] Set MCA parameter "orte_base_help_aggregate"
>> to 0 to see all help / error messages
>> [dahl.calvin.edu:21712] 3 more processes have sent help message
>> help-mpi-runtime / mpi_init:startup:internal-failure
>>
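For anyone landing on this thread from the archives, the two invocations that Ralph's advice points to would be either an explicit include list or a bare exclusion; this is only a sketch, reusing the hostfile and binary from the thread above:

  # keep only the self (loopback) and InfiniBand (openib) BTLs
  /usr/local/bin/mpirun --mca btl self,openib --hostfile ibnodes -np 4 hello_c

  # or exclude just tcp and let Open MPI choose among whatever remains
  /usr/local/bin/mpirun --mca btl ^tcp --hostfile ibnodes -np 4 hello_c

Mixing the two styles in one list (as in "^tcp,self,openib") is what went wrong here, since the ^ negates every name that follows it.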