Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] selected pml cm, but peer [[2469, 1], 0] on compute-0-0 selected pml ob1
From: Gary Draving (gbd2_at_[hidden])
Date: 2009-03-19 10:17:13


Wow! That seems to have worked. fs1 has a QLogic QLE7240.

I got it to work from the command line first, then added "pml = ob1" to
/usr/local/etc/openmpi-mca-params.conf, which works as well.
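
For anyone hitting the same thing, here are the two equivalent ways to force
the ob1 PML (the paths below are from this install and may differ elsewhere):

  # one-off, on the mpirun command line:
  /usr/local/bin/mpirun --mca pml ob1 --mca btl ^tcp --hostfile machines -np 7 greetings

  # or persistently, as a line in /usr/local/etc/openmpi-mca-params.conf:
  pml = ob1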

Thanks for all your help!

Gary

Nysal Jan wrote:
> fs1 is selecting the "cm" PML whereas other nodes are selecting the
> "ob1" PML component. You can force ob1 to be used via "--mca pml ob1"
>
> What kind of hardware/NIC does fs1 have?
>
> --Nysal
>
> On Wed, 2009-03-18 at 17:17 -0400, Gary Draving wrote:
>
>> Hi all,
>>
>> Has anyone ever seen an error like this? It seems like I have some setting
>> wrong in Open MPI. I thought I had it set up like the other machines, but
>> it seems as though I have missed something. I only get the error when
>> adding machine "fs1" to the hostfile list. The other 40+ machines seem
>> fine.
>>
>> [fs1.calvin.edu:01750] [[2469,1],6] selected pml cm, but peer
>> [[2469,1],0] on compute-0-0 selected pml ob1
>>
>> When I run ompi_info, the output looks the same as on my other machines:
>>
>> [root_at_fs1 openmpi-1.3]# ompi_info | grep btl
>> MCA btl: ofud (MCA v2.0, API v2.0, Component v1.3)
>> MCA btl: openib (MCA v2.0, API v2.0, Component v1.3)
>> MCA btl: self (MCA v2.0, API v2.0, Component v1.3)
>> MCA btl: sm (MCA v2.0, API v2.0, Component v1.3)
>>
>> The whole error is below, any help would be greatly appreciated.
>>
>> Gary
>>
>> [admin_at_dahl 00.greetings]$ /usr/local/bin/mpirun --mca btl ^tcp
>> --hostfile machines -np 7 greetings
>> [fs1.calvin.edu:01959] [[2212,1],6] selected pml cm, but peer
>> [[2212,1],0] on compute-0-0 selected pml ob1
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> [fs1.calvin.edu:1959] Abort before MPI_INIT completed successfully; not
>> able to guarantee that all other processes were killed!
>> --------------------------------------------------------------------------
>> At least one pair of MPI processes are unable to reach each other for
>> MPI communications. This means that no Open MPI device has indicated
>> that it can be used to communicate between these processes. This is
>> an error; Open MPI requires that all MPI processes be able to reach
>> each other. This error can sometimes be the result of forgetting to
>> specify the "self" BTL.
>>
>> Process 1 ([[2212,1],3]) is on host: dahl.calvin.edu
>> Process 2 ([[2212,1],0]) is on host: compute-0-0
>> BTLs attempted: openib self sm
>>
>> Your MPI job is now going to abort; sorry.
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> [dahl.calvin.edu:16884] Abort before MPI_INIT completed successfully;
>> not able to guarantee that all other processes were killed!
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> [compute-0-0.local:1591] Abort before MPI_INIT completed successfully;
>> not able to guarantee that all other processes were killed!
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> [fs2.calvin.edu:8826] Abort before MPI_INIT completed successfully; not
>> able to guarantee that all other processes were killed!
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 3 with PID 16884 on
>> node dahl.calvin.edu exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>> [dahl.calvin.edu:16879] 3 more processes have sent help message
>> help-mpi-runtime / mpi_init:startup:internal-failure
>> [dahl.calvin.edu:16879] Set MCA parameter "orte_base_help_aggregate" to
>> 0 to see all help / error messages
>> [dahl.calvin.edu:16879] 2 more processes have sent help message
>> help-mca-bml-r2.txt / unreachable proc
>>