
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] how to select a specific network
From: Rolf Vandevaart (Rolf.Vandevaart_at_[hidden])
Date: 2008-01-11 16:05:55


Hello:
Have you actually tried this and gotten it to work? It did not work for me.

  burl-ct-v440-0 50 =>mpirun -host burl-ct-v440-0,burl-ct-v440-1 -np 1
-mca btl self,sm,tcp -mca btl_tcp_if_include ce0 connectivity_c : -np 1
-mca btl self,sm,tcp -mca btl_tcp_if_include ce0 connectivity_c
Connectivity test on 2 processes PASSED.
  burl-ct-v440-0 51 =>mpirun -host burl-ct-v440-0,burl-ct-v440-1 -np 1
-mca btl self,sm,tcp -mca btl_tcp_if_include ibd0 connectivity_c : -np 1
-mca btl self,sm,tcp -mca btl_tcp_if_include ibd0 connectivity_c
Connectivity test on 2 processes PASSED.
  burl-ct-v440-0 52 =>mpirun -host burl-ct-v440-0,burl-ct-v440-1 -np 1
-mca btl self,sm,tcp -mca btl_tcp_if_include ce0 connectivity_c : -np 1
-mca btl self,sm,tcp -mca btl_tcp_if_include ibd0 connectivity_c
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

   PML add procs failed
   --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

   PML add procs failed
   --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
  burl-ct-v440-0 53 =>
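
The runs above suggest that each process must share at least one network with every peer under its own btl_tcp_if_include setting: naming ce0 in one app context and ibd0 in the other leaves the two processes with no interface in common, hence the "Unreachable" failure. A sketch of two invocations that should keep the peers mutually reachable follows; the interface names and hosts are taken from the runs above, while the CIDR subnet form of btl_tcp_if_include is an assumption that only holds for later Open MPI releases (it is not available in the 1.2-era builds of this thread), and 192.168.1.0/24 is a placeholder subnet.

```shell
# Option 1: name the same interface in every app context, so both
# processes advertise addresses on the same network.
mpirun -host burl-ct-v440-0,burl-ct-v440-1 \
    -np 1 -mca btl self,sm,tcp -mca btl_tcp_if_include ce0 connectivity_c : \
    -np 1 -mca btl self,sm,tcp -mca btl_tcp_if_include ce0 connectivity_c

# Option 2 (later Open MPI releases only): select by subnet rather than
# by interface name, so hosts whose interfaces to the same network have
# different names can be mixed.  Substitute the subnet you actually
# want MPI traffic on for the placeholder 192.168.1.0/24.
mpirun -host burl-ct-v440-0,burl-ct-v440-1 -np 2 \
    -mca btl self,sm,tcp -mca btl_tcp_if_include 192.168.1.0/24 connectivity_c
```

The subnet form, where available, also answers the original question below: it picks a network rather than an interface, so machines where that network sits on e1000g0 and machines where it sits on e1000g1 can run in the same job.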

Aurélien Bouteiller wrote:
> Try something similar to this
>
> mpirun -np 1 -mca btl self,tcp -mca btl_tcp_if_include en1 NetPIPE_3.6/
> NPmpi : -np 1 -mca btl self,tcp -mca btl_tcp_if_include en0
> NetPIPE_3.6/NPmpi
>
> You should then be able to specify a different if_include mask for your
> different processes.
>
> Aurelien
>
> Le 11 janv. 08 à 06:46, Lydia Heck a écrit :
>
>> I should have added that the two networks are not routable,
>> and that they are both private class B networks.
>>
>>
>> On Fri, 11 Jan 2008, Lydia Heck wrote:
>>
>>> I have a setup that contains one set of machines
>>> with one nge and one e1000g network, and another set
>>> of machines with two e1000g networks configured. I am
>>> planning a large run where all these computers will be
>>> occupied with one job, and the MPI communication should
>>> go only over one specific network, which is configured
>>> on e1000g0 on the first set of machines and on e1000g1
>>> on the second set. For obvious reasons I cannot simply
>>> include all of the e1000g interfaces, or exclude only
>>> part of them, if that is even possible.
>>> So I would have to include or exclude by IP address range.
>>>
>>> Is there an obvious flag - which I have not yet found - to tell
>>> mpirun to use one specific network?
>>>
>>> Lydia
>>>
>>> ------------------------------------------
>>> Dr E L Heck
>>>
>>> University of Durham
>>> Institute for Computational Cosmology
>>> Ogden Centre
>>> Department of Physics
>>> South Road
>>>
>>> DURHAM, DH1 3LE
>>> United Kingdom
>>>
>>> e-mail: lydia.heck_at_[hidden]
>>>
>>> Tel.: + 44 191 - 334 3628
>>> Fax.: + 44 191 - 334 3645
>>> ___________________________________________
>>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> Dr. Aurélien Bouteiller
> Sr. Research Associate - Innovative Computing Laboratory
> Suite 350, 1122 Volunteer Boulevard
> Knoxville, TN 37996
> 865 974 6321

-- 
=========================
rolf.vandevaart_at_[hidden]
781-442-3043
=========================