Open MPI User's Mailing List Archives

From: Warner Yuen (wyuen_at_[hidden])
Date: 2007-07-11 01:34:46


I've also had someone run into the endpoint-busy problem. I never
figured it out; I just raised the default number of MX-10G endpoints
from 8 to 16 to make the problem go away. Here's the actual command
and error before setting the endpoints to 16. The versions are
MX-1.2.1 and OMPI 1.2.3:

node1:~/taepic tae$ mpirun --hostfile hostmx10g -byslot -mca btl self,sm,mx \
    -np 12 test_beam_injection test_beam_injection.inp -npx 12 > out12
[node2:00834] mca_btl_mx_init: mx_open_endpoint() failed with status=20
--------------------------------------------------------------------------
Process 0.1.3 is unable to reach 0.1.7 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.11 is unable to reach 0.1.7 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Warner Yuen
Scientific Computing Consultant
Apple Computer
email: wyuen_at_[hidden]
Tel: 408.718.2859
Fax: 408.715.0133
On Jul 10, 2007, at 7:53 AM, users-request_at_[hidden] wrote:
> ------------------------------
>
> Message: 2
> Date: Tue, 10 Jul 2007 09:19:42 -0400
> From: Tim Prins <tprins_at_[hidden]>
> Subject: Re: [OMPI users] openmpi fails on mx endpoint busy
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <4693876E.4070302_at_[hidden]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> SLIM H.A. wrote:
>> Dear Tim
>>
>>> So, you should just be able to run:
>>> mpirun --mca btl mx,sm,self -mca mtl ^mx -np 4 -hostfile
>>> ompi_machinefile ./cpi
>>
>> I tried
>>
>> node001>mpirun --mca btl mx,sm,self -mca mtl ^mx -np 4 -hostfile
>> ompi_machinefile ./cpi
>>
>> I put in a sleep call to keep it running for some time and to monitor
>> the endpoints. None of the 4 were open, it must have used tcp.
> No, this is not possible. With this command line it will not use tcp.
> Are you launching on more than one machine? If the procs are all on one
> machine, then it will use the shared memory component to communicate
> (sm), although the endpoints should still be opened.
>
> Just to make sure, you did put the sleep between MPI_Init and MPI_Finalize?
>
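
For reference, a minimal sketch of the kind of test program being discussed
(an illustration, not code from the thread): it forces a little communication
so the BTLs actually connect, then sleeps between MPI_Init and MPI_Finalize
so the MX endpoints stay open long enough to be inspected from another shell.

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, size, token, recvd;

    MPI_Init(&argc, &argv);            /* the BTLs open their endpoints here */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Force each process to talk to its neighbours in a ring, so the
     * endpoints are really used and reachability problems show up early. */
    token = rank;
    recvd = -1;
    MPI_Sendrecv(&token, 1, MPI_INT, (rank + 1) % size, 0,
                 &recvd, 1, MPI_INT, (rank + size - 1) % size, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("rank %d of %d received token from rank %d; sleeping...\n",
           rank, size, recvd);
    fflush(stdout);

    sleep(60);                         /* window in which to check the endpoints */

    MPI_Finalize();                    /* endpoints are closed here */
    return 0;
}

Launched with the same sort of command line as above (for example
"mpirun --mca btl mx,sm,self -mca mtl ^mx -np 4 -hostfile ompi_machinefile
./a.out"), this should hold the endpoints open for about a minute.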