Open MPI User's Mailing List Archives

From: George Bosilca (bosilca_at_[hidden])
Date: 2007-07-11 11:27:47


What's in the hostmx10g file? How many hosts?

   george.
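[Editorial note, not part of the original thread: the hostfile contents matter here because with -byslot mpirun fills every slot on the first host before moving to the next, so the number of ranks that land on one node depends entirely on the slots= values. A typical Open MPI hostfile looks like this (host names and slot counts hypothetical):

    node1 slots=8
    node2 slots=8

If hostmx10g lists fewer hosts, or more slots per host, than expected, more than 8 ranks can be packed onto a single node, which is relevant to the error below.]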

On Jul 11, 2007, at 1:34 AM, Warner Yuen wrote:

> I've also had someone run into the endpoint busy problem. I never
> figured it out; I just increased the default endpoint count on MX-10G
> from 8 to 16 to make the problem go away. Here's the actual command
> and error before setting the endpoints to 16. The version is MX-1.2.1
> with OMPI 1.2.3:
>
> node1:~/taepic tae$ mpirun --hostfile hostmx10g -byslot -mca btl self,sm,mx -np 12 test_beam_injection test_beam_injection.inp -npx 12 > out12
> [node2:00834] mca_btl_mx_init: mx_open_endpoint() failed with status=20
> --------------------------------------------------------------------------
> Process 0.1.3 is unable to reach 0.1.7 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.11 is unable to reach 0.1.7 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------------------
>
>
> Warner Yuen
> Scientific Computing Consultant
> Apple Computer
> email: wyuen_at_[hidden]
> Tel: 408.718.2859
> Fax: 408.715.0133
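[Editorial note: the failure arithmetic is worth spelling out. Each local MPI process opens one MX endpoint during MPI_Init, so with the MX-10G default of 8 endpoints per NIC, any node that receives more than 8 of the 12 -byslot-packed ranks will have mx_open_endpoint() fail with a "busy" status, which is what node2 reports above. A standalone probe along the following lines can confirm the per-node limit; this is a sketch from memory of the classic MX API (mx_init, mx_open_endpoint, MX_ANY_ENDPOINT, mx_close_endpoint, mx_finalize, assumed to be declared in myriexpress.h), so check the signatures against your MX release:

    /* Sketch: open MX endpoints on board 0 until the driver refuses. */
    #include <stdio.h>
    #include "myriexpress.h"

    #define MAX_TRY 32
    #define FILTER  0x12345   /* arbitrary match filter for this probe */

    int main(void)
    {
        mx_endpoint_t ep[MAX_TRY];
        int opened = 0;

        mx_init();
        while (opened < MAX_TRY &&
               mx_open_endpoint(0, MX_ANY_ENDPOINT, FILTER,
                                NULL, 0, &ep[opened]) == MX_SUCCESS)
            opened++;
        printf("opened %d endpoints before the driver said busy\n", opened);
        while (opened > 0)
            mx_close_endpoint(ep[--opened]);
        mx_finalize();
        return 0;
    }

Run one copy on an otherwise idle node; with the default configuration it should report 8, minus any endpoints already held by other processes.]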
>
>
> On Jul 10, 2007, at 7:53 AM, users-request_at_[hidden] wrote:
>
>> ------------------------------
>>
>> Message: 2
>> Date: Tue, 10 Jul 2007 09:19:42 -0400
>> From: Tim Prins <tprins_at_[hidden]>
>> Subject: Re: [OMPI users] openmpi fails on mx endpoint busy
>> To: Open MPI Users <users_at_[hidden]>
>> Message-ID: <4693876E.4070302_at_[hidden]>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> SLIM H.A. wrote:
>>> Dear Tim
>>>
>>>> So, you should just be able to run:
>>>> mpirun --mca btl mx,sm,self -mca mtl ^mx -np 4 -hostfile
>>>> ompi_machinefile ./cpi
>>>
>>> I tried
>>>
>>> node001>mpirun --mca btl mx,sm,self -mca mtl ^mx -np 4 -hostfile
>>> ompi_machinefile ./cpi
>>>
>>> I put in a sleep call to keep it running for some time and to
>>> monitor the endpoints. None of the 4 were open; it must have used tcp.
>> No, this is not possible. With this command line it will not use tcp.
>> Are you launching on more than one machine? If the procs are all on
>> one machine, then it will use the shared memory component to
>> communicate (sm), although the endpoints should still be opened.
>>
>> Just to make sure, you did put the sleep between MPI_Init and
>> MPI_Finalize?
>>
>
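[Editorial note: the placement of the sleep matters because the MX endpoints are opened inside MPI_Init and closed again by MPI_Finalize; a sleep before MPI_Init or after MPI_Finalize would show nothing. A minimal sketch of the kind of test Tim is describing:

    /* Hold the job open between MPI_Init and MPI_Finalize so the MX
     * endpoints stay allocated long enough to be inspected on the nodes. */
    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("rank %d up; inspect endpoints now\n", rank);
        fflush(stdout);
        sleep(60);   /* window for checking endpoint usage on each node */
        MPI_Finalize();
        return 0;
    }

While the job sleeps, endpoint usage can be checked on each node with whatever inspection tool your MX installation provides.]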
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users