Open MPI User's Mailing List Archives

From: Scott Atchley (atchley_at_[hidden])
Date: 2007-07-06 08:30:03


On Jul 6, 2007, at 7:37 AM, SLIM H.A. wrote:

> Dear Michael
>
> I have now tried both
>
> mpirun --mca btl mx,sm -np 4 ./cpi
>
> which gives the same error message again, and,
>
> mpirun --mca btl mx,sm,self -np 4 ./cpi_gcc_ompi_mx
>
> actually locks some of the mx ports, but not all 4. This is the
> output from mx_endpoint_info:
>
> 1 Myrinet board installed.
> The MX driver is configured to support up to 4 endpoints on 4 boards.
> ===================================================================
> Board #0:
> Endpoint  PID    Command    Info
> <raw>     5061   mx_mapper
> 0         20315  cpi
> There are currently 1 regular endpoint open
>
> This is the output from the node:
> >mpirun --mca btl mx,sm,self -np 4 ./cpi_gcc_ompi_mx
> [node001:20312] mca_btl_mx_init: mx_open_endpoint() failed with
> status=20
> [node001:20314] mca_btl_mx_init: mx_open_endpoint() failed with
> status=20
> [node001:20313] mca_btl_mx_init: mx_open_endpoint() failed with
> status=20
> Thanks
>
> Henk
>
>
>
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_open-mpi.org]
> On Behalf Of Michael Edwards
> Sent: 05 July 2007 18:06
> To: Open MPI Users
> Subject: Re: [OMPI users] openmpi fails on mx endpoint busy
>
> If the machine is multi-processor, you might want to add the sm
> btl. That cleared up some similar problems for me, though I don't
> use mx, so your mileage may vary.
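>
> For example, keeping "self" in the list (per the error message
> quoted below):
>
>   mpirun --mca btl mx,sm,self -np 4 ./cpi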
>
> On 7/5/07, SLIM H.A. <h.a.slim_at_[hidden]> wrote:
> Hello
>
> I have compiled openmpi-1.2.3 with the --with-mx=<directory>
> configure option and the gcc compiler (configure sketch below). On
> testing with 4-8 slots I get an error message that the mx ports
> are busy:
>
> >mpirun --mca btl mx,self -np 4 ./cpi
> [node001:10071] mca_btl_mx_init: mx_open_endpoint() failed with
> status=20
> [node001:10074] mca_btl_mx_init: mx_open_endpoint() failed with
> status=20
> [node001:10073] mca_btl_mx_init: mx_open_endpoint() failed with
> status=20
> --------------------------------------------------------------------------
> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> ... snipped
> It looks like MPI_INIT failed for some reason; your parallel process
> is likely to abort. There are many reasons that a parallel process
> can fail during MPI_INIT; some of which are due to configuration or
> environment problems. This failure appears to be an internal
> failure; here's some additional information (which may only be
> relevant to an Open MPI developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> mpirun noticed that job rank 0 with PID 10071 on node node001
> exited on signal 1 (Hangup).
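>
> (For reference, the configure invocation was of roughly this form;
> the install prefix and MX path are placeholders, not the exact
> paths used:
>
>   ./configure --prefix=/opt/openmpi-1.2.3 --with-mx=/opt/mx CC=gcc
>   make all install)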
>
>
> I would not expect mx error messages, as communication should not
> go through the mx card (this is a twin dual-core shared-memory
> node).
> The same happens when testing on 2 nodes, using a hostfile.
> I checked the state of the mx card with mx_endpoint_info and
> mx_info; it is healthy and its endpoints are free.
> What is missing here?
>
> Thanks
>
> Henk

Henk,

OMPI is successfully opening one endpoint; the other three fail
with MX_BUSY (error 20). This can happen if they are all trying to
open the same endpoint ID, which OMPI normally does not do. (Note
that whenever the mx BTL is selected, every rank opens an MX
endpoint during MPI_Init, even when all ranks are on one node; that
is why you see MX errors for a purely local run.) I do not see a
hostfile or host parameters specified. What is OMPI using for a
machinefile?
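
For illustration, here is a minimal sketch of that failure mode
written against the raw MX API (the board number and filter value
are arbitrary, and error handling is trimmed; this is just a sketch,
not how OMPI itself chooses endpoint IDs). Run one copy, leave it
waiting, and a second copy should fail with MX_BUSY just like ranks
1-3 above:

    /* mx_busy_demo.c: deliberately open a fixed MX endpoint ID. */
    #include <stdio.h>
    #include <myriexpress.h>

    int main(void)
    {
        mx_endpoint_t ep;
        mx_return_t rc;

        mx_init();
        /* Board 0, endpoint ID hard-coded to 0, arbitrary filter.
         * A second process requesting the same ID gets MX_BUSY. */
        rc = mx_open_endpoint(0, 0, 0x12345, NULL, 0, &ep);
        if (rc != MX_SUCCESS) {
            fprintf(stderr, "mx_open_endpoint failed: %s\n",
                    mx_strerror(rc));
            mx_finalize();
            return 1;
        }
        printf("endpoint 0 open; press Enter to release it\n");
        getchar();
        mx_close_endpoint(ep);
        mx_finalize();
        return 0;
    }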

Also, could you try creating a host file named "hosts" with the
names of your machines?
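
A minimal example, one host per line ("node001" is taken from your
output; "node002" is a placeholder for a second node):

    node001
    node002

Then try: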

$ mpirun -np 2 --hostfile hosts ./cpi

and then

$ mpirun -np 2 --hostfile hosts --mca pml cm ./cpi

The second command selects the cm PML, which drives MX through its
MTL interface instead of the mx BTL, so it exercises a different
code path for opening endpoints.

Scott