Open MPI User's Mailing List Archives

From: Tim Prins (tprins_at_[hidden])
Date: 2007-07-05 13:15:38


Hi Henk,

By specifying '--mca btl mx,self' you are telling Open MPI not to use
its shared memory support. If you want to use Open MPI's shared memory
support, you must add 'sm' to the list, i.e. '--mca btl mx,sm,self'. If you
would rather use MX's shared memory support, instead use '--mca btl
mx,self --mca btl_mx_shared_mem 1'. However, in most cases I believe
Open MPI's shared memory support is a bit better.
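
For example (just a quick sketch, reusing your ./cpi test program from
below):

   # use Open MPI's shared memory support alongside MX
   mpirun --mca btl mx,sm,self -np 4 ./cpi

   # or use MX's own shared memory support instead
   mpirun --mca btl mx,self --mca btl_mx_shared_mem 1 -np 4 ./cpi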

Alternatively, if you don't specify any btls, Open MPI should figure out
what to use automatically.
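
In other words, something as simple as

   mpirun -np 4 ./cpi

should let it pick the available BTLs (mx, sm, self) on its own (again
just using your ./cpi example).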

Hope this helps,

Tim

SLIM H.A. wrote:
> Hello
>
> I have compiled openmpi-1.2.3 with the --with-mx=<directory>
> configuration and gcc compiler. On testing with 4-8 slots I get an error
> message, the mx ports are busy:
>
>> mpirun --mca btl mx,self -np 4 ./cpi
> [node001:10071] mca_btl_mx_init: mx_open_endpoint() failed with status=20
> [node001:10074] mca_btl_mx_init: mx_open_endpoint() failed with status=20
> [node001:10073] mca_btl_mx_init: mx_open_endpoint() failed with status=20
> --------------------------------------------------------------------------
> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> ... snipped
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> mpirun noticed that job rank 0 with PID 10071 on node node001 exited on
> signal 1 (Hangup).
>
>
> I would not expect mx error messages, as communication should not go
> through the mx card (this is a twin dual-core shared memory node).
> The same happens when testing on 2 nodes, using a hostfile.
> I checked the state of the mx card with mx_endpoint_info and mx_info,
> they are healthy and free.
> What is missing here?
>
> Thanks
>
> Henk
>