Dear Tim
I followed the use of "--mca btl mx,self" as suggested in the FAQ
http://www.open-mpi.org/faq/?category=myrinet#myri-btl
When I use your extra mca value I get:
>mpirun --mca btl mx,self --mca btl_mx_shared_mem 1 -np 4 ./cpi
> --------------------------------------------------------------------------
> WARNING: A user-supplied value attempted to override the read-only MCA
> parameter named "btl_mx_shared_mem".
> The user-supplied value was ignored.
followed by the same error messages as before.
Note that although I include "self" in the list, the error message still
suggests that it may be missing:
> > Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> > If you specified the use of a BTL component, you may have forgotten a
> > component (such as "self") in the list of usable components.
I checked the output from mx_info for both the current node and another
one; there does not seem to be a problem.
I attach the output from ompi_info --all
Also
>ompi_info | grep mx
Prefix: /usr/local/Cluster-Apps/openmpi/mx/gcc/64/1.2.3
MCA btl: mx (MCA v1.0, API v1.0.1, Component v1.2.3)
MCA mtl: mx (MCA v1.0, API v1.0, Component v1.2.3)
As a further check, I rebuilt the exe with mpich and that works fine on
the same node over myrinet. I wonder whether mx is properly included in
my openmpi build.
Use of ldd -v on the mpich exe gives references to libmyriexpress.so,
which is not the case for the ompi built exe, suggesting something is
missing?
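One way to narrow this down, sketched under assumptions: in a default (non-static) Open MPI build the BTL components are normally separate plugins loaded at run time, so the executable itself would not be expected to list libmyriexpress.so, while the mx plugin should. The plugin path below (lib/openmpi/mca_btl_mx.so under the prefix reported by ompi_info) is an assumption about the install layout, and mx_link_status is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: classify ldd output (read from stdin) by how libmyriexpress appears.
# Prints "linked", "unresolved" (listed but "not found"), or "absent".
mx_link_status() {
    awk '
        /libmyriexpress/ && /not found/  { unresolved = 1 }
        /libmyriexpress/ && !/not found/ { linked = 1 }
        END {
            if (linked) print "linked"
            else if (unresolved) print "unresolved"
            else print "absent"
        }'
}

# ASSUMPTION: default plugin location under the install prefix.
component=/usr/local/Cluster-Apps/openmpi/mx/gcc/64/1.2.3/lib/openmpi/mca_btl_mx.so
if [ -e "$component" ]; then
    ldd "$component" | mx_link_status
fi
```

A result of "unresolved" would point at the run-time library search path rather than at the build itself.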
I used --with-mx=/usr/local/Cluster-Apps/mx/mx-1.1.1
and a listing of that directory is
>ls /usr/local/Cluster-Apps/mx/mx-1.1.1
bin etc include lib lib32 lib64 sbin
This should be sufficient, or do I also need --with-mx-libdir?
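Because the MX directory contains lib, lib32 and lib64, a possible way to find out which one to pass is sketched below. It only prints a candidate configure line; it does not run configure. The helper name pick_mx_libdir is hypothetical, and trying lib64 before lib is an assumption that suits a 64-bit build.

```shell
#!/bin/sh
# Sketch: print the first library directory under an MX install root that
# contains libmyriexpress.so, trying lib64 before lib.
# ASSUMPTION: a 64-bit build should prefer lib64 when it exists.
pick_mx_libdir() {
    root=$1
    for sub in lib64 lib; do
        if [ -e "$root/$sub/libmyriexpress.so" ]; then
            echo "$root/$sub"
            return 0
        fi
    done
    return 1
}

mx_root=/usr/local/Cluster-Apps/mx/mx-1.1.1
if libdir=$(pick_mx_libdir "$mx_root"); then
    # Only print the suggested command for review.
    echo "./configure --with-mx=$mx_root --with-mx-libdir=$libdir"
else
    echo "libmyriexpress.so not found under $mx_root/lib64 or $mx_root/lib" >&2
fi
```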
Thanks
Henk
> -----Original Message-----
> From: users-bounces_at_[hidden]
> [mailto:users-bounces_at_[hidden]] On Behalf Of Tim Prins
> Sent: 05 July 2007 18:16
> To: Open MPI Users
> Subject: Re: [OMPI users] openmpi fails on mx endpoint busy
>
> Hi Henk,
>
> By specifying '--mca btl mx,self' you are telling Open MPI
> not to use its shared memory support. If you want to use Open
> MPI's shared memory support, you must add 'sm' to the list.
> I.e. '--mca btl mx,sm,self'. If you would rather use MX's shared
> memory support, instead use '--mca btl mx,self --mca
> btl_mx_shared_mem 1'. However, in most cases I believe Open
> MPI's shared memory support is a bit better.
>
> Alternatively, if you don't specify any btls, Open MPI should
> figure out what to use automatically.
>
> Hope this helps,
>
> Tim
>
> SLIM H.A. wrote:
> > Hello
> >
> > I have compiled openmpi-1.2.3 with the --with-mx=<directory>
> > configuration and gcc compiler. On testing with 4-8 slots I get an
> > error message, the mx ports are busy:
> >
> >> mpirun --mca btl mx,self -np 4 ./cpi
> > [node001:10071] mca_btl_mx_init: mx_open_endpoint() failed with status=20
> > [node001:10074] mca_btl_mx_init: mx_open_endpoint() failed with status=20
> > [node001:10073] mca_btl_mx_init: mx_open_endpoint() failed with status=20
> >
> > ----------------------------------------------------------------------
> > Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> > If you specified the use of a BTL component, you may have forgotten a
> > component (such as "self") in the list of usable components.
> > ... snipped
> > It looks like MPI_INIT failed for some reason; your parallel process
> > is likely to abort. There are many reasons that a parallel process
> > can fail during MPI_INIT; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> > PML add procs failed
> > --> Returned "Unreachable" (-12) instead of "Success" (0)
> >
> > ----------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (goodbye)
> > mpirun noticed that job rank 0 with PID 10071 on node node001 exited
> > on signal 1 (Hangup).
> >
> >
> > I would not expect mx messages as communication should not go through
> > the mx card? (This is a twin dual core shared memory node.) The same
> > happens when testing on 2 nodes, using a hostfile.
> > I checked the state of the mx card with mx_endpoint_info and mx_info;
> > they are healthy and free.
> > What is missing here?
> >
> > Thanks
> >
> > Henk
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users