Dear Michael,

I have now tried both

mpirun --mca btl mx,sm -np 4 ./cpi

which gives the same error message again, and

mpirun --mca btl mx,sm,self -np 4 ./cpi_gcc_ompi_mx

which actually locks some of the mx ports but not all 4, i.e. this is the output from mx_endpoint_info:
1 Myrinet board installed.
The MX driver is configured to support up to 4 endpoints on 4 boards.
===================================================================
Board #0:
Endpoint        PID             Command         Info
<raw>           5061            mx_mapper
0               20315           cpi
There are currently 1 regular endpoint open
This is the output from the node:

>mpirun --mca btl mx,sm,self -np 4 ./cpi_gcc_ompi_mx
[node001:20312] mca_btl_mx_init: mx_open_endpoint() failed with status=20
[node001:20314] mca_btl_mx_init: mx_open_endpoint() failed with status=20
[node001:20313] mca_btl_mx_init: mx_open_endpoint() failed with status=20
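
For comparing runs like the ones above, here is a minimal shell sketch that counts how many ranks reported a failed endpoint open. It assumes the mpirun output was captured to a file (run.log is a hypothetical name) with each message on a single line; count_mx_failures is likewise a hypothetical helper, not part of Open MPI or MX.

```shell
# Hypothetical helper: count the ranks whose MX endpoint open failed.
# Assumes "$1" is a saved copy of the mpirun output with one message per line,
# e.g. captured with:  mpirun ... 2>&1 | tee run.log
count_mx_failures() {
    # -F matches the string literally; -c prints the number of matching lines.
    grep -c -F 'mx_open_endpoint() failed' "$1"
}
```

Calling count_mx_failures run.log on the output above should report the three failing ranks, which matches the single regular endpoint (PID 20315) that mx_endpoint_info shows as open out of the 4 processes started.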
Thanks
Henk
If the machine is multi-processor you might want to add the sm
btl. That cleared up some similar problems for me, though I don't use mx,
so your mileage may vary.
On 7/5/07, SLIM H.A. <h.a.slim@durham.ac.uk> wrote:
Hello

I have compiled openmpi-1.2.3 with the --with-mx=<directory>
configuration and the gcc compiler. On testing with 4-8 slots I get an error
message that the mx ports are busy:
>mpirun --mca btl mx,self -np 4 ./cpi
[node001:10071] mca_btl_mx_init: mx_open_endpoint() failed with status=20
[node001:10074] mca_btl_mx_init: mx_open_endpoint() failed with status=20
[node001:10073] mca_btl_mx_init: mx_open_endpoint() failed with status=20
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
... snipped
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
PML add procs failed
--> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
mpirun noticed that job rank 0 with PID 10071 on node node001 exited on
signal 1 (Hangup).
I would not expect mx messages, as communication should not go through
the mx card. (This is a twin dual-core shared-memory node.)
The same happens when testing on 2 nodes, using a hostfile.
I checked the state of the mx card with mx_endpoint_info and mx_info;
they are healthy and free.
What is missing here?
Thanks
Henk
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users