Open MPI User's Mailing List Archives

From: de Almeida, Valmor F. (dealmeidav_at_[hidden])
Date: 2007-04-01 11:58:59


Hello Tim,

Thanks for the info. I also received this help from Myrinet:

************
It looks like you are running out of endpoints.

This discusses what endpoints are:
 http://www.myri.com/cgi-bin/fom.pl?file=421

And this explains how to increase the limit:
 http://www.myri.com/cgi-bin/fom.pl?file=482

Let us know if this doesn't address the problem.
************

I haven't had time to look into it.
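
For readers following the quoted trace below: frame [1] is mca_btl_mx_finalize, called from mca_btl_mx_component_init right after mx_open_endpoint() failed, and the fault address is 0x20. That suggests cleanup ran against state that was never set up. The following is only a minimal sketch in C of the "gracefully move on" behaviour Tim describes, not the actual Open MPI code or fix. Every name in it (demo_module_t, demo_open_endpoint, demo_module_init, demo_module_finalize, the DEMO_* codes, the endpoint counter) is hypothetical; only the value 20 is taken from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes; 20 mirrors the status printed in the log. */
enum { DEMO_SUCCESS = 0, DEMO_BUSY = 20 };

/* Hypothetical stand-in for a transport module that owns one endpoint. */
typedef struct {
    int endpoint_id;
} demo_module_t;

/* Hypothetical: pretend the interface offers a fixed pool of endpoints. */
static int demo_endpoints_free = 2;

/* Hands out an endpoint, or reports DEMO_BUSY when the pool is empty. */
static int demo_open_endpoint(int *endpoint_id)
{
    assert(endpoint_id != NULL);
    if (demo_endpoints_free == 0) {
        return DEMO_BUSY;
    }
    *endpoint_id = demo_endpoints_free--;
    return DEMO_SUCCESS;
}

/* Releases a module and its endpoint. A NULL module is ignored, so
 * cleanup paths can call this without checking first. */
static void demo_module_finalize(demo_module_t *module)
{
    if (module == NULL) {
        return;
    }
    demo_endpoints_free++;
    free(module);
}

/* Returns a ready module, or NULL if the transport is unavailable.
 * On failure it frees only what it allocated and never touches
 * endpoint state that was not created. */
static demo_module_t *demo_module_init(void)
{
    demo_module_t *module = calloc(1, sizeof(*module));
    if (module == NULL) {
        return NULL;
    }
    int status = demo_open_endpoint(&module->endpoint_id);
    if (status != DEMO_SUCCESS) {
        fprintf(stderr, "demo_open_endpoint() failed with status=%d\n", status);
        free(module);
        return NULL;
    }
    return module;
}
```

A caller would treat a NULL return from demo_module_init() as "this transport is not usable here" and carry on with whatever other transports are available, rather than aborting.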

--
Valmor
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Tim Prins
> Sent: Friday, March 30, 2007 10:49 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] mca_btl_mx_init: mx_open_endpoint() failed with status=20
> 
> Hi Valmor,
> 
> What is happening here is that when Open MPI tries to create an MX endpoint
> for communication, MX returns code 20, which is MX_BUSY.
> 
> At this point we should gracefully move on, but there is a bug in Open MPI
> 1.2 which causes a segmentation fault in case of this type of error. This
> will be fixed in 1.2.1, and the fix is available now in the 1.2 nightly
> tarballs.
> 
> Hope this helps,
> 
> Tim
> 
> On Friday 30 March 2007 05:06 pm, de Almeida, Valmor F. wrote:
> > Hello,
> >
> > I am getting this error any time the number of processes requested per
> > machine is greater than the number of CPUs. I suspect I am missing
> > something in the configuration of MX / Open MPI, since another machine I
> > have without MX installed runs Open MPI correctly with oversubscription.
> >
> > Thanks for any help.
> >
> > --
> > Valmor
> >
> >
> > ->mpirun -np 3 --machinefile mymachines-1 a.out
> > [x1:23624] mca_btl_mx_init: mx_open_endpoint() failed with status=20
> > [x1:23624] *** Process received signal ***
> > [x1:23624] Signal: Segmentation fault (11)
> > [x1:23624] Signal code: Address not mapped (1)
> > [x1:23624] Failing at address: 0x20
> > [x1:23624] [ 0] [0xb7f7f440]
> > [x1:23624] [ 1] /opt/openmpi-1.2/lib/openmpi/mca_btl_mx.so(mca_btl_mx_finalize+0x25) [0xb7aca825]
> > [x1:23624] [ 2] /opt/openmpi-1.2/lib/openmpi/mca_btl_mx.so(mca_btl_mx_component_init+0x6f8) [0xb7acc658]
> > [x1:23624] [ 3] /opt/ompi/lib/libmpi.so.0(mca_btl_base_select+0x1a0) [0xb7f41900]
> > [x1:23624] [ 4] /opt/openmpi-1.2/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x26) [0xb7ad1006]
> > [x1:23624] [ 5] /opt/ompi/lib/libmpi.so.0(mca_bml_base_init+0x78) [0xb7f41198]
> > [x1:23624] [ 6] /opt/openmpi-1.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_component_init+0x7d) [0xb7af866d]
> > [x1:23624] [ 7] /opt/ompi/lib/libmpi.so.0(mca_pml_base_select+0x176) [0xb7f49b56]
> > [x1:23624] [ 8] /opt/ompi/lib/libmpi.so.0(ompi_mpi_init+0x4cf) [0xb7f0fe2f]
> > [x1:23624] [ 9] /opt/ompi/lib/libmpi.so.0(MPI_Init+0xab) [0xb7f3204b]
> > [x1:23624] [10] a.out(_ZN3MPI4InitERiRPPc+0x18) [0x8052cbe]
> > [x1:23624] [11] a.out(main+0x21) [0x804f4a7]
> > [x1:23624] [12] /lib/libc.so.6(__libc_start_main+0xdc) [0xb7be9824]
> >
> > Content of the mymachines-1 file:
> >
> > x1  max_slots=4
> >
> >
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users