Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

From: de Almeida, Valmor F. (dealmeidav_at_[hidden])
Date: 2007-04-01 11:58:59


Hello Tim,

Thanks for the info. I also received this help from Myrinet:

************
It looks like you are running out of endpoints.

This discusses what endpoints are:
 http://www.myri.com/cgi-bin/fom.pl?file=421

And this explains how to increase the limit:
 http://www.myri.com/cgi-bin/fom.pl?file=482

Let us know if this doesn't address the problem.
************

I haven't had time to look into it.

--
Valmor
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]]
> On Behalf Of Tim Prins
> Sent: Friday, March 30, 2007 10:49 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] mca_btl_mx_init: mx_open_endpoint() failed with status=20
> 
> Hi Valmor,
> 
> What is happening here is that when Open MPI tries to create an MX endpoint
> for communication, MX returns code 20, which is MX_BUSY.
> 
> At this point we should gracefully move on, but there is a bug in Open MPI
> 1.2 which causes a segmentation fault in case of this type of error. This
> will be fixed in 1.2.1, and the fix is available now in the 1.2 nightly
> tarballs.
> 
> Hope this helps,
> 
> Tim
> 
> On Friday 30 March 2007 05:06 pm, de Almeida, Valmor F. wrote:
> > Hello,
> >
> > I am getting this error any time the number of processes requested per
> > machine is greater than the number of CPUs. I suspect I am missing
> > something in the configuration of mx / ompi, since another machine I
> > have without mx installed runs ompi correctly with oversubscription.
> >
> > Thanks for any help.
> >
> > --
> > Valmor
> >
> >
> > ->mpirun -np 3 --machinefile mymachines-1 a.out
> > [x1:23624] mca_btl_mx_init: mx_open_endpoint() failed with status=20
> > [x1:23624] *** Process received signal ***
> > [x1:23624] Signal: Segmentation fault (11)
> > [x1:23624] Signal code: Address not mapped (1)
> > [x1:23624] Failing at address: 0x20
> > [x1:23624] [ 0] [0xb7f7f440]
> > [x1:23624] [ 1] /opt/openmpi-1.2/lib/openmpi/mca_btl_mx.so(mca_btl_mx_finalize+0x25) [0xb7aca825]
> > [x1:23624] [ 2] /opt/openmpi-1.2/lib/openmpi/mca_btl_mx.so(mca_btl_mx_component_init+0x6f8) [0xb7acc658]
> > [x1:23624] [ 3] /opt/ompi/lib/libmpi.so.0(mca_btl_base_select+0x1a0) [0xb7f41900]
> > [x1:23624] [ 4] /opt/openmpi-1.2/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x26) [0xb7ad1006]
> > [x1:23624] [ 5] /opt/ompi/lib/libmpi.so.0(mca_bml_base_init+0x78) [0xb7f41198]
> > [x1:23624] [ 6] /opt/openmpi-1.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_component_init+0x7d) [0xb7af866d]
> > [x1:23624] [ 7] /opt/ompi/lib/libmpi.so.0(mca_pml_base_select+0x176) [0xb7f49b56]
> > [x1:23624] [ 8] /opt/ompi/lib/libmpi.so.0(ompi_mpi_init+0x4cf) [0xb7f0fe2f]
> > [x1:23624] [ 9] /opt/ompi/lib/libmpi.so.0(MPI_Init+0xab) [0xb7f3204b]
> > [x1:23624] [10] a.out(_ZN3MPI4InitERiRPPc+0x18) [0x8052cbe]
> > [x1:23624] [11] a.out(main+0x21) [0x804f4a7]
> > [x1:23624] [12] /lib/libc.so.6(__libc_start_main+0xdc) [0xb7be9824]
> >
> > content of mymachines-1 file
> >
> > x1  max_slots=4
> >
> >
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
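
The two points made in this thread (endpoints are a limited resource, and a failed endpoint open should lead to a graceful fallback rather than a crash) can be illustrated with a small toy model. This is only a sketch under stated assumptions and is not Open MPI or MX code: EndpointPool, Transport, select_transport and launch are hypothetical names, the value 0 for success is assumed, the endpoint limit is an arbitrary number, and "tcp" stands in for whatever other transport is available. The only value taken from the thread is status 20 (MX_BUSY).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MX_SUCCESS = 0  # assumed value for "no error"; not stated in the thread
MX_BUSY = 20    # the status reported by mx_open_endpoint() in the thread


class EndpointPool:
    """Toy stand-in for a NIC that offers a fixed number of endpoints."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_use = 0

    def open_endpoint(self) -> int:
        # Once every endpoint is taken, report MX_BUSY instead of succeeding.
        if self.in_use >= self.limit:
            return MX_BUSY
        self.in_use += 1
        return MX_SUCCESS


@dataclass
class Transport:
    """Hypothetical transport component with an init/finalize pair."""

    name: str
    open_fn: Callable[[], int]
    initialized: bool = False

    def init(self) -> int:
        status = self.open_fn()
        self.initialized = (status == MX_SUCCESS)
        return status

    def finalize(self) -> None:
        # Guard: only tear down state that was actually set up. In the
        # trace above, the finalize routine is reached from the component
        # init routine right after the failed open, and that is where the
        # segmentation fault occurs.
        if not self.initialized:
            return
        self.initialized = False


def select_transport(candidates: List[Transport]) -> Optional[Transport]:
    """Return the first transport that initializes, skipping failures."""
    for transport in candidates:
        status = transport.init()
        if status == MX_SUCCESS:
            return transport
        print(f"{transport.name}: open failed with status={status}, moving on")
        transport.finalize()  # safe because of the guard above
    return None


def launch(num_procs: int, endpoint_limit: int) -> List[Optional[str]]:
    """Simulate num_procs processes on one node sharing one endpoint pool."""
    pool = EndpointPool(endpoint_limit)
    chosen: List[Optional[str]] = []
    for _ in range(num_procs):
        candidates = [
            Transport("mx", pool.open_endpoint),
            Transport("tcp", lambda: MX_SUCCESS),  # fallback always succeeds here
        ]
        picked = select_transport(candidates)
        chosen.append(picked.name if picked is not None else None)
    return chosen


if __name__ == "__main__":
    # Three processes on a node with only two endpoints (limit is made up).
    print(launch(3, 2))
```

With three processes and an assumed limit of two endpoints, the first two processes get "mx" and the third falls back to "tcp" instead of crashing. Whether the real endpoint limit on a given machine relates to the number of CPUs is not established in this thread; the Myricom FAQ entries linked above are the place to check.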