
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] orte_rml_base_select failed
From: Amit Sharma (amit.sharma5_at_[hidden])
Date: 2009-11-06 00:16:33


Hi,

That was my mistake: I sent the wrong dump. Please see the attached
screenshot. I am using mpirun version 1.3.2, and I get the error shown
even with the verbose options.

 

 

-----Original Message-----

From: Jeff Squyres [mailto:jsquyres_at_[hidden]]

Sent: Thursday, November 05, 2009 6:55 PM

To: amit.sharma5_at_[hidden]; Open MPI Developers

Cc: Ralph Castain

Subject: Re: [OMPI devel] orte_rml_base_select failed

I think you must be accidentally mixing Open MPI versions -- the file
"orte/runtime/orte_system_init.c" does not exist in the Open MPI v1.3
series. It did exist, however, back in the Open MPI 1.2 series.

Could you double-check that the OMPI that is installed (and is being
found/used) on host-desktop1 is the same version as all the others?

 

On Nov 5, 2009, at 7:18 AM, Amit Sharma wrote:

> I tried "-mca rml_base_verbose 10 -mca oob_base_verbose 10", but still
> no luck. On machines where mpirun works properly, it prints the expected
> debug messages, as below:
>
> # mpirun -mca rml_base_verbose 10 -mca oob_base_verbose 10 arch
> [linux] mca: base: components_open: Looking for rml components
> [linux] mca: base: components_open: opening rml components
> [linux] mca: base: components_open: found loaded component oob
> [linux] mca: base: components_open: component oob has no register function
> [linux] mca: base: components_open: Looking for oob components
> [linux] mca: base: components_open: opening oob components
> [linux] mca: base: components_open: found loaded component tcp
> [linux] mca: base: components_open: component tcp has no register function
> [linux] mca: base: components_open: component tcp open function successful
> [linux] mca: base: components_open: component oob open function successful
> [linux] orte_rml_base_select: initializing rml component oob
> [linux] [[55739,0],0] rml:base:update:contact:info got uri 3652911104.0;tcp://128.88.143.227:39207
> x86_64
> [linux] mca: base: close: component tcp closed
> [linux] mca: base: close: unloading component tcp
> [linux] mca: base: close: component oob closed
> [linux] mca: base: close: unloading component oob
> #
>
> But on the machine where the problem was reported, the problem is still
> the same: it does not show the debug messages and fails immediately
> with the error below:
>
> # mpirun arch
>
> [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can fail
> during orte_init; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> orte_rml_base_select failed
> --> Returned value -13 instead of ORTE_SUCCESS
>
> --------------------------------------------------------------------------
> [host-desktop1:09127] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
> [host-desktop1:09127] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
> --------------------------------------------------------------------------
> Open RTE was unable to initialize properly. The error occured while
> attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
> --------------------------------------------------------------------------
>
> I cannot find the root cause of the failure. Please advise.

>
> Regards,
> Amit Sharma
> Sr. Software Engineer,
> Wipro Technologies, Bangalore
>

> From: rhc.openmpi_at_[hidden] [mailto:rhc.openmpi_at_[hidden]] On Behalf
> Of Ralph Castain
> Sent: Tuesday, November 03, 2009 11:08 PM
> To: amit.sharma5_at_[hidden]; Open MPI Developers
> Subject: Re: [OMPI devel] orte_rml_base_select failed
>

> No parameter will help: the issue is that we couldn't find a TCP
> interface to use for wiring up the job. The first thing you might check
> is that you have a TCP interface that is up and active. It can be the
> loopback interface, but you need at least one.
>
> If you do have an interface, then you might rebuild OMPI with
> --enable-debug so you can get some diagnostics. Then run the job again
> with
>
> -mca rml_base_verbose 10 -mca oob_base_verbose 10
>
> and see what diagnostic error messages emerge.

>

>

> On Tue, Nov 3, 2009 at 4:42 AM, Amit Sharma <amit.sharma5_at_[hidden]> wrote:

>

>

> Hi,

>

> I am using Open MPI version 1.3.2 on a SLES 11 machine. I built it
> simply with ./configure, make, make install.

>

> I am facing the following error with mpirun on some machines.

>

> Root # mpirun -np 2 ls

>

> [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can fail
> during orte_init; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> orte_rml_base_select failed
> --> Returned value -13 instead of ORTE_SUCCESS
>
> --------------------------------------------------------------------------
> [host-desktop1:09127] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
> [host-desktop1:09127] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
> --------------------------------------------------------------------------
> Open RTE was unable to initialize properly. The error occured while
> attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
> --------------------------------------------------------------------------

>

> Can you please help me resolve this issue? Is there any runtime
> environment variable I can set to get rid of it?

>

>

> Thanks in Advance,

> Amit

>

>

>

>

> Please do not print this email unless it is absolutely necessary.

>

> The information contained in this electronic message and any

> attachments to this message are intended for the exclusive use of

> the addressee(s) and may contain proprietary, confidential or

> privileged information. If you are not the intended recipient, you

> should not disseminate, distribute or copy this e-mail. Please

> notify the sender immediately and destroy all copies of this message

> and any attachments.

>

> WARNING: Computer viruses can be transmitted via email. The

> recipient should check this email and any attachments for the

> presence of viruses. The company accepts no liability for any damage

> caused by any virus transmitted by this email.

>

> www.wipro.com

> _______________________________________________

> devel mailing list

> devel_at_[hidden]

> http://www.open-mpi.org/mailman/listinfo.cgi/devel


 

-- 
Jeff Squyres
jsquyres_at_[hidden]
 
Regards,
Amit Sharma
Sr. Software Engineer,
Wipro Technologies, Bangalore
 
 




Outlook.jpg