I built OMPI and ran with "-mca rml_base_verbose 10 -mca oob_base_verbose 10", but still no luck. On a machine where mpirun works properly, it gives the correct debug messages, as below:
# mpirun -mca rml_base_verbose 10 -mca oob_base_verbose 10 arch
[linux] mca: base: components_open: Looking for rml components
[linux] mca: base: components_open: opening rml components
[linux] mca: base: components_open: found loaded component oob
[linux] mca: base: components_open: component oob has no register function
[linux] mca: base: components_open: Looking for oob components
[linux] mca: base: components_open: opening oob components
[linux] mca: base: components_open: found loaded component tcp
[linux] mca: base: components_open: component tcp has no register function
[linux] mca: base: components_open: component tcp open function successful
[linux] mca: base: components_open: component oob open function successful
[linux] orte_rml_base_select: initializing rml component oob
[linux] [[55739,0],0] rml:base:update:contact:info got uri 3652911104.0;tcp://128.88.143.227:39207
x86_64
[linux] mca: base: close: component tcp closed
[linux] mca: base: close: unloading component tcp
[linux] mca: base: close: component oob closed
[linux] mca: base: close: unloading component oob
#
But on the machine where the problem was reported, the problem remains the same. It does not show the debug messages; it directly gives the error below:
# mpirun arch
[NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can fail
during orte_init; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
orte_rml_base_select failed
--> Returned value -13 instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[host-desktop1:09127] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
[host-desktop1:09127] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
--------------------------------------------------------------------------
Open RTE was unable to initialize properly. The error occured while
attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
I am unable to find the root cause of the failure. Please guide.
Regards,
Amit Sharma
Sr. Software Engineer,
Wipro Technologies,
Bangalore
No parameter will help - the issue is that we couldn't find a TCP interface to use for wiring up the job. The first thing you might check is that you have a TCP interface alive and active - it can be the loopback interface, but you need at least something.
If you do have an interface, then you might rebuild OMPI with --enable-debug so you can get some diagnostics. Then run the job again with
-mca rml_base_verbose 10 -mca oob_base_verbose 10
and see what diagnostic error messages emerge.
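As a rough way to do the "TCP interface alive and active" check described above, here is a minimal Python sketch. It is not what Open MPI does internally; it only lists the interfaces the OS reports and tries to bind a TCP socket on the loopback address. The function names list_interfaces and can_listen_tcp are made up for this illustration.

```python
import socket


def list_interfaces():
    """Return the names of the network interfaces the OS reports."""
    return [name for _index, name in socket.if_nameindex()]


def can_listen_tcp(address="127.0.0.1"):
    """Return True if a TCP socket can bind and listen on the given address.

    This is only a sanity check (hypothetical helper, not an Open MPI API).
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((address, 0))  # port 0 lets the OS pick a free port
            sock.listen(1)
        return True
    except OSError:
        # Covers "address not available", permission errors, and similar.
        return False


if __name__ == "__main__":
    print("interfaces:", list_interfaces())
    print("loopback TCP usable:", can_listen_tcp())
```

If the interface list is empty or the loopback check reports False on the failing machine, that would be consistent with the "no usable TCP interface" explanation; if both look fine, the --enable-debug rebuild is the next step.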
On Tue, Nov 3, 2009 at 4:42 AM, Amit Sharma <amit.sharma5@wipro.com> wrote:
Hi,
I am using Open MPI version 1.3.2 on a SLES 11 machine. I built it simply with ./configure => make => make install.
I am facing the following error with mpirun on some machines.
Root # mpirun -np 2 ls
[NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can fail
during orte_init; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
orte_rml_base_select failed
--> Returned value -13 instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[host-desktop1:09127] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
[host-desktop1:09127] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
--------------------------------------------------------------------------
Open RTE was unable to initialize properly. The error occured while
attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
Can you please guide me in resolving this issue? Is there any runtime environment variable that can be set to get rid of it?
Thanks in advance,
Amit
Please do not print this email unless it is absolutely necessary.
The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.
WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.
www.wipro.com
_______________________________________________
devel mailing list
devel@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel