Open MPI User's Mailing List Archives

Subject: [OMPI users] Question on ssh search path
From: marco atzeri (marco.atzeri_at_[hidden])
Date: 2012-10-16 17:23:53


Hi,
I am experimenting with Open MPI 1.6.2 on the Cygwin platform. The build
and the check run were fine, but a simple "mpirun hello_c.exe" fails with
this cryptic output:

##################################################################
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file /pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/mca/plm/rsh/plm_rsh_module.c at line 197
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file /pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/mca/ess/hnp/ess_hnp_module.c at line 228
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

   orte_plm_init failed
   --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file /pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/runtime/orte_init.c at line 128
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

   orte_ess_set_name failed
   --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file /pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/tools/orterun/orterun.c at line 694
#####################################################################

While trying to debug this, I noticed a strange pattern in how ssh is
searched for:
1) ssh is only searched for in the PATH directories that end with "bin";
     other directories are skipped.
2) //usr/bin/ssh is not on the PATH, but it is searched anyway.
    Why does this happen, and where in the code is it defined?

   103 321183 [main] orterun 6304 normalize_posix_path: src /home/marco/bin/ssh
   100 324353 [main] orterun 6304 normalize_posix_path: src /usr/local/bin/ssh
    99 327381 [main] orterun 6304 normalize_posix_path: src /usr/bin/ssh
    36 1805679 [main] orterun 6304 normalize_posix_path: src /home/marco/bin/ssh
    34 1807010 [main] orterun 6304 normalize_posix_path: src /usr/local/bin/ssh
    34 1808236 [main] orterun 6304 normalize_posix_path: src /usr/bin/ssh
    37 1810858 [main] orterun 6304 normalize_posix_path: src //usr/bin/ssh
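
To compare the trace above against what a plain PATH search would be expected to probe, a small script can list one candidate per PATH entry. This is only a sketch: the function name ssh_candidates is hypothetical, and it does not reproduce Open MPI's own lookup code, which may behave differently.

```python
import os
import posixpath


def ssh_candidates(path_value, name="ssh", sep=":"):
    """Return (candidate, is_executable) for every directory in path_value.

    Illustrative helper only; it is not part of Open MPI.
    """
    results = []
    for directory in path_value.split(sep):
        if not directory:
            # An empty PATH entry conventionally means the current directory.
            directory = "."
        candidate = posixpath.join(directory, name)
        is_exec = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        results.append((candidate, is_exec))
    return results


if __name__ == "__main__":
    # Print every location a straightforward PATH walk would probe.
    for candidate, found in ssh_candidates(os.environ.get("PATH", "")):
        print(("found    " if found else "missing  ") + candidate)
```

Any PATH directory that appears in this listing but not in the strace output was skipped by the search, and any path in the strace output that is absent from this listing (such as //usr/bin/ssh) was probably built from something other than PATH.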

I ask because mpirun crashes immediately after the "//" search:

  703 9508968 [WNetOpenEnum] orterun 8020 cygthread::stub: thread 'WNetOpenEnum', id 0x15A0, stack_ptr 0x28BAD40
--- Process 8020, exception 000006AB at 776BB9BC
41286 9550254 [main] orterun 8020 fs_info::update: Cannot get volume attributes (\??\UNC), C0000010

I suspect this search is the culprit.
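
One way a path such as //usr/bin/ssh could arise is plain string concatenation of a prefix that already ends in "/" with an absolute directory. The sketch below illustrates that guess and a possible normalization. The helper names are hypothetical and the concatenation is an assumption, not something confirmed from the Open MPI source.

```python
import posixpath


def join_prefix(prefix, directory, name):
    """Naive concatenation: a hypothetical way the '//' could be produced."""
    return prefix + directory + "/" + name


def has_double_slash_root(path):
    """True if path starts with exactly two slashes.

    POSIX leaves the meaning of exactly two leading slashes
    implementation-defined.
    """
    return path.startswith("//") and not path.startswith("///")


def collapse_root(path):
    """Collapse any run of leading slashes into a single '/'."""
    if path.startswith("/"):
        return "/" + path.lstrip("/")
    return path


if __name__ == "__main__":
    built = join_prefix("/", "/usr/bin", "ssh")
    print(built)                         # //usr/bin/ssh
    print(has_double_slash_root(built))  # True
    # normpath keeps exactly two leading slashes, so it does not help here.
    print(posixpath.normpath(built))     # //usr/bin/ssh
    print(collapse_root(built))          # /usr/bin/ssh
```

On Cygwin a path that begins with "//" is treated as a network (UNC) path, which would be consistent with the WNetOpenEnum thread and the "\??\UNC" message in the trace above.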

If anyone is interested, I have put all the config, check and make logs,
plus the ompi_info output, here:
http://matzeri.altervista.org/works/ompi/

Regards
Marco