Open MPI User's Mailing List Archives

Subject: [OMPI users] OPAL_PREFIX is not passed to remote node in pls_rsh_module.c
From: Teng Lin (teng.lin_at_[hidden])
Date: 2008-10-17 06:00:18


Hi All,

We have bundled Open MPI with our product and shipped it to customers.
Following the FAQ at
http://www.open-mpi.org/faq/?category=building#installdirs
we set OPAL_PREFIX to point at the relocated installation.

Below is the command we use to launch an MPI program:

env OPAL_PREFIX=/path/to/openmpi \
/path/to/openmpi/bin/orterun --prefix /path/to/openmpi -x PATH \
-x LD_LIBRARY_PATH -x OPAL_PREFIX -np 2 --host host1,host2 ring_c

Interestingly, this always works under csh/tcsh, but quite a few users
have told us that they run into the errors below:

[compute-28-1.local:11174] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
   orte_init:startup:internal-failure
from the file:
   help-orte-runtime
But I couldn't find any file matching that name.  Sorry!
--------------------------------------------------------------------------
[compute-28-1.local:11174] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
[compute-28-1.local:11174] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
   orted:init-failure
from the file:
   help-orted.txt
But I couldn't find any file matching that name.  Sorry!
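
Jeff mentioned in
http://www.open-mpi.org/community/lists/users/2008/09/6582.php
that OPAL_PREFIX was propagated for him automatically. I bet Jeff uses
csh/tcsh.

Anyway, the failure can be traced back to how the daemon is launched.
Under sh/bash, OPAL_PREFIX is assigned but never exported (unlike PATH
and LD_LIBRARY_PATH), whereas under csh/tcsh setenv exports it
implicitly: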

sh/bash:

[xxxxx:25369] pls:rsh: executing: (//usr/bin/ssh) /usr/bin/ssh xxxxx OPAL_PREFIX=/opt/openmpi-1.2.4 ; PATH=/opt/openmpi-1.2.4/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/opt/openmpi-1.2.4/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ;

csh/tcsh:

[xxxxx:09886] pls:rsh: executing: (//usr/bin/ssh) /usr/bin/ssh xxxxx setenv OPAL_PREFIX /opt/openmpi-1.2.4 ;
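
To make the sh/bash behaviour concrete, here is a minimal sketch that can be run locally without Open MPI. It is only an illustration of shell semantics, not code from pls_rsh_module.c: show_child_env is a made-up helper, and the path is simply the one from the log above. It compares what a child process sees when OPAL_PREFIX is only assigned with what it sees when the assignment is followed by an export.

```shell
#!/bin/sh
# Illustration only: show_child_env is a hypothetical helper, not part of
# Open MPI. It prints the value of OPAL_PREFIX as seen by a child process.
show_child_env() {
    # Single quotes so that the child shell, not this one, expands the name
    sh -c 'echo "${OPAL_PREFIX:-unset}"'
}

# Form used for sh/bash in the log: assignment without export
unset OPAL_PREFIX
OPAL_PREFIX=/opt/openmpi-1.2.4 ;
unexported=$(show_child_env)

# Form with an explicit export after the assignment
unset OPAL_PREFIX
OPAL_PREFIX=/opt/openmpi-1.2.4 ; export OPAL_PREFIX ;
exported=$(show_child_env)

echo "without export: $unexported"
echo "with export:    $exported"
```

In the first case the variable exists only inside the shell that ssh starts, so a process launched from that shell would not inherit it. That would be consistent with the "Not found" errors and the missing help files above.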

It seems to work after I patched pls_rsh_module.c so that OPAL_PREFIX
is exported as well:

--- pls_rsh_module.c.orig	2008-10-16 17:15:32.000000000 -0400
+++ pls_rsh_module.c	2008-10-16 17:15:51.000000000 -0400
@@ -989,7 +989,7 @@
                                   "%s/%s/%s",
                                   (opal_prefix != NULL ? "OPAL_PREFIX=" : ""),
                                   (opal_prefix != NULL ? opal_prefix : ""),
-                                  (opal_prefix != NULL ? " ;" : ""),
+                                  (opal_prefix != NULL ? " ; export OPAL_PREFIX ; " : ""),
                                   prefix_dir, bin_base,
                                   prefix_dir, lib_base,
                                   prefix_dir, bin_base,

Another workaround is to add

export OPAL_PREFIX

to $HOME/.bashrc.
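
The reason this workaround can help is that, in a POSIX shell, marking a name for export before it has a value still causes a later plain assignment to be exported. The sketch below simulates that ordering in a single script. It assumes that the remote bash reads $HOME/.bashrc before running the command passed by ssh, and the variable seen_by_child is only there for illustration.

```shell
#!/bin/sh
# Illustration only: simulate the startup file marking the name for export
unset OPAL_PREFIX
export OPAL_PREFIX        # what the line in $HOME/.bashrc would do

# Later, the launch command assigns the value without exporting it
OPAL_PREFIX=/opt/openmpi-1.2.4 ;

# The export attribute set earlier still applies to the new value
seen_by_child=$(sh -c 'echo "${OPAL_PREFIX:-unset}"')
echo "child sees: $seen_by_child"
```

This only helps for accounts whose startup file has been edited, so it is a per-user fix rather than a replacement for the patch.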

Jeff, is this a bug in the code? Or is there a reason that
OPAL_PREFIX is not exported for sh/bash?

Teng