A few additional details about my network configuration:

/opt is a share point exported over NFS.

/Network/opt

is the mount point where /opt can be found across the network.

I declared OPAL_PREFIX because Open MPI was built with a prefix under /opt but runs from the directory /Network/opt.


If I copy the directory /opt/openmpi-1.4.4 to all my nodes:

scp -r /opt/openmpi-1.4.4 root@node2:/opt/.
scp -r /opt/openmpi-1.4.4 root@node3:/opt/.
scp -r /opt/openmpi-1.4.4 root@node4:/opt/.
....

This time my program runs.
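
The per-node copies above could also be scripted as a loop. This is only a minimal sketch: the helper name copy_cmds is hypothetical, the node list is limited to the names shown above, and the function only prints the scp commands so they can be reviewed before anything is copied.

```shell
# Hypothetical helper: print (do not run) one scp command per node.
# $1 is the directory to copy; the remaining arguments are node names.
copy_cmds() {
    src=$1
    shift
    for node in "$@"; do
        echo "scp -r $src root@$node:/opt/."
    done
}

# Node names taken from the commands above; extend the list as needed.
copy_cmds /opt/openmpi-1.4.4 node2 node3 node4
```

Once the printed commands look right, they can be executed by piping the output to sh.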

A question: is declaring OPAL_PREFIX enough to use /Network/opt rather than /opt?


Christophe


The problem is that the prefix you configured with doesn't match the prefix you are providing: 

configure: prefix = /opt/openmpi-1.4.4 

running: prefix = /Network/opt/openmpi-1.4.4 

The two have to match in order for the libraries to be found. 
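
The comparison above can be written down as a small check. This is a sketch only: check_prefix is a hypothetical helper, not part of Open MPI, and it just compares the two strings. It does not inspect the installed libraries.

```shell
# Hypothetical helper: compare the configure-time prefix with the
# prefix used at run time and report whether they are identical.
check_prefix() {
    build_prefix=$1
    run_prefix=$2
    if [ "$build_prefix" = "$run_prefix" ]; then
        echo "match"
    else
        echo "mismatch: built for $build_prefix, running from $run_prefix"
    fi
}

# The two values from this thread.
check_prefix /opt/openmpi-1.4.4 /Network/opt/openmpi-1.4.4
```

With the values from this thread the function reports a mismatch, which is the situation described above.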


On Nov 8, 2011, at 6:01 AM, Christophe Peyret wrote: 

> Hello, 


> I am trying to run a program on a cluster composed of Apple Xserve machines running 10.5.8 (Leopard).


> 1) I am using openmpi-1.4.4 compiled with Intel ifort and icc (V12) 
> (/opt is a share point mounted in /Network/opt with NFS) 

> ./configure --prefix=/opt/openmpi-1.4.4 \ 
> F77=/Network/opt/intel/composerxe/bin/ifort F77FLAGS="-arch x86_64" \ 
> FC=/Network/opt/intel/composerxe/bin/ifort FCFLAGS="-arch x86_64" \ 
> CC=/Network/opt/intel/composerxe/bin/icc CFLAGS="-arch x86_64" \ 
> CXX=/Network/opt/intel/composerxe/bin/icpc CXXFLAGS="-arch x86_64" 

> make 
> sudo make install 


> Each /etc/profile of my nodes contains : 

> export COMP_HOME=/Network/opt/intel/composerxe 
> export PATH=$COMP_HOME/bin:$COMP_HOME/man:$PATH 
> export DYLD_LIBRARY_PATH=$COMP_HOME/lib/:$DYLD_LIBRARY_PATH 

> export MPI_HOME=/Network/opt/openmpi-1.4.4 
> export OPAL_PREFIX=/Network/opt/openmpi-1.4.4 

> export PATH=${MPI_HOME}/bin:${MPI_HOME}/man:$PATH 
> export DYLD_LIBRARY_PATH=$MPI_HOME/lib/:$DYLD_LIBRARY_PATH 
> export LD_LIBRARY_PATH=$MPI_HOME/lib/:$LD_LIBRARY_PATH 

> 2) When I launch mpirun on several nodes, the MPI connection fails and I get this error message:

> mpirun --prefix /Network/opt/openmpi-1.4.4/ -H node1,node2 -n 2 space64 -f Test/Euler/eulerRigid.def 
> dyld: lazy symbol binding failed: Symbol not found: _orte_daemon 
> Referenced from: /Network/opt/openmpi-1.4.4/bin/orted 
> Expected in: /usr/lib/libopen-rte.0.dylib 

> dyld: Symbol not found: _orte_daemon 
> Referenced from: /Network/opt/openmpi-1.4.4/bin/orted 
> Expected in: /usr/lib/libopen-rte.0.dylib 

> bash: line 1: 2973 Trace/BPT trap /Network/opt/openmpi-1.4.4/bin/orted --daemonize -mca ess env -mca orte_ess_jobid 1644560384 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "1644560384.0;tcp://10.0.0.1:50782;tcp://125.1.4.55:50782" 
> -------------------------------------------------------------------------- 
> A daemon (pid 41667) died unexpectedly with status 133 while attempting 
> to launch so we are aborting. 

> There may be more information reported by the environment (see above). 

> This may be because the daemon was unable to find all the needed shared 
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the 
> location of the shared libraries on the remote nodes and this will 
> automatically be forwarded to the remote nodes. 
> -------------------------------------------------------------------------- 
> -------------------------------------------------------------------------- 
> mpirun noticed that the job aborted, but has no info as to the process 
> that caused that situation. 
> -------------------------------------------------------------------------- 
> mpirun: clean termination accomplished 


> 3) Does anyone have an idea ? 


> -- 
> Christophe Peyret 
> ONERA - DSNA - PS3A 
> 29 ave de la Division Leclerc 
> F92320 Chatillon 
> Tel. : +331 4673 4778 
> Fax : +331 4673 4166 

http://www.onera.fr/dsna/couplage-methodes-aeroacoustiques 

