Add -mca plm_base_verbose 5 --leave-session-attached to the mpirun command line. That will show the ssh command being used to start each orted.
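Since the launch in the quoted message appears to be driven from a script that prints its argument list, here is a minimal sketch of splicing those two options into such a list. The helper name with_launch_debug and the constant DEBUG_FLAGS are made up for illustration, and the argument list is abbreviated from the quoted message; only the flags themselves come from the advice above.

```python
import shlex

# The two debug options suggested above, split into separate argv entries.
DEBUG_FLAGS = ["-mca", "plm_base_verbose", "5", "--leave-session-attached"]


def with_launch_debug(mpirun_argv):
    """Return a copy of mpirun_argv with the debug flags placed
    directly after the mpirun executable (hypothetical helper)."""
    if not mpirun_argv:
        raise ValueError("expected at least the mpirun executable")
    return [mpirun_argv[0]] + DEBUG_FLAGS + list(mpirun_argv[1:])


if __name__ == "__main__":
    # Abbreviated version of the command from the quoted message.
    argv = [
        "/release/cfd/openmpi-intel/bin/mpirun",
        "--machinefile", "/var/spool/PBS/aux/20761.maruhpc4-mgt",
        "-np", "160",
        "-x", "LD_LIBRARY_PATH",
        "/tmp/fv420761.maruhpc4-mgt/falconv4_openmpi_jsgl",
    ]
    # Print the resulting command in a form that can be pasted into a shell.
    print(" ".join(shlex.quote(a) for a in with_launch_debug(argv)))
```

Run the printed command by hand (or pass the list to your launcher) and look for the ssh line that corresponds to the node whose orted fails.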
On Dec 14, 2012, at 12:17 PM, "Blosch, Edwin L" <edwin.l.blosch_at_[hidden]> wrote:
> I am having a weird problem launching cases with OpenMPI 1.4.3. It is most likely a problem with a particular node of our cluster, as the jobs run fine on some submissions but not on others. It seems to depend on the node list. I am just having trouble diagnosing which node it is and what the nature of its problem is.
> One or perhaps more of the orted are indicating they cannot find an Intel Math library. The error is:
> /release/cfd/openmpi-intel/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
> I've checked the environment just before launching mpirun, and LD_LIBRARY_PATH includes the necessary component to point to where the Intel shared libraries are located. Furthermore, my mpirun command line says to export the LD_LIBRARY_PATH variable:
> Executing ['/release/cfd/openmpi-intel/bin/mpirun', '--machinefile /var/spool/PBS/aux/20761.maruhpc4-mgt', '-np 160', '-x LD_LIBRARY_PATH', '-x MPI_ENVIRONMENT=1', '/tmp/fv420761.maruhpc4-mgt/falconv4_openmpi_jsgl', '-v', '-cycles', '10000', '-ri', 'restart.1', '-ro', '/tmp/fv420761.maruhpc4-mgt/restart.1']
> My shell-initialization script (.bashrc) does not overwrite LD_LIBRARY_PATH. OpenMPI is built explicitly --without-torque and should be using ssh to launch the orted.
> What options can I add to get more debugging of problems launching orted?