Ralph Castain wrote:
> On Feb 9, 2009, at 6:41 PM, Brett Pemberton wrote:
>> I've just installed OpenMPI 1.3 on our cluster, and am getting this
>> issue on jobs spanning more than one node.
>> mpiexec: symbol lookup error:
>> /usr/local/openmpi/1.3-pgi/lib/openmpi/mca_plm_tm.so: undefined
>> symbol: tm_init
>> As reported before, I saw someone saying that they solved this with:
>> A new install using this configure option does work for me, but only
>> for code recompiled with this new mpicc. Existing code doesn't spawn
> No, it won't, since the static libraries for the tm plm component weren't
> linked directly into the code.
Ahh, didn't think of that.
>> As such, I'd much rather get the existing install working again.
>> It was suggested that I need the torque libraries on the compute
>> nodes, and they are there. However, adding them to ld.so.conf has not
>> solved this, so I'm not sure what more needs to be done to fix it
>> without recompiling openmpi.
> I'm not sure what you mean by adding them to ld.so.conf. What you need
> to do is install the torque libraries on the compute node in the same
> absolute path where they reside on the node where OMPI was built. OMPI
> points the executable to look in that location.
> Other than that, there shouldn't be a problem.
This is what confuses me.
We export /usr/local from the mgt node to all compute nodes.
Both torque and openmpi are installed to /usr/local.
So why are we hitting this issue?
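
A minimal sketch for narrowing this down, assuming the Torque shared
library has "torque" in its file name (e.g. libtorque.so) and that ldd is
available on the node being checked. It reads the ldd output for the plugin
and reports whether a Torque dependency is recorded at all and, if so,
whether the runtime linker can resolve it. The function names parse_ldd and
diagnose are made up for illustration.

```python
import subprocess
import sys


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path (None if not found)."""
    deps = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # loader/vdso lines without a "name => path" form
        name, _, target = line.partition("=>")
        name, target = name.strip(), target.strip()
        if target.startswith("("):
            continue  # "name => (0x...)": address only, no file on disk
        if target.startswith("not found"):
            deps[name] = None
        else:
            # drop the trailing load address, e.g. " (0x00002b...)"
            deps[name] = target.split(" (")[0].strip()
    return deps


def diagnose(deps, needle="torque"):
    """Classify the plugin's dependency on a library whose name contains `needle`."""
    matches = {n: p for n, p in deps.items() if needle in n}
    if not matches:
        return "no-dependency"  # plugin records no shared Torque library at all
    if any(p is None for p in matches.values()):
        return "unresolved"     # dependency recorded, but not found on this node
    return "resolved"           # dependency recorded and found


if __name__ == "__main__":
    # Usage: python check_plugin.py /path/to/mca_plm_tm.so
    result = subprocess.run(["ldd", sys.argv[1]], capture_output=True, text=True)
    print(diagnose(parse_ldd(result.stdout)))
```

Run on a compute node against
/usr/local/openmpi/1.3-pgi/lib/openmpi/mca_plm_tm.so, a result of
"no-dependency" would suggest the plugin was never linked against a shared
Torque library, in which case neither ld.so.conf nor the exported /usr/local
would help. "unresolved" would point at the library search path on that node
instead.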
Brett Pemberton - VPAC Senior Systems Administrator
http://www.vpac.org/ - (03) 9925 4899