Ralph Castain wrote:
>
> On Feb 9, 2009, at 6:41 PM, Brett Pemberton wrote:
>
>> Hey,
>>
>> I've just installed OpenMPI 1.3 on our cluster, and am getting this
>> issue on jobs > 1 node.
>>
>> mpiexec: symbol lookup error:
>> /usr/local/openmpi/1.3-pgi/lib/openmpi/mca_plm_tm.so: undefined
>> symbol: tm_init
>>
>> This has been reported before, and someone said they solved it with:
>> --enable-mca-static=plm:tm
>>
>> A new install using this configure option does work for me, but only
>> for code recompiled with this new mpicc. Existing code doesn't spawn
>> properly.
>
> No, it won't, since the static libraries for the tm plm component weren't
> linked directly into that code.
Ahh, didn't think of that.
>
>>
>>
>> As such, I'd much rather get the existing install working again.
>>
>> It was suggested that I need the torque libraries on the compute
>> nodes, which they are. However adding them to ld.so.conf has not
>> solved this, so I'm not sure what more needs to be done to solve this
>> without recompiling openmpi.
>
> I'm not sure what you mean by adding them to ld.so.conf. What you need
> to do is install the torque libraries on the compute node in the same
> absolute path where they reside on the node where OMPI was built. OMPI
> points the executable to look for that location.
>
> Other than that, there shouldn't be a problem.
>
This is what confuses me.
We export /usr/local from the mgt node to all compute nodes.
Both torque and openmpi are installed to /usr/local.
So why are we hitting this issue?
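One way to narrow this down is to ask the dynamic loader what mca_plm_tm.so actually depends on. This is an illustrative sketch, not an Open MPI tool. The script runs `ldd` on the component and looks for a Torque library. It assumes Python 3.7+ and glibc's `ldd` on the node. It also assumes Torque's shared library has "torque" in its soname (e.g. libtorque.so.N). `parse_ldd` and `diagnose` are hypothetical helper names. If no Torque dependency shows up at all, the loader is never asked for libtorque, which could explain why editing ld.so.conf made no difference.

```python
import subprocess
import sys

# Component path taken from the error message above; override with argv[1].
DEFAULT_COMPONENT = "/usr/local/openmpi/1.3-pgi/lib/openmpi/mca_plm_tm.so"


def parse_ldd(output):
    """Map each library name in `ldd` output to its resolved path.

    None means ldd printed "not found"; "" means no path was given
    (e.g. the vdso line). Lines without "=>" (the loader itself) are skipped.
    """
    deps = {}
    for raw in output.splitlines():
        if "=>" not in raw:
            continue
        name, _, rest = raw.partition("=>")
        name, rest = name.strip(), rest.strip()
        if rest.startswith("not found"):
            deps[name] = None
        elif rest.startswith("("):
            deps[name] = ""
        else:
            # rest looks like "/path/to/lib.so (0x...)"; keep only the path.
            deps[name] = rest.split(" (")[0]
    return deps


def diagnose(deps):
    """Summarise how (or whether) a Torque library is resolved."""
    # Assumption: Torque's soname contains "torque", e.g. libtorque.so.2.
    torque = {n: p for n, p in deps.items() if "torque" in n}
    if not torque:
        return ("no libtorque dependency recorded: the loader is never asked "
                "for Torque, so ld.so.conf changes probably cannot supply tm_init")
    missing = sorted(n for n, p in torque.items() if p is None)
    if missing:
        return "libtorque dependency not found on this node: " + ", ".join(missing)
    return "libtorque resolves to: " + ", ".join(
        f"{n} -> {p}" for n, p in sorted(torque.items()))


def main(argv):
    path = argv[1] if len(argv) > 1 else DEFAULT_COMPONENT
    try:
        result = subprocess.run(["ldd", path], capture_output=True, text=True)
    except OSError as exc:  # e.g. ldd is not installed
        print(f"could not run ldd: {exc}", file=sys.stderr)
        return 1
    if result.returncode != 0:
        print(result.stderr.strip() or "ldd failed", file=sys.stderr)
        return 1
    print(diagnose(parse_ldd(result.stdout)))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run it on the node where mpiexec is started, passing another component path as the first argument if needed. Note that edits to /etc/ld.so.conf only take effect after `ldconfig` rebuilds the loader cache.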
cheers,
/ Brett
--
Brett Pemberton - VPAC Senior Systems Administrator
http://www.vpac.org/ - (03) 9925 4899