Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults
From: Scott Atchley (atchley_at_[hidden])
Date: 2010-10-28 14:40:33


On Oct 28, 2010, at 2:18 PM, Ray Muno wrote:

> On 10/22/2010 07:36 AM, Scott Atchley wrote:
>> Ray,
>>
>> Looking back at your original message, you say that it works if you use the Myricom-supplied mpirun from the Myrinet roll. I wonder if this is a mismatch between libraries on the compute nodes.
>>
>> What do you get if you use your OMPI's mpirun with:
>>
>> $ mpirun -n 1 -H <remote_host> ldd $PWD/<your_binary>
>>
>> I am wondering if ldd finds the libraries from your compile or the Myrinet roll.
>>
>
> OK, a bit of a hiatus trying to get this resolved. Had to tend other
> fires...
>
> I do think I had an issue of mixed environments. It is a Rocks 5.3
> test cluster and it had an old version of OpenMPI installed as part of
> the Rocks 5.3 HPC roll. I have now removed the HPC roll. All nodes were
> rebuilt.
>
> In the previous setup, we could actually run OpenMPI jobs over MX.
>
> With all other spurious versions of OpenMPI (and MPICH for that matter)
> removed, I have rebuilt and re-installed, from a fresh source tree,
> OpenMPI 1.4.3. It was built with PGI 10.8 compilers.
>
> Now, we cannot run with MX at all.
>
> The install was built with MX.
>
> $ ompi_info | grep mx
> MCA btl: mx (MCA v2.0, API v2.0, Component v1.4.3)
> MCA mtl: mx (MCA v2.0, API v2.0, Component v1.4.3)
>
> I can run with TCP, but now I get
>
> [compute-0-1.local:24863] mca: base: component_find: unable to open
> /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx: perhaps a
> missing symbol, or compiled for a different version of Open MPI? (ignored)
>
> $ ls -l /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx*
> -rwxr-xr-x 1 muno muno 1070 Oct 28 12:49
> /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx.la
> -rwxr-xr-x 1 muno muno 32044 Oct 28 12:49
> /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx.so
>
> mpirun -v -nolocal -np 96 --x MX_RCACHE=2 -hostfile machines --mca mtl
> mx --mca pml cm cpi.pgi

Does your environment have LD_LIBRARY_PATH set to point to $OMPI/lib and $MX/lib? Does it get set on login? Is $OMPI/bin in your PATH?
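
One way to answer these three questions on a compute node is a small script along the following lines. It is only a sketch: the OMPI prefix is taken from the path in the error message above, the MX prefix (/opt/mx) is an assumption that may not match this cluster, and the helper names in_list and report are made up for illustration.

```shell
#!/bin/sh
# Assumed install prefixes. OMPI comes from the path in the error message;
# MX=/opt/mx is a guess and should be changed to the real MX location.
OMPI=${OMPI:-/share/apps/opt/OpenMPI/1.4.3/PGI/10.8}
MX=${MX:-/opt/mx}

# in_list DIR LIST: succeed if DIR is an exact entry of the
# colon-separated LIST (hypothetical helper, not part of Open MPI).
in_list() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# report NAME DIR LIST: print one line saying whether DIR is in LIST.
report() {
    if in_list "$2" "$3"; then
        echo "ok: $1 contains $2"
    else
        echo "MISSING: $1 lacks $2"
    fi
}

report PATH "$OMPI/bin" "$PATH"
report LD_LIBRARY_PATH "$OMPI/lib" "${LD_LIBRARY_PATH:-}"
report LD_LIBRARY_PATH "$MX/lib" "${LD_LIBRARY_PATH:-}"
```

Feeding the script to a non-interactive shell on the remote side, for example "ssh <remote_host> sh < check_env.sh" (the file name is hypothetical), shows roughly what a remotely launched process inherits rather than what an interactive login sets up. Running ldd on mca_mtl_mx.so from the same shell would also list any shared library it depends on that the loader reports as "not found".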

Scott