Subject: Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults
From: Scott Atchley (atchley_at_[hidden])
Date: 2010-10-28 14:40:33


On Oct 28, 2010, at 2:18 PM, Ray Muno wrote:

> On 10/22/2010 07:36 AM, Scott Atchley wrote:
>> Ray,
>>
>> Looking back at your original message, you say that it works if you use the Myricom-supplied mpirun from the Myrinet roll. I wonder if this is a mismatch between libraries on the compute nodes.
>>
>> What do you get if you use your OMPI's mpirun with:
>>
>> $ mpirun -n 1 -H <remote_host> ldd $PWD/<your_binary>
>>
>> I am wondering if ldd finds the libraries from your compile or the Myrinet roll.
>>
>
> OK, a bit of a hiatus trying to get this resolved. Had to tend other
> fires...
>
> I do think I had an issue of mixed environments. It is a Rocks 5.3
> test cluster and it had an old version of OpenMPI installed as part of
> the Rocks 5.3 HPC roll. I have now removed the HPC roll. All nodes were
> rebuilt.
>
> In the previous setup, we could actually run OpenMPI jobs over MX.
>
> With all other spurious versions of OpenMPI (and MPICH for that matter)
> removed, I have rebuilt and re-installed, from a fresh source tree,
> OpenMPI 1.4.3. It was built with PGI 10.8 compilers.
>
> Now, we cannot run with MX at all.
>
> The install was built with MX.
>
> $ ompi_info | grep mx
> MCA btl: mx (MCA v2.0, API v2.0, Component v1.4.3)
> MCA mtl: mx (MCA v2.0, API v2.0, Component v1.4.3)
>
> I can run with TCP, but now I get
>
> [compute-0-1.local:24863] mca: base: component_find: unable to open
> /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx: perhaps a
> missing symbol, or compiled for a different version of Open MPI? (ignored)
>
> $ ls -l /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx*
> -rwxr-xr-x 1 muno muno 1070 Oct 28 12:49
> /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx.la
> -rwxr-xr-x 1 muno muno 32044 Oct 28 12:49
> /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx.so
>
> mpirun -v -nolocal -np 96 --x MX_RCACHE=2 -hostfile machines --mca mtl
> mx --mca pml cm cpi.pgi

Does your environment have LD_LIBRARY_PATH set to point to $OMPI/lib and $MX/lib? Does it get set on login? Is $OMPI/bin in your PATH?
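
One way to answer these questions is a small script like the sketch below. It is only an illustration: the OMPI prefix is taken from the paths in the error message above, the MX prefix (/opt/mx) is a guess that must be adjusted to the local install, and path_contains is a hypothetical helper, not part of Open MPI or MX.

```shell
#!/bin/sh
# Return 0 if the colon-separated list in $1 contains the directory $2 exactly.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Assumed install prefixes. OMPI comes from the paths in the error message;
# MX is a guess and must be changed to match the local MX install.
OMPI=${OMPI:-/share/apps/opt/OpenMPI/1.4.3/PGI/10.8}
MX=${MX:-/opt/mx}

# Report whether each expected directory appears in the given search path.
report() {
    # $1 = variable name (for the message), $2 = its value, $3 = directory
    if path_contains "$2" "$3"; then
        echo "$1 contains $3"
    else
        echo "$1 is MISSING $3"
    fi
}

report LD_LIBRARY_PATH "${LD_LIBRARY_PATH:-}" "$OMPI/lib"
report LD_LIBRARY_PATH "${LD_LIBRARY_PATH:-}" "$MX/lib"
report PATH            "${PATH:-}"            "$OMPI/bin"
```

Saved as, say, check_env.sh on the shared filesystem, it can be run on a compute node in the same way as the ldd test, for example with mpirun -n 1 -H <remote_host> sh $PWD/check_env.sh, so that it reports the environment the remote processes actually see rather than the one in your login shell.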

Scott