Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] simple test problem hangs on mpi_finalize and consumes all system resources
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-01-24 18:24:05


Greg and I are chatting off list; there's something definitely weird going on in his setup.

We'll report back to the list when we figure it out.

On Jan 24, 2014, at 1:26 PM, Gus Correa <gus_at_[hidden]> wrote:

> On 01/24/2014 12:50 PM, Fischer, Greg A. wrote:
>> Yep. That was the problem. It works beautifully now.
>>
>> Thanks for prodding me to take another look.
>>
>> With regards to openmpi-1.6.5, the system that I'm compiling and running
>> on, SLES10, contains some pretty dated software (e.g. Linux 2.6.x,
>> python 2.4, gcc 4.1.2). Is it possible there's simply an incompatibility
>> lurking in there somewhere that would trip openmpi-1.6.5 but not
>> openmpi-1.4.3?
>>
>> Greg
>>
>
> Hi Greg
>
> FWIW, we have OpenMPI 1.6.5 installed
> (and we have used OMPI 1.4.5, 1.4.4, 1.4.3, ..., 1.2.8, before)
> in our older cluster that has CentOS 5.2, Linux kernel 2.6.18,
> gcc 4.1.2, Python 2.4.3, etc.
> Parallel programs compile and run with OMPI 1.6.5 without problems.
>
> I hope this helps,
> Gus Correa
>
>>> -----Original Message-----
>>> From: Fischer, Greg A.
>>> Sent: Friday, January 24, 2014 11:41 AM
>>> To: 'Open MPI Users'
>>> Cc: Fischer, Greg A.
>>> Subject: RE: [OMPI users] simple test problem hangs on mpi_finalize and
>>> consumes all system resources
>>>
>>> Hmm... It looks like CMake was somehow finding openmpi-1.6.5 instead of
>>> openmpi-1.4.3, despite the environment variables being set otherwise. This
>>> is likely the explanation. I'll try to chase that down.
>>>
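A quick way to see which MPI installation CMake actually picked up is to look at the MPI entries it recorded in the build directory's CMakeCache.txt. The sketch below assumes the project locates MPI through CMake's FindMPI module, whose cache variables start with `MPI_`. The helper name `show_cached_mpi` is hypothetical and not part of CMake or Open MPI.

```shell
# Hypothetical helper: list the MPI-related entries recorded in a CMake cache.
# Assumes FindMPI-style cache variables, which all begin with "MPI_".
show_cached_mpi() {
    cache_file=${1:-CMakeCache.txt}   # default: cache in the current directory
    grep '^MPI_' "$cache_file"
}

# Example usage, run from the build directory:
# show_cached_mpi
```

If the paths shown point into the wrong tree, one option is to start from a fresh build directory and name the wrapper compiler explicitly. Recent versions of FindMPI honor `-DMPI_C_COMPILER=<prefix>/bin/mpicc`, but whether the CMake version in use here does is an assumption to verify.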
>>>> -----Original Message-----
>>>> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Jeff
>>>> Squyres (jsquyres)
>>>> Sent: Friday, January 24, 2014 11:39 AM
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] simple test problem hangs on mpi_finalize and
>>>> consumes all system resources
>>>>
>>>> Ok. I only mention this because the "mca_paffinity_linux.so: undefined
>>>> symbol: mca_base_param_reg_int" type of message is almost always an
>>>> indicator of two different versions being installed into the same tree.
>>>>
>>>>
>>>> On Jan 24, 2014, at 11:26 AM, "Fischer, Greg A."
>>>> <fischega_at_[hidden]> wrote:
>>>>
>>>>> Version 1.4.3 and 1.6.5 were and are installed in separate trees:
>>>>>
>>>>> 1003 fischega_at_lxlogin2[~]> ls /tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/openmpi-1.*
>>>>> /tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/openmpi-1.4.3:
>>>>> bin etc include lib share
>>>>>
>>>>> /tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/openmpi-1.6.5:
>>>>> bin etc include lib share
>>>>>
>>>>> I'm fairly sure I was careful to check that the LD_LIBRARY_PATH was
>>>>> set correctly, but I'll check again.
>>>>>
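One way to double-check the LD_LIBRARY_PATH ordering is to ask which directory in the search path is the first to contain the MPI library. The sketch below is a rough model only: the real dynamic linker also consults rpath entries and the ld.so cache, which this ignores. The function name `first_dir_with` is hypothetical, and `libmpi.so` is assumed to be the library name of interest.

```shell
# Hypothetical helper: print the first directory in a colon-separated
# search path that contains the named file; return 1 if none does.
first_dir_with() {
    file=$1
    search_path=$2
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        if [ -n "$dir" ] && [ -e "$dir/$file" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example usage:
# first_dir_with libmpi.so "$LD_LIBRARY_PATH"
```

Running `ldd` on the application binary and looking at the libmpi line gives the authoritative answer for a given executable; the helper above only shows what the environment variable alone would select.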
>>>>>> -----Original Message-----
>>>>>> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Jeff
>>>>>> Squyres (jsquyres)
>>>>>> Sent: Friday, January 24, 2014 11:07 AM
>>>>>> To: Open MPI Users
>>>>>> Subject: Re: [OMPI users] simple test problem hangs on mpi_finalize
>>>>>> and consumes all system resources
>>>>>>
>>>>>> On Jan 22, 2014, at 10:21 AM, "Fischer, Greg A."
>>>>>> <fischega_at_[hidden]> wrote:
>>>>>>
>>>>>>> The reason for deleting the openmpi-1.6.5 installation was that I
>>>>>>> went back and installed openmpi-1.4.3 and the problem (mostly) went
>>>>>>> away. Openmpi-1.4.3 can run the simple tests without issue, but on
>>>>>>> my "real" program, I'm getting symbol lookup errors:
>>>>>>>
>>>>>>> mca_paffinity_linux.so: undefined symbol: mca_base_param_reg_int
>>>>>>
>>>>>> This sounds like you are mixing 1.6.x and 1.4.x in the same
>>>>>> installation tree. This can definitely lead to sadness.
>>>>>>
>>>>>> More specifically: installing 1.6 over an existing 1.4 installation
>>>>>> (and vice versa) is definitely NOT supported. The set of plugins that
>>>>>> the two install are different, and can lead to all manner of
>>>>>> weird/undefined behavior.
>>>>>>
>>>>>> FWIW: I typically install Open MPI into a tree by itself. And if I
>>>>>> later want to remove that installation, I just "rm -rf" that tree.
>>>>>> Then I can install a different version of OMPI into that same tree
>>>>>> (because the prior tree is completely gone).
>>>>>>
>>>>>> However, if you can't install OMPI into a tree by itself, you can
>>>>>> "make uninstall" from the source tree, and that should surgically
>>>>>> and completely remove OMPI from the installation tree. Then it is
>>>>>> safe to install a different version of OMPI into that same tree.
>>>>>>
>>>>>> Can you verify that you had installed OMPI into completely clean
>>>>>> trees? If you didn't, I can imagine that causing the kinds of
>>>>>> errors that you described.
>>>>>>
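The "clean tree" advice above can be turned into a guard that runs before installing. This is a minimal sketch, not an official Open MPI procedure: the function name `prefix_is_clean` is hypothetical, and it treats a prefix as clean only if it does not exist yet or is an empty directory.

```shell
# Hypothetical helper: succeed only if the install prefix is missing or empty,
# i.e. no earlier Open MPI version (or anything else) is already in that tree.
prefix_is_clean() {
    prefix=$1
    [ ! -e "$prefix" ] && return 0      # nothing there yet: clean
    [ -d "$prefix" ] || return 1        # exists but is not a directory
    [ -z "$(ls -A "$prefix")" ]         # directory must have no entries
}

# Example usage from an Open MPI source tree (PREFIX is a placeholder):
# PREFIX=/path/to/install/tree
# prefix_is_clean "$PREFIX" && ./configure --prefix="$PREFIX" && make all install
```

If the check fails, either "rm -rf" the old tree (when it holds nothing but Open MPI) or run "make uninstall" from the original source tree before installing the other version.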
>>>>>> --
>>>>>> Jeff Squyres
>>>>>> jsquyres_at_[hidden]
>>>>>> For corporate legal information go to:
>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/