Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] FW: mpirun hangs when used on more than 2 CPUs ( mpirun compiled without thread support )
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-01-19 09:38:08


The thought occurs to me... (disclaimer: I know just about zero about OpenFoam and how to install/use it)

If your customer has been dealing with binaries, I wonder if there is some kind of ABI incompatibility going on here. Open MPI did not provide any ABI guarantees until Open MPI v1.3.2 -- see http://www.open-mpi.org/software/ompi/versions/ for details.

Also, Open MPI v1.3.2 is a bit old. There have been many bug fixes since then -- 1.4.4 is the latest stable. There will be a 1.4.5 shortly, but that will be the last on the 1.4 series.
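
One quick sanity check along those lines: build a tiny test program against the same headers and library that the OpenFoam binaries use, and compare the version it reports with what ompi_info shows. A minimal sketch (the OMPI_*_VERSION macros are Open MPI-specific and assumed to be present in that mpi.h):

#include <mpi.h>
#include <iostream>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* MPI standard level implemented by the library that is actually linked in */
    int version = 0, subversion = 0;
    MPI_Get_version(&version, &subversion);
    std::cout << "MPI standard " << version << "." << subversion << std::endl;

#ifdef OMPI_MAJOR_VERSION
    /* Open MPI release the program was compiled against; compare with ompi_info */
    std::cout << "Compiled against Open MPI "
              << OMPI_MAJOR_VERSION << "." << OMPI_MINOR_VERSION << "."
              << OMPI_RELEASE_VERSION << std::endl;
#endif

    MPI_Finalize();
    return 0;
}

A mismatch between what the application was built against and what is installed on the machine would point toward the kind of ABI issue described above.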

On Jan 19, 2012, at 5:51 AM, Theiner, Andre wrote:

> Hi all,
> I have to stop my investigations and repairs at the request of my customer.
> I will unsubscribe from this list soon.
>
> I found out that OpenFoam does not use threaded MPI-calls.
> My next step would have been to compile openmpi-1.4.4 and have the user try this.
> If that had also not worked, I would have compiled the whole of OpenFoam from source.
> Up to now the user has been using an RPM binary version of OF 2.0.1.
>
> Thanks for all your support.
>
>
> Andre
>
>
> -----Original Message-----
> From: Theiner, Andre
> Sent: Wednesday, 18 January 2012 10:15
> To: 'Open MPI Users'
> Subject: RE: [OMPI users] mpirun hangs when used on more than 2 CPUs ( mpirun compiled without thread support )
> Importance: High
>
> Thanks, Jeff and Ralph, for your good help.
> I do not know yet whether OpenFoam uses threads with Open MPI, but I will find out.
>
> I ran "ompi_info" and it produced the output listed below.
> The important line is "Thread support: posix (mpi: no, progress: no)".
> At first sight that line made me think I had found the cause of the problem,
> but I compared the output with that of the same command run on another machine
> where OpenFoam runs fine. The Open MPI version on that machine is 1.3.2-1.1 and it
> also does not have thread support.
> The differences, though, are that that machine's OpenFoam version is 1.7.1 rather than 2.0.1 and the
> OS is SUSE Linux Enterprise Desktop 11 SP1 rather than openSUSE 11.3.
> So I am at the beginning of the search for the cause of the problem.
>
> Package: Open MPI abuild_at_build30 Distribution
> Open MPI: 1.3.2
> Open MPI SVN revision: r21054
> Open MPI release date: Apr 21, 2009
> Open RTE: 1.3.2
> Open RTE SVN revision: r21054
> Open RTE release date: Apr 21, 2009
> OPAL: 1.3.2
> OPAL SVN revision: r21054
> OPAL release date: Apr 21, 2009
> Ident string: 1.3.2
> Prefix: /usr/lib64/mpi/gcc/openmpi
> Configured architecture: x86_64-unknown-linux-gnu
> Configure host: build30
> Configured by: abuild
> Configured on: Fri Sep 23 05:58:54 UTC 2011
> Configure host: build30
> Built by: abuild
> Built on: Fri Sep 23 06:11:31 UTC 2011
> Built host: build30
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: gcc
> C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
> C++ compiler absolute: /usr/bin/g++
> Fortran77 compiler: gfortran
> Fortran77 compiler abs: /usr/bin/gfortran
> Fortran90 compiler: gfortran
> Fortran90 compiler abs: /usr/bin/gfortran
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Sparse Groups: no
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: no
> mpirun default --prefix: no
> MPI I/O support: yes
> MPI_WTIME support: gettimeofday
> Symbol visibility support: yes
> FT Checkpoint support: no (checkpoint thread: no)
> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.2)
> MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.2)
> MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3.2)
> MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.2)
> MCA carto: file (MCA v2.0, API v2.0, Component v1.3.2)
> MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.2)
> MCA timer: linux (MCA v2.0, API v2.0, Component v1.3.2)
> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.2)
> MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.2)
> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.2)
> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.2)
> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.2)
> MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.2)
> MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.2)
> MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.2)
> MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.2)
> MCA coll: self (MCA v2.0, API v2.0, Component v1.3.2)
> MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.2)
> MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.2)
> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.2)
> MCA io: romio (MCA v2.0, API v2.0, Component v1.3.2)
> MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.2)
> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.2)
> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.2)
> MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.2)
> MCA pml: csum (MCA v2.0, API v2.0, Component v1.3.2)
> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.2)
> MCA pml: v (MCA v2.0, API v2.0, Component v1.3.2)
> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.2)
> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.2)
> MCA btl: self (MCA v2.0, API v2.0, Component v1.3.2)
> MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.2)
> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.2)
> MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.2)
> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.2)
> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3.2)
> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3.2)
> MCA iof: orted (MCA v2.0, API v2.0, Component v1.3.2)
> MCA iof: tool (MCA v2.0, API v2.0, Component v1.3.2)
> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3.2)
> MCA odls: default (MCA v2.0, API v2.0, Component v1.3.2)
> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3.2)
> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3.2)
> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3.2)
> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3.2)
> MCA rml: oob (MCA v2.0, API v2.0, Component v1.3.2)
> MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3.2)
> MCA routed: direct (MCA v2.0, API v2.0, Component v1.3.2)
> MCA routed: linear (MCA v2.0, API v2.0, Component v1.3.2)
> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3.2)
> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3.2)
> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3.2)
> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3.2)
> MCA ess: env (MCA v2.0, API v2.0, Component v1.3.2)
> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3.2)
> MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3.2)
> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3.2)
> MCA ess: tool (MCA v2.0, API v2.0, Component v1.3.2)
> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3.2)
> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3.2)
>
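> For completeness, I understand the level of thread support the library actually grants
> can also be checked at run time; here is a minimal sketch (assuming MPI_Query_thread,
> which is part of the MPI-2 standard, is available in 1.3.x):
>
> #include <mpi.h>
> #include <iostream>
>
> int main(int argc, char **argv)
> {
>     int provided = MPI_THREAD_SINGLE;
>     /* ask for the highest level; the library answers with what it can grant */
>     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>
>     int level = MPI_THREAD_SINGLE;
>     MPI_Query_thread(&level);   /* reports the same level as 'provided' */
>
>     const char *name = (level == MPI_THREAD_MULTIPLE)   ? "MPI_THREAD_MULTIPLE"
>                      : (level == MPI_THREAD_SERIALIZED) ? "MPI_THREAD_SERIALIZED"
>                      : (level == MPI_THREAD_FUNNELED)   ? "MPI_THREAD_FUNNELED"
>                                                         : "MPI_THREAD_SINGLE";
>     std::cout << "Provided thread level: " << name << std::endl;
>
>     MPI_Finalize();
>     return 0;
> }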
>
> I have also asked the user to run the following adaptation of his original
> command "mpirun -np 9 interFoam -parallel". I hoped to get some kind of debug output
> that would point me in the right direction. The new command did not work and I am a bit lost.
> Is the syntax wrong somehow, or is there a problem in the user's PATH?
> I do not understand what debugger is wanted. Does mpirun not have an internal debugger?
>
> testuser_at_caelde04:~/OpenFOAM/testuser-2.0.1/nozzleFlow2D> mpirun -v --debug --debug-daemons -np 9 interfoam -parallel
> --------------------------------------------------------------------------
> A suitable debugger could not be found in your PATH.
> Check the values specified in the orte_base_user_debugger MCA parameter for the list of debuggers that was searched.
>
>
>
>
> Gruss/Regards
>
> Andre
> Tel. 05362-936222
>
>
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Jeff Squyres
> Sent: Tuesday, 17 January 2012 22:53
> To: Open MPI Users
> Subject: Re: [OMPI users] mpirun hangs when used on more than 2 CPUs
>
> You should probably also run the ompi_info command; it tells you details about your installation, and how it was configured.
>
> Is it known that OpenFoam uses threads with MPI?
>
>
> On Jan 17, 2012, at 9:08 AM, Ralph Castain wrote:
>
>> You might first just try running a simple MPI "hello" to verify the installation. I don't know if OF is threaded or not.
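>>
>> A minimal "hello" along those lines (just a sketch; compile with mpicc or mpic++
>> and launch with "mpirun -np <N> ./hello") could look like:
>>
>> #include <mpi.h>
>> #include <iostream>
>>
>> int main(int argc, char **argv)
>> {
>>     MPI_Init(&argc, &argv);
>>
>>     int rank = 0, size = 0;
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>     std::cout << "Hello from rank " << rank << " of " << size << std::endl;
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> If that runs to completion with "mpirun -np 9" on the same node, the installation
>> itself is probably fine and the hang is more likely specific to the OpenFoam case.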
>>
>> Sent from my iPad
>>
>> On Jan 17, 2012, at 5:22 AM, John Hearns <hearnsj_at_[hidden]> wrote:
>>
>>> Andre,
>>> you should not need the OpenMPI sources.
>>>
>>> Install the openmpi-devel package from the same source
>>> (zypper install openmpi-devel if you have that science repository enabled).
>>> This will give you the mpi.h file and other include files, libraries
>>> and manual pages.
>>>
>>> That is a convention in SUSE-style distros - the devel package
>>> contains the stuff you need to 'develop'.
>>>
>>> On 17/01/2012, Theiner, Andre <andre.theiner_at_[hidden]> wrote:
>>>> Hi Devendra,
>>>> thanks for your interesting answer. Up to now I expected to get a fully
>>>> operational openmpi installation package
>>>> by installing openmpi from the "science" repository
>>>> (http://download.opensuse.org/repositories/science/openSUSE_11.3).
>>>> To compile your script I need to have the openmpi sources, which I do not
>>>> have at present; I will try to get them.
>>>> How do I compile and build using multiple processors?
>>>> Is there a special flag which tells the compiler to care for multiple CPUs?
>>>>
>>>> Andre
>>>>
>>>>
>>>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
>>>> Behalf Of devendra rai
>>>> Sent: Monday, 16 January 2012 13:25
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] mpirun hangs when used on more than 2 CPUs
>>>>
>>>> Hello Andre,
>>>>
>>>> It may be possible that your openmpi does not support threaded MPI-calls (if
>>>> these are happening). I had a similar problem, and it was traced to this
>>>> cause. If you installed your openmpi from available repositories, chances
>>>> are that you do not have thread-support.
>>>>
>>>> Here's a small script that you can use to determine whether or not you have
>>>> thread support:
>>>>
>>>> #include <mpi.h>
>>>> #include <iostream>
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>     int desired_thread_support = MPI_THREAD_MULTIPLE;
>>>>     int provided_thread_support;
>>>>
>>>>     /* ask for full thread support; the library reports what it can actually provide */
>>>>     MPI_Init_thread(&argc, &argv, desired_thread_support,
>>>>                     &provided_thread_support);
>>>>
>>>>     /* check whether the requested thread support has been provided */
>>>>     if (provided_thread_support != desired_thread_support)
>>>>     {
>>>>         std::cout << "MPI thread support not available! Aborted." << std::endl;
>>>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>>>     }
>>>>
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }
>>>>
>>>> Compile and build as usual, then run it on multiple processors.
>>>>
>>>> Maybe this helps. If you do discover that you do not have thread support available,
>>>> you will need to rebuild Open MPI with the --enable-mpi-threads flag.
>>>>
>>>> HTH.
>>>>
>>>>
>>>> Devendra
>>>>
>>>> ________________________________
>>>> From: "Theiner, Andre" <andre.theiner_at_[hidden]>
>>>> To: "users_at_[hidden]" <users_at_[hidden]>
>>>> Sent: Monday, 16 January 2012, 11:55
>>>> Subject: [OMPI users] mpirun hangs when used on more than 2 CPUs
>>>>
>>>>
>>>> Hi everyone,
>>>> may I have your help on a strange problem?
>>>> High performance computing is new to me and I do not have much idea about
>>>> OpenMPI and OpenFoam (OF), which uses the "mpirun" command.
>>>> I have to support the OF application in my company and have been trying to
>>>> find the problem for about a week.
>>>> The versions are openmpi-1.3.2 and OF 2.0.1 which are running on openSUSE
>>>> 11.3 x86_64.
>>>> The computer is brand new, has 96 GB RAM, 12 CPUs and was installed with
>>>> Linux some weeks ago.
>>>> I installed OF 2.0.1 according to the vendor's instructions at
>>>> http://www.openfoam.org/archive/2.0.1/download/suse.php.
>>>>
>>>> Here is the problem:
>>>> The experienced user tested OF with a test case from one of the
>>>> vendor's tutorials.
>>>> He only used the computing power of his local machine "caelde04"; no other
>>>> computers were accessed by mpirun.
>>>>
>>>> He found no problem when testing in single-processor mode, but in
>>>> multiprocessor mode his calculation hangs when he distributes
>>>> it to more than 2 CPUs. The OF vendor thinks this is somehow an
>>>> OpenMPI problem, which is why I am trying to get
>>>> help from this forum.
>>>> I attached 2 files: one is the "decomposeParDict", which resides in the
>>>> "system" subdirectory of his test case, and the other is the log file
>>>> from the "decomposePar" command and the mpirun command "mpirun -np 9
>>>> interFoam -parallel".
>>>> Do you have an idea where the problem is or how I can narrow it down?
>>>> Thanks much for any help.
>>>>
>>>> Andre
>>>>
>>>>
>>>
>>
>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/