Open MPI User's Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Open MPI User's mailing list

Subject: Re: [OMPI users] Program hangs
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-11-23 18:27:24

I can't tell if these problems are related to trac ticket 2043 or not.

Compiler: In my experience, trac 2043 depends on GCC 4.4.x. It isn't
necessarily a GCC bug; perhaps GCC 4.4.x is just exposing an OMPI problem.
I'm unsure which compiler Jiaye is using, and Vasilis is apparently
seeing a problem when using the PGI compiler. But maybe compilers
other than GCC 4.4.x are also exposing the problem.

Severity: In my experience, trac 2043 shows up rather dramatically,
within dozens to hundreds of iterations of simple message patterns. So
a problem that shows up only after hours of execution feels to me like
something different. But maybe I misunderstand Jiaye's and Vasilis's
cases: do the programs really run well for several hours before they hang?

Shared memory: Trac 2043 appears to be related to shared memory. Jiaye
seems to run on a single node. Vasilis talks of running on a "cluster", so
I don't know whether that means communicating over an interconnect or
still using sm.

Anyhow, it's hard to know which problems are the same or different when
we don't yet really understand what's going on.

vasilis gkanis wrote:

>I also experience a similar problem with the MUMPS solver when I run it on a
>cluster. After several hours of running, the code does not produce any results,
>although top shows that the program occupies 100% of the CPU.
>The difference here, however, is that the same program runs fine on my PC. The
>differences between my PC and the cluster are:
>1) 32-bit vs. 64-bit (cluster)
>2) Intel compiler vs. Portland compiler (cluster)
>On Friday 20 November 2009 03:50:17 am Jiaye Li wrote:
>>I installed openmpi-1.3.3 on my single-node (one CPU) Intel 64-bit quad-core
>>machine. The compiler info is:
>>intel-icc101018-10.1.018-1.i386
>>I compiled the PWscf program with Open MPI and tested it. At the
>>beginning, PW ran well, but after about 10 h, when the program was about
>>to finish, it hung, yet it still occupies the CPU (100% taken up by the
>>program). Something seems to be wrong somewhere. Any ideas? Thank you
>>in advance.