Subject: Re: [OMPI users] MPI loop problem
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-08-18 14:38:54

Is the problem independent of the number of MPI processes?  (You suggest this is the case.)

If so, does this problem show up even with np=1 (a single MPI process)?

If so, does the problem show up even if you turn MPI off?

If so, the problem would seem to be unrelated to the MPI implementation (but possibly related to code that was introduced to parallelize).
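
The serial check suggested above can be sketched as a small harness that calls the routine repeatedly with identical inputs, with no MPI involved, and reports the first call whose result goes bad. This is only a sketch in Python for brevity; the same loop could be written in Fortran around my_sub. The names first_bad_call and make_stateful_stub are hypothetical, and the stub merely imitates a routine whose hidden state goes wrong after a fixed number of calls. It is not a claim about what 6S actually does.

```python
import math


def first_bad_call(func, inputs, n_calls):
    """Call func(inputs) n_calls times in a plain serial loop.

    Returns the 1-based index of the first call whose result is NaN or
    differs from the first result, or None if every call agrees.
    """
    reference = None
    for i in range(1, n_calls + 1):
        value = func(inputs)
        if math.isnan(value):
            return i
        if reference is None:
            reference = value      # first result is the baseline
        elif value != reference:
            return i               # same inputs, different answer
    return None


def make_stateful_stub(limit):
    """Hypothetical stand-in for my_sub: it keeps a hidden call counter
    and returns NaN once it has been called more than `limit` times."""
    state = {"calls": 0}

    def stub(inputs):
        state["calls"] += 1
        if state["calls"] > limit:
            return float("nan")
        return 0.25 * inputs

    return stub


if __name__ == "__main__":
    # Assumed numbers for illustration: 1000 calls, failure after 570.
    print(first_bad_call(make_stateful_stub(570), 1.0, 1000))
```

If a loop like this fails at the same call count without MPI, the cause is more likely state carried over between calls (for example saved or uninitialized variables, or files and units that are never closed) than anything in the MPI library.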

Julia He wrote:
The OpenMPI version is

[julia.he@bob bin]$ mpirun --version
mpirun (Open MPI) 1.2.8

Report bugs to

The platform is

[julia.he@bob bin]$ uname -a
Linux 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24 19:32:05 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

my_sub is a modification of the 6S radiative transfer code. The 6S code takes angles, atmospheric conditions, altitude, etc. as inputs and returns the top-of-atmosphere reflectance as the output. The code I provided is pseudocode because the 6S code consists of many subroutines and the main program has 3219 lines.

What I need is to use MPI to parallelize the jobs, so that each computing node computes one set of inputs. But I found that the returned values were not correct after 570 instances. So I passed the same inputs to each computing node, but the problem still exists. The first 570 returned values are correct (and identical in this case), but after 570 the returned values are NaN.

Can someone give a hint? Our system administrator can't help with programming. I suspect that some setting in MPI may prevent computing more than a certain number of times. I know it sounds weird, but I have no clue why, with the same inputs, the returned value could be garbage after 570 instances.
