I found that the subroutine call inside a loop did not
return correct value after certain iterations. In order to simplify the
problem, the inputs to the subroutine are chosen to be constant, so the
output should be the same for every iteration on every computing node.
It is a fortran program, after the initialization the program goes like
do i = 1, N
call my_sub(A, B, C, re)
print *, mypn, A, B, C, re
where re is the output value of the my_sub, A, B, C are inputs to my_sub.
is the number of correct iterations. If the combined instances does not
exceed 570, the output is fine. For example, if I requested 10
computing nodes and N were 40, so it gives 10*40=400 instances, the
output would be fine. But if the combined instances exceeded 570, the
first 570 is fine, but the rest will return NaN value. For example, if
the number of computing nodes were 20 and N were 40, which gives
20*40=800 instances, then the first 570 are fine, but the rest are NaN
someone know what might cause the problem? I googled it, but can't find
a clue where to start. Please also let me know what else you need to
debug the problem.