Sorry for the delay in replying; this turned into a hectic week...
On Feb 4, 2009, at 11:28 AM, Hana Milani wrote:
> Jeff, thanks for helping me.
>> Is this a Fortran program, perchance?
> Yes, it was written in f77, but I have compiled it with
> gfortran. Other people have also done the same with no problems.
>> Do you have access to the source code? I wonder if the program is
>> internally raising an error and effectively aborting itself. Do you
>> know that the application runs correctly? Do you have any test data
>> sets that you can try that give known outputs?
> Yes, I have the source code installed. I have not been able to run
> the program in parallel, but I have run my inputs serially and
> got satisfactory results.
That's a good datapoint, but it's unfortunately not conclusive.
> If you like, I can email you the details of the code.
If it's small and simple, sure. I'm afraid I don't have the time or
resources to investigate a large, complex application.
I don't have any more insights other than to restate that *something*
is killing your application with SIGTERM. It is *likely* some other
entity on your node - a daemon or some other controller process. But
it is also possible (although probably less likely) that the
application is aborting itself.
Are you able to run *any* MPI applications (especially those compiled
with Fortran) in parallel? E.g., the hello world and the ring
programs in the examples/ subdirectory in the OMPI distribution?