Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] signal 15 (terminated)
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-02-06 17:25:41


Sorry for the delay in replying; this turned into a hectic week...

On Feb 4, 2009, at 11:28 AM, Hana Milani wrote:

> Jeff, Thanks for helping me.
>
> Is this a Fortran program, perchance?
>
> Yes, it has been written by f77, but I have compiled it with
> gfortran. People have also done the same with no problem.
>
> Do you have access to the source code? I wonder if the program is
> internally raising an error and effectively aborting itself. Do you
> know that the application runs correctly? Do you have any test data
> sets that you can try that give known outputs?
>
> Yes, I have installed the source code. I have not been able to run
> the program in parallel, but I have run my inputs sequentially and
> got satisfactory results.

That's a good datapoint, but it's unfortunately not conclusive.

> If you allow me, I can send the details of the code to your email.

If it's small and simple, sure. I'm afraid I don't have the time or
resources to investigate a large, complex application that is
misbehaving.

I don't have any more insights other than to re-state that *something*
is killing your application with SIGTERM. It is *likely* some other
entity on your node - a daemon or some other controller process. But
it is also possible (although probably less likely) that the
application is aborting itself.
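
One way to narrow this down, if you can link a little C code into the application, is to catch SIGTERM yourself and record who sent it. The following is only a minimal sketch, assuming a POSIX system; the names `install_sigterm_logger`, `sigterm_seen`, and `sigterm_sender` are made up for illustration and are not part of Open MPI. Calling it from Fortran would additionally need a small interoperability wrapper, which is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Written only by the handler. volatile sig_atomic_t is the type that
 * is safe to modify from inside a signal handler. */
static volatile sig_atomic_t got_sigterm = 0;
static volatile sig_atomic_t sender_pid = 0;

/* Handler: remember that SIGTERM arrived and which PID sent it.
 * Only async-signal-safe calls (write) are used here. */
static void on_sigterm(int signo, siginfo_t *info, void *context)
{
    static const char msg[] = "caught SIGTERM\n";
    ssize_t ignored;

    (void)signo;
    (void)context;
    got_sigterm = 1;
    if (info != NULL) {
        /* si_pid is filled in when the signal was sent with kill(). */
        sender_pid = (sig_atomic_t)info->si_pid;
    }
    ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
}

/* Hypothetical helper: install the handler. Returns 0 on success,
 * -1 on failure (same convention as sigaction). */
int install_sigterm_logger(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigterm;
    sa.sa_flags = SA_SIGINFO;   /* ask for the siginfo_t argument */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical accessors for the recorded state. */
int sigterm_seen(void)
{
    return got_sigterm != 0;
}

long sigterm_sender(void)
{
    return (long)sender_pid;
}
```

If you call `install_sigterm_logger()` early in the program and later print `sigterm_sender()`, you can look that PID up with `ps` to see which process sent the signal. Note that this handler replaces the default action, so the process will keep running after SIGTERM unless you exit explicitly once you have logged what you need.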

Are you able to run *any* MPI applications (especially those compiled
with Fortran) in parallel? E.g., the hello world and the ring
programs in the examples/ subdirectory in the OMPI distribution?

-- 
Jeff Squyres
Cisco Systems