On Apr 15, 2011, at 2:59 AM, Reuti wrote:

Hi,

On 15.04.2011 at 07:25, Asad Ali wrote:

<snip>
Yes. The entire job gets restarted.

Maybe this is caused by a signal that Condor sends to the job: the job gets terminated, and as a result Condor restarts it. Can you trap the signals in your application for a test?
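A minimal sketch of such a test, written in Python for brevity and assuming a POSIX system. The list of signals and the helper names `install_traps` and `log_signal` are hypothetical choices for illustration, not anything Condor or Open MPI defines. Each trapped signal is logged with a timestamp and PID instead of killing the process, so the log can be compared against the times at which Condor restarted the job.

```python
import os
import signal
import sys
import time

# Hypothetical choice of signals to watch; a scheduler may use others.
# SIGKILL and SIGSTOP are deliberately absent: POSIX does not allow them
# to be caught, and signal.signal() raises an error if you try.
TRAPPED = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP,
           signal.SIGUSR1, signal.SIGUSR2)

received = []  # names of the signals caught so far, in arrival order


def log_signal(signum, frame):
    """Record a signal and report it on stderr instead of dying."""
    name = signal.Signals(signum).name
    received.append(name)
    # Timestamp and PID make it possible to line this up with Condor's logs.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(f"{stamp} pid {os.getpid()}: caught {name}",
          file=sys.stderr, flush=True)


def install_traps(signals=TRAPPED):
    """Replace the default action of each listed signal with log_signal."""
    for sig in signals:
        signal.signal(sig, log_signal)


if __name__ == "__main__":
    install_traps()
    # Self-check: send ourselves SIGUSR1; it should be logged, not fatal.
    os.kill(os.getpid(), signal.SIGUSR1)
    print("still running after SIGUSR1; traps are active")
```

For the actual diagnostic run, the traps would be installed at start-up of a small test program submitted to Condor the same way as the MPI job, and its stderr checked after the next restart. While the traps are active these signals no longer terminate the process. If nothing is logged before a restart, the job may have been stopped with SIGKILL, which cannot be trapped. In C the equivalent is to install the handlers with `sigaction()`.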


If so, you had best talk to the Condor folks: it has nothing to do with Open MPI, but is due to a job control flag you are passing to Condor.

I have talked to them several times. However, most of the cluster's users don't run MPI, so they don't know much about configuring MPI with Condor.
If you know anyone who uses Condor to run MPI jobs, please let me know.

Is Open MPI supported by Condor at all? In the past Condor had a special universe for running parallel jobs, but only for MPICH(1), and only for an older version of it. Has this changed?

See https://bugzilla.redhat.com/show_bug.cgi?id=537232

At one time, it appears such a script existed. You might start with the one offered here, and/or check on the web for updates.

I would also go to the Condor web site:

http://www.cs.wisc.edu/condor/

A search for "openmpi" revealed several presentations on how to make this work.




-- Reuti


Cheers,

Asad



On Apr 14, 2011, at 6:37 PM, Asad Ali wrote:

Hi all,

I am using Condor to run my MPI jobs on a large cluster of nodes. The jobs run fine, but after some time they are automatically restarted. What could be the reason?

Cheers,

Asad

--
"A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule."
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

