Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How to detect a failure to start-up and MPI_Init()?
From: Katz, Jacob (jacob.katz_at_[hidden])
Date: 2009-12-15 10:47:10


Ralph,
Have you been able to confirm this as a bug?
Thanks!
--------------------------------
Jacob M. Katz | jacob.katz_at_[hidden] | Work: +972-4-865-5726 | iNet: (8)-465-5726

From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph Castain
Sent: Sunday, December 06, 2009 19:24
To: Open MPI Users
Subject: Re: [OMPI users] How to detect a failure to start-up and MPI_Init()?

I'll look into it - sounds like a bug

Thanks!
On Sun, Dec 6, 2009 at 9:13 AM, Katz, Jacob <jacob.katz_at_[hidden]> wrote:
I'm using 1.3.3.
The job isn't aborted in my case when the failing process hasn't called MPI_Init. It is aborted if the process has called MPI_Init.

--------------------------------
Jacob M. Katz | jacob.katz_at_[hidden] | Work: +972-4-865-5726 | iNet: (8)-465-5726

From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph Castain
Sent: Sunday, December 06, 2009 17:44
To: Open MPI Users
Subject: Re: [OMPI users] How to detect a failure to start-up and MPI_Init()?

The system should see that app fail and abort the job - whether it calls MPI_Init first or not is irrelevant. What version are you using?
On Sun, Dec 6, 2009 at 8:40 AM, Katz, Jacob <jacob.katz_at_[hidden]> wrote:
Hi,
Is there a way to detect a situation in which one of the processes in an MPI application exits without even calling MPI_Init()?
I have a case in which all the processes except one are stuck forever in MPI_Init(), and that one exits before it can call MPI_Init().
I tried the MCA parameters that I thought might be related (orte_startup_timeout and orte_abort_timeout), but that didn't help.
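
One possible stopgap for the hang described above, which nobody in this thread proposed, is an external wall-clock watchdog around the launch command, so that a job stuck in MPI_Init() is at least detected and torn down. The following is a minimal sketch under stated assumptions: the function name run_with_deadline and the 5-second grace period are hypothetical, the code is POSIX-only, and it relies on the launcher to clean up any remote processes when it receives SIGTERM.

```python
import os
import signal
import subprocess
import sys


def run_with_deadline(cmd, deadline_s, grace_s=5):
    """Run cmd; return ("exited", returncode) or ("timed_out", None).

    Hypothetical helper, not part of Open MPI.
    """
    # Start the command in its own session so that it and its local
    # children share one process group that can be signalled together.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return ("exited", proc.wait(timeout=deadline_s))
    except subprocess.TimeoutExpired:
        print("deadline exceeded, terminating process group", file=sys.stderr)
        try:
            os.killpg(proc.pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the group vanished between the timeout and the signal
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            # Still alive after the grace period: force-kill the group.
            os.killpg(proc.pid, signal.SIGKILL)
            proc.wait()
        return ("timed_out", None)
```

A call such as run_with_deadline(["mpirun", "-np", "4", "./my_app"], 60) would then report ("timed_out", None) if the job never gets past start-up within 60 seconds; ./my_app and the deadline are placeholders. This only detects the hang from the outside and does not explain why the job was not aborted in the first place.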

Thanks!
--------------------------------
Jacob M. Katz | jacob.katz_at_[hidden] | Work: +972-4-865-5726 | iNet: (8)-465-5726

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

_______________________________________________
users mailing list
users_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/users
