Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How to detect a failure to start-up and MPI_Init()?
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-12-06 12:23:51


I'll look into it - sounds like a bug

Thanks!

On Sun, Dec 6, 2009 at 9:13 AM, Katz, Jacob <jacob.katz_at_[hidden]> wrote:

> I’m using 1.3.3.
>
> The job isn’t aborted in my case when the failing process hasn’t called
> MPI_Init… It is aborted if the process has called MPI_Init…
>
>
>
> --------------------------------
>
> *Jacob M. Katz* | jacob.katz_at_[hidden] | *Work:* +972-4-865-5726 | *iNet:* (8)-465-5726
>
>
>
> *From:* users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] *On Behalf Of* Ralph Castain
> *Sent:* Sunday, December 06, 2009 17:44
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] How to detect a failure to start-up and MPI_Init()?
>
>
>
> The system should see that app fail and abort the job - whether it calls
> MPI_Init first or not is irrelevant. What version are you using?
>
> On Sun, Dec 6, 2009 at 8:40 AM, Katz, Jacob <jacob.katz_at_[hidden]> wrote:
>
> Hi,
>
> Is there a way to detect a situation in which one of the processes in an MPI
> application exits without even calling MPI_Init()?
>
> I have a case in which all the processes except one are stuck forever in
> MPI_Init(), and that one exits before being able to call MPI_Init()…
>
> I tried using the MCA params that I thought might be related,
> orte_startup_timeout and orte_abort_timeout, but that didn’t help.
>
>
>
> Thanks!
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users