Open MPI User's Mailing List Archives

From: Laurent.POREZ_at_[hidden]
Date: 2006-10-27 10:56:15


> From: George Bosilca <bosilca_at_[hidden]>
> Subject: Re: [OMPI users] Error Handling Problem
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <EF68521D-C116-4E75-8FAC-5CE918E56D15_at_[hidden]>
> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
>
> How about changing the default error handler?

I did change the default error handler (using MPI_Comm_set_errhandler) in the main_exe program. I replaced it with a handler that just calls printf.
My error handler is never called; instead, main_exe receives a SIGPIPE signal.
So the only solution I found is to catch SIGPIPE and ignore it...

> It is not supposed to work, and if you find an MPI implementation
> that supports this approach, please tell me. I know the paper where
> you read about this, but even with their MPI library this approach
> does not work.

Which paper are you talking about?

>
> Soon, Open MPI will be able to support this feature. Several
> fault-tolerant modes are under way, but there is no precise timeline yet.

OK. I will keep watching for new versions of Open MPI.

Thanks,
        Laurent.

>
> Thanks,
> george.
>
> On Oct 26, 2006, at 10:19 AM, Laurent.POREZ_at_[hidden] wrote:
>
> > Hi,
> >
> > I developed a launcher application:
> > an MPI application (say main_exe) launches 2 MPI applications (say
> > exe1 and exe2), using MPI_Comm_spawn_multiple.
> >
> > Now, I'm looking at the behavior when an exe crashes.
> >
> > What I can see is the following:
> > 1) when everything is launched, I see the following processes,
> > using 'ps':
> > - the 'mpiexec -v -d -n 1 ./main_exe' command
> > - the orted server used for 'main_exe' (say 'orted1')
> > - main_exe
> > - the orted server used for 'exe1' and 'exe2' (say 'orted2')
> > - exe1
> > - exe2
> >
> > 2) I use kill -9 to 'crash' exe2
> >
> > 3) orted2 and exe1 finish.
> >
> > 4) with ps, I see that the following processes remain: mpiexec,
> > 'orted1', main_exe
> >
> > 5) main_exe tries to send a message to exe1, using MPI_Bsend:
> > main_exe gets killed by a SIGPIPE signal!
> >
> > So what I see is that when one part of an MPI application crashes,
> > the whole application crashes!
> > Is there a way to get different behavior? For example, MPI_Bsend
> > could return an error.
> >
> > Some additional information:
> > - I work on Linux, with Open MPI 1.1.1.
> > - I'm developing in C and C++.
> >
> > Thanks,
> > Laurent.
>