Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Deadly warning "Epoll ADD(4) on fd 2 failed." ?
From: Mike Dubman (miked_at_[hidden])
Date: 2014-05-28 02:17:59


I think it comes from the PMI API used by OMPI/SLURM.
SLURM's libpmi is trying to control stdout/stdin, which are already
controlled by OMPI.
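
For reference, the "Operation not permitted" part of the warning is EPERM,
which epoll_ctl(2) returns when the target fd does not support epoll (a
regular file, for example). The numbers in "ADD(1)" and "ADD(4)" appear to be
the epoll event masks, i.e. EPOLLIN and EPOLLOUT. Below is a minimal,
Linux-only Python sketch of that failure mode. It assumes (nothing in this
thread confirms it) that the job's stdin/stderr ended up pointing at something
like a regular file; try_epoll_add is just an illustrative helper name, not
part of OMPI, SLURM or libevent.

```python
import errno
import os
import select
import tempfile


def try_epoll_add(fd, events):
    """Register fd with a fresh epoll instance; return 0 or the errno."""
    ep = select.epoll()
    try:
        ep.register(fd, events)  # this is the EPOLL_CTL_ADD operation
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        ep.close()


# A regular file does not support epoll, so the ADD is rejected with EPERM.
with tempfile.TemporaryFile() as regular_file:
    file_result = try_epoll_add(regular_file.fileno(), select.EPOLLIN)

# A pipe does support epoll, so the same ADD succeeds.
read_end, write_end = os.pipe()
pipe_result = try_epoll_add(read_end, select.EPOLLIN)
os.close(read_end)
os.close(write_end)

print("regular file:", errno.errorcode.get(file_result, "ok"))
print("pipe:", errno.errorcode.get(pipe_result, "ok"))
print("EPOLLIN =", select.EPOLLIN, "EPOLLOUT =", select.EPOLLOUT)
```

If this is what happened, checking what fd 0 and fd 2 actually point to inside
the job (e.g. ls -l /proc/<pid>/fd) would show it. It does not explain
the crash after 18~19 hours by itself.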

On Tue, May 27, 2014 at 8:31 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> I'm unaware of any OMPI error message like that - might be caused by
> something in libevent as that could be using epoll, so it could be caused
> by us. However, I'm a little concerned about the use of the prerelease
> version of Slurm as we know that PMI is having some problems over there.
>
> So out of curiosity - how was this job launched? Via mpirun or directly
> using srun?
>
>
> On May 27, 2014, at 1:22 AM, Filippo Spiga <spiga.filippo_at_[hidden]>
> wrote:
>
> Dear all,
>
> I am using an Open MPI v1.8.2 nightly snapshot compiled with SLURM support
> (version 14.03pre5). The two messages below appeared during a job of 2048
> MPI processes that died after 24 hours!
>
> [warn] Epoll ADD(1) on fd 0 failed. Old events were 0; read change was 1
> (add); write change was 0 (none): Operation not permitted
>
> [warn] Epoll ADD(4) on fd 2 failed. Old events were 0; read change was 0
> (none); write change was 1 (add): Operation not permitted
>
>
> The first one appeared immediately at the beginning and had no effect. The
> application started to compute and successfully called a big parallel
> eigensolver. The second message appeared after 18~19 hours of non-stop
> computation, and the application crashed without showing any other error
> message! I was regularly checking that the MPI processes were not stuck;
> after this message, the processes were all aborted without dumping anything
> on stdout/stderr. It is quite weird.
>
> I believe these messages come from Open MPI (but correct me if I am
> wrong!). I am going to look at the application and the various libraries to
> find out if something is wrong. In the meantime, it would be a great help if
> anyone could clarify the exact meaning of these warning messages.
>
> Many thanks in advance.
>
> Regards,
> Filippo
>
> --
> Mr. Filippo SPIGA, M.Sc.
> http://www.linkedin.com/in/filippospiga ~ skype: filippo.spiga
>
> «Nobody will drive us out of Cantor's paradise.» ~ David Hilbert
>
> *****
> Disclaimer: "Please note this message and any attachments are CONFIDENTIAL
> and may be privileged or otherwise protected from disclosure. The contents
> are not to be disclosed to anyone other than the addressee. Unauthorized
> recipients are requested to preserve this confidentiality and to advise the
> sender immediately of any error in transmission."
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users