Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] SIGSEV when running OMPI Java binding
From: Saliya Ekanayake (esaliya_at_[hidden])
Date: 2014-03-14 16:47:36


This is really great news!! I'll test the trunk on our cluster.
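
For anyone retesting on a build that predates the fix, the one setting reported further down this thread to help was excluding the TCP BTL with --mca btl ^tcp. A minimal sketch of such a run, assuming the Hello example from the OMPI distribution is already compiled in the current directory and 8 processes are wanted. The helper name build_cmd is hypothetical, and the script only prints the command line instead of launching it:

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): print, but do not run, an
# mpirun command line that excludes the TCP BTL component.
build_cmd() {
    nprocs=$1   # total number of MPI processes to start
    class=$2    # Java main class, e.g. the Hello example
    printf '%s\n' "mpirun -np $nprocs --mca btl ^tcp java $class"
}

# Print the command for 8 processes running the Hello class.
build_cmd 8 Hello
```

Running the printed line launches the job. Quoting the value as '^tcp' is harmless and may avoid surprises in shells that give ^ a special meaning.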

Thank you,
Saliya

On Fri, Mar 14, 2014 at 4:44 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:

> We just fixed the segv (see
> https://svn.open-mpi.org/trac/ompi/changeset/31073, if you care).
>
> The issue was an errant large array on the stack in debug builds, which
> would cause JVMs to run out of stack space.
>
> The fix is on the SVN trunk now; it will be on the v1.7 branch shortly.
>
>
> On Mar 11, 2014, at 5:06 PM, Saliya Ekanayake <esaliya_at_[hidden]> wrote:
>
> > I just tested with "ml" turned off as you suggested, but unfortunately
> > it didn't solve the issue.
> >
> > However, I found that by explicitly setting --mca btl ^tcp the code
> > worked on up to 4 nodes, each running 8 procs. If I don't specify this,
> > it'll simply fail even on one node with 8 procs.
> >
> > Thank you,
> > Saliya
> >
> >
> > On Tue, Mar 11, 2014 at 4:35 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> > Looks like we still have a bug in one of our components -- can you try:
> >
> > mpirun --mca coll ^ml ...
> >
> > This will deactivate the "ml" collective component. See if that enables
> > you to run (this particular component has nothing to do with Java).
> >
> >
> > On Mar 11, 2014, at 1:33 AM, Saliya Ekanayake <esaliya_at_[hidden]> wrote:
> >
> > > Just tested that this happens even with the simple Hello.java program
> > > given in the OMPI distribution.
> > >
> > > I've made a tarball containing details of the error adhering to
> > > http://www.open-mpi.org/community/help/. Please let me know if I have
> > > missed any info necessary.
> > >
> > > Thank you,
> > > Saliya
> > >
> > >
> > >
> > >
> > > On Mon, Mar 10, 2014 at 10:46 AM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> > > Greetings, and thanks for trying out our Java bindings.
> > >
> > > Can you provide some more details? E.g., is there a particular
> > > program you're running that incurs these problems? Or is there even a
> > > particular MPI function that you're using that results in this segv (e.g.,
> > > perhaps we have a specific bug somewhere)?
> > >
> > > Can you reduce the segv to a small example that we can reproduce (and
> > > therefore fix)?
> > >
> > >
> > > On Mar 10, 2014, at 12:05 AM, Saliya Ekanayake <esaliya_at_[hidden]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I have 8 nodes, each with 2 quad-core sockets. The nodes also have
> > > > IB connectivity. I am trying to run the OMPI Java binding from OMPI trunk
> > > > revision 30301 with 8 procs per node, totaling 64 procs. This gives a
> > > > SIGSEGV error as below.
> > > >
> > > > I wonder if you have any suggestion to resolve this?
> > > >
> > > > Thank you,
> > > > Saliya
> > > >
> > > > # A fatal error has been detected by the Java Runtime Environment:
> > > > #
> > > > # SIGSEGV (0xb) at pc=0x000000313867b75b, pid=12229, tid=47864973515072
> > > > #
> > > > # JRE version: Java(TM) SE Runtime Environment (8.0-b118) (build 1.8.0-ea-b118)
> > > > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b60 mixed mode linux-amd64 compressed oops)
> > > > # Problematic frame:
> > > > # C [libc.so.6+0x7b75b] memcpy+0x15b
> > > >
> > > >
> > > > --
> > > > Saliya Ekanayake esaliya_at_[hidden]
> > > > http://saliya.org
> > > > _______________________________________________
> > > > users mailing list
> > > > users_at_[hidden]
> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >
> > >
> > > --
> > > Jeff Squyres
> > > jsquyres_at_[hidden]
> > > For corporate legal information go to:
> > > http://www.cisco.com/web/about/doing_business/legal/cri/
> > >
> > >
> > >
> > > --
> > > Saliya Ekanayake esaliya_at_[hidden]
> > > Cell 812-391-4914 Home 812-961-6383
> > > http://saliya.org
> > > <hellobug.tar.gz>
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> >
> >
> >
> > --
> > Saliya Ekanayake esaliya_at_[hidden]
> > Cell 812-391-4914 Home 812-961-6383
> > http://saliya.org
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
>

-- 
Saliya Ekanayake esaliya_at_[hidden]
Cell 812-391-4914 Home 812-961-6383
http://saliya.org