Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] SIGSEGV when running OMPI Java binding
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-03-14 16:44:20


We just fixed the segv (see https://svn.open-mpi.org/trac/ompi/changeset/31073, if you care).

The issue was an errant large array on the stack in debug builds, which would cause JVMs to run out of stack space.

The fix is on the SVN trunk now; it will be on the v1.7 branch shortly.
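
As a stopgap on builds that predate the fix, the two component exclusions tried later in this thread can be passed on the mpirun command line. In the reports quoted below, excluding the tcp BTL let the job run on up to 4 nodes, while excluding the "ml" collective component did not help, so neither is a guaranteed workaround. The sketch below only prints the command lines rather than running them; the helper name exclude_cmd is hypothetical, and the process count and class name are placeholders.

```shell
#!/bin/sh
# Dry-run sketch: print an mpirun command line instead of executing it.
# "exclude_cmd" is a hypothetical helper name, not part of Open MPI.
# Usage: exclude_cmd FRAMEWORK COMPONENT NPROCS CLASS
exclude_cmd() {
    framework=$1   # MCA framework, e.g. btl or coll
    component=$2   # component to exclude, e.g. tcp or ml
    nprocs=$3      # number of MPI processes (placeholder value)
    class=$4       # Java main class, assumed to be compiled already
    # A leading ^ tells Open MPI to use every component of the framework
    # except the ones listed after it.
    printf "mpirun --mca %s '^%s' -np %s java %s\n" \
        "$framework" "$component" "$nprocs" "$class"
}

exclude_cmd btl tcp 8 Hello    # exclusion that helped on up to 4 nodes
exclude_cmd coll ml 8 Hello    # exclusion that did not resolve the segv
```

The ^ is printed inside single quotes so that the resulting command can be pasted into shells that would otherwise treat the character specially.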

On Mar 11, 2014, at 5:06 PM, Saliya Ekanayake <esaliya_at_[hidden]> wrote:

> I just tested with "ml" turned off as you suggested, but unfortunately it didn't solve the issue.
>
> However, I found that by explicitly setting --mca btl ^tcp the code worked on up to 4 nodes, each running 8 procs. If I don't specify this, it simply fails even on one node with 8 procs.
>
> Thank you,
> Saliya
>
>
> On Tue, Mar 11, 2014 at 4:35 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> Looks like we still have a bug in one of our components -- can you try:
>
> mpirun --mca coll ^ml ...
>
> This will deactivate the "ml" collective component. See if that enables you to run (this particular component has nothing to do with Java).
>
>
> On Mar 11, 2014, at 1:33 AM, Saliya Ekanayake <esaliya_at_[hidden]> wrote:
>
> > I just confirmed that this happens even with the simple Hello.java program included in the OMPI distribution.
> >
> > I've made a tarball containing details of the error adhering to http://www.open-mpi.org/community/help/. Please let me know if I have missed any info necessary.
> >
> > Thank you,
> > Saliya
> >
> >
> >
> >
> > On Mon, Mar 10, 2014 at 10:46 AM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> > Greetings, and thanks for trying out our Java bindings.
> >
> > Can you provide some more details? E.g., is there a particular program you're running that incurs these problems? Or is there even a particular MPI function that you're using that results in this segv (e.g., perhaps we have a specific bug somewhere)?
> >
> > Can you reduce the segv to a small example that we can reproduce (and therefore fix)?
> >
> >
> > On Mar 10, 2014, at 12:05 AM, Saliya Ekanayake <esaliya_at_[hidden]> wrote:
> >
> > > Hi,
> > >
> > > I have 8 nodes, each with 2 quad-core sockets. The nodes also have IB connectivity. I am trying to run the OMPI Java bindings from OMPI trunk revision 30301 with 8 procs per node, totaling 64 procs. This gives the SIGSEGV error shown below.
> > >
> > > Do you have any suggestions for resolving this?
> > >
> > > Thank you,
> > > Saliya
> > >
> > > # A fatal error has been detected by the Java Runtime Environment:
> > > #
> > > # SIGSEGV (0xb) at pc=0x000000313867b75b, pid=12229, tid=47864973515072
> > > #
> > > # JRE version: Java(TM) SE Runtime Environment (8.0-b118) (build 1.8.0-ea-b118)
> > > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b60 mixed mode linux-amd64 compressed oops)
> > > # Problematic frame:
> > > # C [libc.so.6+0x7b75b] memcpy+0x15b
> > >
> > >
> > > --
> > > Saliya Ekanayake esaliya_at_[hidden]
> > > http://saliya.org
> > > _______________________________________________
> > > users mailing list
> > > users_at_[hidden]
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
> > --
> > Saliya Ekanayake esaliya_at_[hidden]
> > Cell 812-391-4914 Home 812-961-6383
> > http://saliya.org
> > <hellobug.tar.gz>
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> --
> Saliya Ekanayake esaliya_at_[hidden]
> Cell 812-391-4914 Home 812-961-6383
> http://saliya.org

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/