We haven't figured it out yet - the behavior seems somewhat erratic, as your observations don't match anything we are seeing on our machines. We know the coll/ml component is causing trouble for Java applications (but nothing else, oddly enough), but that doesn't match your experience.


On Mar 12, 2014, at 9:01 PM, Saliya Ekanayake <esaliya@gmail.com> wrote:

Just checking if there's some solution for this.

Thank you,
Saliya


On Tue, Mar 11, 2014 at 10:54 PM, Saliya Ekanayake <esaliya@gmail.com> wrote:
I forgot to mention that I tried the hello.c version instead of Java, and it too failed in a similar manner (rough command lines are sketched after the list), but:

1. On a single node with --mca btl ^tcp, it went up to 24 procs before failing.
2. On 8 nodes with --mca btl ^tcp, it could only go up to 16 procs.
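
For reference, the runs were shaped roughly like this (the hostfile and binary names are placeholders for my setup):

    mpirun -np <N> --mca btl ^tcp ./hello                     (single node, N up to 24)
    mpirun -np <N> --hostfile <hosts> --mca btl ^tcp ./hello  (8 nodes, N up to 16)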


On Tue, Mar 11, 2014 at 5:06 PM, Saliya Ekanayake <esaliya@gmail.com> wrote:
I just tested with "ml" turned off as you suggested, but unfortunately it didn't solve the issue. 

However, I found that by explicitly setting --mca btl ^tcp, the code worked on up to 4 nodes, each running 8 procs. If I don't specify this, it simply fails even on one node with 8 procs.
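
Concretely, a run along these lines succeeds (the hostfile and class name are placeholders; 32 = 4 nodes x 8 procs):

    mpirun -np 32 --hostfile <hosts> --mca btl ^tcp java Hello

Dropping the --mca btl ^tcp part makes even a single-node run with -np 8 fail.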

Thank you,
Saliya


On Tue, Mar 11, 2014 at 4:35 PM, Jeff Squyres (jsquyres) <jsquyres@cisco.com> wrote:
Looks like we still have a bug in one of our components -- can you try:

    mpirun --mca coll ^ml ...

This will deactivate the "ml" collective component.  See if that enables you to run (this particular component has nothing to do with Java).
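
If you want to see which coll components are present in your build, something like this should list them:

    ompi_info | grep coll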


On Mar 11, 2014, at 1:33 AM, Saliya Ekanayake <esaliya@gmail.com> wrote:

> Just tested that this happens even with the simple Hello.java program given in the OMPI distribution.
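>
> For context, that Hello.java is essentially the textbook hello world for the Java bindings; from memory it looks roughly like this (not a verbatim copy of the file in the distribution):
>
>     import mpi.*;
>
>     class Hello {
>         public static void main(String[] args) throws MPIException {
>             MPI.Init(args);                          // initialize MPI, passing command-line args
>             int rank = MPI.COMM_WORLD.getRank();     // this process's rank
>             int size = MPI.COMM_WORLD.getSize();     // total number of processes
>             System.out.println("Hello from rank " + rank + " of " + size);
>             MPI.Finalize();
>         }
>     }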
>
> I've made a tarball containing details of the error, put together according to http://www.open-mpi.org/community/help/. Please let me know if I have missed any necessary info.
>
> Thank you,
> Saliya
>
>
>
>
> On Mon, Mar 10, 2014 at 10:46 AM, Jeff Squyres (jsquyres) <jsquyres@cisco.com> wrote:
> Greetings, and thanks for trying out our Java bindings.
>
> Can you provide some more details?  E.g., is there a particular program you're running that triggers these problems?  Or is there even a particular MPI function that you're using that results in this segv (e.g., perhaps we have a specific bug somewhere)?
>
> Can you reduce the segv to a small example that we can reproduce (and therefore fix)?
>
>
> On Mar 10, 2014, at 12:05 AM, Saliya Ekanayake <esaliya@gmail.com> wrote:
>
> > Hi,
> >
> > I have 8 nodes, each with 2 quad-core sockets, and the nodes have IB connectivity. I am trying to run the OMPI Java bindings from OMPI trunk revision 30301 with 8 procs per node, for a total of 64 procs. This gives a SIGSEGV error as shown below.
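> >
> > The job is launched in the usual way for the Java bindings, roughly along these lines (the hostfile and main class are placeholders for my actual setup):
> >
> >     mpirun -np 64 --hostfile <hosts> java <MainClass>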
> >
> > I wonder if you have any suggestion to resolve this?
> >
> > Thank you,
> > Saliya
> >
> > # A fatal error has been detected by the Java Runtime Environment:
> > #
> > #  SIGSEGV (0xb) at pc=0x000000313867b75b, pid=12229, tid=47864973515072
> > #
> > # JRE version: Java(TM) SE Runtime Environment (8.0-b118) (build 1.8.0-ea-b118)
> > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b60 mixed mode linux-amd64 compressed oops)
> > # Problematic frame:
> > # C  [libc.so.6+0x7b75b]  memcpy+0x15b
> >
> >
> > --
> > Saliya Ekanayake esaliya@gmail.com
> > http://saliya.org
>
>
> --
> Jeff Squyres
> jsquyres@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> --
> Saliya Ekanayake esaliya@gmail.com
> Cell 812-391-4914 Home 812-961-6383
> http://saliya.org
> <hellobug.tar.gz>


--
Jeff Squyres
jsquyres@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/




--
Saliya Ekanayake esaliya@gmail.com 
Cell 812-391-4914 Home 812-961-6383
http://saliya.org
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users