Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] segfault when combining OpenMPI and GotoBLAS2
From: Yaakoub El Khamra (yye00_at_[hidden])
Date: 2010-01-19 11:27:36


Greetings
Can we please verify this problem is with GotoBLAS and not with
OpenMPI? If I read this correctly, without MPI and with other flavors
of MPI, you have normal execution. This would normally indicate the
problem is on the OpenMPI side.

I am 2 doors away from Kazushige's office. Please do let me know so
that I can talk to him about this.

Regards
Yaakoub El Khamra

On Tue, Jan 19, 2010 at 9:35 AM, Gus Correa <gus_at_[hidden]> wrote:
> Hi Dorian and Eloi
>
> I wonder if this is really a Goto BLAS problem or related to
> how OpenMPI was configured.
>
> In a recent sequence of postings on this list
> a colleague reported several errors which were fixed
> after he removed the (non-default) "--enable-mpi-threads"
> flag from his OpenMPI configuration (and built OpenMPI again,
> and recompiled).
>
> See this thread:
> http://www.open-mpi.org/community/lists/users/2009/12/11640.php
> http://www.open-mpi.org/community/lists/users/2010/01/11695.php
>
> He was also using BLAS (most likely Goto's) in the HPL benchmark.
>
> Did you configure OpenMPI with "--enable-mpi-threads"?
> Have you tried without it?
>
> I hope this helps.
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
>
> Eloi Gaudry wrote:
>> Dorian Krause wrote:
>>> Hi Eloi,
>>>>
>>>> Do the segmentation faults you're facing also happen in a
>>>> sequential environment (i.e. not linked against openmpi libraries)?
>>>
>>> No, without MPI everything works fine. Also, linking against mvapich
>>> doesn't give any errors. I think there is a problem with GotoBLAS and
>>> the shared library infrastructure of OpenMPI. The code never even
>>> reaches the point of executing the gemm operation.
>>>
>>>> Have you already informed Kazushige Goto (developer of GotoBLAS)?
>>>
>>> Not yet. Since the problem only happens with openmpi and the BLAS
>>> (stand-alone) seems to work, I thought the openmpi mailing list would
>>> be the better place to discuss this (to get a grasp of what the error
>>> could be before going to the GotoBLAS mailing list).
>>>
>>>>
>>>> Regards,
>>>> Eloi
>>>>
>>>> PS: Could you post your Makefile.rule here so that we could check the
>>>> different compilation options chosen ?
>>>
>>> I didn't make any changes to the Makefile.rules. This is the content
>>> of Makefile.conf:
>>>
>>> OSNAME=Linux
>>> ARCH=x86_64
>>> C_COMPILER=GCC
>>> BINARY32=
>>> BINARY64=1
>>> CEXTRALIB=-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64  -lc
>>> F_COMPILER=GFORTRAN
>>> FC=gfortran
>>> BU=_
>>> FEXTRALIB=-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64  -lgfortran -lm -lgfortran -lm -lc
>>> CORE=BARCELONA
>>> LIBCORE=barcelona
>>> NUM_CORES=8
>>> HAVE_MMX=1
>>> HAVE_SSE=1
>>> HAVE_SSE2=1
>>> HAVE_SSE3=1
>>> HAVE_SSE4A=1
>>> HAVE_3DNOWEX=1
>>> HAVE_3DNOW=1
>>> MAKE += -j 8
>>> SGEMM_UNROLL_M=8
>>> SGEMM_UNROLL_N=4
>>> DGEMM_UNROLL_M=4
>>> DGEMM_UNROLL_N=4
>>> QGEMM_UNROLL_M=2
>>> QGEMM_UNROLL_N=2
>>> CGEMM_UNROLL_M=4
>>> CGEMM_UNROLL_N=2
>>> ZGEMM_UNROLL_M=2
>>> ZGEMM_UNROLL_N=2
>>> XGEMM_UNROLL_M=1
>>> XGEMM_UNROLL_N=1
>>>
>>>
>>> Thanks,
>>> Dorian
>>>
>> Dorian,
>>
>> I've been experiencing a similar issue on two different Opteron
>> architectures (22xx and 25x), in a sequential environment, when using
>> v2-1.10 of GotoBLAS. If you can downgrade to version 2-1.09, I bet you
>> will not experience such issues. Anyway, I'm pretty sure Kazushige is
>> working on fixing this right now.
>>
>> Eloi
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
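
Regarding Gus's question about "--enable-mpi-threads": one way to check an
existing installation without digging up the original configure line is to
look at the "Thread support" line printed by ompi_info. The sketch below is
only an illustration, not part of the original thread. It assumes the line
looks like "Thread support: posix (mpi: no, progress: no)", which is the
format I would expect from the 1.3/1.4 series; other releases may print it
differently, in which case the parser simply returns None. The function
names parse_thread_support and query_local_install are hypothetical helpers
made up for this example.

```python
import re
import shutil
import subprocess

# ASSUMPTION: the relevant ompi_info line looks like
#   Thread support: posix (mpi: no, progress: no)
# Other Open MPI releases may use a different layout.
THREAD_LINE = re.compile(
    r"Thread support:\s*(?P<model>\S+)\s*"
    r"\(mpi:\s*(?P<mpi>yes|no)\s*,\s*progress:\s*(?P<progress>yes|no)\)",
    re.IGNORECASE,
)


def parse_thread_support(ompi_info_text):
    """Return (model, mpi_threads, progress_threads), or None if no line matches."""
    match = THREAD_LINE.search(ompi_info_text)
    if match is None:
        return None
    return (
        match.group("model"),
        match.group("mpi").lower() == "yes",
        match.group("progress").lower() == "yes",
    )


def query_local_install():
    """Run ompi_info if it is on PATH and parse its output; None otherwise."""
    if shutil.which("ompi_info") is None:
        return None
    result = subprocess.run(
        ["ompi_info"], capture_output=True, text=True, check=False
    )
    return parse_thread_support(result.stdout)


if __name__ == "__main__":
    # Hand-written sample line (not captured from a real run).
    sample = "      Thread support: posix (mpi: yes, progress: no)\n"
    print(parse_thread_support(sample))  # ('posix', True, False)
    print(query_local_install())
```

If the second element of the tuple is True, the library was most likely
built with MPI thread support enabled, and rebuilding without that flag (as
Gus suggests) would be a reasonable experiment before blaming GotoBLAS.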