Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] segfault when combining OpenMPI and GotoBLAS2
From: Eloi Gaudry (eg_at_[hidden])
Date: 2010-01-19 11:36:30


Yaakoub El Khamra wrote:
> Greetings
> Can we please verify whether this problem is with GotoBLAS and not with
> OpenMPI? If I read this correctly, you get normal execution both without
> MPI and with other flavors of MPI. That would normally indicate the
> problem is on the OpenMPI side.
>
> I am 2 doors away from Kazushige's office. Please do let me know so
> that I can talk to him about this.
>
> Regards
> Yaakoub El Khamra
>
>
>
>
> On Tue, Jan 19, 2010 at 9:35 AM, Gus Correa <gus_at_[hidden]> wrote:
>
>> Hi Dorian and Eloi
>>
>> I wonder if this is really a Goto BLAS problem or related to
>> how OpenMPI was configured.
>>
>> In a recent sequence of postings on this list
>> a colleague reported several errors which were fixed
>> after he removed the (non-default) "--enable-mpi-threads"
>> flag from his OpenMPI configuration (rebuilding OpenMPI
>> and recompiling his application).
>>
>> See this thread:
>> http://www.open-mpi.org/community/lists/users/2009/12/11640.php
>> http://www.open-mpi.org/community/lists/users/2010/01/11695.php
>>
>> He was also using BLAS (most likely Goto's) in the HPL benchmark.
>>
>> Did you configure OpenMPI with "--enable-mpi-threads"?
>> Have you tried without it?
>>
>> I hope this helps.
>> Gus Correa
>> ---------------------------------------------------------------------
>> Gustavo Correa
>> Lamont-Doherty Earth Observatory - Columbia University
>> Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>>
>>
>> Eloi Gaudry wrote:
>>
>>> Dorian Krause wrote:
>>>
>>>> Hi Eloi,
>>>>
>>>>> Do the segmentation faults you're facing also happen in a
>>>>> sequential environment (i.e. not linked against openmpi libraries)?
>>>>>
>>>> No, without MPI everything works fine. Also, linking against mvapich
>>>> doesn't give any errors. I think there is a problem with GotoBLAS and
>>>> the shared library infrastructure of OpenMPI. The code never even
>>>> reaches the point of executing the gemm operation.
>>>>
>>>>
>>>>> Have you already informed Kazushige Goto (the developer of GotoBLAS)?
>>>>>
>>>> Not yet. Since the problem only happens with openmpi and the BLAS
>>>> (stand-alone) seems to work, I thought the openmpi mailing list would
>>>> be the better place to discuss this (to get a grasp of what the error
>>>> could be before going to the GotoBLAS mailing list).
>>>>
>>>>
>>>>> Regards,
>>>>> Eloi
>>>>>
>>>>> PS: Could you post your Makefile.rule here so that we could check the
>>>>> different compilation options chosen?
>>>>>
>>>> I didn't make any changes to the Makefile.rule. This is the content
>>>> of Makefile.conf:
>>>>
>>>> OSNAME=Linux
>>>> ARCH=x86_64
>>>> C_COMPILER=GCC
>>>> BINARY32=
>>>> BINARY64=1
>>>> CEXTRALIB=-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2
>>>> -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2
>>>> -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64
>>>> -L/lib/../lib64 -L/usr/lib/../lib64 -lc
>>>> F_COMPILER=GFORTRAN
>>>> FC=gfortran
>>>> BU=_
>>>> FEXTRALIB=-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2
>>>> -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2
>>>> -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64
>>>> -L/lib/../lib64 -L/usr/lib/../lib64 -lgfortran -lm -lgfortran -lm -lc
>>>> CORE=BARCELONA
>>>> LIBCORE=barcelona
>>>> NUM_CORES=8
>>>> HAVE_MMX=1
>>>> HAVE_SSE=1
>>>> HAVE_SSE2=1
>>>> HAVE_SSE3=1
>>>> HAVE_SSE4A=1
>>>> HAVE_3DNOWEX=1
>>>> HAVE_3DNOW=1
>>>> MAKE += -j 8
>>>> SGEMM_UNROLL_M=8
>>>> SGEMM_UNROLL_N=4
>>>> DGEMM_UNROLL_M=4
>>>> DGEMM_UNROLL_N=4
>>>> QGEMM_UNROLL_M=2
>>>> QGEMM_UNROLL_N=2
>>>> CGEMM_UNROLL_M=4
>>>> CGEMM_UNROLL_N=2
>>>> ZGEMM_UNROLL_M=2
>>>> ZGEMM_UNROLL_N=2
>>>> XGEMM_UNROLL_M=1
>>>> XGEMM_UNROLL_N=1
>>>>
>>>>
>>>> Thanks,
>>>> Dorian
>>>>
>>>>
>>> Dorian,
>>>
>>> I've been experiencing a similar issue on two different Opteron
>>> architectures (22xx and 25x), in a sequential environment, when using
>>> v2-1.10 of GotoBLAS. If you can downgrade to version 2-1.09, I bet you
>>> will not experience such issues. Anyway, I'm pretty sure Kazushige is
>>> working on fixing this right now.
>>>
>>> Eloi
>>>

Hi Gus and Yaakoub,

I've been able to reproduce a similar issue on Opteron servers, using
either a sequential or a parallel binary linked against v2-1.10. With
v2-1.09, these segfaults disappear. I've just told Kazushige so.
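
For reference, the kind of test I used boils down to something like the
following (only a sketch, not the actual code; the usual dgemm_ Fortran
symbol and the GotoBLAS2 library name are assumed, and paths will differ
on your system):

/*
 * build, e.g. (adjust paths/library names to your installation):
 *   mpicc repro_dgemm.c -o repro_dgemm \
 *         -L/path/to/GotoBLAS2 -lgoto2 -lgfortran -lpthread
 */
#include <mpi.h>
#include <stdio.h>

/* GotoBLAS2 exposes the Fortran BLAS interface, hence the dgemm_ symbol */
extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc);

int main(int argc, char **argv)
{
    int i, n = 4;
    double a[16], b[16], c[16];
    double alpha = 1.0, beta = 0.0;

    MPI_Init(&argc, &argv);

    for (i = 0; i < 16; i++) { a[i] = 1.0; b[i] = 1.0; c[i] = 0.0; }

    /* in the failing setups the segfault shows up before this call is
     * even reached (i.e. at load/initialization time) */
    dgemm_("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);

    printf("c[0] = %f (expected %d)\n", c[0], n);

    MPI_Finalize();
    return 0;
}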

I don't think that the segmentation faults experienced by Dorian are due
to OpenMPI (I'm myself using a non-mpi-thread-aware build).
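
In case it helps to double-check that point, a small program along these
lines reports the thread level a given Open MPI build actually provides
(again just a sketch; a build configured without --enable-mpi-threads
should report a level below MPI_THREAD_MULTIPLE):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* ask for full thread support and see what the library grants */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    printf("requested MPI_THREAD_MULTIPLE (%d), provided %d\n",
           MPI_THREAD_MULTIPLE, provided);

    MPI_Finalize();
    return 0;
}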

Regards,
Eloi