Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] segfault when combining OpenMPI and GotoBLAS2
From: Gus Correa (gus_at_[hidden])
Date: 2010-01-19 10:35:02


Hi Dorian and Eloi

I wonder if this is really a Goto BLAS problem or related to
how OpenMPI was configured.

In a recent thread on this list, a colleague reported several errors
that went away after he removed the (non-default) "--enable-mpi-threads"
flag from his OpenMPI configuration, rebuilt OpenMPI,
and recompiled his code.

See this thread:
http://www.open-mpi.org/community/lists/users/2009/12/11640.php
http://www.open-mpi.org/community/lists/users/2010/01/11695.php

He was also using BLAS (most likely Goto's) in the HPL benchmark.

Did you configure OpenMPI with "--enable-mpi-threads"?
Have you tried without it?
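
If you are not sure how your OpenMPI was built, the output of
"ompi_info" should tell you. Below is a minimal sketch in Python that
parses the "Thread support" line of that output. Note the assumptions:
the sample line in SAMPLE is what I recall the 1.3/1.4 series printing,
not output copied from your machine, and the function name
parse_thread_support is just made up for illustration. Check it against
what your own ompi_info prints before relying on it.

```python
import re

# Assumed sample of one line from `ompi_info` output (not taken from a real run).
SAMPLE = "          Thread support: posix (mpi: no, progress: no)\n"


def parse_thread_support(ompi_info_text):
    """Return the thread model and yes/no flags from the 'Thread support'
    line of ompi_info output, or None if no such line is present."""
    for line in ompi_info_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() != "thread support":
            continue
        # Text before the parenthesis is the thread model, e.g. "posix".
        model = value.split("(")[0].strip()
        # Text inside the parentheses is a comma-separated list of "name: yes/no".
        match = re.search(r"\((.*)\)", value)
        inner = match.group(1) if match else ""
        flags = {}
        for item in inner.split(","):
            name, item_sep, setting = item.partition(":")
            if item_sep:
                flags[name.strip()] = setting.strip().lower() == "yes"
        return {"model": model, "flags": flags}
    return None


info = parse_thread_support(SAMPLE)
print(info)
```

To use it on a real installation, capture the text of "ompi_info"
(for example by redirecting it to a file) and pass that text to the
function. If the "mpi" flag comes back true, the build most likely had
MPI thread support enabled at configure time.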

I hope this helps.
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Eloi Gaudry wrote:
> Dorian Krause wrote:
>> Hi Eloi,
>>>
>>> Do the segmentation faults you're facing also happen in a
>>> sequential environment (i.e. not linked against the openmpi libraries)?
>>
>> No, without MPI everything works fine. Also, linking against mvapich
>> doesn't give any errors. I think there is a problem with GotoBLAS and
>> the shared library infrastructure of OpenMPI. The code never even
>> reaches the point of executing the gemm operation.
>>
>>> Have you already informed Kazushige Goto (developer of Gotoblas) ?
>>
>> Not yet. Since the problem only happens with openmpi and the BLAS
>> (stand-alone) seems to work, I thought the openmpi mailing list would
>> be the better place to discuss this (to get a grasp of what the error
>> could be before going to the GotoBLAS mailing list).
>>
>>>
>>> Regards,
>>> Eloi
>>>
>>> PS: Could you post your Makefile.rule here so that we could check the
>>> different compilation options chosen ?
>>
>> I didn't make any changes to Makefile.rule. This is the content
>> of Makefile.conf:
>>
>> OSNAME=Linux
>> ARCH=x86_64
>> C_COMPILER=GCC
>> BINARY32=
>> BINARY64=1
>> CEXTRALIB=-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -lc
>> F_COMPILER=GFORTRAN
>> FC=gfortran
>> BU=_
>> FEXTRALIB=-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -lgfortran -lm -lgfortran -lm -lc
>> CORE=BARCELONA
>> LIBCORE=barcelona
>> NUM_CORES=8
>> HAVE_MMX=1
>> HAVE_SSE=1
>> HAVE_SSE2=1
>> HAVE_SSE3=1
>> HAVE_SSE4A=1
>> HAVE_3DNOWEX=1
>> HAVE_3DNOW=1
>> MAKE += -j 8
>> SGEMM_UNROLL_M=8
>> SGEMM_UNROLL_N=4
>> DGEMM_UNROLL_M=4
>> DGEMM_UNROLL_N=4
>> QGEMM_UNROLL_M=2
>> QGEMM_UNROLL_N=2
>> CGEMM_UNROLL_M=4
>> CGEMM_UNROLL_N=2
>> ZGEMM_UNROLL_M=2
>> ZGEMM_UNROLL_N=2
>> XGEMM_UNROLL_M=1
>> XGEMM_UNROLL_N=1
>>
>>
>> Thanks,
>> Dorian
>>
> Dorian,
>
> I've been experiencing similar issues on two different Opteron
> architectures (22xx and 25x), in a sequential environment, when using
> v2-1.10 of GotoBLAS. If you can downgrade to version 2-1.09, I bet you
> will not see them anymore. Anyway, I'm pretty sure Kazushige is
> working on a fix right now.
>
> Eloi
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users