Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segmentation fault whilst running RaXML-MPI
From: Nick Holway (nick.holway_at_[hidden])
Date: 2009-11-18 05:22:29


Dear all,

A quick follow-up for the benefit of anyone who finds this thread via Google.

Upgrading the Intel compilers made no difference to the error message.

I contacted the researcher who wrote RAxML. He told me that the problem
was likely to be the Intel compilers over-optimising the code and
suggested using GCC instead, which worked. He also pointed me in the
direction of newer versions of RAxML, which are available at
http://wwwkramer.in.tum.de/exelixis/software.html
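
For anyone landing here with the same crash, a minimal sketch of the GCC
route follows. It is an illustration, not the exact commands used in this
thread. It assumes Open MPI's mpicc wrapper is on the PATH and relies on
the OMPI_CC environment variable, which the wrapper reads to override the
C compiler it was configured with. The makefile name is a placeholder
(use whichever MPI makefile ships with your RAxML release), and the
sketch assumes that makefile honours a CC override on the make command
line.

```shell
# Sketch only: make Open MPI's mpicc wrapper call gcc instead of icc.
# OMPI_CC overrides the C compiler the wrapper was configured with.
export OMPI_CC=gcc

# If mpicc is available, print the underlying compiler it will now invoke.
if command -v mpicc >/dev/null 2>&1; then
    mpicc --showme:command
fi

# Then rebuild RAxML from a clean tree so no icc-built objects remain.
# <RAxML-MPI-makefile> is a placeholder, not a real file name:
#   make -f <RAxML-MPI-makefile> clean
#   make -f <RAxML-MPI-makefile> CC=mpicc
```

This only swaps the compiler used for RAxML; the Open MPI installation
itself stays as it was built.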

Nick

2009/11/6 Nick Holway <nick.holway_at_[hidden]>:
> Hi,
>
> Thank you for the information, I'm going to try the new Intel
> Compilers which I'm downloading now, but as they're taking so long to
> download I don't think I'm going to be able to look into this again
> until after the weekend. BTW using their java-based downloader is a
> bit less painful than their normal download.
>
> In the meantime, if anyone else has some suggestions then please let me know.
>
> Thanks
>
> Nick
>
> 2009/11/5 Jeff Squyres <jsquyres_at_[hidden]>:
>> FWIW, I think Intel released 11.1.059 earlier today (I've been trying to
>> download it all morning).  I doubt it's an issue in this case, but I thought
>> I'd mention it as a public service announcement.  ;-)
>>
>> Seg faults are *usually* an application issue (never say "never", but they
>> *usually* are).  You might want to first contact the RaXML team to see if
>> there are any known issues with their software and Open MPI 1.3.3...?
>>  (Sorry, I'm totally unfamiliar with RaXML)
>>
>> On Nov 5, 2009, at 12:30 PM, Nick Holway wrote:
>>
>>> Dear all,
>>>
>>> I'm trying to run RAxML 7.0.4 on my 64-bit Rocks 5.1 cluster (i.e. CentOS
>>> 5.2). I compiled Open MPI 1.3.3 with the Intel compilers v11.1.056,
>>> using ./configure CC=icc CXX=icpc F77=ifort FC=ifort --with-sge
>>> --prefix=/usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man
>>> --with-memory-manager=none.
>>>
>>> When I run RAxML in a qlogin session using
>>> /usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man/bin/mpirun -np 8
>>> /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI
>>> -f a -x 12345 -p12345 -# 10 -m GTRGAMMA -s
>>> /users/holwani1/jay/ornodko-1582 -n mpitest39
>>>
>>> I get the following output:
>>>
>>> This is the RAxML MPI Worker Process Number: 1
>>> This is the RAxML MPI Worker Process Number: 3
>>>
>>> This is the RAxML MPI Master process
>>>
>>> This is the RAxML MPI Worker Process Number: 7
>>>
>>> This is the RAxML MPI Worker Process Number: 4
>>>
>>> This is the RAxML MPI Worker Process Number: 5
>>>
>>> This is the RAxML MPI Worker Process Number: 2
>>>
>>> This is the RAxML MPI Worker Process Number: 6
>>> IMPORTANT WARNING: Alignment column 1695 contains only undetermined
>>> values which will be treated as missing data
>>>
>>>
>>> IMPORTANT WARNING: Sequences A4_H10 and A3ii_E11 are exactly identical
>>>
>>>
>>> IMPORTANT WARNING: Sequences A2_A08 and A9_C10 are exactly identical
>>>
>>>
>>> IMPORTANT WARNING: Sequences A3ii_B03 and A3ii_C06 are exactly identical
>>>
>>>
>>> IMPORTANT WARNING: Sequences A9_D08 and A9_F10 are exactly identical
>>>
>>>
>>> IMPORTANT WARNING: Sequences A3ii_F07 and A9_C08 are exactly identical
>>>
>>>
>>> IMPORTANT WARNING: Sequences A6_F05 and A6_F11 are exactly identical
>>>
>>> IMPORTANT WARNING
>>> Found 6 sequences that are exactly identical to other sequences in the
>>> alignment.
>>> Normally they should be excluded from the analysis.
>>>
>>>
>>> IMPORTANT WARNING
>>> Found 1 column that contains only undetermined values which will be
>>> treated as missing data.
>>> Normally these columns should be excluded from the analysis.
>>>
>>> An alignment file with undetermined columns and sequence duplicates
>>> removed has already
>>> been printed to file /users/holwani1/jay/ornodko-1582.reduced
>>>
>>>
>>> You are using RAxML version 7.0.4 released by Alexandros Stamatakis in
>>> April 2008
>>>
>>> Alignment has 1280 distinct alignment patterns
>>>
>>> Proportion of gaps and completely undetermined characters in this
>>> alignment: 0.124198
>>>
>>> RAxML rapid bootstrapping and subsequent ML search
>>>
>>>
>>> Executing 10 rapid bootstrap inferences and thereafter a thorough ML
>>> search
>>>
>>> All free model parameters will be estimated by RAxML
>>> GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter
>>> GAMMA Model parameters will be estimated up to an accuracy of
>>> 0.1000000000 Log Likelihood units
>>>
>>> Partition: 0
>>> Name: No Name Provided
>>> DataType: DNA
>>> Substitution Matrix: GTR
>>> Empirical Base Frequencies:
>>> pi(A): 0.261129 pi(C): 0.228570 pi(G): 0.315946 pi(T): 0.194354
>>>
>>>
>>> Switching from GAMMA to CAT for rapid Bootstrap, final ML search will
>>> be conducted under the GAMMA model you specified
>>> Bootstrap[10]: Time 44.442728 bootstrap likelihood -inf, best
>>> rearrangement setting 5
>>> Bootstrap[0]: Time 44.814948 bootstrap likelihood -inf, best
>>> rearrangement setting 5
>>> Bootstrap[6]: Time 46.470371 bootstrap likelihood -inf, best
>>> rearrangement setting 6
>>> [compute-0-11:08698] *** Process received signal ***
>>> [compute-0-11:08698] Signal: Segmentation fault (11)
>>> [compute-0-11:08698] Signal code: Address not mapped (1)
>>> [compute-0-11:08698] Failing at address: 0x408
>>> [compute-0-11:08698] [ 0] /lib64/libpthread.so.0 [0x3fb580de80]
>>> [compute-0-11:08698] [ 1] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(hookup+0) [0x413ca0]
>>> [compute-0-11:08698] [ 2] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(restoreTL+0xd9) [0x442c09]
>>> [compute-0-11:08698] [ 3] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI [0x42c968]
>>> [compute-0-11:08698] [ 4] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(doAllInOne+0x91a) [0x42b21a]
>>> [compute-0-11:08698] [ 5] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(main+0xc25) [0x4063f5]
>>> [compute-0-11:08698] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3fb501d8b4]
>>> [compute-0-11:08698] [ 7] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI [0x405719]
>>> [compute-0-11:08698] *** End of error message ***
>>> Bootstrap[1]: Time 8.400332 bootstrap likelihood -inf, best
>>> rearrangement setting 5
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 1 with PID 8698 on node
>>> compute-0-11.local exited on signal 11 (Segmentation fault).
>>> --------------------------------------------------------------------------
>>>
>>>
>>>
>>> My $PATH is
>>> /usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man/bin/:/usr/prog/mpi/openmpi/1.3.3/x86_64/bin/:/usr/prog/intel/ifort/11.1.056/bin/intel64:/usr/prog/intel/icc/11.1.056//bin/intel64:/usr/prog/intel/ifort/11.1.056/bin/intel64:/usr/prog/intel/icc/11.1.056//bin/intel64:/opt/gridengine/bin/lx26-amd64:/usr/kerberos/sbin:/usr/kerberos/bin:/opt/gridengine/bin/lx26-amd64:/usr/java/latest/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/ganglia/bin:/opt/ganglia/sbin:/opt/rocks/bin:/opt/rocks/sbin:/root/bin
>>>
>>> My $LD_LIBRARY_PATH is
>>>
>>> /usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man/lib/:/usr/prog/mpi/openmpi/1.3.3/x86_64/lib/:/usr/prog/intel/ifort/11.1.056/lib/intel64:/usr/prog/intel/ifort/11.1.056/mkl/lib/em64t:/usr/prog/intel/icc/11.1.056//lib/intel64:/usr/prog/intel/icc/11.1.056//ipp/em64t/sharedlib:/usr/prog/intel/icc/11.1.056//mkl/lib/em64t:/usr/prog/intel/icc/11.1.056//tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/usr/prog/intel/ifort/11.1.056/lib/intel64:/usr/prog/intel/ifort/11.1.056/mkl/lib/em64t:/usr/prog/intel/icc/11.1.056//lib/intel64:/usr/prog/intel/icc/11.1.056//ipp/em64t/sharedlib:/usr/prog/intel/icc/11.1.056//mkl/lib/em64t:/usr/prog/intel/icc/11.1.056//tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/gridengine/lib/lx26-amd64:/opt/gridengine/lib/lx26-amd64
>>>
>>> Although I'm only running this on one node, it may be helpful to know
>>> that the nodes have Infiniband with Voltaire OFED v1.4. The MPIs from
>>> Rocks' HPC roll are not installed. I've tried running the above on
>>> multiple nodes but still see the same error. I've attached the
>>> config.log and ompi_info output to this email.
>>>
>>> I believe that the input is OK, as I can run the serial gcc-compiled
>>> RAxML on the data with no problems. I tried compiling Open MPI with
>>> --with-memory-manager=none as a quick Google search
>>> (http://osdir.com/ml/clustering.open-mpi.user/2008-07/msg00201.html)
>>> suggested that it could help, but it made no difference. Google also
>>> suggested that the error could be caused by the compile environment
>>> being different to the runtime environment. To test this I compiled
>>> and ran RAxML immediately after compiling Open MPI in the same
>>> session, again with no joy.
>>>
>>> Does anyone know how I can fix this?
>>>
>>> Thanks
>>>
>>> Nick
>>>
>>> <config.tar.gz><ompi-info.tar.gz><ATT2831213.txt>
>>
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>