Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segmentation fault whilst running RaXML-MPI
From: Nick Holway (nick.holway_at_[hidden])
Date: 2009-11-06 09:56:42


Hi,

Thank you for the information. I'm going to try the new Intel
compilers, which I'm downloading now, but as they're taking so long to
download I don't think I'll be able to look into this again until
after the weekend. BTW, using their Java-based downloader is a bit
less painful than their normal download.

In the meantime, if anyone else has some suggestions then please let me know.

Thanks

Nick
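
On the compile-versus-runtime environment question in the quoted message
below: the $PATH shown there lists two Open MPI 1.3.3 prefixes
(x86_64-no-mem-man first, then x86_64) and repeats several Intel entries,
so one quick sanity check is to see which directory actually supplies
mpirun. A minimal Python sketch, assuming a POSIX system; the helper names
first_provider and duplicate_entries are only illustrative and are not part
of Open MPI:

```python
import os
import shutil


def first_provider(command, search_path):
    """Return the directory that supplies `command` on `search_path`, or None.

    `search_path` is a PATH-style string (directories joined by os.pathsep).
    """
    found = shutil.which(command, path=search_path)
    return os.path.dirname(found) if found else None


def duplicate_entries(search_path):
    """Return directories that appear more than once, in first-repeat order."""
    seen, dupes = set(), []
    for entry in search_path.split(os.pathsep):
        # Normalise trailing and doubled slashes so "/a//b/" matches "/a/b".
        key = os.path.normpath(entry) if entry else entry
        if key in seen and key not in dupes:
            dupes.append(key)
        seen.add(key)
    return dupes


if __name__ == "__main__":
    path = os.environ.get("PATH", "")
    print("mpirun comes from:", first_provider("mpirun", path))
    print("repeated PATH entries:", duplicate_entries(path))
```

The same duplicate check can be pointed at $LD_LIBRARY_PATH. Running
ldd on the raxmlHPC-MPI binary would additionally show which libmpi the
dynamic linker picks up at run time.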

2009/11/5 Jeff Squyres <jsquyres_at_[hidden]>:
> FWIW, I think Intel released 11.1.059 earlier today (I've been trying to
> download it all morning).  I doubt it's an issue in this case, but I thought
> I'd mention it as a public service announcement.  ;-)
>
> Seg faults are *usually* an application issue (never say "never", but they
> *usually* are).  You might want to first contact the RAxML team to see if
> there are any known issues with their software and Open MPI 1.3.3...?
>  (Sorry, I'm totally unfamiliar with RAxML)
>
> On Nov 5, 2009, at 12:30 PM, Nick Holway wrote:
>
>> Dear all,
>>
>> I'm trying to run RAxML 7.0.4 on my 64-bit Rocks 5.1 cluster (i.e. CentOS
>> 5.2). I compiled Open MPI 1.3.3 with the Intel compilers v11.1.056
>> using ./configure CC=icc CXX=icpc F77=ifort FC=ifort --with-sge
>> --prefix=/usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man
>> --with-memory-manager=none.
>>
>> When I run RAxML in a qlogin session using
>> /usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man/bin/mpirun -np 8
>> /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI
>> -f a -x 12345 -p12345 -# 10 -m GTRGAMMA -s
>> /users/holwani1/jay/ornodko-1582 -n mpitest39
>>
>> I get the following output:
>>
>> This is the RAxML MPI Worker Process Number: 1
>> This is the RAxML MPI Worker Process Number: 3
>>
>> This is the RAxML MPI Master process
>>
>> This is the RAxML MPI Worker Process Number: 7
>>
>> This is the RAxML MPI Worker Process Number: 4
>>
>> This is the RAxML MPI Worker Process Number: 5
>>
>> This is the RAxML MPI Worker Process Number: 2
>>
>> This is the RAxML MPI Worker Process Number: 6
>> IMPORTANT WARNING: Alignment column 1695 contains only undetermined
>> values which will be treated as missing data
>>
>>
>> IMPORTANT WARNING: Sequences A4_H10 and A3ii_E11 are exactly identical
>>
>>
>> IMPORTANT WARNING: Sequences A2_A08 and A9_C10 are exactly identical
>>
>>
>> IMPORTANT WARNING: Sequences A3ii_B03 and A3ii_C06 are exactly identical
>>
>>
>> IMPORTANT WARNING: Sequences A9_D08 and A9_F10 are exactly identical
>>
>>
>> IMPORTANT WARNING: Sequences A3ii_F07 and A9_C08 are exactly identical
>>
>>
>> IMPORTANT WARNING: Sequences A6_F05 and A6_F11 are exactly identical
>>
>> IMPORTANT WARNING
>> Found 6 sequences that are exactly identical to other sequences in the
>> alignment.
>> Normally they should be excluded from the analysis.
>>
>>
>> IMPORTANT WARNING
>> Found 1 column that contains only undetermined values which will be
>> treated as missing data.
>> Normally these columns should be excluded from the analysis.
>>
>> An alignment file with undetermined columns and sequence duplicates
>> removed has already
>> been printed to file /users/holwani1/jay/ornodko-1582.reduced
>>
>>
>> You are using RAxML version 7.0.4 released by Alexandros Stamatakis in
>> April 2008
>>
>> Alignment has 1280 distinct alignment patterns
>>
>> Proportion of gaps and completely undetermined characters in this
>> alignment: 0.124198
>>
>> RAxML rapid bootstrapping and subsequent ML search
>>
>>
>> Executing 10 rapid bootstrap inferences and thereafter a thorough ML
>> search
>>
>> All free model parameters will be estimated by RAxML
>> GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter
>> GAMMA Model parameters will be estimated up to an accuracy of
>> 0.1000000000 Log Likelihood units
>>
>> Partition: 0
>> Name: No Name Provided
>> DataType: DNA
>> Substitution Matrix: GTR
>> Empirical Base Frequencies:
>> pi(A): 0.261129 pi(C): 0.228570 pi(G): 0.315946 pi(T): 0.194354
>>
>>
>> Switching from GAMMA to CAT for rapid Bootstrap, final ML search will
>> be conducted under the GAMMA model you specified
>> Bootstrap[10]: Time 44.442728 bootstrap likelihood -inf, best
>> rearrangement setting 5
>> Bootstrap[0]: Time 44.814948 bootstrap likelihood -inf, best
>> rearrangement setting 5
>> Bootstrap[6]: Time 46.470371 bootstrap likelihood -inf, best
>> rearrangement setting 6
>> [compute-0-11:08698] *** Process received signal ***
>> [compute-0-11:08698] Signal: Segmentation fault (11)
>> [compute-0-11:08698] Signal code: Address not mapped (1)
>> [compute-0-11:08698] Failing at address: 0x408
>> [compute-0-11:08698] [ 0] /lib64/libpthread.so.0 [0x3fb580de80]
>> [compute-0-11:08698] [ 1] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(hookup+0) [0x413ca0]
>> [compute-0-11:08698] [ 2] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(restoreTL+0xd9) [0x442c09]
>> [compute-0-11:08698] [ 3] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI [0x42c968]
>> [compute-0-11:08698] [ 4] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(doAllInOne+0x91a) [0x42b21a]
>> [compute-0-11:08698] [ 5] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(main+0xc25) [0x4063f5]
>> [compute-0-11:08698] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3fb501d8b4]
>> [compute-0-11:08698] [ 7] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI [0x405719]
>> [compute-0-11:08698] *** End of error message ***
>> Bootstrap[1]: Time 8.400332 bootstrap likelihood -inf, best
>> rearrangement setting 5
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 1 with PID 8698 on node
>> compute-0-11.local exited on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>>
>>
>>
>> My $PATH is
>> /usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man/bin/:/usr/prog/mpi/openmpi/1.3.3/x86_64/bin/:/usr/prog/intel/ifort/11.1.056/bin/intel64:/usr/prog/intel/icc/11.1.056//bin/intel64:/usr/prog/intel/ifort/11.1.056/bin/intel64:/usr/prog/intel/icc/11.1.056//bin/intel64:/opt/gridengine/bin/lx26-amd64:/usr/kerberos/sbin:/usr/kerberos/bin:/opt/gridengine/bin/lx26-amd64:/usr/java/latest/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/ganglia/bin:/opt/ganglia/sbin:/opt/rocks/bin:/opt/rocks/sbin:/root/bin
>>
>> My $LD_LIBRARY_PATH is
>>
>> /usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man/lib/:/usr/prog/mpi/openmpi/1.3.3/x86_64/lib/:/usr/prog/intel/ifort/11.1.056/lib/intel64:/usr/prog/intel/ifort/11.1.056/mkl/lib/em64t:/usr/prog/intel/icc/11.1.056//lib/intel64:/usr/prog/intel/icc/11.1.056//ipp/em64t/sharedlib:/usr/prog/intel/icc/11.1.056//mkl/lib/em64t:/usr/prog/intel/icc/11.1.056//tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/usr/prog/intel/ifort/11.1.056/lib/intel64:/usr/prog/intel/ifort/11.1.056/mkl/lib/em64t:/usr/prog/intel/icc/11.1.056//lib/intel64:/usr/prog/intel/icc/11.1.056//ipp/em64t/sharedlib:/usr/prog/intel/icc/11.1.056//mkl/lib/em64t:/usr/prog/intel/icc/11.1.056//tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/gridengine/lib/lx26-amd64:/opt/gridengine/lib/lx26-amd64
>>
>> Although I'm only running this on one node, it may be helpful to know
>> that there is InfiniBand with Voltaire OFED v1.4 on the nodes. The
>> Rocks HPC roll MPIs are not installed. I've tried running the above on
>> multiple nodes but still see the same error. I've attached the
>> config.log and ompi_info output to the email.
>>
>> I believe that the input is OK as I can run the serial gcc-compiled
>> RAxML on the data with no problems. I tried compiling Open MPI with
>> --with-memory-manager=none as a quick Google search
>> (http://osdir.com/ml/clustering.open-mpi.user/2008-07/msg00201.html)
>> suggested that it could help, but it made no difference. Google also
>> suggested that it could be caused by the compile environment being
>> different from the runtime environment. To test this I compiled and
>> ran RAxML immediately after compiling Open MPI in the same session,
>> again with no joy.
>>
>> Does anyone know how I can fix this?
>>
>> Thanks
>>
>> Nick
>>
>> <config.tar.gz><ompi-info.tar.gz><ATT2831213.txt>
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
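
For anyone picking this thread up later, here is a rough Python sketch for
pulling the frames out of a backtrace like the one quoted above, even when
a mail client has wrapped them across lines. It assumes the frame layout
seen in this thread ("[ n] path(symbol+offset) [address]"); parse_frames
and its regular expression are illustrative only and are not part of Open
MPI or RAxML:

```python
import re

# One stack frame as printed in the quoted output:
#   [ 2] /path/to/binary(symbol+0xd9) [0x442c09]
# The "(symbol+offset)" part is optional, since some frames have no symbol.
FRAME = re.compile(
    r"\[\s*(?P<index>\d+)\]\s+"
    r"(?P<binary>/\S+?)"
    r"(?:\((?P<symbol>\w+)\+(?P<offset>0x[0-9a-fA-F]+|\d+)\))?"
    r"\s+\[(?P<address>0x[0-9a-fA-F]+)\]"
)


def parse_frames(text):
    """Return a list of dicts, one per stack frame found in `text`.

    Leading mail-quote markers (">") are stripped and all lines are joined,
    so frames that were wrapped over several lines are still matched.
    """
    lines = [re.sub(r"^[>\s]+", "", line) for line in text.splitlines()]
    joined = " ".join(line for line in lines if line)
    return [match.groupdict() for match in FRAME.finditer(joined)]


if __name__ == "__main__":
    sample = """\
>> [compute-0-11:08698] [ 0] /lib64/libpthread.so.0 [0x3fb580de80]
>> [compute-0-11:08698] [ 1]
>>
>> /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(hookup+0)
>> [0x413ca0]
"""
    for frame in parse_frames(sample):
        print(frame["index"], frame["symbol"] or "?", frame["address"])
```

Read this way, the innermost application frames are hookup, restoreTL and
doAllInOne, all inside raxmlHPC-MPI rather than in an Open MPI library. A
failing address as low as 0x408 is commonly what a field access through a
NULL pointer looks like, which would fit the suggestion to ask the RAxML
developers, though the trace alone does not prove it.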