Benchmarking: SOAPdenovo (Bioinformatics: De novo assembly)
SOAPdenovo
- SOAP: Short Oligonucleotide Analysis Package; SOAPdenovo is its de novo short-read assembler
- Homepage: http://soap.genomics.org.cn/index.html
SOAP has evolved from a single alignment tool into a tool package that provides a full solution for next-generation sequencing data analysis. It currently consists of a new alignment tool (SOAPaligner/soap2), a re-sequencing consensus sequence builder (SOAPsnp), an indel finder (SOAPindel), a structural variation scanner (SOAPsv), and a de novo short-read assembler (SOAPdenovo). A GPU-accelerated alignment tool (SOAP3/GPU) is being implemented.
Build
- Get the source from: http://sourceforge.net/projects/soapdenovo2/files/?source=navbar
- Version 233
# Version 233 built with openCC 4.5.2.1
tar zxvf SOAPdenovo2-src-r223.tgz
cd r233/sparsePregraph
# edit the Makefile, set CC=openCC
make
cd ../standardPregraph
# edit the Makefile, set CC=opencc
make
make 127mer=1
- Version 240
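The "edit the Makefile, set CC=..." steps in the build notes above could be scripted rather than done by hand. The following is a minimal sketch, not the method used for these benchmarks: `set_cc` is a hypothetical helper, and it assumes the Makefile assigns the compiler on a line beginning with `CC` followed by `=` or `:=`.

```shell
# Hypothetical helper: rewrite the CC assignment in a Makefile in place.
# Assumption: the compiler is set on a line that starts with "CC" and
# uses "=" or ":=" (other forms, such as "CC ?=", are left untouched).
set_cc() {
    makefile=$1
    compiler=$2
    # -i.bak keeps a backup copy (Makefile.bak) so the edit can be undone.
    sed -i.bak -E "s|^CC[[:space:]]*:?=.*|CC = ${compiler}|" "$makefile"
}
```

For example, running `set_cc Makefile openCC` inside the sparsePregraph directory before `make` would replace the manual edit. Check the resulting Makefile before building, since the real Makefiles may set the compiler differently.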
Notes on Bioinformatics
A good summary of the computational demands of bioinformatics is available here: http://www.biogrid.jp/pdf/jsbicbi2012.pdf
Computational requirements for the different types of analysis
Pairwise sequence alignment
- Embarrassingly parallel with no communication
- Batches of reads run on different nodes using threading within a node
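The batching described above can be illustrated by splitting a read file into fixed-size chunks, each of which can then be aligned independently on its own node. This is only a sketch: `split_fastq` is a hypothetical helper, and it assumes plain four-line FASTQ records (no wrapped sequence lines).

```shell
# Hypothetical helper: split a FASTQ file into batches of N reads each.
# Assumption: every read occupies exactly 4 lines (unwrapped FASTQ).
split_fastq() {
    fastq=$1
    reads_per_batch=$2
    prefix=$3
    # Splitting on a multiple of 4 lines keeps each record intact.
    split -l $((reads_per_batch * 4)) "$fastq" "$prefix"
}
```

For example, `split_fastq reads.fq 1000000 batch_` would write `batch_aa`, `batch_ab`, and so on, each holding up to one million reads. Because the batches share no state, they can be submitted as separate jobs, with the aligner's own threading used within each node.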
De novo assembly
- Large shared memory needed in a single node
- More memory needed for reads with more errors
- Threading available
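Since de novo assembly runs on a single shared-memory node, a natural benchmark is to vary the thread count. The sketch below only prints candidate SOAPdenovo command lines, one per thread count, rather than running them. `soapdenovo_scaling_cmds` is a hypothetical helper; the config file name, k-mer size, and output prefixes are illustrative assumptions, and the `SOAPdenovo-63mer` binary must be on the PATH for the printed commands to run.

```shell
# Hypothetical helper: print (not run) one SOAPdenovo command per thread count.
# Arguments: config file, k-mer size, then any number of thread counts.
soapdenovo_scaling_cmds() {
    config=$1
    kmer=$2
    shift 2
    for threads in "$@"; do
        # -s: library config, -K: k-mer size, -o: output prefix, -p: threads
        echo "SOAPdenovo-63mer all -s $config -K $kmer -o asm_p$threads -p $threads"
    done
}
```

For example, `soapdenovo_scaling_cmds lib.cfg 31 1 2 4 8` prints four commands that can be reviewed and then timed individually. Printing first keeps the sketch safe to run on a machine without the assembler or enough memory for the data set.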
Phylogenetic tree inference
- Parallelization available at fine- and coarse-grain levels
- Hybrid parallel approach using MPI and Pthreads in single RAxML job
- Alternatively, multiple threaded jobs using RAxML-Light
Database search & retrieval
- Can benefit from using flash memory instead of disk
Applications used for different analysis types
Pairwise sequence alignment
- ATAC, BFAST, BLAST, BLAT, Bowtie, BWA
Multiple sequence alignment (via CIPRES gateway)
- ClustalW, MAFFT
RNA-Seq analysis
- TopHat, Cufflinks
De novo assembly
- ABySS, SOAPdenovo, Velvet
Phylogenetic tree inference (via CIPRES gateway)
- BEAST with BEAGLE, GARLI, MrBayes, RAxML, RAxML-Light
Toolkits
- BEDTools, GATK, SAMtools
Database search & retrieval
- IntegromeDB