Benchmarking: SOAPdenovo (Bioinformatics: De novo assembly)
Revision as of 14:30, 28 July 2013
SOAPdenovo
- SOAPdenovo: part of SOAP (Short Oligonucleotide Analysis Package)
- Homepage: http://soap.genomics.org.cn/index.html
SOAP has evolved from a single alignment tool into a package that provides a full solution for next-generation sequencing data analysis. It currently consists of a new alignment tool (SOAPaligner/soap2), a re-sequencing consensus sequence builder (SOAPsnp), an indel finder (SOAPindel), a structural variation scanner (SOAPsv) and a de novo short-read assembler (SOAPdenovo). A GPU-accelerated alignment tool (SOAP3/GPU) is also being implemented.
Build
- Get the source from: http://sourceforge.net/projects/soapdenovo2/files/?source=navbar
- Version 223
# Version 223 built with the Open64 4.5.2.1 compilers (openCC/opencc)
tar zxvf SOAPdenovo2-src-r223.tgz
cd r223/sparsePregraph
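# edit the Makefile, set CC=openCC (Open64 C++ compiler driver)
make
cd ../standardPregraph
# edit the Makefile, set CC=opencc (Open64 C compiler driver)
make
# optionally, also build the 127-mer version (k-mer sizes up to 127)
make 127mer=1
- Version 240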
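The "edit the Makefile, set CC=..." steps in the build can also be scripted instead of done by hand. The following is a minimal bash sketch, not part of the SOAPdenovo2 distribution: the helper name `set_makefile_cc` is hypothetical, and it assumes each Makefile sets the compiler on a line beginning with `CC` (for example `CC= gcc`), which should be checked in the actual Makefiles before relying on it. The original Makefile is kept as a `.bak` copy.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: point a Makefile at a different compiler.
# Assumes the compiler is set on a line starting with "CC", optionally
# followed by spaces and then "=" (e.g. "CC= gcc" or "CC = g++").
# Lines such as "CFLAGS=..." are left untouched.
set_makefile_cc() {
  local makefile=$1 compiler=$2
  # -i.bak edits in place and keeps the original as "$makefile.bak"
  # (this spelling works with both GNU and BSD sed).
  sed -i.bak "s|^CC[[:space:]]*=.*|CC = ${compiler}|" "$makefile"
}

# Example, from the top of the extracted source tree:
#   set_makefile_cc sparsePregraph/Makefile openCC
#   set_makefile_cc standardPregraph/Makefile opencc
```

Run each call before the corresponding `make`; diffing against the `.bak` file shows exactly what was changed.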
Notes on Bioinformatics
A good summary of the computational demands of bioinformatics is available here: http://www.biogrid.jp/pdf/jsbicbi2012.pdf
Computational requirements for the different types of analysis
Pairwise sequence alignment
- Embarrassingly parallel with no communication
- Batches of reads run on different nodes, with threading within each node
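Because independent reads can be aligned without any communication, the input can simply be cut into batches and each batch submitted as a separate job. A minimal bash sketch, assuming uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines); `split_fastq` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: cut a FASTQ file into batches of whole reads.
# Assumes every record is exactly 4 lines (header, sequence, "+", qualities),
# so splitting on multiples of 4 lines never cuts a read in half.
split_fastq() {
  local fastq=$1 reads_per_batch=$2 prefix=$3
  # Produces ${prefix}aa, ${prefix}ab, ... (split's default suffixes)
  split -l $((4 * reads_per_batch)) "$fastq" "$prefix"
}

# Example: 1,000,000 reads per batch
#   split_fastq reads.fq 1000000 batch_
```

Each resulting file can then be aligned on its own node using the aligner's threading option, for example `bwa mem -t 8 ref.fa batch_aa > batch_aa.sam` with BWA, and the per-batch outputs merged afterwards.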
De novo assembly
- Requires a large shared memory on a single node
- Reads with more errors need more memory
- Threading available
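The link between error rate and memory can be made concrete with a back-of-the-envelope count of distinct k-mers, which a de Bruijn graph assembler has to hold in memory. Assuming a genome of G bases, N reads of length L, a per-base error rate e and a k-mer size k, each sequencing error can create up to k novel k-mers, so the number of distinct k-mers grows roughly as G + N·L·e·k. This is a rough upper-bound sketch (it ignores repeats, k-mers at read ends and repeated errors), not SOAPdenovo's own memory model; `estimate_kmers` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: rough count of distinct k-mers a de Bruijn graph
# assembler must store. Each sequencing error can introduce up to k new
# k-mers, so the estimate is genome_size + reads * read_len * error_rate * k.
# It ignores repeats, read ends and duplicated errors, so treat it as a
# rough upper bound, not SOAPdenovo's actual memory model.
estimate_kmers() {
  local genome=$1 reads=$2 read_len=$3 err_rate=$4 k=$5
  awk -v g="$genome" -v n="$reads" -v l="$read_len" -v e="$err_rate" -v k="$k" \
    'BEGIN { printf "%.0f\n", g + n * l * e * k }'
}

# 1 Mb genome, 100,000 reads of 100 bp, k = 31:
#   estimate_kmers 1000000 100000 100 0.01 31   # -> 4100000
#   estimate_kmers 1000000 100000 100 0.02 31   # -> 7200000
```

In this example, doubling the error rate from 1% to 2% raises the estimate from 4.1 to 7.2 million k-mers, while the genome itself contributes only 1 million. Multiplying by an assumed per-k-mer cost in bytes (which depends on the assembler and on k) gives a first guess at the memory a single node needs.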
Phylogenetic tree inference
- Parallelization available at fine- and coarse-grain levels
- Hybrid parallel approach using MPI and Pthreads in a single RAxML job
- Alternatively, multiple threaded jobs using RAxML-Light
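For the hybrid MPI/Pthreads mode, a common layout is one MPI process per node with one thread per core inside it. The sketch below only prints the launch line rather than running it. The binary name `raxmlHPC-HYBRID` depends on how RAxML was compiled, `-npernode` is Open MPI syntax (other MPI launchers differ), and the analysis options (`-f a` rapid bootstrap plus ML search, 100 replicates, GTRGAMMA, fixed seeds) are illustrative choices, not settings taken from this page; `raxml_hybrid_cmd` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a hybrid MPI + Pthreads RAxML launch line with
# one MPI process per node and one thread per core inside each process.
# Assumptions: the binary is called raxmlHPC-HYBRID (the name depends on the
# build), "-npernode" is Open MPI syntax, and the analysis options are examples.
raxml_hybrid_cmd() {
  local nodes=$1 cores_per_node=$2 alignment=$3 run_name=$4
  # -T: Pthreads per MPI process; -f a: rapid bootstrap + ML search;
  # -x/-p: random seeds; -N: bootstrap replicates; -m: substitution model
  printf 'mpirun -np %s -npernode 1 raxmlHPC-HYBRID -T %s -f a -x 12345 -p 12345 -N 100 -m GTRGAMMA -s %s -n %s\n' \
    "$nodes" "$cores_per_node" "$alignment" "$run_name"
}

# Example: 4 nodes with 16 cores each
#   raxml_hybrid_cmd 4 16 aln.phy run1
```

In a batch script the printed line can be executed with `eval "$(raxml_hybrid_cmd 4 16 aln.phy run1)"` once it has been checked against the local MPI installation.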
Database search & retrieval
- Can benefit from using flash memory instead of disk
Applications used for different analysis types
Pairwise sequence alignment
- ATAC, BFAST, BLAST, BLAT, Bowtie, BWA
Multiple sequence alignment (via CIPRES gateway)
- ClustalW, MAFFT
RNA-Seq analysis
- TopHat, Cufflinks
De novo assembly
- ABySS, SOAPdenovo, Velvet
Phylogenetic tree inference (via CIPRES gateway)
- BEAST with BEAGLE, GARLI, MrBayes, RAxML, RAxML-Light
Toolkits
- BEDTools, GATK, SAMtools
Database search & retrieval
- IntegromeDB