Benchmarking: SOAPdenovo (Bioinformatics: De novo assembly)
Latest revision as of 20:02, 28 July 2013
SOAPdenovo
- SOAPdenovo: de novo assembler from SOAP (Short Oligonucleotide Analysis Package)
- Homepage: http://soap.genomics.org.cn/index.html
SOAP has evolved from a single alignment tool into a tool package that provides a full solution for next-generation sequencing data analysis. It currently consists of a new alignment tool (SOAPaligner/soap2), a re-sequencing consensus sequence builder (SOAPsnp), an indel finder (SOAPindel), a structural variation scanner (SOAPsv) and a de novo short-read assembler (SOAPdenovo). A GPU-accelerated alignment tool (SOAP3/GPU) is also being implemented.
Build
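The build steps below each change the CC variable in one or more Makefiles by hand. A minimal sketch of a shell helper that makes the same edit non-interactively; the function name set_makefile_cc is hypothetical, and it assumes that each Makefile assigns CC on a line beginning with "CC =" (or "CC=") and that sed accepts -i with an attached backup suffix, as GNU sed does:

```shell
#!/bin/sh
# Hypothetical helper: rewrite the CC assignment in one or more Makefiles.
# Assumes CC is assigned on a line that starts with "CC", optional spaces, "=".
# The original file is kept as <Makefile>.orig.
# Usage: set_makefile_cc COMPILER MAKEFILE...
set_makefile_cc() {
    compiler=$1
    shift
    for mk in "$@"; do
        if ! grep -q '^CC[[:space:]]*=' "$mk"; then
            echo "no CC assignment found in $mk" >&2
            return 1
        fi
        # Replace the whole assignment line; "|" as delimiter allows paths
        sed -i.orig "s|^CC[[:space:]]*=.*|CC = $compiler|" "$mk"
    done
}
```

For version 240, running set_makefile_cc openCC Makefile sparsePregraph/Makefile and then set_makefile_cc opencc standardPregraph/Makefile from inside SOAPdenovo2-src-r240 would reproduce the three manual edits. Passing CC=... on the make command line is an alternative in GNU make, but in a recursive build it is normally passed down to sub-makes, so both subdirectories would get the same compiler.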
- Get the source from: http://sourceforge.net/projects/soapdenovo2/files/?source=navbar
- Version 223
# Version 223 built with the Open64 4.5.2.1 compilers (openCC for C++, opencc for C)
tar zxvf SOAPdenovo2-src-r223.tgz
cd r223/sparsePregraph
# edit the Makefile, set CC=openCC
make
cd ../standardPregraph
# edit the Makefile, set CC=opencc
make
# then build the variant that supports k-mer sizes up to 127
make 127mer=1
- Version 240
# Version 240 built with the Open64 4.5.2.1 compilers (openCC for C++, opencc for C)
tar zxvf SOAPdenovo2-src-r240.tgz
cd SOAPdenovo2-src-r240
# edit Makefile, set CC=openCC
# edit sparsePregraph/Makefile, set CC=openCC
# edit standardPregraph/Makefile, set CC=opencc
make
Run Analysis
- example.config: the example configuration file, available from the SOAPdenovo home page
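The configuration file describes the read libraries to assemble. As a minimal sketch, assuming a single paired-end FASTQ library, the function below writes such a file using key names from SOAPdenovo2's example configuration (max_rd_len, [LIB], avg_ins, reverse_seq, asm_flags, rank, q1, q2). The function name, paths and values are hypothetical and should be checked against the example.config shipped with the release:

```shell
#!/bin/sh
# Hypothetical helper: write a minimal SOAPdenovo2 config for one
# paired-end FASTQ library. Values passed in are placeholders to adapt.
# Usage: write_soap_config OUT READ1 READ2 MAX_READ_LEN AVG_INSERT
write_soap_config() {
    out=$1 read1=$2 read2=$3 max_rd_len=$4 avg_ins=$5
    cat > "$out" <<EOF
#maximal read length
max_rd_len=$max_rd_len
[LIB]
#average insert size
avg_ins=$avg_ins
#0: read sequences are used as given, not reverse-complemented
reverse_seq=0
#3: use the reads for both contig and scaffold assembly
asm_flags=3
#order in which libraries are used during scaffolding
rank=1
#read 1 file must be followed by its read 2 file
q1=$read1
q2=$read2
EOF
}
```

A one-step run could then look like SOAPdenovo-63mer all -s my.config -K 63 -R -p 8 -o asm, where -K sets the k-mer size, -R enables repeat resolution using reads, -p sets the number of threads and -o the output prefix; the k-mer size and thread count are placeholders.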
Notes on Bioinformatics
A good summary of the computational demands of bioinformatics is available here: http://www.biogrid.jp/pdf/jsbicbi2012.pdf
Computational requirements for different types of analysis
Pairwise sequence alignment
- Embarrassingly parallel with no communication
- Batches of reads are run on different nodes, with threading within each node
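Because each read is aligned independently, the input can be cut into batches that are aligned on separate nodes with no communication between them. A minimal sketch, assuming uncompressed FASTQ with exactly four lines per record and the coreutils split command; the function name and output prefix are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: split a FASTQ file into batches of whole records.
# Assumes plain FASTQ with exactly 4 lines per read, so a chunk size that
# is a multiple of 4 lines never cuts a record in half.
# Usage: split_fastq IN.fq READS_PER_BATCH OUT_PREFIX
split_fastq() {
    in=$1 reads_per_batch=$2 prefix=$3
    # Output files are named OUT_PREFIXaa, OUT_PREFIXab, ...
    split -l $((reads_per_batch * 4)) "$in" "$prefix"
}
```

Each batch file can then be submitted as its own job, with the aligner's thread option (for example -p in Bowtie or -t in BWA) using the cores within the node.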
De novo assembly
- Large shared memory needed in a single node
- More memory needed for reads with more errors
- Threading available
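The link between read errors and memory comes from how de Bruijn graph assemblers such as SOAPdenovo work: they store the distinct k-mers found in the reads, and a single substitution changes up to k of the k-mers that overlap it, each of which is usually absent from the genome. A minimal sketch that counts distinct k-mers with awk to illustrate this; the function name and toy sequences are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count distinct k-mers across sequences on stdin
# (one sequence per line), as a rough stand-in for de Bruijn graph size.
# Usage: printf '%s\n' SEQ... | count_kmers K
count_kmers() {
    awk -v k="$1" '
        {
            # Slide a window of length k along the sequence
            for (i = 1; i + k - 1 <= length($0); i++)
                seen[substr($0, i, k)] = 1
        }
        END {
            n = 0
            for (kmer in seen) n++
            print n
        }'
}
```

Two error-free copies of the read AACCGGTTACGT give 9 distinct 4-mers, while replacing one copy with AACCGATTACGT (one substitution) gives 13: the single error adds k = 4 spurious k-mers that the assembler must hold in memory.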
Phylogenetic tree inference
- Parallelization available at fine- and coarse-grain levels
- Hybrid parallel approach using MPI and Pthreads in a single RAxML job
- Alternatively, multiple threaded jobs using RAxML-Light
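In the hybrid RAxML build, MPI processes provide the coarse-grain level (independent tree searches) and Pthreads the fine-grain level (likelihood computations within one search). A minimal sketch of a launch wrapper, assuming an MPI launcher named mpirun and a binary compiled as raxmlHPC-HYBRID; the function name, seed and substitution model are hypothetical choices, and setting RUN=echo prints the command instead of running it:

```shell
#!/bin/sh
# Hypothetical wrapper for a hybrid MPI + Pthreads RAxML run.
# -np sets the MPI ranks (coarse grain); -T sets Pthreads per rank (fine grain).
# Set RUN=echo to print the command instead of executing it.
# Usage: run_raxml_hybrid RANKS THREADS ALIGNMENT RUN_NAME N_SEARCHES
run_raxml_hybrid() {
    ranks=$1 threads=$2 aln=$3 name=$4 searches=$5
    ${RUN:-} mpirun -np "$ranks" raxmlHPC-HYBRID -T "$threads" \
        -s "$aln" -n "$name" -m GTRGAMMA -p 12345 -N "$searches"
}
```

How the ranks are placed on nodes (for example one rank per node, with threads filling its cores) is controlled by the MPI launcher, not by RAxML.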
Database search & retrieval
- Can benefit from using flash memory instead of disk
Applications used for different analysis types
Pairwise sequence alignment
- ATAC, BFAST, BLAST, BLAT, Bowtie, BWA
Multiple sequence alignment (via CIPRES gateway)
- ClustalW, MAFFT
RNA-Seq analysis
- TopHat, Cufflinks
De novo assembly
- ABySS, SOAPdenovo, Velvet
Phylogenetic tree inference (via CIPRES gateway)
- BEAST with BEAGLE, GARLI, MrBayes, RAxML, RAxML-Light
Toolkits
- BEDTools, GATK, SAMtools
Database search & retrieval
- IntegromeDB