Benchmarking: SOAPdenovo (Bioinformatics: De novo assembly)

From Define Wiki

Latest revision as of 20:02, 28 July 2013

SOAPdenovo

SOAP has evolved from a single alignment tool into a package that provides a full solution for next-generation sequencing data analysis. It currently consists of a new alignment tool (SOAPaligner/soap2), a re-sequencing consensus sequence builder (SOAPsnp), an indel finder (SOAPindel), a structural variation scanner (SOAPsv) and a de novo short-read assembler (SOAPdenovo). A GPU-accelerated alignment tool (SOAP3/GPU) is being implemented.

Build

  • Version 233

  # Version 233 built with openCC/opencc 4.5.2.1 (Open64: openCC is the C++ driver, opencc the C driver)
  tar zxvf SOAPdenovo2-src-r223.tgz
  cd r233/sparsePregraph
  # edit the Makefile, set CC=openCC
  make
  cd ../standardPregraph
  # edit the Makefile, set CC=opencc
  make
  # also build the variant that supports k-mers up to 127
  make 127mer=1

  • Version 240

  # Version 240 built with openCC/opencc 4.5.2.1
  tar zxvf SOAPdenovo2-src-r240.tgz
  cd SOAPdenovo2-src-r240
  # edit Makefile, set CC=openCC
  # edit sparsePregraph/Makefile, set CC=openCC
  # edit standardPregraph/Makefile, set CC=opencc
  make
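The three Makefile edits for r240 can also be scripted instead of made by hand. The following is a minimal sketch that assumes GNU sed (for in-place `-i`) and assumes each Makefile sets its compiler on a line beginning with `CC` followed by `=`. The function names `set_cc` and `patch_r240` are hypothetical helpers and are not part of the SOAPdenovo2 distribution.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: script the r240 CC edits from the build notes.
# Assumes GNU sed and a Makefile line such as "CC=gcc" or "CC = g++".

# Replace the compiler assignment in one Makefile.
#   $1 = path to the Makefile, $2 = compiler command (no "/" characters)
set_cc() {
  sed -i "s/^CC[[:space:]]*=.*/CC = $2/" "$1"
}

# Apply the three edits to an unpacked r240 source tree.
#   $1 = source directory, e.g. SOAPdenovo2-src-r240
patch_r240() {
  local src=$1
  set_cc "$src/Makefile" openCC
  set_cc "$src/sparsePregraph/Makefile" openCC
  set_cc "$src/standardPregraph/Makefile" opencc
}
```

For example, running `patch_r240 SOAPdenovo2-src-r240 && make -C SOAPdenovo2-src-r240` reproduces the manual steps. The regular expression only matches lines whose variable name is exactly `CC`, so assignments such as `CFLAGS` are left alone. On BSD or macOS, sed needs `-i ''` instead of `-i`.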

Run Analysis

  • example.config: download this file from the SOAPdenovo home page.
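After example.config has been edited so that it points at local read files, one benchmark run can be timed as shown below. This sketch assumes that the r240 build produced `SOAPdenovo-63mer` in the current directory and that GNU time is installed as `/usr/bin/time`. The wrapper `run_soapdenovo` is a hypothetical name, and the k-mer size (63) and thread count (16) are placeholder values. The `all -s -K -R -o` usage follows the SOAPdenovo2 README, and `-p` sets the number of CPUs. Check the help output of your build before relying on these options.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for timing one SOAPdenovo2 assembly run.
#   $1 = assembler binary  (default ./SOAPdenovo-63mer)
#   $2 = config file       (default example.config)
#   $3 = k-mer size        (default 63, placeholder)
#   $4 = threads for -p    (default 16, placeholder)
run_soapdenovo() {
  local bin=${1:-./SOAPdenovo-63mer}
  local config=${2:-example.config}
  local kmer=${3:-63}
  local threads=${4:-16}
  local prefix="asm_K${kmer}_p${threads}"

  if [ ! -x "$bin" ]; then
    echo "assembler not found or not executable: $bin" >&2
    return 1
  fi
  if [ ! -r "$config" ]; then
    echo "config file not readable: $config" >&2
    return 1
  fi

  # "all" runs every assembly step in one invocation; -R resolves repeats
  # using reads. GNU time -v writes its report (wall-clock time, maximum
  # resident set size) to stderr, so it lands in $prefix.err together with
  # the assembler's own stderr output.
  /usr/bin/time -v "$bin" all -s "$config" -K "$kmer" -R -p "$threads" \
    -o "$prefix" >"$prefix.log" 2>"$prefix.err"
}
```

For a thread-scaling study, you could call it in a loop, for example `for t in 8 16 32; do run_soapdenovo ./SOAPdenovo-63mer example.config 63 "$t"; done`. Then compare the "Elapsed (wall clock) time" and "Maximum resident set size" lines in each `.err` file.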

Notes on Bioinformatics

A good summary of the computational demands of bioinformatics workloads is available here: http://www.biogrid.jp/pdf/jsbicbi2012.pdf


Computational requirements for the different types of analysis

Pairwise sequence alignment

  • Embarrassingly parallel with no communication
  • Batches of reads are run on different nodes, with threading used within each node
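The only preparation this pattern needs is splitting the reads into batches. The following is a minimal sketch that assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines) and GNU split (for numeric suffixes via `-d`). The name `split_fastq` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cut a FASTQ file into batches of whole reads so that
# each batch can be aligned independently on its own node.
#   $1 = input FASTQ (uncompressed, 4 lines per record)
#   $2 = reads per batch
#   $3 = output prefix (default batch_)
split_fastq() {
  local fastq=$1
  local reads_per_batch=$2
  local prefix=${3:-batch_}
  # Each record is 4 lines, so a multiple of 4 never cuts a read in half.
  # -d -a 4 gives numeric suffixes: batch_0000, batch_0001, ...
  split -l $(( reads_per_batch * 4 )) -d -a 4 "$fastq" "$prefix"
}
```

Each resulting batch file can then be submitted as a separate job, and the aligner's own thread option (for example `bwa mem -t`) uses the cores within the node. For paired-end data, split both mate files with the same batch size so that their batches stay in step.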

De novo assembly

  • Large shared memory needed on a single node
  • More memory needed for reads with higher error rates (erroneous k-mers enlarge the assembly graph)
  • Threading available
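Because the whole assembly must fit in one node's memory, a benchmark script can refuse to start on a node that is too small. The following is a minimal sketch for Linux that reads MemTotal from /proc/meminfo. The name `node_has_memory` is a hypothetical helper, and the required size is an estimate you supply for your dataset, not a figure taken from SOAPdenovo.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (Linux only): succeed if the node's total
# physical memory is at least the requested number of GiB.
#   $1 = required memory in GiB (whole number)
node_has_memory() {
  local required_gib=$1
  local total_kib
  # /proc/meminfo reports MemTotal in kB (units of 1024 bytes).
  total_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  (( total_kib >= required_gib * 1024 * 1024 ))
}
```

A typical call is `node_has_memory 512 || { echo "node too small for this assembly" >&2; exit 1; }`. This checks total memory, not free memory, so other jobs sharing the node can still leave too little for the assembler.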

Phylogenetic tree inference

  • Parallelization available at fine- and coarse-grain levels
  • Hybrid parallel approach using MPI and Pthreads in a single RAxML job
  • Alternatively, multiple threaded jobs using RAxML-Light

Database search & retrieval

  • Can benefit from using flash memory instead of disk

Applications used for different analysis types

Pairwise sequence alignment

  • ATAC, BFAST, BLAST, BLAT, Bowtie, BWA

Multiple sequence alignment (via the CIPRES gateway)

  • ClustalW, MAFFT

RNA-Seq analysis

  • TopHat, Cufflinks

De novo assembly

  • ABySS, SOAPdenovo, Velvet

Phylogenetic tree inference (via the CIPRES gateway)

  • BEAST with BEAGLE, GARLI, MrBayes, RAxML, RAxML-Light

Toolkits

  • BEDTools, GATK, SAMtools

Database search & retrieval

  • IntegromeDB