Benchmarking: SOAPdenovo (Bioinformatics: De novo assembly)

From Define Wiki
Revision as of 14:28, 28 July 2013 by David

SOAPdenovo

  • SOAPdenovo: (SOAP) Short Oligonucleotide Analysis Package
  • Homepage: http://soap.genomics.org.cn/index.html

SOAP has evolved from a single alignment tool into a tool package that provides a full solution for next-generation sequencing data analysis. Currently, it consists of a new alignment tool (SOAPaligner/soap2), a re-sequencing consensus sequence builder (SOAPsnp), an indel finder (SOAPindel), a structural variation scanner (SOAPsv), and a de novo short-read assembler (SOAPdenovo). A GPU-accelerated alignment tool (SOAP3/GPU) is being implemented.

Build

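  # Version r223, built with openCC 4.5.2.1
  tar zxvf SOAPdenovo2-src-r223.tgz
  cd r223/sparsePregraph
  # edit the Makefile, set CC=openCC (C++ compiler driver)
  make
  cd ../standardPregraph
  # edit the Makefile, set CC=opencc (C compiler driver)
  # build the default (63-mer) binary
  make
  # build the 127-mer binary
  make 127mer=1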
  • Version 240

Notes on Bioinformatics

A good summary of the computational demands of bioinformatics is available here: http://www.biogrid.jp/pdf/jsbicbi2012.pdf


Computational requirements for the different types of analysis

Pairwise sequence alignment

  • Embarrassingly parallel with no communication
  • Batches of reads run on different nodes using threading within a node
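
The batching described above can be sketched in shell. This is a minimal sketch, assuming the reads are stored in a FASTQ file with exactly four lines per record; the file names (reads.fq, batch_), the batch size, and the synthetic input are hypothetical and only serve the demonstration. Each resulting batch file could then be handed to a separate node running a threaded aligner.

```shell
#!/bin/sh
# Sketch: split a FASTQ file into batches for independent alignment jobs.
# Assumes each record is exactly 4 lines (no wrapped sequence lines).
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Build a tiny synthetic FASTQ file with 10 reads (hypothetical data).
: > reads.fq
i=1
while [ "$i" -le 10 ]; do
    printf '@read%s\nACGTACGT\n+\nIIIIIIII\n' "$i" >> reads.fq
    i=$((i + 1))
done

reads_per_batch=4                          # hypothetical batch size
lines_per_batch=$((reads_per_batch * 4))   # multiple of 4 keeps records intact

# split writes batch_aa, batch_ab, ... each holding whole records.
split -l "$lines_per_batch" reads.fq batch_

n_batches=$(ls batch_* | wc -l | tr -d ' ')
echo "created $n_batches batches"

# Each batch would be submitted as its own job; here we only count reads.
for f in batch_*; do
    n_reads=$(( $(wc -l < "$f") / 4 ))
    echo "$f: $n_reads reads"
done
```

Because the batches share no state, the jobs need no communication with each other, and the per-batch outputs can simply be concatenated afterwards.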

De novo assembly

  • Large shared memory needed in a single node
  • More memory needed for reads with more errors
  • Threading available
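
As a rough illustration of the single-node requirements above, the following shell sketch reports the memory and CPU count of a node and prints a threaded assembly command without running it. It is a sketch under stated assumptions: the memory threshold, the config file name (assembly.config), and the output prefix (asm_out) are hypothetical, and the SOAPdenovo options shown (the `all` pipeline, `-s`, `-K`, `-o`, `-p`) should be checked against the manual of the version actually built.

```shell
#!/bin/sh
# Sketch: check node resources, then print (not run) a threaded assembly command.
set -eu

# Total RAM in KiB; /proc/meminfo is Linux-specific, so fall back to 0 (unknown).
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
else
    mem_kb=0
fi
mem_kb=${mem_kb:-0}
mem_gb=$((mem_kb / 1024 / 1024))

# Number of online CPUs, used as the thread count.
threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Hypothetical threshold; real needs depend on genome size and read error rate.
min_mem_gb=64
if [ "$mem_gb" -lt "$min_mem_gb" ]; then
    echo "warning: ${mem_gb} GiB RAM is below the ${min_mem_gb} GiB threshold" >&2
fi

# Assumed invocation: 'all' pipeline, config file via -s, k-mer size via -K,
# output prefix via -o, thread count via -p. Printed only, never executed here.
cmd="SOAPdenovo-63mer all -s assembly.config -K 63 -o asm_out -p $threads"
echo "$cmd"
```

Printing the command rather than executing it keeps the sketch safe to run on a node where the assembler is not installed.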

Phylogenetic tree inference

  • Parallelization available at fine- and coarse-grain levels
  • Hybrid parallel approach using MPI and Pthreads in single RAxML job
  • Alternatively, multiple threaded jobs using RAxML-Light

Database search & retrieval

  • Can benefit from using flash memory instead of disk