Difference between revisions of "Benchmarking: SOAPdenovo (Bioinformatics: De novo assembly)"

 
Revision as of 14:30, 28 July 2013

SOAPdenovo

SOAP has evolved from a single alignment tool into a tool package that provides a full solution for next-generation sequencing data analysis. It currently consists of a new alignment tool (SOAPaligner/soap2), a re-sequencing consensus sequence builder (SOAPsnp), an indel finder (SOAPindel), a structural variation scanner (SOAPsv) and a de novo short-read assembler (SOAPdenovo). A GPU-accelerated alignment tool (SOAP3/GPU) is being implemented.

Build

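  # Version 223, built with openCC 4.5.2.1
  tar zxvf SOAPdenovo2-src-r223.tgz
  cd r223/sparsePregraph
  # edit the Makefile, set CC=openCC (C++ driver; sparsePregraph is C++ code)
  make
  cd ../standardPregraph
  # edit the Makefile, set CC=opencc (C driver; standardPregraph is C code)
  make
  # build the variant that supports k-mers up to 127
  make 127mer=1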
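
Once the build completes, a first test run could look roughly like the sketch below. It is an illustration only: the read files, output prefix, thread count and config values are hypothetical, and the binary name SOAPdenovo-63mer is an assumption about what the default build produces. The script only prints the command unless that binary is present in the current directory.

```shell
#!/bin/sh
# Hypothetical inputs and settings; replace with the real data set.
READS_1=reads_1.fq
READS_2=reads_2.fq
PREFIX=asm_k63
THREADS=8

# Minimal single-library description in the SOAPdenovo2 config format.
# The values are placeholders, not tuned recommendations.
cat > soap.config <<EOF
max_rd_len=100
[LIB]
avg_ins=200
reverse_seq=0
asm_flags=3
rank=1
q1=$READS_1
q2=$READS_2
EOF

# Assumed name and location of the binary from the default build.
BIN=./SOAPdenovo-63mer
CMD="$BIN all -s soap.config -K 63 -p $THREADS -o $PREFIX"

if [ -x "$BIN" ]; then
    # Run the full pipeline, keeping stdout and stderr for later timing analysis.
    $CMD 1>"$PREFIX.log" 2>"$PREFIX.err"
else
    echo "dry run: $CMD"
fi
```

For benchmarking, the thread count passed with -p and the k-mer size passed with -K are the obvious parameters to vary between runs.
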
  • Version 240

Notes on Bioinformatics

A good summary of the computational demands of bioinformatics is available here: http://www.biogrid.jp/pdf/jsbicbi2012.pdf


Computational requirements for the different types of analysis

Pairwise sequence alignment

  • Embarrassingly parallel with no communication
  • Batches of reads run on different nodes using threading within a node
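
The batching idea can be sketched as follows. This is a minimal illustration that assumes reads are stored as plain four-line FASTQ records; the file names, batch size and the synthetic input are all hypothetical, and the final loop only prints what would be submitted rather than calling a real aligner or scheduler.

```shell
#!/bin/sh
# Create a tiny synthetic FASTQ file with 10 reads (4 lines per read).
: > reads.fq
i=1
while [ "$i" -le 10 ]; do
    printf '@read%d\nACGT\n+\nIIII\n' "$i" >> reads.fq
    i=$((i + 1))
done

READS_PER_BATCH=4

# Split on a multiple of 4 lines so that no read is cut in half.
split -l $((READS_PER_BATCH * 4)) reads.fq batch_

# Each batch is independent, so it could go to a different node
# and be aligned there with a multi-threaded aligner.
for f in batch_*; do
    echo "would align $f ($(( $(wc -l < "$f") / 4 )) reads)"
done
```

Because the batches share no state, the work scales with the number of nodes until I/O on the shared file system becomes the limit.
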

De novo assembly

  • Large shared memory needed in a single node
  • More memory needed for reads with more errors
  • Threading available

Phylogenetic tree inference

  • Parallelization available at fine- and coarse-grain levels
  • Hybrid parallel approach using MPI and Pthreads in single RAxML job
  • Alternatively, multiple threaded jobs using RAxML-Light
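
A hybrid RAxML launch could be composed roughly as below. This is a sketch under assumptions: the binary name raxmlHPC-HYBRID, the mpirun launcher syntax, the alignment file, and the rank, thread and search counts are all illustrative and depend on the local build and MPI installation. The script falls back to printing the command when the binary or the input is missing.

```shell
#!/bin/sh
RANKS=4                     # MPI ranks (coarse-grain level), hypothetical
THREADS=8                   # Pthreads per rank (fine-grain level), hypothetical
ALIGNMENT=alignment.phy     # hypothetical input alignment
RUN_NAME=hybrid_test        # hypothetical run name

# Independent tree searches (-N) are spread over MPI ranks, while each
# search uses -T threads for the likelihood computations.
CMD="mpirun -np $RANKS raxmlHPC-HYBRID -T $THREADS -s $ALIGNMENT -n $RUN_NAME -m GTRGAMMA -p 12345 -N 20"

echo "total cores requested: $((RANKS * THREADS))"

if command -v raxmlHPC-HYBRID >/dev/null 2>&1 && [ -f "$ALIGNMENT" ]; then
    $CMD
else
    echo "dry run: $CMD"
fi
```
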

Database search & retrieval

  • Can benefit from using flash memory instead of disk

Applications used for different analysis types

Pairwise sequence alignment

  • ATAC, BFAST, BLAST, BLAT, Bowtie, BWA

Multiple sequence alignment (via CIPRES gateway)

  • ClustalW, MAFFT

RNA-Seq analysis

  • TopHat, Cufflinks

De novo assembly

  • ABySS, SOAPdenovo, Velvet

Phylogenetic tree inference (via CIPRES gateway)

  • BEAST with BEAGLE, GARLI, MrBayes, RAxML, RAxML-Light

Toolkits

  • BEDTools, GATK, SAMtools

Database search & retrieval

  • IntegromeDB