Difference between revisions of "8. De novo assembly with Velvet"

From IBERS Bioinformatics and HPC Wiki
Jump to: navigation, search
Line 13: Line 13:
 
  module load velvet/1.2.10
 
  module load velvet/1.2.10
 
  cd /ibers/ernie/scratch/vpl/zebra_fish
 
  cd /ibers/ernie/scratch/vpl/zebra_fish
 +
 +
The general usage of the velveth module is:
 +
 +
Usage:
 +
./velveth directory hash_length {[-file_format][-read_type][- separate|-interleaved] filename1 [filename2 ...]} {...} [options]
 +
 +
where '''directory''' is the name of the output folder and '''hash_length''' is the size of the  '''k-mers'''  in  which  reads  are  decomposed  (must  be  an  even  number).  Now  let’s  write  the  velveth  command  to  assemble  reads  from  the 6 hours  zebrafish  embryo (all  in  one  line):
 +
 +
velveth velvet_out 29 -fastq data/6h_1.trim.fastq data/6h_2.trim.fastq

Revision as of 11:43, 22 February 2016

Velvet is an assembly algorithm developed for genomic assembly, but it can be applied to transcriptome assembly as well. A more recent algorithm from the same authors, called Oases, was developed specifically for un-guided transcriptome reconstruction, but for our purposes the original velvet is enough. Other de novo transcriptome reconstruction algorithms exist, for example Trinity, which is generally more accurate but has very large computational requirements, especially for memory. Trinity offers also modules for gene expression and differential expression estimation, while velvet is limited to transcript reconstruction. We will make a similar usage guide for Trinity in another session.

Velvet is composed by two modules, velveth and velvetg, that need to be run one after the other. The first module analyzes the reads, decomposes them in sub-sequences of fixed length k (called k-mers) and builds an index of all k-mers. The second module takes as input the output of velveth and builds structures in which reads overlap, called contigs, corresponding to transcripts when the input contains RNA-seq reads. Let’s open a script for running velvet, and write the header, load the modules and point to the working directory:

#$ -S /bin/sh
#$ -cwd
#$ -q amd.q,large.q,intel.q
#$ -l h_vmem=16G
#$ -e velveth.e
#$ -N velveth
#$ -o velveth.o
module load velvet/1.2.10
cd /ibers/ernie/scratch/vpl/zebra_fish

The general usage of the velveth module is:

Usage:

./velveth directory hash_length {[-file_format][-read_type][- separate|-interleaved] filename1 [filename2 ...]} {...} [options]

where directory is the name of the output folder and hash_length is the size of the k-mers in which reads are decomposed (must be an even number). Now let’s write the velveth command to assemble reads from the 6 hours zebrafish embryo (all in one line):

velveth velvet_out 29 -fastq data/6h_1.trim.fastq data/6h_2.trim.fastq