Difference between revisions of "4. Evaluating the assembly"

From IBERS Bioinformatics and HPC Wiki
Revision as of 14:44, 6 March 2017

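There are several ways to assess the overall quality of the assembly, both quantitatively and qualitatively. See the Trinity wiki for details: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Transcriptome-Assembly-Quality-Assessment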

First of all, you can check how many contigs you have:

grep '>' Trinity.fasta | wc -l
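Note that this count includes every isoform. As a minimal sketch, assuming the headers follow Trinity's usual naming scheme (`>TRINITY_DN<n>_c<n>_g<n>_i<n>`, where the `_i<n>` suffix marks the isoform), the following counts both contigs and Trinity "genes". The file toy_Trinity.fasta and its sequences are made up for illustration; point the commands at Trinity.fasta for real data.

```shell
# Build a tiny made-up FASTA so the example runs anywhere (not real data).
cat > toy_Trinity.fasta <<'EOF'
>TRINITY_DN10_c0_g1_i1 len=12
ACGTACGTACGT
>TRINITY_DN10_c0_g1_i2 len=8
ACGTACGT
>TRINITY_DN11_c0_g1_i1 len=10
ACGTACGTAC
>TRINITY_DN11_c0_g2_i1 len=6
ACGTAC
EOF

# Number of contigs (transcripts): one header line per sequence.
contig_count=$(grep -c '^>' toy_Trinity.fasta)

# Number of "genes": drop the leading '>' and the isoform suffix
# (plus anything after it), then count the unique IDs that remain.
gene_count=$(grep '^>' toy_Trinity.fasta \
    | sed -E 's/^>//; s/_i[0-9]+( .*)?$//' \
    | sort -u | wc -l | tr -d ' ')

echo "contigs: $contig_count"
echo "genes:   $gene_count"
```

Anchoring the pattern with `^>` avoids counting a stray '>' inside a description line.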

You can also capture some basic statistics about the Trinity assembly:

$TRINITY_HOME/util/TrinityStats.pl Trinity.fasta

You can run this command directly on the command line. It is quite fast, so you don't have to submit it as a job :)
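Among other things, TrinityStats.pl reports contig length statistics such as the total assembled bases and the contig N50. As a rough illustration of how such values are derived (a sketch on a made-up file, toy_stats.fasta, and not a replacement for the script), they can be computed with awk:

```shell
# Tiny made-up FASTA with contig lengths 12, 8, 6 and 4 (not real data).
cat > toy_stats.fasta <<'EOF'
>contig_a
ACGTACGTACGT
>contig_b
ACGTACGT
>contig_c
ACGTAC
>contig_d
ACGT
EOF

# Length of each contig (multi-line sequences are summed), longest first.
lengths=$(awk '/^>/ { if (len) print len; len = 0; next }
               { len += length($0) }
               END { if (len) print len }' toy_stats.fasta | sort -rn)

# Total number of assembled bases.
total_bases=$(echo "$lengths" | awk '{ s += $1 } END { print s }')

# N50: walking from the longest contig down, the length of the contig
# at which the running total first reaches half of all assembled bases.
n50=$(echo "$lengths" | awk -v total="$total_bases" \
    '{ run += $1; if (run >= total / 2) { print $1; exit } }')

echo "total bases: $total_bases"
echo "N50:         $n50"
```

For this toy file the total is 30 bases, and the running total passes 15 at the second contig, so the N50 is 8.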

Representation of reads:

A high-quality transcriptome assembly is expected to represent most of the reads that were input to the assembler. By aligning the RNA-Seq reads back to the assembly, we can quantify this read representation. Trinity includes a script that aligns each FASTQ file of the paired read set to the assembly separately and then links up the pairs. This lets us count three cases: reads that align as proper pairs, pairs whose two reads align to different contigs (improperly paired), and pairs in which only one read (the left or the right) aligns to a contig. The following script wraps the Bowtie software to do the read alignments:

$TRINITY_HOME/util/bowtie_PE_separate_then_join.pl \
     --target Trinity.fasta \
     --seqType fq \
     --left left.fastq --right right.fastq \
     --aligner bowtie -- -p 2 --all --best --strata -m 300

Submit the above command as a job.
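To make the read categories described above concrete, here is a minimal sketch of the counting logic on a made-up table. It is not the Trinity script's own code or output format: the file toy_alignments.txt and its three-column layout (pair ID, mate, contig) are invented for illustration, and each read is assumed to have at most one alignment, which is simpler than what Bowtie reports with --all.

```shell
# Toy table: read-pair ID, mate (1 = left, 2 = right), contig it aligned to.
cat > toy_alignments.txt <<'EOF'
pairA 1 contig_1
pairA 2 contig_1
pairB 1 contig_1
pairB 2 contig_2
pairC 1 contig_3
pairD 2 contig_2
pairE 1 contig_4
pairE 2 contig_4
EOF

# Record where each mate aligned, then classify every pair at the end.
summary=$(awk '
    $2 == 1 { left[$1]  = $3 }
    $2 == 2 { right[$1] = $3 }
    { seen[$1] = 1 }
    END {
        for (id in seen) {
            if ((id in left) && (id in right)) {
                # Both mates aligned: same contig = proper, otherwise improper.
                if (left[id] == right[id]) proper++; else improper++
            } else if (id in left) {
                left_only++
            } else {
                right_only++
            }
        }
        printf "proper_pairs=%d improper_pairs=%d left_only=%d right_only=%d\n",
               proper, improper, left_only, right_only
    }' toy_alignments.txt)

echo "$summary"
```

In this toy input, pairA and pairE are proper pairs, pairB is improperly paired, and pairC and pairD each have only one aligned read.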