Difference between revisions of "4. Evaluating the assembly"
Line 22: | Line 22: | ||
Submit the above command as a job. | Submit the above command as a job. | ||
+ | |||
+ | The output files will exist in a bowtie_out/ subdirectory. You can look at the list of files generated like so: | ||
+ | |||
+ | ls -ltr bowtie_out/ | ||
+ | |||
+ | Among the outputs, you should find a name-sorted bam file (where aligned reads are sorted by their name, instead of by their coordinate) called 'bowtie_out/bowtie_out.nameSorted.bam'. Using this file and another script in the Trinity software suite, you can count the number of reads that correspond to the different alignment categories: | ||
+ | |||
+ | $TRINITY_HOME/util/SAM_nameSorted_to_uniq_count_stats.pl \ | ||
+ | bowtie_out/bowtie_out.nameSorted.bam | ||
+ | |||
+ | The output could be like: | ||
+ | |||
+ | #read_type count pct | ||
+ | proper_pairs 96611 97.38 | ||
+ | improper_pairs 868 0.87 | ||
+ | left_only 867 0.87 | ||
+ | right_only 867 0.87 | ||
+ | Total aligned rnaseq fragments: 99213 |
Revision as of 14:49, 6 March 2017
There are several ways to quantitatively as well as qualitatively assess the overall quality of the assembly. Check Trinity wiki [1].
First of all you can check how many contains you have:
grep '>' Trinity.fasta | wc -l
You also can capture some basic statistics about the Trinity assembly:
$TRINITY_HOME/util/TrinityStats.pl Trinity.fasta
You can run this command on fly. It is quite fast and you don't have to submit it as a job :)
Representation of reads:
A high quality transcriptome assembly is expected to have strong representation of the reads input to the assembler. By aligning the RNA-Seq reads back to the transcriptome assembly, we can quantify read representation. In Trinity,there is a script that will align each of the fastq files of the paired read set to the assembly separately, and then link up the pairs. This way, we can count the number of reads that are found as properly paired alignments in addition to those that align to separate contigs (improperly paired), in addition to those cases where only one of the paired reads (the left or the right read of the pair) aligns to a contig. The following script wraps the Bowtie software to do the read alignments:
$TRINITY_HOME/util/bowtie_PE_separate_then_join.pl \ --target Trinity.fasta \ --seqType fq \ --left left.fastq --right right.fastq \ --aligner bowtie -- -p 2 --all --best --strata -m 300
Submit the above command as a job.
The output files will exist in a bowtie_out/ subdirectory. You can look at the list of files generated like so:
ls -ltr bowtie_out/
Among the outputs, you should find a name-sorted bam file (where aligned reads are sorted by their name, instead of by their coordinate) called 'bowtie_out/bowtie_out.nameSorted.bam'. Using this file and another script in the Trinity software suite, you can count the number of reads that correspond to the different alignment categories:
$TRINITY_HOME/util/SAM_nameSorted_to_uniq_count_stats.pl \ bowtie_out/bowtie_out.nameSorted.bam
The output could be like:
#read_type count pct proper_pairs 96611 97.38 improper_pairs 868 0.87 left_only 867 0.87 right_only 867 0.87 Total aligned rnaseq fragments: 99213