7. Evaluating the assembly with ExN50
This section describes an alternative statistic, the ExN50 value, which we assert is more useful for assessing the quality of a transcriptome assembly. The ExN50 is the N50 contig statistic restricted to the most highly expressed transcripts, namely those that together account for x% of the total normalized expression (E90, for example, covers the top 90%). Compute it like so:
$TRINITY_HOME/util/misc/contig_ExN50_statistic.pl Trinity_trans.TMM.EXPR.matrix \
    Trinity.fasta > ExN50.stats
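To make the definition concrete, here is a minimal sketch of the idea behind a single ExN50 value. It is not the logic of contig_ExN50_statistic.pl itself, which may differ in details such as rounding and how expression is summed across samples. The function name exn50 and the three-column input layout (id, length, expression, tab-separated) are assumptions made for illustration only.

```shell
# Hypothetical helper, for illustration only.
# $1: tab-separated file with columns id, length, expression (assumed layout)
# $2: expression percentile x, e.g. 90 for E90
exn50() {
    # Order transcripts from most to least expressed.
    sort -t "$(printf '\t')" -k3,3nr "$1" |
    awk -F '\t' -v x="$2" '
        { len[NR] = $2; ex[NR] = $3; total += $3 }
        END {
            # Emit lengths of the top-expressed transcripts that together
            # account for x% of the total expression.
            for (i = 1; i <= NR; i++) {
                print len[i]
                cum += ex[i]
                if (cum >= total * x / 100) break
            }
        }' |
    # Order the retained lengths from longest to shortest.
    sort -nr |
    awk '
        { len[NR] = $1; total += $1 }
        END {
            # N50: the length at which the running sum of the sorted
            # lengths first reaches half of the total length.
            for (i = 1; i <= NR; i++) {
                cum += len[i]
                if (cum >= total / 2) { print len[i]; break }
            }
        }'
}
```

Calling exn50 on such a table with x set to 90 would print the N50 of only those transcripts that make up the top 90% of expression. A long but barely expressed contig is excluded from the calculation until x approaches 100.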
View the contents of the output file, ExN50.stats:
cat ExN50.stats | column -t
A sample file:
#E    min_expr    E-N50  num_transcripts
E3    320852.974  290    1
E5    20156.591   290    2
E6    20156.591   290    3
E7    20156.591   415    4
E8    20156.591   427    5
E9    14609.172   610    6
E10   9892.739    801    7
...
E79   151.033     1716   1030
E80   149.749     1757   1107
E81   139.449     1780   1189
E82   133.932     1801   1278
E83   118.854     1819   1375
E84   101.459     1848   1481
E85   93.910      1860   1596
E86   87.649      1897   1722
E87   80.252      1920   1860
E88   72.408      1939   2011
E89   65.075      1984   2178
E90   57.569      2008   2361
E91   51.728      2022   2565
E92   47.303      2043   2794
E93   41.027      2091   3053
E94   35.334      2132   3350
E95   30.830      2166   3695
E96   25.734      2220   4107
E97   20.764      2245   4613
E98   14.500      2234   5273
E99   12.416      2152   6181
E100  0.037       2066   7704
The N50 based on all expressed transcript contigs (E100 in the above sample) is 2066, but the N50 peaks at E97 with a value of 2245. The peak ExN50 can vary considerably depending on the assembly, and it can often be an indicator of the quality of the assembly.
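Rather than scanning the table by eye, the peak can be pulled out with a short awk filter. This is a minimal sketch that assumes the four-column layout shown in the sample, with the E-N50 value in the third column. The function name peak_exn50 is hypothetical and not part of Trinity.

```shell
# Hypothetical helper: print the Ex level with the highest E-N50 value.
# Assumes the columns are #E, min_expr, E-N50, num_transcripts.
peak_exn50() {
    awk '
        # Skip the header line and any line with fewer than three fields.
        !/^#/ && NF >= 3 && $3 + 0 > max { max = $3 + 0; ex = $1 }
        # Report the first row that reached the maximum, if any was found.
        END { if (ex != "") print ex, max }
    ' "$1"
}
```

Run it as peak_exn50 ExN50.stats. On ties, the first (lowest) Ex level with the maximum value is reported.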
Plotting the ExN50 statistics:
$TRINITY_HOME/util/misc/plot_ExN50_statistic.Rscript ExN50.stats
xpdf ExN50.stats.plot.pdf
Examples of ExN50 plots based on assemblies varying the number of input reads are available here: [1]