Difference between revisions of "5. Transcript expression quantitation"
(Created page with "'''Abundance estimation using RSEM''' To estimate the expression levels of the Trinity-reconstructed transcripts, you can use the strategy supported by the RSEM software invo...") |
|||
Line 20: | Line 20: | ||
--est_method RSEM --aln_method bowtie --trinity_mode --coordsort_bam \ | --est_method RSEM --aln_method bowtie --trinity_mode --coordsort_bam \ | ||
--output_dir /ibers/ernie/scratch/userName/outputName --prep_reference | --output_dir /ibers/ernie/scratch/userName/outputName --prep_reference | ||
+ | |||
+ | The outputs generated from running the command above will exist in the RSEM_name/ directory, as is indicated in the --output_dir parameter above. Since this is the first time of running RSEM, you need to prepare the Trinity.fasta file for alignment with the bowtie software, and so you have tol simply include the --prep_reference parameter to invoke that functionality. | ||
+ | |||
+ | The primary output generated by RSEM is the file containing the expression values for each of the transcripts. Examine this file like so: | ||
+ | |||
+ | head RSEM_name/name.isoforms.results | column -t | ||
+ | |||
+ | transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct | ||
+ | TRINITY_DN0_c0_g1_i1 TRINITY_DN0_c0_g1 2010 1881.38 225.00 72.01 65.65 100.00 | ||
+ | TRINITY_DN0_c0_g2_i1 TRINITY_DN0_c0_g2 2011 1882.38 0.00 0.00 0.00 100.00 | ||
+ | TRINITY_DN1000_c0_g1_i1 TRINITY_DN1000_c0_g1 1664 1535.38 208.00 81.57 74.37 100.00 | ||
+ | TRINITY_DN1001_c0_g1_i1 TRINITY_DN1001_c0_g1 2948 2819.38 8.00 1.71 1.56 100.00 | ||
+ | TRINITY_DN1002_c0_g1_i1 TRINITY_DN1002_c0_g1 351 222.54 1.00 2.71 2.47 100.00 | ||
+ | TRINITY_DN1003_c0_g1_i1 TRINITY_DN1003_c0_g1 1913 1784.38 91.00 30.71 28.00 100.00 | ||
+ | TRINITY_DN1004_c0_g1_i1 TRINITY_DN1004_c0_g1 2395 2266.38 167.00 44.37 40.45 100.00 | ||
+ | TRINITY_DN1005_c0_g1_i1 TRINITY_DN1005_c0_g1 3076 2947.38 425.00 86.82 79.16 100.00 | ||
+ | TRINITY_DN1006_c0_g1_i1 TRINITY_DN1006_c0_g1 1141 1012.38 253.00 150.47 137.19 100.00 |
Revision as of 15:09, 6 March 2017
Abundance estimation using RSEM
To estimate the expression levels of the Trinity-reconstructed transcripts, you can use the strategy supported by the RSEM software involving read alignment followed by expectation maximization to assign reads according to maximum likelihood estimates. In essence, you have to first align the original rna-seq reads back against the Trinity transcripts, then run RSEM to estimate the number of rna-seq fragments that map to each contig. Because the abundance of individual transcripts may significantly differ between samples, the reads from each sample (and each biological replicate) must be examined separately, obtaining sample-specific abundance values.
There is a script to facilitate running of RSEM on Trinity transcript assemblies. The script runs the Bowtie aligner to align reads to the Trinity transcripts, and RSEM will then evaluate those alignments to estimate expression values. Again, you need to run this separately for each sample and biological replicate (ie. each pair of fastq files).
#$ -S /bin/sh #$ -cwd #$ -q amd.q,large.q,intel.q #$ -l h_vmem=50G #$ -N RSEM module load perl/5.22.2 module load trinity/2.2.0
RSEM command:
/cm/shared/apps/trinity/2.2.0/util/align_and_estimate_abundance.pl --seqType fq \ --left /ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_1.fq.gz.p \ --right /ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_2.fq.gz.p \ --transcripts /ibers/ernie/scratch/userName/trinity_out_dir/Trinity.fasta --output_prefix RSEM_name \ --est_method RSEM --aln_method bowtie --trinity_mode --coordsort_bam \ --output_dir /ibers/ernie/scratch/userName/outputName --prep_reference
The outputs generated from running the command above will exist in the RSEM_name/ directory, as is indicated in the --output_dir parameter above. Since this is the first time of running RSEM, you need to prepare the Trinity.fasta file for alignment with the bowtie software, and so you have tol simply include the --prep_reference parameter to invoke that functionality.
The primary output generated by RSEM is the file containing the expression values for each of the transcripts. Examine this file like so:
head RSEM_name/name.isoforms.results | column -t
transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct TRINITY_DN0_c0_g1_i1 TRINITY_DN0_c0_g1 2010 1881.38 225.00 72.01 65.65 100.00 TRINITY_DN0_c0_g2_i1 TRINITY_DN0_c0_g2 2011 1882.38 0.00 0.00 0.00 100.00 TRINITY_DN1000_c0_g1_i1 TRINITY_DN1000_c0_g1 1664 1535.38 208.00 81.57 74.37 100.00 TRINITY_DN1001_c0_g1_i1 TRINITY_DN1001_c0_g1 2948 2819.38 8.00 1.71 1.56 100.00 TRINITY_DN1002_c0_g1_i1 TRINITY_DN1002_c0_g1 351 222.54 1.00 2.71 2.47 100.00 TRINITY_DN1003_c0_g1_i1 TRINITY_DN1003_c0_g1 1913 1784.38 91.00 30.71 28.00 100.00 TRINITY_DN1004_c0_g1_i1 TRINITY_DN1004_c0_g1 2395 2266.38 167.00 44.37 40.45 100.00 TRINITY_DN1005_c0_g1_i1 TRINITY_DN1005_c0_g1 3076 2947.38 425.00 86.82 79.16 100.00 TRINITY_DN1006_c0_g1_i1 TRINITY_DN1006_c0_g1 1141 1012.38 253.00 150.47 137.19 100.00