Difference between revisions of "5. Transcript expression quantitation"

From IBERS Bioinformatics and HPC Wiki
Jump to: navigation, search
(Created page with "'''Abundance estimation using RSEM''' To estimate the expression levels of the Trinity-reconstructed transcripts, you can use the strategy supported by the RSEM software invo...")
 
Line 20: Line 20:
 
  --est_method RSEM  --aln_method bowtie --trinity_mode --coordsort_bam \
 
  --est_method RSEM  --aln_method bowtie --trinity_mode --coordsort_bam \
 
  --output_dir /ibers/ernie/scratch/userName/outputName --prep_reference
 
  --output_dir /ibers/ernie/scratch/userName/outputName --prep_reference
 +
 +
The outputs generated from running the command above will exist in the RSEM_name/ directory, as is indicated in the --output_dir parameter above. Since this is the first time of running RSEM, you need to prepare the Trinity.fasta file for alignment with the bowtie software, and so you have tol simply include the --prep_reference parameter to invoke that functionality.
 +
 +
The primary output generated by RSEM is the file containing the expression values for each of the transcripts. Examine this file like so:
 +
 +
head RSEM_name/name.isoforms.results | column -t
 +
 +
transcript_id            gene_id              length  effective_length  expected_count  TPM    FPKM    IsoPct
 +
TRINITY_DN0_c0_g1_i1    TRINITY_DN0_c0_g1    2010    1881.38          225.00          72.01  65.65  100.00
 +
TRINITY_DN0_c0_g2_i1    TRINITY_DN0_c0_g2    2011    1882.38          0.00            0.00    0.00    100.00
 +
TRINITY_DN1000_c0_g1_i1  TRINITY_DN1000_c0_g1  1664    1535.38          208.00          81.57  74.37  100.00
 +
TRINITY_DN1001_c0_g1_i1  TRINITY_DN1001_c0_g1  2948    2819.38          8.00            1.71    1.56    100.00
 +
TRINITY_DN1002_c0_g1_i1  TRINITY_DN1002_c0_g1  351    222.54            1.00            2.71    2.47    100.00
 +
TRINITY_DN1003_c0_g1_i1  TRINITY_DN1003_c0_g1  1913    1784.38          91.00          30.71  28.00  100.00
 +
TRINITY_DN1004_c0_g1_i1  TRINITY_DN1004_c0_g1  2395    2266.38          167.00          44.37  40.45  100.00
 +
TRINITY_DN1005_c0_g1_i1  TRINITY_DN1005_c0_g1  3076    2947.38          425.00          86.82  79.16  100.00
 +
TRINITY_DN1006_c0_g1_i1  TRINITY_DN1006_c0_g1  1141    1012.38          253.00          150.47  137.19  100.00

Revision as of 15:09, 6 March 2017

Abundance estimation using RSEM

To estimate the expression levels of the Trinity-reconstructed transcripts, you can use the strategy supported by the RSEM software involving read alignment followed by expectation maximization to assign reads according to maximum likelihood estimates. In essence, you have to first align the original rna-seq reads back against the Trinity transcripts, then run RSEM to estimate the number of rna-seq fragments that map to each contig. Because the abundance of individual transcripts may significantly differ between samples, the reads from each sample (and each biological replicate) must be examined separately, obtaining sample-specific abundance values.

There is a script to facilitate running of RSEM on Trinity transcript assemblies. The script runs the Bowtie aligner to align reads to the Trinity transcripts, and RSEM will then evaluate those alignments to estimate expression values. Again, you need to run this separately for each sample and biological replicate (ie. each pair of fastq files).

#$ -S /bin/sh
#$ -cwd
#$ -q amd.q,large.q,intel.q
#$ -l h_vmem=50G
#$ -N RSEM
module load perl/5.22.2
module load trinity/2.2.0

RSEM command:

/cm/shared/apps/trinity/2.2.0/util/align_and_estimate_abundance.pl --seqType fq \
--left /ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_1.fq.gz.p \
--right /ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_2.fq.gz.p \
--transcripts /ibers/ernie/scratch/userName/trinity_out_dir/Trinity.fasta --output_prefix RSEM_name \
--est_method RSEM  --aln_method bowtie --trinity_mode --coordsort_bam \
--output_dir /ibers/ernie/scratch/userName/outputName --prep_reference

The outputs generated from running the command above will exist in the RSEM_name/ directory, as is indicated in the --output_dir parameter above. Since this is the first time of running RSEM, you need to prepare the Trinity.fasta file for alignment with the bowtie software, and so you have tol simply include the --prep_reference parameter to invoke that functionality.

The primary output generated by RSEM is the file containing the expression values for each of the transcripts. Examine this file like so:

head RSEM_name/name.isoforms.results | column -t
transcript_id            gene_id               length  effective_length  expected_count  TPM     FPKM    IsoPct
TRINITY_DN0_c0_g1_i1     TRINITY_DN0_c0_g1     2010    1881.38           225.00          72.01   65.65   100.00
TRINITY_DN0_c0_g2_i1     TRINITY_DN0_c0_g2     2011    1882.38           0.00            0.00    0.00    100.00
TRINITY_DN1000_c0_g1_i1  TRINITY_DN1000_c0_g1  1664    1535.38           208.00          81.57   74.37   100.00
TRINITY_DN1001_c0_g1_i1  TRINITY_DN1001_c0_g1  2948    2819.38           8.00            1.71    1.56    100.00
TRINITY_DN1002_c0_g1_i1  TRINITY_DN1002_c0_g1  351     222.54            1.00            2.71    2.47    100.00
TRINITY_DN1003_c0_g1_i1  TRINITY_DN1003_c0_g1  1913    1784.38           91.00           30.71   28.00   100.00
TRINITY_DN1004_c0_g1_i1  TRINITY_DN1004_c0_g1  2395    2266.38           167.00          44.37   40.45   100.00
TRINITY_DN1005_c0_g1_i1  TRINITY_DN1005_c0_g1  3076    2947.38           425.00          86.82   79.16   100.00
TRINITY_DN1006_c0_g1_i1  TRINITY_DN1006_c0_g1  1141    1012.38           253.00          150.47  137.19  100.00