6. Construct an expression matrix

From IBERS Bioinformatics and HPC Wiki
 

Revision as of 15:32, 6 March 2017

Now that you have expression estimates for every transcript in every sample, you can combine them into matrices with transcript IDs in the rows and sample replicate names in the columns. You will make two matrices: one containing the estimated counts, and another containing TPM expression values that are cross-sample normalized using the TMM method. The following Trinity script does all of this for you; you specify the method you used for expression estimation and provide the list of per-sample abundance estimate files:

#$ -S /bin/sh
#$ -cwd
#$ -q large.q
#$ -l h_vmem=300G
#$ -N Matrix

Modules you have to load:

module load perl/5.22.2
module load trinity/2.2.0
module load R/R-3.1.2
/cm/shared/apps/trinity/2.2.0/util/abundance_estimates_to_matrix.pl --est_method RSEM --out_prefix S5_Trinity_trans_all \
--name_sample_by_basedir /ibers/ernie/scratch/userName/RSEM_A/A.isoforms.results \
/ibers/ernie/scratch/userName/RSEM_B/B.isoforms.results \
/ibers/ernie/scratch/userName/RSEM_C/Cl.isoforms.results \
/ibers/ernie/scratch/userName/RSEM_D/D.isoforms.results \
/ibers/ernie/scratch/userName/RSEM_E/E.isoforms.results
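
Before submitting the job, it can help to confirm that every abundance file listed in the command exists and is not empty, since a mistyped path only shows up once the job has started. The following is a minimal sketch of such a check; check_inputs is a hypothetical helper written for this page, not part of Trinity, and the script name matrix.sh in the usage comment is an assumption.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of Trinity): report every listed
# file that is missing or empty. Returns 1 if any file fails, 0 otherwise.
check_inputs() {
    missing=0
    for f in "$@"; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use (userName is the placeholder from the script above, and
# matrix.sh is an assumed name for the job script):
#   check_inputs /ibers/ernie/scratch/userName/RSEM_*/*.isoforms.results \
#       && qsub matrix.sh
```

If the glob in the usage comment matches nothing, the shell passes the pattern through unchanged, so it is reported as missing rather than silently ignored.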

You should find a matrix file called 'S5_Trinity_trans_all.counts.matrix' (named after the --out_prefix given in the command), which contains the counts of RNA-Seq fragments mapped to each transcript. Examine the first few lines of the counts matrix:

head -n5 S5_Trinity_trans_all.counts.matrix | column -t
                         A    B    C    D     E
TRINITY_DN323_c0_g1_i1   846  782  792  1403  1397
TRINITY_DN2438_c0_g1_i1  418  364  353  13    10
TRINITY_DN4819_c0_g1_i1  136  128  165  58    64
TRINITY_DN1223_c0_g1_i1  7    4    6    6     9
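
A quick sanity check on the counts matrix is to total each sample's column, which gives a rough idea of how many fragments were assigned per sample. The sketch below assumes the matrix is tab-delimited with a header row whose first field is empty; column_sums is a hypothetical helper, not a Trinity tool, and the demo file contains only the four rows shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Trinity): print the total of each sample
# column of a tab-delimited matrix. Assumes a header row of sample names,
# optionally preceded by an empty first field.
column_sums() {
    awk -F'\t' '
        NR == 1 {
            # Skip the empty leading header field if it is present
            off = ($1 == "") ? 1 : 0
            n = NF - off
            for (i = 1; i <= n; i++) name[i] = $(i + off)
            next
        }
        {
            # Field 1 is the transcript ID, then one value per sample
            for (i = 1; i <= n; i++) sum[i] += $(i + 1)
        }
        END { for (i = 1; i <= n; i++) printf "%s\t%.2f\n", name[i], sum[i] }
    ' "$1"
}

# Demo on the four example rows from this page, written to a temporary file
demo=$(mktemp)
printf '\tA\tB\tC\tD\tE\n' > "$demo"
printf 'TRINITY_DN323_c0_g1_i1\t846\t782\t792\t1403\t1397\n' >> "$demo"
printf 'TRINITY_DN2438_c0_g1_i1\t418\t364\t353\t13\t10\n' >> "$demo"
printf 'TRINITY_DN4819_c0_g1_i1\t136\t128\t165\t58\t64\n' >> "$demo"
printf 'TRINITY_DN1223_c0_g1_i1\t7\t4\t6\t6\t9\n' >> "$demo"
column_sums "$demo"
rm -f "$demo"
```

On the real file you would run column_sums S5_Trinity_trans_all.counts.matrix instead. Totals are printed with two decimals because RSEM expected counts need not be whole numbers.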