6. Construct an expression matrix
Now, given the expression estimates for each transcript in each sample, you're going to pull the expression values together into matrices with transcript IDs in the rows and sample replicate names in the columns. You'll make two matrices: one containing the estimated counts, and another containing the TPM expression values, cross-sample normalized using the TMM method. The following Trinity script does all of this for you; you indicate the method you used for expression estimation and provide the list of individual sample abundance estimate files:
#$ -S /bin/sh
#$ -cwd
#$ -q large.q
#$ -l h_vmem=300G
#$ -N Matrix
Modules you have to load:
module load perl/5.22.2
module load trinity/2.2.0
module load R/R-3.1.2
/cm/shared/apps/trinity/2.2.0/util/abundance_estimates_to_matrix.pl --est_method RSEM \
    --out_prefix S5_Trinity_trans_all \
    --name_sample_by_basedir \
    /ibers/ernie/scratch/userName/RSEM_A/A.isoforms.results \
    /ibers/ernie/scratch/userName/RSEM_B/B.isoforms.results \
    /ibers/ernie/scratch/userName/RSEM_C/Cl.isoforms.results \
    /ibers/ernie/scratch/userName/RSEM_D/D.isoforms.results \
    /ibers/ernie/scratch/userName/RSEM_E/E.isoforms.results
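Before submitting the job, it can help to confirm that every abundance file in the list exists and is non-empty, since a mistyped path is easy to miss in a long command. The snippet below is only a sketch: check_inputs is a hypothetical helper written for this tutorial, not something Trinity provides, and the paths in the example call are the ones from the job script above.

```shell
# Hypothetical helper (not part of Trinity): verify that every
# abundance file exists and is non-empty before building the matrix.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            # Report each problem file on stderr and keep checking the rest.
            echo "missing or empty: $f" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only when no file was missing or empty.
    [ "$missing" -eq 0 ]
}

# Example call, using two of the paths from the job script above:
# check_inputs /ibers/ernie/scratch/userName/RSEM_A/A.isoforms.results \
#              /ibers/ernie/scratch/userName/RSEM_B/B.isoforms.results
```

The function returns a non-zero status if any file fails the check, so it can guard the matrix command, for example with check_inputs ... || exit 1.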
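You should find a counts matrix named after the --out_prefix value, written here as 'Trinity_trans.counts.matrix' (with the prefix used in the command above, the file would be 'S5_Trinity_trans_all.counts.matrix'). It contains the counts of RNA-Seq fragments mapped to each transcript. Examine the first few lines of the counts matrix: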
head -n5 Trinity_trans.counts.matrix | column -t
                         A    B    C    D     E
TRINITY_DN323_c0_g1_i1   846  782  792  1403  1397
TRINITY_DN2438_c0_g1_i1  418  364  353  13    10
TRINITY_DN4819_c0_g1_i1  136  128  165  58    64
TRINITY_DN1223_c0_g1_i1  7    4    6    6     9
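As a quick sanity check on the counts matrix, you can total the counts in each sample column, which gives a rough view of how sequencing depth differs between samples. This is a minimal sketch, not part of Trinity: column_totals is a hypothetical helper, and it assumes the matrix is tab-delimited with a header line whose first field is empty, followed by the sample names.

```shell
# Hypothetical helper: print the total count per sample column of a
# tab-delimited counts matrix whose first line is the header.
column_totals() {
    awk -F'\t' '
        # Header line: remember the sample names (field 1 is assumed empty).
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
        # Data lines: add each count to the running total for its column.
        { for (i = 2; i <= NF; i++) total[i] += $i }
        # Print one "sample<TAB>total" line per column, in column order.
        END { for (i = 2; i <= n; i++) printf "%s\t%.2f\n", name[i], total[i] }
    ' "$1"
}

# Example call on the matrix examined above:
# column_totals Trinity_trans.counts.matrix
```

Totals are printed with two decimal places because RSEM expected counts need not be whole numbers.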