8. Expression analysis

From IBERS Bioinformatics and HPC Wiki
Revision as of 16:01, 6 March 2017 by Val1

A plethora of tools are currently available for identifying differentially expressed transcripts from RNA-Seq data. Of these, edgeR and DESeq2 are very popular and are considered highly accurate. edgeR is an R package distributed through Bioconductor, and the Trinity package provides support for using it.

Having biological replicates for each condition is crucial for accurate detection of differentially expressed transcripts. In general, three or more biological replicates per experimental condition are highly recommended, and the example below assumes a data set with three replicates for each condition.

Create a tab-delimited samples.txt file in which each line gives the name of a condition followed by the name of one of its biological replicates. The replicate names must exactly match the column headings of your counts matrix, so a convenient starting point is the matrix header:

head -n1 Trinity_trans.counts.matrix | tee samples.txt
A_rep1  A_rep2  A_rep3  B_rep1  B_rep2  B_rep3  C_rep1  C_rep2  C_rep3

Now edit samples.txt so that it contains one line per replicate in the tab-delimited, two-column format:

sample_name    unique_replicate_name
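If your replicate names follow the `<condition>_rep<N>` pattern used in this example, the two-column file can also be generated from the matrix header instead of being edited by hand. The sketch below is a minimal bash/awk version under that naming assumption; `make_samples_file` is a hypothetical helper, not part of Trinity. It drops empty headings (the column above the transcript IDs normally has none) and passes any heading that does not end in `_rep<N>` through unchanged, with a warning.

```shell
#!/usr/bin/env bash
# Sketch: write a tab-delimited samples.txt (condition <TAB> replicate) from the
# header line of a counts matrix.
# ASSUMPTION: replicate columns are named <condition>_rep<N> (e.g. A_rep1, B_rep3),
# so the condition is everything before the trailing "_rep<N>".
# make_samples_file is a hypothetical helper name, not part of Trinity.
make_samples_file() {
    local matrix="$1" out="$2"
    head -n1 "$matrix" |
        tr '\t' '\n' |                    # one column heading per line
        sed '/^$/d' |                     # drop empty headings (e.g. above the transcript IDs)
        awk -v OFS='\t' '{
            cond = $0
            sub(/_rep[0-9]+$/, "", cond)  # strip the trailing replicate suffix
            if (cond == $0)
                print "WARNING: \"" $0 "\" does not end in _rep<N>; check this row by hand" > "/dev/stderr"
            print cond, $0
        }' > "$out"
}

# Usage:
#   make_samples_file Trinity_trans.counts.matrix samples.txt
```

A warning on stderr means a heading did not fit the assumed pattern, so that row should be corrected by hand before the file is used.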

The finished file should look like this:

cat samples.txt
A     A_rep1
A     A_rep2
A     A_rep3
B     B_rep1
B     B_rep2
B     B_rep3
C     C_rep1
C     C_rep2
C     C_rep3
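Before moving on to the differential expression step, a quick consistency check can catch a replicate name that does not match any column heading, or a condition with fewer replicates than recommended. The following is a sketch in bash and awk; `check_samples_file` is a hypothetical helper name, and the default threshold of three replicates simply mirrors the recommendation above. Only the header line of the matrix is read, so the check stays fast on large matrices.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check samples.txt against a counts matrix.
# Exits 1 if any replicate name (column 2) is not a column heading of the matrix;
# warns on stderr if a condition (column 1) has fewer than min_reps replicates (default 3).
# check_samples_file is a hypothetical helper name, not part of Trinity.
check_samples_file() {
    local matrix="$1" samples="$2" min_reps="${3:-3}"
    # Feed only the matrix header as the first input ("-" = stdin), then samples.txt.
    head -n1 "$matrix" | awk -F'\t' -v min="$min_reps" '
        NR == FNR {                       # header line: remember every non-empty heading
            for (i = 1; i <= NF; i++) if ($i != "") header[$i] = 1
            next
        }
        NF == 0 { next }                  # ignore blank lines in samples.txt
        {
            if (!($2 in header)) {
                print "ERROR: \"" $2 "\" (line " FNR ") is not a column in the matrix header" > "/dev/stderr"
                bad = 1
            }
            reps[$1]++                    # count replicates per condition
        }
        END {
            for (c in reps)
                if (reps[c] < min)
                    print "WARNING: condition \"" c "\" has only " reps[c] " replicate(s)" > "/dev/stderr"
            exit bad                      # 0 if every name matched, 1 otherwise
        }
    ' - "$samples"
}

# Usage:
#   check_samples_file Trinity_trans.counts.matrix samples.txt
```

Because the check splits on tabs, a samples.txt that was accidentally written with spaces is also reported, since no second column is found on those lines.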