6.4 Differential Expression using cuffdiff

From IBERS Bioinformatics and HPC Wiki
Revision as of 10:41, 18 February 2016 by Vpl

One of the stand-alone tools in the cufflinks package, cuffdiff, estimates differential expression. You can use it to compare two conditions, for example control and disease, or wild-type and mutant. In this case, we want to identify genes that are differentially expressed between two developmental stages. You don’t need to load a separate module to use cuffdiff, since it is part of the cufflinks module, so let’s just create an output folder to store its results:

$ mkdir cdiff_out

The general format of the cuffdiff command is:

cuffdiff [options]* <transcripts.gtf> \
    <sample1_replicate1.sam[,...,sample1_replicateM.sam]> \
    <sample2_replicate1.sam[,...,sample2_replicateM.sam]>

where the inputs are an annotation file and the aligned reads (in either SAM or BAM format) for the two conditions to be compared; replicates of the same condition are given as a comma-separated list. You can now prepare the script for running cuffdiff: open a text file (let’s call it cuffdiff.sh) and write the job header, the module-loading command, and the directive that runs the job from your current working directory:

#$ -S /bin/sh
#$ -cwd
#$ -q amd.q,large.q,intel.q
#$ -l h_vmem=16G
#$ -e cuffdiff.e
#$ -N cuffdiff
#$ -o cuffdiff.o
module load cufflinks/2.2.1

Now add the command that runs cuffdiff. It is a single command; the trailing backslashes only let it continue over several lines:

cuffdiff -o cdiff_out -L ZV9_2cells,ZV9_6h -b genome/Danio_rerio.Zv9.66.dna.fa -u \
--library-type fr-unstranded \
annotations/Zebrafish_refGene.gtf tophat_out/2cells/accepted_hits.bam tophat_out/6hours/accepted_hits.bam

The options that you used above are:

-o: output directory,

-L: labels for the different conditions,

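-b, -u, --library-type: same meaning as described before for cufflinks (fragment bias correction using the genome FASTA file, multi-read correction, and the library preparation protocol)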
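Putting the pieces together, the sketch below writes the complete cuffdiff.sh from the header and command shown above, warns about any input file that cannot be found, and shows how the job would be submitted. It assumes you run it from the working directory that contains genome/, annotations/ and tophat_out/, and that the cluster uses Grid Engine's qsub, as the #$ directives suggest; the input check is an illustrative addition, not part of the original workflow.

```shell
#!/bin/sh
# Sketch: assemble cuffdiff.sh from the header and command above.
# Assumption: run from the working directory holding genome/, annotations/
# and tophat_out/ (-cwd makes the job's relative paths start there).

# The quoted 'EOF' stops the shell from expanding anything inside,
# so the "#$" Grid Engine directives are written out literally.
cat > cuffdiff.sh <<'EOF'
#$ -S /bin/sh
#$ -cwd
#$ -q amd.q,large.q,intel.q
#$ -l h_vmem=16G
#$ -e cuffdiff.e
#$ -N cuffdiff
#$ -o cuffdiff.o
module load cufflinks/2.2.1

cuffdiff -o cdiff_out -L ZV9_2cells,ZV9_6h -b genome/Danio_rerio.Zv9.66.dna.fa -u \
    --library-type fr-unstranded \
    annotations/Zebrafish_refGene.gtf \
    tophat_out/2cells/accepted_hits.bam \
    tophat_out/6hours/accepted_hits.bam
EOF

# Warn (on stderr) about any input that is missing before the job is queued.
for f in genome/Danio_rerio.Zv9.66.dna.fa annotations/Zebrafish_refGene.gtf \
         tophat_out/2cells/accepted_hits.bam tophat_out/6hours/accepted_hits.bam; do
    [ -e "$f" ] || echo "missing input: $f" >&2
done

# Make sure the output folder exists (no error if it already does).
mkdir -p cdiff_out

# On the cluster, submit the job to the queue with:
# qsub cuffdiff.sh
```

Checking the inputs first is cheap and avoids waiting in the queue only for the job to fail on a mistyped path; the error and output files named by -e and -o (cuffdiff.e, cuffdiff.o) are where to look if the job itself fails.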