Difference between revisions of "6.2 Read mapping using tophat"

From IBERS Bioinformatics and HPC Wiki
Jump to: navigation, search
Line 27: Line 27:
  
 
Now write in the script file the tophat command (all in one line):  
 
Now write in the script file the tophat command (all in one line):  
  tophat --solexa-quals -g 2 --library-type fr-unstranded -o tophat_out/2cells bt2Index/ZV9 data/2cells_1.trim.fastq data/2cells_2.trim.fastq
+
  tophat --solexa-quals -g 2 --library-type fr-unstranded -o tophat_out/2cells bt2Index/ZV9 data/2cells_1.trim.fastq \
 +
data/2cells_2.trim.fastq

Revision as of 13:36, 4 February 2016

There are numerous tools performing short read alignment and the choice of aligner should be carefully made according to the analysis goals/requirements. Here you will use tophat2, a widely used ultrafast aligner that performs spliced alignments. Let’s first load the tophat module so that you can see its several parameters:

$ module load tophat/2.0.14
$ tophat --help

The general format of the tophat command is:

$ tophat [options]* <index_base> <reads_1> <reads_2>

where the last two arguments are the fastq files of the paired end trimmed reads, and the argument before is the prefix of the indexed genome which in your case is Z12 (unless you choose a different one). Let’s create a folder in which you will run and store the tophat output:

$ mkdir tophat_out

Now open a text file with a text editor that you will use to align the 2-cells zebrafish embryo sample, calling the file with a meaningful name (for example tophat_2cells.sh), and write the header for the scheduler as you did before for running bowtie2-build. Load the needed modules and point the script to the folder you just created. You script should look something like this:

#$ -S /bin/sh
#$ -cwd
#$ -q amd.q,large.q,intel.q
#$ -l h_vmem=16G
#$ -e run_tophat_2cells.e
#$ -N tophat_2cells
#$ -o tophat_2cells.o
module load tophat/2.0.14
module load samtools/0.1.19
cd /ibers/ernie/home/vpl/zebrafish

Here you need to take an extra step to solve a compatibility issue between tophat and some versions of samtools (a suite of utilities for handling SAM/BAM files). You will need to add to your path an older version of samtools. Write the following in your script (in one line):

export PATH=//cineca/prod/applications/samtools/0.1.19/gnu-- 4.8.3/bin/:$PATH

Now write in the script file the tophat command (all in one line):

tophat --solexa-quals -g 2 --library-type fr-unstranded -o tophat_out/2cells bt2Index/ZV9 data/2cells_1.trim.fastq \
data/2cells_2.trim.fastq