Difference between revisions of "5. Trimming"

From IBERS Bioinformatics and HPC Wiki

Revision as of 16:02, 27 January 2016

We will use Trimmomatic for read trimming and filtering, and for adapter removal.

Let’s look at its general usage. To run Trimmomatic you need to specify the full path to its jar file (/cm/shared/apps/trimmomatic/0.33/trimmomatic-0.33.jar):

$ java -jar /cm/shared/apps/trimmomatic/0.33/trimmomatic-0.33.jar -h

Usage:

 PE [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] [-basein <inputBase> | <inputFile1> <inputFile2>]
      [-baseout <outputBase> | <outputFile1P> <outputFile1U> <outputFile2P> <outputFile2U>] <trimmer1>...

or:

SE [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] <inputFile> <outputFile> <trimmer1>...
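The PE form takes two input files followed by four output files, in a fixed order. The sketch below assembles such a command and prints it instead of running it, so the argument order can be checked before submitting a job. The function name `build_pe_cmd`, the `trimmed` output directory and the input naming scheme (`<sample>_1.fastq`, `<sample>_2.fastq`) are illustrative assumptions, not part of Trimmomatic.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a Trimmomatic PE command for one sample.
# Assumes inputs are named <indir>/<sample>_1.fastq and <indir>/<sample>_2.fastq.
JAR=/cm/shared/apps/trimmomatic/0.33/trimmomatic-0.33.jar

build_pe_cmd() {
    sample=$1   # sample prefix, e.g. 2cells
    indir=$2    # directory holding the un-trimmed fastq files
    outdir=$3   # separate directory for the trimmed output (assumed name)
    shift 3     # everything left over is the list of trimmers

    # Refuse to write next to the input, so the originals cannot be overwritten.
    if [ "$indir" = "$outdir" ]; then
        echo "output directory must differ from input directory" >&2
        return 1
    fi

    # Order: forward input, reverse input, paired/unpaired forward output,
    # paired/unpaired reverse output, then the trimmers.
    echo "java -jar $JAR PE -phred33" \
        "$indir/${sample}_1.fastq $indir/${sample}_2.fastq" \
        "$outdir/${sample}_1.trim.fastq $outdir/${sample}_1.trim.unpaired.fastq" \
        "$outdir/${sample}_2.trim.fastq $outdir/${sample}_2.trim.unpaired.fastq" \
        "$*"
}

build_pe_cmd 2cells data trimmed MINLEN:25
```

Printing the command first is a cheap way to catch a swapped input or output file name; the printed line can then be pasted into a submission script.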

For paired-end reads (PE), the options are followed by the input forward and reverse fastq files, then the output files for the paired and unpaired reads that remain after trimming the forward reads, and then the output files for the paired and unpaired reads that remain after trimming the reverse reads. Finally, you specify the trimming operations and their parameters. Because the Trimmomatic output files are also fastq files, be careful not to mix them up with the original un-trimmed fastq files, and above all not to overwrite the originals. You can create a dedicated folder for the trimmed files, or use file names that remind you what they are.

Let’s try it on the 2-cells paired fastq files. We will reuse the submission script, but with a larger memory limit (say 40 GB, since the Java VM consumes a lot of memory) and with the Trimmomatic command:

#$ -S /bin/sh
#$ -cwd
#$ -q amd.q,large.q,intel.q
#$ -l h_vmem=40G
#$ -e run_trimmomatic.e
#$ -N run_trimmomatic
#$ -o run_trimmomatic.o
java -jar /cm/shared/apps/trimmomatic/0.33/trimmomatic-0.33.jar PE -threads 2 -phred33 data/2cells_1.fastq data/2cells_2.fastq \
data/2cells_1.trim.fastq data/2cells_1.trim.unpaired.fastq data/2cells_2.trim.fastq data/2cells_2.trim.unpaired.fastq \
LEADING:20 TRAILING:20 AVGQUAL:20 MINLEN:25

The arguments are:

PE: specifies that the reads are paired end
-threads: number of threads
-phred33: quality scale (Phred+33 encoding)
input forward reads file
input reverse reads file
paired forward reads output file
unpaired forward reads output file
paired reverse reads output file
unpaired reverse reads output file
LEADING: quality threshold for removing nucleotides from the 5’ end
TRAILING: quality threshold for removing nucleotides from the 3’ end
AVGQUAL: mean read quality threshold
MINLEN: minimum read length
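Once the job has finished, a quick sanity check is to compare the number of reads before and after trimming. The following is a minimal sketch, assuming uncompressed fastq files with exactly four lines per read; the function name `count_reads` is illustrative, and the file names are the ones used in the command above.

```shell
#!/bin/sh
# Count reads in an uncompressed fastq file, assuming 4 lines per record.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Report read counts for the inputs and for each Trimmomatic output, if present.
for f in data/2cells_1.fastq data/2cells_1.trim.fastq data/2cells_1.trim.unpaired.fastq \
         data/2cells_2.fastq data/2cells_2.trim.fastq data/2cells_2.trim.unpaired.fastq
do
    [ -f "$f" ] || continue   # skip files that do not exist yet
    printf '%s\t%s\n' "$f" "$(count_reads "$f")"
done
```

The two paired output files should contain the same number of reads, and no output file should contain more reads than its input file.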