Difference between revisions of "5. Trimming"

From IBERS Bioinformatics and HPC Wiki

Revision as of 16:00, 27 January 2016

We will use Trimmomatic for read trimming and filtering, and for adapter removal.

Let’s look at its general usage. To run Trimmomatic you need to specify the full path to its jar file (/cm/shared/apps/trimmomatic/0.33/trimmomatic-0.33.jar):

$ java -jar /cm/shared/apps/trimmomatic/0.33/trimmomatic-0.33.jar -h

Usage:

 PE [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] [-basein <inputBase> | <inputFile1> <inputFile2>] 
      [-baseout <outputBase> | <outputFile1P> <outputFile1U> <outputFile2P> <outputFile2U>] <trimmer1>...

or:

 SE [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] <inputFile> <outputFile> <trimmer1>...

For paired-end reads (PE), the options come first, followed by six file arguments in this order: the input forward and reverse fastq files, the output files for the paired and unpaired reads that remain after trimming the forward reads, and the output files for the paired and unpaired reads that remain after trimming the reverse reads. Finally, you need to specify the parameters for the trimming operations. Since the Trimmomatic output files are also fastq files, be careful not to mix them up with the original untrimmed fastq files, and not to overwrite the untrimmed files. You can create a specific folder for the trimmed files, or use specific names that will remind you what they are.

Let’s try it on the 2-cells fastq files. We will reuse the submission script, but with a larger memory limit (say 40 GB, since the Java VM consumes a lot of memory) and with the Trimmomatic command as the command to run:

#$ -S /bin/sh
#$ -cwd
#$ -q amd.q,large.q,intel.q
#$ -l h_vmem=40G
#$ -e run_trimmomatic.e
#$ -N run_trimmomatic
#$ -o run_trimmomatic.o
java -jar /cm/shared/apps/trimmomatic/0.33/trimmomatic-0.33.jar PE -threads 2 -phred33 data/2cells_1.fastq data/2cells_2.fastq data/2cells_1.trim.fastq data/2cells_1.trim.unpaired.fastq data/2cells_2.trim.fastq data/2cells_2.trim.unpaired.fastq LEADING:20 TRAILING:20 AVGQUAL:20 MINLEN:25
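
To keep trimmed and untrimmed files apart, the output names can be derived from the input names and placed in a separate folder. The following is only a minimal sketch of that idea, not part of the original script: the helper functions trimmed_name and pe_command and the output folder name trimmed are hypothetical, and the command is printed rather than executed so that it can be checked before being pasted into the submission script.

```shell
# Path to the Trimmomatic jar, as given earlier on this page.
TRIMMOMATIC_JAR=/cm/shared/apps/trimmomatic/0.33/trimmomatic-0.33.jar

# Hypothetical helper: map an input fastq file to an output name in another folder.
# Usage: trimmed_name <input.fastq> <outdir> <suffix>
trimmed_name() {
    printf '%s/%s.%s.fastq\n' "$2" "$(basename "$1" .fastq)" "$3"
}

# Hypothetical helper: print (not run) the PE command for a pair of files.
# Output order follows the usage text: forward paired, forward unpaired,
# reverse paired, reverse unpaired.
# Usage: pe_command <forward.fastq> <reverse.fastq> <outdir>
pe_command() {
    printf 'java -jar %s PE -threads 2 -phred33 %s %s %s %s %s %s %s\n' \
        "$TRIMMOMATIC_JAR" "$1" "$2" \
        "$(trimmed_name "$1" "$3" trim)" \
        "$(trimmed_name "$1" "$3" trim.unpaired)" \
        "$(trimmed_name "$2" "$3" trim)" \
        "$(trimmed_name "$2" "$3" trim.unpaired)" \
        "LEADING:20 TRAILING:20 AVGQUAL:20 MINLEN:25"
}

# Print the command for the 2-cells files, writing into a folder named "trimmed".
pe_command data/2cells_1.fastq data/2cells_2.fastq trimmed
```

If you use a separate folder like this, create it (for example with mkdir -p trimmed) before submitting the job, because the output paths must be writable when Trimmomatic starts.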