2. Quality trimming

From IBERS Bioinformatics and HPC Wiki

If quality trimming is required, we recommend the Trimmomatic software. According to MacManes (2014), light trimming of bases with Phred scores at or below 5 can improve the final transcriptome assembly. For example purposes, simply to demonstrate the effects of trimming on this high-quality data set, we'll use the high value of 36, which is meant to remove leading bases with quality values below 36; the results should be visible in a subsequent run of FastQC on the trimmed reads. Note that in the command below, 36 appears only in MINLEN:36, which discards reads shorter than 36 bases after trimming, while the quality thresholds are set by LEADING:3, TRAILING:3 and SLIDINGWINDOW:4:15; trimming leading bases below quality 36 would instead require LEADING:36.
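To put these thresholds in perspective, a Phred score Q corresponds to an estimated base-calling error probability of 10^(-Q/10). The following is a minimal shell sketch of that arithmetic; the function name phred_to_error is illustrative only and is not part of Trimmomatic or FastQC.

```shell
#!/bin/sh
# Convert a Phred quality score Q to its error probability, P = 10^(-Q/10).
# phred_to_error is a hypothetical helper name used only for illustration.
phred_to_error() {
    awk -v q="$1" 'BEGIN { printf "%.6f\n", 10 ^ (-q / 10) }'
}

phred_to_error 5    # the light-trimming threshold cited from MacManes (2014)
phred_to_error 36   # the high value mentioned for demonstration
```

A score of 5 corresponds to roughly a one-in-three chance that the base call is wrong, whereas a score of 36 corresponds to roughly one error in four thousand calls, which is why a cutoff of 36 counts as very strict.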


#$ -S /bin/sh
#$ -cwd
#$ -q amd.q,large.q,intel.q
#$ -l h_vmem=40G
#$ -N trimming
module load java/jdk1.7.0_40
java -jar /cm/shared/apps/trimmomatic/0.33/trimmomatic-0.33.jar PE -phred33 \
/ibers/ernie/scratch/userName/Fastq/reads_1_1.fq.gz \
/ibers/ernie/scratch/userName/Fastq/reads_1_2.fq.gz \
/ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_1.fq.gz.p \
/ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_1.fq.gz.unp \
/ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_2.fq.gz.p \
/ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_2.fq.gz.unp \
ILLUMINACLIP:/cm/shared/apps/trimmomatic/0.33/adapters/TruSeq3-PE-2.fa:2:30:10 \
LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

Here, the ".p" files contain the trimmed reads that are still paired (both mates survived trimming), and the ".unp" files contain the unpaired reads, whose mate was discarded during trimming.
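When several samples follow the same naming scheme, the six file arguments can be derived from a single sample stem. The sketch below only prints the command (a dry run) so the argument order can be checked before submitting a job: two inputs, then the paired and unpaired outputs for each mate. The function name trimmomatic_pe_cmd and the shell variables are hypothetical; the paths and trimming steps are copied from the example above.

```shell
#!/bin/sh
# Sketch: build the Trimmomatic PE command for one sample and print it (dry run).
# The variable names and trimmomatic_pe_cmd are hypothetical; the paths mirror
# the example above.
JAR=/cm/shared/apps/trimmomatic/0.33/trimmomatic-0.33.jar
ADAPTERS=/cm/shared/apps/trimmomatic/0.33/adapters/TruSeq3-PE-2.fa
IN_DIR=/ibers/ernie/scratch/userName/Fastq
OUT_DIR=/ibers/ernie/scratch/userName/Fastq_trimmed

trimmomatic_pe_cmd() {
    stem=$1   # e.g. reads_1 for reads_1_1.fq.gz and reads_1_2.fq.gz
    # Argument order: input 1, input 2, then paired/unpaired output per mate
    echo java -jar "$JAR" PE -phred33 \
        "$IN_DIR/${stem}_1.fq.gz" "$IN_DIR/${stem}_2.fq.gz" \
        "$OUT_DIR/${stem}_1.fq.gz.p" "$OUT_DIR/${stem}_1.fq.gz.unp" \
        "$OUT_DIR/${stem}_2.fq.gz.p" "$OUT_DIR/${stem}_2.fq.gz.unp" \
        "ILLUMINACLIP:$ADAPTERS:2:30:10" LEADING:3 TRAILING:3 \
        SLIDINGWINDOW:4:15 MINLEN:36
}

trimmomatic_pe_cmd reads_1
```

Once the printed command looks right, removing the echo inside the function would run Trimmomatic directly.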

TIP: Trimmomatic and FastQC are Java-based, so request a generous amount of RAM for the job (the script above asks for 40G via h_vmem).

Running FASTQC on the quality-trimmed reads

Normally, after quality trimming, you would confirm the improved read quality statistics by running FASTQC on the quality-trimmed reads.
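As a rough sketch of that check, the snippet below prints a FastQC command for the paired trimmed reads produced above. Several things here are assumptions rather than site facts: that fastqc is available on the PATH (the module name on this cluster is not given here), that the output directory FastQC_trimmed is where you want the reports, and that the ".p" files are plain FASTQ because their names do not end in ".gz". The function name fastqc_cmd is hypothetical.

```shell
#!/bin/sh
# Sketch: print the FastQC command for the paired trimmed reads (dry run).
# Assumptions: fastqc is on PATH, and QC_DIR is a hypothetical output directory.
TRIM_DIR=/ibers/ernie/scratch/userName/Fastq_trimmed
QC_DIR=/ibers/ernie/scratch/userName/FastQC_trimmed

fastqc_cmd() {
    stem=$1   # e.g. reads_1
    # -f fastq states the input format explicitly, because the .p suffix
    # hides the usual file extension; -o sets the report directory
    echo fastqc -f fastq -o "$QC_DIR" \
        "$TRIM_DIR/${stem}_1.fq.gz.p" "$TRIM_DIR/${stem}_2.fq.gz.p"
}

fastqc_cmd reads_1
```

FastQC expects the output directory to exist already, so create it (for example with mkdir -p) before running the printed command inside a job script like the one used for trimming.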