3. Running Trinity

From IBERS Bioinformatics and HPC Wiki

In silico read normalisation:

Large RNA-seq data sets contain a large excess of reads from moderately and highly expressed transcripts, far more than are needed to assemble them. Removing the excess reads lowers memory consumption and speeds up the assembly. In silico normalisation is an effective way of identifying and removing those excess reads (link to diginorm). Trinity has an integrated in silico normalisation step, which is switched on with the --normalize_reads option.
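To make the idea concrete, here is a toy sketch in the spirit of digital normalisation. It is not Trinity's own implementation: the function name normalise_reads, the k-mer size and the coverage cutoff are all illustrative assumptions, and it expects one plain sequence per line rather than FASTQ. A read is kept only while the median abundance of its k-mers, counted over the reads already kept, is still below the cutoff.

```shell
# Toy digital normalisation (illustration only, not Trinity's code).
# Reads one sequence per line on stdin and prints the reads that are kept.
# $1 = k-mer size (assumed default 5), $2 = coverage cutoff (assumed default 2)
normalise_reads() {
    awk -v k="${1:-5}" -v cutoff="${2:-2}" '
    {
        n = length($0) - k + 1
        if (n < 1) next                      # read shorter than k: skip it

        # Current abundance of every k-mer in this read
        for (i = 1; i <= n; i++) c[i] = count[substr($0, i, k)] + 0

        # Insertion sort so that the median can be read off directly
        for (i = 2; i <= n; i++) {
            v = c[i]
            for (j = i - 1; j >= 1 && c[j] > v; j--) c[j + 1] = c[j]
            c[j + 1] = v
        }
        median = c[int((n + 1) / 2)]

        # Keep the read only if this region is not yet well covered
        if (median < cutoff) {
            print
            for (i = 1; i <= n; i++) count[substr($0, i, k)]++
        }
    }'
}

# Five copies of a "highly expressed" read and one copy of a rare read
printf '%s\n' ACGTTGCAAG ACGTTGCAAG ACGTTGCAAG ACGTTGCAAG ACGTTGCAAG TTTTGGGGCC |
    normalise_reads 5 2
```

With these values, two of the five duplicate reads and the single rare read are kept, so the rare transcript survives while the excess coverage is discarded.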

#$ -S /bin/sh
#$ -cwd
#$ -q large.q
#$ -l h_vmem=210G
#$ -N deNovo
#$ -pe multithread 8
module load perl/5.22.2
module load trinity/2.2.0
##Trinity command:
Trinity --seqType fq --max_memory 200G --left /ibers/ernie/scratch/userName/Fastq_trimmed/Pn_AL_l1_1.fq.gz.p,\
/ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_1.fq.gz.p,\
/ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_1.fq.gz.p,\
/ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_1.fq.gz.p \
--right /ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_2.fq.gz.p,\
/ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_2.fq.gz.p,\
/ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_2.fq.gz.p,\
/ibers/ernie/scratch/userName/Fastq_trimmed/reads_1_2.fq.gz.p --CPU 8 --normalize_reads
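Typing long comma-separated file lists by hand is error-prone, because a single stray space after a comma or a backslash breaks the command. As a sketch, the lists could instead be built from a glob. The helper name join_by_comma and the assumption that forward and reverse files end in _1.fq.gz.p and _2.fq.gz.p are illustrative, so adjust the patterns to your own file names.

```shell
# Join all arguments with commas and no spaces, the form Trinity expects
# for --left and --right. The subshell body keeps the IFS change local.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# Assumed layout: forward reads end in _1.fq.gz.p, reverse reads in _2.fq.gz.p
READ_DIR=/ibers/ernie/scratch/userName/Fastq_trimmed
LEFT=$(join_by_comma "$READ_DIR"/*_1.fq.gz.p)
RIGHT=$(join_by_comma "$READ_DIR"/*_2.fq.gz.p)

# The lists can then be passed to Trinity, for example:
# Trinity --seqType fq --max_memory 200G --left "$LEFT" --right "$RIGHT" --CPU 8 --normalize_reads
```

Globs expand in sorted order, so the two lists stay paired as long as each pair of files differs only in the _1/_2 part of the name.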

Trinity is memory-hungry, so request as much RAM as possible! Don't use more than 8 cores on bert: the job may crash, and the error message is not very helpful.
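One way to avoid asking Trinity for more cores than the job was given is to derive --CPU from the scheduler rather than hard-coding it. The sketch below assumes the job runs under Grid Engine, which sets NSLOTS inside a parallel-environment job; the helper name cap_cpu and the ceiling of 8 are illustrative.

```shell
# Print the smaller of the requested core count and a ceiling (default 8).
cap_cpu() {
    requested=$1
    ceiling=${2:-8}
    if [ "$requested" -gt "$ceiling" ]; then
        echo "$ceiling"
    else
        echo "$requested"
    fi
}

# NSLOTS is set by Grid Engine for jobs submitted with -pe; fall back to 1 elsewhere.
CPU=$(cap_cpu "${NSLOTS:-1}")
echo "Trinity would be run with --CPU $CPU"
```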

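The assembly output is written to the Trinity output folder (trinity_out_dir in the working directory by default); the assembled transcripts are in the Trinity.fasta file inside it.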
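A quick sanity check once the job has finished is to count the assembled transcripts. This sketch assumes the default output location trinity_out_dir/Trinity.fasta; the helper name count_transcripts is illustrative.

```shell
# Count FASTA records: every record starts with a '>' header line.
count_transcripts() {
    grep -c '^>' "$1"
}

FASTA=trinity_out_dir/Trinity.fasta   # assumed default location
if [ -s "$FASTA" ]; then
    echo "Assembled transcripts: $(count_transcripts "$FASTA")"
else
    echo "No assembly found at $FASTA; check the job's error log" >&2
fi
```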