9.1 Genome indexing for bwa

From IBERS Bioinformatics and HPC Wiki

Let’s first look at the bwa options:

$ module load bwa/0.7.12
$ bwa

The bwa suite provides a number of commands for various steps of the alignment procedure. The general usage is:

bwa <command> [options]

Among the available commands are index for genome indexing, and aln, samse, sampe and mem for read mapping. To see the usage of a particular command, type bwa followed by the name of the command:

$ bwa index

Usage:

bwa index [-a bwtsw|is] [-c] <in.fasta>

You need to specify the genomic sequence file (in.fasta). With the -p option you can also give a label for the index, which becomes the prefix of all files written by bwa index; without it, bwa uses the name of the FASTA file as the prefix. You have already copied the folder containing the genomic sequence of zebrafish chromosome 12. To keep things well organized, create a folder to store the index (here it is named bwaMapping to match the paths used below, but you can use any other name you like):

$ mkdir bwaMapping

Once you have created the folder, open a new file in a text editor. You can give the file any name you want, for example bwaindex.sh. At the beginning of the file write the header for the scheduler, then load the required module:

#$ -S /bin/sh
#$ -cwd
#$ -q amd.q,large.q,intel.q
#$ -l h_vmem=16G
#$ -e bwa_index.e
#$ -N bwa_index
#$ -o bwa_index.o
module load bwa/0.7.12

cd /ibers/ernie/scratch/vpl/zebra_fish/bwaMapping
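Because the job runs unattended, a wrong genome path only shows up later in the error file. A minimal sketch of a guard that could sit at this point in the script is given here. The function name is_fasta is hypothetical (it is not part of bwa), and the check assumes an uncompressed FASTA file whose first line is a header starting with ">".

```shell
# is_fasta: illustrative helper, not part of bwa.
# Succeeds only if the file is readable and its first line starts with '>'.
# Assumes plain-text FASTA; a gzip-compressed file would fail this check.
is_fasta() {
    [ -r "$1" ] || return 1
    first=$(head -n 1 "$1")
    case $first in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Possible use in the job script, just before the indexing command:
# is_fasta /ibers/ernie/scratch/vpl/zebra_fish/bwaMapping/danRer10.fa || exit 1
```

If the check fails, the script stops before bwa starts, so the reason is easy to spot.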

Now you can write the command to index the genome:

bwa index -p danRer10 /ibers/ernie/scratch/vpl/zebra_fish/bwaMapping/danRer10.fa

Parameters:

path to genome: /ibers/ernie/scratch/vpl/zebra_fish/bwaMapping/danRer10.fa

-p: the prefix for all files written by bwa index. I chose danRer10, but you can choose a different one.

Save and close the file. Now you can submit it to the scheduler using the qsub command, like this:

$ qsub bwaindex.sh
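When the job is accepted, Grid Engine's qsub normally prints a confirmation line of the form Your job 12345 ("bwa_index") has been submitted. The sketch below pulls the job number out of that line so it can be passed to qstat. The helper name job_id_from_qsub is hypothetical, and the message format is assumed to be the default one on your cluster.

```shell
# job_id_from_qsub: illustrative helper, not a Grid Engine command.
# Reads qsub's confirmation line on standard input and prints the job number,
# assuming the default message: Your job <id> ("<name>") has been submitted
job_id_from_qsub() {
    awk '$1 == "Your" && $2 == "job" { print $3; exit }'
}

# Intended use on the cluster (commented out because it needs the scheduler):
# id=$(qsub bwaindex.sh | job_id_from_qsub)
# qstat -j "$id"
```

Running qstat without arguments also lists your pending and running jobs, which is enough if you have only submitted this one.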

Once the job has finished, look at the contents of the folder (with ls -l). The files with the prefix danRer10 contain the index of the genome (or, in your case, only of chromosome 12) in a format that bwa can use.
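bwa index normally writes five files for a prefix, with the extensions .amb, .ann, .bwt, .pac and .sa. As a rough check that indexing completed, the sketch below tests that each of them exists and is not empty. The function name check_bwa_index is hypothetical (it is not part of bwa), and the prefix is whatever you passed to -p.

```shell
# check_bwa_index: illustrative helper, not part of bwa.
# Takes an index prefix (optionally with a leading path) and returns 0 only
# if all five expected index files exist and are non-empty.
check_bwa_index() {
    prefix=$1
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use from inside the index folder:
# check_bwa_index danRer10 && echo "index looks complete"
```

If a file is reported as missing, the error file named in the job header is the first place to look.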