Difference between revisions of "6.1 Genome indexing for bowtie"

From IBERS Bioinformatics and HPC Wiki

Revision as of 10:55, 4 February 2016

To map reads to a reference genome, all mapping tools require the genome to be converted into particular structures for quick access and to keep the memory footprint small. In this tutorial we will use tophat2 for the transcriptome reconstruction, which in turn runs bowtie2 to map the reads to the genome. Bowtie2 needs the genome to be indexed using the Burrows-Wheeler transform, and provides a tool (bowtie2-build) that builds this index from the genome sequence stored in a text file in FASTA format. You will also try another mapper, STAR, which performs quite well and can be easily integrated with cufflinks/cuffdiff, as you will see later.

Let’s first look at the bowtie2-build options:

$ module load bowtie2
$ bowtie2-build --help

The general usage is:

Usage:

bowtie2-build [options]* <reference_in> <bt2_index_base>

You need to specify the genomic sequence file (reference_in) and a label to identify the index (bt2_index_base), which will be the prefix of all files written by bowtie2-build. Copy a folder containing the genomic sequence with the following command:

$ cp -r /pico/home/userexternal/fferre00/tutorial/genome .

Check the content of the genome folder that you just copied:

$ ls -l genome
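
As a rough sketch of how the usage line above translates into an actual call, the snippet below builds an index from a FASTA file in the copied folder. The file name genome.fa and the index prefix genome_index are hypothetical placeholders, not names taken from the tutorial data; replace them with whatever ls shows in your genome folder. The helper function expected_index_files is also only illustrative: it prints the six .bt2 file names that bowtie2-build normally writes for a given prefix (very large genomes get a .bt2l extension instead).

```shell
# Hypothetical names: adjust to the FASTA file actually present in genome/.
reference="genome/genome.fa"
index_base="genome/genome_index"

# Print the six file names that bowtie2-build writes for a given index prefix
# (small indexes; very large genomes use the .bt2l extension instead).
expected_index_files() {
    for suffix in 1 2 3 4 rev.1 rev.2; do
        echo "$1.$suffix.bt2"
    done
}

# Build the index only if the tool and the reference file are available.
if command -v bowtie2-build >/dev/null 2>&1 && [ -f "$reference" ]; then
    bowtie2-build "$reference" "$index_base"
    ls -l genome
else
    echo "Skipping build; bowtie2-build would write:"
    expected_index_files "$index_base"
fi
```

The prefix given as the second argument is what you later pass to the mapper to locate the index, so it is worth choosing a short, recognisable label.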