Difference between revisions of "Blast2go"

From IBERS Bioinformatics and HPC Wiki
Revision as of 11:22, 16 December 2013

Running blast2go

We have a blast2go server accessible within the Aberystwyth University network. You need to run the blast2go java program located here;

blast2go.com

You can select the amount of memory you wish to use on your own machine by selecting the value in the download box. Note that Java must be installed on your machine; it can be downloaded from [java.com].

Then run the blast2go download. In Tools >> General Settings >> DataAccess Settings you can enter the details of the local server;

Blast2goagain.jpg

Running blast2go from the HPC command line

THIS IS WORK IN PROGRESS AND IS NOT TESTED. THIS IS JUST COMPILED NOTES FROM VARIOUS SOURCES IN ORDER TO TEST THE SEQUENCE. - mjv08 - 26-07-13

This assumes you have a fasta file and you wish to recreate the default behaviour of blast2go, that is, the blast, mapping and annotation steps run with the default blast2go settings.

Let's blast it;

  blastall -p blastx -d nr -i myseqs.fasta -e 0.001 -m 7 -o blastResult.xml -I T -v 20 -b 20 -F L

where myseqs.fasta is the file you're blasting and blastResult.xml is the file the BLAST results are written to.

NOTE: The blast2go documentation says: 'Note: Do not create xml-files with more than 100 xml-results. Parsing them will otherwise be difficult'

The next step is to run the blast2go pipeline, which is the command line version of blast2go;
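A small sketch of how the note above could feed into choosing the number of subsets when splitting a fasta file. It assumes the limit means at most 100 query sequences per XML file, which is one reading of the note and not something this page confirms. The function names and the `max_per_file` parameter are hypothetical.

```python
import math


def count_fasta_records(lines):
    """Count sequences by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def subsets_needed(num_records, max_per_file=100):
    """Smallest number of subsets so that no subset exceeds max_per_file.

    max_per_file=100 is an assumption taken from the note above.
    """
    return max(1, math.ceil(num_records / max_per_file))


# Tiny in-memory example; a real file would be read with open(path).
example = [">seq1", "ATGC", ">seq2", "GGCC", ">seq3", "TTAA"]
print(subsets_needed(count_fasta_records(example)))
```

For a real file, pass an open file handle to `count_fasta_records` and use the result as the number of subsets when splitting.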

 java -Xms256m -Xmx512m -jar blast2go.jar -in blastResult.xml -v -a -out MyAnnot -d MyAnnot.dat -p b2gPipe.properties


Steps to follow for a very large fasta file

You will probably wish to do this in your scratch directory

1) Split the fasta file into N subsets using the python script

 [mjv08@bert ~]$ python fastaSplitByNumSubsets.py 
 Split fasta file into a number of subsets
 USAGE: $0 <fasta file> <num of subsets> <prefix for subsets>

e.g.

 [mjv08@bert ~]$ python fastaSplitByNumSubsets.py file.fasta 300 split
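The source of fastaSplitByNumSubsets.py is not shown on this page, so the following is only a minimal sketch of one way such a split could work, not the script itself. It assumes records are dealt out round-robin and that output files are named `<prefix>_<n>.fasta`; both the strategy and the naming are assumptions.

```python
def read_fasta(lines):
    """Yield (header, sequence_lines) records from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, seq


def split_round_robin(records, num_subsets):
    """Deal records into num_subsets lists, one record at a time."""
    subsets = [[] for _ in range(num_subsets)]
    for i, record in enumerate(records):
        subsets[i % num_subsets].append(record)
    return subsets


def write_subsets(subsets, prefix):
    """Write each subset to <prefix>_<n>.fasta (hypothetical naming scheme)."""
    paths = []
    for n, subset in enumerate(subsets, start=1):
        path = f"{prefix}_{n}.fasta"
        with open(path, "w") as handle:
            for header, seq in subset:
                handle.write(header + "\n")
                handle.write("\n".join(seq) + "\n")
        paths.append(path)
    return paths


# In-memory demonstration; nothing is written to disk here.
demo = [">a", "ATG", "CCC", ">b", "GGG", ">c", "TTT"]
demo_subsets = split_round_robin(read_fasta(demo), 2)
print([len(s) for s in demo_subsets])
```

Round-robin keeps the subsets within one record of each other in size, but it does not balance total sequence length, so some BLAST jobs may still run longer than others.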

2) Blast each split file. You will probably need to write a slightly more complicated SGE script to do this;

 #$ -S /bin/sh
 #$ -q amd.q
 #$ -cwd
 #$ -l h_vmem=20G,h_stack=2G
 # FILENAME must be set in the job environment, e.g. qsub -v FILENAME=<split file>
 module load BLAST
 blastx -db /ibers/ernie/scratch/databases/db/nr -query "$FILENAME" -out "$FILENAME.xml" -evalue 1e-3 -outfmt 5 -show_gis
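The script above reads the file to blast from $FILENAME, so one job has to be submitted per split file. A minimal sketch of generating those submissions follows. It assumes the SGE script is saved as `blast_split.sh` and that the split files match `split_*.fasta`; both names are hypothetical. It passes the variable with `qsub -v`, and it only prints the commands so they can be checked before anything is submitted.

```python
import glob
import shlex


def build_qsub_commands(fasta_paths, job_script="blast_split.sh"):
    """Return one qsub argument list per split file.

    job_script is a hypothetical name for the SGE script shown above;
    -v exports FILENAME into the job's environment.
    """
    return [["qsub", "-v", f"FILENAME={path}", job_script] for path in fasta_paths]


if __name__ == "__main__":
    # The glob pattern is an assumption about how the split files are named.
    for cmd in build_qsub_commands(sorted(glob.glob("split_*.fasta"))):
        print(shlex.join(cmd))
```

Once the printed commands look right, they can be pasted into the shell or run from Python with `subprocess.run(cmd, check=True)`.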