Diamond Blast
Latest revision as of 10:41, 23 June 2017
This tutorial explains how to run Diamond Blast.
Step 1: Gather sequences
Collect the sequences for what will be the "database" (for instance, all ortholog sequences downloaded from Ensembl or NCBI) and for the "query" (everything you want to match against the database). Everything has to be in FASTA format. As in BLAST, the sequences can be DNA or protein, and depending on which you have, you call a different command (blastx, blastp, etc.).
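Before going further, it can help to confirm that a file really is FASTA, count its sequences, and check whether it holds DNA or protein, since that decides which command to call later. A minimal sketch, assuming a standard Unix shell; the file toy.fasta and its two records are made up for illustration, and the DNA/protein guess is only a rough heuristic based on the letters used:

```shell
# Make a tiny FASTA file to demonstrate on (hypothetical content).
workdir=$(mktemp -d)
cat > "$workdir/toy.fasta" <<'EOF'
>seq1 example record
ATGGCTAGCTAGCTAA
>seq2 example record
ATGCCCGGGTTTAAATAG
EOF

# Every FASTA record starts with a header line beginning with '>'.
n_seqs=$(grep -c '^>' "$workdir/toy.fasta")
echo "sequences: $n_seqs"

# The first line of a FASTA file should be a header.
first_char=$(head -n 1 "$workdir/toy.fasta" | cut -c 1)
[ "$first_char" = ">" ] && echo "looks like FASTA"

# Heuristic: if the sequence lines contain any letter other than
# A, C, G, T, U or N, treat the file as protein; otherwise as nucleotide.
if grep -v '^>' "$workdir/toy.fasta" | grep -qi '[^ACGTUN]'; then
  seq_type=protein
else
  seq_type=nucleotide
fi
echo "guessed type: $seq_type"
```

To use it on real data, point the grep and head commands at your own file instead of toy.fasta.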
Step 2: Get the Diamond suite.
It is also available in the system by calling:
module load diamond;
Or have the manual handy here:
https://github.com/bbuchfink/diamond
Step 3: Create a database.
In this example I'm using a large set of amino acid sequences downloaded from EGGNOG in FASTA format. I'll call my database "eggnog". Note: "-p" sets the number of CPU threads Diamond may use (it runs in parallel).
module load diamond;
diamond makedb --in eggnogv4.proteins.all.fa -p 10 -d eggnog
Step 4: BLAST like in BLAST.
In this case my query is a DNA file "Example.fasta" and my database is protein, so I use "blastx". If both query and database are amino acid sequences, use "blastp". Diamond doesn't do nucleotide vs nucleotide ("blastn"). You can carry out the search by directly calling:
diamond blastx -d eggnog -q Example.fasta -a Example_v_Eggnog -p 10 -t /your/path/here/tmp
The output file is specified with the -a option. In my case I called it "Example_v_Eggnog"; the result is the file "Example_v_Eggnog.daa" used in Step 5.
Note: you have to give it a temporary directory for the files it creates while running. In the example, the path is specified with -t /your/path/here/tmp
You can also submit the job to the queue if you need a lot of resources.
Note: You might have to call "module load diamond" before submitting it to the queue (qsub).
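For the queue route, one option is to wrap the commands in a small job script. A minimal sketch: the script name diamond_job.sh is hypothetical, and the scheduler directives (cores, memory, queue name) are site-specific, so they are left out here rather than guessed. The block only writes the script and checks its syntax; it does not run Diamond or submit anything.

```shell
# Write a job script into a scratch directory (hypothetical file name).
job_dir=$(mktemp -d)
cat > "$job_dir/diamond_job.sh" <<'EOF'
#!/bin/bash
# Add the directives your cluster documents here (cores, memory, queue).
# The number of cores requested should match the -p value below.

# Make the diamond command available on the compute node.
module load diamond

# Create the temporary directory that -t points to.
mkdir -p /your/path/here/tmp

diamond blastx -d eggnog -q Example.fasta -a Example_v_Eggnog -p 10 -t /your/path/here/tmp
EOF

# Check the script for syntax errors without executing it.
bash -n "$job_dir/diamond_job.sh" && echo "job script OK: $job_dir/diamond_job.sh"
```

The script would then be submitted with qsub followed by the path printed above. Loading the module inside the script means the job does not depend on the environment of the shell it was submitted from.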
Step 5: Transform the .daa file into a tabular readable format
If you want the output in BLAST tabular format, use the command:
diamond view -a Example_v_Eggnog.daa -o Example_v_Eggnog.m8
where "-a" is the input file (the result of the search in Step 4, which usually ends in .daa) and "-o" is the name you want to give the output file.
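The tabular file has no header line. Assuming it follows the standard 12-column BLAST tabular layout, a header can be added to make it easier to read in a spreadsheet or in R. A small sketch on a made-up hit line; the file names toy.m8 and toy.with_header.tsv are hypothetical:

```shell
workdir=$(mktemp -d)

# One made-up alignment line in 12-column BLAST tabular layout.
printf 'gene1\tprotA\t85.0\t120\t18\t0\t1\t360\t5\t124\t1e-50\t200\n' > "$workdir/toy.m8"

# Standard column names for BLAST tabular output, in order.
{
  printf 'qseqid\tsseqid\tpident\tlength\tmismatch\tgapopen\tqstart\tqend\tsstart\tsend\tevalue\tbitscore\n'
  cat "$workdir/toy.m8"
} > "$workdir/toy.with_header.tsv"

# Confirm the header has as many columns as the data.
n_cols=$(awk -F'\t' 'NR == 1 { print NF }' "$workdir/toy.with_header.tsv")
echo "columns: $n_cols"
```

Here pident is the percentage identity (column 3), evalue is column 11 and bitscore is column 12.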
Extra optional Step 6: Summarise results
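To summarise the results, i.e. keep only the first (best) hit for each query sequence, and only when the value in column 12 is above 60, use the command below. Note that in BLAST tabular format column 12 is the bit score and column 3 is the percentage identity, so change $12 to $3 if you want to filter on more than 60% identity: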
cat Example_v_Eggnog.m8 | awk 'BEGIN{prev=""};{if($1 != prev) { prev=$1; if($12 > 60) { print $0 }}}' > Example_v_Eggnog.m8.besthits
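The command above tests column 12, which in standard BLAST tabular output is the bit score; percentage identity is column 3. Below is a sketch of the same best-hit filter applied to percentage identity instead, run on made-up data so the effect is visible. It assumes, like the original command, that hits for each query are grouped together with the best one first. The file names toy.m8 and toy.m8.besthits are hypothetical.

```shell
workdir=$(mktemp -d)

# Made-up hits: gene1 has two hits, gene2 has a high bit score but low
# identity, gene3 is just above the 60% identity threshold.
{
  printf 'gene1\tprotA\t85.0\t120\t18\t0\t1\t360\t5\t124\t1e-50\t200\n'
  printf 'gene1\tprotB\t95.0\t40\t2\t0\t1\t120\t10\t49\t1e-10\t55\n'
  printf 'gene2\tprotC\t40.0\t400\t240\t0\t1\t1200\t1\t400\t1e-40\t150\n'
  printf 'gene3\tprotD\t60.5\t90\t35\t0\t1\t270\t1\t90\t1e-20\t110\n'
} > "$workdir/toy.m8"

# For each new query ID (column 1) look only at its first line, and
# print it only if the percentage identity (column 3) is above 60.
awk -F'\t' '$1 != prev { prev = $1; if ($3 > 60) print }' \
  "$workdir/toy.m8" > "$workdir/toy.m8.besthits"

cat "$workdir/toy.m8.besthits"
```

With this data, gene2 is dropped by the identity filter even though its bit score is above 60, so the two thresholds are not interchangeable.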