Difference between revisions of "Diamond Blast"

Latest revision as of 10:41, 23 June 2017

This Tutorial will explain how to run Diamond Blast

Step 1: Gather sequences

... for what will be the "database" (for instance, all ortholog sequences downloaded from Ensembl, or NCBI) and for the "query" (everything you want to match against the database). Everything has to be in fasta format. Like in BLAST, you can have DNA or protein, and depending on what you have, you can call different commands (blastx, blastp, etc).

Step 2: get the Diamond suite.

It is also available in the system by calling:

 module load diamond;

Or have the manual handy here:

https://github.com/bbuchfink/diamond

Step3 : Create a database.

In this example I'm using a whole bunch of amino acid sequences downloaded from EGGNOG in fasta format. I'll call my database "eggnog". Note: "-p" I think it's the number of processors you can use (parallelised).

 module load diamond;
 diamond makedb --in eggnogv4.proteins.all.fa -p 10 -d eggnog

Step 4: BLAST like in BLAST.

In this case my query is a DNA file "Example.fasta" and my database is protein. So I use "blastx". If you used aminoacid vs amino acid, you use "blastp". It doesn't do nucleotide vs nucleotide ("blastn"). You can carry out the blast by directly calling:

 diamond blastx -d eggnog -q Example.fasta -a Example_v_Eggnog -p 10 -t /your/path/here/tmp

Where the output file is especified in the -a option. In my case I called it "Example_v_Eggnog".

Note: you have to give it a temporary directory to write files it creates. In the example my path is especified with -t /your/path/here/tmp

You can also submit in the queue if you need a lot of resources.

Note: You might have to call "module load diamond" before submitting it in the queue (qsub).

Step 5: Transform the .daa file into a tabular readable format

If you want the output like in BLAST tabular format use the command:

 diamond view -a Example_v_Eggnog.daa -o Example_v_Eggnog.m8

where "-a" is the input file (so the result from the BLAST, usually ends in .daa) and "-o" is how you want to call the output.

Extra optional Step 6: Summarise results

To summarise the results, i.e. taking the unique genes identified and their best hit (and only taking things with higher than 60% percentage ID) :

 cat Example_v_Eggnog.m8 | awk 'BEGIN{prev=""};{if($1 != prev) { prev=$1; if($12 > 60) { print $0 }}}' > Example_v_Eggnog.m8.besthits

@@ Line 3: / Line 3: @@
 == Step 1: Gather sequences ==
-... for what will be the "database" (for instance, all ortholog sequences downloaded from Ensembl, or NCBI) and for the "query" (everything you want to match against the database). Everything has to be in fasta format. Like in BLAST, you can have DNA or protein, and depending on what you have, you can call different commands (Blastx, blastp, etc).
+... for what will be the ''"database"'' (for instance, all ortholog sequences downloaded from Ensembl, or NCBI) and for the ''"query"'' (everything you want to match against the database). Everything has to be in fasta format. Like in BLAST, you can have DNA or protein, and depending on what you have, you can call different commands (''blastx'', ''blastp'', etc).
 == Step 2: get the Diamond suite. ==
@@ Line 20: / Line 19: @@
 == Step3 : Create a database. ==
-In this example I'm using a whole bunch of amino acid sequences downloaded from EGGNOG in fasta format. I'll call my database ''eggnog''. Note: ''-p'''''Bold text''' I think it's the number of processors you can use (parallelised).
+In this example I'm using a whole bunch of amino acid sequences downloaded from EGGNOG in fasta format. I'll call my database ''"eggnog"''. Note: ''"-p"'' I think it's the number of processors you can use (parallelised).
    module load diamond;
@@ Line 27: / Line 26: @@
 == Step 4: BLAST like in BLAST. ==
-In this case my query is a DNA file "Example.fasta" and my database is protein. So I use `blastx`. If you used DNA vs DNA you use `blastn`; if you used aminoacid vs amino acid, you use `blastp`.  You can perform the blast by directly calling:
+In this case my query is a DNA file "Example.fasta" and my database is protein. So I use ''"blastx"''. If you used aminoacid vs amino acid, you use ''"blastp"''. It '''doesn't do''' nucleotide vs nucleotide (''"blastn"''). You can carry out the blast by directly calling:
    diamond blastx -d eggnog -q Example.fasta -a Example_v_Eggnog -p 10 -t /your/path/here/tmp
-Where the output file is especified in the ''-a'' option. In my case I called it "Example_v_Eggnog".
+Where the output file is especified in the ''-a'' option. In my case I called it ''"Example_v_Eggnog"''.
-Note: you have to give it a temporary directory to write files it creates. In the example my path is especified with ''-t /your/path/here/tmp''
+'''Note''': you have to give it a temporary directory to write files it creates. In the example my path is especified with ''-t /your/path/here/tmp''
 You can also submit in the queue if you need a lot of resources.
->Note: You might have to call "module load diamond" before submitting it in the queue (''qsub'').
+'''Note''': You might have to call ''"module load diamond"'' before submitting it in the queue (''qsub'').
 == Step 5: Transform the .daa file into a tabular readable format ==
@@ Line 43: / Line 43: @@
 If you want the output like in BLAST tabular format use the command:
-```
+  diamond view -a Example_v_Eggnog.daa -o Example_v_Eggnog.m8
-diamond view -a Example_v_Eggnog.daa -o Example_v_Eggnog.m8
-```
-where `-a` is the input file (so the result from the BLAST, usually ends in .daa) and `-o` is how you want to call the output.
+where ''"-a"'' is the input file (so the result from the BLAST, usually ends in .daa) and ''"-o"'' is how you want to call the output.
 == Extra optional Step 6: Summarise results ==
@@ Line 56: / Line 52: @@
 To summarise the results, i.e. taking the unique genes identified and their best hit (and only taking things with higher than 60% percentage ID) :
-```
+  cat Example_v_Eggnog.m8 | awk 'BEGIN{prev=""};{if($1 != prev) { prev=$1; if($12 > 60) { print $0 }}}' > Example_v_Eggnog.m8.besthits
-cat Example_v_Eggnog.m8 | awk 'BEGIN{prev=""};{if($1 != prev) { prev=$1; if($12 > 60) { print $0 }}}' > Example_v_Eggnog.m8.besthits
-```

Difference between revisions of "Diamond Blast"

Latest revision as of 10:41, 23 June 2017

Contents

Step 1: Gather sequences

Step 2: get the Diamond suite.

Step3 : Create a database.

Step 4: BLAST like in BLAST.

Step 5: Transform the .daa file into a tabular readable format

Extra optional Step 6: Summarise results

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools