Diamond Blast

From IBERS Bioinformatics and HPC Wiki
Jump to: navigation, search

This Tutorial will explain how to run Diamond Blast

Step 1: Gather sequences

... for what will be the "database" (for instance, all ortholog sequences downloaded from Ensembl, or NCBI) and for the "query" (everything you want to match against the database). Everything has to be in fasta format. Like in BLAST, you can have DNA or protein, and depending on what you have, you can call different commands (blastx, blastp, etc).

Step 2: get the Diamond suite.

It is also available in the system by calling:

 module load diamond;


Or have the manual handy here:

https://github.com/bbuchfink/diamond

Step3 : Create a database.

In this example I'm using a whole bunch of amino acid sequences downloaded from EGGNOG in fasta format. I'll call my database "eggnog". Note: "-p" I think it's the number of processors you can use (parallelised).

 module load diamond;
 diamond makedb --in eggnogv4.proteins.all.fa -p 10 -d eggnog

Step 4: BLAST like in BLAST.

In this case my query is a DNA file "Example.fasta" and my database is protein. So I use "blastx". If you used aminoacid vs amino acid, you use "blastp". It doesn't do nucleotide vs nucleotide ("blastn"). You can carry out the blast by directly calling:

 diamond blastx -d eggnog -q Example.fasta -a Example_v_Eggnog -p 10 -t /your/path/here/tmp

Where the output file is especified in the -a option. In my case I called it "Example_v_Eggnog".

Note: you have to give it a temporary directory to write files it creates. In the example my path is especified with -t /your/path/here/tmp

You can also submit in the queue if you need a lot of resources.

Note: You might have to call "module load diamond" before submitting it in the queue (qsub).

Step 5: Transform the .daa file into a tabular readable format

If you want the output like in BLAST tabular format use the command:

 diamond view -a Example_v_Eggnog.daa -o Example_v_Eggnog.m8

where "-a" is the input file (so the result from the BLAST, usually ends in .daa) and "-o" is how you want to call the output.

Extra optional Step 6: Summarise results

To summarise the results, i.e. taking the unique genes identified and their best hit (and only taking things with higher than 60% percentage ID) :

 cat Example_v_Eggnog.m8 | awk 'BEGIN{prev=""};{if($1 != prev) { prev=$1; if($12 > 60) { print $0 }}}' > Example_v_Eggnog.m8.besthits