5. Identification and masking of repeat elements
Latest revision as of 17:26, 19 March 2016
Repeat identification

Usually the first step in genome annotation is the identification and masking of repeats. By "repeats" we mean several types of sequence, such as low-complexity sequences (for example, homopolymeric runs of nucleotides), transposable elements, viruses, long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs).
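Homopolymeric runs are the simplest of these repeat types to picture. As a rough illustration only (this is not how RepeatModeler detects repeats), the sketch below reports every run of a single base at least MIN_LEN bp long in a FASTA file. The function name `find_homopolymers`, its output columns and the default threshold of 10 bp are hypothetical choices; runs of N are skipped because they usually mark assembly gaps rather than repeats.

```shell
#!/bin/sh
# find_homopolymers FASTA [MIN_LEN]
# Hypothetical helper: prints "seq_id<TAB>start<TAB>end<TAB>base" (1-based, inclusive)
# for every run of one nucleotide at least MIN_LEN bp long (default 10).
# Runs may span line breaks; runs of N (assembly gaps) are ignored.
find_homopolymers() {
  awk -v min="${2:-10}" '
    BEGIN { min = min + 0 }
    # Report the current run if it is long enough and not a gap (N), then reset it
    function report_run() {
      if (len >= min && base != "N")
        printf "%s\t%d\t%d\t%s\n", id, start, start + len - 1, base
      len = 0
    }
    # Header line: close the previous sequence and start a new one
    /^>/ { report_run(); id = substr($1, 2); pos = 0; base = ""; next }
    # Sequence line: extend the current run or start a new one at each base
    {
      seq = toupper($0)
      for (i = 1; i <= length(seq); i++) {
        c = substr(seq, i, 1)
        pos++
        if (c == base) { len++ }
        else { report_run(); base = c; start = pos; len = 1 }
      }
    }
    END { report_run() }
  ' "$1"
}
```

For example, `find_homopolymers Lp_v1.fa 20` would list the runs of 20 bp or more in the assembly.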
Masking a genome consists of two steps: 1) building a repeat database and 2) masking the genome with that database.
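Step 2 amounts to replacing every base that falls inside a repeat with N ("hard masking"), so that later annotation steps ignore it. In practice a tool such as RepeatMasker does this using the repeat library; the minimal sketch below only shows the idea, assuming the repeat coordinates are already known. The function `hard_mask` and its interval format (one `seq_id start end` line per repeat, 1-based and inclusive) are hypothetical.

```shell
#!/bin/sh
# hard_mask INTERVALS FASTA
# Hypothetical helper: INTERVALS has one "seq_id start end" line per repeat
# (1-based, inclusive). Bases inside any interval are replaced with N;
# headers and line lengths are left unchanged.
hard_mask() {
  awk '
    # First file: store the intervals of each sequence id (numbers forced with +0)
    FILENAME == ARGV[1] { k = ++n[$1]; s[$1, k] = $2 + 0; e[$1, k] = $3 + 0; next }
    # FASTA header: switch to a new sequence and reset the position counter
    /^>/ { id = substr($1, 2); pos = 0; print; next }
    # Sequence line: copy each base, or write N if it lies in any interval
    {
      out = ""
      for (i = 1; i <= length($0); i++) {
        pos++
        c = substr($0, i, 1)
        for (j = 1; j <= n[id]; j++)
          if (pos >= s[id, j] && pos <= e[id, j]) { c = "N"; break }
        out = out c
      }
      print out
    }
  ' "$1" "$2"
}
```

The inner loop checks every interval for every base, which is fine for a toy example but far too slow for a whole genome.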
To build the repeat database we use RepeatModeler, a de novo repeat family identification and modeling package.
First, load the RepeatModeler module and list the program's options with the following commands:
$ module load repeatmodeler/1.0.7
$ repeatmodeler -h
Most options are needed only for particular cases, and you can generally ignore them. Before RepeatModeler can run on our genome, we have to build a sequence database from the assembly with BuildDatabase, which is part of the RepeatModeler package. To do that, add the following lines to the job script shown in the previous chapter (3. Your environment):
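# SGE directives: shell, run in the current directory, queues, memory per slot,
# error and output log files, and job name
#$ -S /bin/sh
#$ -cwd
#$ -q amd.q,large.q,intel.q
#$ -l h_vmem=20G
#$ -e RepeatmodDB.e
#$ -o RepeatmodDB.o
#$ -N RMDatabase

# Load RepeatModeler and build the sequence database from the genome assembly
module load repeatmodeler/1.0.7
BuildDatabase -name Lp_v1_database -engine ncbi /ibers/ernie/scratch/seb19/Lperenne_V1/Lp_v1.fa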