[maker-devel] Repeats annotation

Carson Holt carsonhh at gmail.com
Wed Sep 13 12:26:08 MDT 2017


I don’t know of any tool to analyze the repeat info. MAKER really only focuses on getting the masking done for the gene prediction, and while it does keep the repeats as features in the GFF3, it does not do any kind of analysis. You would have to do that outside of MAKER.
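
A minimal sketch of what such an outside analysis might look like, working from the repeat features MAKER leaves in the GFF3. Several things here are assumptions to check against your own output, not documented MAKER behavior: that repeat hits carry the source `repeatmasker` or `repeatrunner` in column 2, that top-level hits have type `match`, and that the `Name` attribute looks like `species:<name>|genus:<class>`. The function name `summarize_repeats` is made up for illustration.

```python
import collections
import urllib.parse


def summarize_repeats(gff3_lines, sources=("repeatmasker", "repeatrunner")):
    """Tally hit count and summed span (bp) per repeat class.

    Overlapping hits are counted twice in the bp total, so this is a rough
    figure, not a masked-base count.
    """
    totals = collections.defaultdict(lambda: [0, 0])  # class -> [count, bp]
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Keep only top-level repeat hits (assumed source/type values).
        if len(cols) != 9 or cols[1] not in sources or cols[2] != "match":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        rep_class = "Unknown"
        # Assumed Name layout: species:<name>|genus:<class>
        for part in attrs.get("Name", "").split("|"):
            if part.startswith("genus:"):
                rep_class = urllib.parse.unquote(part[len("genus:"):])
        start, end = int(cols[3]), int(cols[4])
        totals[rep_class][0] += 1
        totals[rep_class][1] += end - start + 1  # GFF3 coordinates are inclusive
    return {k: tuple(v) for k, v in totals.items()}
```

Something like `summarize_repeats(open("genome.all.gff"))` (the file name is a placeholder) would then give one `(count, bp)` pair per class, which can be compared across species.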

—Carson


> On Sep 13, 2017, at 8:51 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> Dear Carson:
> 
> We have generated a species-specific repeat library following your pipeline (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic) and annotated the genome with maker2 using both the species-specific repeat library and the mammalian repeat library.
> 
> Now we want to compare the repeat content among different species. So I want to generate species-specific repeat libraries for the other species and mask each genome with both its species-specific library and the mammalian repeat library. But I found that I can provide only one of the two libraries to RepeatMasker, not both. I wonder whether I can run maker2 on those genomes for repeat masking only.
> 
> BTW, running RepeatMasker produces a summary report (as below). I wonder whether maker2 has any script to analyze repeat elements, or whether there are other tools to process the output of maker2.
> 
> Many thanks
> 
> 
> file name: test_scaffold31.fasta    
> sequences:             1
> total length:     863590 bp  (858757 bp excl N/X-runs)
> GC level:         37.02 %
> bases masked:     301634 bp ( 34.93 %)
> ==================================================
>                number of      length   percentage
>                elements*    occupied  of sequence
> --------------------------------------------------
> SINEs:               134        14362 bp    1.66 %
>       Alu/B1          28         2183 bp    0.25 %
>       MIRs            21         2860 bp    0.33 %
> 
> LINEs:               188       129104 bp   14.95 %
>       LINE1          168       124633 bp   14.43 %
>       LINE2           16         4266 bp    0.49 %
>       L3/CR1           4          205 bp    0.02 %
>       RTE              0            0 bp    0.00 %
> 
> LTR elements:        127       101129 bp   11.71 %
>       ERVL            10         3057 bp    0.35 %
>       ERVL-MaLRs      22         6902 bp    0.80 %
>       ERV_classI      66        80258 bp    9.29 %
>       ERV_classII     29        10912 bp    1.26 %
> 
> DNA elements:         27         4402 bp    0.51 %
>       hAT-Charlie     13         1836 bp    0.21 %
>       TcMar-Tigger     8         1651 bp    0.19 %
> 
> Unclassified:          4         1590 bp    0.18 %
> 
> Total interspersed repeats:    250587 bp   29.02 %
> 
> 
> Small RNA:             9          616 bp    0.07 %
> 
> Satellites:           66        40820 bp    4.73 %
> Simple repeats:      159         7235 bp    0.84 %
> Low complexity:       50         2766 bp    0.32 %
> ==================================================
> 
> * most repeats fragmented by insertions or deletions
>   have been counted as one element
>                                                       
> 
> The query species was assumed to be mammalia      
> RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127
>         
> run with rmblastn version 2.2.27+ 
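
For comparing reports like the one quoted above across species, the per-class rows can be pulled into a dictionary. This is only a sketch that assumes the row layout shown in this message (name, element count, length in bp, percentage); the function name `parse_rm_summary` is hypothetical, and rows without an element count, such as "Total interspersed repeats", are skipped.

```python
import re

# name, optional colon, element count, "<length> bp", "<percent> %"
ROW = re.compile(
    r"^(?P<name>\S.*?):?\s+(?P<count>\d+)\s+(?P<bp>\d+) bp\s+(?P<pct>[\d.]+) %"
)


def parse_rm_summary(lines):
    """Return {class name: (elements, bp, percent)} for rows with all three fields."""
    rows = {}
    for line in lines:
        # Drop indentation and any "> " email quoting before matching.
        m = ROW.match(line.lstrip("> "))
        if m:
            rows[m["name"]] = (int(m["count"]), int(m["bp"]), float(m["pct"]))
    return rows


report = """\
> SINEs:               134        14362 bp    1.66 %
>       Alu/B1          28         2183 bp    0.25 %
> LINEs:               188       129104 bp   14.95 %
> Total interspersed repeats:    250587 bp   29.02 %
> Satellites:           66        40820 bp    4.73 %
""".splitlines()

rows = parse_rm_summary(report)
total_length = 863590  # "total length" from the report header
for name, (count, bp, pct) in rows.items():
    # The reported percentage is relative to total length, including N/X runs.
    print(name, count, bp, pct, round(100 * bp / total_length, 2))
```

In this report the percentages are relative to the full sequence length (863590 bp), not the length excluding N/X runs, which is worth keeping in mind when comparing assemblies with different gap content.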
