[maker-devel] repeats masking

Quanwei Zhang qwzhang0601 at gmail.com
Mon Aug 21 10:51:34 MDT 2017


Dear Carson:

I am trying to build a species specific repeat library for our new rodent
species, following "
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic".
But some things are not clear to us. Would you please explain? Thanks.

(1) For the predicted unknown (unclassified) repeat sequences (those in
Modelerunknown.lib), the page says "Sequences in Modelerunknown.lib were
searched against a transposase database (derived from RepeatMasker
<http://www.repeatmasker.org/>) and sequences matching transposase were
considered as transposons belonging to the relevant superfamily".
I wonder how to do this search. Should I annotate the "unknown" repeat
sequences using RepeatMasker? And what should I do if, for an "unknown"
repeat sequence, only part of the sequence matches the known repeat elements?
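
To make the partial-match case in (1) concrete, here is a minimal sketch that measures how much of one "unknown" sequence is covered by its hits. It assumes the hit coordinates come from a tabular search report (for example the qstart and qend columns of BLAST's -outfmt 6). The function name query_coverage is hypothetical, and this is only an illustration, not the procedure used on the wiki page.

```python
def query_coverage(hits, query_length):
    """Return the fraction of a query covered by the union of its hits.

    hits: iterable of (start, end) pairs, 1-based and inclusive, as in the
          qstart/qend columns of a tabular report (assumed input format).
    query_length: length of the "unknown" repeat sequence in bases.
    """
    # Normalise each pair so start <= end (reverse-strand hits may be flipped).
    intervals = sorted((min(s, e), max(s, e)) for s, e in hits)

    covered = 0
    current_start = current_end = None
    for start, end in intervals:
        if current_end is None or start > current_end + 1:
            # Close the previous merged interval before opening a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent hit: extend the current interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1

    return covered / query_length


# Two overlapping hits on a 1000 bp sequence cover positions 1-400.
print(query_coverage([(1, 300), (250, 400)], 1000))
```

A coverage value like this only describes how partial the match is. Where to draw the line for reclassifying a sequence is a separate decision that the sketch does not make.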

(2) To exclude gene fragments, I need to map the predicted repeat sequences
against a protein database and then run the package "ProtExcluder", right?
I wonder how to get such a protein database. Since I am working on a new
rodent species, can I use all the rodent proteins from UniProt (both
Swiss-Prot and TrEMBL)?

(3) After I generate the species-specific repeat library, do I still need
to select a model organism for RepBase masking (as shown below)?

In the file "maker_opts.ctl"
#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=Mammalia #select a model organism for RepBase masking in RepeatMasker
rmlib=myRepeat.fa #provide an organism specific repeat library in fasta format for RepeatMasker
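
As a minimal sketch of how these two settings can be inspected, the snippet below parses "key=value #comment" lines and reports which repeat-masking options are set and which are blank. The helper name read_ctl_options is hypothetical, and the parsing rules are an assumption based only on the lines quoted above, not on MAKER's own parser.

```python
def read_ctl_options(text):
    """Parse 'key=value #comment' lines into a dict (assumed file layout)."""
    options = {}
    for line in text.splitlines():
        # Everything after the first '#' is treated as a comment.
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


sample = """\
#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=Mammalia #select a model organism for RepBase masking in RepeatMasker
rmlib=myRepeat.fa #provide an organism specific repeat library in fasta format for RepeatMasker
"""

opts = read_ctl_options(sample)
for key in ("model_org", "rmlib"):
    # A blank value means that option is left unset in the control file.
    print(f"{key}: {opts.get(key) or '(blank)'}")
```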

Many thanks

Best
Quanwei

2017-08-18 11:35 GMT-04:00 Carson Holt <carsonhh at gmail.com>:

> Hi Quanwei,
>
> > (1) We are doing genome annotation for a new rodent species, and we wonder
> whether we should use the repeat library for "Mammalia" or "rodent". Which is
> more appropriate if we do not construct a species-specific repeat library for
> the new genome?
>
> Overmasking can occur, but you should really only worry about it if there
> is a specific gene or gene family you are looking for and you don’t care
> about false positive gene models. On a genome-wide level you will find that
> undermasking is almost always the greater danger. So I’d recommend using
> Mammalia. Also, you should always build a species-specific library when
> working with repeat-rich organisms like mammals.
>
>
> > (2) Because of some concerns discussed in the emails above, we did not
> train a species-specific repeat library. Since we have finished the
> annotation using only the repeat library from RepeatMasker and MAKER2, we
> wonder whether it is worth first training a species-specific repeat library
> and then doing the genome annotation again. Will it (i.e., training a
> species-specific repeat library) significantly affect the gene annotation
> and downstream analysis (e.g., gene family expansion analysis, positive
> selection)?
>
> It might be ok. Both Mammalia and rodent are already rich in related
> species repeats in RepBase. But you still may have a lot of false positives
> because of missed repeats. Repeats and transposable elements tend to create
> false regions of high evidence homology (they make it look like you are
> getting evidence for a gene in the region, but when you look at the
> underlying sequence you realize it is a spurious alignment).
>
>
> > (3) We identified some gene families under contraction, but we want to
> confirm that those gene families really lost copies in our new genome. Do you
> think it is worth doing the genome annotation without repeat masking, so
> that no genes will be missing from the annotation due to repeat masking?
>
> Without repeat masking you will get a lot of false alignments. If you find
> anything without repeat masking you will need to do heavy manual review of
> the alignment and perhaps even domain identification to further weed out
> the many false positives you are sure to get.
>
> —Carson