[maker-devel] repeats masking
Quanwei Zhang
qwzhang0601 at gmail.com
Mon Jul 31 10:42:04 MDT 2017
Hello:
We are using the Maker2 pipeline to annotate a new genome. We just read
about repeat masking in the RepeatMasker documentation, which suggests
leaving low-complexity regions unmasked and doing gene annotation on both
the masked and the unmasked genome. What are your opinions and suggestions
on this? Many thanks.
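To make the question concrete, here is a minimal Python sketch of "mask interspersed repeats but leave low-complexity regions unmasked". It assumes the repeat annotations have already been parsed from RepeatMasker output into (start, end, class) records with 0-based, half-open coordinates. The `Repeat` type, the `soft_mask` function and the exact class labels `Simple_repeat` / `Low_complexity` are illustrative assumptions, not a MAKER or RepeatMasker API. Soft-masking (lowercase) rather than replacing bases with N keeps the unmasked sequence recoverable with `.upper()`, so both versions stay available for gene prediction.

```python
from typing import Iterable, NamedTuple


class Repeat(NamedTuple):
    start: int          # 0-based, inclusive (assumed already converted)
    end: int            # 0-based, exclusive
    repeat_class: str   # e.g. "LINE/L1"; labels below are assumptions


# Hypothetical set of classes treated as low complexity and left unmasked.
LOW_COMPLEXITY_CLASSES = {"Simple_repeat", "Low_complexity"}


def soft_mask(seq: str, repeats: Iterable[Repeat]) -> str:
    """Lowercase interspersed repeats; leave low-complexity repeats uppercase.

    The input is uppercased first, so any pre-existing masking is discarded.
    """
    chars = list(seq.upper())
    for r in repeats:
        if r.repeat_class in LOW_COMPLEXITY_CLASSES:
            continue  # e.g. a CAG trinucleotide repeat inside a coding exon
        for i in range(max(r.start, 0), min(r.end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


if __name__ == "__main__":
    genome = "ACGTACGTCAGCAGCAGCAGTTTTGGGGCCCC"
    reps = [Repeat(0, 8, "LINE/L1"), Repeat(8, 20, "Simple_repeat")]
    print(soft_mask(genome, reps))  # acgtacgtCAGCAGCAGCAGTTTTGGGGCCCC
```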
The passage below is from
http://www.binfo.ncku.edu.tw/RM/webrepeatmaskerhelp.html
Use in association with gene prediction programs
Predicting genes from a masked sequence faces several problems. First, one
should not mask low complexity regions, e.g. to avoid masking trinucleotide
repeats in coding regions. But even with only interspersed repeats masked,
gene prediction programs may fail to identify exons correctly. As mentioned
above, sometimes tail ends of coding regions may have originated from
transposable elements. Even if no coding regions have been masked, splice
sites may be compromised; e.g. the polypyrimidine region that is part of
the acceptor splice site may be contained within a repeat.
Thus, I generally recommend to run a gene prediction program on unmasked
DNA (as well) and compare the predicted genes and exons with the
RepeatMasker output. Some gene prediction program allow you to force
certain exons out of the predictions (e.g. often the old ORFs of LINE1
elements and endogenous retroviruses are included in genes). Work is also
in progress at several sites to incorporate RepeatMasker into gene
prediction programs, in which cases matches to repeats are weighted in
along with the other parameters used.
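The quoted advice to "compare the predicted genes and exons with the RepeatMasker output" could look roughly like the Python sketch below: for each predicted exon, compute the fraction of its bases covered by repeat intervals and flag exons above a cutoff for manual review. This assumes a single sequence, 0-based half-open coordinates, and exons and repeats already parsed into tuples; the function names and the 0.5 threshold are hypothetical choices, not part of MAKER or RepeatMasker.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def repeat_fraction(exon: Interval, repeats: List[Interval]) -> float:
    """Fraction of the exon's bases that fall inside any repeat interval."""
    start, end = exon
    if end <= start:
        return 0.0
    covered = 0
    for r_start, r_end in merge(repeats):
        overlap = min(end, r_end) - max(start, r_start)
        if overlap > 0:
            covered += overlap
    return covered / (end - start)


def flag_repeat_exons(exons: List[Interval], repeats: List[Interval],
                      threshold: float = 0.5) -> List[Interval]:
    """Return exons whose repeat coverage is at least `threshold` (arbitrary)."""
    return [e for e in exons if repeat_fraction(e, repeats) >= threshold]


if __name__ == "__main__":
    exons = [(100, 200), (300, 400), (500, 600)]
    repeats = [(150, 260), (180, 220), (320, 330)]
    for exon in exons:
        print(exon, round(repeat_fraction(exon, repeats), 2))
    print("flagged:", flag_repeat_exons(exons, repeats))
```

Merging the repeat intervals first matters because RepeatMasker hits can overlap; without it, a base covered by two hits would be counted twice and the fraction could exceed 1.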
Best
Quanwei