[maker-devel] repeats masking
Daniel Ence
dandence at gmail.com
Mon Jul 31 10:53:11 MDT 2017
Hi Quanwei, Running Maker on the unmasked genome will probably give you more gene predictions, but they won't be helpful in the end, since many of the extra predictions are likely to come from repeats. Maker soft-masks repeats (it lowercases them rather than replacing them with Ns). This prevents BLAST alignments from being seeded in the masked regions but still allows them to extend into those regions, which solves the problem of missing exons mentioned in the text you sent. There's also an option in the control file ("unmask") to additionally run the ab initio programs on the unmasked sequence. It is set to false (0) by default.
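A toy Python model may make the soft-masking point concrete. It is an illustrative sketch, not how BLAST or Maker are implemented. Seeds are exact 4-mers, extension is ungapped and stops at the first mismatch, and the sequences and helper names (hard_mask, find_hits) are hypothetical. Seeds may only start in uppercase (unmasked) sequence, while extension compares bases case-insensitively, so a hit that starts outside a repeat can still run through it. Hard-masking with Ns truncates the same hit instead.

```python
"""Toy model of seeding vs. extension on a soft-masked sequence.

Assumptions (illustrative only, not BLAST's or Maker's actual code):
exact k-mer seeds, ungapped extension that stops at the first mismatch,
lowercase bases = soft-masked repeats, 'N' = hard-masked bases.
"""


def hard_mask(seq: str) -> str:
    """Replace soft-masked (lowercase) bases with N, as hard-masking would."""
    return "".join("N" if c.islower() else c for c in seq)


def _match(a: str, b: str) -> bool:
    """Case-insensitive base comparison; N never matches anything."""
    return a.upper() == b.upper() and a.upper() != "N"


def find_hits(query: str, target: str, k: int = 4) -> list[tuple[int, int, int]]:
    """Return sorted (query_start, target_start, length) ungapped hits.

    Seeds may only start from target words made entirely of unmasked
    (uppercase, non-N) bases, but extension is case-insensitive, so a
    hit seeded outside a repeat can grow into soft-masked sequence.
    """
    q = query.upper()
    hits = set()
    for t in range(len(target) - k + 1):
        word = target[t:t + k]
        if not word.isupper() or "N" in word:
            continue  # no seeding inside masked sequence
        for i in range(len(q) - k + 1):
            if q[i:i + k] != word:
                continue
            # Extend left, then right, while bases keep matching.
            qs, ts = i, t
            while qs > 0 and ts > 0 and _match(q[qs - 1], target[ts - 1]):
                qs, ts = qs - 1, ts - 1
            qe, te = i + k, t + k
            while qe < len(q) and te < len(target) and _match(q[qe], target[te]):
                qe, te = qe + 1, te + 1
            hits.add((qs, ts, qe - qs))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical exon whose 3' half lies in a soft-masked repeat.
    genome = "TTTTTGATTACAcggtaccTTTTT"
    transcript = "GATTACACGGTACC"
    print(find_hits(transcript, genome))             # [(0, 5, 14)] full-length hit
    print(find_hits(transcript, hard_mask(genome)))  # [(0, 5, 7)] truncated at the Ns
    print(find_hits("CGGTACC", genome))              # [] repeat-only query never seeds
```

The last call shows the other half of the trade-off: a query that matches only inside the repeat never gets a seed, which is what keeps repeat-driven alignments from piling up.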
Hope this helps,
Daniel Ence
> On Jul 31, 2017, at 12:42 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>
> Hello:
>
> We are using the Maker2 pipeline to annotate a new genome. We just read something about repeat masking in RepeatMasker's documentation. It suggests leaving low-complexity regions unmasked and annotating genes on both the masked and the unmasked genome. What are your opinions and suggestions on this? Many thanks
>
>
> The paragraph below is from http://www.binfo.ncku.edu.tw/RM/webrepeatmaskerhelp.html
> Use in association with gene prediction programs
>
> Predicting genes from a masked sequence faces several problems. First, one should not mask low complexity regions, e.g. to avoid masking trinucleotide repeats in coding regions. But even with only interspersed repeats masked, gene prediction programs may fail to identify exons correctly. As mentioned above, sometimes tail ends of coding regions may have originated from transposable elements. Even if no coding regions have been masked, splice sites may be compromised; e.g. the polypyrimidine region that is part of the acceptor splice site may be contained within a repeat.
>
> Thus, I generally recommend running a gene prediction program on unmasked DNA (as well) and comparing the predicted genes and exons with the RepeatMasker output. Some gene prediction programs allow you to force certain exons out of the predictions (e.g. often the old ORFs of LINE1 elements and endogenous retroviruses are included in genes). Work is also in progress at several sites to incorporate RepeatMasker into gene prediction programs, in which case matches to repeats are weighted in along with the other parameters used.
>
>
> Best
>
> Quanwei
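The RepeatMasker recommendation quoted above (predict on unmasked DNA, then compare the predicted exons with the RepeatMasker output) could be approached with a simple interval-overlap check. The sketch below is a hypothetical illustration, not part of Maker or RepeatMasker. It assumes half-open [start, end) coordinates on a single contig, skips file parsing, and uses an arbitrary 0.5 coverage threshold. The function names and coordinates are made up for the example.

```python
"""Flag predicted exons that overlap RepeatMasker-annotated repeats.

Illustrative sketch: intervals are assumed to be half-open [start, end)
on a single contig; reading GFF3 or RepeatMasker .out files is omitted,
and the 0.5 threshold is a hypothetical choice, not a recommended value.
"""


def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching intervals so repeats are not double-counted."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def repeat_fraction(start: int, end: int, repeats: list[tuple[int, int]]) -> float:
    """Fraction of [start, end) covered by the (already merged) repeat intervals."""
    if end <= start:
        return 0.0
    covered = sum(max(0, min(end, r_end) - max(start, r_start))
                  for r_start, r_end in repeats)
    return covered / (end - start)


def flag_repeat_exons(exons: list[tuple[str, int, int]],
                      repeats: list[tuple[int, int]],
                      min_fraction: float = 0.5) -> list[tuple[str, float]]:
    """Return (exon_id, fraction) for exons at least min_fraction repeat-covered."""
    merged = merge_intervals(repeats)
    flagged = []
    for exon_id, start, end in exons:
        frac = repeat_fraction(start, end, merged)
        if frac >= min_fraction:
            flagged.append((exon_id, frac))
    return flagged


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    exons = [("exon1", 100, 200), ("exon2", 300, 400), ("exon3", 500, 550)]
    repeats = [(150, 320), (180, 250), (520, 530)]
    print(flag_repeat_exons(exons, repeats))  # [('exon1', 0.5)]
```

Merging the repeat intervals first keeps nested or overlapping RepeatMasker hits from being counted twice.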