[maker-devel] repeats masking
Quanwei Zhang
qwzhang0601 at gmail.com
Mon Jul 31 10:59:02 MDT 2017
Hi Carson:
I see. Thank you for your explanation!
Best
Quanwei
2017-07-31 12:48 GMT-04:00 Carson Holt <carsonhh at gmail.com>:
> MAKER uses masking primarily for the evidence alignment step. Low
> complexity regions are soft masked which means alignments can extend
> through them but must seed outside of the masked region first. Successful
> BLAST alignments are then polished using exonerate on the unmasked region.
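The soft-masking behaviour described above (alignments may extend through masked bases but must seed outside them) can be illustrated with a toy seed-and-extend sketch. This is not BLAST's or MAKER's code. It assumes lowercase letters mark soft-masked bases, exact k-mer seeds with an illustrative `k=4`, and a simple ungapped exact-match extension instead of BLAST's scored X-drop extension. `find_seeds` and `extend_right` are hypothetical helper names.

```python
def find_seeds(query: str, subject: str, k: int = 4) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs of exact k-mer matches.

    Lowercase (soft-masked) bases in the subject are not allowed inside a seed.
    """
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - k + 1):
        word = subject[j:j + k]
        if word.isupper():  # seed only from fully unmasked (uppercase) words
            index.setdefault(word, []).append(j)
    q = query.upper()
    seeds = []
    for i in range(len(q) - k + 1):
        for j in index.get(q[i:i + k], []):
            seeds.append((i, j))
    return seeds


def extend_right(query: str, subject: str, i: int, j: int) -> int:
    """Ungapped extension from a seed; case is ignored, so masked bases can be crossed."""
    n = 0
    while (i + n < len(query) and j + n < len(subject)
           and query[i + n].upper() == subject[j + n].upper()):
        n += 1
    return n


subject = "ACGTacgtacgtTTTT"  # lowercase = soft-masked low-complexity stretch
query = "ACGTACGTACGTTTTT"
seeds = find_seeds(query, subject)
print(seeds)                               # [(0, 0), (4, 0), (8, 0), (11, 12), (12, 12)]
print(extend_right(query, subject, 0, 0))  # 16
```

No seed lands on subject positions 4 to 11 (the masked block), yet the extension from the seed at position 0 runs through that block to the end of the sequence.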
>
> Also, for the gene predictors, the first run is done with hard masking of
> the transposons only, so the predictors can still call genes in low-complexity
> regions. The second round of hint-based prediction is done on the unmasked
> assembly. So MAKER already handles all the issues you mention.
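The two-round scheme Carson describes (round one on a genome with only transposons hard-masked, round two on the unmasked assembly) can be sketched as follows. This is an illustration under assumptions, not MAKER's implementation. The `RepeatHit` record is a hypothetical simplification of RepeatMasker output using 0-based half-open coordinates, and treating the class labels `Simple_repeat` and `Low_complexity` as the "leave unmasked" set is an assumption made for the example.

```python
from typing import NamedTuple


class RepeatHit(NamedTuple):
    start: int         # 0-based, inclusive
    end: int           # 0-based, exclusive
    repeat_class: str  # e.g. "LINE/L1", "Simple_repeat", "Low_complexity"


# Assumed set of classes that should NOT be hard-masked in round one.
LOW_COMPLEXITY = {"Simple_repeat", "Low_complexity"}


def hard_mask_transposons(seq: str, hits: list[RepeatHit]) -> str:
    """Replace interspersed repeats with N; leave low-complexity hits untouched."""
    bases = list(seq)
    for hit in hits:
        if hit.repeat_class in LOW_COMPLEXITY:
            continue
        for pos in range(hit.start, hit.end):
            bases[pos] = "N"
    return "".join(bases)


seq = "ACGTACGTCACACACAGGGG"
hits = [RepeatHit(0, 4, "LINE/L1"), RepeatHit(8, 16, "Simple_repeat")]
round_one_input = hard_mask_transposons(seq, hits)  # transposons hidden
round_two_input = seq                               # full unmasked assembly
print(round_one_input)  # NNNNACGTCACACACAGGGG
```

The CA-repeat stays visible to the predictor in round one, so a coding trinucleotide or dinucleotide run is not hidden the way a transposon is.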
>
> --Carson
>
> On Jul 31, 2017, at 10:42 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>
> Hello:
>
> We are using the Maker2 pipeline to annotate a new genome. We just read
> something about repeat masking in RepeatMasker's documentation. It
> suggests leaving low-complexity regions unmasked and doing gene annotation
> using both the masked and unmasked genome. What is your opinion on this,
> and do you have any suggestions? Many thanks
>
>
> The paragraph below is from
> http://www.binfo.ncku.edu.tw/RM/webrepeatmaskerhelp.html
> Use in association with gene prediction programs
>
> Predicting genes from a masked sequence faces several problems. First,
> one should not mask low complexity regions, e.g. to avoid masking
> trinucleotide repeats in coding regions. But even with only interspersed
> repeats masked, gene prediction programs may fail to identify exons
> correctly. As mentioned above, sometimes tail ends of coding regions may
> have originated from transposable elements. Even if no coding regions have
> been masked, splice sites may be compromised; e.g. the polypyrimidine
> region that is part of the acceptor splice site may be contained within a
> repeat.
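The splice-site concern quoted above (the polypyrimidine tract of an acceptor site lying inside a repeat) could be screened for with a simple interval check. This is a hypothetical sketch, not part of RepeatMasker or MAKER. It assumes plus-strand exons, 0-based half-open repeat intervals, and an illustrative 20 bp `window` upstream of the exon start to stand in for the acceptor region; real acceptor spans vary.

```python
def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length of overlap between half-open intervals [a_start, a_end) and [b_start, b_end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def acceptor_in_repeat(exon_start: int, repeats: list[tuple[int, int]],
                       window: int = 20) -> bool:
    """True if the region just upstream of a plus-strand exon overlaps a repeat.

    `window` is an assumed size for the polypyrimidine tract plus the AG.
    """
    region_start = max(0, exon_start - window)
    return any(overlap(region_start, exon_start, r_start, r_end) > 0
               for r_start, r_end in repeats)


repeats = [(100, 300)]
print(acceptor_in_repeat(310, repeats))  # True: 290..310 overlaps the repeat
print(acceptor_in_repeat(400, repeats))  # False: 380..400 is repeat-free
```

An exon flagged this way is one whose acceptor could be lost if the repeat were hard-masked, which is the case for also predicting on unmasked DNA.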
>
> Thus, I generally recommend running a gene prediction program on unmasked
> DNA (as well) and comparing the predicted genes and exons with the
> RepeatMasker output. Some gene prediction programs allow you to force
> certain exons out of the predictions (e.g. often the old ORFs of LINE1
> elements and endogenous retroviruses are included in genes). Work is also
> in progress at several sites to incorporate RepeatMasker into gene
> prediction programs, in which cases matches to repeats are weighted in
> along with the other parameters used.
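The recommended comparison of predicted exons with RepeatMasker output could start from a per-exon repeat-coverage score. This is a minimal sketch under assumptions: exons and repeats are 0-based half-open `(start, end)` tuples on one sequence, overlapping repeat hits are merged so bases are not double-counted, and the 0.5 cutoff is an arbitrary illustrative `threshold`. `repeat_fraction` and `flag_repeat_exons` are hypothetical names.

```python
def repeat_fraction(exon: tuple[int, int], repeats: list[tuple[int, int]]) -> float:
    """Fraction of the exon's bases covered by at least one repeat interval."""
    start, end = exon
    if end <= start:
        return 0.0
    covered = [False] * (end - start)
    for r_start, r_end in repeats:
        for pos in range(max(start, r_start), min(end, r_end)):
            covered[pos - start] = True  # marks each base once, even if repeats overlap
    return sum(covered) / (end - start)


def flag_repeat_exons(exons: list[tuple[int, int]], repeats: list[tuple[int, int]],
                      threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return exons whose repeat coverage is at least `threshold`."""
    return [exon for exon in exons if repeat_fraction(exon, repeats) >= threshold]


exons = [(0, 100), (200, 300), (400, 450)]
repeats = [(180, 260), (250, 280), (420, 425)]
print(flag_repeat_exons(exons, repeats))  # [(200, 300)]
```

Flagged exons are candidates for manual review (for example, old LINE1 ORFs absorbed into a gene model) rather than automatic removal.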
>
> Best
>
> Quanwei
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org