<div dir="ltr"><div><div><div>Hi Carson:<br><br></div>I see. Thank you for your explanation!<br><br></div>Best<br></div>Quanwei<br></div><div class="gmail_extra"><br><div class="gmail_quote">2017-07-31 12:48 GMT-04:00 Carson Holt <span dir="ltr">&lt;<a href="mailto:carsonhh@gmail.com" target="_blank">carsonhh@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">MAKER uses the masking primarily for the evidence alignment step. Low-complexity regions are soft masked, which means alignments can extend through them but must first seed outside the masked region. Successful BLAST alignments are then polished using exonerate on the unmasked region.<div><br></div><div>Also, for the gene predictors, the first run is done with hard masking of the transposons only, so they can still predict in low-complexity regions. The second round of hint-based prediction is done on the unmasked assembly. So MAKER already handles all the issues you are mentioning.</div><div><br></div><div>--Carson</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br><div><blockquote type="cite"><div><div class="h5"><div>On Jul 31, 2017, at 10:42 AM, Quanwei Zhang &lt;<a href="mailto:qwzhang0601@gmail.com" target="_blank">qwzhang0601@gmail.com</a>&gt; wrote:</div><br class="m_122329467894436106Apple-interchange-newline"></div></div><div><div><div class="h5"><div dir="ltr"><div>Hello:<br><br></div>We are using the MAKER2 pipeline to annotate a new genome. We just read something about repeat masking in RepeatMasker's documentation. It suggests leaving low-complexity regions unmasked and doing gene annotation on both the masked and the unmasked genome. What are your opinion and suggestions on this? 
Many thanks<br><div><div><div><br><br>The paragraph below is from <a href="http://www.binfo.ncku.edu.tw/RM/webrepeatmaskerhelp.html" target="_blank">http://www.binfo.ncku.edu.tw/<wbr>RM/webrepeatmaskerhelp.html</a><br><h2>Use in association with gene prediction programs</h2><p> 
Predicting genes from a masked sequence faces several problems. <span style="color:rgb(0,0,255)">First,
one should not mask low complexity regions</span>, e.g. to avoid masking
trinucleotide repeats in coding regions. But even with only
interspersed repeats masked, gene prediction programs may fail to
identify exons correctly. As mentioned above, sometimes tail ends of
coding regions may have originated from transposable elements. Even if
no coding regions have been masked, splice sites may be compromised;
e.g. the polypyrimidine region that is part of the acceptor splice
site may be contained within a repeat.
<br> <br>
Thus, I generally recommend to run a gene prediction program on
unmasked DNA (as well) and compare the predicted genes and exons with
the RepeatMasker output. Some gene prediction program allow you to
force certain exons out of the predictions (e.g. often the old ORFs of
LINE1 elements and endogenous retroviruses are included in
genes). Work is also in progress at several sites to incorporate
RepeatMasker into gene prediction programs, in which cases matches to
repeats are weighted in along with the other parameters used.  <br>
 <br></p><p>Best</p><p>Quanwei<br></p>

</div></div></div></div></div></div>

______________________________<wbr>_________________<br>maker-devel mailing list<br><a href="mailto:maker-devel@box290.bluehost.com" target="_blank">maker-devel@box290.bluehost.<wbr>com</a><br><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org" target="_blank">http://box290.bluehost.com/<wbr>mailman/listinfo/maker-devel_<wbr>yandell-lab.org</a><br></div></blockquote></div><br></div></div></blockquote></div><br></div>