<div dir="ltr"><div><div>Dear Carson and Daniel:<br></div><div><br></div><div>Thank you for explaining the details of repeat masking. We still have some concerns and would appreciate your suggestions. Thanks<br></div><div> <br></div>(1) We are annotating the genome of a new rodent species. Should we use the repeat library for "Mammalia" or for "rodent"? Which is more appropriate, given that we did not construct a species-specific repeat library for the new genome?<br></div><div><br><span style="color:rgb(0,0,255)">#-----Repeat Masking (leave values blank to skip repeat masking)<br>model_org=Mammalia #select a model organism for RepBase masking in RepeatMasker</span></div><div><span style="color:rgb(0,0,255)">repeat_protein=/gs/gsfs0/hpc01/apps/MAKER/2.31.9/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner<br><br></span></div><div><br></div>(2) Because of the concerns discussed in the earlier emails, we did not train a species-specific repeat library. Since we have already finished the annotation using only the repeat libraries from RepeatMasker and Maker2, we wonder whether it is worthwhile to first train a species-specific repeat library and then redo the genome annotation. Will training a species-specific repeat library significantly affect the gene annotation and downstream analyses (e.g., gene family expansion analysis, positive selection)? <br><div><br></div><div>(3) We identified some gene families under contraction, but we want to confirm that those gene families really lost copies in our new genome. 
Do you think it is worthwhile to run the genome annotation without repeat masking, so that no genes are missing from the annotation because of repeat masking?</div><div><br></div><div>Many thanks.</div><div><br></div><div>Best</div><div>Quanwei<br></div><div><br><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2017-07-31 13:02 GMT-04:00 Carson Holt <span dir="ltr"><<a href="mailto:carsonhh@gmail.com" target="_blank">carsonhh@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Please note that the unmask option Dan is talking about is a feature to run both masked and unmasked raw predictions in the first round of prediction (it does not affect alignment of the second round of prediction). It tends to increase the false positive rate, but it can be a quick test when you believe you are missing a gene because of overmasking from a user-created library and protein/EST evidence is overly sparse (so the gene cannot be recovered through evidence alignment and the second round of unmasked prediction).<span class="HOEnZb"><font color="#888888"><div><br></div><div>—Carson</div></font></span><div><div class="h5"><div><br><div><blockquote type="cite"><div>On Jul 31, 2017, at 10:53 AM, Daniel Ence <<a href="mailto:dandence@gmail.com" target="_blank">dandence@gmail.com</a>> wrote:</div><br class="m_-6282269780038308299Apple-interchange-newline"><div><div style="word-wrap:break-word">Hi Quanwei, Running maker on the unmasked genome will probably give you more genes, but won’t be helpful in the end. Maker soft-masks repeats, which prevents blast alignments from being seeded in the masked regions, but still allows them to extend into those regions. This solves the problem of missing exons mentioned in the text you sent. There’s an option in the control file to run the ab initio programs on the unmasked sequence (“unmask”), which is set to false (0) by default. 
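<div><br></div><div>For reference, a sketch of how that setting might look in the control file. This assumes the default maker_opts.ctl layout; the comment text is quoted from memory and may differ between MAKER versions, so check your own generated file:</div><div><span style="color:rgb(0,0,255)">unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no</span></div>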
<div><br></div><div>Hope this helps, </div><div>Daniel Ence<br><div><br></div><div><br><div><blockquote type="cite"><div>On Jul 31, 2017, at 12:42 PM, Quanwei Zhang <<a href="mailto:qwzhang0601@gmail.com" target="_blank">qwzhang0601@gmail.com</a>> wrote:</div><br class="m_-6282269780038308299Apple-interchange-newline"><div><div dir="ltr"><div>Hello:<br><br></div>We are using the Maker2 pipeline to annotate a new genome. We just read about repeat masking in RepeatMasker's documentation. It suggests leaving low-complexity regions unmasked and running gene annotation on both the masked and the unmasked genome. What are your opinion and suggestions on this? Many thanks<br><div><div><div><br><br>The paragraph below is from <a href="http://www.binfo.ncku.edu.tw/RM/webrepeatmaskerhelp.html" target="_blank">http://www.binfo.ncku.edu.tw/<wbr>RM/webrepeatmaskerhelp.html</a><br><h2>Use in association with gene prediction programs</h2><p>
Predicting genes from a masked sequence faces several problems. <span style="color:rgb(0,0,255)">First,
one should not mask low complexity regions</span>, e.g. to avoid masking
trinucleotide repeats in coding regions. But even with only
interspersed repeats masked, gene prediction programs may fail to
identify exons correctly. As mentioned above, sometimes tail ends of
coding regions may have originated from transposable elements. Even if
no coding regions have been masked, splice sites may be compromised;
e.g. the polypyrimidine region that is part of the acceptor splice
site may be contained within a repeat.
<br> <br>
Thus, I generally recommend to run a gene prediction program on
unmasked DNA (as well) and compare the predicted genes and exons with
the RepeatMasker output. Some gene prediction programs allow you to
force certain exons out of the predictions (e.g. often the old ORFs of
LINE1 elements and endogenous retroviruses are included in
genes). Work is also in progress at several sites to incorporate
RepeatMasker into gene prediction programs, in which cases matches to
repeats are weighted in along with the other parameters used. <br>
<br></p><p>Best</p><p>Quanwei<br></p>
</div></div></div></div>
______________________________<wbr>_________________<br>maker-devel mailing list<br><a href="mailto:maker-devel@box290.bluehost.com" target="_blank">maker-devel@box290.bluehost.<wbr>com</a><br><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org" target="_blank">http://box290.bluehost.com/<wbr>mailman/listinfo/maker-devel_<wbr>yandell-lab.org</a><br></div></blockquote></div><br></div></div></div></div></blockquote></div><br></div></div></div></div></blockquote></div><br></div>