<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Sorry. Meant to say —> "RepeatMasker also started checking against <b class="">protein</b> repeats to get better performance"<div class=""><br class=""></div><div class="">—Carson</div><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Sep 30, 2015, at 1:18 PM, Carson Holt <<a href="mailto:carsonhh@gmail.com" class="">carsonhh@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">It’s from a tool called RepeatRunner.  Here is the paper —> <a href="https://publications.mpi-cbg.de/Smith_2007_5404.pdf" class="">https://publications.mpi-cbg.de/Smith_2007_5404.pdf</a><div class=""><br class=""></div><div class="">Post RepeatRunner development, RepeatMasker also started checking against repeats to get better performance. So nowadays it may be somewhat redundant with what RepeatMasker will do, but it does add a little.  It’s not updated regularly, but since RepBase started adding proteins that should not be an issue.</div><div class=""><br class=""></div><div class="">In addition to a number of protein repeats, te_proteins also contains a number of low complexity entries from NCBI’s NR database that tend to falsely align with great frequency to many genomes. 
All te_proteins matches generate soft masking in the genome whereas RepeatMasker results will be hard masked.</div><div class=""><br class=""></div><div class="">—Carson</div><div class=""><br class=""></div><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Sep 30, 2015, at 1:00 PM, Ole Kristian Tørresen <<a href="mailto:ole.toerresen@gmail.com" class="">ole.toerresen@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hi,<div class="">the file te_proteins.fasta is distributed with MAKER and is suggested as a way to find more divergent transposable elements by searching at the protein level instead of at the nucleotide level. I've been unable to find any information about its creation, or whether it has been kept current. There is a file of mobile-element-derived proteins distributed with RepBase, called RepeatPeps.lib, which seems to contain the same amount of sequence (about 9.4 Mbp in both), but half the number of sequences (10500 vs 25000). </div><div class=""><br class=""></div><div class="">Does anyone know how these two files compare? Could I use RepeatPeps.lib instead, or combine them (with some clustering maybe?)?</div><div class=""><br class=""></div><div class="">Thank you.</div><div class=""><br class=""></div><div class="">Sincerely,</div><div class="">Ole Kristian Tørresen</div></div>

_______________________________________________<br class="">maker-devel mailing list<br class=""><a href="mailto:maker-devel@box290.bluehost.com" class="">maker-devel@box290.bluehost.com</a><br class=""><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org" class="">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a><br class=""></div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></body></html>