<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Mapping genes forward onto a new assembly is an iterative process.<div class=""><br class=""></div><div class="">If a previous model maps to multiple locations, that indicates you accidentally annotated a repeat (e.g. a transposon) in the previous annotation set. This is especially true if you get 100 copies; in that case you should simply remove that gene. If it’s just 2 or 3 copies, you can use the score column (see the GFF3 format specification), which indicates how well each copy matches the original model (the value is 0-100% recovery when est_forward=1 is set), to decide which one to keep. You can also compare neighboring genes between assemblies to see if one copy is simply a paralog. If a lot of genes have 2 copies, you probably have high heterozygosity in the genome, so the maternal and paternal chromosomes are assembling independently (i.e. both copies are real and are the exact same gene).<div class=""><br class=""></div><div class="">If you decide a gene belongs on a specific contig after reviewing neighboring models and scores, you can anchor it to a contig or region by adding maker_coor= to the FASTA header (example: >transcriptA <span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">maker_coor=contig1:10000-11000; </span>). Also consider removing models that have low scores. 
They may map to only one location, but if the % recovery is low, it may be best not to try to recover the old model (a new gene prediction may prove much more accurate).</div><div class=""><br class=""></div><div class="">—Carson</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""><div><br class=""></div><div><br class=""><blockquote type="cite" class=""><div class="">On Jun 2, 2020, at 11:27 PM, Maxwell C Coyle <<a href="mailto:max_coyle@berkeley.edu" class="">max_coyle@berkeley.edu</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hello!<div class=""><br class=""></div><div class="">I’m using Maker to map forward annotations to a new genome assembly in a species of choanoflagellate. I followed the protocol from Campbell 2014, creating a transcript FASTA file from my old GFF3 file and using this as the only EST evidence. I set est2genome=1 and est_forward=1. The Maker run went great, except that many of the old transcripts are mapping to many places in the new assembly, up to 100+ times, so my gene count has inflated from 11,624 to 25,905. Of the most highly multi-mapped genes, many but not all are rRNA genes.</div><div class=""><br class=""></div><div class="">I was wondering if there is a setting or tweak I can make so that each transcript maps uniquely to its best location in the new genome? Or would tweaking the exonerate stringency help? This is not the final annotation step, as I will also be using RNA-seq evidence to improve my gene models after liftover.</div><div class=""><br class=""></div><div class="">Thanks for your help!</div><div class=""><br class=""></div><div class="">Best,</div><div class="">Max</div><div class=""><br class=""></div><div class=""><div class="">
<div dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">Max Coyle</div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">PhD Candidate, King Lab</div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">Dept of Molecular and Cellular Biology, UC Berkeley</div><br class="Apple-interchange-newline"></div><br class="Apple-interchange-newline">
</div>
<br class=""></div></div>_______________________________________________<br class="">maker-devel mailing list<br class=""><a href="mailto:maker-devel@yandell-lab.org" class="">maker-devel@yandell-lab.org</a><br class="">http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org<br class=""></div></blockquote></div><br class=""></div></div></body></html>