<div dir="rtl"><div style="" dir="ltr">Hi MAKER users,</div><div style="" dir="ltr">I am new to MAKER and have just finished running my first annotations. Although the results make sense in general, I have reason to suspect that some gene models are wrong, and I would like your help in understanding and optimizing the results.</div><div style="" dir="ltr">My research project involves annotating multiple tomato varieties (individuals) that differ somewhat from the published reference genome. To this end, I created de novo assemblies of these genomes and also generated an evidence set to be used as input for MAKER. The evidence consists of a large set of transcripts from various tomato varieties and conditions, as well as full protein sets from 6 plant species, including the proteins derived from the annotation of the reference, called ITAG.</div><div style="" dir="ltr">For an initial QA, I tried annotating the reference genome using my evidence data and Augustus as the gene predictor. This should allow me to compare my result to the ITAG annotation, which I assume to be the "correct" answer, and see how well I'm doing. I should mention that the ITAG annotation was also created using MAKER, followed by manual curation.</div><div style="" dir="ltr">I started by comparing the protein sets from my result and the ITAG set. Specifically, I ran an all-vs-all BLAST and took the top hits. I discovered that only about 70% of the ITAG proteins are covered by a protein from my result with a high-quality alignment (e-value &lt; 10e-5, coverage &gt; 90%). I investigated further by running BUSCO on both protein sets and looking at BUSCOs found in ITAG but missing from my result. Attached is a screenshot from a genome browser showing such a case. The top track is the ITAG gene model; below it is my result. 
The third track shows the protein evidence alignments (i.e., blastx and protein2genome features), and the bottom track shows masked repeats.</div><div style="" dir="ltr">As you can see, there seem to be two issues with my result:</div><div style="" dir="ltr">1. The two ITAG genes were fused into one. I guess this is a difficult case, as the genes are very close together.</div><div style="" dir="ltr">2. The last (3') CDS of the ITAG gene was predicted as part of the 3' UTR in my result. This is in fact why I ended up with a truncated protein and a missing BUSCO.</div><div style="" dir="ltr">This is a bit surprising to me, since quite a lot of protein evidence seems to support this region as a CDS. Can you help me figure out why this happened? Could it be due to the small repeats detected in this region?</div><div style="" dir="ltr">Any ideas on how my result can be improved without manual curation?</div><div style="" dir="ltr"><br></div><div style="" dir="ltr">Many thanks!</div></div>
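<div style="" dir="ltr">For reference, the protein-set comparison described in the post (top BLAST hit per ITAG protein, kept only if it passes an e-value and a coverage cutoff) could be sketched roughly as below. This is a hypothetical illustration, not the script actually used: it assumes BLAST+ tabular output written with the custom column list <code>-outfmt "6 qseqid sseqid evalue qstart qend qlen"</code>, with ITAG proteins as queries; it defines coverage as the fraction of the ITAG protein spanned by the single top HSP (other HSPs are ignored); and its default e-value cutoff of 1e-5 is an assumption to be replaced by whatever threshold was really applied. ITAG proteins with no hit at all never appear in BLAST output, so the full list of ITAG IDs is passed in separately to get the correct denominator.</div>

```python
import csv
import io
from typing import Dict, Iterable, Set, Tuple

# ASSUMPTION: BLAST+ tabular output produced with
#   -outfmt "6 qseqid sseqid evalue qstart qend qlen"
# where queries are ITAG proteins and subjects are proteins from the new annotation.
FIELDS = ["qseqid", "sseqid", "evalue", "qstart", "qend", "qlen"]


def top_hits(rows: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Keep the single lowest-e-value row (the top hit) for each query."""
    best: Dict[str, Dict[str, str]] = {}
    for row in rows:
        q = row["qseqid"]
        if q not in best or float(row["evalue"]) < float(best[q]["evalue"]):
            best[q] = row
    return best


def query_coverage(row: Dict[str, str]) -> float:
    """Fraction of the query length spanned by this one HSP (other HSPs ignored)."""
    return (int(row["qend"]) - int(row["qstart"]) + 1) / int(row["qlen"])


def fraction_covered(
    blast_tsv: str,
    all_queries: Iterable[str],
    max_evalue: float = 1e-5,  # hypothetical default; set to the threshold you used
    min_cov: float = 0.9,
) -> Tuple[float, Set[str]]:
    """Return (fraction of queries with a good top hit, set of queries without one)."""
    queries = set(all_queries)  # all ITAG IDs, including those with no BLAST hit
    if not queries:
        return 0.0, set()
    reader = csv.DictReader(io.StringIO(blast_tsv), fieldnames=FIELDS, delimiter="\t")
    best = top_hits(reader)
    covered = {
        q for q, row in best.items()
        if q in queries
        and float(row["evalue"]) < max_evalue
        and query_coverage(row) > min_cov
    }
    return len(covered) / len(queries), queries - covered


# Usage with made-up rows (not real data):
example = (
    "ITAG_1\tmine_1\t1e-50\t1\t300\t310\n"  # 300/310 ~ 0.97 coverage -> counted
    "ITAG_1\tmine_7\t1e-10\t5\t120\t310\n"  # worse e-value, not the top hit
    "ITAG_2\tmine_2\t1e-40\t1\t150\t400\n"  # 150/400 = 0.375 coverage -> not counted
)
frac, missing = fraction_covered(example, ["ITAG_1", "ITAG_2", "ITAG_3"])
print(f"{frac:.2f}", sorted(missing))  # prints: 0.33 ['ITAG_2', 'ITAG_3']
```

<div style="" dir="ltr">The returned set of uncovered IDs is the natural starting point for browser inspection of cases like the one in the screenshot. Measuring coverage against the ITAG protein length means that a prediction truncated at the 3' end, as in issue 2, drops below the cutoff even when its top hit has an excellent e-value.</div>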