<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>Yes, those are the final annotations, and yes, one is derived from the ab initio model and one from hint-based models. The selection between hint-based and ab initio models is based on evidence overlap, so either kind can turn out better at a given locus. The best models will have lower AED scores. So if for a given locus I have both hint-based and ab initio models, I keep the one that best matches the evidence (lowest AED score).</div><div><br></div><div><span class="Apple-style-span" style="font-family: Tahoma; font-size: 13px; ">augustus_masked means the genome was masked for repeats before running augustus. Anything with </span><span class="Apple-style-span" style="font-family: Tahoma; font-size: 13px; ">augustus_masked in the second column is an ab initio model kept for reference purposes.
Every ab initio model produced by augustus will have an entry there.</span></div><div><span class="Apple-style-span" style="font-family: Tahoma; font-size: 13px; "><br></span></div><div><span class="Apple-style-span" style="font-family: Tahoma; font-size: 13px; ">Thanks,</span></div><div><span class="Apple-style-span" style="font-family: Tahoma; font-size: 13px; ">Carson</span></div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> Sivaranjani Namasivayam <<a href="mailto:ranjani@uga.edu">ranjani@uga.edu</a>><br><span style="font-weight:bold">Date: </span> Wednesday, 12 September, 2012 1:53 PM<br><span style="font-weight:bold">To: </span> Carson Holt <<a href="mailto:carsonhh@gmail.com">carsonhh@gmail.com</a>>, "<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>" <<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> RE: [maker-devel] MAKER training<br></div><div><br></div><div dir="ltr"><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"><style id="owaParaStyle" type="text/css">P {margin-top:0;margin-bottom:0;}</style><div ocsi="0" fpstyle="1" style="word-wrap:break-word; color:rgb(0,0,0)"><div style="direction: ltr;font-family: Tahoma;color: #000000;font-size: 10pt;">I have a few questions based on your comment about augustus/MAKER naming convention.<br><br>
I have been sorting the data using the second column of the GFF file, and I want to be sure I have it right.<br><br>
- Doesn't 'maker' in the second column signify MAKER's final annotations based on all evidence (EST, protein and ab initio prediction)?<br>
I noticed two types of gene IDs, example<br>
1. augustus_masked-scaffold00030-abinit-gene-3.2<br>
2. maker-scaffold00030-augustus-gene-3.7<br><br>
Is the first one a direct augustus prediction without a hints file, and the second one based on a hints file (made from the EST and protein evidence)? If this is the case, could 2 be a better annotation than 1?<br><br>
- In the case of augustus_masked in the 2nd column, I believe all predictions are made without a hints file.<br><br>
Thanks,<br>
Ranjani<br><br><div style="font-family: Times New Roman; color: #000000; font-size: 16px"><hr tabindex="-1"><div style="direction: ltr;" id="divRpF35289"><font color="#000000" face="Tahoma" size="2"><b>From:</b> Carson Holt [<a href="mailto:carsonhh@gmail.com">carsonhh@gmail.com</a>]<br><b>Sent:</b> Tuesday, September 11, 2012 12:04 PM<br><b>To:</b> Sivaranjani Namasivayam; <a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a><br><b>Subject:</b> Re: [maker-devel] MAKER training<br></font><br></div><div></div><div><div style="font-family:Calibri,sans-serif; font-size:14px"><br></div><span id="OLK_SRC_BODY_SECTION" style="font-size: 14px; font-family: Calibri, sans-serif; "><div dir="ltr"><div><blockquote style="margin:0 0 0 40px; border:none; padding:0px"><div style="direction:ltr; font-family:Tahoma; color:#000000; font-size:10pt">- I have transcriptome data from 454 and Illumina platforms. Illumina is from a single time point and 454 from multiple time points. 454 was assembled using Newbler (dataset 1) and
Illumina using TopHat-Cufflinks (dataset 2) and the de novo Trinity pipeline (dataset 3). I now have 3 assemblies - 454 and Illumina will have some redundant transcripts (because of one overlapping time point); TopHat-Cufflinks and Trinity will have highly
redundant transcripts (because they use the same raw reads). Is it OK to provide all 3 datasets as EST evidence, and how does it affect the quality of annotation? (For now I have used dataset 1 and dataset 2 as EST evidence)<br></div></blockquote></div></div></span><div style="font-family:Calibri,sans-serif; font-size:14px"><br></div><div style="font-family:Calibri,sans-serif; font-size:14px">This is fine. You can give them as a comma-separated list: est=file1,file2,file3</div><div style="font-family:Calibri,sans-serif; font-size:14px"><br></div><div style="font-family:Calibri,sans-serif; font-size:14px"><br></div><span id="OLK_SRC_BODY_SECTION" style="font-size: 14px; font-family: Calibri, sans-serif; "><div dir="ltr"><div><blockquote style="margin:0 0 0 40px; border:none; padding:0px"><div style="direction:ltr; font-family:Tahoma; color:#000000; font-size:10pt">- I used the above model to retrain, and I passed through everything except the ab initio gene predictions. I also provided a set of manually annotated genes, many of which have EST evidence.
Is this OK to do? [For protein evidence, I gave a set from related organisms, same as above]<br><br>
- In my third retraining, I used the above retrained model, but this time I only provided the genome_gff but did not pass through any other data. However I did provide the manually annotated genes as EST evidence and related proteins as protein_evidence.
<br></div></blockquote></div></div></span><blockquote style="font-family:Calibri,sans-serif; font-size:14px; margin-top:0px; margin-right:0px; margin-bottom:0px; margin-left:40px; border-top-style:none; border-right-style:none; border-bottom-style:none; border-left-style:none; border-width:initial; border-color:initial; padding-top:0px; padding-right:0px; padding-bottom:0px; padding-left:0px"><div><br></div><div><span class="Apple-style-span" style="font-family:Tahoma; font-size:13px">Can you please give me some advice on which of these could give me the best prediction, or if I can alter something to get a better prediction.</span></div><div><span class="Apple-style-span" style="font-family:Tahoma; font-size:13px"><br></span></div></blockquote><div style="font-family:Calibri,sans-serif; font-size:14px"><span class="Apple-style-span" style="font-family:Tahoma; font-size:13px">Everything you've done sounds reasonable. Better training comes from having the most correct models to train with, so providing
the manual annotations as training data works, or you can also select MAKER models with the lowest AED score (i.e. models that most closely match the evidence). The goal is to make the process as unbiased as possible, so a consistent, usually automated, selection
method is often the easiest to justify.</span></div><span id="OLK_SRC_BODY_SECTION"><div dir="ltr" style="font-family:Calibri,sans-serif; font-size:14px"><div><blockquote style="margin:0 0 0 40px; border:none; padding:0px"><div style="direction:ltr; font-family:Tahoma; color:#000000; font-size:10pt"><br></div></blockquote></div></div></span><div><br></div><span id="OLK_SRC_BODY_SECTION"><div dir="ltr" style="font-family:Calibri,sans-serif; font-size:14px"><div><blockquote style="margin:0 0 0 40px; border:none; padding:0px"><div style="direction:ltr; font-family:Tahoma; color:#000000; font-size:10pt">- A quick question about Augustus - I used an Augustus model (trained for a closely related organism) for ab initio prediction. Does MAKER adjust this model based on the evidence provided,
or use the model as is for prediction?<br></div></blockquote></div></div></span><div><br></div><div>MAKER will provide hints to Augustus during the run to make it perform better. MAKER will report the raw, unaided Augustus results in the GFF3 file as a reference, but will use evidence to improve performance where it can. The gene name will let you know
if it is a hint-based or ab initio model prediction. When 'maker' is part of the gene name, it is hint-based.</div><div><br></div><div>Thanks,</div><div>Carson</div></div></div></div></div></div></span></body></html>