<div dir="ltr"><div>Hi there,<br><br></div>I'm new to working with genomes and have been trying to start using MAKER for annotation. I understand the basic concepts, but I have doubts about the correct steps to follow. I've been through the 2014 video tutorial and searched for detailed steps, and I still have some questions, though they may be a bit obvious...<br><br>I have to annotate two fungal genomes and I only have the DNA assemblies (no EST or protein files). <br><div><div>I understand that, lacking EST and protein files of my own, I should provide them as alt-est and protein evidence from the closest species I can find, but is one EST file from a single organism enough for the alt-est?<br></div><div><br></div><div>Regarding the steps to follow, would this be correct?<br><ol><li>Run MAKER with the genome, alt-est and protein files, with est2genome=1 and protein2genome=1 (softmask=1?)</li><li>Create the HMM file for SNAP from this first output</li><li>Set est2genome=0 and protein2genome=0, set the snaphmm file and run again (using the -base option)<br></li><li>Repeat steps 2 and 3 as necessary*<br></li></ol></div><div>*How do you know when you have reached the point where no more refinement is possible? Would that be the final model? Should this be judged on the AED scores? How can I get them without looking into individual sequence headers? Also, do you perform the bootstrapping in the same folder? In the <a href="http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014">tutorial</a> I saw different folders (e.g. pyu_contig1, pyu_contig2) used for each repetition; I'm not sure whether that is just for demonstration purposes or the proper way to go.<br><br></div><div>I'm also trying to run gene prediction with Augustus and GeneMark. The first run will include an already trained profile for Augustus and the native HMM file of GeneMark-ES**. Do they need the prediction repeated by bootstrapping, as with SNAP? 
If so, do I need to generate new HMM files or prediction models based on the results?<br><br></div><div>**I have been trying to create the HMM file for GeneMark-ES using the gm_es.pl script, but no matter what parameters I use, the cluster shuts the job down because it exceeds 128 GB of memory in use. The genome I've been testing this on is about 42 Mbp, in a FASTA file of roughly 40-50 MB.<br></div><div><br></div><div>Thank you in advance,<br><br></div><div>Xabier<br></div><div><br>-- <br><div>Xabier Vázquez Campos<br><i>PhD Candidate</i><br>Water Research Centre<br>School of Civil and Environmental Engineering<br>
The University of New South Wales<br>Sydney NSW 2052 AUSTRALIA<br></div>
</div></div></div>