[maker-devel] few basic questions

Mon Dec 29 21:00:56 MST 2014

Hi there,

I'm a newbie dealing with genomes and I've been trying to start using Maker
for annotation. I understand the basic concepts but I have doubts about
the correct steps to follow. I've been through the 2014 video tutorial and
searched for detailed steps, and I still have some questions, maybe a bit
obvious though...

I have to annotate two fungal genomes and I only have the DNA assembly (no
EST or protein files).
I understand that, lacking EST and protein files of my own, I should provide
them as alt-est and protein from the closest species I can, but is one EST
file from a single organism enough for the alt-est?

Regarding the steps to follow, would this be correct?

   1. run Maker with the genome, alt-est and protein files, with
   est2genome=1 and protein2genome=1 (softmask=1 ?)
   2. create the hmm file for SNAP based on the gene models from this
   first output
   3. Set est2genome=0 and protein2genome=0, set the snaphmm file and run
   again (using -base option)
   4. repeat 2 and 3 as necessary*
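
The switch between steps 1 and 3 amounts to changing a few values in the
control file. A minimal Python sketch of that edit, assuming the control
file uses `key=value #comment` lines; the helper name set_ctl_options and
the file name round1.hmm are hypothetical, only the option names come from
the steps above:

```python
import re

# Matches "key=value #optional comment" lines; section banners starting
# with "#" do not match because the key must start the line.
OPTION_RE = re.compile(r"^(\w+)=([^#]*)(#.*)?$")


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with every option named in `updates` rewritten.

    Trailing "#comment" text on each line is kept. Raises KeyError if an
    option is not present, so a misspelled name is not silently ignored.
    """
    out = []
    seen = set()
    for line in ctl_text.splitlines():
        m = OPTION_RE.match(line)
        if m and m.group(1) in updates:
            key = m.group(1)
            comment = m.group(3) or ""
            new_line = f"{key}={updates[key]}"
            if comment:
                new_line += " " + comment
            out.append(new_line)
            seen.add(key)
        else:
            out.append(line)
    missing = set(updates) - seen
    if missing:
        raise KeyError(f"options not found: {sorted(missing)}")
    return "\n".join(out) + "\n"


# Step 1: evidence-based models only (no ab initio predictor yet).
ROUND1 = {"est2genome": 1, "protein2genome": 1}

# Step 3: turn the direct inference off and point SNAP at the trained
# HMM. "round1.hmm" is a placeholder name, not a file MAKER creates.
ROUND2 = {"est2genome": 0, "protein2genome": 0, "snaphmm": "round1.hmm"}
```

Reading the control file, passing its text with ROUND2 to
set_ctl_options, and writing the result back would prepare the second
run without hand-editing.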

*How do you know when you get to the point where no more refinement is
possible? Would that be the final model? Should it be based on the AED
scores? How can I get them without looking into individual sequence
headers? Also, do you perform the bootstrapping in the same folder? In the
tutorial
<http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014>
I saw different folders (e.g. pyu_contig1, pyu_contig2) used for each
repetition; I'm not sure if that is just for demonstration purposes or if
it is the proper way to go.
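
One way to look at AED scores in bulk rather than per sequence is to pull
them out of the GFF3 output. A rough Python sketch, assuming the score is
stored as an `_AED=` attribute on mRNA features in column 9; the function
names and thresholds are illustrative choices, not part of Maker:

```python
import re
from bisect import bisect_right

# "_AED=0.12" as its own attribute; "_eAED=" is deliberately not matched.
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_values(gff_lines):
    """Collect the _AED value of every mRNA feature in a GFF3 stream."""
    values = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows, no more features
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        m = AED_RE.search(cols[8])
        if m:
            values.append(float(m.group(1)))
    return values


def cumulative_fraction(values, thresholds=(0.25, 0.5, 0.75, 1.0)):
    """Fraction of transcripts with AED <= each threshold."""
    if not values:
        return {}
    ordered = sorted(values)
    n = len(ordered)
    return {t: bisect_right(ordered, t) / n for t in thresholds}
```

Comparing the output of cumulative_fraction between two rounds would show
whether the share of low-AED transcripts is still growing; where to stop
remains a judgment call.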

I'm also trying to run gene prediction with Augustus and GeneMark. The
first run will include an already trained profile for Augustus and the
native hmm file of genemark-ES**. Do they need repeated rounds of
prediction by bootstrapping, as with SNAP? If so, do I need to generate new
hmm files or prediction models based on the results?

**I have been trying to make the hmm file for genemark-ES using the gm_es.pl
script, but no matter what parameters I use the cluster shuts the job down
because it exceeds 128GB of memory in use. The genome I've been testing this
on is about 42Mbp, in a roughly 40-50 MB fasta file.

Thank you in advance,

Xabier

-- 
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA