[maker-devel] prokaryotic genome annotation

Wed Jan 27 08:17:37 MST 2016

Hi Panos,

The strategy for annotating prokaryotes is very different from that for eukaryotes. Basically my recommendation is to use GeneMarkS and set protein2genome=1, keep_preds=1, and always_complete=1. There is no need for ESTs (they are irrelevant in prokaryotes), and no need to do multiple iterations like you would for eukaryotes either. The bootstrapping procedure is not relevant for prokaryotes.

I’d also avoid using the GFF3 passthrough option: you will lose some information about the alignment that affects the reading frame of the protein evidence. It can be convenient for large eukaryotes when you are pulling evidence from a database, but if it’s just from a previous MAKER run, you should just rerun in the same directory with the protein FASTA. MAKER will detect that it already ran blastx and pull the raw reports from the previous datastore.
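
As a rough sketch only, the recommended settings might look like this in maker_opts.ctl. The option names are the ones discussed in this thread (plus gmhmm, assumed here to be the option that takes the GeneMarkS model file). The two file names are hypothetical placeholders, and the comments are illustrative rather than copied from MAKER's own template.

```
protein=related_proteomes.fasta  # hypothetical name: your protein evidence FASTA
est=                             # left empty: ESTs are not needed for prokaryotes
gmhmm=genemark_model.mod         # hypothetical name: the model file produced by GeneMarkS
protein2genome=1                 # infer gene models directly from protein alignments
keep_preds=1                     # keep predictions even without supporting evidence
always_complete=1                # force start and stop codons on every model
maker_gff=                       # left empty: no GFF3 passthrough
protein_pass=0                   # nothing to pass through from a previous GFF3
```

With the passthrough options left empty, rerunning in the same directory lets MAKER reuse the blastx reports already stored in the datastore, as described above.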

Thanks,
Carson

> On Jan 27, 2016, at 6:14 AM, Panos Sapou <sapuizait at gmail.com> wrote:
> 
> Dear all
> 
> I recently started using maker to annotate my prokaryotic genomes. Although I managed to get some nice results, I would like to check with you whether what I did was right, and also ask a couple of questions about the procedure.
> 
> I also apologize in advance if I ask something silly; I am a newbie in bioinformatics and might ask very basic things.
> 
> 
> I only have DNA sequences available; I have no ESTs and no proteins.
> 
> 1) I started by using the protein2genome option with the UniRef50 database as the reference. Then I generated a merged GFF file (following a procedure similar to the one in the maker tutorial).
> 
> 2) I used GeneMarkS and created a model by running the gmsn.pl command with the assembled contigs of my bacteria as input.
> 
> 3) After finishing the two steps above, I ran maker again, using the GFF file from step 1 as input: #-------Re-annotation using maker derived GFF3: maker_gff=input.gff
> and I also set
> protein_pass=1
> Is that correct? Do you think it helps?
> In the #-----gene prediction section I used the hmm.mod file generated in step 2.
> 
> My questions:
> Does the above sound correct?
> 
> It is my understanding that I can only use GeneMark for prokaryotic genomes. Is that correct?
> 
> When I run maker the second time (step 3), should I set protein2genome=1 or 0? Or is having the GFF file (from step 1) in the re-annotation options enough, because the protein2genome-based prediction has already been done?
> 
> Also, if I use a GFF file (from step 1), will it make any difference if I set protein2genome=1 and use an extra (different) database? (I was wondering whether it would improve the results.)
> 
> Finally, regarding the choice of database: would you advise me to use UniRef or the proteomes of closely related bacteria? (I have downloaded approximately 100 proteomes of closely related bacteria and combined them into a single FASTA file.)
> 
> Thank you in advance,
> and once again I apologize if what I am asking is pretty basic; I just wanted to make sure...
> 
> 
> Best
> Panos
> 