[maker-devel] prokaryotic genome annotation

Panos Sapou sapuizait at gmail.com
Fri Jan 29 03:12:35 MST 2016


Dear all

I am trying to annotate a new Spiroplasma strain, and I would like to know
whether there is a way to change the stop codons (i.e. not treat 'TGA' as a
stop codon). Otherwise I get too many premature stop codons and fragmented
genes that are not real.
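
To illustrate the problem, here is a minimal Python sketch. It assumes the
strain uses NCBI translation table 4 (the Mycoplasma/Spiroplasma code, in
which TGA encodes tryptophan instead of stop), whereas gene finders usually
default to table 11 for bacteria. The function name and the toy sequence are
hypothetical and are not part of MAKER or GeneMark.

```python
# Sketch only: split one reading frame at stop codons under two assumptions.
STANDARD_STOPS = {"TAA", "TAG", "TGA"}  # table 11: TGA terminates translation
TABLE4_STOPS = {"TAA", "TAG"}           # table 4: TGA is read as tryptophan


def split_at_stops(seq, stops):
    """Return the codon runs between in-frame stop codons (frame 0 only)."""
    seq = seq.upper()
    segments, current = [], []
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon in stops:
            if current:
                segments.append("".join(current))
            current = []
        else:
            current.append(codon)
    if current:
        segments.append("".join(current))
    return segments


# Hypothetical toy gene with two in-frame TGA codons before the real stop.
toy = "ATGAAATGAGGTTGACCCTAA"
fragments_table11 = split_at_stops(toy, STANDARD_STOPS)
fragments_table4 = split_at_stops(toy, TABLE4_STOPS)
```

With TGA counted as a stop, the toy gene breaks into three pieces; with
TGA read as a sense codon, it stays in one piece, which is the behaviour I
am hoping to get from the annotation.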

Best
Panos

On 27 January 2016 at 14:14, Panos Sapou <sapuizait at gmail.com> wrote:

> Dear all
>
> I recently started using MAKER to annotate my prokaryotic genomes, and
> although I managed to get some nice results, I would like to check with
> you whether what I did was right and also ask a couple of questions about
> the procedure.
>
> I also apologize in advance if I ask something silly; I am a newbie in
> bioinformatics and some of my questions may be very basic.
>
>
> I only have DNA sequences available; I have no ESTs and no proteins.
>
> 1) I started by using the protein2genome option with the UniRef50
> database as the reference. Then I generated a merged GFF file (following
> a procedure similar to the one in the MAKER tutorial).
>
> 2) I used GeneMarkS and created a model with the gmsn.pl command, using
> the assembled contigs of my bacterium as input.
>
> 3) After finishing the two steps above, I ran MAKER again, using the GFF
> file from step 1 as input under
> #-------Re-annotation using maker derived GFF3:
> maker_gff=input.gff
> and I also set
> protein_pass=1
> Is that correct? Do you think it helps?
> In the #-----gene prediction section I used the hmm.mod file generated in
> step 2.
>
> My questions:
> Does the above sound correct?
>
> It is my understanding that GeneMark is the only gene predictor I can use
> for prokaryotic genomes. Is that correct?
>
> When I run MAKER the second time (step 3), should I set protein2genome=1
> or 0? Or is having the GFF file (from step 1) in the re-annotation options
> enough, because the protein2genome-based prediction has already been done?
>
> Also, if I use a GFF file (from step 1), will it make any difference if I
> set protein2genome=1 and use an additional (different) database? (I was
> wondering whether that would improve the results.)
>
> Finally, regarding the choice of database: would you advise me to use
> UniRef or the proteomes of closely related bacteria? (I have downloaded
> approximately 100 proteomes of closely related bacteria and combined them
> into a single FASTA file.)
>
> Thank you in advance, and once again I apologize if my questions are
> pretty basic; I just wanted to make sure.
>
>
> Best
> Panos
>
>