[maker-devel] Problem training augustus

Carson Holt carsonhh at gmail.com
Thu May 30 12:09:20 MDT 2019


If you are looking to submit to GenBank, tools like GAL are a good option (https://github.com/The-Sequence-Ontology/GAL).

Fields like the accession, which you listed below, only exist inside GenBank (i.e., they are assigned after you submit). They are not produced by pipelines or format converters.

—Carson



> On May 21, 2019, at 6:11 AM, djerroud samia <djerroudsamia at gmail.com> wrote:
> 
> Hello, thank you for sharing. As I said, the GenBank format is quite important for me. What I really need is to get all of this different information:
> 
> CDS: accession, protein product, protein sequence, start, end, locus_tag, gene, ...
> My problem is that I don't know which pipeline I should follow to get all of this information.
> 
> Thank you, Samia
> 
> On Tue, May 21, 2019 at 07:25, p sz <seoanezonjic at hotmail.com> wrote:
> Hi Xabier,
> I've switched from zff2genbank.pl to zff2augustus_gbk.pl and the problem is fixed. I used zff2genbank.pl because it is packaged in the MAKER suite. I think the MAKER authors should include zff2augustus_gbk.pl in the main suite; I didn't know about this genome-scripts repository.
> By the way, I would like to show you my AUGUSTUS training steps so you can tell me whether they are correct:
> 
> zff2augustus_gbk.pl export.ann export.dna > final_genes.gb
> randomSplit.pl final_genes.gb 500
> new_species.pl --species=Demo
> etraining --species=Demo final_genes.gb
> optimize_augustus.pl --species=Demo --onlytrain=final_genes.gb.train final_genes.gb.test
> etraining --species=Demo final_genes.gb
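
For reference, the randomSplit.pl step above divides the GenBank training file into a held-out .test file containing the requested number of loci (here 500) and a .train file containing the remaining loci. Below is a minimal Python sketch of that split, assuming each record ends with a line containing only `//`. `random_split` is a hypothetical name, and this illustrates the idea rather than reproducing the AUGUSTUS script:

```python
from __future__ import annotations

import random


def random_split(gb_text: str, n_test: int, seed: int = 0) -> tuple[list[str], list[str]]:
    """Split GenBank flat-file text into (train, test) lists of records.

    Assumes every record is terminated by a line containing only '//'.
    Any trailing text after the last '//' is ignored.
    """
    records: list[str] = []
    current: list[str] = []
    for line in gb_text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":  # end of one LOCUS record
            records.append("".join(current))
            current = []
    if n_test > len(records):
        raise ValueError("n_test exceeds the number of records")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test_idx = set(rng.sample(range(len(records)), n_test))
    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test
```

Writing `train` and `test` out as final_genes.gb.train and final_genes.gb.test would mirror the file names used in the optimize_augustus.pl command.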
> 
> I took the training parameters (except for the 500 parameter) from the train_augustus.pl script included in the MAKER suite.
> Thank you in advance
> Pedro Seoane
> From: Xabier Vázquez-Campos <xvazquezc at gmail.com>
> Sent: Thursday, May 16, 2019 4:42
> To: p sz
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Fwd: Problem training augustus
>  
> Hi Pedro,
> I checked some of my files and there is no issue with models in reverse order, and the generated .gb files look fine.
> You need to use zff2augustus_gbk.pl, not zff2genbank.pl. I don't remember the differences, but I know that zff2augustus_gbk.pl works for sure.
> 
> Link: https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl
> Cheers,
> Xabi
> 
> On Tue, 14 May 2019 at 18:12, p sz <seoanezonjic at hotmail.com> wrote:
> Hi MAKER authors,
> I have been using MAKER for many years, and recently I tried to train AUGUSTUS using the SNAP training files. To do this, I used the train_augustus.pl script as follows:
> 
> zff2genbank.pl export.ann export.dna > final_genes.gb
> train_augustus.pl final_genes.gb MyOrg
> 
> For each of my gene models the error is the following:
> 
> Constructing GenBank feature: Feature begins after it ends: 1006..1001,2051..1917,7791..7689,7993..7880,8485..8374,8775..8628,9050..8873,10467..10459,13315..11920,13598..13511,14971..14945,18637..18471,18898..18821,20558..20389,21067..20923,23249..23004,23549..23354,23647..23624
> GBProcessor::getGeneList(): GBFeature constructor:Format error when reading genbank format.
> Encountered error after reading 0 annotations.
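
The message means that at least one `start..end` range in the CDS location has a start greater than its end, which the GenBank location syntax does not permit (the strand is meant to be given with complement() instead). Below is a minimal Python sketch for listing the offending ranges, assuming plain `a..b` ranges without `<`/`>` partial markers. `inverted_ranges` is a hypothetical helper, not part of MAKER or AUGUSTUS:

```python
from __future__ import annotations

import re

# Matches simple "start..end" ranges; partial markers like <1..>200 are not handled.
RANGE_RE = re.compile(r"(\d+)\.\.(\d+)")


def inverted_ranges(location: str) -> list[tuple[int, int]]:
    """Return every a..b range in a GenBank location string where a > b."""
    bad = []
    for m in RANGE_RE.finditer(location):
        start, end = int(m.group(1)), int(m.group(2))
        if start > end:
            bad.append((start, end))
    return bad


print(inverted_ranges("complement(join(1006..1001,2051..1917,7791..7689))"))
# [(1006, 1001), (2051, 1917), (7791, 7689)]
```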
> 
> The export files were generated with SNAP as described in your reference guides (two MAKER rounds). The issue seems related to the strand (orientation) of the gene model, which can be inspected here:
> (export.ann file)
> >MODEL236
> Eterm   23624   23647   MODEL236
> Exon    23354   23549   MODEL236
> Exon    23004   23249   MODEL236
> Exon    20923   21067   MODEL236
> Exon    20389   20558   MODEL236
> Exon    18821   18898   MODEL236
> Exon    18471   18637   MODEL236
> Exon    14945   14971   MODEL236
> Exon    13511   13598   MODEL236
> Exon    11920   13315   MODEL236
> Exon    10459   10467   MODEL236
> Exon    8873    9050    MODEL236
> Exon    8628    8775    MODEL236
> Exon    8374    8485    MODEL236
> Exon    7880    7993    MODEL236
> Exon    7689    7791    MODEL236
> Exon    1917    2051    MODEL236
> Einit   1001    1006    MODEL236
> 
> (GenBank file)
> LOCUS       MODEL236               24647 bp    dna     linear   UNK
> ACCESSION   unknown
> FEATURES             Location/Qualifiers
>      source          1..24647
>      CDS             complement(join(1006..1001,2051..1917,7791..7689,
>                      7993..7880,8485..8374,8775..8628,9050..8873,10467..10459,
>                      13315..11920,13598..13511,14971..14945,18637..18471,
>                      18898..18821,20558..20389,21067..20923,23249..23004,
>                      23549..23354,23647..23624))
> 
> It seems that AUGUSTUS needs the coordinates of the gene model written in the forward direction in order to read the .gb file and perform the training. How could I fix this problem?
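
In the GenBank feature table, each range is written low..high and a minus-strand feature is expressed by wrapping the whole location in complement(). A correct file would therefore list the same ranges in ascending order inside complement(join(...)). Below is a minimal Python sketch that builds such a location string from exon coordinates. `genbank_cds_location` is a hypothetical name. The strand is passed in explicitly rather than inferred from the ZFF file, and the usage line reuses the complement() strand that zff2genbank.pl chose. It only illustrates the location rule and does not replace zff2augustus_gbk.pl:

```python
from __future__ import annotations


def genbank_cds_location(exons: list[tuple[int, int]], minus_strand: bool) -> str:
    """Build a GenBank CDS location string from exon intervals.

    Each interval may be given in either order (e.g. 1006, 1001); it is
    normalized to low..high and the intervals are sorted by position.
    Strand is expressed only through complement(), never by reversing ranges.
    """
    ranges = sorted((min(a, b), max(a, b)) for a, b in exons)
    parts = ",".join(f"{lo}..{hi}" for lo, hi in ranges)
    loc = f"join({parts})" if len(ranges) > 1 else parts
    return f"complement({loc})" if minus_strand else loc


# First three (reversed) ranges from the file above, kept on the complement strand.
print(genbank_cds_location([(1006, 1001), (2051, 1917), (7791, 7689)], minus_strand=True))
# complement(join(1001..1006,1917..2051,7689..7791))
```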
> Thank you in advance
> Pedro Seoane
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 
> -- 
> Xabier Vázquez-Campos, PhD
> Research Associate
> NSW Systems Biology Initiative
> School of Biotechnology and Biomolecular Sciences
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA


