[maker-devel] Combining and merging two Maker annotation gff files ?

Xabier Vázquez-Campos xvazquezc at gmail.com
Sun Oct 23 17:49:53 MDT 2016


If it's of any help, I had these notes in my old protocol (before I started
doing the training with BUSCO):

> For Augustus, we need the script "zff2augustus_gbk.pl". It takes the
> export.dna generated by fathom and produces a *.gb file that is used as
> the "training gene structure file" in a new training submission in
> WebAugustus. Remember to give the submission a new name, e.g.
> MYGENOME_v2; otherwise Maker won't see the difference (same name):
>     perl PATH/TO/SCRIPT/zff2augustus_gbk.pl > MYGENOME.train.gb
>
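
Before uploading the converted file, a quick sanity check can catch an empty or truncated conversion. The following is a minimal sketch in Python, assuming the converter writes standard GenBank flat-file records (each starting with a LOCUS line and ending with a "//" line); the function name is hypothetical and not part of any MAKER or Augustus tool.

```python
def count_genbank_records(lines):
    """Count GenBank records in an iterable of text lines.

    Returns (number of LOCUS lines, number of '//' terminator lines).
    In a complete flat file the two numbers should match.
    """
    loci = 0
    terminators = 0
    for line in lines:
        if line.startswith("LOCUS"):
            loci += 1
        elif line.strip() == "//":
            terminators += 1
    return loci, terminators


# Example use on the file from the command above:
#   with open("MYGENOME.train.gb") as handle:
#       print(count_genbank_records(handle))
```

If the two counts differ, or both are zero, the conversion probably did not complete and the file is not worth submitting for training.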

As mentioned, you could also do the training with BUSCO using the --long
option. BUSCO has an arthropod-specific dataset. But if you have EST data,
you'll probably do better with the other method (WebAugustus), since it
lets you supply the ESTs for more accurate training.

On 24 October 2016 at 10:25, Carson Holt <carsonhh at gmail.com> wrote:

> It’s unfortunate the archived GMOD post is gone, because I always used it
> for my own reference. If I remember right, the main point was that Jason
> Stajich wrote a tool to convert Snap’s ZFF format to a Genbank format
> suitable for Augustus training. This meant you could use the maker2zff
> script that came with MAKER, then use Jason’s tool to convert for Augustus
> training.
>
> Tool to convert SNAP training ZFF to an Augustus training input file:
> https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl
>
>
> Since the post is gone, you could use the documentation provided with his
> tool, and then perhaps a generic Augustus training guide like the
> following, to design a path forward:
> http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html
>
> —Carson
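
For the filtering step asked about in this thread (choosing well-supported gene models from the first MAKER run as a training set), one possible starting point is to select mRNA features by their AED score. The following Python sketch assumes the MAKER GFF3 carries an `_AED` attribute on mRNA lines; the 0.25 cutoff and the function names are illustrative assumptions, and the sketch does not check that a model has a complete ORF.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def select_mrna_ids(gff_lines, max_aed=0.25):
    """Return IDs of MAKER mRNA features whose _AED is <= max_aed.

    max_aed is a hypothetical cutoff; lower AED means better agreement
    between the gene model and its supporting evidence.
    """
    kept = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the embedded sequence section follows; stop parsing
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[1] != "maker" or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        if "_AED" in attrs and float(attrs["_AED"]) <= max_aed:
            kept.append(attrs.get("ID", ""))
    return kept
```

The returned IDs could then be used to subset the GFF before running the ZFF and GenBank conversion described above.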
>
>
> On Oct 12, 2016, at 3:44 AM, chebbi mohamed amine <
> mohamed.amine.chebbi at univ-poitiers.fr> wrote:
>
> Thank you, Carson, for your quick response. Sorry, I have another question
> concerning Augustus training. You previously posted on the mailing list a
> link to an explanation of the Augustus training steps:
> http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html
> Unfortunately, the link doesn't work anymore. Could you instead explain
> how to filter the GFF file produced by the first run of Maker to get the
> best full-length ORFs as a set of gene models for training Augustus?
>
> Best,
> Amine
>
> ------------------------------
> *From: *"Carson Holt" <carsonhh at gmail.com>
> *To: *"Mohamed Amine CHEBBI" <mohamed.amine.chebbi at univ-poitiers.fr>
> *Cc: *maker-devel at yandell-lab.org
> *Sent: *Tuesday 11 October 2016 22:05:50
> *Subject: *Re: [maker-devel] Combining and merging two Maker annotation gff
> files ?
>
> Masking doesn’t just affect the gene models, but also evidence alignment
> and thus scoring. So merging in this way would not make much sense: the
> second, less masked set would always score better because the reduced
> masking permits more evidence alignments (not necessarily real ones; many
> are drawn in by repeats).
>
> The result is that any attempt at a merge would almost exclusively select
> genes from the second set, since they would always score higher.
>
> —Carson
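
The effect described above can be checked directly by counting features per source in each of the two MAKER GFF3 files. This is a minimal Python sketch, assuming standard nine-column GFF3 with the producing program in column 2 (for MAKER evidence alignments, sources such as est2genome and protein2genome); the function name is hypothetical.

```python
from collections import Counter


def count_features_by_source(gff_lines):
    """Count GFF3 feature lines per source (column 2)."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the embedded sequence section follows; stop parsing
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[cols[1]] += 1
    return counts


# Example use with the two files named in this thread:
#   with open("1-mask-all.gff") as a, open("2-mask-onlyKnown.gff") as b:
#       all_masked = count_features_by_source(a)
#       known_only = count_features_by_source(b)
#   for source in sorted(set(all_masked) | set(known_only)):
#       print(source, all_masked[source], known_only[source])
```

If the less masked run shows markedly more est2genome or protein2genome features, that is consistent with extra alignments being drawn in by unmasked repeats rather than by real genes.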
>
>
>
> On Oct 10, 2016, at 3:43 AM, Mohamed Amine CHEBBI <
> mohamed.amine.chebbi at univ-poitiers.fr> wrote:
>
> Hi!
>
> I’m using the latest version of Maker2 to annotate an arthropod genome.
> First, I ran RepeatModeler to create the rmlib for Maker, then I
> followed two independent annotation strategies on the same assembly:
> 1- Passing through Maker all the repeats collected by RepeatModeler
> (repeats identified in Repbase + unknown models).
> 2- Passing through Maker only the identified repeats.
>
> Both annotations ran successfully. The first annotation gives me 19048
> genes versus 22931 for the second one. Now I'm looking for a way to
> merge the two annotation GFF files without re-annotating, keeping
> the best-supported, non-redundant gene models.
>
> So, do you think that configuring the Maker options as below could
> resolve this issue:
> maker_gff=1-mask-all.gff,2-mask-onlyKnown.gff #MAKER derived GFF3 file
> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no
>
> --
> Mohamed Amine CHEBBI, PhD Student
> Université de Poitiers
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA