[maker-devel] Combining and merging two Maker annotation gff files ?
Jason Stajich
jason.stajich at gmail.com
Wed Oct 26 19:26:26 MDT 2016
Yes, thanks for re-sharing.
Maybe we should write this up as a clearer tutorial; I go back and forth
on how to make this easier and more automated.
Jason
On Sunday, October 23, 2016, Xabier Vázquez-Campos <xvazquezc at gmail.com>
wrote:
> If it's of any help, I had these notes in my old protocol (before I started
> doing the training with BUSCO):
>
> For Augustus, we need the script "zff2augustus_gbk.pl". It takes
>> the export.dna generated by fathom and generates a *.gb file to be
>> used as the "training gene structure file" in a new training submission in
>> WebAugustus. Remember to give the submission a new name, e.g.
>> MYGENOME_v2, or Maker won't see the difference (same name):
>> perl PATH/TO/SCRIPT/zff2augustus_gbk.pl > MYGENOME.train.gb
>>
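For readers wondering what the ZFF side of that conversion involves, here is a minimal Python sketch of the first step only: grouping fathom's ZFF features into gene models and computing a flanked region per gene. It is not Jason's script. The column layouts are assumptions (a 4-column short form where strand is implied by begin > end, and a 9-column long form with an explicit strand in column 4), and the `flank` parameter is hypothetical.

```python
from collections import defaultdict


def parse_zff(text):
    """Group ZFF feature lines into gene models keyed by (sequence, group)."""
    genes = defaultdict(list)
    seq = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the sequence (scaffold) the following features belong to.
            seq = line[1:].split()[0]
            continue
        fields = line.split()
        label, begin, end = fields[0], int(fields[1]), int(fields[2])
        if len(fields) == 4:
            # Assumed short form: label begin end group; minus strand if begin > end.
            group = fields[3]
            strand = "-" if begin > end else "+"
        else:
            # Assumed long form: explicit strand in column 4, group in the last column.
            group = fields[-1]
            strand = fields[3]
        genes[(seq, group)].append((label, min(begin, end), max(begin, end), strand))
    return genes


def gene_regions(genes, flank=1000):
    """Return sorted (seq, group, start, end, strand) spans padded by `flank` bp.

    The end is not clipped to the sequence length, which this sketch does not know.
    """
    regions = []
    for (seq, group), feats in genes.items():
        start = max(1, min(f[1] for f in feats) - flank)
        end = max(f[2] for f in feats) + flank
        regions.append((seq, group, start, end, feats[0][3]))
    return sorted(regions)
```

For example, a two-exon plus-strand model at 201-1578 with `flank=100` yields the region 101-1678; a real converter would then extract that sequence and write it out as a GenBank record.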
>
> As mentioned, you could also do the training with BUSCO using the --long
> option; it has an arthropod-specific dataset. But if you have EST data,
> you'll probably do better with the other method, as it lets you supply the
> ESTs for more accurate training.
>
> On 24 October 2016 at 10:25, Carson Holt <carsonhh at gmail.com> wrote:
>
>> It’s unfortunate the archived GMOD post is gone, because I always used it
>> for my own reference. If I remember right, the main point was that Jason
>> Stajich wrote a tool to convert SNAP’s ZFF format to a GenBank format
>> suitable for Augustus training. This meant you could use the maker2zff
>> script that came with MAKER, then use Jason’s tool to convert the result
>> for Augustus training.
>>
>> Tool to convert SNAP training ZFF to an Augustus training input file:
>> https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl
>>
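The route Carson describes (MAKER GFF3 to SNAP ZFF to Augustus GenBank) can be written down as a short list of commands. Below is a minimal Python sketch that only builds the command strings and executes nothing. It assumes the commonly used SNAP training recipe (maker2zff, then fathom `-categorize` and `-export ... -plus` with 1000 bp of flank), a hypothetical merged MAKER GFF3 named `genome.all.gff`, and that the converter picks up fathom's export files from the working directory, as in Xabier's invocation. Check each option against the tools' own help before running anything.

```python
import shlex


def training_commands(maker_gff, script="zff2augustus_gbk.pl", out_gb="MYGENOME.train.gb"):
    """Return shell commands for MAKER GFF3 -> SNAP ZFF -> Augustus GenBank.

    Nothing is executed; the list is meant to be reviewed and run by hand.
    """
    return [
        # maker2zff writes genome.ann / genome.dna from the MAKER GFF3.
        f"maker2zff {shlex.quote(maker_gff)}",
        # fathom categorizes models; uni.* holds the unique, error-free genes.
        "fathom genome.ann genome.dna -categorize 1000",
        # Export those genes with 1000 bp of flank as export.ann / export.dna.
        "fathom uni.ann uni.dna -export 1000 -plus",
        # Convert fathom's export files into an Augustus training GenBank file.
        f"perl {shlex.quote(script)} > {shlex.quote(out_gb)}",
    ]
```

Usage: `for cmd in training_commands("genome.all.gff"): print(cmd)` prints the four steps in order, ending with the same `perl ... > MYGENOME.train.gb` call as in the notes above.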
>>
>> Since the post is gone, you could use the documentation provided with
>> his tool, and then perhaps a generic Augustus training guide like the
>> following, to design a path forward:
>> http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html
>>
>> —Carson
>>
>>
>> On Oct 12, 2016, at 3:44 AM, chebbi mohamed amine <
>> mohamed.amine.chebbi at univ-poitiers.fr> wrote:
>>
>> Thank you, Carson, for your quick response. Sorry, I have another question
>> concerning Augustus training. You previously posted on the mailing list a
>> link to an explanation of the Augustus training steps
>> (http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html), but
>> unfortunately the link doesn't work anymore. Could you instead explain
>> how to filter the GFF file produced by the first Maker run to get the best
>> full-length ORFs as a set of gene models for training Augustus?
>>
>> Best,
>> Amine
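The thread never spells out the filtering step Amine asks about. One common starting point is MAKER's per-transcript `_AED` attribute (annotation edit distance; lower means closer agreement with the evidence). Here is a minimal Python sketch that keeps mRNA IDs at or below a hypothetical `max_aed` cutoff. It is an illustration, not the procedure from the lost GMOD post, and a low AED alone does not guarantee a full-length ORF: start and stop codons would still need to be checked separately.

```python
def low_aed_mrnas(gff_lines, max_aed=0.25):
    """Yield IDs of mRNA features whose _AED attribute is <= max_aed.

    `max_aed` is a hypothetical threshold; choose it after inspecting
    the AED distribution of your own annotation.
    """
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # also skips sequence lines after a ##FASTA directive
        # Column 9 holds key=value pairs separated by ';'.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        aed = attrs.get("_AED")
        if aed is not None and float(aed) <= max_aed:
            yield attrs.get("ID")
```

Usage: `with open("genome.all.gff") as fh: keep = set(low_aed_mrnas(fh))` (file name hypothetical) gives the transcript IDs to carry forward into the ZFF export.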
>>
>> ------------------------------
>> *From: *"chebbi mohamed amine" <mohamed.amine.chebbi at univ-poitiers.fr>
>> *To: *"Carson Holt" <carsonhh at gmail.com>
>> *Cc: *maker-devel at yandell-lab.org
>> *Sent: *Wednesday, 12 October 2016 11:44:21
>> *Subject: *Re: [maker-devel] Combining and merging two Maker annotation
>> gff files ?
>>
>> ------------------------------
>> *From: *"Carson Holt" <carsonhh at gmail.com>
>> *To: *"Mohamed Amine CHEBBI" <mohamed.amine.chebbi at univ-poitiers.fr>
>> *Cc: *maker-devel at yandell-lab.org
>> *Sent: *Tuesday, 11 October 2016 22:05:50
>> *Subject: *Re: [maker-devel] Combining and merging two Maker annotation
>> gff files ?
>>
>> Masking doesn’t just affect the gene models; it also affects evidence
>> alignment and thus scoring. So merging in this way would not make much
>> sense: the second, less-masked set would always score better, because the
>> lack of masking permits more evidence alignments (not necessarily real
>> ones, but ones drawn in by repeats).
>>
>> As a result, any attempted merge would end up taking almost exclusively
>> genes from the second set, since they would nearly always score higher.
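Carson's argument can be made concrete with a toy model. Suppose each locus is scored only by how many evidence alignments support it, and the less-masked run sees the same real evidence plus some repeat-driven alignments. A "keep the higher score" merge then can never prefer the first set except on ties. This is a deliberately simplified Python sketch with hypothetical counts; it is not MAKER's scoring.

```python
def naive_merge(set1, set2):
    """For each locus in both sets, keep the model with more evidence alignments.

    Scores are plain evidence counts (a toy stand-in for MAKER's scoring);
    ties are resolved in favour of set1.
    """
    return {
        locus: "set1" if set1[locus] >= set2[locus] else "set2"
        for locus in set1.keys() & set2.keys()
    }


# Hypothetical evidence counts per locus under stricter masking (run 1).
real = {"locusA": 5, "locusB": 3, "locusC": 8}
# Extra alignments drawn in by unmasked repeats in the less-masked run 2.
repeat_extra = {"locusA": 2, "locusB": 0, "locusC": 1}
set2 = {locus: real[locus] + repeat_extra[locus] for locus in real}

picked = naive_merge(real, set2)
```

Here `picked` takes run 2 at every locus where repeats added any alignment and run 1 only at the tie, because unmasking can only add alignments, never remove them.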
>>
>> —Carson
>>
>>
>>
>> On Oct 10, 2016, at 3:43 AM, Mohamed Amine CHEBBI <
>> mohamed.amine.chebbi at univ-poitiers.fr> wrote:
>>
>> Hi!
>>
>> I’m using the latest version of Maker2 to annotate an arthropod genome.
>> First, I ran RepeatModeler to create an rmlib for Maker, then I
>> followed two independent annotation strategies on the same assembly:
>> 1- Passing through Maker all the repeats collected by RepeatModeler
>> (repeats identified in Repbase + Unknown models).
>> 2- Passing through Maker only the identified repeats.
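Strategy 2 needs a library with the unknown families removed. A minimal Python sketch of that split follows, assuming RepeatModeler-style classified headers where the class follows a `#` in the sequence name (e.g. `rnd-1_family-1#LINE/L2`, `...#Unknown`). Treating a name with no `#` as unknown is a choice made for this sketch. The known records would be written back out as FASTA and given to Maker via the `rmlib` option.

```python
def split_repeat_library(fasta_text):
    """Return (all_records, known_records) as lists of (header, sequence).

    A record counts as 'known' unless its class tag (the text after '#'
    in the sequence name) starts with 'Unknown'. Names without '#' are
    treated as unknown, an assumption of this sketch.
    """
    records, header, seq = [], None, []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif line.strip():
            seq.append(line.strip())
    if header is not None:
        records.append((header, "".join(seq)))

    def repeat_class(h):
        name = h.split()[0]
        return name.split("#", 1)[1] if "#" in name else "Unknown"

    known = [r for r in records if not repeat_class(r[0]).startswith("Unknown")]
    return records, known
```

Writing `">" + header + "\n" + sequence` for each known record gives the reduced library for the second run.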
>>
>> Both annotations ran successfully. The first annotation gives me 19048
>> genes, against 22931 from the second. Now I'm looking for a way to
>> merge the two annotation GFF files without re-running the annotation,
>> keeping the best-supported, non-redundant gene models.
>>
>> So, do you think that configuring the Maker options as below could
>> resolve this issue?
>> maker_gff=1-mask-all.gff,2-mask-onlyKnown.gff #MAKER derived GFF3 file
>> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
>> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
>> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
>> rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
>> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
>> pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
>> other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no
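Reading those settings as data makes explicit what the proposal would reuse from the two GFFs: ESTs, protein alignments, repeats and ab initio predictions, but not the gene models themselves (`model_pass=0`). Below is a minimal Python sketch of a parser for this `key=value #comment` style. It assumes values never contain `#`; it is not MAKER's own parser.

```python
def parse_ctl(text):
    """Parse 'key=value #comment' control lines into a dict of strings.

    Everything after the first '#' is treated as a comment (assumption:
    values themselves never contain '#').
    """
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def reused_evidence(opts):
    """Return the sorted *_pass options set to '1' (taken from maker_gff)."""
    return sorted(k for k, v in opts.items() if k.endswith("_pass") and v == "1")
```

Applied to the settings above, `reused_evidence` returns `est_pass`, `pred_pass`, `protein_pass` and `rm_pass`, and `maker_gff` splits into the two GFF file names.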
>>
>> --
>> Mohamed Amine CHEBBI, PhD Student
>> Université de Poitiers
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
>
> --
> Xabier Vázquez-Campos, *PhD*
> *Research Associate*
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA
>
--
Jason Stajich
jason.stajich at gmail.com