[maker-devel] Differences in non_overlapping protein file between runs

Carson Holt carsonhh at gmail.com
Thu Mar 9 10:52:30 MST 2017


My guess is that there is an issue with the GFF3 file you supplied, so its features are not overlapping anything.
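
A quick sanity check along these lines can be sketched in Python. This is only an illustration, not part of MAKER: the function names `fasta_lengths` and `check_gff3` are made up for this example, and it assumes that "an issue with the GFF3 file" means one of the common causes (sequence IDs in column 1 that do not match the genome FASTA headers, malformed lines, or coordinates that fall outside the sequence). It does not test overlap with the evidence alignments themselves.

```python
def fasta_lengths(lines):
    """Return {sequence_id: length} from an iterable of FASTA lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the header text up to the first whitespace.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def check_gff3(gff_lines, lengths):
    """Return (line_number, problem) tuples for suspicious GFF3 features.

    `lengths` is the dict produced by fasta_lengths() for the genome
    the GFF3 file is supposed to describe.
    """
    problems = []
    for n, line in enumerate(gff_lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append((n, "expected 9 tab-separated columns"))
            continue
        seqid, start, end = cols[0], cols[3], cols[4]
        if seqid not in lengths:
            problems.append((n, f"seqid {seqid!r} not found in FASTA"))
            continue
        if not (start.isdigit() and end.isdigit()):
            problems.append((n, "start/end are not positive integers"))
            continue
        start, end = int(start), int(end)
        if start < 1 or end < start or end > lengths[seqid]:
            problems.append((n, "coordinates outside the sequence"))
    return problems
```

To run it on real files, open the genome FASTA and the GFF3 passed to pred_gff, feed the file objects to `fasta_lengths` and `check_gff3` respectively, and print the returned list. An empty list only rules out the basic problems listed above.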

—Carson


> On Mar 6, 2017, at 9:51 AM, YannDussert <dussert.yann at gmail.com> wrote:
> 
> Hello,
> 
> First, thank you for developing MAKER, this is a great annotation tool!
> 
> I am trying to annotate the genome of a biotrophic oomycete with MAKER. After reading multiple posts on this list, I first used RNA-seq data and a protein set from other oomycetes to create a first training set. I then used augustus, snap (both trained with models from the first round) and genemark for ab-initio gene prediction during a second round (masked and unmasked genome). I ran MAKER with the following options: single_exon=1, split_hit=5000, correct_est_fusion=1.
> 
> After the second round, I had only around 11000 annotated genes (96% completeness with Busco V2), whereas I'm expecting between 13000 and 17000 genes (numbers from other annotated oomycetes). There were only around 1500 genes in the non_overlapping protein file. After looking at the annotation in a genome browser, one of the problems was apparently gene fusions due to bad protein evidence. Following the advice in another post, I tried running MAKER by passing the ab-initio predictions with pred_gff, to avoid using bad protein hints for the gene predictors. I still have around 11000 annotated genes, but now there are 10000 genes in the non_overlapping protein file. Why this difference? I thought that this file included gene predictions not supported by any evidence. Did I miss something?
> 
> Thank you in advance for your answer.
> 
> Best regards,
> Yann
> 