[maker-devel] Differences in non_overlapping protein file between runs

Yann Dussert dussert.yann at gmail.com
Mon Mar 6 09:51:59 MST 2017


Hello,

First, thank you for developing MAKER, this is a great annotation tool!

I am trying to annotate the genome of a biotrophic oomycete with MAKER.
Following advice from several posts on this list, I first used RNA-seq
data and a protein set from other oomycetes to build an initial training
set. In a second round, I used AUGUSTUS and SNAP (both trained with
models from the first round) and GeneMark for ab initio gene prediction
(masked and unmasked genome). I ran MAKER with the following options:
single_exon=1, split_hit=5000, correct_est_fusion=1.

After the second round, I had only around 11000 annotated genes (96%
completeness with BUSCO v2), whereas I am expecting between 13000 and
17000 genes (numbers from other annotated oomycetes). There were only
around 1500 genes in the non_overlapping protein file. Looking at the
annotation in a genome browser, one of the problems appeared to be gene
fusions caused by bad protein evidence. Following the advice in another
post, I reran MAKER, passing the ab initio predictions with pred_gff so
that the gene predictors would not receive bad protein hints. I still
have around 11000 annotated genes, but now there are 10000 genes in the
non_overlapping protein file. Why is there such a difference? I thought
that this file contained gene predictions not supported by any evidence.
Did I miss something?
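
Counts like the ones above can be checked by counting the FASTA header
lines in each output file. This is only a minimal sketch: it assumes the
protein files are plain FASTA with one ">" header per sequence, and the
function names and any file paths passed on the command line are
placeholders, not part of MAKER itself.

```python
import sys


def count_fasta_records(lines):
    """Count sequences by counting lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fasta_file(path):
    """Open a FASTA file (hypothetical path) and count its records."""
    with open(path) as handle:
        return count_fasta_records(handle)


if __name__ == "__main__":
    # Paths are supplied by the user, e.g. the non_overlapping protein
    # file from each of the two runs being compared.
    for path in sys.argv[1:]:
        print(path, count_fasta_file(path), sep="\t")
```

Running it on the non_overlapping protein file from each run gives the
two numbers side by side, which makes it easy to confirm that the
difference is in the files and not in how they were counted.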

Thank you in advance for your answer.

Best regards,
Yann



