[maker-devel] Differences in non_overlapping protein file between runs
YannDussert
dussert.yann at gmail.com
Mon Mar 6 09:51:59 MST 2017
Hello,
First, thank you for developing MAKER, this is a great annotation tool!
I am trying to annotate the genome of a biotrophic oomycete with MAKER.
After reading multiple posts on this list, I first used RNA-seq data and
a protein set from other oomycetes to build an initial training set. I
then used Augustus and SNAP (both trained on the gene models from the
first round) together with GeneMark for ab initio gene prediction in a
second round (on both the masked and unmasked genome). I ran MAKER with
the following options: single_exon=1, split_hit=5000, correct_est_fusion=1.
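Editor's note: the three options above are set in MAKER's maker_opts.ctl control file. As a minimal sketch, the helper below rewrites those keys in the text of such a file. It assumes the file uses one "key=value #comment" entry per line. The function name set_ctl_options and the sample lines in the usage check are hypothetical illustrations, not part of MAKER.

```python
import re
from typing import Dict

# Options quoted in the message. Assumption: maker_opts.ctl holds one
# "key=value #comment" entry per line (the comment part is optional).
OPTIONS = {"single_exon": "1", "split_hit": "5000", "correct_est_fusion": "1"}

# key, then a value up to the first '#', then an optional trailing comment
_LINE = re.compile(r"^(?P<key>\w+)=(?P<value>[^#]*?)(?P<rest>\s*#.*)?$")


def set_ctl_options(text: str, options: Dict[str, str]) -> str:
    """Hypothetical helper: return ctl text with the given keys set, keeping comments."""
    out = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m and m.group("key") in options:
            rest = m.group("rest") or ""  # preserve any inline comment
            line = f"{m.group('key')}={options[m.group('key')]}{rest}"
        out.append(line)  # unmatched lines (headers, other keys) pass through unchanged
    return "\n".join(out)
```

The same helper could set pred_gff to the path of an external predictions GFF3 for a run like the second one described in this message (that path would be specific to your setup).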
After the second round, I had only around 11,000 annotated genes (96%
completeness with BUSCO v2), whereas I'm expecting between 13,000 and
17,000 genes (based on other annotated oomycetes). Only around 1,500
genes were in the non_overlapping protein file. When I looked at the
annotation in a genome browser, one apparent problem was gene fusions
caused by poor protein evidence. Following the advice in another post, I
reran MAKER, passing the ab initio predictions through pred_gff so that
the gene predictors would not be given bad protein hints. I still have
around 11,000 annotated genes, but the non_overlapping protein file now
contains 10,000 genes. Why this difference? I thought this file only
contained gene predictions not supported by any evidence. Did I miss
something?
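Editor's note: the comparison above rests on two counts, annotated genes and entries in the non_overlapping protein file. A minimal sketch for reproducing both counts is shown below. It assumes the proteins are in a standard FASTA file and the annotation in a GFF3 file (such as one merged from a MAKER run). The function names are hypothetical. The file names in the usage line and the assumption that MAKER gene rows carry "maker" in column 2 are illustrative and should be checked against your own output.

```python
from typing import Optional


def count_fasta_records(path: str) -> int:
    """Count sequences in a FASTA file: one '>' header line per record."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_gff3_genes(path: str, source: Optional[str] = None) -> int:
    """Count 'gene' features in a GFF3 file, optionally only those from one source (column 2)."""
    n = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # embedded sequences follow; no more feature lines
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments and blank lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) >= 9 and cols[2] == "gene" and (source is None or cols[1] == source):
                n += 1
    return n
```

For example (hypothetical file names): `count_fasta_records("genome.all.maker.non_overlapping_ab_initio.proteins.fasta")` and `count_gff3_genes("genome.all.gff", source="maker")`, run once per MAKER run, give the two numbers being compared.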
Thank you in advance for your answer.
Best regards,
Yann