[maker-devel] Annotation quality and converting gff3 to gtf
Carson Holt
carsonhh at gmail.com
Tue Apr 16 08:20:01 MDT 2013
The input GFF3 file you linked to contains only one gene. Is that
correct? If so, then you should only get one gene in the output. The
resulting GTF should only have the genes (ignoring all the evidence).
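One quick way to check how many genes the input really holds is to count the feature types on lines whose source column is "maker". This is only a minimal sketch, assuming standard 9-column tab-delimited GFF3; the function name count_features and the example records are made up for illustration and are not part of MAKER.

```python
from collections import Counter


def count_features(gff3_lines, source="maker"):
    """Count feature types (column 3) for one source (column 2) in GFF3 text."""
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        if cols[1] == source:
            counts[cols[2]] += 1
    return counts


# Illustrative records only (hypothetical contig and IDs).
example_gff3 = [
    "##gff-version 3",
    "\t".join(["contig1", "maker", "gene", "100", "900", ".", "-", ".", "ID=g1"]),
    "\t".join(["contig1", "maker", "mRNA", "100", "900", ".", "-", ".", "ID=g1-RA;Parent=g1"]),
    "\t".join(["contig1", "blastx", "protein_match", "120", "400", ".", "-", ".", "ID=hit1"]),
    "##FASTA",
    ">contig1",
    "ACGT",
]
feature_counts = count_features(example_gff3)
```

On a real file you would pass an open file handle instead of the list; if feature_counts["gene"] comes back as 1, then a one-gene GTF is the expected result.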
To convert for EVAL, use these command lines (note the flags, such as -g for
gff3_merge so you are only looking at genes; the FASTA must be included in
the file, so no -n flag):
gff3_merge -d maker_datastore_index.log -g -o some_file.gff
add_utr_start_stop_gff some_file.gff > some_file2.gff
maker2eval some_file2.gff
Note that all versions of MAKER after 2.09 no longer have
add_utr_start_stop_gff; the UTR is now always there explicitly, so you go
straight from gff3_merge to maker2eval_gtf.
However, with that explained, I have to wonder if EVAL is appropriate for
you. EVAL requires a reference annotation set (that is assumed to be 100%
perfect) for comparison, and you get a perfect score whenever you call the
genes exactly identical to the reference set (which in itself has obvious
bias, but we won't get into that). Given that you have no reference set, it
will not give you anything other than statistics for the distribution of
intron and exon sizes.
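If the intron and exon size distributions are all you want, they can also be pulled directly from the GFF3 without a GTF conversion. The following is a rough sketch, not what EVAL itself does: it assumes exon features carry a Parent attribute naming their transcript, and the function name and example records are hypothetical (the coordinates are borrowed from the two CDS lines in the quoted output purely for illustration).

```python
from collections import defaultdict


def exon_intron_lengths(gff3_lines):
    """Return (exon_lengths, intron_lengths) from GFF3 exon features."""
    exons_by_transcript = defaultdict(list)
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # An exon may list several parent transcripts, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons_by_transcript[parent].append((start, end))

    exon_lengths, intron_lengths = [], []
    for spans in exons_by_transcript.values():
        spans.sort()
        # GFF3 coordinates are 1-based and inclusive.
        exon_lengths.extend(end - start + 1 for start, end in spans)
        # An intron is the gap between consecutive exons of one transcript.
        intron_lengths.extend(
            nxt[0] - prev[1] - 1 for prev, nxt in zip(spans, spans[1:])
        )
    return exon_lengths, intron_lengths


# Illustrative records only (hypothetical IDs).
example_gff3 = [
    "\t".join(["NODE_1", "maker", "exon", "8801", "8984", ".", "-", ".", "ID=e1;Parent=t1"]),
    "\t".join(["NODE_1", "maker", "exon", "8113", "8717", ".", "-", ".", "ID=e2;Parent=t1"]),
]
exon_sizes, intron_sizes = exon_intron_lengths(example_gff3)
```

The two lists can then be summarized however you like (mean, median, histogram).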
Alternative measures of quality when there is no reference genome are AED
(computed for each gene as part of the MAKER run), which is basically a
variation of EVAL-like statistics run against evidence clusters rather than
a reference genome, or you can just use % domain content.
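Since AED is already computed during the run, it can be summarized straight from the merged GFF3. The sketch below assumes the score is stored as an _AED attribute on the mRNA features, which is how I understand MAKER writes it; check your own file before relying on that. The function names, the example records, and the 0.5 cutoff are all illustrative choices, not anything MAKER prescribes.

```python
def read_aed(gff3_lines):
    """Map transcript ID -> AED for mRNA features that carry an _AED attribute."""
    aed_by_id = {}
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "_AED" in attrs and "ID" in attrs:
            aed_by_id[attrs["ID"]] = float(attrs["_AED"])
    return aed_by_id


def fraction_at_or_below(aed_values, cutoff):
    """Fraction of transcripts whose AED is <= cutoff (0.0 if there are none)."""
    values = list(aed_values)
    if not values:
        return 0.0
    return sum(1 for v in values if v <= cutoff) / len(values)


# Illustrative records only (hypothetical IDs and scores).
example_gff3 = [
    "\t".join(["c1", "maker", "mRNA", "1", "500", ".", "+", ".", "ID=t1;Parent=g1;_AED=0.10"]),
    "\t".join(["c1", "maker", "mRNA", "800", "1500", ".", "+", ".", "ID=t2;Parent=g2;_AED=0.40"]),
    "\t".join(["c1", "maker", "mRNA", "2000", "2600", ".", "-", ".", "ID=t3;Parent=g3;_AED=0.90"]),
]
aed_scores = read_aed(example_gff3)
well_supported = fraction_at_or_below(aed_scores.values(), 0.5)
```

Evaluating fraction_at_or_below over a range of cutoffs gives a cumulative AED curve; lower AED means the annotation agrees better with its evidence.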
See these links for examples of the statistics -->
http://www.biomedcentral.com/1471-2105/12/491
http://www.biomedcentral.com/1471-2105/10/67
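For orientation, AED as I understand its published definition is 1 - (SN + SP) / 2, where SN is the fraction of evidence bases overlapped by the annotation and SP is the fraction of annotation bases overlapped by the evidence. The sketch below is a toy nucleotide-level version under that assumption, not MAKER's own code; the function names are hypothetical and intervals are taken as 1-based, inclusive (start, end) pairs.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent inclusive intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bases(intervals):
    """Number of distinct bases covered by the intervals."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


def overlap_bases(a, b):
    """Number of bases covered by both interval sets."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            total += hi - lo + 1
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def aed(annotation_exons, evidence_exons):
    """Toy AED: 0.0 means perfect agreement, 1.0 means no shared bases."""
    ann = covered_bases(annotation_exons)
    evi = covered_bases(evidence_exons)
    if ann == 0 or evi == 0:
        return 1.0  # nothing to compare against: treat as unsupported
    both = overlap_bases(annotation_exons, evidence_exons)
    sensitivity = both / evi  # share of the evidence the annotation recovers
    specificity = both / ann  # share of the annotation backed by evidence
    return 1.0 - (sensitivity + specificity) / 2.0
```

For example, an annotation covering bases 1-100 compared with evidence covering 51-150 shares half of each, so aed([(1, 100)], [(51, 150)]) comes out at 0.5.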
Also a figure is attached with an example of quality analysis using combined
AED, domain content, and comparative orthologs.
--Carson
From: James Eckert <jteckert at gmail.com>
Date: Sunday, 14 April, 2013 5:07 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Annotation quality and converting gff3 to gtf
Hello,
I'm currently trying to figure out ways to evaluate the quality of
annotations that MAKER produces. I'm working on a novel species, so there
isn't a reference genome to compare the annotation quality to.
After doing a bit of searching on the web, I came across the EVAL tool,
which I thought may be useful for checking the output quality. EVAL takes in
gtf files, not gff3; however, MAKER seems to have addressed this problem
through its accessory scripts.
I first used the script "gff3_merge" to have my whole annotation under one
gff3 file. Next I used "add_utr_start_stop_gff". This would explicitly add
the UTRs, which would be needed for converting the gff3 file to gtf. The
problem arose when trying to run "gff3_to_eval_gtf". I was expecting MAKER
to process the whole gff3 file, but it seems to have only processed 2 nodes.
The same thing happens when running the "gff3_2_gtf" script.
Here is the command I'm running, along with the output:
gff3_to_eval_gtf assem_kmer_57_utr.gff3
NODE_20666_length_66353_cov_18.405483 maker CDS 8801 8984 . - 0 gene_id "1"; transcript_id "2";
NODE_20666_length_66353_cov_18.405483 maker CDS 8113 8717 . - 2 gene_id "1"; transcript_id "2";
My question is whether the "gff3_to_eval_gtf" and "gff3_2_gtf" scripts have
a bug in them, or whether I'm just doing the process wrong? Perhaps if the
conversion doesn't work, there exists an alternative to EVAL that works with
native MAKER annotations?
Attached is my whole genome gff3 file, along with the file I ran
"gff3_to_eval_gtf" on.
assem_kmer-57_exp-44_covcutoff-auto_contigs.all.gff3
<https://docs.google.com/file/d/0Byl5QhezwxYOUFFBMVpldFJOb28/edit?usp=drive_web>
assem_kmer_57_utr.gff3
<https://docs.google.com/file/d/0Byl5QhezwxYOYjBOWlVWMEJpTjQ/edit?usp=drive_web>
Thank you in advance for your help,
James
-------------- next part --------------
A non-text attachment was scrubbed...
Name: B563F1FF-1E85-42E3-B79D-F7F6449F1AE9.png
Type: image/png
Size: 227568 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20130416/c05e3c01/attachment-0003.png>