[maker-devel] A way to compare 2 annotation runs?

Cook, Malcolm MEC at stowers.org
Tue Apr 19 15:44:04 MDT 2016


Just a quick thought:

The smallest summary of what you're after might be the Jaccard statistic between your annotations, as computed by bedtools: http://bedtools.readthedocs.org/en/latest/content/tools/jaccard.html
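To make the suggestion concrete, here is a minimal Python sketch of the statistic `bedtools jaccard` reports: intersection over union of the bases covered by two interval sets. The intervals below are made-up half-open (start, end) pairs, not real MAKER output, and real use should go through bedtools on sorted BED files rather than this toy.

```python
def merged_length(intervals):
    """Total bases covered after merging overlapping intervals."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

def jaccard(a, b):
    """Jaccard index of two interval sets on one sequence."""
    # Intersection: clip every overlapping pair of intervals.
    inter = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo < hi:
                inter.append((lo, hi))
    i = merged_length(inter)
    u = merged_length(a) + merged_length(b) - i
    return i / u if u else 0.0

# Two hypothetical annotation runs on the same contig.
run1 = [(100, 200), (300, 400)]
run2 = [(150, 250), (300, 400)]
print(round(jaccard(run1, run2), 3))  # → 0.6
```

A value near 1 means the two runs cover nearly the same bases; comparing per-feature-type (genes, exons) gives a finer picture.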


From: maker-devel [mailto:maker-devel-bounces at yandell-lab.org] On Behalf Of Barry Moore
Sent: Tuesday, April 19, 2016 4:37 PM
To: Florian <fdolze at students.uni-mainz.de>; maker-devel <maker-devel at yandell-lab.org>
Cc: Campbell, Michael <mcampbel at cshl.edu>
Subject: Re: [maker-devel] A way to compare 2 annotation runs?

The Sequence Ontology provides some tools for this:

SOBAcl has some pre-configured reports/graphs with some flexibility to modify their content/layout.
https://github.com/The-Sequence-Ontology/SOBA

This simple example produces a table of feature-type counts for two GFF3 files:


SOBAcl --columns file --rows type --data type --data_type count \
  data/dmel-all-r5.30_0001000.gff data/dmel-all-r5.30_0010000.gff
More complex examples are available in the test file SOBA/t/sobacl_test.sh
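For a quick sanity check without installing SOBA, the same kind of count table can be sketched in a few lines of Python: tally column 3 (the feature type) of each GFF3 file. The inline GFF3 records below are made up for illustration, standing in for the dmel files above.

```python
from collections import Counter

def type_counts(gff3_lines):
    """Count feature types (GFF3 column 3), skipping comments and blanks."""
    counts = Counter()
    for line in gff3_lines:
        if line.startswith('#') or not line.strip():
            continue
        fields = line.rstrip('\n').split('\t')
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts

# Hypothetical GFF3 content; in practice, read each annotation file.
file1 = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=m1;Parent=g1",
    "chr1\tmaker\texon\t100\t400\t.\t+\t.\tParent=m1",
]
print(type_counts(file1))
```

Running this over each MAKER iteration and diffing the tables shows at a glance where feature counts shifted between runs.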


The GAL library is a Perl library that works well with MAKER output and other valid GFF3 documents. It has some scripts that provide metrics along the lines of what you're looking for, but it is primarily a programming library that makes it easy to roll your own:
https://github.com/The-Sequence-Ontology/GAL

If you're OK with a little bit of Perl, modifying the synopsis code in the README makes it easy to generate the splice-complexity metric described here: http://www.ncbi.nlm.nih.gov/pubmed/19236712

use GAL::Annotation;

my $annot    = GAL::Annotation->new(qw(file.gff file.fasta));
my $features = $annot->features;

my $genes = $features->search({type => 'gene'});
while (my $gene = $genes->next) {
    print $gene->feature_id        . "\t";
    print $gene->splice_complexity . "\n";
}

Hope that helps,

Barry



On Apr 19, 2016, at 9:08 AM, Carson Holt <carsonhh at gmail.com> wrote:

I’m going to ask Michael Campbell to answer this. He wrote a protocols paper that will help.

—Carson




On Apr 19, 2016, at 6:08 AM, Florian <fdolze at students.uni-mainz.de> wrote:


Hello All,

We ran MAKER on a newly assembled genome for 3 iterations; 2 seems to be the recommended standard, and while on holiday I just ran it a third time. Now I want to compare the results of the iterations to see where the annotation (hopefully) improved or changed, but I can't really come up with a clever way to do this.

I reckon this is an often-solved problem, though I couldn't find a solution except an older entry in this mailing list, and that wasn't helpful.


So how are people assessing the quality of a MAKER run? How do you say one run was 'better' than another?


best regards & thanks for your input,
Florian

_______________________________________________
maker-devel mailing list
maker-devel at yandell-lab.org
http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org




