[maker-devel] putative preponderance of short exons??
Carson Holt
carsonhh at gmail.com
Mon Mar 31 07:52:15 MDT 2014
The intron/exon structure is determined by SNAP, Augustus, etc. It is not
affected by any of the maker parameters. Only evidence alignments are
affected by the maker settings. You can try retraining or manually
editing the HMMs, but these might also be regions where your assembly is
incorrect, and the predictors create short exons so that the gene
structure works without stop codons mid-gene.
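
To see which source the short exons come from, it can help to count them
directly from the GFF3. The following is a minimal Python sketch, not part
of MAKER: the function names exon_lengths and count_below are made up for
illustration. It assumes standard nine-column GFF3 with 1-based inclusive
coordinates and an optional ##FASTA section at the end, and it assumes the
final gene models carry "maker" in column 2 (check this in your own file).

```python
def exon_lengths(gff_lines, source=None):
    """Return lengths of unique exon features from an iterable of GFF3 lines.

    source: if given, keep only features whose column 2 equals this value
    (assumption: final MAKER models use "maker" there).
    """
    seen = set()
    lengths = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        if source is not None and cols[1] != source:
            continue
        # Exons shared between transcripts are counted once.
        key = (cols[0], int(cols[3]), int(cols[4]), cols[6])
        if key in seen:
            continue
        seen.add(key)
        # GFF3 coordinates are 1-based and inclusive.
        lengths.append(key[2] - key[1] + 1)
    return lengths


def count_below(lengths, thresholds=(10, 30, 50)):
    """Count exons strictly shorter than each threshold (in bp)."""
    return {t: sum(1 for n in lengths if n < t) for t in thresholds}
```

For a real file this could be called as
count_below(exon_lengths(open("genome.gff"), source="maker")), where
genome.gff is a placeholder name. The default thresholds are the ones used
in the question below (10, 30 and 50 bp).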
Thanks,
Carson
On 3/31/14, 4:20 AM, "Malcolm Hinsley" <mhinsley at ebi.ac.uk> wrote:
>Hi
>
>I've run Maker on a de novo assembly of a species of fly and then ran
>some simple statistics (intron/ exon/ CDS length, exons per gene) over
>the GFF output and compared with a couple of other species.
>It all looks good except that there is a surprising number of very short
>exons (6000 < 50 bp, 3500 < 30 bp, 878 < 10 bp, 87k total; see attached
>pdf: black is Drosophila, red is A. gambiae, green is with 5' and 3'
>exons removed).
>
>I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP.
>I'm using maker 2.31 (unpatched).
>
>Anecdotally, these short exons appear without EST or protein evidence
>and they all line up with canonical splice sequences (GT----AG),
>but I've only looked at a few using Apollo.
>
>While there's no requirement that exons should be longer, I'm suspicious
>of this as there must be some evolutionary relationship between these
>species.
>I've compared with another species annotated with Maker (using SNAP
>and Augustus) which is more distant (not yet publicly available), and
>the same pattern of short exons is present.
>I wondered if they were created to fulfil the need for start/stop
>codons, but this does not appear to be the case (mostly they are
>mid-gene).
>
>
>Is there some way to adjust the predictors, e.g. to require external
>evidence? Or anything else you could suggest? I can see the
>following in the tutorial but I'm not sure how they could help:
>
>pred_flank=200 #flank for extending evidence clusters sent to gene predictors
>pred_stats=0 #report AED and QI statistics for all predictions as well as models
>AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
>min_protein=0 #require at least this many amino acids in predicted proteins
>alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
>always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
>
>
>thanks
>
>--
>malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
>European Bioinformatics Institute (EMBL-EBI)
>European Molecular Biology Laboratory
>Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
>United Kingdom