[maker-devel] putative preponderance of short exons??
Malcolm Hinsley
mhinsley at ebi.ac.uk
Mon Mar 31 04:20:10 MDT 2014
Hi
I've run Maker on a de novo assembly of a species of fly and then ran
some simple statistics (intron/ exon/ CDS length, exons per gene) over
the GFF output and compared with a couple of other species.
It all looks good except that there is a surprising number of very short
exons (6000 < 50 bp, 3500 < 30 bp, 878 < 10 bp, out of 87k total; see the
attached pdf, where black is Drosophila, red is A. gambiae, and green is
with 5' and 3' exons removed).
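
For reference, a minimal sketch of how such a tally can be made from GFF3
output. It assumes standard 9-column, tab-separated GFF3 with "exon"
features and 1-based inclusive coordinates; the function names are
illustrative and not part of Maker. Exons shared between transcripts are
counted once.

```python
def exon_lengths(gff_lines):
    """Return the lengths of the unique exon features in GFF3 lines."""
    seen = set()
    lengths = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        # Same seqid/start/end/strand means the same exon reused by another transcript.
        key = (cols[0], int(cols[3]), int(cols[4]), cols[6])
        if key in seen:
            continue
        seen.add(key)
        # GFF3 coordinates are 1-based and inclusive.
        lengths.append(key[2] - key[1] + 1)
    return lengths


def count_below(lengths, thresholds=(50, 30, 10)):
    """Count how many exon lengths fall strictly below each threshold (bp)."""
    return {t: sum(1 for n in lengths if n < t) for t in thresholds}


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        all_lengths = exon_lengths(handle)
    print(len(all_lengths), count_below(all_lengths))
```

Run as a script with the path to a GFF3 file as its only argument; it prints
the total number of unique exons and the counts below each threshold.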
I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP.
I'm using maker 2.31 (unpatched).
Anecdotally (I've only looked at a few using Apollo), these short exons
appear without EST or protein evidence, and they all line up with
canonical splice sequences (GT----AG).
While there's no requirement that exons should be longer, I'm suspicious
of this, as there must be some evolutionary relationship between these
species.
I've compared with another species annotated with Maker (using SNAP
and Augustus) which is more distant (not yet publicly available), and
the same pattern of short exons is present.
I wondered if they were created to fulfil the need for start/stop
codons, but this does not appear to be the case (mostly they are mid-gene).
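
A sketch of how the terminal vs. mid-gene question can be checked across
all models rather than by eye. It assumes each exon feature carries a
Parent attribute naming its transcript, as in standard GFF3; the function
name and the 30 bp default cutoff are illustrative only.

```python
from collections import defaultdict


def short_exon_positions(gff_lines, max_len=30):
    """Count short exons (< max_len bp) as terminal or internal per transcript."""
    by_parent = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        # An exon may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                by_parent[parent].append((int(cols[3]), int(cols[4])))

    counts = {"terminal": 0, "internal": 0}
    for exons in by_parent.values():
        exons.sort()  # genomic order; first and last are terminal on either strand
        last = len(exons) - 1
        for i, (start, end) in enumerate(exons):
            if end - start + 1 >= max_len:
                continue
            counts["terminal" if i in (0, last) else "internal"] += 1
    return counts
```

Note that a short exon shared by two transcripts is counted once per
transcript here, since its position can differ between them.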
Is there some way to adjust the predictors, e.g. to require external
evidence? Or is there anything else you could suggest? I can see the
following options in the tutorial, but I'm not sure how they could help:
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
thanks
--
malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
United Kingdom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: exon_53.pdf
Type: application/pdf
Size: 10619 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140331/edd22fe9/attachment-0002.pdf>