[maker-devel] putative preponderance of short exons??

Malcolm Hinsley mhinsley at ebi.ac.uk
Mon Mar 31 04:20:10 MDT 2014


Hi

I've run Maker on a de novo assembly of a species of fly, then ran 
some simple statistics (intron/exon/CDS length, exons per gene) over 
the GFF output and compared them with a couple of other species.
It all looks good except that there is a surprising number of very short 
exons (6000 < 50 bp, 3500 < 30 bp, 878 < 10 bp, out of 87k total; see the 
attached pdf, where black is Drosophila, red is A. gambiae, and green is 
with 5' and 3' exons removed).
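
For reference, a minimal sketch of how exon-length counts like these can be derived from GFF3 output. It assumes tab-separated GFF3 with 1-based inclusive coordinates and features typed "exon" in column 3; the function names exon_lengths and count_below are illustrative only and are not part of Maker.

```python
def exon_lengths(gff_lines):
    """Return the length in bp of every 'exon' feature in an iterable of GFF3 lines."""
    lengths = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        lengths.append(end - start + 1)  # GFF3 coordinates are 1-based, inclusive
    return lengths


def count_below(lengths, thresholds=(10, 30, 50)):
    """Count how many lengths fall strictly below each threshold."""
    return {t: sum(1 for n in lengths if n < t) for t in thresholds}
```

With a file on disk, something like count_below(exon_lengths(open("annotations.gff"))) would give the three counts; the file name is a placeholder.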

I ran est2genome & protein2genome, then 3 cycles of Augustus and SNAP.  
I'm using maker 2.31 (unpatched).

Anecdotally, these short exons appear without EST or protein evidence 
and they all line up with canonical splice sequences (GT----AG), 
but I've only looked at a few using Apollo.

While there's no requirement that exons should be longer, I'm suspicious 
of this, as there must be some evolutionary relationship between these 
species.
I've compared with another species annotated with Maker (using SNAP 
and Augustus) which is more distant (not yet publicly available), and 
the same pattern of short exons is present.
I wondered if they were created to fulfil the need for start/stop 
codons, but this does not appear to be the case (mostly they are mid-gene).
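
A rough way to check the mid-gene observation across the whole annotation rather than by eye is sketched below. It assumes each exon row carries a Parent attribute naming its transcript(s); the function names and the 30 bp cutoff are illustrative. An exon is called terminal if it is the first or last of its transcript by coordinate, which holds on either strand.

```python
from collections import defaultdict


def parse_attrs(field):
    """Parse a GFF3 attribute column (key=value pairs separated by ';') into a dict."""
    return dict(kv.split("=", 1) for kv in field.strip().split(";") if "=" in kv)


def short_exon_positions(gff_lines, max_len=30):
    """Count exons shorter than max_len bp as 'internal' or 'terminal' per transcript."""
    by_parent = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        # An exon may list several parents; it is counted once per transcript.
        for parent in parse_attrs(cols[8]).get("Parent", "").split(","):
            if parent:
                by_parent[parent].append((start, end))

    counts = {"internal": 0, "terminal": 0}
    for exons in by_parent.values():
        exons.sort()
        last = len(exons) - 1
        for i, (start, end) in enumerate(exons):
            if end - start + 1 < max_len:
                counts["terminal" if i in (0, last) else "internal"] += 1
    return counts
```

Single-exon transcripts are counted as terminal under this definition.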


Is there some way to adjust the predictors, e.g. to require external 
evidence? Or anything else you could suggest? I can see the 
following options in the tutorial but I'm not sure how they could help:

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no


thanks

-- 
malcolm hinsley | EnsEMBL Genomes | +44 (0)1223 49 4669
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD
United Kingdom

-------------- next part --------------
A non-text attachment was scrubbed...
Name: exon_53.pdf
Type: application/pdf
Size: 10619 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140331/edd22fe9/attachment-0002.pdf>

