[maker-devel] Short introns

Shaun Jackman sjackman at gmail.com
Thu Jul 16 13:11:32 MDT 2015


Hi, Carson.

One of the ten questionable introns has a canonical GT-AG splice site and is 33 bp. The splice sites are GA-AG, GG-GG, GC-AG, GT-CG, GA-AT, GG-AA, GG-AG, AT-TT, GG-AT and GT-AG. The intron sizes are 33, 111, 84, 30, 219, 186, 51, 30, 45 and 33.
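The numbers above can be tabulated with a short Python sketch. It assumes that the splice sites and sizes are listed in the same order, which the message does not state explicitly, and it treats only GT-AG as canonical. The names `summarize` and `CANONICAL` are made up for this illustration.

```python
# Splice-site dinucleotide pairs and intron sizes (bp) as listed in the message.
# Assumption: both lists are in the same order, so they can be paired by position.
splice_sites = ["GA-AG", "GG-GG", "GC-AG", "GT-CG", "GA-AT",
                "GG-AA", "GG-AG", "AT-TT", "GG-AT", "GT-AG"]
sizes = [33, 111, 84, 30, 219, 186, 51, 30, 45, 33]

# Assumption: only the major GT-AG pair counts as canonical here.
CANONICAL = {"GT-AG"}


def summarize(sites, lengths):
    """Pair each splice site with its intron size and flag two properties:
    whether the site is canonical and whether the size is a multiple of 3."""
    return [
        {
            "site": site,
            "size": length,
            "canonical": site in CANONICAL,
            "multiple_of_3": length % 3 == 0,
        }
        for site, length in zip(sites, lengths)
    ]


rows = summarize(splice_sites, sizes)
for row in rows:
    print(row["site"], row["size"], row["canonical"], row["multiple_of_3"])
```

Under those assumptions, exactly one row is flagged canonical, and every size is a multiple of 3, which is what one would expect if the gaps preserve the reading frame.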

I was wrong about there being stop codons in the questionable introns. All ten are completely free of stop codons. Sorry for the confusion. I had extracted just the intron sequence and translated the first frame, but the intron was not aligned to a 3-nucleotide boundary.
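The frame problem described above can be illustrated with a minimal Python sketch. The sequence is a made-up toy, not one of the ten introns, and the stop codon set is the standard one, which is an assumption: an organellar genome may use a different genetic code. The function name `stop_codons_in_frame` and the `offset` parameter are hypothetical.

```python
# Assumption: standard-code stop codons; pass a different set for other genetic codes.
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})


def stop_codons_in_frame(seq, offset=0, stops=STANDARD_STOPS):
    """Return 0-based start positions of stop codons when `seq` is read in
    codons beginning at `offset` (0, 1 or 2)."""
    seq = seq.upper()
    return [i for i in range(offset, len(seq) - 2, 3) if seq[i:i + 3] in stops]


# Hypothetical toy sequence standing in for an extracted intron.
toy_intron = "TAGCAGGTCCATCGA"

# Reading from the first base of the extracted sequence finds a stop codon...
naive = stop_codons_in_frame(toy_intron, offset=0)
# ...but if the surrounding coding frame actually starts one base later, none is found.
shifted = stop_codons_in_frame(toy_intron, offset=1)

print(naive, shifted)
```

The point of the sketch is only that the offset has to come from the coding frame of the flanking exons, not from the first base of the extracted intron.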

I am convinced that these short introns are in fact genomic insertions and not introns. The root cause may be incorrectly annotated introns in the protein evidence, as you suggest.

I’ll try tweaking the $min_intron parameter. Could this parameter be added to the maker_opts.ctl configuration file?

Thanks for your help, Carson. Cheers,
Shaun



-- 
http://sjackman.ca/

On 2015-July-16 at 10:55:32, Carson Holt (carsonhh at gmail.com) wrote:

Look at the region. If it’s being suggested by the polished alignment, I somewhat doubt it’s just an insertion, because the polished alignment will have valid splice sites. It could be an insertion, but one that maps perfectly around canonical splice sites would be quite the coincidence (because exonerate shouldn’t make big gaps to force the alignment). However, if it looks more like a forced mapping around non-canonical splice sites (which shouldn’t actually produce protein2genome results), then I might support the idea that it’s an insertion. A 250 bp intron, or even a 100 bp one, doesn’t really seem that short to me. The lower end of the range seen in fungi, for example (which have very short introns), can get close to about 20 bp. I guess it’s possible that the protein evidence you are using contains an intron that isn’t really there, which results in an intron in your job because of protein conservation (i.e. conserved codons contain the falsely used splice site).

The minimum intron given to exonerate for polishing is 20.  It’s hard coded, and you would have to manually edit it.

Line 1534 in maker/lib/GI.pm: my $min_intron = 20;
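As a rough illustration of why a minimum of 20 leaves all of the short introns discussed in this thread in place, here is a small Python sketch. It is not MAKER or exonerate code. The sizes are the ten reported elsewhere in the thread, the inclusive comparison is an assumption, and the raised value of 250 is a hypothetical threshold taken from the "shorter than 250 bp" description, not a recommended setting.

```python
# Sizes (bp) of the ten questionable introns reported in this thread.
sizes = [33, 111, 84, 30, 219, 186, 51, 30, 45, 33]


def introns_kept(lengths, min_intron):
    """Return the intron lengths that meet a minimum-length threshold.
    Assumption: the threshold is inclusive (length >= min_intron)."""
    return [n for n in lengths if n >= min_intron]


kept_default = introns_kept(sizes, 20)   # value hard coded in GI.pm
kept_raised = introns_kept(sizes, 250)   # hypothetical raised threshold

print(len(kept_default), len(kept_raised))
```

With the hard-coded value every one of the ten passes, since the shortest is 30 bp; only a considerably larger threshold would exclude them.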

—Carson


On Jul 15, 2015, at 6:36 PM, Shaun Jackman <sjackman at gmail.com> wrote:

Hi, Carson.

I’m using protein evidence and protein2genome alone without ab initio gene finders to annotate an organellar genome. MAKER annotates 16 introns. 6 introns look real (according to RNAweasel) and are all larger than 900 bp. The other 10 introns are all shorter than 250 bp and multiples of 3 bp. These short introns look like genomic insertions rather than introns to me. Is there a way to specify a minimum intron size to MAKER?

6 of these short introns do not contain a stop codon, and 4 do contain a stop codon. I suppose these 4 are pseudogenes.

Thanks,
Shaun




-- 
http://sjackman.ca/

