[maker-devel] question regarding MAKER determination of CDS boundaries

Carson Holt carsonhh at gmail.com
Thu Oct 2 13:52:44 MDT 2014


There are three possible sources of transcripts that do not start with M.

1. Partial models that do not have a start.
2. The ab initio gene predictors themselves can pick an alternate
non-canonical start (this is rare).
3. The default BioPerl codon table includes alternate start codons, and
these can return 'true' when you test whether a codon is a start codon
before adding UTR, or if you used the always_complete=1 option. (If you
get non-canonical starts with UTR, this is the most likely source.)
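
To illustrate item 3, here is a minimal Python sketch of how a permissive
start-codon test differs from a strict one. It is not MAKER or BioPerl code.
The start sets assume NCBI translation table 1, which lists TTG and CTG as
alternate initiators alongside ATG, and the function names are hypothetical.

```python
# Illustrative only: mimics the idea of a codon-table start test.
# This is not BioPerl's API; all names here are hypothetical.
# Assumption: NCBI translation table 1 (standard code), which lists
# TTG and CTG as alternate initiation codons in addition to ATG.
PERMISSIVE_STARTS = {"ATG", "TTG", "CTG"}
STRICT_STARTS = {"ATG"}


def is_start_codon(codon, strict=False):
    """Return True if codon counts as a start under the chosen table."""
    starts = STRICT_STARTS if strict else PERMISSIVE_STARTS
    return codon.upper().replace("U", "T") in starts


def first_start(seq, strict=False):
    """Return the 0-based offset of the first in-frame start codon, or None."""
    for i in range(0, len(seq) - 2, 3):
        if is_start_codon(seq[i:i + 3], strict):
            return i
    return None


if __name__ == "__main__":
    transcript = "CTGGCCATGAAATAA"  # codons: CTG GCC ATG AAA TAA
    print(first_start(transcript))               # permissive table: 0 (CTG)
    print(first_start(transcript, strict=True))  # strict table: 6 (ATG)
```

With the permissive table the ORF would begin at CTG and the peptide would
not start with M; with the strict table only ATG qualifies.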


Current versions of MAKER (2.31+) export a 'strict' canonical codon
table to BioPerl (overriding the default table with alternate starts).
This forces start locations identified on extended transcripts to be
only 'M'. If you have an unusual number of alternate starts from a
previous version of MAKER in which you used the always_complete=1 option,
you can rerun your annotations on a current version of MAKER, or just
pass in your previous transcripts via GFF3 to have it recalculate the ORF.
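
Before rerunning anything, it may help to list which models are affected.
The following is a rough Python sketch, not part of MAKER, that reports
the IDs of peptides in a protein FASTA that do not begin with M. The
function names and the example records are made up for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def record_id(header):
    """Return the first whitespace-delimited token of a FASTA header."""
    parts = header.split()
    return parts[0] if parts else ""


def non_m_starts(lines):
    """Return IDs of records whose peptide does not begin with M.

    Records with an empty sequence are reported as well.
    """
    return [record_id(h) for h, s in read_fasta(lines)
            if not s.upper().startswith("M")]


if __name__ == "__main__":
    # Hypothetical records; in practice pass an open file handle instead.
    example = [">gene1-mRNA-1 protein", "MKTLLV", ">gene2-mRNA-1 protein", "LKTAAV"]
    print(non_m_starts(example))
```

The resulting IDs can then be cross-checked against the 5' UTR lengths in
the GFF3 to separate partial models from alternate-start cases.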

--Carson


On 10/2/14, 1:28 PM, "Andrew Farmer" <adf at ncgr.org> wrote:

>Hi all-
>several months ago, our group used MAKER-P (version 2.30) to annotate
>some draft genome assemblies,
>and have since been working a bit more closely evaluating the predicted
>gene models in an effort to get them
>ready for public release.  One of the things that we recently noticed
>during this process is that a considerable proportion
>(~10%) of the peptides predicted do not begin with start codons.
>Initially, my guess was that this was simply due
>to assembly gaps causing truncations (and this may be a partial
>explanation) but I was surprised to see many of
>them with 5' UTRs reported- about half of the proteins beginning without
>a start codon report a 5'UTR of length 0,
>while the rest have 5'UTR lengths reported in a range from a few bp
>to several kb in length.
>
>Having dug in a little deeper on the supporting evidence for one
>example, one plausible explanation seems
>to be that the choice of CDS start has been influenced by an outlier in
>the protein alignments (ie one protein whose
>alignment start extends a little further upstream than all of the
>others, which ). Before I spend more time trying
>to reverse engineer the diagnosis of other examples, it seemed worth
>sending the list a message to see if this
>seems plausible, or maybe there is a simpler explanation for it that
>I've overlooked. I can send more specific
>details on my example case if it would be helpful.
>
>thanks in advance for your insights/suggestions
>
>Andrew Farmer
>
>-- 
>...all concepts in which an entire process is semiotically concentrated
>elude definition; only that which has no history is definable.
>
>Friedrich Nietzsche
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org





