[maker-devel] question regarding MAKER determination of CDS boundaries

Andrew Farmer adf at ncgr.org
Thu Oct 2 13:28:16 MDT 2014


Hi all-
Several months ago, our group used MAKER-P (version 2.30) to annotate 
some draft genome assemblies,
and have since been working a bit more closely evaluating the predicted 
gene models in an effort to get them
ready for public release.  One of the things that we recently noticed 
during this process is that a considerable proportion
(~10%) of the peptides predicted do not begin with start codons. 
Initially, my guess was that this was simply due
to assembly gaps causing truncations (and this may be a partial 
explanation) but I was surprised to see many of
them with 5' UTRs reported: about half of the proteins beginning without 
a start codon report a 5' UTR of length 0,
while the rest have 5' UTR lengths ranging from a few bp 
to several kb.
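
For anyone wanting to run the same check on their own output, here is a minimal sketch of how the "does not begin with a start codon" count can be made from a protein FASTA file. It assumes a standard FASTA layout and treats any sequence whose first residue is not M as lacking a start codon; the function names read_fasta and proteins_without_start are illustrative and are not part of MAKER.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the identifier.
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def proteins_without_start(lines: Iterable[str]) -> List[str]:
    """Return identifiers of proteins whose first residue is not methionine."""
    return [
        name
        for name, seq in read_fasta(lines)
        if not seq.upper().startswith("M")
    ]
```

A file can be passed directly, e.g. proteins_without_start(open("proteins.fasta")), where the file name is a placeholder. Dividing the length of the result by the total number of records gives the proportion.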

Having dug in a little deeper on the supporting evidence for one 
example, one plausible explanation seems
to be that the choice of CDS start has been influenced by an outlier in 
the protein alignments (i.e. one protein whose
alignment start extends a little further upstream than all of the 
others). Before I spend more time trying
to reverse engineer the diagnosis of other examples, it seemed worth 
sending the list a message to see if this
seems plausible, or maybe there is a simpler explanation for it that 
I've overlooked. I can send more specific
details on my example case if it would be helpful.
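
To make the outlier idea concrete, here is a rough sketch of how one might flag a protein alignment that starts well upstream of the others at a locus. It is only an illustration of the hypothesis, not a description of how MAKER chooses the CDS start. It assumes plus-strand coordinates (a smaller start means further upstream), and the 30 bp threshold and the name upstream_outliers are arbitrary choices made for the example.

```python
from statistics import median
from typing import Dict, List


def upstream_outliers(starts: Dict[str, int], min_gap: int = 30) -> List[str]:
    """Return names of alignments starting at least min_gap bp upstream
    of the median alignment start (plus-strand coordinates assumed)."""
    if not starts:
        return []
    mid = median(starts.values())
    # On the plus strand, "upstream" means a smaller coordinate.
    return sorted(name for name, s in starts.items() if mid - s >= min_gap)


# Hypothetical alignment starts for one gene model.
example = {"prot1": 1000, "prot2": 1003, "prot3": 1000, "prot4": 940}
flagged = upstream_outliers(example)
```

In this made-up example only prot4 is flagged, since its alignment begins 60 bp upstream of the median start.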

thanks in advance for your insights/suggestions

Andrew Farmer

-- 
...all concepts in which an entire process is semiotically concentrated
elude definition; only that which has no history is definable.

Friedrich Nietzsche




