[maker-devel] question regarding MAKER determination of CDS boundaries

Thu Oct 2 14:06:27 MDT 2014

Thanks Carson,
indeed, I was using always_complete=1, thinking that would be most
appropriate for the current state of the genome assemblies. I see now
that this question has come up a few times before on the list; sorry not
to have thought to search through the list archives before posting the
query yet again. Thanks also for the additional suggestion on the
recalculation approach; that sounds straightforward.

regards

Andrew

On 10/2/14 1:52 PM, Carson Holt wrote:
> There can be three sources of non-M starting transcripts.
>
> 1. Partial models that do not have a start.
> 2. The ab initio gene predictors themselves can pick an alternate
> non-canonical start (this is rare).
> 3. The default BioPerl codon table has alternate start codons, and these
> can return 'true' when you test whether a codon is a start codon before
> adding UTR or if you used the always_complete=1 option. (If you get
> non-canonical starts with UTR, then this is the most likely source.)
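
To illustrate item 3, here is a minimal Python sketch (not MAKER or BioPerl
code) of how the choice of start-codon set changes where an ORF is judged to
begin. It assumes the permissive set matches NCBI translation table 1, which
lists TTG and CTG as alternative initiators alongside ATG. The names
STANDARD_STARTS, STRICT_STARTS, is_start_codon, and first_start are
hypothetical and exist only for this example.

```python
# Start-codon sets for the comparison. STANDARD_STARTS follows NCBI
# translation table 1 (ATG plus alternative initiators TTG and CTG);
# STRICT_STARTS allows only ATG. Both names are illustrative assumptions,
# not identifiers from MAKER or BioPerl.
STANDARD_STARTS = {"ATG", "TTG", "CTG"}
STRICT_STARTS = {"ATG"}


def is_start_codon(codon, starts):
    """Return True if the DNA codon (any case) is in the given start set."""
    return codon.upper() in starts


def first_start(seq, starts, frame=0):
    """Return the 0-based offset of the first in-frame start codon, or None."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in starts:
            return i
    return None


# Toy transcript: CTG AAA ATG GCC
toy = "CTGAAAATGGCC"
permissive_offset = first_start(toy, STANDARD_STARTS)
strict_offset = first_start(toy, STRICT_STARTS)
```

With the permissive set the scan accepts the leading CTG as a start, so the
resulting peptide would not begin at an ATG; with the strict set the scan
moves on to the ATG at offset 6.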
>
>
> Current versions of MAKER (2.31+) export a 'strict' canonical codon
> table to BioPerl (overriding the default table with alternate starts).
> This forces start locations identified on extended transcripts to be
> only 'M'. You can rerun your annotations on a current version of MAKER, or
> just pass in your previous transcripts via GFF3 to have it recalculate the
> ORF, if you have an odd number of alternate starts from a previous version
> of MAKER when you used the always_complete=1 option.
>
> --Carson
>
>
> On 10/2/14, 1:28 PM, "Andrew Farmer" <adf at ncgr.org> wrote:
>
>> Hi all-
>> several months ago, our group used MAKER-P (version 2.30) to annotate
>> some draft genome assemblies,
>> and have since been working a bit more closely evaluating the predicted
>> gene models in an effort to get them
>> ready for public release.  One of the things that we recently noticed
>> during this process is that a considerable proportion
>> (~10%) of the peptides predicted do not begin with start codons.
>> Initially, my guess was that this was simply due
>> to assembly gaps causing truncations (and this may be a partial
>> explanation) but I was surprised to see many of
>> them with 5' UTRs reported. About half of the proteins beginning without
>> a start codon report a 5' UTR of length 0,
>> while the rest have 5' UTR lengths ranging from a few bp
>> to several kb.
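
A check like the one described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: it reads a protein
FASTA as plain text, takes the first whitespace-delimited token of each header
as the ID, and flags any peptide whose first residue is not M. The function
names read_fasta and non_m_starts, and the toy records, are hypothetical.

```python
def read_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Emit the previous record before starting a new one.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def non_m_starts(records):
    """Return IDs of peptides that do not begin with methionine (M)."""
    return [name for name, seq in records if not seq.upper().startswith("M")]


# Toy input standing in for a MAKER protein FASTA (made-up records).
toy_lines = [">gene1-mRNA-1 example", "MKVLA", "GT", ">gene2-mRNA-1", "LKVAA"]
records = list(read_fasta(toy_lines))
flagged = non_m_starts(records)
fraction = len(flagged) / len(records)
```

For a real file, the list of lines can be replaced by an open file handle,
since read_fasta only needs an iterable of lines. Cross-referencing the
flagged IDs against the 5' UTR lengths in the GFF3 would then separate the
UTR-bearing cases from the zero-length ones.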
>>
>> Having dug in a little deeper on the supporting evidence for one
>> example, one plausible explanation seems
>> to be that the choice of CDS start has been influenced by an outlier in
>> the protein alignments (i.e., one protein whose
>> alignment start extends a little further upstream than all of the
>> others, which ). Before I spend more time trying
>> to reverse engineer the diagnosis of other examples, it seemed worth
>> sending the list a message to see if this
>> seems plausible, or maybe there is a simpler explanation for it that
>> I've overlooked. I can send more specific
>> details on my example case if it would be helpful.
>>
>> thanks in advance for your insights/suggestions
>>
>> Andrew Farmer
>>
>> -- 
>> ...all concepts in which an entire process is semiotically concentrated
>> elude definition; only that which has no history is definable.
>>
>> Friedrich Nietzsche
