[maker-devel] Question about MAKER chunks and "neighbouring" annotations

Tue May 31 19:57:10 MDT 2016

Hello,

Thank you very much for your detailed answer! It looks like I had
misinterpreted some details of the program, so this is very helpful.

Cheers
Philipp

On 31.05.2016 23:51, Carson Holt wrote:
> Annotations never actually cross a chunk boundary because the boundaries are not fixed. It’s much more complicated than that, but basically we know from the alignment scoring model the maximum distance at which an HSP can occur and still be included in the alignment. This means that I know precisely whether there is a chance that an alignment may include another part when it occurs near the edge of a blasted sequence. When there is a chance, the sequence gets extended and everything is realigned (de novo) using the extended sequence, which can include an entire neighboring chunk. This is a very fast operation since it’s just the known hits being aligned rather than the whole database. So think of it more like a dynamic window than a fixed boundary. Results are then sorted and serialized to disk. Also, the initial BLAST is done with very permissive parameters and overlapping sequence boundaries, so extremely low-scoring partial alignments are enough to trigger an extension and realignment (we know beforehand the minimum sequence length needed to generate a given alignment score and can extrapolate the maximum theoretical score given a yet-to-be-generated extension).
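
The edge test described above can be sketched in a few lines of Python. This is only an illustration of the idea, not MAKER's actual code: the function name, the 0-based half-open coordinates, and the single `max_gap` value (standing in for the maximum HSP distance derived from the scoring model) are all assumptions made for the example.

```python
def needs_extension(hit_start, hit_end, win_start, win_end, max_gap, seq_len):
    """Return (extend_left, extend_right) for one hit inside a sequence window.

    Hypothetical helper: a hit closer than max_gap to a window edge might
    continue past that edge, so the window would be extended on that side,
    unless the edge is already the end of the contig.
    """
    extend_left = win_start > 0 and (hit_start - win_start) < max_gap
    extend_right = win_end < seq_len and (win_end - hit_end) < max_gap
    return extend_left, extend_right


# A hit ending 500 bp before the right edge of the first 100 kb window
# (illustrative numbers only).
print(needs_extension(95_000, 99_500, 0, 100_000, 10_000, 300_000))
```

A hit in the middle of a window triggers nothing, so only windows with hits near an edge pay the cost of realignment.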
>
> The serialized alignments then get clustered across the entire length of the contig (not just within a chunk), and clusters are annotated one at a time. Think of it like a linear walk down the contig through the serialized features, clustering as you go. Every time alignments stop being added to a cluster and that cluster ends, it can be annotated as a self-contained unit. This is why shared storage is required for MAKER. So MAKER never joins the genes, as they were never called in a way where they could be split in the first place.
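
The linear walk described above might look roughly like the following. It is a minimal sketch under simplifying assumptions, not MAKER's implementation: features are plain `(start, end)` tuples already sorted by start, and a cluster closes as soon as the next feature starts beyond the cluster's current end. The name `cluster_walk` is hypothetical.

```python
def cluster_walk(features):
    """Yield clusters of overlapping (start, end) features.

    Assumes `features` is sorted by start coordinate. Each cluster is
    yielded as soon as it is known to be complete, so it can be handled
    as a self-contained unit before the walk continues.
    """
    cluster, cluster_end = [], None
    for start, end in features:
        if cluster and start > cluster_end:
            yield cluster  # nothing further can join this cluster
            cluster, cluster_end = [], None
        cluster.append((start, end))
        cluster_end = end if cluster_end is None else max(cluster_end, end)
    if cluster:
        yield cluster  # last cluster on the contig


for c in cluster_walk([(1, 100), (50, 200), (500, 600)]):
    print(c)
```

Because the walk runs over the whole contig, where the original chunks began and ended plays no role in which features end up together.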
>
> The split_hit parameter affects clustering as well as the alignment model for how far away an HSP can be and still be considered part of the same alignment (long unpolished alignments with gaps longer than this will be broken into two separate alignments). pred_flank also affects clustering slightly, but its primary effect is the generation of flanking sequence around current cluster boundaries (clusters include all alignments as well as ab initio predictions, so the flank is added to those existing boundaries).
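
As a rough illustration of the two parameters, the sketch below splits a set of HSPs wherever the gap between neighbours exceeds `split_hit`, and pads a cluster by `pred_flank` on each side. Both helpers and the numbers used are hypothetical and only mimic the behaviour described above; MAKER's own logic is more involved.

```python
def split_by_gap(hsps, split_hit):
    """Group (start, end) HSPs into alignments; a gap > split_hit starts a new one.

    Hypothetical helper; assumes HSPs do not nest inside one another.
    """
    groups = []
    for start, end in sorted(hsps):
        if groups and start - groups[-1][-1][1] <= split_hit:
            groups[-1].append((start, end))
        else:
            groups.append([(start, end)])
    return groups


def add_flank(start, end, pred_flank, seq_len):
    """Extend cluster boundaries by pred_flank, clipped to the contig ends."""
    return max(0, start - pred_flank), min(seq_len, end + pred_flank)


# Illustrative values: the third HSP lies about 25 kb from the second,
# which is further than the split_hit value used here.
print(split_by_gap([(100, 200), (5000, 5100), (30000, 30100)], split_hit=10000))
print(add_flank(150, 900, pred_flank=200, seq_len=1000))
```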
>
> The reason you may get models without a start or stop codon is that the HMMs in predictors like SNAP and AUGUSTUS pick the highest-likelihood path regardless, not because of a chunk split. Also, all ab initio calls are part of the cluster, so a cluster is never trimmed in a way that puts a cluster boundary part way across one of those models.
>
> —Carson
>  
>> On May 31, 2016, at 12:37 AM, Philipp Bayer <philipp.bayer at uwa.edu.au> wrote:
>>
>> Hello,
>>
>> I have a minor question about the way MAKER joins annotations from
>> different chunks when using MPI.
>>
>> Let's say I have a longer gene that bridges two chunks, so the jobs
>> annotating both chunks separately would return two incomplete genes, one
>> without a stop codon, one without a start codon. I assume MAKER would
>> then join those two into a single gene, right? Is this behaviour
>> influenced by the "split_hit" or "pred_flank" parameters in maker_opts.ctl?
>>
>> Thank you
>>
>> Philipp Bayer