[maker-devel] MAKER annotation result contains 10% of genes with incorrect start or stop codons
Carson Holt
carsonhh at gmail.com
Fri Mar 26 09:59:55 MDT 2021
Thanks for this. Here is some background on how MAKER sets the start and stop codons. MAKER keeps the reading frame and codons chosen by the gene predictor, so if a longer ORF exists in another reading frame, MAKER will not switch to it (the gene predictor is effectively saying that the other reading frame is less probable). MAKER can extend the ORF within the same reading frame, but only if the initial prediction is partial, mRNA alignments suggest an extension, and the ORF extends to a canonical start in the same frame as the prediction. MAKER can also truncate the ORF if the gene prediction is partial to begin with and there is mRNA evidence of UTR (this can indicate an assembly error within the gene that artificially splits the ORF, or a false merge of neighboring genes caused by a bad mRNA-seq assembly).

The always_complete option adds one extra step that is not necessarily biologically correct, but does help make gene models more canonical. When it is set, MAKER will walk off both edges of the CDS without requiring mRNA evidence and extend to the first canonical start or canonical stop codon that it encounters. This is beneficial when using protein2genome alignments for homology-based annotation, since those alignments can be fuzzy near the edges.
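To make the always_complete walk concrete, here is a minimal Python sketch of the idea as described above. It is not MAKER's implementation (MAKER itself is written in Perl), and `extend_to_complete` is a hypothetical name. It assumes a single contiguous forward-strand sequence such as a spliced transcript (no introns, no reverse strand), 0-based half-open CDS coordinates, ATG as the only canonical start, and TAA/TAG/TGA as stops. It also assumes that an in-frame stop met while walking upstream ends the search, since an ORF cannot extend through a stop.

```python
from typing import Tuple

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def extend_to_complete(seq: str, start: int, end: int) -> Tuple[int, int]:
    """Extend a CDS [start, end) in its own frame to the nearest start and stop.

    Hypothetical helper: seq is a contiguous forward-strand sequence,
    coordinates are 0-based half-open, and end - start must be a multiple of 3.
    """
    seq = seq.upper()
    if (end - start) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")

    # 5' side: if the first codon is not ATG, walk upstream one codon at a time.
    if seq[start:start + 3] != START:
        pos = start - 3
        while pos >= 0:
            codon = seq[pos:pos + 3]
            if codon in STOPS:
                # Assumption: an in-frame stop blocks extension, keep the 5' end.
                break
            if codon == START:
                start = pos
                break
            pos -= 3

    # 3' side: if the last codon is not a stop, walk downstream until one is found.
    if end - start < 3 or seq[end - 3:end] not in STOPS:
        pos = end
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOPS:
                end = pos + 3  # include the stop codon in the CDS
                break
            pos += 3

    return start, end
```

For example, `extend_to_complete("CCCATGGCCAAATAAGG", 6, 12)` extends the partial CDS `GCCAAA` to `(3, 15)`, which covers `ATGGCCAAATAA`.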
Thanks,
Carson
> On Mar 25, 2021, at 3:03 PM, Jacques Dainat <jacques.dainat at nbis.se> wrote:
>
> Hi,
>
> I have run into this problem in some projects where the ORFs were not well defined. The ORF chosen in the mRNA was not the longest one, which is not necessarily wrong, but here it was obviously not the correct one. This was probably due to poor training of my ab initio tools.
> I ended up developing a script that fixes the predictions by using the longest ORF as the CDS. The script is called agat_sp_fix_longest_ORF.pl <https://github.com/NBISweden/AGAT/blob/master/bin/agat_sp_fix_longest_ORF.pl>
> It is available within AGAT (https://github.com/NBISweden/AGAT)
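The longest-ORF idea can be sketched in a few lines of Python. This is an illustration of the general technique, not the code of agat_sp_fix_longest_ORF.pl; the function name `longest_orf` is hypothetical, and the sketch assumes a transcript sequence already oriented 5' to 3' (only the three forward frames are scanned), ATG as the start codon and TAA/TAG/TGA as stops. Unlike MAKER's behavior described above, it is free to pick an ORF in a different frame from the original prediction.

```python
from typing import Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return 0-based half-open (start, end) of the longest ATG..stop ORF.

    Hypothetical helper: only the three forward frames of seq are scanned,
    the stop codon is included, and None is returned if no complete ORF exists.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        orf_start = None  # first ATG seen since the last stop in this frame
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if orf_start is None and codon == "ATG":
                orf_start = pos
            elif orf_start is not None and codon in STOPS:
                # Close the open ORF and keep it if it beats the best so far.
                if best is None or pos + 3 - orf_start > best[1] - best[0]:
                    best = (orf_start, pos + 3)
                orf_start = None
    return best
```

Mapping the returned transcript coordinates back onto exons and rewriting the GFF3 is left out of this sketch.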
>
> Hope it helps,
>
> Best regards,
>
> Jacques Dainat, Ph.D.
>
>
>> On 8 Mar 2021, at 11:25, 廖家緯 <jwliao25 at gmail.com> wrote:
>>
>> Hi maker-devel group,
>>
>> I used MAKER with SNAP and Augustus to annotate a green alga genome. I have set 'always_complete=1' since the first round of annotation.
>>
>> After Augustus training, I still get around 1111 and 208 genes that lack a correct start codon and a correct stop codon, respectively (the total number of annotated genes is 12696).
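A quick way to reproduce counts like these is to check each annotated CDS for a leading ATG and a terminal stop codon. The Python sketch below assumes the spliced CDS sequences have already been extracted from the MAKER GFF3 and the genome into a dictionary keyed by gene or mRNA ID; the function names are hypothetical. With the figures above, and assuming no gene lacks both codons, (1111 + 208) / 12696 is about 10.4% of the annotated genes.

```python
from typing import Dict, List, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def codon_report(cds: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (IDs lacking an ATG start, IDs lacking a terminal stop codon).

    Hypothetical helper: cds maps a gene/mRNA ID to its spliced CDS sequence,
    written 5' to 3' and including the stop codon if one was annotated.
    """
    no_start, no_stop = [], []
    for gene_id, seq in cds.items():
        seq = seq.upper()
        if not seq.startswith("ATG"):
            no_start.append(gene_id)
        if len(seq) < 3 or seq[-3:] not in STOPS:
            no_stop.append(gene_id)
    return no_start, no_stop


def percent_affected(cds: Dict[str, str]) -> float:
    """Percentage of entries lacking a start codon, a stop codon, or both."""
    no_start, no_stop = codon_report(cds)
    affected = set(no_start) | set(no_stop)
    return 100.0 * len(affected) / len(cds) if cds else 0.0
```

Listing the IDs, rather than only counting them, makes it easy to inspect the affected models in a genome browser.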
>>
>> I provided the proteome of a closely related species and the alga's own RNA-seq data as EST hints.
>>
>> Does this make sense? Is there any way to improve the result or to fix the incorrect start and stop codons of those genes?
>>
>> Best,
>> Chia-Wei Liao
>>
>> --------------------------------------------------
>>
>> Chia-Wei Liao 廖家緯
>>
>> Research Assistant
>>
>> Institute of Molecular Biology,
>>
>> Academia Sinica, Taipei City, Taiwan
>>
>> Phone: 886-2-2789-9216 (Lab)
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at yandell-lab.org
>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org