[maker-devel] gene annotation for a better genome

Carson Holt carsonhh at gmail.com
Tue Jan 30 09:54:05 MST 2018


You can set both simultaneously. est2genome will almost always be picked first, since the transcript alignment will match better than the protein alignment (i.e. it also matches at the UTRs).

—Carson
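
A minimal sketch of turning both options on in one control file, assuming the usual `key=value #comment` layout of maker_opts.ctl. The helper name `set_ctl_options` and the sample text are hypothetical and only for illustration; est_forward is appended because, as noted further down the thread, it is not in the default file.

```python
def set_ctl_options(ctl_text, updates):
    """Return ctl_text with each key in `updates` set to its new value.

    Lines are expected to look like `key=value #comment`; trailing
    comments are preserved. Keys that are not present are appended.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        body, sep, comment = line.partition("#")
        key, eq, _old_value = body.partition("=")
        key = key.strip()
        if eq and key in remaining:
            new_line = f"{key}={remaining.pop(key)}"
            if sep:
                new_line += f" #{comment}"
            out.append(new_line)
        else:
            out.append(line)
    # Options missing from the file (e.g. est_forward) go at the end.
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Hypothetical excerpt of a control file, for illustration only.
example_ctl = (
    "#-----Gene Prediction\n"
    "est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no\n"
    "protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no\n"
)

updated = set_ctl_options(
    example_ctl,
    {"est2genome": 1, "protein2genome": 1, "est_forward": 1},
)
print(updated)
```

The edit is done on text rather than on a real file so the sketch stays self-contained; in practice the same function could be applied to the contents of an existing control file.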


> On Jan 29, 2018, at 12:57 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> Dear Carson:
> 
> Thank you for your reply. Do you mean setting est2genome=1 and protein2genome=1 in one round, or doing the two mappings in separate rounds?
> 
> So I will provide the gff files obtained by mapping the transcripts and proteins to "pred_gff". Besides the gff from that mapping, I am also considering providing a gff file obtained from a regular de novo annotation by maker2, and then updating the gene models from those gff files.
> 
> Here is why I am considering this. Suppose that at location 1 there is a gene model gA obtained by mapping transcripts and proteins. If I then try to update those gene models in the second round of maker, maker cannot change the internal exons of gA (so it cannot replace it). However, if I provide both the gff from mapping transcripts and the gff from the maker de novo annotation, and another gene model gA' (from the de novo annotation) was predicted by maker at the same location, maker will compare gA and gA' and select the one with the higher score, right? This way we can replace a mapped gene model with a model predicted by maker if the predicted one has stronger evidence support. Right?
> 
> Thank you.
> 
> Best
> Quanwei
> 
>  
> 
> 2018-01-29 13:23 GMT-05:00 Carson Holt <carsonhh at gmail.com>:
> You can set both est2genome=1 and protein2genome=1. You can also set est_forward=1 to get the names from the old models (you have to add it as it’s not already there). If you want to try to force an alignment to a specific location, you can also add maker_coor=chr2:1-3000 to the fasta header comment line to have maker only allow alignments within a specific region (chr2:1-3000 in the example).
> 
> —Carson
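
A minimal sketch of tagging FASTA headers with maker_coor, assuming the tag can simply be appended to the comment part of the header (the text after the sequence ID). The function name `add_maker_coor` and the sequence IDs are hypothetical; the exact header placement MAKER expects should be checked against its documentation.

```python
def add_maker_coor(fasta_text, regions):
    """Append maker_coor=<region> to the header of each listed sequence.

    `regions` maps a sequence ID (the first word after '>') to a region
    string such as 'chr2:1-3000'. Records not listed are left unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            seq_id = fields[0] if fields else ""
            if seq_id in regions:
                line = f"{line.rstrip()} maker_coor={regions[seq_id]}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical transcripts; only the first one is pinned to a region.
fasta = ">geneA_mRNA old model\nATGGCC\n>geneB_mRNA\nATGTTT\n"
tagged = add_maker_coor(fasta, {"geneA_mRNA": "chr2:1-3000"})
print(tagged)
```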
> 
> 
>> On Jan 26, 2018, at 4:16 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>> 
>> Hi Carson:
>> 
>> Thank you for your previous suggestions. I have done the annotation as you suggested. I first mapped the transcripts from the old assembly to the new assembly by setting "est2genome=1", and then updated the models with new predictions. 
>> 
>> Besides mapping with "est2genome=1", do you think it is a good idea to do a separate mapping with the proteins of the old assembly (setting "protein2genome=1")? I would then provide both mapping GFF files (i.e., the GFF from mapping transcripts and the GFF from mapping proteins, separately) and update them with new predictions and evidence support. I am trying this because I found that certain genes were not mapped to the new assembly by their transcripts, but they can be mapped by protein orthologs. 
>> 
>> Thank you.
>> 
>> Best
>> Quanwei 
>> 
>> 2017-10-24 18:26 GMT-04:00 Carson Holt <carsonhh at gmail.com>:
>> Yes. If you use est2genome it will just align the model, and then find the longest ORF. So it is a quick way to just align old models to the new assembly. Alternatively you can just do de novo annotation.
>> 
>> —Carson
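
To make the "find the longest ORF" step concrete, here is a generic forward-strand sketch. It is not MAKER's implementation, which may treat partial ORFs and strands differently; the function name `longest_orf` and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    Returns None if no complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


orf = longest_orf("GGATGAAACCCTAGTT")
print(orf)
```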
>> 
>> 
>> 
>>> On Oct 24, 2017, at 10:54 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>>> 
>>> Dear Carson:
>>> 
>>> Thank you again for your suggestions. I just got the new genome assembly of NMR and have started the gene annotation. I understand your ideas about this. But can I simply use the old genome transcripts as transcript evidence and just follow the standard Maker2 pipeline? I would set est2genome=1 and provide the mRNA sequences in fasta format for the first round of SNAP training.
>>> 
>>> For transcripts I have the following choices. I think the first choice is more reliable and better, right?
>>> (1) There are about 60,000 RefSeq transcripts from NCBI. So I downloaded those sequences in fasta format.
>>> (2) We have raw RNA-seq data from 11 tissues; we can assemble each sample with Trinity to get the transcripts. But I think most of the RNA-seq data should already have been submitted to NCBI. 
>>> 
>>> BTW, if we use the RefSeq data from NCBI, we can download the mRNA sequences, coding sequences or protein sequences. I wonder which type of data is best for training SNAP? For Augustus, we will use BUSCO to train it. 
>>> 
>>> Many thanks. 
>>> 
>>> Best
>>> Quanwei
>>> 
>>> 
>>> 
>>> 
>>> 2017-09-29 12:36 GMT-04:00 Carson Holt <carsonhh at gmail.com>:
>>> You can try using the est2genome=1 option to map the old models forward onto the new assembly as if they were ESTs (add a line that says est_forward=1 to the control file to maintain old naming, and set est= to the old model transcript file). Then provide the final models as a pred_gff for a subsequent run (i.e. a traditional MAKER run where you are annotating the new assembly with transcript and protein evidence and ab initio predictors). Don’t supply the old models to est= on that run.
>>> 
>>> The idea behind doing it this way is:
>>> 1. You need to get old models onto the new assembly so coordinates will change. So by doing it this way, you will at least be able to move many models forward based on homology.
>>> 2. By providing the models to pred_gff on a subsequent MAKER run, you are just letting old models compete against new annotations. They will be rejected if they have no evidence support, or can be kept if they score better than alternate models from SNAP/Augustus. That way you have the chance to integrate old models while at the same time rejecting some old models that have no evidence overlap.
>>> 
>>> —Carson
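
The two-run workflow described above can be sketched as two sets of control-file options plus a sanity check for the second run. All file names and the species label are hypothetical placeholders, and only the options relevant to this workflow are listed, not a complete maker_opts.ctl.

```python
OLD_MODELS = "old_transcripts.fasta"  # hypothetical: transcripts of the old models

# Run 1: map the old models forward as if they were ESTs.
pass1 = {
    "genome": "new_assembly.fasta",   # hypothetical file name
    "est": OLD_MODELS,
    "est2genome": 1,
    "est_forward": 1,                 # added by hand to keep the old names
}

# Run 2: a traditional run; the old models compete only through pred_gff.
pass2 = {
    "genome": "new_assembly.fasta",
    "est": "rnaseq_transcripts.fasta",  # hypothetical new transcript evidence
    "protein": "proteins.fasta",        # hypothetical protein evidence
    "pred_gff": "pass1_models.gff",     # hypothetical models from run 1
    "snaphmm": "species.hmm",           # hypothetical SNAP model
    "augustus_species": "species",      # hypothetical Augustus species label
    "est2genome": 0,
}


def render_ctl(options):
    """Render options as key=value lines, one per line."""
    return "".join(f"{key}={value}\n" for key, value in options.items())


def check_second_pass(options, old_models):
    """Raise ValueError if the second-run options break the advice above."""
    est_files = [f.strip() for f in str(options.get("est", "")).split(",") if f.strip()]
    if old_models in est_files:
        raise ValueError("old models must not be supplied to est= on the second run")
    if not options.get("pred_gff"):
        raise ValueError("second run should pass the mapped models via pred_gff=")


check_second_pass(pass2, OLD_MODELS)
print(render_ctl(pass1))
print(render_ctl(pass2))
```

Keeping the check separate from the rendering makes it easy to reuse on options read from an existing control file.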
>>> 
>>> 
>>> > On Sep 28, 2017, at 6:05 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>>> >
>>> > Hello:
>>> >
>>> > Recently, we got a new version of the NMR genome; the previous version was assembled and annotated a few years ago. We can download the gene annotation from NCBI.
>>> >
>>> > Now we want to annotate the new genome using the Maker2 pipeline. I wonder how I can make full use of the existing annotations. On the other hand, since the previous genome is not very well assembled, some gene annotations may be false positives. I hope those false positive genes in the previous annotation won't mislead Maker2 in the current gene annotation.
>>> >
>>> > Do you have any suggestions? Thanks.
>>> >
>>> > Best
>>> > Quanwei