[maker-devel] gene annotation for a better genome

Carson Holt carsonhh at gmail.com
Mon Jan 29 11:23:06 MST 2018


You can set both est2genome=1 and protein2genome=1. You can also set est_forward=1 to keep the names from the old models (you have to add it to the control file yourself, as it’s not already there). If you want to try to force an alignment to a specific location, you can also add maker_coor=chr2:1-3000 to the FASTA header comment line so that MAKER only allows alignments within that region (chr2:1-3000 in the example).

—Carson
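
Editorial sketch of the maker_coor tip above, assuming the tag is simply appended to the description part of the FASTA header, as the message describes. The function name add_maker_coor and the example IDs are hypothetical and are not part of MAKER.

```python
def add_maker_coor(fasta_text, regions):
    """Append maker_coor=<region> to the header of each listed sequence.

    regions maps a sequence ID (the first word after '>') to a region
    string such as 'chr2:1-3000'. Headers that are not listed, or that
    already carry a maker_coor tag, are left unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if seq_id in regions and "maker_coor=" not in line:
                line = f"{line.rstrip()} maker_coor={regions[seq_id]}"
        out.append(line)
    # Keep a trailing newline only when there was any content at all.
    return "\n".join(out) + "\n" if out else ""
```

Calling add_maker_coor(text, {"geneA": "chr2:1-3000"}) on a file containing ">geneA old model" would rewrite that header to ">geneA old model maker_coor=chr2:1-3000" and leave every other record as it was.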


> On Jan 26, 2018, at 4:16 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> 
> Hi Carson:
> 
> Thank you for your previous suggestions. I have done the annotation accordingly: I first mapped the transcripts from the old assembly to the new assembly by setting "est2genome=1", and then updated the models with new predictions.
> 
> Besides mapping with "est2genome=1", do you think it is a good idea to do a separate mapping with the proteins of the old assembly (setting "protein2genome=1")? I would then provide both mapping GFF files (i.e., one from transcripts and one from proteins) and update them with new predictions and evidence support. I am asking because I found that certain genes were not mapped to the new assembly from their transcripts, but they can be mapped by protein orthologs.
> 
> Thank you.
> 
> Best
> Quanwei 
> 
> 2017-10-24 18:26 GMT-04:00 Carson Holt <carsonhh at gmail.com>:
> Yes. If you use est2genome it will just align the model, and then find the longest ORF. So it is a quick way to just align old models to the new assembly. Alternatively you can just do de novo annotation.
> 
> —Carson
> 
> 
> 
>> On Oct 24, 2017, at 10:54 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>> 
>> Dear Carson:
>> 
>> Thank you again for your suggestions. I have just received the new genome assembly of NMR and am starting the gene annotation. I understand your ideas about this. But can I simply use the old genome transcripts as transcript evidence and just follow the standard Maker2 pipeline? I set est2genome=1 and provide the mRNA sequences in FASTA format for the first round of SNAP training.
>> 
>> For transcripts I have the following choices. I think the first choice is more reliable and better, right?
>> (1) There are about 60,000 RefSeq transcripts from NCBI, so I downloaded those sequences in FASTA format.
>> (2) We have the raw RNA-seq data from 11 tissues; we can assemble each sample with Trinity and then get the transcripts. But I think most of the RNA-seq data should already have been submitted to NCBI.
>> 
>> BTW, if we use the RefSeq data from NCBI, we can download the mRNA sequences, coding sequences or protein sequences. I wonder which type of data is best for training SNAP? For Augustus, we will use BUSCO to train it.
>> 
>> Many thanks. 
>> 
>> Best
>> Quanwei
>> 
>> 
>> 
>> 
>> 2017-09-29 12:36 GMT-04:00 Carson Holt <carsonhh at gmail.com>:
>> You can try using the est2genome=1 option to map the old models forward onto the new assembly as if they were ESTs (add a line that says est_forward=1 to the control file to maintain old naming, and set est= to the old model transcript file). Then provide the final models as a pred_gff for a subsequent run (i.e. a traditional MAKER run where you are annotating the new assembly with transcript and protein evidence and ab initio predictors). Don’t supply the old models to est= on that run.
>> 
>> The idea behind doing it this way is:
>> 1. You need to get old models onto the new assembly so coordinates will change. So by doing it this way, you will at least be able to move many models forward based on homology.
>> 2. By providing the models to pred_gff on a subsequent MAKER run, you are just letting old models compete against new annotations. They will be rejected if they have no evidence support, or can be kept if they score better than alternate models from SNAP/Augustus. That way you have the chance to integrate old models while at the same time rejecting some old models that have no evidence overlap.
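
Editorial sketch of the two-run setup described above, written as a small helper that rewrites key=value lines in MAKER control-file text. The helper name set_ctl_options, the RUN1/RUN2 dictionaries and all file names are hypothetical placeholders; only the option names (est, est2genome, est_forward, pred_gff) come from the thread.

```python
def set_ctl_options(ctl_text, options):
    """Return control-file text with each key in options set to its value.

    Existing 'key=value #comment' lines are rewritten in place with the
    comment kept; keys that are absent (est_forward, for example) are
    appended at the end. Comment-only lines are never modified.
    """
    remaining = dict(options)
    out = []
    for line in ctl_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_setting = "=" in line and not line.lstrip().startswith("#")
        if is_setting and key in remaining:
            comment = ""
            if "#" in line:
                comment = " #" + line.split("#", 1)[1]
            line = f"{key}={remaining.pop(key)}{comment}"
        out.append(line)
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


# Run 1: map the old models forward as if they were ESTs, keeping old names.
RUN1 = {
    "est": "old_models.transcripts.fasta",  # placeholder file name
    "est2genome": 1,
    "est_forward": 1,  # not in the default control file, so it gets appended
}

# Run 2: ordinary annotation run; the mapped old models compete via pred_gff.
# est must NOT point at the old models here, only at independent evidence.
RUN2 = {
    "pred_gff": "run1_mapped_models.gff",   # placeholder file name
    "est": "new_transcript_evidence.fasta", # placeholder file name
}
```

Applying set_ctl_options to the text of a control file once with RUN1 and once with RUN2 gives one file per run, which keeps the "old models as est=" setting out of the second run by construction.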
>> 
>> —Carson
>> 
>> 
>> > On Sep 28, 2017, at 6:05 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>> >
>> > Hello:
>> >
>> > Recently, we obtained a new version of the NMR genome, which had previously been assembled and annotated a few years ago. We can download the existing gene annotation from NCBI.
>> >
>> > Now we want to annotate the new genome using the Maker2 pipeline. I wonder how I can make full use of the existing annotations. On the other hand, since the previous genome was not very well assembled, some gene annotations may be false positives. I hope those false positive genes in the previous annotation won't mislead Maker2 in the current gene annotation.
>> >
>> > Do you have any suggestions? Thanks.
>> >
>> > Best
>> > Quanwei
>> 
>> 
> 
> 
