[maker-devel] gene annotation for a better genome

Quanwei Zhang qwzhang0601 at gmail.com
Fri Jan 26 16:16:50 MST 2018


Hi Carson:

Thank you for your previous suggestions. I have done the annotation
accordingly: I first mapped the transcripts from the old assembly to the
new assembly by setting "est2genome=1", and then updated the models with
new predictions.
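
For readers following the thread, the two passes described above can be sketched as edits to a MAKER control file. This is only an illustration: the helper set_ctl_options, the abridged template lines, and all file names are assumptions made for the example. The option names (est, est2genome, est_forward, pred_gff) are the ones mentioned in this thread.

```python
import re


def set_ctl_options(ctl_text, options):
    """Return ctl_text with `key=value` lines updated, keeping trailing # comments."""
    remaining = dict(options)
    out = []
    for line in ctl_text.splitlines():
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            comment = m.group(3) or ""
            new_line = f"{key}={remaining.pop(key)}"
            if comment:
                new_line += " " + comment
            out.append(new_line)
        else:
            out.append(line)
    # Options missing from the template (e.g. est_forward) are appended.
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# A few lines in the style of maker_opts.ctl (abridged, comments shortened).
template = (
    "genome= #genome sequence\n"
    "est= #EST/transcript evidence in FASTA format\n"
    "est2genome=0 #infer gene models directly from ESTs, 1 = yes, 0 = no\n"
    "pred_gff= #gene predictions in GFF3 format\n"
)

# Pass 1: move the old models onto the new assembly as if they were ESTs.
pass1 = set_ctl_options(template, {
    "genome": "new_assembly.fa",    # hypothetical file name
    "est": "old_transcripts.fa",    # hypothetical file name
    "est2genome": 1,
    "est_forward": 1,               # added line, as suggested earlier in the thread
})

# Pass 2: an ordinary run where the mapped models only compete as predictions.
pass2 = set_ctl_options(template, {
    "genome": "new_assembly.fa",
    "est": "rnaseq_assembly.fa",    # hypothetical new transcript evidence
    "est2genome": 0,
    "pred_gff": "pass1_models.gff", # hypothetical output of pass 1
})

print(pass1)
print(pass2)
```

Protein evidence and the ab initio predictor settings for the second pass are left out of the sketch and would be set the same way.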

Besides mapping with "est2genome=1", do you think it is a good idea to do a
separate mapping with the proteins of the old assembly (setting
"protein2genome=1")? I would then provide both mapping GFF files (i.e., the
GFF from transcripts and the GFF from proteins, separately) and update them
with new predictions and evidence support. I am asking because I found that
certain genes were not mapped to the new assembly by their transcripts, but
they can be mapped by protein orthologs.
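
One way to list the genes that failed to map is to compare the IDs in the old transcript FASTA with the names that appear in the mapped GFF. The sketch below is a minimal version under assumptions: it assumes the mapped models appear as mRNA features whose Name attribute equals the FASTA ID (which may not hold for every MAKER output), and the IDs and coordinates are made up for the example.

```python
def fasta_ids(fasta_lines):
    """Collect sequence IDs (first word after '>') from FASTA header lines."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def gff_names(gff_lines, feature_type="mRNA"):
    """Collect Name= attribute values from GFF3 features of one type."""
    names = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        for field in cols[8].split(";"):
            key, _, value = field.partition("=")
            if key.strip() == "Name":
                names.add(value.strip())
    return names


def unmapped_ids(fasta_lines, gff_lines):
    """IDs present in the FASTA but absent from the mapped GFF, sorted."""
    return sorted(fasta_ids(fasta_lines) - gff_names(gff_lines))


# Made-up example data; real input would be read from the files on disk.
fasta = [">XM_0001 example transcript", "ATGAAATAG", ">XM_0002", "ATGCCCTAA"]
gff = [
    "##gff-version 3",
    "scaffold1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1;Name=g1",
    "scaffold1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=m1;Parent=g1;Name=XM_0001",
]

missing = unmapped_ids(fasta, gff)
print(missing)
```

The resulting list could then be used to select only the corresponding proteins for a separate protein2genome mapping.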

Thank you.

Best
Quanwei

2017-10-24 18:26 GMT-04:00 Carson Holt <carsonhh at gmail.com>:

> Yes. If you use est2genome it will just align the model, and then find the
> longest ORF. So it is a quick way to just align old models to the new
> assembly. Alternatively you can just do de novo annotation.
>
> —Carson
>
>
>
> On Oct 24, 2017, at 10:54 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>
> Dear Carson:
>
> Thank you again for your suggestions. I just got the new genome assembly
> of NMR and have started gene annotation. I understand your ideas about this.
> But can I simply use the old genome transcripts as transcript evidence and
> just follow the standard Maker2 pipeline? I would set est2genome=1 and
> provide the mRNA sequences in FASTA format for the first round of SNAP
> training.
>
> For transcripts I have the following choices. I think the first choice is
> more reliable and better, right?
> (1) There are about 60,000 RefSeq transcripts from NCBI. So I downloaded
> those sequences in fasta format.
> (2) We have raw RNA-seq data from 11 tissues; we can assemble each sample
> with Trinity to get the transcripts. But I think most of the RNA-seq data
> should already have been submitted to NCBI.
>
> BTW, if we use the RefSeq data from NCBI, we can download the mRNA
> sequences, coding sequences or protein sequences. Which type of data is
> best for training SNAP? For Augustus, we will use BUSCO to train it.
>
> Many thanks.
>
> Best
> Quanwei
>
>
>
>
> 2017-09-29 12:36 GMT-04:00 Carson Holt <carsonhh at gmail.com>:
>
>> You can try using the est2genome=1 option to map the old models forward
>> onto the new assembly as if they were ESTs (add a line that says
>> est_forward=1 to the control file to maintain old naming, and set est= to
>> the old model transcript file). Then provide the final models as a pred_gff
>> for a subsequent run (i.e. a traditional MAKER run where you are annotating
>> the new assembly with transcript and protein evidence and ab initio
>> predictors). Don’t supply the old models to est= on that run.
>>
>> The idea behind doing it this way is:
>> 1. You need to get the old models onto the new assembly, where their
>> coordinates will differ. By doing it this way, you will at least be able to
>> move many models forward based on homology.
>> 2. By providing the models to pred_gff on a subsequent MAKER run, you are
>> just letting old models compete against new annotations. They will be
>> rejected if they have no evidence support, or can be kept if they score
>> better than alternate models from SNAP/Augustus. That way you have the
>> chance to integrate old models while at the same time rejecting some old
>> models that have no evidence overlap.
>>
>> —Carson
>>
>>
>> > On Sep 28, 2017, at 6:05 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>> >
>> > Hello:
>> >
>> > Recently, we got a new version of the NMR genome; the previous version
>> > was assembled and annotated a few years ago. We can download its gene
>> > annotation from NCBI.
>> >
>> > Now we want to annotate the new genome using the Maker2 pipeline. I wonder
>> > how I can make full use of the existing annotations. On the other hand,
>> > since the previous genome was not very well assembled, some gene
>> > annotations may be false positives. I hope those false-positive genes in
>> > the previous annotation won't mislead Maker2 during the current gene
>> > annotation.
>> >
>> > Do you have any suggestions? Thanks.
>> >
>> > Best
>> > Quanwei