[maker-devel] gene annotation for a better genome

Tue Oct 24 10:54:13 MDT 2017

Dear Carson:

Thank you again for your suggestions. I just got the new genome assembly of
NMR and have started the gene annotation. I understand your ideas about this.
But can I simply use the transcripts from the old genome as transcript
evidence and just follow the standard Maker2 pipeline? I set est2genome=1 and
provide the mRNA sequences in FASTA format for the first round of SNAP training.
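
For reference, the first-round settings described above could be applied to a control file roughly as follows. This is only a minimal sketch: the helper set_ctl_options, the sample control-file text, and the file name refseq_transcripts.fasta are hypothetical and not part of MAKER; only the option names est and est2genome come from the control file itself.

```python
import re

# Matches "key=value #comment" lines as they appear in a MAKER control file.
CTL_LINE = re.compile(r"^(?P<key>\w+)=(?P<value>[^#]*)(?P<comment>#.*)?$")


def set_ctl_options(ctl_text, options):
    """Return ctl_text with the given key=value options applied.

    Existing keys are overwritten in place and keep their trailing comment;
    keys that are not present yet are appended at the end.
    """
    remaining = dict(options)
    out = []
    for line in ctl_text.splitlines():
        match = CTL_LINE.match(line)
        if match and match.group("key") in remaining:
            key = match.group("key")
            new_line = f"{key}={remaining.pop(key)}"
            if match.group("comment"):
                new_line += " " + match.group("comment")
            out.append(new_line)
        else:
            out.append(line)
    # Options that were not found in the file are added as new lines.
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Hypothetical file name; replace with the real RefSeq transcript download.
ROUND1 = {"est": "refseq_transcripts.fasta", "est2genome": 1}

# Small stand-in for the relevant part of maker_opts.ctl.
SAMPLE_CTL = (
    "#-----EST Evidence\n"
    "est= #set of ESTs or assembled mRNA-seq in fasta format\n"
    "est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no\n"
)

if __name__ == "__main__":
    print(set_ctl_options(SAMPLE_CTL, ROUND1), end="")
```

In practice the text would be read from and written back to the real maker_opts.ctl instead of the sample string.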

For transcripts I have the following choices. I think the first choice is
more reliable and better, right?
(1) There are about 60,000 RefSeq transcripts from NCBI. I have downloaded
those sequences in FASTA format.
(2) We have raw RNA-seq data from 11 tissues; we can assemble each sample
with Trinity to get the transcripts. But I think most of the RNA-seq data
should already have been submitted to NCBI.

BTW, if we use the RefSeq data from NCBI, we can download the mRNA
sequences, coding sequences or protein sequences. I wonder which type of
data is best for training SNAP? For Augustus, we will use BUSCO to
train it.
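
For context on the SNAP training step: in the usual MAKER workflow SNAP is trained from the gene models of a finished MAKER run (here, the est2genome models) rather than directly from the FASTA files. Below is a minimal sketch of the commands for one training round, assuming the run has already been merged into a single GFF3 file (for example with gff3_merge). The function name, the file name nmr_round1.all.gff, and the HMM name nmr_round1 are hypothetical; the script only builds the command strings and does not run them.

```python
import shlex


def snap_training_commands(maker_gff, hmm_name, flank=1000):
    """Return the shell commands for one round of SNAP training.

    maker_gff: merged GFF3 (with sequence) from a finished MAKER run.
    hmm_name:  label used for the resulting SNAP HMM file.
    flank:     flanking sequence length passed to fathom.
    """
    gff = shlex.quote(maker_gff)
    name = shlex.quote(hmm_name)
    return [
        # Convert MAKER models to ZFF; writes genome.ann and genome.dna.
        f"maker2zff {gff}",
        # Sort models into categories; writes uni.ann and uni.dna, among others.
        f"fathom -categorize {flank} genome.ann genome.dna",
        # Export the unique models on the plus strand with flanking sequence.
        f"fathom -export {flank} -plus uni.ann uni.dna",
        # Estimate the model parameters from the exported set.
        "forge export.ann export.dna",
        # Assemble the parameters into a single HMM file.
        f"hmm-assembler.pl {name} . > {name}.hmm",
    ]


if __name__ == "__main__":
    # Hypothetical input and output names.
    for command in snap_training_commands("nmr_round1.all.gff", "nmr_round1"):
        print(command)
```

The resulting .hmm file would then be given to snaphmm= in the control file for the next MAKER run.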

Many thanks.

Best
Quanwei

2017-09-29 12:36 GMT-04:00 Carson Holt <carsonhh at gmail.com>:

> You can try using the est2genome=1 option to map the old models forward
> onto the new assembly as if they were ESTs (add a line that says
> est_forward=1 to the control file to maintain the old naming, and set est=
> to the old model transcript file). Then provide the final models as a
> pred_gff for a subsequent run (i.e. a traditional MAKER run where you are
> annotating the new assembly with transcript and protein evidence and ab
> initio predictors). Don’t supply the old models to est= on that run.
>
> The idea behind doing it this way is:
> 1. You need to get old models onto the new assembly so coordinates will
> change. So by doing it this way, you will at least be able to move many
> models forward based on homology.
> 2. By providing the models to pred_gff on a subsequent MAKER run, you are
> just letting old models compete against new annotations. They will be
> rejected if they have no evidence support, or can be kept if they score
> better than alternate models from SNAP/Augustus. That way you have the
> chance to integrate old models while at the same time rejecting some old
> models that have no evidence overlap.
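
The two runs described above can be summarised as two sets of control-file options. This is only a sketch under stated assumptions: every file name and the species label are hypothetical, est2genome=0 on the second run is an assumption (it is not stated in the reply), and check_second_run is an illustrative helper, not part of MAKER.

```python
# Hypothetical file name for the transcripts of the old annotation.
OLD_MODELS = "old_annotation_transcripts.fasta"

# Run 1: map the old models onto the new assembly as if they were ESTs.
RUN1 = {
    "genome": "new_assembly.fasta",
    "est": OLD_MODELS,
    "est2genome": 1,
    "est_forward": 1,  # line added by hand to the control file; keeps old names
}

# Run 2: a traditional run where the mapped models compete as predictions.
RUN2 = {
    "genome": "new_assembly.fasta",
    "pred_gff": "run1_models.gff",           # final models from run 1
    "est": "new_transcript_evidence.fasta",  # not the old models
    "protein": "protein_evidence.fasta",
    "snaphmm": "nmr.hmm",                    # hypothetical trained SNAP HMM
    "augustus_species": "nmr",               # hypothetical Augustus species
    "est2genome": 0,                         # assumption, see the note above
}


def check_second_run(run1, run2):
    """Return a list of problems with the second-run settings (empty if none)."""
    problems = []
    if run2.get("est") == run1.get("est"):
        problems.append("old models must not be supplied to est= on run 2")
    if not run2.get("pred_gff"):
        problems.append("run 2 needs the mapped models in pred_gff=")
    return problems


if __name__ == "__main__":
    for name, options in (("run 1", RUN1), ("run 2", RUN2)):
        print(f"# {name}")
        for key, value in options.items():
            print(f"{key}={value}")
    print(check_second_run(RUN1, RUN2))
```

The check only encodes the two points made in the reply: the old models go to pred_gff on the second run, and they are not reused as est= evidence there.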
>
> —Carson
>
>
> > On Sep 28, 2017, at 6:05 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> >
> > Hello:
> >
> > Recently, we got a new version of the NMR genome. An earlier version was
> > assembled and annotated a few years ago, and we can download its gene
> > annotation from NCBI.
> >
> > Now we want to annotate the new genome using the Maker2 pipeline. I wonder
> > how I can make full use of the existing annotations. On the other hand,
> > since the previous genome was not very well assembled, some gene
> > annotations may be false positives. I hope those false positive genes in
> > the previous annotation won't mislead Maker2 in the current gene annotation.
> >
> > Do you have any suggestions? Thanks.
> >
> > Best
> > Quanwei