[maker-devel] gene annotation for a better genome

Mon Jan 29 12:57:42 MST 2018

Dear Carson:

Thank you for your reply. Do you mean setting est2genome=1 and
protein2genome=1 in one round, or doing the two mappings in separate rounds?

So I will provide the gff files obtained by mapping the transcripts and
proteins to "pred_gff". Besides the gff from that mapping, I am also
considering providing a gff file obtained from a regular de novo annotation
by maker2, and then updating the gene models from those gff files.

Here is why I am considering this. Suppose that at location 1 there is a
gene model gA obtained by mapping transcripts and proteins. If I try to
update those gene models in the second round of maker, maker cannot change
the internal exons of gA (so it cannot replace it). However, if I provide
both the gff from mapping transcripts and the gff from maker's de novo
annotation, and another gene model gA' (from the de novo annotation) was
predicted by maker at the same location, maker will compare gA and gA' and
select the one with the higher score, right? This way a mapped gene model
can be replaced by a model predicted by maker if the predicted one has
stronger evidence support. Right?

Thank you.

Best
Quanwei

2018-01-29 13:23 GMT-05:00 Carson Holt <carsonhh at gmail.com>:

> You can set both est2genome=1 and protein2genome=1. You can also set
> est_forward=1 to get the names from the old models (you have to add it as
> it’s not already there). If you want to try and force an alignment to a
> specific location, you can also add maker_coor=chr2:1-3000 to the fasta
> header comment line to have maker only allow alignments within a specific
> region (chr2:1-3000 in the example).
>
> —Carson
>
>
> On Jan 26, 2018, at 4:16 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>
> Hi Carson:
>
> Thank you for your previous suggestions. I have done the annotation
> according to your suggestions. I first mapped the transcripts from the old
> assembly to the new assembly by setting "est2genome=1", and then updated
> the models with new predictions.
>
> Besides mapping with "est2genome=1", do you think it is a good idea to do a
> separate mapping with the proteins of the old assembly (setting
> "protein2genome=1")? I would then provide both mapping GFF files (i.e., the
> GFF from transcripts and the GFF from proteins, separately) and update them
> with new predictions and evidence support. I am trying this because I found
> that certain genes were not mapped to the new assembly, but they can be
> mapped by protein orthologs.
>
> Thank you.
>
> Best
> Quanwei
>
> 2017-10-24 18:26 GMT-04:00 Carson Holt <carsonhh at gmail.com>:
>
>> Yes. If you use est2genome it will just align the model, and then find
>> the longest ORF. So it is a quick way to just align old models to the new
>> assembly. Alternatively you can just do de novo annotation.
>>
>> —Carson
>>
>>
>>
>> On Oct 24, 2017, at 10:54 AM, Quanwei Zhang <qwzhang0601 at gmail.com>
>> wrote:
>>
>> Dear Carson:
>>
>> Thank you again for your suggestions. I just got the new genome assembly
>> of NMR and started to do gene annotation. I understand your ideas about
>> this. But can I simply use the old genome's transcripts as transcript
>> evidence and just follow the standard Maker2 pipeline? I would set
>> est2genome=1 and provide the mRNA sequences in fasta format for the first
>> round of SNAP training.
>>
>> For transcripts I have the following choices. I think the first choice is
>> more reliable and better, right?
>> (1) There are about 60,000 RefSeq transcripts from NCBI. So I downloaded
>> those sequences in fasta format.
>> (2) We have raw RNA-seq data from 11 tissues; we can assemble each
>> sample with Trinity and then get the transcripts. But I think most
>> of the RNA-seq data should already have been submitted to NCBI.
>>
>> BTW, if we use the RefSeq data from NCBI, we can download the mRNA
>> sequences, coding sequences or protein sequences. I wonder which type of
>> data is best for training SNAP? For Augustus, we will use BUSCO to
>> train it.
>>
>> Many thanks.
>>
>> Best
>> Quanwei
>>
>>
>>
>>
>> 2017-09-29 12:36 GMT-04:00 Carson Holt <carsonhh at gmail.com>:
>>
>>> You can try using the est2genome=1 option to map the old models forward
>>> onto the new assembly as if they were ESTs (add a line that says
>>> est_forward=1 to the control file to maintain old naming and set est= to
>>> the old model transcript file). Then provide the final models as a pred_gff
>>> for a subsequent run (i.e. a traditional MAKER run where you are annotating
>>> the new assembly with transcript and protein evidence and ab initio
>>> predictors). Don’t supply the old models to est= on that run.
>>>
>>> The idea behind doing it this way is:
>>> 1. You need to get old models onto the new assembly so coordinates will
>>> change. So by doing it this way, you will at least be able to move many
>>> models forward based on homology.
>>> 2. By providing the models to pred_gff on a subsequent MAKER run, you
>>> are just letting old models compete against new annotations. They will be
>>> rejected if they have no evidence support, or can be kept if they score
>>> better than alternate models from SNAP/Augustus. That way you have the
>>> chance to integrate old models while at the same time rejecting some old
>>> models that have no evidence overlap.
>>>
>>> —Carson
>>>
>>>
>>> > On Sep 28, 2017, at 6:05 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>>> >
>>> > Hello:
>>> >
>>> > Recently, we got a new version of the NMR genome, which had been
>>> > assembled and annotated a few years ago. We can download the gene
>>> > annotation from NCBI.
>>> >
>>> > Now we want to annotate the new genome using the Maker2 pipeline. I
>>> > wonder how I can make full use of the existing annotations. On the
>>> > other hand, since the previous genome is not very well assembled, some
>>> > gene annotations may be false positives. I hope those false positive
>>> > genes in the previous annotation won't mislead Maker2 in the current
>>> > gene annotation.
>>> >
>>> > Do you have any suggestions? Thanks
>>> >
>>> > Best
>>> > Quanwei
>>>
>>>
>>
>>
>
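The two-round approach discussed in this thread can be summarized as a pair of control-file fragments. The sketch below is only an illustration: the option names (est, protein, est2genome, protein2genome, est_forward, pred_gff) are the ones named in the messages or standard evidence inputs, while every file name and the render_ctl helper are hypothetical. Ab initio predictor settings for the second round are left out on purpose.

```python
def render_ctl(options):
    """Render options as key=value lines in the style of a MAKER control file."""
    return "\n".join(f"{key}={value}" for key, value in options.items())


# Round 1: map the old models forward onto the new assembly.
round1 = {
    "est": "old_transcripts.fasta",   # hypothetical file name
    "protein": "old_proteins.fasta",  # hypothetical file name
    "est2genome": 1,
    "protein2genome": 1,
    "est_forward": 1,  # per the thread, this line has to be added by hand
}

# Round 2: a regular annotation run in which the mapped models compete
# with new predictions. The old models are NOT passed to est= here.
round2 = {
    "est": "new_transcript_evidence.fasta",  # hypothetical file name
    "protein": "protein_evidence.fasta",     # hypothetical file name
    "pred_gff": "round1_models.gff",         # hypothetical file name
    "est2genome": 0,
    "protein2genome": 0,
}

if __name__ == "__main__":
    print("# round 1")
    print(render_ctl(round1))
    print("# round 2")
    print(render_ctl(round2))
```

The printed lines are meant to be copied into the existing control file for each round, not to replace the whole file.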
>
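For the maker_coor tip in the reply of Jan 29, a small helper can append the tag to selected fasta headers. This is a minimal sketch that assumes the tag simply goes at the end of the header line, as the reply describes. The function name add_maker_coor and the example sequence IDs are hypothetical, and the exact header syntax should be checked against your MAKER version.

```python
def add_maker_coor(fasta_text, regions):
    """Append maker_coor=<region> to the headers of the listed sequences.

    regions maps a sequence ID (the first word after '>') to a region
    string such as 'chr2:1-3000'. Other records are left unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            # Skip headers that already carry a maker_coor tag.
            if seq_id in regions and "maker_coor=" not in line:
                line = f"{line} maker_coor={regions[seq_id]}"
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    example = ">gA old model\nACGT\n>gB\nTTTT"
    print(add_maker_coor(example, {"gA": "chr2:1-3000"}))
```

Only records listed in the mapping are changed, so the same transcript file can mix constrained and unconstrained sequences.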