[maker-devel] Missing genes in lift-over with est2genome
Lior Glick
liorglic at mail.tau.ac.il
Thu Apr 30 06:58:17 MDT 2020
Thanks Carson - your answer was very helpful.
I have another question about the lift-over process, if I may.
I want to take the resulting GFF and pass it on to another MAKER run, where
I provide further, lower-confidence evidence (ESTs and proteins). I'm not
sure which option to use, though. Following this helpful post
<https://computationalbiologysite.wordpress.com/2013/07/11/maker-gff-cite-online/>,
I tried using pred_gff and model_gff, but both produced fusion genes
where genes lie very close to one another (see attached picture),
even with the correct_est_fusion parameter enabled. It looks like the only
way to take lifted-over genes "as-is" would be to use other_gff, but I
suspect that option was not really intended for genes. Would you recommend
this usage? Am I missing something?
Thank you!
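
To make the fusion problem concrete, here is a minimal Python sketch of how the affected models could be listed: it flags any gene in the second-run GFF3 that overlaps two or more lifted-over genes on the same sequence and strand. The function names, gene IDs, and coordinates are hypothetical and only illustrate the check; the sketch reads plain GFF3 text and does not call MAKER.

```python
from collections import defaultdict


def parse_genes(gff_text):
    """Return {(seqid, strand): [(start, end, gene_id), ...]} for 'gene' features."""
    genes = defaultdict(list)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # only well-formed gene rows are of interest here
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        genes[(cols[0], cols[6])].append(
            (int(cols[3]), int(cols[4]), attrs.get("ID", "?"))
        )
    return genes


def find_fusions(lifted_gff, second_run_gff):
    """Map each second-run gene ID to the lifted genes it overlaps (2 or more only)."""
    lifted = parse_genes(lifted_gff)
    fusions = {}
    for key, models in parse_genes(second_run_gff).items():
        for start, end, gene_id in models:
            # Closed-interval overlap test on the same sequence and strand.
            hits = sorted(
                lid for ls, le, lid in lifted.get(key, [])
                if ls <= end and le >= start
            )
            if len(hits) >= 2:
                fusions[gene_id] = hits
    return fusions


# Hypothetical toy data: geneA and geneB are adjacent and end up merged.
LIFTED = "\n".join([
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=geneA",
    "chr1\tmaker\tgene\t950\t1800\t.\t+\t.\tID=geneB",
    "chr1\tmaker\tgene\t5000\t6000\t.\t-\t.\tID=geneC",
])
SECOND = "\n".join([
    "chr1\tmaker\tgene\t100\t1800\t.\t+\t.\tID=merged1",
    "chr1\tmaker\tgene\t5000\t6000\t.\t-\t.\tID=geneC_kept",
])

print(find_fusions(LIFTED, SECOND))
```

Run on real files, the same check would give a count of fused models for each option (pred_gff, model_gff, other_gff), which makes the comparison less dependent on browsing individual loci.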
On Thu, Apr 23, 2020 at 20:43, Carson Holt <
carsonhh at gmail.com> wrote:
> There are percent cutoffs for the est2genome algorithm you can set in the
> maker_bopts.ctl file. Additionally, maker will give the alignment but not
> produce a gene model if it can’t translate through the est2genome alignment
> (i.e. stop codons in the assembly). I believe the cutoff is 50%. If you add
> est_forward=1 to the maker_opts.ctl file, names will be copied from the
> alignment source and the score column of the GFF3 will be the percent match
> to the original transcript.
>
> —Carson
>
>
>
> > On Apr 21, 2020, at 7:08 AM, Lior Glick <liorglic at mail.tau.ac.il> wrote:
> >
> > Hello,
> > I am using MAKER to annotate a plant genome assembly. A high-quality
> > reference genome and annotation exist for another variety of the same
> > species, so my first step is lifting over reference genes to my genome. I
> > do this by setting est2genome=1 and providing MAKER with the reference
> > cDNA (transcriptome). No other evidence is provided and no prediction is
> > performed. Repeat masking is done using the reference repeats library.
> > When checking the results, I found that many reference genes are missing
> > from the lift-over result. However, if I BLAST the sequences of these genes
> > myself, I get good matches. I even see these matches when I look at the
> > BLAST results buried in the MAKER data_store.
> > For example, a transcript of length 1077 got a match of length 855 with
> > 100% identity and no gaps. The bitscore was 1709 and the E-value 0. This
> > looks like a pretty good match, but it is not found in the final MAKER
> > results (gff/fasta).
> > Why is this happening? Are there some cutoffs that are not satisfied? If
> > so, what are they and how can they be configured?
> >
> > Thanks,
> > Lior
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at yandell-lab.org
> > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
>
>
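
The percent cutoffs Carson mentions can be checked against the quoted example (a 1077 nt transcript with an 855 nt match at 100% identity) with a few lines of Python. This is only a sketch: the cutoff values 0.8 (coverage) and 0.85 (identity) are assumptions based on the pcov_blastn and pid_blastn settings as I recall them, and should be checked against your own maker_bopts.ctl. The simple "aligned length / transcript length" comparison is also an assumption and may not match MAKER's internal filtering exactly.

```python
def passes_cutoffs(aligned_len, transcript_len, identity, pcov_cutoff, pid_cutoff):
    """Return (coverage, passed) for a single alignment against fractional cutoffs."""
    coverage = aligned_len / transcript_len
    passed = coverage >= pcov_cutoff and identity >= pid_cutoff
    return coverage, passed


# Assumed cutoffs; confirm the actual values in your maker_bopts.ctl.
ASSUMED_PCOV = 0.8
ASSUMED_PID = 0.85

# Numbers from the quoted example: 855 of 1077 bases aligned at 100% identity.
coverage, ok = passes_cutoffs(855, 1077, 1.0, ASSUMED_PCOV, ASSUMED_PID)
print(f"coverage={coverage:.3f} passes={ok}")
```

Under these assumed values the example alignment covers about 79.4% of the transcript, just under an 80% coverage cutoff, which would be one possible explanation for why a match that looks good in BLAST is absent from the final gene models.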
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fusion.png
Type: image/png
Size: 33185 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20200430/a53d513e/attachment-0004.png>