[maker-devel] Maker protein match & tandem similar genes

Tim Fallon tfallon at mit.edu
Tue Jun 13 11:35:28 MDT 2017


Hi there,

I am aligning reference proteins to an insect genome through Maker, in preparation for using the gene models from the protein alignments as evidence to train SNAP (alongside de-novo assembled RNA-Seq).  I also plan on passing the protein alignments to a future Maker run as hints for SNAP / Augustus.

I’ve noticed that the Maker blastx “protein_match” feature tends to fuse tandem genes from the same gene family.  I presume this feature is the result of Maker joining the blastx HSPs into one contiguous region to use as a reference for exonerate (this Maker run did have protein2genome turned on).  See attached image.

The red regions highlight two de novo assembled transcripts which I aligned manually, from two genes that are homologous.  The top track is the blastx “match_part” features, the bottom track is the blastx “protein_match” features.  You can see that the protein_match fuses the two genes, spanning ~1000 bp of intervening region that has no blastx HSP support in the “match_part” track.  The cause seems to be that a single reference protein has blastx matches on both the left and right gene.
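To make the pattern above concrete, here is a minimal sketch of how such matches could be flagged in the GFF3 output: it groups blastx “match_part” features by their Parent and reports any parent whose consecutive parts are separated by a long unsupported stretch. The function name `find_gapped_matches` and the `max_gap` threshold of 500 bp are hypothetical choices for illustration, and the sketch assumes the match_part lines carry a standard GFF3 `Parent=` attribute and “blastx” in the source column.

```python
from collections import defaultdict


def parse_attrs(col9):
    """Turn a GFF3 attribute column into a dict of key -> value."""
    return dict(kv.split("=", 1) for kv in col9.strip().split(";") if "=" in kv)


def find_gapped_matches(gff_lines, max_gap=500, source="blastx"):
    """Return {parent_id: [(gap_start, gap_end), ...]} for matches whose
    consecutive match_part features lie more than max_gap bp apart.

    max_gap is an arbitrary illustrative threshold, not a Maker setting.
    """
    parts = defaultdict(list)
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        # Skip comments, FASTA lines, and anything that is not a blastx match_part.
        if len(cols) != 9 or cols[1] != source or cols[2] != "match_part":
            continue
        parent = parse_attrs(cols[8]).get("Parent")
        if parent:
            parts[parent].append((int(cols[3]), int(cols[4])))

    flagged = {}
    for parent, spans in parts.items():
        spans.sort()  # order the HSPs by genomic start
        gaps = [
            (a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(spans, spans[1:])
            if b_start - a_end - 1 > max_gap
        ]
        if gaps:
            flagged[parent] = gaps
    return flagged
```

Because real introns of ~1000 bp would also exceed this threshold, anything flagged this way is only a candidate for manual inspection, not a confirmed fusion.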

Clearly this isn’t a good gene model to train SNAP with, but would this misannotation also screw up the hints passed to pretrained SNAP / Augustus?

Is there any way to prevent this protein_match fusing of adjacent similar genes from happening?  For more closely related species, I’ve set “eval_blastx” to be a lot more stringent (1e-50), and in that case the genes don’t get fused (but at that level of stringency it is more like an orthology search than a general annotation of protein similarity).  I do have (rare) introns of ~1000 bp, so I wouldn’t want to set the Maker “split_hit” parameter too low.
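One heuristic that would not depend on intron length is to look at the protein coordinates of the HSPs rather than the genomic gaps: if a single reference protein hits two tandem genes, the alignment should walk through the protein once and then jump back towards its start. The sketch below flags such matches after the fact; it is not a Maker feature and does not stop Maker from building the fused match. It assumes the match_part lines carry a GFF3 `Target=<id> <start> <end>` attribute with literal spaces, and the names `find_restarting_matches` and `min_backstep` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed attribute layout: Target=<protein_id> <start> <end> [strand]
TARGET_RE = re.compile(r"Target=(\S+) (\d+) (\d+)")
PARENT_RE = re.compile(r"Parent=([^;]+)")


def find_restarting_matches(gff_lines, min_backstep=20, source="blastx"):
    """Return parent IDs whose HSPs, walked in the direction of the hit,
    jump back in the protein by more than min_backstep residues."""
    parts = defaultdict(list)
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[1] != source or cols[2] != "match_part":
            continue
        parent = PARENT_RE.search(cols[8])
        target = TARGET_RE.search(cols[8])
        if parent and target:
            parts[parent.group(1)].append(
                (int(cols[3]), cols[6], int(target.group(2)), int(target.group(3)))
            )

    flagged = []
    for parent, hsps in parts.items():
        # Walk 5'->3' along the hit: ascending on '+', descending on '-'.
        hsps.sort(reverse=(hsps[0][1] == "-"))
        furthest = 0  # furthest protein position covered so far
        for _, _, t_start, t_end in hsps:
            if t_start < furthest - min_backstep:
                flagged.append(parent)  # protein alignment restarted
                break
            furthest = max(furthest, t_end)
    return flagged
```

The small `min_backstep` tolerance is there because neighbouring HSPs within one gene can overlap slightly in protein coordinates; the value of 20 residues is a guess and would need tuning against known cases.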

All the best,
-Tim

Timothy R. Fallon
PhD candidate
Laboratory of Jing-Ke Weng
Department of Biology
MIT

tfallon at mit.edu



-------------- next part --------------
A non-text attachment was scrubbed...
Name: protein_match_example.png
Type: image/png
Size: 142379 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20170613/d2617936/attachment-0002.png>