<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style>
<!--
@font-face
{font-family:Helvetica}
@font-face
{font-family:Helvetica}
@font-face
{font-family:Calibri}
@font-face
{font-family:Tahoma}
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif"}
a:link, span.MsoHyperlink
{color:blue;
text-decoration:underline}
a:visited, span.MsoHyperlinkFollowed
{color:purple;
text-decoration:underline}
span.apple-converted-space
{}
span.EmailStyle18
{font-family:"Calibri","sans-serif";
color:#1F497D}
span.EmailStyle19
{font-family:"Calibri","sans-serif";
color:#993366}
span.EmailStyle20
{font-family:"Arial","sans-serif";
color:black;
font-weight:normal;
font-style:normal}
span.EmailStyle21
{font-family:"Calibri","sans-serif";
color:#003300}
.MsoChpDefault
{font-size:10.0pt}
@page WordSection1
{margin:72.0pt 72.0pt 72.0pt 72.0pt}
div.WordSection1
{}
-->
</style>
</head>
<body lang="EN-SG" link="blue" vlink="purple" style="word-wrap:break-word; line-break:after-white-space">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">Hi Carson,</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D"> </span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">Thank you for the reply.
</span></p>
<p class="MsoNormal"><span style="font-size:10.0pt; font-family:"Arial","sans-serif"; color:#1F497D"> </span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">Please find attached the GFF file of Scaffold61 along with the related ctl files. The gene I am referring to spans 698,453 to 840,581 bp.
Several proteins, both short and long, align to this locus, including a 1,200 aa protein and a 2,069 aa protein (XP_022239675.1). The latter aligns completely to Scaffold61 from 698,453 to 840,581 bp with >90% identity. Please see the line for this
alignment from the GFF file:</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D"> </span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">SCAffold61 blastx protein_match 698453 840581 730 - . ID=SCAffold61:hit:1861:3.10.0.0;Name=XP_022239675.1</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D"> </span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">However, in the evidence-based run, this gene is split into two fragments. Please see the related lines from the GFF file, as follows:</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D"> </span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">SCAffold61 maker gene 708052 717805 . - . ID=maker-SCAffold61-exonerate_est2genome-gene-0.95;Name=maker-SCAffold61-exonerate_est2genome-gene-0.95</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">SCAffold61 maker gene 748651 770415 . - . ID=maker-SCAffold61-exonerate_est2genome-gene-0.96;Name=maker-SCAffold61-exonerate_est2genome-gene-0.96</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D"> </span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">It looks like the Exonerate prediction is based only on the ESTs, which are fragmented, and the full-length protein aligned to this locus is completely ignored. We
have seen this kind of priority for ESTs at other loci as well, resulting in split gene predictions (sometimes 3 to 4 fragments) despite longer full-length proteins being aligned to the assembly. Our ESTs (Trinity-assembled RNA-seq transcripts) were generated
from the same individual whose genome was sequenced (and hence the identity is close to 100%). If we align only proteins, Exonerate still splits the gene based on the shorter proteins aligned to the locus. I would really appreciate it if you could help us solve
this splitting of genes despite the alignment of full-length proteins to the assembly.</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D"> </span></p>
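<p class="MsoNormal">The comparison described above (one protein_match span containing several separate maker gene models) could be checked across a whole GFF file with a short script. The following is only a minimal Python sketch, assuming the file is standard nine-column, tab-separated GFF3; the function names parse_gff_line and split_gene_candidates are hypothetical helpers and not part of MAKER.</p>

```python
def parse_gff_line(line):
    """Split one GFF3 feature line into a dict; return None for comments,
    blank lines, or lines without exactly nine tab-separated columns."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        return None
    # Column 9 holds key=value pairs separated by semicolons.
    attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
    return {
        "seqid": cols[0],
        "source": cols[1],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "strand": cols[6],
        "attrs": attrs,
    }


def split_gene_candidates(lines, min_genes=2):
    """For each protein_match feature, collect the maker gene models on the
    same sequence and strand that lie entirely inside its span. Only hits
    containing at least min_genes gene models are reported."""
    feats = [f for f in map(parse_gff_line, lines) if f is not None]
    hits = [f for f in feats if f["type"] == "protein_match"]
    genes = [f for f in feats if f["type"] == "gene" and f["source"] == "maker"]
    report = {}
    for hit in hits:
        inside = [
            g["attrs"].get("ID", "?")
            for g in genes
            if g["seqid"] == hit["seqid"]
            and g["strand"] == hit["strand"]
            and hit["start"] <= g["start"]
            and g["end"] <= hit["end"]
        ]
        if len(inside) >= min_genes:
            name = hit["attrs"].get("Name", hit["attrs"].get("ID", "?"))
            report[name] = inside
    return report


# The three GFF lines quoted in this message, with tabs restored.
example = [
    "SCAffold61\tblastx\tprotein_match\t698453\t840581\t730\t-\t.\t"
    "ID=SCAffold61:hit:1861:3.10.0.0;Name=XP_022239675.1",
    "SCAffold61\tmaker\tgene\t708052\t717805\t.\t-\t.\t"
    "ID=maker-SCAffold61-exonerate_est2genome-gene-0.95;"
    "Name=maker-SCAffold61-exonerate_est2genome-gene-0.95",
    "SCAffold61\tmaker\tgene\t748651\t770415\t.\t-\t.\t"
    "ID=maker-SCAffold61-exonerate_est2genome-gene-0.96;"
    "Name=maker-SCAffold61-exonerate_est2genome-gene-0.96",
]

for protein, gene_ids in split_gene_candidates(example).items():
    print(protein, "spans", len(gene_ids), "gene models:", ", ".join(gene_ids))
```

<p class="MsoNormal">Run on a full MAKER GFF, a listing like this would only flag candidate loci; whether a given split is real still needs to be judged from the exon structure.</p>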
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">I will be glad to send all the reference protein and transcript sequences used
for annotation, if required. </span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D"> </span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">Thanks for your time and help.</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D"> </span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">Best regards,</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#0070C0"> </span></p>
<div>
<p class="MsoNormal"><b><u><span lang="EN-US" style="font-size:9.0pt; font-family:"Calibri","sans-serif"; color:#0E24F2"><a href="mailto:prashantns@imcb.a-star.edu.sg">Prashant Shingate, PhD</a></span></u></b><b><span lang="EN-US" style="font-size:9.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">
</span></b><b><span lang="EN-US" style="font-size:9.0pt; font-family:"Calibri","sans-serif"; color:black">::
</span></b><b><span lang="EN-US" style="font-size:9.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">Research Fellow :: Comparative and Medical Genomics Lab :: Institute of Molecular and Cell Biology (IMCB) :: Agency for Science, Technology and Research
(A*STAR)</span></b><span style="font-size:9.0pt; font-family:"Calibri","sans-serif"; color:#1F497D"></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:9.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">61 Biopolis Drive :: #05-04 Proteos :: Singapore 138673 :: DID (+65) 6586 9570 :: Fax (+65) 6779 1117</span><span lang="EN-US" style="font-size:9.0pt; font-family:"Calibri","sans-serif"; color:#0E24F2">::
<a href="http://www.imcb.a-star.edu.sg/"><span style="color:#0E24F2">http://www.imcb.a-star.edu.sg/</span></a></span><span style="font-size:9.0pt; font-family:"Calibri","sans-serif"; color:#0E24F2"></span></p>
<p class="MsoNormal" style="text-autospace:none"><b><span lang="EN-GB" style="font-size:9.0pt; font-family:"Calibri","sans-serif"; color:red">We advance science and develop innovative technology to further economic growth and improve lives.
</span></b></p>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D"> </span></p>
</div>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif"; color:#1F497D"> </span></p>
<div>
<div style="border:none; border-top:solid #B5C4DF 1.0pt; padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt; font-family:"Tahoma","sans-serif"">From:</span></b><span lang="EN-US" style="font-size:10.0pt; font-family:"Tahoma","sans-serif""> Carson Holt [mailto:carsonhh@gmail.com]
<br>
<b>Sent:</b> Tuesday, 18 December, 2018 1:38 AM<br>
<b>To:</b> Prashant Narendra SHINGATE<br>
<b>Cc:</b> Byrappa VENKATESH; maker-devel@yandell-lab.org<br>
<b>Subject:</b> Re: About split genes in MAKER annotation</span></p>
</div>
</div>
<p class="MsoNormal"> </p>
<p class="MsoNormal">It’s best to look at these in a browser like Apollo where you can also manipulate the intron/exon structure. What you will often find is that there is something that breaks the ORF or breaks splicing, so the predictors can’t build an end-to-end
model even with the hints given. If you have a GFF3 just for the contig, I can also look at it in a browser to help point out the logic that led to the model.</p>
<div>
<p class="MsoNormal"> </p>
</div>
<div>
<p class="MsoNormal">—Carson</p>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"> </p>
<div>
<p class="MsoNormal">On Dec 12, 2018, at 3:29 AM, Prashant Narendra SHINGATE &lt;<a href="mailto:prashantns@imcb.a-star.edu.sg">prashantns@imcb.a-star.edu.sg</a>&gt; wrote:</p>
</div>
<p class="MsoNormal"> </p>
<div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:10.0pt; font-family:"Arial","sans-serif"; color:#222222">Hi Carson,</span><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:10.0pt; font-family:"Arial","sans-serif"; color:#222222"> </span><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:10.0pt; font-family:"Arial","sans-serif"; color:#222222">I am Prashant, a bioinformatics postdoctoral fellow from Prof B Venkatesh</span><span style="font-size:10.0pt; font-family:"Arial","sans-serif"">’s<span class="apple-converted-space"><span style="color:#222222"> </span></span><span style="color:#222222">lab,
IMCB, A*STAR. I am using<span class="apple-converted-space"> </span></span>MAKER<span style="color:#222222"> to annotate an invertebrate genome (~2 Gb). During the annotation process, we found several instances of split genes </span>even though we have full-length
reference protein sequences from very closely related species<span style="color:#222222">. Hence we decided to look at one of the loc</span>i<span class="apple-converted-space"><span style="color:#222222"> </span></span><span style="color:#222222">to understand
the reason behind it and to optimize the parameters.</span></span><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:10.0pt; font-family:"Arial","sans-serif"; color:#222222"> </span><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:10.0pt; font-family:"Arial","sans-serif"; color:#222222">We<span class="apple-converted-space"> </span></span><span style="font-size:10.0pt; font-family:"Arial","sans-serif"">looked at<span class="apple-converted-space"><span style="color:#222222"> </span></span><span style="color:#222222">a
gene that is ~110 kb long<span class="apple-converted-space"> </span></span>and<span class="apple-converted-space"><span style="color:#222222"> </span></span><span style="color:#222222">codes for<span class="apple-converted-space"> </span></span>a<span class="apple-converted-space"> </span><span style="color:#222222">~1,200
amino acid protein. We have a highly identical<span class="apple-converted-space"> </span></span>reference protein<span class="apple-converted-space"> </span><span style="color:#222222">(>90% identity and 100% coverage)</span><span class="apple-converted-space"> </span>from
another species<span style="color:#222222">.<span class="apple-converted-space"> </span></span>In addition, we also have<span class="apple-converted-space"><span style="color:#222222"> </span></span><span style="color:#222222">a high-coverage<span class="apple-converted-space"> </span></span>Trinity <span style="color:#222222">transcript
assembly<span class="apple-converted-space"> </span></span>from our species<span style="color:#222222">. Still, this gene is split into 4 fragments during the evidence-based<span class="apple-converted-space"> </span></span>MAKER<span class="apple-converted-space"><span style="color:#222222"> </span></span><span style="color:#222222">run.
On a closer look, we found that the above-mentioned closely related protein is not aligned by Exonerate (protein2genome) even though it is the closest protein to this gene in our dataset.<span class="apple-converted-space"> </span></span>It looks like the
program is giving more weight to the transcripts, which are typically fragments of the gene. So we are at a loss as to how to predict this gene in full.</span><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:10.0pt; font-family:"Arial","sans-serif"; color:#222222"> </span><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:10.0pt; font-family:"Arial","sans-serif"; color:#222222">For your reference,<span class="apple-converted-space"> </span></span><span style="font-size:10.0pt; font-family:"Arial","sans-serif"">I
am enclosing the<span class="apple-converted-space"><span style="color:#222222"> </span></span><span style="color:#222222">maker_opts.ctl and maker_bopts.ctl files. I will be glad to share the scaffold sequence and other input files if required.</span></span><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:10.0pt; font-family:"Arial","sans-serif"; color:#222222"> </span><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:10.0pt; font-family:"Arial","sans-serif"; color:#222222">Could you please help me understand why MAKER is not able to use</span><span class="apple-converted-space"><span style="font-size:10.0pt; font-family:"Arial","sans-serif""> </span></span><span style="font-size:10.0pt; font-family:"Arial","sans-serif"">the
full-length reference<span class="apple-converted-space"><span style="color:#222222"> </span></span><span style="color:#222222">protein for gene prediction,</span><span class="apple-converted-space"> </span>and how we can overcome this problem?</span><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt; font-family:"Arial","sans-serif""> </span><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt; font-family:"Arial","sans-serif"">Thanks<span class="apple-converted-space"> </span>for your time and help.</span><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt; font-family:"Arial","sans-serif""> </span><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt; font-family:"Arial","sans-serif"">Best<span class="apple-converted-space"> </span>regards,</span><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""> </span></p>
</div>
<div>
<p class="MsoNormal"><b><u><span lang="EN-US" style="font-size:9.0pt; font-family:"Calibri","sans-serif"; color:#0E24F2"><a href="mailto:prashantns@imcb.a-star.edu.sg"><span style="color:purple">Prashant Shingate,<span class="apple-converted-space"> </span></span><span style="color:purple; text-decoration:none">PhD</span></a></span></u></b><span class="apple-converted-space"><b><span lang="EN-US" style="font-size:9.0pt; font-family:"Calibri","sans-serif"; color:#1F497D"> </span></b></span><b><span lang="EN-US" style="font-size:9.0pt; font-family:"Calibri","sans-serif"">::<span class="apple-converted-space"> </span><span style="color:#1F497D">Research
Fellow :: Comparative and Medical Genomics Lab :: Institute of Molecular and Cell Biology (IMCB) :: Agency for Science, Technology and Research (A*STAR)</span></span></b><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US" style="font-size:9.0pt; font-family:"Calibri","sans-serif"; color:#1F497D">61 Biopolis Drive :: #05-04 Proteos :: Singapore 138673 :: DID<span class="apple-converted-space"> </span><a href="tel:(+65)%206586%209570"><span style="color:purple">(+65)
6586 9570</span></a><span class="apple-converted-space"> </span>:: Fax<span class="apple-converted-space"> </span><a href="tel:(+65)%206779%201117"><span style="color:purple">(+65) 6779 1117</span></a></span><span lang="EN-US" style="font-size:9.0pt; font-family:"Calibri","sans-serif"; color:#0E24F2">::<span class="apple-converted-space"> </span><a href="http://www.imcb.a-star.edu.sg/"><span style="color:#0E24F2">http://www.imcb.a-star.edu.sg/</span></a></span><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal"><b><span lang="EN-GB" style="font-size:9.0pt; font-family:"Calibri","sans-serif"; color:red">We advance science and develop innovative technology to further economic growth and improve lives. </span></b><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""> </span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt; font-family:"Calibri","sans-serif""> </span></p>
</div>
<p class="MsoNormal"><span style="font-size:9.0pt; font-family:"Helvetica","sans-serif""><br>
</span><span style="font-size:7.5pt; font-family:"Arial","sans-serif"; color:gray"><br>
Note: This message may contain confidential information. If this Email/Fax has been sent to you by mistake, please notify the sender and delete it immediately. Thank you.<br>
</span>&lt;maker_opts.ctl&gt;&lt;maker_opts.ctl&gt;</p>
</div>
</div>
<p class="MsoNormal"> </p>
</div>
</div>
</body>
</html>