<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">It seems that in your case you want more of a liftover-based annotation. You could generate that and feed it to MAKER as a GFF file if your intention is also gene discovery in your population. <div><br><div><br><div><div>On May 23, 2013, at 9:48 AM, Daniel Hughes &lt;<a href="mailto:dsth@ebi.ac.uk">dsth@ebi.ac.uk</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr">Would gene annotation by projection using synteny/WGA not be more appropriate? Either way, what's wrong with running one of the standard orthology prediction tools, or just a basic reciprocal best BLAST?<br>
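<div>As a rough illustration of the reciprocal best BLAST idea, here is a minimal Python sketch. It assumes two blastp runs (genome A proteins against genome B proteins, and the reverse) saved in the default 12-column tabular format, where the first two columns are the query and subject IDs and the last column is the bit score. The names <code>Hit</code>, <code>parse_tabular</code>, <code>best_hits</code> and <code>reciprocal_best_hits</code> are made up for this example and are not part of BLAST or MAKER.</div>
<pre>
```python
from collections import namedtuple

# One BLAST hit reduced to the three fields this sketch needs.
Hit = namedtuple("Hit", ["query", "subject", "bitscore"])


def parse_tabular(lines):
    """Parse lines of 12-column tabular BLAST output into Hit records.

    Assumes column 1 is the query ID, column 2 the subject ID and
    column 12 the bit score; comment and blank lines are skipped.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[11])))
    return hits


def best_hits(hits):
    """Map each query ID to the subject ID of its highest-scoring hit."""
    best = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a_id, b_id) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)
```
</pre>
<div>Ties and near-identical paralogs are not handled here, so for a within-species comparison the result would still need a sanity check, for example against synteny.</div>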
<br>dan.<br></div><div class="gmail_extra"><br clear="all"><div>Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)<br>-------------------------------------------------------------------------------------<br><a href="mailto:dsth@cantab.net">dsth@cantab.net</a><br>
<a href="mailto:dsth@cpan.org">dsth@cpan.org</a></div>
<br><br><div class="gmail_quote">2013/5/23 Barry Moore <span dir="ltr">&lt;<a href="mailto:barry.utah@gmail.com" target="_blank">barry.utah@gmail.com</a>&gt;</span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">Hi Luciano,<div><br></div><div>If I understand correctly, you are including translations of the SNAP and Augustus predictions as well as the predictions themselves. If so, you don't want to do that. Overlapping protein evidence is sufficient to promote a prediction to an annotation, so by providing the protein translation of a prediction along with the prediction you guarantee that every prediction will become an annotation, and you lose the benefit of the evidence-supervised annotation that MAKER provides. Include the proteins from the D. mel reference, and if you want to cast a broader net, include proteins from other dipterans or even UniProt. It just depends on how aggressive you want to be in capturing new annotations.</div>
<div><br></div><div>B </div><div><br><div><div>On May 23, 2013, at 8:41 AM, Luciano Abriata wrote:</div><br><blockquote type="cite"><span style="border-collapse:separate;font-family:Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-size:medium"><div style="word-wrap:break-word">
<div style="direction:ltr;font-size:10pt;font-family:Tahoma">Thanks for your reply!<br><br>One more question: can you think of any tips to get the best possible predictions of protein sequences?<br><br>I am asking because I am getting a few proteins that are too big to be real and that return no matches when I BLAST them, plus a few others that don't start with methionine. So far I am including transcripts and translations from FlyBase, and SNAP and Augustus with their available training for flies. Do you see any possible source of error in that?<br>
<br>Thanks again,<br><br>Luciano<br><br><div style="font-size:16px;font-family:'Times New Roman'"><hr><div style="direction:ltr"><font face="Tahoma"><b>From:</b><span> </span>Barry Moore [<a href="mailto:barry.moore@genetics.utah.edu" target="_blank">barry.moore@genetics.utah.edu</a>]<br>
<b>Sent:</b><span> </span>Friday, May 17, 2013 09:02 p.m.<br><b>To:</b><span> </span>Luciano Abriata<br><b>Cc:</b><span> </span><a href="mailto:maker-devel@yandell-lab.org" target="_blank">maker-devel@yandell-lab.org</a><br>
<b>Subject:</b><span> </span>Re: [maker-devel] getting protein sequences from genomes<br></font><br></div><div><div class="h5"><div></div><div><br><div><div>On May 17, 2013, at 3:45 AM, Luciano Abriata wrote:</div><br><blockquote type="cite">
<span style="border-collapse:separate;font-family:Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-size:medium">
<div style="direction:ltr;font-size:10pt;font-family:Tahoma">Hello, I am trying to use MAKER to annotate genomes from different individuals of a population (D. melanogaster flies).<br><br>My ultimate goal is to get, for each gene, the amino acid sequences of the encoded proteins as they are expressed from each genome. My questions are:<br>
<br>1) How can I match proteins predicted for the same gene in two genomes?<br></div></span></blockquote><div><br></div><div>Use blastp, with parameters tuned to favor near-perfect matches.</div><br><blockquote type="cite">
<span style="border-collapse:separate;font-family:Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-size:medium">
<div style="direction:ltr;font-size:10pt;font-family:Tahoma"><br>2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output)<br><br>maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541<br>
<br></div></span></blockquote><div><br></div><div>AED = Annotation Edit Distance. It describes how closely the prediction matches the evidence. It is a distance measure, so 0 is a perfect match and 1 is no overlap.</div>
<div><br></div><div>eAED = Exon-adjusted Annotation Edit Distance. This metric is the same as AED, with two exceptions. First, for a protein-coding exon to be counted as overlapping protein evidence, the reading frame must be the same in the coding exon and the protein evidence. <span style="font-family:Calibri,sans-serif;font-size:14px">Second, when mRNA-Seq data is used as evidence and both ends of an exon are supported by splice-site-spanning reads, the middle of that exon is counted as supported as well, even if coverage drops off in the interior of the exon. For the most part AED and eAED will be the same, but eAED tends to work better on many fringe cases.</span></div>
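<div>To make the distance idea concrete, here is a simplified Python sketch of AED at the nucleotide level, using the published definition AED = 1 - (SN + SP) / 2, where SN is the fraction of evidence bases covered by the prediction and SP is the fraction of predicted bases covered by the evidence. This is only an illustration: the function names and the inclusive (start, end) exon tuples are assumptions for this example, and MAKER's own calculation against pooled evidence is more involved.</div>
<pre>
```python
def covered_positions(exons):
    """Return the set of base positions covered by inclusive (start, end) exons."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def annotation_edit_distance(prediction_exons, evidence_exons):
    """Simplified nucleotide-level AED between one prediction and one evidence set.

    Returns 0.0 for a perfect match and 1.0 when nothing overlaps.
    """
    predicted = covered_positions(prediction_exons)
    evidence = covered_positions(evidence_exons)
    if not predicted or not evidence:
        return 1.0
    shared = len(predicted.intersection(evidence))
    sensitivity = shared / len(evidence)    # evidence bases recovered
    specificity = shared / len(predicted)   # predicted bases supported
    return 1.0 - (sensitivity + specificity) / 2.0
```
</pre>
<div>For example, a prediction spanning bases 1-100 with evidence covering only bases 1-50 has SN = 1 and SP = 0.5, so its AED in this sketch is 0.25.</div>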
<div><span style="font-family:Calibri,sans-serif;font-size:14px"><br></span></div><div>QI values are as follows:</div><div><br></div><div><ol><li>5' UTR length.</li><li>Fraction of splice sites confirmed by an EST alignment.</li>
<li>Fraction of exons that overlap an EST alignment.</li><li>Fraction of exons that overlap an EST or protein alignment.</li><li>Fraction of splice sites confirmed by an ab initio prediction.</li><li>Fraction of exons that overlap an ab initio prediction.</li>
<li>Number of exons in the transcript.</li><li>3' UTR length.</li><li>Length of encoded protein.</li></ol></div><div><br></div><br><blockquote type="cite"><span style="border-collapse:separate;font-family:Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-size:medium">
<div style="direction:ltr;font-size:10pt;font-family:Tahoma">3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta , snap_masked.proteins.fasta , non_overlapping_ab_initio.proteins.fasta , and proteins.fasta<br>
<br></div></span></blockquote><blockquote type="cite"><span style="border-collapse:separate;font-family:Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-size:medium">
<div style="direction:ltr;font-size:10pt;font-family:Tahoma">Which of these files contains the definitive set of predicted protein sequences?<br></div></span></blockquote><div><br></div><div>The proteins.fasta file contains the final set of proteins for all genes that MAKER created annotations for.</div>
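<div>If it helps, the header fields described above can be pulled out of proteins.fasta with a few lines of Python. This is a minimal sketch that assumes headers look like the example line quoted earlier (ID, then space-separated <code>AED:</code>, <code>eAED:</code> and <code>QI:</code> tokens). The function name and the dictionary keys in <code>QI_FIELDS</code> are labels invented for this example, not MAKER identifiers.</div>
<pre>
```python
# Hypothetical labels for the nine pipe-separated QI values, in order.
QI_FIELDS = (
    "five_prime_utr_length",
    "splice_sites_confirmed_by_est",
    "exons_overlapping_est",
    "exons_overlapping_est_or_protein",
    "splice_sites_confirmed_by_ab_initio",
    "exons_overlapping_ab_initio",
    "exon_count",
    "three_prime_utr_length",
    "protein_length",
)


def parse_maker_header(header):
    """Parse one FASTA header line into a dict with id, AED, eAED and QI."""
    tokens = header.strip().lstrip(">").split()
    record = {"id": tokens[0]}
    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if not sep:
            continue  # plain words such as "protein" carry no value
        if key == "QI":
            numbers = (float(v) for v in value.split("|"))
            record["QI"] = dict(zip(QI_FIELDS, numbers))
        elif key in ("AED", "eAED"):
            record[key] = float(value)
    return record
```
</pre>
<div>Parsed this way, the example header describes a 3-exon transcript encoding a 541-residue protein, which makes it easy to filter annotations by AED or to flag implausibly long proteins.</div>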
<br><blockquote type="cite"><span style="border-collapse:separate;font-family:Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-size:medium">
<div style="direction:ltr;font-size:10pt;font-family:Tahoma"><br></div><div style="direction:ltr;font-size:10pt;font-family:Tahoma"><br></div><div style="direction:ltr;font-size:10pt;font-family:Tahoma"><br>Thanks in advance!<br>
<br>Luciano<br></div>_______________________________________________<br>maker-devel mailing list<br><a href="mailto:maker-devel@box290.bluehost.com" target="_blank">maker-devel@box290.bluehost.com</a><br><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org" target="_blank">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a><br>
</span></blockquote></div><br><div><span style="border-collapse:separate;font-family:Helvetica;font-size:medium;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div>
<span style="font-family:Arial;font-size:12px"><div>Barry Moore</div><div>Research Scientist</div><div>Dept. of Human Genetics</div><div>University of Utah</div><div>Salt Lake City, UT 84112</div><div>--------------------------------------------</div>
<div><a href="tel:%28801%29%20585-3543" value="+18015853543" target="_blank">(801) 585-3543</a></div><div><br></div></span></div><div><br></div></span><br></div><br></div></div></div></div></div></div></span><br></blockquote>
</div><div><div class="h5"><br><div>
<span style="text-indent:0px;letter-spacing:normal;font-variant:normal;text-align:auto;font-style:normal;font-weight:normal;line-height:normal;border-collapse:separate;text-transform:none;font-size:medium;white-space:normal;font-family:Helvetica;word-spacing:0px"><div>
</div><div><br></div></span><br>
</div>
<br></div></div></div></div><br>
<br></blockquote></div><br></div>
</blockquote></div><br><div apple-content-edited="true">
<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><div>Jason Stajich</div><div><a href="mailto:jason.stajich@gmail.com">jason.stajich@gmail.com</a></div><div><a href="mailto:jason@bioperl.org">jason@bioperl.org</a></div></span>
</div>
<br></div></div></body></html>