<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;"><div>Only if you were to remove the brackets around gene= (that is, gene=cox1 rather than [gene=cox1]).</div><div><br></div><div>--Carson</div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> Shaun Jackman <<a href="mailto:sjackman@gmail.com">sjackman@gmail.com</a>><br><span style="font-weight:bold">Reply-To: </span> Shaun Jackman <<a href="mailto:sjackman@gmail.com">sjackman@gmail.com</a>><br><span style="font-weight:bold">Date: </span> Thursday, May 8, 2014 at 4:41 PM<br><span style="font-weight:bold">To: </span> Carson Holt <<a href="mailto:carsonhh@gmail.com">carsonhh@gmail.com</a>><br><span style="font-weight:bold">Cc: </span> "<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>" <<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> Re: [maker-devel] est_forward and conflicting names<br></div><div><br></div><div dir="ltr"><div class="markdown-here-wrapper" id="markdown-here-wrapper-926841" style=""><p style="margin:1.2em 0px!important">Interesting. Thanks for the clarification. I’m working on a plant mitochondrion, and so as far as I know, there’s no alternative splicing. 
My <code style="font-size:0.85em;font-family:Consolas,Inconsolata,Courier,monospace;margin:0px 0.15em;padding:0px 0.3em;white-space:pre-wrap;border:1px solid rgb(234,234,234);background-color:rgb(248,248,248);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;display:inline">protein</code> FASTA file is composed of the protein sequences of ~100 species downloaded from GenBank. It looks like this:</p><pre style="font-size:0.85em;font-family:Consolas,Inconsolata,Courier,monospace;font-size:1em;line-height:1.2em;margin:1.2em 0px"><code style="font-size:0.85em;font-family:Consolas,Inconsolata,Courier,monospace;margin:0px 0.15em;padding:0px 0.3em;white-space:pre-wrap;border:1px solid rgb(234,234,234);background-color:rgb(248,248,248);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;display:inline;white-space:pre;overflow:auto;border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;border:1px solid rgb(204,204,204);padding:0.5em 0.7em;display:block!important;display:block;padding:0.5em;color:rgb(51,51,51);background-color:rgb(248,248,255);background-repeat:initial initial">>cox1|lcl|KJ461445.1_cdsid_AHY20320.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=AHY20320.1] [location=complement(59212..60795)]
…
>cox1|lcl|EU534409.1_cdsid_ACA62629.1 [gene=cox1] [protein=cox1] [protein_id=ACA62629.1] [location=245282..246856]
…
>cox1|lcl|NC_023103.1_cdsid_YP_008964124.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=YP_008964124.1] [location=join(317824..318438,319511..320368)]
…
</code></pre><p style="margin:1.2em 0px!important">I’m not sure that I actually want the fancy behaviour that you describe, though it probably wouldn’t hurt anything. Will this FASTA format trigger the fancy behaviour?</p><p style="margin:1.2em 0px!important">Cheers,<br>Shaun</p></div></div><div class="gmail_extra"><br clear="all"><div><div dir="ltr"><u><a href="http://sjackman.ca" target="_blank">http://sjackman.ca</a></u></div></div><br><br><div class="gmail_quote">On 8 May 2014 15:33, Carson Holt <span dir="ltr"><<a href="mailto:carsonhh@gmail.com" target="_blank">carsonhh@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif"><div>When moving transcripts onto a new assembly, you may have multiple transcripts of the same gene. Because your transcript name should be your FASTA ID, there is no way for MAKER to know that they go together when moving the models forward, so you can use the gene= option to make MAKER aware that these belong to the same gene. They will be grouped, and you recover all splice forms as a group. 
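<div><br></div><div>The grouping described above can be illustrated with a short Python sketch. It is not MAKER's own parser: the helpers <code>unbracket_gene</code> and <code>group_by_gene</code> are hypothetical, and the sketch assumes, following the SMEDT example in this message, that the gene name is a bare, whitespace-delimited <code>gene=NAME</code> token after the FASTA ID. A GenBank-style <code>[gene=cox1]</code> attribute is deliberately not recognised, mirroring the advice in this thread to remove the brackets; keeping an untagged record in a group of its own is also an assumption.</div>
<pre><code>
```python
import re
from collections import defaultdict

# Hypothetical helpers for illustration only; this is not MAKER's own parser.
# Assumption: the gene name appears as a bare, whitespace-delimited
# "gene=NAME" token after the FASTA ID, e.g. ">SMEDT_00004 gene=dpp".
GENE_TOKEN = re.compile(r"(?:^|\s)gene=(\S+)")
BRACKETED_GENE = re.compile(r"\[gene=([^\]\s]+)\]")


def unbracket_gene(header):
    """Rewrite a GenBank-style '[gene=cox1]' attribute as a bare 'gene=cox1' token."""
    return BRACKETED_GENE.sub(r"gene=\1", header)


def group_by_gene(headers):
    """Map each gene name to the list of FASTA IDs (transcript names) carrying it."""
    groups = defaultdict(list)
    for header in headers:
        # The FASTA ID is the first whitespace-delimited field after '>'.
        transcript_id = header.lstrip(">").split()[0]
        match = GENE_TOKEN.search(header)
        # Assumption: a record without a bare gene= token (including a
        # bracketed '[gene=...]') stays in a group of its own, keyed by its ID.
        gene = match.group(1) if match else transcript_id
        groups[gene].append(transcript_id)
    return dict(groups)


headers = [
    ">SMEDT_00004 gene=dpp",
    ">SMEDT_00005 gene=dpp",
    ">cox1|lcl|EU534409.1_cdsid_ACA62629.1 [gene=cox1] [protein=cox1]",
]
print(group_by_gene(headers))
print(group_by_gene([unbracket_gene(h) for h in headers]))
```
</code></pre>
<div>In this sketch the two SMEDT transcripts end up together under <code>dpp</code>, while the GenBank record is only grouped under <code>cox1</code> after its header has passed through <code>unbracket_gene</code>.</div>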
</div><div><br></div><div>Example:</div><div><br></div><div>>SMEDT_00004 gene=dpp</div><div>AAAAAAA</div><div><br></div><div>>SMEDT_00005 gene=dpp</div><div>AAAAAAA</div><div><br></div><div>--Carson</div><div><br></div><div><br></div><div><br></div><span><div style="font-family:Calibri;font-size:11pt;text-align:left;color:black;BORDER-BOTTOM:medium none;BORDER-LEFT:medium none;PADDING-BOTTOM:0in;PADDING-LEFT:0in;PADDING-RIGHT:0in;BORDER-TOP:#b5c4df 1pt solid;BORDER-RIGHT:medium none;PADDING-TOP:3pt"><span style="font-weight:bold">From: </span> Shaun Jackman <<a href="mailto:sjackman@gmail.com" target="_blank">sjackman@gmail.com</a>><br><span style="font-weight:bold">Reply-To: </span> Shaun Jackman <<a href="mailto:sjackman@gmail.com" target="_blank">sjackman@gmail.com</a>><br><span style="font-weight:bold">Date: </span> Thursday, May 8, 2014 at 4:26 PM<br><span style="font-weight:bold">To: </span> Carson Holt <<a href="mailto:carsonhh@gmail.com" target="_blank">carsonhh@gmail.com</a>><br><span style="font-weight:bold">Cc: </span> "<a href="mailto:maker-devel@yandell-lab.org" target="_blank">maker-devel@yandell-lab.org</a>" <<a href="mailto:maker-devel@yandell-lab.org" target="_blank">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> Re: [maker-devel] est_forward and conflicting names<br></div><div><div class="h5"><div><br></div><div dir="ltr"><div><p style="margin:1.2em 0px!important">Hi, Carson. Could you give an example of how to add <code style="font-size:0.85em;font-family:Consolas,Inconsolata,Courier,monospace;margin:0px 0.15em;padding:0px 0.3em;white-space:pre-wrap;border:1px solid rgb(234,234,234);background-color:rgb(248,248,248);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;display:inline">gene_id=</code> to the header of the FASTA file? I’m not clear on what you mean by this. 
In the FASTA header, what portion is the transcript name, and what portion is the gene name?</p><p style="margin:1.2em 0px!important">Cheers,<br>Shaun</p></div></div><div class="gmail_extra"><br clear="all"><div><div dir="ltr"><u><a href="http://sjackman.ca" target="_blank">http://sjackman.ca</a></u></div></div><br><br><div class="gmail_quote">On 2 May 2014 11:55, Carson Holt <span dir="ltr"><<a href="mailto:carsonhh@gmail.com" target="_blank">carsonhh@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif"><div>Whichever has the best AED score, I believe, but you can add gene_id= to the header of each fasta file to ensure MAKER doesn't try to cluster unrelated transcripts into a single gene. Then the transcript name and gene name will be guaranteed to match up.</div><div><br></div><div>--Carson</div><div><br></div><div><br></div><span><div style="font-family:Calibri;font-size:11pt;text-align:left;color:black;BORDER-BOTTOM:medium none;BORDER-LEFT:medium none;PADDING-BOTTOM:0in;PADDING-LEFT:0in;PADDING-RIGHT:0in;BORDER-TOP:#b5c4df 1pt solid;BORDER-RIGHT:medium none;PADDING-TOP:3pt"><span style="font-weight:bold">From: </span> Shaun Jackman <<a href="mailto:sjackman@gmail.com" target="_blank">sjackman@gmail.com</a>><br><span style="font-weight:bold">Date: </span> Wednesday, April 30, 2014 at 5:25 PM<br><span style="font-weight:bold">To: </span> "<a href="mailto:maker-devel@yandell-lab.org" target="_blank">maker-devel@yandell-lab.org</a>" <<a href="mailto:maker-devel@yandell-lab.org" target="_blank">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> [maker-devel] est_forward and conflicting names<br></div><div><div><div><br></div><div><div style="word-wrap:break-word"><p>Hi, Carson.</p><p>I’ve downloaded a number of genes from GenBank using Entrez Direct, which I’m 
using with <code>est</code> and <code>protein</code> to annotate a plant mitochondrion. Most of these reference sequences have sensible and consistent gene names, and so I’m using <code>est_forward</code> to retain the gene names. This workflow is working well for me. Some of the genes pulled in from GenBank have less useful names like <code>orf1234</code> or other numeric IDs. When multiple evidence sequences map to the same location, how does <code>est_forward</code> choose which name to use? If it’s chosen arbitrarily, could it be possible to choose the most common name instead?</p><p>Thanks,<br>
Shaun</p><p></p><div style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto"><br></div><p></p></div></div></div></div>_______________________________________________<br>
maker-devel mailing list<br>
<a href="mailto:maker-devel@box290.bluehost.com" target="_blank">maker-devel@box290.bluehost.com</a><br><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org" target="_blank">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a></span></div></blockquote></div><br></div></div></div></span></div></blockquote></div><br></div></span></body></html>