<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Hi Julian,
<div class=""><br class="">
</div>
<div class="">The RefSeq NM models are a good place to start for evidence, since they are manually curated. Don’t concatenate the protein and EST files together: putting amino acid sequences in as EST evidence (or nucleotide sequences in as protein evidence) will only give you BLAST errors. </div>
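<div class=""><br class="">
</div>
<div class="">If you’re not sure which kind of sequence a downloaded file holds, a quick check along these lines can help before you assign it to est= or protein=. This is only a rough sketch and not part of MAKER: the function name, the 90% threshold, and the example records are made up for illustration. </div>
<pre class="">
```python
from collections import Counter

# IUPAC codes that dominate nucleotide FASTA files (assumed set for this sketch).
NUCLEOTIDE_CODES = set("ACGTUN")


def guess_fasta_type(lines, sample_size=10000, threshold=0.9):
    """Return 'nucleotide' or 'protein' based on residue composition."""
    counts = Counter()
    seen = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        counts.update(line.upper())
        seen += len(line)
        if seen >= sample_size:
            break  # a sample of the file is enough to decide
    if seen == 0:
        raise ValueError("no sequence data found")
    nucleotide_like = sum(n for code, n in counts.items() if code in NUCLEOTIDE_CODES)
    return "nucleotide" if nucleotide_like / seen >= threshold else "protein"


# Hypothetical records, only to show the call.
transcripts = [">NM_hypothetical_1", "ATGGCGTTTAACGGCTGA"]
proteins = [">NP_hypothetical_1", "MAFNGKLVWERSPQ"]
print(guess_fasta_type(transcripts))  # nucleotide
print(guess_fasta_type(proteins))     # protein
```
</pre>
<div class="">For a real file you would pass an open file handle instead of a list, since the function only needs something it can iterate over line by line. </div>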
<div class=""><br class="">
</div>
<div class="">The number of files you use in your annotation matters less than the quantity and breadth of the evidence they contain. </div>
<div class=""><br class="">
</div>
<div class="">I don’t think that supplying the RefSeq models as both EST and protein evidence will make a big difference, but you’d have to put the human ESTs in as alt_ests, which takes longer to BLAST. </div>
<div class=""><br class="">
</div>
<div class="">I think a good rule of thumb for the protein evidence is to use proteins from two genomes that are about the same distance from your target genome, plus a third set from a genome that’s an outgroup to all three genomes. Another good source
for protein evidence is the UniRef database. </div>
<div class=""><br class="">
</div>
<div class="">Let me know if that helps, </div>
<div class="">Daniel</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
<div>
<blockquote type="cite" class="">
<div class="">On May 18, 2015, at 7:53 AM, Julian Egger &lt;<a href="mailto:julian.egger@omahazoo.com" class="">julian.egger@omahazoo.com</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="WordSection1" style="page: WordSection1; font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class="">Hi Carson,<br class="">
<br class="">
Thank you for the response. I had assumed using human transcripts instead of ESTs might be the way to go, but I had just read so much about using ESTs for annotation. You said to use either the human proteome or human transcripts; would using both in the
MAKER setup file be too inefficient? As for using either data type, since we are trying to annotate as many genes as possible from our scaffolds, we are looking for a single file to use. For either mRNA transcripts or protein sequences, would using data
from<span class="Apple-converted-space"> </span><a href="ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/mRNA_Prot/" style="color: purple; text-decoration: underline;" class="">ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/mRNA_Prot/</a> be good reference data for our
scaffolds? That directory has both protein files and RNA files. I am not sure if a good option would be to concatenate either the RNA files or the protein files together and use that as the est= or protein= file. Otherwise, is there a better reference source
people use for MAKER?<br class="">
<br class="">
Thanks,<br class="">
<br class="">
Julian<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class=""> </span></div>
<div class="">
<div style="border-style: solid none none; border-top-color: rgb(225, 225, 225); border-top-width: 1pt; padding: 3pt 0in 0in;" class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<b class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif;" class="">From:</span></b><span style="font-size: 11pt; font-family: Calibri, sans-serif;" class=""><span class="Apple-converted-space"> </span>Carson Holt [<a href="mailto:carsonhh@gmail.com" class="">mailto:carsonhh@gmail.com</a>]<span class="Apple-converted-space"> </span><br class="">
<b class="">Sent:</b><span class="Apple-converted-space"> </span>Friday, May 15, 2015 10:31 AM<br class="">
<b class="">To:</b><span class="Apple-converted-space"> </span>Julian Egger<br class="">
<b class="">Cc:</b><span class="Apple-converted-space"> </span><a href="mailto:maker-devel@yandell-lab.org" class="">maker-devel@yandell-lab.org</a><br class="">
<b class="">Subject:</b><span class="Apple-converted-space"> </span>Re: [maker-devel] Non-redundant Reference Human EST Data<o:p class=""></o:p></span></div>
</div>
</div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<o:p class=""> </o:p></div>
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
Hi Julian, <o:p class=""></o:p></div>
</div>
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<o:p class=""> </o:p></div>
</div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
Using human ESTs on primate contigs would be very inefficient. The human genome is already annotated, so you should instead use either the human proteome or human transcripts as input. ESTs from a species other than the one being annotated should
only be used if there is no curated annotation set to use instead. <o:p class=""></o:p></div>
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<o:p class=""> </o:p></div>
</div>
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
You may be able to just give the human transcripts to the est= option if the two organisms have not diverged too much in nucleotide sequence. Don’t use the alt_est option, since you have human protein annotations. The alt_est option uses tblastx to seed the
alignments, which will not be as accurate as the protein= option that seeds via blastx, and it is about 10 times more expensive computationally. So it will take a lot longer, and you won’t get anything that you couldn’t have found using the protein data instead.<o:p class=""></o:p></div>
</div>
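<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<br class=""></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
As a rough illustration of that setup, the snippet below prints the three evidence lines in key=value form. It is a sketch under assumptions: the file names are hypothetical, and the alternate-EST key is written as altest= on the assumption that this is how the option is spelled in the control file, so check it against your own maker_opts.ctl.</div>
<pre class="">
```python
def evidence_settings(transcript_fasta, protein_fasta):
    """Build evidence lines in key=value form for a MAKER control file."""
    settings = {
        "est": transcript_fasta,   # nucleotide evidence (human transcripts)
        "protein": protein_fasta,  # amino acid evidence, seeded via blastx
        "altest": "",              # assumed key name; left empty to skip tblastx
    }
    return [f"{key}={value}" for key, value in settings.items()]


# Hypothetical file names, only to show the output shape.
for line in evidence_settings("human_rna.fa", "human_protein.fa"):
    print(line)
```
</pre>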
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<o:p class=""> </o:p></div>
</div>
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
Also, scaffolds shorter than about 10 kb will likely be too short to annotate, so you can test your parameters on just a few of the largest contigs. In addition, don’t use SNAP, because it performs poorly on primate genomes (use Augustus instead).<o:p class=""></o:p></div>
</div>
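<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<br class=""></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
One way to pull out such a test set is sketched below. It is a minimal example, not a MAKER utility: the function names, the default of five scaffolds, and the demo records are all assumptions made for illustration, and the 10,000-base cutoff simply mirrors the rough 10 kb figure above.</div>
<pre class="">
```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def largest_scaffolds(lines, min_length=10000, top_n=5):
    """Return the top_n longest records that are at least min_length bases."""
    records = [(h, s) for h, s in read_fasta(lines) if len(s) >= min_length]
    records.sort(key=lambda rec: len(rec[1]), reverse=True)
    return records[:top_n]


# Hypothetical scaffolds, only to show the call.
demo = [">scaf_a", "A" * 12000, ">scaf_b", "C" * 500, ">scaf_c", "G" * 30000]
for name, seq in largest_scaffolds(demo, top_n=2):
    print(name, len(seq))  # prints scaf_c 30000, then scaf_a 12000
```
</pre>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
The selected records can then be written to a small FASTA file and used as the genome input for a trial run.</div>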
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<o:p class=""> </o:p></div>
</div>
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
Thanks,<o:p class=""></o:p></div>
</div>
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
Carson<o:p class=""></o:p></div>
</div>
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<o:p class=""> </o:p></div>
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<o:p class=""> </o:p></div>
</div>
<div class="">
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<o:p class=""> </o:p></div>
<div class="">
<blockquote style="margin-top: 5pt; margin-bottom: 5pt;" class="">
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
On May 14, 2015, at 12:17 PM, Julian Egger &lt;<a href="mailto:julian.egger@omahazoo.com" style="color: purple; text-decoration: underline;" class="">julian.egger@omahazoo.com</a>&gt; wrote:<o:p class=""></o:p></div>
</div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<o:p class=""> </o:p></div>
<div class="">
<div class="">
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<span style="font-size: 10pt; font-family: Tahoma, sans-serif;" class="">We have assembled scaffolds from genomic reads of a primate sample and would like to annotate as many genes as possible with MAKER. Where is the best place to find an EST file to use
with MAKER containing all of the non-redundant reference human ESTs? I was trying to look around NCBI, Ensembl, and UCSC, but I am not sure what the FTP site, subdirectory, and file name would be for something like that.<br class="">
<br class="">
Thanks <span class="apple-converted-space"> </span><o:p class=""></o:p></span></div>
</div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<span style="font-size: 10pt; font-family: Tahoma, sans-serif;" class=""> </span></div>
</div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<span style="font-size: 9pt; font-family: Helvetica, sans-serif;" class="">_______________________________________________<br class="">
maker-devel mailing list<br class="">
</span><a href="mailto:maker-devel@box290.bluehost.com" style="color: purple; text-decoration: underline;" class=""><span style="font-size: 9pt; font-family: Helvetica, sans-serif;" class="">maker-devel@box290.bluehost.com</span></a><span style="font-size: 9pt; font-family: Helvetica, sans-serif;" class=""><br class="">
</span><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org" style="color: purple; text-decoration: underline;" class=""><span style="font-size: 9pt; font-family: Helvetica, sans-serif;" class="">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</span></a><o:p class=""></o:p></div>
</div>
</blockquote>
</div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;" class="">
<o:p class=""> </o:p></div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</body>
</html>