<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
Hi Allison,
<div><br>
</div>
<div>I think that both the Ensembl proteome dataset and the UniProt/Swiss-Prot dataset are likely to be useful for your annotation project. The omnibus nature of Swiss-Prot (or one of the reduced datasets like UniRef90) helps to make sure that any proteins that might
be missing from the closely related species’ proteomes (or the transcriptome datasets) can still be identified. </div>
<div><br>
</div>
<div>Since the UniProt proteomes for teleost fish mostly contain unreviewed protein sequences, you probably want to avoid those. It’s also worth noting that the transcriptome dataset is what you need for identifying features like 3’ and 5’ UTRs and refining the structure
of the gene models, so there’s a limit to the improvement you’ll see in the MAKER results from increasing the size of the proteome dataset.</div>
<div><br>
</div>
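<div>In case it is useful, here is a minimal Python sketch (not part of MAKER) of one way to drop the unreviewed entries from a UniProt FASTA download. It assumes the standard UniProt FASTA header convention, where reviewed Swiss-Prot records start with “&gt;sp|” and unreviewed TrEMBL records start with “&gt;tr|”; the function names and file names are only placeholders.</div>
<pre>
```python
def keep_reviewed(fasta_lines):
    """Yield only the lines of records whose header starts with '>sp|'.

    Assumes UniProt-style FASTA headers: 'sp|' marks reviewed (Swiss-Prot)
    entries and 'tr|' marks unreviewed (TrEMBL) entries.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # A new record begins; decide whether to keep it.
            keep = line.startswith(">sp|")
        if keep:
            yield line


def filter_file(in_path, out_path):
    """Copy the reviewed records of in_path to out_path; return how many."""
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in keep_reviewed(src):
            if line.startswith(">"):
                kept += 1
            dst.write(line)
    return kept
```
</pre>
<div>For example, filter_file("proteome.fasta", "proteome.reviewed.fasta") would write only the Swiss-Prot records to the second file (both file names are hypothetical).</div>
<div><br>
</div>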
<div>I hope that helps. Feel free to let us know if you have any more questions. </div>
<div><br>
</div>
<div>Thanks,</div>
<div>Daniel</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>
<div>
<div>On Oct 20, 2014, at 3:39 PM, Allison Fuiten &lt;<a href="mailto:allisonfuiten@gmail.com">allisonfuiten@gmail.com</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div dir="ltr">
<p class="MsoNormal">Hello,</p>
<div> <br class="webkit-block-placeholder">
</div>
<p class="MsoNormal">I am currently using Maker to annotate a <i>de novo</i> genome assembly for a teleost fish. I would like some clarification that I am using an appropriate set of protein evidence for the annotation pipeline. For mRNA/EST evidence, I am
using two independent transcriptomes (assembled with Trinity) from my specific species.
</p>
<div> <br class="webkit-block-placeholder">
</div>
<p class="MsoNormal">For the protein evidence, I am planning on using two proteomes from closely related model teleost species from the Ensembl database. From Ensembl, you can download all protein translations for a given species either resulting from known
or novel gene models which are based on transcriptome &amp; proteome data (the ‘pep.all.fa’ file) or resulting from 'ab initio' gene prediction algorithms solely based on the genomic sequence with no other experimental evidence (‘pep.abinitio.fa’ file). I’m planning
on downloading the pep.all fasta files.</p>
<div> <br class="webkit-block-placeholder">
</div>
<p class="MsoNormal">Alternatively, after reading various posts on the Maker google group, I realize that I can also download proteomes from teleost fish from UniProt (<a href="http://www.uniprot.org/proteomes">www.uniprot.org/proteomes</a>). UniProt proteomes
can contain both reviewed and unreviewed protein sequences and for the fish species I’m interested in downloading, they mostly contain unreviewed proteins.
</p>
<div> <br class="webkit-block-placeholder">
</div>
<p class="MsoNormal">Do you recommend that I use the UniProt proteomes instead of the Ensembl proteomes?</p>
<div> <br class="webkit-block-placeholder">
</div>
<p class="MsoNormal">Also, there are actually four different model teleost species with available proteomes that are equally related to my teleost species. They’re all in different taxonomic orders, but that’s as closely related as I can get! Should I stick
to just using proteomes from two species or should I up it to three or four? </p>
<div> <br class="webkit-block-placeholder">
</div>
<p class="MsoNormal">In addition, I have read in previous posts that you recommend using a comprehensive set of proteins from UniProt/Swiss-Prot. Avoiding the unreviewed UniProt/TrEMBL datasets, should I download the complete, reviewed set of all UniProt/Swiss-Prot
proteins (<a>uniprot_sprot.fasta.gz</a>)? Under taxonomic divisions, there seems to be an option to download just vertebrate UniProt/Swiss-Prot proteins (the
<a>uniprot_sprot_vertebrates.dat.gz</a> file). This only seems to be available in a .dat file format, but converting a .dat file into a .fasta seems to be possible.</p>
<div> <br class="webkit-block-placeholder">
</div>
<p class="MsoNormal">My apologies if you have already answered these questions in the past. Any help on these points will be greatly appreciated.
</p>
<div> <br class="webkit-block-placeholder">
</div>
<p class="MsoNormal">Thank you,</p>
<p class="MsoNormal"><br>
</p>
<p class="MsoNormal">Allison</p>
</div>
_______________________________________________<br>
maker-devel mailing list<br>
<a href="mailto:maker-devel@box290.bluehost.com">maker-devel@box290.bluehost.com</a><br>
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org<br>
</blockquote>
</div>
<br>
</div>
</body>
</html>