<div dir="ltr">
<p class="MsoNormal">Hello,</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">I am currently using Maker to annotate a <i>de novo</i> genome
assembly for a teleost fish. I would like to confirm that I am using an
appropriate set of protein evidence for the annotation pipeline. For mRNA/EST
evidence, I am using two independent transcriptomes (assembled with Trinity)
from my species. </p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">For the protein evidence, I am planning to use two
proteomes from closely related model teleost species in the Ensembl database.
From Ensembl, you can download all protein translations for a given species in
one of two files: the ‘pep.all.fa’ file, which contains translations of known
and novel gene models built with transcriptome and proteome evidence, or the
‘pep.abinitio.fa’ file, which contains translations from <i>ab initio</i> gene
prediction algorithms based solely on the genomic sequence, with no other
experimental evidence. I’m planning to download the pep.all FASTA files.</p>
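<p class="MsoNormal">Because pep.all.fa holds one entry per translated transcript, a gene with several isoforms appears several times. A minimal Python sketch for checking this before using the file as evidence, assuming the usual Ensembl header layout (">PROTEIN_ID pep LOCATION gene:... transcript:... transcript_biotype:..."). The function names and the example records are hypothetical.</p>

```python
from collections import Counter


def parse_ensembl_header(header):
    """Split an Ensembl pep header into its protein ID and key:value fields.

    Assumes the layout '>PROTEIN_ID pep LOCATION gene:... transcript:... ...'.
    Tokens without a colon (e.g. 'pep', description words) are skipped, and
    only the first occurrence of each key is kept.
    """
    tokens = header.lstrip(">").split()
    fields = {}
    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if sep and key not in fields:
            fields[key] = value
    return tokens[0], fields


def translations_per_gene(lines):
    """Count how many protein translations each gene contributes."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            _, fields = parse_ensembl_header(line)
            counts[fields.get("gene", "unknown")] += 1
    return counts


# Hypothetical records: two isoforms of one gene, one isoform of another.
example = [
    ">ENSXXXP0001.1 pep chromosome:ASM1:1:100:900:1 gene:ENSXXXG0001.1 "
    "transcript:ENSXXXT0001.1 transcript_biotype:protein_coding",
    "MAEKLV",
    ">ENSXXXP0002.1 pep chromosome:ASM1:1:100:950:1 gene:ENSXXXG0001.1 "
    "transcript:ENSXXXT0002.1 transcript_biotype:protein_coding",
    "MAEKLVQ",
    ">ENSXXXP0003.1 pep chromosome:ASM1:2:10:400:-1 gene:ENSXXXG0002.1 "
    "transcript:ENSXXXT0003.1 transcript_biotype:protein_coding",
    "MSTT",
]

print(translations_per_gene(example))
```

<p class="MsoNormal">On a real file, pass the lines of the (decompressed) pep.all.fa instead of the example list.</p>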
<p class="MsoNormal"> </p>
<p class="MsoNormal">Alternatively, after reading various posts on the Maker
Google group, I realize that I can also download teleost fish proteomes
from UniProt (<a href="http://www.uniprot.org/proteomes">www.uniprot.org/proteomes</a>).
UniProt proteomes can contain both reviewed (Swiss-Prot) and unreviewed (TrEMBL)
protein sequences, and for the fish species I’m interested in, they mostly contain
unreviewed proteins. </p>
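<p class="MsoNormal">In UniProt FASTA downloads, reviewed (Swiss-Prot) entries have headers starting with "sp|" and unreviewed (TrEMBL) entries start with "tr|". A minimal Python sketch that separates the two in a downloaded proteome. The function names and the example accessions are hypothetical.</p>

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_review_status(lines):
    """Partition UniProt FASTA records by the database tag before the first '|'.

    'sp' marks reviewed (Swiss-Prot) entries and 'tr' unreviewed (TrEMBL)
    entries; records with any other tag are ignored.
    """
    reviewed, unreviewed = [], []
    for header, seq in read_fasta(lines):
        db = header.split("|", 1)[0]
        if db == "sp":
            reviewed.append((header, seq))
        elif db == "tr":
            unreviewed.append((header, seq))
    return reviewed, unreviewed


# Hypothetical proteome excerpt (accessions and sequences are made up).
example_proteome = [
    ">sp|P0XXX1|EXMP1_DANRE Example protein 1 OS=Danio rerio OX=7955 GN=exmp1 PE=1 SV=1",
    "MAEKLVQSTT",
    "GHRPL",
    ">tr|A0XXX2|A0XXX2_DANRE Example protein 2 OS=Danio rerio OX=7955 PE=4 SV=1",
    "MSTTAW",
    ">tr|A0XXX3|A0XXX3_DANRE Example protein 3 OS=Danio rerio OX=7955 PE=4 SV=1",
    "MKKLE",
]

reviewed, unreviewed = split_by_review_status(example_proteome)
print(len(reviewed), "reviewed,", len(unreviewed), "unreviewed")
```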
<p class="MsoNormal"> </p>
<p class="MsoNormal">Do you recommend that I use the UniProt proteomes instead of
the Ensembl proteomes?</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Also, there are actually four different model teleost
species with available proteomes that are equally related to my teleost
species. They’re all in different taxonomic orders, but that’s as closely
related as I can get! Should I stick to using proteomes from just two species,
or should I increase it to three or four? </p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">In addition, I have read in previous posts that you
recommend using a comprehensive set of proteins from UniProt/Swiss-Prot.
To avoid the unreviewed UniProt/TrEMBL datasets, should I download the
complete, reviewed set of all UniProt/Swiss-Prot proteins (<a>uniprot_sprot.fasta.gz</a>)?
Under taxonomic divisions, there also seems to be an option to download only the
vertebrate UniProt/Swiss-Prot proteins (the <a>uniprot_sprot_vertebrates.dat.gz</a>
file). This file only seems to be available in the .dat (UniProtKB flat file) format,
but converting a .dat file to FASTA appears to be possible.</p>
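<p class="MsoNormal">A minimal sketch of the .dat-to-FASTA conversion in plain Python, assuming the standard UniProtKB flat-file layout: two-letter line codes such as ID, AC and SQ, indented sequence lines in blocks of ten residues, and "//" ending each entry. The function names, the simplified "sp|ACCESSION|ENTRY_NAME" header and the example entry are hypothetical; real UniProt FASTA headers also carry the description, OS, OX and other fields, which this sketch drops.</p>

```python
import gzip


def dat_to_fasta(lines, width=60):
    """Yield FASTA lines converted from UniProtKB flat-file (.dat) lines.

    Simplified: the header is 'sp|ACCESSION|ENTRY_NAME', using the first
    accession on the first AC line of each entry. DE/OS/OX are not kept.
    """
    name, accession, seq, in_seq = None, None, [], False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            # End of entry: emit the record, wrapping the sequence at `width`.
            residues = "".join(seq)
            yield ">sp|%s|%s\n" % (accession, name)
            for i in range(0, len(residues), width):
                yield residues[i:i + width] + "\n"
            name, accession, seq, in_seq = None, None, [], False
        elif in_seq:
            # Sequence lines: drop the indentation and the spaces between blocks.
            seq.append("".join(line.split()))
        elif line.startswith("ID   "):
            name = line.split()[1]
        elif line.startswith("AC   ") and accession is None:
            accession = line[5:].split(";")[0].strip()
        elif line.startswith("SQ   "):
            in_seq = True


def convert_file(dat_gz_path, fasta_path):
    """Stream a gzipped .dat file (e.g. uniprot_sprot_vertebrates.dat.gz) to FASTA."""
    with gzip.open(dat_gz_path, "rt") as src, open(fasta_path, "w") as dst:
        dst.writelines(dat_to_fasta(src))


# Hypothetical entry (accessions, MW and CRC64 values are made up).
example_entry = [
    "ID   EXMP1_DANRE             Reviewed;          25 AA.\n",
    "AC   P0XXX1; Q0XXX2;\n",
    "DE   RecName: Full=Hypothetical example protein;\n",
    "OS   Danio rerio (Zebrafish) (Brachydanio rerio).\n",
    "SQ   SEQUENCE   25 AA;  2800 MW;  0000000000000000 CRC64;\n",
    "     MAEKLVQSTT GHRPLAEWCF YIDNK\n",
    "//\n",
]

print("".join(dat_to_fasta(example_entry)), end="")
```

<p class="MsoNormal">If Biopython is available, Bio.SeqIO.convert with the "swiss" input format and "fasta" output format should do the same job and keep the full description line.</p>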
<p class="MsoNormal"> </p>
<p class="MsoNormal">My apologies if you have already answered these questions in
the past. Any help on these points will be greatly appreciated. </p>
<p class="MsoNormal"> </p><p class="MsoNormal">Thank you,</p><p class="MsoNormal"><br></p>
<p class="MsoNormal">Allison</p>
</div>