[maker-devel] Protein Evidence for teleost fish
Allison Fuiten
allisonfuiten at gmail.com
Mon Oct 20 15:39:20 MDT 2014
Hello,
I am currently using MAKER to annotate a *de novo* genome assembly for a
teleost fish. I would like to confirm that I am using an appropriate set
of protein evidence for the annotation pipeline. For mRNA/EST evidence, I
am using two independent transcriptomes (assembled with Trinity) from my
specific species.
For the protein evidence, I am planning to use two proteomes from closely
related model teleost species from the Ensembl database. For a given
species, Ensembl lets you download all protein translations either from
known or novel gene models, which are based on transcriptome and proteome
data (the ‘pep.all.fa’ file), or from ‘ab initio’ gene prediction
algorithms based solely on the genomic sequence with no other experimental
evidence (the ‘pep.abinitio.fa’ file). I’m planning to download the
pep.all FASTA files.
Alternatively, after reading various posts on the MAKER Google group, I
realize that I can also download proteomes for teleost fish from UniProt
(www.uniprot.org/proteomes). UniProt proteomes can contain both reviewed
and unreviewed protein sequences, and for the fish species I’m interested
in they contain mostly unreviewed proteins.
Do you recommend that I use the UniProt proteomes instead of the Ensembl
proteomes?
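
To gauge how much of a downloaded UniProt proteome is reviewed, the FASTA
headers can be tallied. This is a minimal sketch, assuming the standard
UniProt FASTA header prefixes (`sp|` for reviewed Swiss-Prot entries, `tr|`
for unreviewed TrEMBL entries). The function name and the example records
are made up for illustration and are not real UniProt entries.

```python
from collections import Counter


def count_review_status(fasta_lines):
    """Tally FASTA records by UniProt header prefix (sp = reviewed, tr = unreviewed)."""
    labels = {"sp": "reviewed", "tr": "unreviewed"}
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            # Header looks like ">sp|ACCESSION|ENTRY_NAME description"
            prefix = line[1:].split("|", 1)[0]
            counts[labels.get(prefix, "other")] += 1
    return counts


# Hypothetical records, for illustration only.
example = [
    ">sp|X00001|EXMP1_DANRE Hypothetical example protein 1",
    "MKTAYIAKQR",
    ">tr|X00002|X00002_DANRE Hypothetical example protein 2",
    "MSDNELLQRA",
    ">tr|X00003|X00003_DANRE Hypothetical example protein 3",
    "MAAPLLGVSR",
]

counts = count_review_status(example)
```

For a real proteome, the list of lines could be replaced by an open file
handle, since the function only iterates over lines.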
Also, there are actually four different model teleost species with
available proteomes that are equally related to my teleost species. They’re
all in different taxonomic orders, but that’s as closely related as I can
get! Should I stick to just using proteomes from two species or should I up
it to three or four?
In addition, I have read in previous posts that you recommend using a
comprehensive set of proteins from UniProt/Swiss-Prot. To avoid the
unreviewed UniProt/TrEMBL datasets, should I download the complete,
reviewed set of all UniProt/Swiss-Prot proteins (uniprot_sprot.fasta.gz)?
Under taxonomic divisions, there seems to be an option to download just
the vertebrate UniProt/Swiss-Prot proteins (the
uniprot_sprot_vertebrates.dat.gz file). This only seems to be available in
.dat format, but converting a .dat file into FASTA format seems to be
possible.
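
The .dat-to-FASTA conversion mentioned above can be sketched as follows.
This is a minimal sketch, not a full parser. It assumes the usual
Swiss-Prot flat-file layout: an `ID` line opens each entry, the first `AC`
line holds the primary accession, the sequence follows the `SQ` line, and
`//` closes the entry. The function names and the sample entry are
hypothetical, and the FASTA header keeps only the accession and entry name.

```python
def dat_to_records(lines):
    """Yield (accession, entry_name, sequence) per entry of Swiss-Prot flat-file text."""
    entry_name, accession, seq_parts, in_seq = None, None, [], False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("ID   "):
            entry_name = line.split()[1]
        elif line.startswith("AC   ") and accession is None:
            # Keep only the first (primary) accession.
            accession = line[5:].split(";")[0].strip()
        elif line.startswith("SQ   "):
            in_seq = True
        elif line.startswith("//"):
            if entry_name is not None:
                yield accession, entry_name, "".join(seq_parts)
            entry_name, accession, seq_parts, in_seq = None, None, [], False
        elif in_seq:
            # Sequence lines are indented and split into blocks of 10 residues.
            seq_parts.append(line.replace(" ", ""))


def format_fasta(accession, entry_name, sequence, width=60):
    """Format one record as FASTA text with a UniProt-style 'sp|' header."""
    body = "\n".join(sequence[i:i + width] for i in range(0, len(sequence), width))
    return f">sp|{accession}|{entry_name}\n{body}\n"


# Made-up entry for illustration; not a real Swiss-Prot record.
sample = """\
ID   EXMP_DANRE              Reviewed;          12 AA.
AC   X00000;
DE   RecName: Full=Hypothetical example protein;
SQ   SEQUENCE   12 AA;  1300 MW;  0000000000000000 CRC64;
     MKTAYIAKQR QI
//
"""

records = list(dat_to_records(sample.splitlines()))
fasta_text = "".join(format_fasta(*rec) for rec in records)
```

For the real download, the lines could come from
`gzip.open("uniprot_sprot_vertebrates.dat.gz", "rt")` instead of the sample
string. An established parser (for example Biopython's SeqIO, if it is
installed) is probably the safer choice for production use.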
My apologies if you have already answered these questions in the past. Any
help on these points will be greatly appreciated.
Thank you,
Allison