[maker-devel] collecting protein sequences as evidences

Quanwei Zhang qwzhang0601 at gmail.com
Tue Jan 31 10:36:13 MST 2017


What is the best way to collect protein sequences as evidence for
annotating a de novo genome assembly?
(1) My first choice is to download human and mouse protein sequences from
UniProt. I am not sure whether I should use only the reviewed entries
(i.e., Swiss-Prot) or also the automatically annotated ones (i.e.,
TrEMBL).
(2) I have also obtained protein sequences from NCBI. Can I simply merge
the FASTA files? Does it matter if they contain redundant sequences? Also,
protein sequences from different sources may not be of the same quality.
Do I need to do any processing before I combine protein sequences from
different sources?
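
To make the merging question in (2) concrete, here is a minimal sketch of
what "merge and remove redundancies" could look like. It only drops records
whose sequences are exactly identical (ignoring case and a trailing "*" stop
symbol) and keeps the first header seen. The function names and file names
are hypothetical, and this does not address differences in quality between
sources.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_unique(paths, out_path, width=60):
    """Merge FASTA files, keeping the first record for each distinct sequence.

    Returns the number of records written.
    """
    seen = set()
    kept = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for header, seq in read_fasta(handle):
                    # Normalize so that case and a trailing stop symbol
                    # do not hide exact duplicates.
                    key = seq.upper().rstrip("*")
                    if not key or key in seen:
                        continue
                    seen.add(key)
                    out.write(f">{header}\n")
                    for i in range(0, len(key), width):
                        out.write(key[i:i + width] + "\n")
                    kept += 1
    return kept
```

For example, merge_unique(["uniprot.fasta", "ncbi.fasta"], "merged.fasta")
would write the combined, exact-duplicate-free set to merged.fasta, where
the input file names are placeholders for the downloaded files.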

Many thanks

Best
Quanwei