[maker-devel] collecting protein sequences as evidences
Quanwei Zhang
qwzhang0601 at gmail.com
Thu Feb 2 09:16:51 MST 2017
Thanks for your suggestions. So it does not matter if there are redundant
protein sequences from different sources?
I am trying to annotate a *rodent* genome, and plan to collect protein
sequences of human, mouse, and rat from both UniProt and NCBI (I also
have RNA-seq data). I chose these species because they are close to the
species that I am working on and they are well annotated. But I saw someone
say that if we choose protein sequences from one lineage, genes that
are missing in that lineage will not be detected. And in the following
paper, the authors say they used the entire SwissProt database as the
input. What do you think about this? Should I include protein sequences from
more species (like all Eukaryota)? I think it could help us identify more
genes, but on the other hand, won't it also give us more false positives?
This paper used the entire SwissProt database as the input.
Insights into the evolution of longevity from the bowhead whale genome.
2015. *Cell Rep* *10*(1): 112-122.
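To make the redundancy question concrete, here is a rough Python sketch of my own; it is not anything that MAKER provides. It counts protein sequences that appear identically in more than one source, and builds the comma separated value that I assume goes on the `protein=` line of maker_opts.ctl. The function names and file names are made up for illustration.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def shared_sequences(sources):
    """Return {sequence: set of source names} for sequences found in 2+ sources.

    `sources` maps a source name to an iterable of FASTA lines
    (an open file handle works, as does a list of strings).
    Only exact matches are detected, ignoring case and a trailing '*'.
    """
    seen = defaultdict(set)
    for name, lines in sources.items():
        for _, seq in read_fasta(lines):
            seen[seq.upper().rstrip("*")].add(name)
    return {seq: names for seq, names in seen.items() if len(names) > 1}


def protein_option(paths):
    """Format the files as one comma separated option line (assumed syntax)."""
    return "protein=" + ",".join(paths)


if __name__ == "__main__":
    # Tiny in-memory stand-ins for real downloads (hypothetical records).
    uniprot = [">sp|EXAMPLE1", "MKT", "AAL", ">sp|EXAMPLE2", "MGG"]
    ncbi = [">NP_EXAMPLE1", "mktaal*", ">NP_EXAMPLE2", "MQQ"]
    dups = shared_sequences({"uniprot": uniprot, "ncbi": ncbi})
    print(len(dups), "sequence(s) shared between sources")
    # Hypothetical file names, kept as separate files rather than merged.
    print(protein_option(["uniprot_rodent.fasta", "ncbi_rodent.fasta"]))
```

This only catches identical sequences. Near-duplicates such as isoforms or fragments would not be found this way.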
Thanks
Best
Quanwei
2017-01-31 15:57 GMT-05:00 Michael Campbell <michael.s.campbell1 at gmail.com>:
> Hi Quanwei,
>
> (1) When I use UniProt I use Swiss-Prot and not TrEMBL.
> (2) I don’t merge files together. I just pass them all to MAKER as a
> comma-separated list.
>
> Thanks,
> Mike
>
> > On Jan 31, 2017, at 12:36 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
> >
> > I wonder what's the best way to collect protein sequences for gene
> > annotation of a de novo genome assembly.
> > (1) My first choice is to get protein sequences of human and mouse from
> > UniProt. At this step, I am not sure whether I should download the
> > reviewed ones (i.e., Swiss-Prot) or the automatically annotated ones
> > (i.e., TrEMBL).
> > (2) On the other hand, I can also get protein sequences from NCBI. Should
> > I simply merge those FASTA files? Does it matter if there are
> > redundancies? Also, if I get protein sequences from different sources,
> > they may not have the same quality. Do I need to do anything before I
> > integrate protein sequences from different sources?
> >
> > Many thanks
> >
> > Best
> > Quanwei