<div dir="ltr"><div><div><div>I wonder what the best way is to collect protein sequences for gene annotation of a de novo genome assembly. <br>(1) My first choice is to get human and mouse protein sequences from UniProt. At this step, I am not sure whether I should download the reviewed entries (i.e., Swiss-Prot) or the automatically annotated ones (i.e., TrEMBL). <br>(2) On the other hand, I also get protein sequences from NCBI. Should I simply merge those FASTA files? Does it matter if there are redundancies? Also, protein sequences from different sources may not be of the same quality. Do I need to do anything before I integrate them? <br><br></div>Many thanks<br><br></div>Best<br></div>Quanwei<br></div>