[maker-devel] Suggestions if too many predicted genes

Quanwei Zhang qwzhang0601 at gmail.com
Wed Sep 27 08:30:28 MDT 2017


Hello:

Thank you for all your previous comments and suggestions. We annotated a
new rodent species using the MAKER2 pipeline. The assembly is about 3.2 Gb
with an N50 of 24.3 Mb. I included all scaffolds longer than 300 bp for gene
annotation (about 250k scaffolds).

For repeat masking, we also built a species-specific library. We used both
transcriptome and protein sequences as evidence (including 10k reviewed
mammalian and 340k predicted rodent protein sequences from UniProt). We
predicted 28,800 genes with AED<1 (the "default" gene set).

Of the 28,800 predicted proteins, about 90% have an AED value below 0.5,
and 74% have domains detected by InterProScan. The genome seems to be well
annotated, but I still feel that 28,800 protein-coding genes are too many
for a rodent species. Do you think this gene set is suitable for downstream
analyses (e.g., gene family expansion analysis, positive selection
analysis)? Or should I filter further to bring the number of genes closer
to the expected number (e.g., 22,000)?
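
To make the AED-based filtering concrete, here is a minimal sketch of how
the fraction of transcripts below an AED cutoff could be computed, and how
a stricter subset could be selected. It assumes the MAKER GFF3 output
stores the score on mRNA lines as an "_AED=" attribute; the function names
and the 0.5 cutoff are only illustrative, not part of MAKER itself.

```python
import re

# Assumption: MAKER writes the score on mRNA features as "_AED=<value>".
# The leading (?:^|;) keeps the pattern from matching the "_eAED" attribute.
AED_PATTERN = re.compile(r"(?:^|;)_AED=([0-9.]+)")
ID_PATTERN = re.compile(r"(?:^|;)ID=([^;]+)")


def mrna_aed_scores(gff_lines):
    """Return {mRNA ID: AED} for every mRNA line carrying an _AED attribute."""
    scores = {}
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        aed = AED_PATTERN.search(cols[8])
        ident = ID_PATTERN.search(cols[8])
        if aed and ident:
            scores[ident.group(1)] = float(aed.group(1))
    return scores


def fraction_below(scores, cutoff):
    """Fraction of transcripts whose AED is strictly below the cutoff."""
    if not scores:
        return 0.0
    return sum(1 for v in scores.values() if v < cutoff) / len(scores)


def ids_below(scores, cutoff):
    """Sorted mRNA IDs that pass the stricter AED cutoff."""
    return sorted(i for i, v in scores.items() if v < cutoff)


if __name__ == "__main__":
    # Hypothetical two-transcript example, not real annotation data.
    example = [
        "##gff-version 3",
        "scf1\tmaker\tmRNA\t1\t100\t.\t+\t.\tID=g1-mRNA-1;Parent=g1;_AED=0.10;_eAED=0.10",
        "scf1\tmaker\tmRNA\t200\t300\t.\t-\t.\tID=g2-mRNA-1;Parent=g2;_AED=0.75;_eAED=0.80",
    ]
    scores = mrna_aed_scores(example)
    print(fraction_below(scores, 0.5), ids_below(scores, 0.5))
```

Note that this works per transcript; a gene with several mRNAs would need
to be kept if any of its transcripts passes the cutoff.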

Thanks

Best
Quanwei