[maker-devel] How to explain the maker results?
dcg at cau.edu.cn
Wed May 3 09:29:18 MDT 2017
Dear Sir,
I've been using MAKER for my genome annotation, but there are still a few things I don't understand:
1. After assembly, I have many contigs. For the first run, I set est2genome=1 and protein2genome=1 and supplied my proteins, ESTs and RNA-seq data. Which of the following approaches is correct?
1.1 Each contig has its own GFF. I use each contig's own MAKER GFF file to build a pyu.hmm (for SNAP), and then train on that contig alone.
1.2 I merge all the MAKER GFF files to build a single pyu.hmm (for SNAP), and then use this pyu.hmm to train on all the contigs.
2. The aim of my project is to find new proteins, so I need to make sure my annotation is rigorous.
My plan is to check that the predicted proteins align to UniProt (reviewed proteins, about 30K in total) with 100% identity and 100% coverage.
However, if I choose method 1.2 above:
After the first step (est2genome=1 and protein2genome=1), about 1600 proteins align to UniProt at 100%. After 2 rounds of training (est2genome=0 and protein2genome=0), fewer proteins align at 100%.
Is my test method reasonable? Why don't the final results contain more well-aligned proteins?
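The test in point 2 can be sketched roughly as follows. This is only a minimal illustration, not the exact procedure used: it assumes the alignments are available as tab-separated BLAST output with the columns qseqid, sseqid, pident, length, qlen and slen (for example from blastp with -outfmt "6 qseqid sseqid pident length qlen slen"), and it treats "100% coverage" as the alignment spanning both the whole query and the whole subject. The function name and the example records are made up for the sketch.

```python
import csv
import io

# Assumed column layout (an assumption of this sketch), e.g. from
#   blastp -outfmt "6 qseqid sseqid pident length qlen slen"
COLUMNS = ("qseqid", "sseqid", "pident", "length", "qlen", "slen")


def perfect_matches(handle):
    """Return the set of query IDs that have at least one hit with
    100% identity and an alignment covering the full query and subject."""
    perfect = set()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(COLUMNS, row))
        identical = float(rec["pident"]) == 100.0
        # Full coverage: alignment length equals both sequence lengths.
        full = int(rec["length"]) == int(rec["qlen"]) == int(rec["slen"])
        if identical and full:
            perfect.add(rec["qseqid"])
    return perfect


# Made-up records, for illustration only.
example = (
    "prot1\tsp|P1\t100.000\t250\t250\t250\n"  # identical, full length
    "prot2\tsp|P2\t100.000\t180\t200\t180\n"  # identical, query not fully covered
    "prot3\tsp|P3\t98.500\t300\t300\t300\n"   # full length, not identical
)
result = perfect_matches(io.StringIO(example))
print(sorted(result))
```

Running the same count on the protein FASTA from each round would make the before-and-after comparison explicit.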
After training and fasta_merge, the output includes index_all.log.all.maker.proteins.fasta, index_all.log.all.maker.snap_masked.proteins.fasta and index_all.log.all.maker.non_overlapping_ab_initio.proteins.fasta. Which of these is the final result?
I'm looking forward to hearing from you. Thanks!
Yours sincerely,
Chao Chao
dcg at cau.edu.cn