[maker-devel] How to explain the maker results?

dcg at cau.edu.cn dcg at cau.edu.cn
Wed May 3 09:29:18 MDT 2017


Dear sir:
    I've been using MAKER for my genome annotation. However, there are still a few things I don't understand:

    1. After assembly, I have many contigs. First, I ran MAKER with est2genome=1 and protein2genome=1, using my proteins, ESTs and RNA-seq data. Which of the two approaches below is correct?
    1.1 Each contig has its own GFF. I use only that contig's maker_gff file to build a pyu.hmm (for SNAP training), and then use it for the training round on that single contig.
    1.2 I merge all the maker_gff files to build one pyu.hmm (for SNAP), and then use this pyu.hmm for the training round on all the contigs.
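
    To make option 1.2 concrete, here is a rough sketch of what I mean by merging the per-contig GFF files. MAKER ships its own gff3_merge script for this; the Python below is only an illustration, and the function name merge_gff3 is hypothetical. It assumes each input is a GFF3 document and simply drops any embedded ##FASTA section.

```python
def merge_gff3(texts):
    """Merge several GFF3 documents (given as strings) into one GFF3 string.

    Illustration only: keeps a single version header, skips blank lines,
    and discards any embedded ##FASTA sequence section.
    """
    out = ["##gff-version 3"]
    for text in texts:
        for line in text.splitlines():
            if line.startswith("##FASTA"):
                break  # everything after this marker is sequence, not annotation
            if line.startswith("##gff-version") or not line.strip():
                continue  # header is written once; blank lines are dropped
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    contig1 = "##gff-version 3\ncontig1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1\n"
    contig2 = "##gff-version 3\ncontig2\tmaker\tgene\t5\t50\t.\t-\t.\tID=g2\n"
    print(merge_gff3([contig1, contig2]), end="")
```

    The merged file would then be the single input for building pyu.hmm, instead of one HMM per contig as in option 1.1.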
    
    2. The aim of my project is to find new proteins, so I need to guarantee the rigor of my annotation.
        My plan is to require that a predicted protein aligns to UniProt (reviewed proteins, about 30K in total) with 100% identity and 100% coverage.
        However, if I choose method 1.2 above:
        After the first step (est2genome=1 and protein2genome=1), about 1600 proteins align to UniProt at 100%. After two rounds of training (est2genome=0 and protein2genome=0), fewer proteins align at 100%.
        Is my test method reasonable? Why don't the final results contain more well-aligned proteins?
        After training and fasta_merge, the outputs are index_all.log.all.maker.proteins.fasta, index_all.log.all.maker.snap_masked.proteins.fasta and index_all.log.all.maker.non_overlapping_ab_initio.proteins.fasta. Which of these is the final result?
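
        For clarity, the 100% identity and coverage check in point 2 can be sketched roughly as below. This is only an illustration under stated assumptions: the alignments are BLAST+ tabular output produced with -outfmt "6 qseqid sseqid pident length qlen slen", and the function name perfect_hits is hypothetical. A hit counts only if identity is 100% and the alignment spans the full length of both the predicted protein and the UniProt entry.

```python
def perfect_hits(blast_lines):
    """Return the set of query IDs that have at least one perfect hit.

    Assumed column order (tab-separated):
    qseqid, sseqid, pident, length, qlen, slen
    """
    perfect = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, pident, length, qlen, slen = line.split("\t")[:6]
        full_length = int(length) == int(qlen) == int(slen)
        if float(pident) == 100.0 and full_length:
            perfect.add(qseqid)
    return perfect


if __name__ == "__main__":
    rows = [
        "p1\tsp|A\t100.000\t250\t250\t250",  # perfect, full-length match
        "p2\tsp|B\t100.000\t200\t250\t200",  # identical but covers only part of the query
        "p3\tsp|C\t99.5\t300\t300\t300",     # full length but not 100% identical
    ]
    print(len(perfect_hits(rows)))
```

        Counting the returned set after each MAKER round gives the numbers compared in point 2.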

    
    I look forward to hearing from you. Thanks!
Yours sincerely,

     
Chao Chao


dcg at cau.edu.cn