[maker-devel] some problems using MAKER

Carson Holt carsonhh at gmail.com
Mon Jan 5 19:59:23 MST 2015


I’d have to see the two GFF3 files you are using for your comparison.

However, one thing that comes to mind is that you may be unfamiliar with eval’s output. Eval reports sensitivity and specificity (SN/SP) at several levels of strictness: gene, transcript, exon, and base pair. If you are using the gene-level numbers, for example, then a single base pair difference in any of the transcripts will cause the entire gene to be counted as a mismatch. You should really only use the base-pair-level SN/SP for your comparison, which is also in the eval report. At most, the exon-level SN/SP may be used, but in general no gold standard dataset is considered perfect enough to justify the gene-level SN/SP (or usually even the exon-level SN/SP).
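
To make the difference between strictness levels concrete, here is a minimal sketch in Python. It is not eval's implementation and does not read GFF3. The function names and the toy coordinates are hypothetical, and it assumes the usual gene-finding convention that SN = TP / (TP + FN) and SP = TP / (TP + FP).

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]   # 1-based inclusive (start, end), as in GFF3
Gene = Tuple[Exon, ...]  # one gene = an ordered tuple of exons


def covered_bases(genes: List[Gene]) -> Set[int]:
    """Return the set of genomic positions covered by any exon."""
    bases: Set[int] = set()
    for exons in genes:
        for start, end in exons:
            bases.update(range(start, end + 1))
    return bases


def sn_sp(true_pos: int, n_reference: int, n_predicted: int) -> Tuple[float, float]:
    """SN = TP / reference total, SP = TP / predicted total."""
    sn = true_pos / n_reference if n_reference else 0.0
    sp = true_pos / n_predicted if n_predicted else 0.0
    return sn, sp


def base_level(reference: List[Gene], predicted: List[Gene]) -> Tuple[float, float]:
    """Score each base independently."""
    ref, pred = covered_bases(reference), covered_bases(predicted)
    return sn_sp(len(ref & pred), len(ref), len(pred))


def gene_level(reference: List[Gene], predicted: List[Gene]) -> Tuple[float, float]:
    """A gene counts as correct only if every exon boundary matches exactly."""
    ref, pred = set(reference), set(predicted)
    return sn_sp(len(ref & pred), len(ref), len(pred))


# Hypothetical toy data: the first predicted gene is off by one base
# at the end of its second exon; the second gene matches exactly.
reference = [((100, 200), (300, 400)), ((1000, 1100),)]
predicted = [((100, 200), (300, 401)), ((1000, 1100),)]

print("base level SN/SP:", base_level(reference, predicted))
print("gene level SN/SP:", gene_level(reference, predicted))
```

With this toy data, a single extra base at one exon boundary drops the gene-level SN/SP to 0.5 while the base-level SN/SP stays at or very near 1.0, which is why the gene-level numbers can look alarming even for a good annotation.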

—Carson



> On Dec 31, 2014, at 6:48 PM, 赵越 <jerryzhaosjtu at gmail.com> wrote:
> 
> Hi all,
> 
> I have recently been using MAKER to annotate a single rice chromosome as a pilot experiment, and I have run into some problems. After annotation, when I use eval to compare my result against the gold standard, the gene sensitivity and specificity are only around 20%. When I then fed the GFF3 file that MAKER itself produced into a second MAKER run, the result was even worse than 20%.
> 
> My inputs are a Trinity-processed RNA-seq file and a protein file. I chose SNAP, Augustus, and GeneMark as ab initio predictors.
> 
> Here is my maker_opts.ctl:
> 
> #-----Genome (these are always required)
> genome=chr12.fasta #genome sequence (fasta file or fasta embedded in GFF3 file)
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
> 
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff=chr12.gff #MAKER derived GFF3 file
> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no
> 
> #-----EST Evidence (for best results provide a file for at least one)
> est=rna-seq_trinity.fasta #set of ESTs or assembled mRNA-seq in fasta format
> altest= #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closely related species in GFF3 format
> 
> #-----Protein Homology Evidence (for best results provide a file for at least one)
> protein=Osativa_193_peptide.fa #protein sequence file in fasta format (i.e. from multiple organisms)
> protein_gff= #aligned protein homology evidence from an external GFF3 file
> 
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org=Rice #select a model organism for RepBase masking in RepeatMasker
> rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
> repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
> softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
> 
> #-----Gene Prediction
> snaphmm=rice #SNAP HMM file
> gmhmm=/lustre/home/clswcc/yzhao/MAKER/maker/exe/genemark_hmm_euk_linux_64/ehmm/o_sativa.mod #GeneMark HMM file
> augustus_species=arabidopsis #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff=augus.gff3 #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> unmask=1 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
> 
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3 file
> 
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
> cpus=16 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
> 
> 
> Could you help me? Thank you!
> 
> 
> 
> -- 
> Yue Zhao (Jerry)
> Bachelor Candidate of Plant Biotechnology
> Researcher in UCLA-CSST program
> Shanghai Jiao Tong University, Shanghai
> jerryzhaosjtu at gmail.com
