<div><P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>Hello MAKER users and developers,<BR><o:p><BR></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT size=3><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'">I’m trying to annotate a small fungal genome using Maker-2.27-beta. For testing purposes, I used only AUGUSTUS and GeneMark for de novo gene prediction, and I supplied the PASA-</SPAN><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'; mso-bidi-font-size: 10.5pt; mso-font-kerning: 0pt">assembled transcripts</SPAN><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"> through the est option. When MAKER2 finished, I used the gff3_merge and fasta_merge scripts to collect the results. The resulting protein files contain the following numbers of sequences: 5608 (augustus_masked), 6255 (genemark), 5084 (non-overlapping ab initio), and 254 (maker). My questions are:<BR><o:p><BR></o:p></SPAN></FONT></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3><SPAN style="mso-spacerun: yes"> </SPAN>1. Looking through the GFF file produced by MAKER2, I found that most of the predicted gene loci have EST matches. So why did MAKER2 produce only 254 gene annotations?<BR><o:p><BR></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-INDENT: 5.25pt; mso-char-indent-count: .5"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>2. In the “non-overlapping ab initio” file, all of the sequences come from the augustus_masked predictions. Does the non-overlapping file include only the best gene models predicted by both AUGUSTUS and GeneMark? Does it also include genes predicted only by GeneMark or only by AUGUSTUS?<o:p></o:p></FONT></SPAN></P>
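<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>To compare these counts directly, the minimal Python sketch below may help. It is not a MAKER tool. It tallies the features in a merged GFF3 file by (source, type), for example ('maker', 'gene') against the EST alignment features, and it tallies the records in each protein FASTA file by the prefix of the sequence ID before the first hyphen. That prefix rule assumes MAKER-style IDs such as augustus_masked-contig1-..., which is an assumption on my part. The script name tally.py and the file names are placeholders.<o:p></o:p></FONT></SPAN></P>

```python
import sys
from collections import Counter
from typing import Iterable


def count_fasta_by_prefix(lines: Iterable[str]) -> Counter:
    """Tally FASTA records by the part of the sequence ID before the first '-'.

    Assumes MAKER-style IDs such as 'augustus_masked-contig1-...'; adjust the
    rule if your headers are named differently.
    """
    counts: Counter = Counter()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            counts[seq_id.split("-", 1)[0]] += 1
    return counts


def count_gff3_features(lines: Iterable[str]) -> Counter:
    """Tally GFF3 feature lines by (source, type), e.g. ('maker', 'gene')."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:  # a GFF3 feature line has exactly 9 columns
            counts[(cols[1], cols[2])] += 1
    return counts


def main(argv: list) -> None:
    gff_path, *fasta_paths = argv
    with open(gff_path) as fh:
        for (source, ftype), n in sorted(count_gff3_features(fh).items()):
            print(f"{source}\t{ftype}\t{n}")
    for path in fasta_paths:
        with open(path) as fh:
            counts = count_fasta_by_prefix(fh)
        print(f"{path}: {sum(counts.values())} records {dict(counts)}")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1:])
    else:
        print("usage: python tally.py merged.gff [proteins.fasta ...]")
```

<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>For example, running python tally.py merged.gff non_overlapping.proteins.fasta would show whether the non-overlapping file contains any genemark-prefixed IDs alongside the augustus_masked ones.<o:p></o:p></FONT></SPAN></P>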
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><o:p><FONT size=3> </FONT></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>Thanks in advance for any advice. I appreciate your help!<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><o:p><FONT size=3> </FONT></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>Best,<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>Huiquan<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><o:p><FONT size=3> </FONT></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>My maker_opts.ctl file:<BR><o:p><BR></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>#-----Genome (these are always required)<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>genome=my_gnm.fa #genome sequence (fasta file or fasta embedded in GFF3 file)<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><o:p><FONT size=3> </FONT></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>#-----EST Evidence (for best results provide a file for at least one)<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>est=my_est.fa #set of ESTs or assembled mRNA-seq in fasta format<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>altest= #EST/cDNA sequence file in fasta format from an alternate organism<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>altest_gff= #aligned ESTs from a closely related species in GFF3 format<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><o:p><FONT size=3> </FONT></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>#-----Protein Homology Evidence (for best results provide a file for at least one)<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>protein=<SPAN style="mso-spacerun: yes"> </SPAN>#protein sequence file in fasta format (i.e. from multiple organisms)<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>protein_gff=<SPAN style="mso-spacerun: yes"> </SPAN>#aligned protein homology evidence from an external GFF3 file<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><o:p><FONT size=3> </FONT></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>#-----Repeat Masking (leave values blank to skip repeat masking)<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>model_org=fungi #select a model organism for RepBase masking in RepeatMasker<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>repeat_protein=RepeatPeps.lib #provide a fasta file of transposable element proteins for RepeatRunner<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>rm_gff= #pre-identified repeat elements from an external GFF3 file<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><o:p><FONT size=3> </FONT></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>#-----Gene Prediction<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>snaphmm= #SNAP HMM file<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>gmhmm=my_ges.mod #GeneMark HMM file<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>augustus_species=my2 #Augustus gene prediction species model<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>fgenesh_par_file= #FGENESH parameter file<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>pred_gff= #ab-initio predictions from an external GFF3 file<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><o:p><FONT size=3> </FONT></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>#-----Other Annotation Feature Types (features MAKER doesn't recognize)<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>other_gff= #extra features to pass-through to final MAKER generated GFF3 file<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><o:p><FONT size=3> </FONT></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>#-----External Application Behavior Options<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>cpus=14 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><o:p><FONT size=3> </FONT></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>#-----MAKER Behavior Options<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>min_contig=1 #skip genome contigs below this length (under 10kb are often useless)<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><o:p><FONT size=3> </FONT></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>pred_flank=200 #flank for extending evidence clusters sent to gene predictors<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>pred_stats=0 #report AED and QI statistics for all predictions as well as models<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>min_protein=20 #require at least this many amino acids in predicted proteins<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>always_complete=1 #extra steps to force start and stop codons, 1 = yes, 0 = no<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><o:p><FONT size=3> </FONT></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT size=3><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'; BACKGROUND: yellow; mso-highlight: yellow">split_hit=1500</SPAN><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"> #length for the splitting of hits (expected max intron size for evidence alignments)<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT size=3><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'; BACKGROUND: yellow; mso-highlight: yellow">single_exon=1</SPAN><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"> #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT size=3><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'; BACKGROUND: yellow; mso-highlight: yellow">single_length=200</SPAN><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"> #min length required for single exon ESTs if 'single_exon' is enabled<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT size=3><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'; BACKGROUND: yellow; mso-highlight: yellow">correct_est_fusion=1</SPAN><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"> #limits use of ESTs in annotation to avoid fusion genes<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><o:p><FONT size=3> </FONT></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>tries=2 #number of times to try a contig if there is a failure for some reason<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no<o:p></o:p></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman','serif'"><FONT size=3>clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no<o:p></o:p></FONT></SPAN></P><SPAN lang=EN-US style="FONT-SIZE: 10.5pt; FONT-FAMILY: 'Times New Roman','serif'; mso-bidi-font-size: 11.0pt; mso-fareast-font-family: 宋体; mso-fareast-theme-font: minor-fareast; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">TMP= #specify a directory other than the system default temporary directory for temporary files</SPAN><BR></div>