<html dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<style type="text/css" id="owaParaStyle"></style>
</head>
<body fpstyle="1" ocsi="0">
<div style="direction: ltr;font-family: Tahoma;color: #000000;font-size: 10pt;">Hi Huiquan,
<div><br>
</div>
<div>1) MAKER's default behavior is to annotate a gene model only when it is supported by both the evidence (EST and protein alignments) and the ab-initio predictors.</div>
<div><br>
</div>
<div>How many transcripts did you get from PASA? I expect there are about 254 sequences, which is roughly the number of genes you annotated. If you want more gene models, you need to supply more evidence. For our annotation projects, we often use some
derivative of Swiss-Prot, which is a hand-curated database of proteins from across all kingdoms.</div>
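<div><br>
</div>
<div>If it helps, a quick way to check the transcript count is to count the FASTA header lines. The snippet below is only a minimal sketch: the function name and the in-memory example records are made up for illustration, and my_est.fa is the file name from your control file.</div>
<pre>
```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


# Tiny in-memory example; for a real file, pass an open file handle instead,
# e.g. with open("my_est.fa") as fh: print(count_fasta_records(fh))
example = [
    ">transcript_1\n",
    "ATGGCCAAGT\n",
    "TTGACC\n",
    ">transcript_2\n",
    "ATGCCC\n",
]
n_records = count_fasta_records(example)
print(n_records)
```
</pre>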
<div><br>
</div>
<div>2) The non-overlapping ab-initio file contains the ab-initio predictions that did not overlap any gene model. If Augustus and GeneMark predictions overlap each other, I think both should be included, but if one prediction completely covers the other, I think only the longer
of the two would be included.</div>
<div><br>
</div>
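<div>To make the rule in 2) concrete, here is a toy Python sketch of the filtering as I described it. It is only an illustration under my own assumptions, not MAKER's actual code: the function names, the coordinates, and the (source, start, end) tuples are made up for the example, and it ignores strand and contig.</div>
<pre>
```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return min(a_end, b_end) >= max(a_start, b_start)


def contains(outer, inner):
    """True if interval outer completely covers a different interval inner."""
    return outer != inner and inner[0] >= outer[0] and outer[1] >= inner[1]


def non_overlapping(predictions, models):
    """Keep predictions that overlap no gene model, then drop any kept
    prediction that is completely covered by another kept prediction."""
    free = [p for p in predictions
            if not any(overlaps(p[1], p[2], m[0], m[1]) for m in models)]
    return [p for p in free
            if not any(contains(q[1:], p[1:]) for q in free)]


# Hypothetical coordinates, for illustration only.
models = [(100, 500)]
predictions = [
    ("augustus", 150, 450),    # overlaps the gene model
    ("augustus", 1000, 2000),  # overlaps no model
    ("genemark", 1200, 1800),  # completely inside augustus 1000-2000
    ("genemark", 1900, 2600),  # partial overlap with augustus 1000-2000
    ("genemark", 3000, 3500),  # overlaps nothing
]
kept = non_overlapping(predictions, models)
for prediction in kept:
    print(prediction)
```
</pre>
<div>With this made-up input, the augustus prediction that overlaps the gene model and the genemark prediction nested inside the longer augustus prediction are dropped, while the partially overlapping pair is kept.</div>
<div><br>
</div>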
<div>Does that answer your questions?</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Daniel</div>
<div><br>
<div><br>
<div class="BodyFragment"><font size="2">
<div class="PlainText">Daniel Ence<br>
Graduate Student<br>
Eccles Institute of Human Genetics<br>
University of Utah<br>
15 North 2030 East, Room 2100<br>
Salt Lake City, UT 84112-5330</div>
</font></div>
</div>
<div style="font-family: Times New Roman; color: #000000; font-size: 16px">
<hr tabindex="-1">
<div id="divRpF354831" style="direction: ltr;"><font face="Tahoma" size="2" color="#000000"><b>From:</b> maker-devel-bounces@yandell-lab.org [maker-devel-bounces@yandell-lab.org] on behalf of Liu Huiquan [liuhuiquan@nwsuaf.edu.cn]<br>
<b>Sent:</b> Tuesday, April 16, 2013 2:16 AM<br>
<b>To:</b> maker-devel@yandell-lab.org<br>
<b>Subject:</b> [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files<br>
</font><br>
</div>
<div></div>
<div>
<div>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">Hello maker users and developers,<br>
<br>
</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><font size="3"><span lang="EN-US" style="font-family:'Times New Roman','serif'">I'm trying to annotate a small fungal genome using Maker-2.27-beta. For testing purposes, I used only augustus and genemark for
de novo gene prediction and supplied the PASA </span><span lang="EN-US" style="font-family:'Times New Roman','serif'">assembled transcripts</span><span lang="EN-US" style="font-family:'Times New Roman','serif'"> to the est option. When maker2 finished, I used
the gff3_merge and fasta_merge scripts to extract the results. There were 5608, 6255, 5084, and 254 sequences in the resulting protein files: augustus_masked, genemark, non-overlapping ab initio, and maker, respectively. My questions are:<br>
<br>
</span></font></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3"><span style=""> </span>1. Viewing the GFF file produced by maker2, I found that most of the predicted gene loci have EST matches.
So why did maker2 produce only 254 gene annotations?<br>
<br>
</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt; text-indent:5.25pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">2. In the "non-overlapping ab initio" file, I found that the sequences all come from the augustus_masked predictions. Does the
non-overlapping file include only the best gene models predicted by both augustus and genemark?
<span style=""> </span>Does it include genemark- or augustus-specific genes?</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3"> </font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">Thanks in advance for any advice. I appreciate your help!</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3"> </font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">best,</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">Huiquan</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3"> </font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">the maker_opts.ctl file:<br>
<br>
</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">#-----Genome (these are always required)</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">genome=my_gnm.fa #genome sequence (fasta file or fasta embedded in GFF3 file)</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3"> </font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">#-----EST Evidence (for best results provide a file for at least one)</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">est=my_est.fa #set of ESTs or assembled mRNA-seq in fasta format</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">altest= #EST/cDNA sequence file in fasta format from an alternate organism</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">altest_gff= #aligned ESTs from a closely related species in GFF3 format</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3"> </font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">#-----Protein Homology Evidence (for best results provide a file for at least one)</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">protein=<span style="">
</span>#protein sequence file in fasta format (i.e. from multiple organisms)</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">protein_gff=<span style="">
</span>#aligned protein homology evidence from an external GFF3 file</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3"> </font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">#-----Repeat Masking (leave values blank to skip repeat masking)</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">model_org=fungi #select a model organism for RepBase masking in RepeatMasker</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">repeat_protein=RepeatPeps.lib #provide a fasta file of transposable element proteins for RepeatRunner</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">rm_gff= #pre-identified repeat elements from an external GFF3 file</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3"> </font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">#-----Gene Prediction</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">snaphmm= #SNAP HMM file</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">gmhmm=my_ges.mod #GeneMark HMM file</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">augustus_species=my2 #Augustus gene prediction species model</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">fgenesh_par_file= #FGENESH parameter file</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">pred_gff= #ab-initio predictions from an external GFF3 file</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3"> </font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">#-----Other Annotation Feature Types (features MAKER doesn't recognize)</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">other_gff= #extra features to pass-through to final MAKER generated GFF3 file</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3"> </font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">#-----External Application Behavior Options</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">cpus=14 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3"> </font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">#-----MAKER Behavior Options</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">min_contig=1 #skip genome contigs below this length (under 10kb are often useless)</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3"> </font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">pred_flank=200 #flank for extending evidence clusters sent to gene predictors</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">pred_stats=0 #report AED and QI statistics for all predictions as well as models</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">min_protein=20 #require at least this many amino acids in predicted proteins</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">always_complete=1 #extra steps to force start and stop codons, 1 = yes, 0 = no</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3"> </font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><font size="3"><span lang="EN-US" style="font-family:'Times New Roman','serif'; background:yellow">split_hit=1500</span><span lang="EN-US" style="font-family:'Times New Roman','serif'"> #length for the splitting
of hits (expected max intron size for evidence alignments)</span></font></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><font size="3"><span lang="EN-US" style="font-family:'Times New Roman','serif'; background:yellow">single_exon=1</span><span lang="EN-US" style="font-family:'Times New Roman','serif'"> #consider single exon EST
evidence when generating annotations, 1 = yes, 0 = no</span></font></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><font size="3"><span lang="EN-US" style="font-family:'Times New Roman','serif'; background:yellow">single_length=200</span><span lang="EN-US" style="font-family:'Times New Roman','serif'"> #min length required
for single exon ESTs if 'single_exon is enabled'</span></font></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><font size="3"><span lang="EN-US" style="font-family:'Times New Roman','serif'; background:yellow">correct_est_fusion=1</span><span lang="EN-US" style="font-family:'Times New Roman','serif'"> #limits use of ESTs
in annotation to avoid fusion genes</span></font></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3"> </font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">tries=2 #number of times to try a contig if there is a failure for some reason</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no</font></span></p>
<p class="MsoNormal" style="margin:0cm 0cm 0pt"><span lang="EN-US" style="font-family:'Times New Roman','serif'"><font size="3">clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no</font></span></p>
<span lang="EN-US" style="font-size:10.5pt; font-family:'Times New Roman','serif'">TMP= #specify a directory other than the system default temporary directory for temporary files</span><br>
</div>
</div>
</div>
</div>
</div>
</body>
</html>