[maker-devel] Question about MAKER

Kyungyong Seong s.kyungyong at berkeley.edu
Mon May 17 16:43:36 MDT 2021


Hi Carson,

A wild tomato genome I am annotating has a BUSCO completeness of 97% when I
run BUSCO on the genome (-m geno). So far I have run MAKER three times:
once with est2genome=1 and protein2genome=1, and twice with trained
Augustus, SNAP and BRAKER gene models.

Running BUSCO on the MAKER annotation (-m prot) gives only 88% completeness.
I examined a few of the missing BUSCO genes. Homologs with high sequence
identity (>98%) are present in the evidence annotation sets. Running SNAP
and Augustus independently recovers gene models for these missing genes,
and BRAKER captured them as well. These regions are not masked by
RepeatMasker either. Yet MAKER is not annotating anything in these regions.
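
To list exactly which BUSCOs drop out between the genome run and the
annotation run, the two full_table.tsv files can be compared. This is a
minimal sketch, assuming BUSCO's tab-separated full table layout (BUSCO ID
in the first column, status in the second, header lines starting with '#').
The function names and the paths in the usage note are hypothetical.

```python
import csv


def read_busco_status(path):
    """Return {busco_id: set of statuses} from a BUSCO full_table.tsv.

    Assumes column 1 is the BUSCO ID and column 2 is the status
    (Complete, Duplicated, Fragmented or Missing); '#' lines are skipped.
    """
    status = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2 or row[0].startswith("#"):
                continue
            # Duplicated BUSCOs occupy several rows, so collect a set.
            status.setdefault(row[0], set()).add(row[1])
    return status


def lost_in_annotation(genome_table, protein_table):
    """BUSCO IDs found in the genome run but Missing in the protein run."""
    genome = read_busco_status(genome_table)
    protein = read_busco_status(protein_table)
    found = {"Complete", "Duplicated"}
    lost = []
    for busco_id, statuses in genome.items():
        missing_in_protein = protein.get(busco_id, {"Missing"}) == {"Missing"}
        if statuses & found and missing_in_protein:
            lost.append(busco_id)
    return sorted(lost)
```

Calling lost_in_annotation("busco_geno/full_table.tsv",
"busco_prot/full_table.tsv") would return the IDs whose genomic loci are
worth checking against the MAKER GFF3 and the evidence alignments.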

Do you know what might be causing this and how I could resolve it?
Thank you!
Kyungyong




On Wed, Mar 31, 2021 at 11:00 AM Carson Hinton Holt <
carson.holt at genetics.utah.edu> wrote:

> Hello,
>
> The google group is an archive only and cannot be posted to. You can
> however send questions to the email list (CCed).
>
> The behavior you see is normal. Exonerate gets rerun because it is a
> relatively fast step that can produce a lot of output and saving the
> results for later use can be very IO intensive when running under MPI. Some
> users were actually killing the NFS servers at their institutions. Since
> the performance gain from archiving exonerate results is small, but the
> consequences for IO were large, we don’t archive those results.
>
> —Carson
>
> Sent from my iPhone
>
> > On Mar 31, 2021, at 11:51 AM, Kyungyong Seong <s.kyungyong at berkeley.edu> wrote:
> >
> > 
> > Dear Carson,
> >
> > I wanted to post a question to the Google group, but I didn't have
> > permission to do so. I am having a small problem with MAKER and wanted to
> > get some advice from you.
> >
> > The initial run completed successfully with the following configuration.
> >
> > #-----Genome (these are always required)
> > genome=/global/scratch/skyungyong/S.habrochaites/1.Final_Assembly/5.Final_data/SH1353.primary.scaffolds.noPlasmid.fa #genome sequence (fasta file or fasta embeded in GFF3 file)
> > organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
> >
> > #-----Re-annotation Using MAKER Derived GFF3
> > maker_gff= #MAKER derived GFF3 file
> > est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
> > altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> > protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
> > rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
> > model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> > pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> > other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no
> >
> > #-----EST Evidence (for best results provide a file for at least one)
> > est=/global/scratch/skyungyong/S.habrochaites/2.Annotation/2.Maker/DB/Transcriptome/SH_ALL.superscripts.fa #set of ESTs or assembled mRNA-seq in fasta format
> > altest=/global/scratch/skyungyong/S.habrochaites/2.Annotation/2.Maker/DB/ITAG4.1_cDNA.fasta #EST/cDNA sequence file in fasta format from an alternate organism
> > est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> > altest_gff= #aligned ESTs from a closly relate species in GFF3 format
> >
> > #-----Protein Homology Evidence (for best results provide a file for at least one)
> > protein=/global/scratch/skyungyong/S.habrochaites/2.Annotation/2.Maker/DB/Prot.evidence.fa #protein sequence file in fasta format (i.e. from mutiple organisms)
> > protein_gff=  #aligned protein homology evidence from an external GFF3 file
> >
> > #-----Repeat Masking (leave values blank to skip repeat masking)
> > model_org=Solanum #select a model organism for RepBase masking in RepeatMasker
> > rmlib=/global/scratch/skyungyong/S.habrochaites/2.Annotation/1.RepeatModeler/SH1353-families.fa #provide an organism specific repeat library in fasta format for RepeatMasker
> > repeat_protein=/global/scratch/skyungyong/Software/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
> > rm_gff= #pre-identified repeat elements from an external GFF3 file
> > prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
> > softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
> >
> > #-----Gene Prediction
> > snaphmm=0 #SNAP HMM file
> > gmhmm= #GeneMark HMM file
> > augustus_species=0 #Augustus gene prediction species model
> > fgenesh_par_file= #FGENESH parameter file
> > pred_gff= #ab-initio predictions from an external GFF3 file
> > model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> > run_evm=0 #run EvidenceModeler, 1 = yes, 0 = no
> > est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> > protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
> > trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> > snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> > snoscan_meth= #-O-methylation site fileto have Snoscan find snoRNAs
> > unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
> > allow_overlap= #allowed gene overlap fraction (value from 0 to 1, blank for default)
> >
> > #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> > other_gff= #extra features to pass-through to final MAKER generated GFF3 file
> >
> > #-----External Application Behavior Options
> > alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
> > cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
> >
> > #-----MAKER Behavior Options
> > max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
> > min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
> >
> > pred_flank=200 #flank for extending evidence clusters sent to gene predictors
> > pred_stats=0 #report AED and QI statistics for all predictions as well as models
> > AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
> > min_protein=0 #require at least this many amino acids in predicted proteins
> > alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
> > always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
> > map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
> > keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
> >
> > split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
> > min_intron=20 #minimum intron length (used for alignment polishing)
> > single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
> > single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
> > correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
> >
> > tries=3 #number of times to try a contig if there is a failure for some reason
> > clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> > clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
> > TMP=
> >
> > For the second run, I trained SNAP and Augustus and provided the path to
> > the HMMs in maker_opts.ctl. With est2genome=0 and protein2genome=0, I ran
> > MAKER in the same directory, hoping MAKER would reuse the results from the
> > previous run.
> >
> > The run starts with this warning:
> > MAKER WARNING: Changes in control files make re-use of hint based predictions impossible
> > Old hint based prediction files will be erased before continuing
> >
> > And it seems to run exonerate anew:
> > running  est2genome search.
> > #--------- command -------------#
> > Widget::exonerate::est2genome:
> > /global/scratch/skyungyong/Software/maker/exe/exonerate/bin/exonerate -q /tmp/maker_gdlFj8/53/LA2119%2ETRINITY_DN24321_c0_g1.for.475-1651.53.fasta -t /tmp/maker_gdlFj8/53/scaffold53.475-1651.53.fasta -Q dna -T dna --model est2genome  --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_gdlFj8/53/scaffold53.475-1651.LA2119%2ETRINITY_DN24321_c0_g1.e.exonerate
> >
> > Is this normal?
> > Thank you for your help!
> > Kyungyong
> >
> >
> >
> >
> >
> >
>
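
For the re-run described in the quoted message, it can help to see exactly
which options differ between the first and second maker_opts.ctl, since the
warning is triggered by control file changes. This is a minimal sketch,
assuming the "key=value #comment" layout shown in the quoted configuration.
The function names and file names are hypothetical, and the sketch only
reports differences; it does not know which of them MAKER treats as
invalidating earlier results.

```python
def read_ctl(path):
    """Parse a MAKER control file into {option: value}.

    Assumes one 'key=value #comment' entry per line; everything after the
    first '#' is treated as a comment, so section headers are ignored.
    """
    options = {}
    with open(path) as handle:
        for line in handle:
            line = line.split("#", 1)[0].strip()
            if "=" not in line:
                continue
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    return options


def changed_options(old_path, new_path):
    """Return {option: (old_value, new_value)} for options that differ."""
    old, new = read_ctl(old_path), read_ctl(new_path)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }
```

For example, changed_options("run1/maker_opts.ctl", "run2/maker_opts.ctl")
would list entries such as est2genome, protein2genome and snaphmm if those
were the options edited between runs.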