[maker-devel] First time using maker- Train or not to train?

Elyssa Garza elyssa_garza at yahoo.com
Thu Dec 17 15:29:56 MST 2015


Hi Daniel,
I used the pre-trained Arabidopsis models from SNAP and Augustus for this first run of maker. Do you think it would be wise to use my previous run (shown at the start of the thread) for training, or should I do a new run with the following parameters and train from that?
genome=CAB_assembly.fasta
est=RTLs.fa
altest=Brassica_oleracea.fasta
protein=Arabidopsis_proteins.fasta
est2genome=0
protein2genome=0
SNAP=A.thaliana
Augustus=arabidopsis
model_org=arabidopsis
rmlib=Brassicaceae_repeats.fasta
repeat_protein=te_proteins.fasta

At what point would I use est2genome=1?  Also for this plant genome, is it better to use model_org=arabidopsis or model_org=all?  I am also considering using RepeatModeler to create a custom repeat library, but I am not sure it is necessary with all of the repeat information I am putting in already.
Any advice is helpful.
Thanks,
-Elyssa

    On Wednesday, December 16, 2015 12:07 PM, Daniel Ence <dence at genetics.utah.edu> wrote:
 

  Hi Elyssa, 
Setting est2genome=1 tells MAKER to promote all of the est2genome alignments to a gene model, which is not what you want for a final gene set. That being said, since your gene models are basically the unmodified alignments, I’m surprised that all of them have an AED of 1, since that means that they’re not supported by any of the evidence (either est or protein). 
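To make the AED point concrete, here is a minimal sketch, assuming the commonly cited nucleotide-level definition AED = 1 - (SN + SP) / 2, where SN is the fraction of evidence bases covered by the model and SP is the fraction of model bases covered by evidence. The function names and the half-open (start, end) interval convention are illustrative assumptions, and MAKER's own calculation may differ in detail.

```python
def merge(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(intervals):
    """Number of bases covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def overlap(a, b):
    """Number of bases shared between two sets of intervals."""
    total = 0
    for s1, e1 in merge(a):
        for s2, e2 in merge(b):
            total += max(0, min(e1, e2) - max(s1, s2))
    return total


def aed(model_exons, evidence_blocks):
    """Simplified AED: 0 = perfect agreement, 1 = no shared bases."""
    shared = overlap(model_exons, evidence_blocks)
    if shared == 0:
        # No evidence overlaps the model at all.
        return 1.0
    sn = shared / covered(evidence_blocks)  # sensitivity
    sp = shared / covered(model_exons)      # specificity
    return 1.0 - (sn + sp) / 2.0
```

Under this definition a model scores exactly 1 only when it shares no bases with any evidence, which is why a whole set of est2genome-derived models at AED 1 is unexpected.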
Did you get gene models from snap or augustus? You can gather those with the fasta_merge script. Those should be a good starting place for training ab initio predictors. Instructions for training snap can be found here: http://gmod.org/wiki/MAKER_Tutorial#Training_ab_initio_Gene_Predictors
Augustus can also be trained but is much more involved.
~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330 

On Dec 11, 2015, at 10:43 AM, Elyssa Garza <elyssa_garza at yahoo.com> wrote:
Hello,
I have recently begun running Maker. I am currently trying to annotate my Caulanthus genome (~372 Mb), a relative of Arabidopsis. I am unsure about the parameters I have chosen for my first run in maker, which include:
genome=CAB_assembly.fasta (1044 contigs)
est=Representative_transcript_loci.fasta (assembled transcripts btw 200-20000bp long)
protein=TAIR10pep.fasta (Arabidopsis proteins)

Repeat masking
model_org=arabidopsis
rmlib=list of Brassicaceae and common plant repeats
repeat_protein=te_proteins.fasta

Gene Prediction
snaphmm=A.thaliana.hmm
augustus_species=arabidopsis
est2genome=1
I have run a sample file of scaffolds, as well as the entire genome. For the sample file of scaffolds, I merged the GFFs with gff3_merge and then ran evaluator. I noticed that my AED values are all 1. Is this bad? What should I try next?
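One quick way to confirm the "all AED are 1" observation on a merged GFF3 is to tabulate the values directly. The sketch below assumes AED is stored as an `_AED=` attribute on mRNA features, as in MAKER's GFF3 output; the function names and the 0.5 cutoff are illustrative assumptions, not part of MAKER.

```python
import re

# Match the _AED attribute only (not _eAED) in the GFF3 attributes column.
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_values(gff3_lines):
    """Collect AED values from the mRNA features of a GFF3 file."""
    values = []
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        match = AED_RE.search(cols[8])
        if match:
            values.append(float(match.group(1)))
    return values


def summarize(values, cutoff=0.5):
    """Summarize an AED distribution; cutoff is an arbitrary example."""
    n = len(values)
    return {
        "n": n,
        "all_one": n > 0 and all(v == 1.0 for v in values),
        "fraction_at_or_below_cutoff": (
            sum(v <= cutoff for v in values) / n if n else 0.0
        ),
    }
```

For a file on disk this could be used as `summarize(aed_values(open(path)))`, where `path` is a placeholder for the merged GFF3.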
I am also unsure how to train the gene predictors and whether this should be done in my case.
Can anyone advise me on these issues?
-Elyssa
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


