<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Yes, BUSCO is awesome. They also have presentations this year at PAG in both the “Next Generation Genome Annotation and Analysis” and “Computational Gene Discovery” workshops.<div class=""><br class=""></div><div class="">—Carson</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Dec 16, 2015, at 4:41 PM, Xabier Vázquez Campos &lt;<a href="mailto:xvazquezc@gmail.com" class="">xvazquezc@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class=""><div class="">Hi Daniel,<br class=""><br class=""></div>Have you guys heard about <a href="http://busco.ezlab.org/" class="">BUSCO</a>? It's kind of a replacement for CEGMA, which was based on a rather limited set of genes (and which, according to its developers, we should stop using). BUSCO not only produces a more thorough completeness profile but also generates the Augustus species training profile (it needs access to your local Augustus species folder). According to the manual, using the --long option is similar to a training and retraining step in the old training method.<br class=""><br class=""></div><div class="">I recently used it to train Augustus for my fungal genomes and it works well. Unfortunately, it may not apply in this case, as they don't have the plant profile dataset ready yet. You may request early access to it, though.<br class=""><br class=""></div><div class="">I used to use the CEGMA output plus the WebAugustus training service, which is a bit more tedious but not that complicated. 
I copy below what I had in my old protocol; nonetheless, I would recommend that anyone not dealing with plant genomes use BUSCO instead:<br class=""><br class=""><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">Augustus GFF files are a bit different from CEGMA ones. Get the CEGMA output and run the following script:<br class=""> cegma2zff output.cegma.gff > augustus.gff<br class=""><br class="">Upload the genome file (e.g. contigs.fa from Velvet) and the "training gene structure file" (augustus.gff) to <a href="http://bioinf.uni-greifswald.de/webaugustus/training/create" class="">http://bioinf.uni-greifswald.de/webaugustus/training/create</a><br class=""><br class="">Once finished, the "Species parameter archive" (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of Augustus (augustus/config/species).<br class=""><br class="">Re-training<br class=""><br class="">From Maker's output, follow the same initial instructions as for SNAP training detailed in the Maker tutorial:<br class="">In the directory that contains the MYGENOME.maker.output/ folder:<br class=""> mkdir snap<br class=""> cd snap<br class=""> gff3_merge -d ../MYGENOME.maker.output/MYGENOME_master_datastore_index.log<br class=""> maker2zff -n MYGENOME.all.gff<br class="">The -n option is not included in the original tutorial, but without it you may end up with empty genome.ann and genome.dna files.<br class="">From this point we generate training files for both SNAP and Augustus:<br class=""><br class=""> fathom genome.ann genome.dna -categorize 1000<br class=""> fathom uni.ann uni.dna -export 1000 -plus<br class=""> forge export.ann export.dna<br class=""> <br class="">For Augustus, we need the script "zff2augustus_gbk.pl". 
This takes the export.dna generated by fathom and produces a *.gb file that is used as the "training gene structure file" in a new WebAugustus training submission. Remember to give the species a new name in the submission, e.g. MYGENOME_v2; otherwise Maker won't see the difference (same name):<br class=""> perl PATH/TO/SCRIPT/zff2augustus_gbk.pl > MYGENOME_v2.train.gb<br class=""></blockquote><div class=""> <br class=""></div>Xabier<br class=""></div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On 17 December 2015 at 05:07, Daniel Ence <span dir="ltr" class="">&lt;<a href="mailto:dence@genetics.utah.edu" target="_blank" class="">dence@genetics.utah.edu</a>&gt;</span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word" class="">
Hi Elyssa,
<div class=""><br class="">
</div>
<div class="">Setting est2genome=1 tells MAKER to promote all of the est2genome alignments to gene models, which is not what you want for a final gene set. That being said, since your gene models are basically the unmodified alignments, I’m surprised that
all of them have an AED of 1, since that means that they’re not supported by any of the evidence (either EST or protein). </div>
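<div class="">To make the meaning of an AED of 1 concrete, here is a minimal sketch of the published base-level definition, AED = 1 - (SN + SP) / 2. It is an illustration only and not MAKER's actual code; the function names and the interval representation (1-based, inclusive start and end pairs) are assumptions made for this example, and MAKER may aggregate its evidence differently.</div>
<pre class="">
```python
def intervals_to_bases(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(annotation_exons, evidence_intervals):
    """Annotation edit distance between a gene model and its evidence.

    Hypothetical helper: both arguments are lists of (start, end) tuples
    on the same contig. Returns a value from 0.0 (perfect agreement)
    to 1.0 (no shared bases at all).
    """
    annotation = intervals_to_bases(annotation_exons)
    evidence = intervals_to_bases(evidence_intervals)
    overlap = len(annotation.intersection(evidence))
    if overlap == 0:
        # No base of the model is covered by evidence, so AED is 1.
        return 1.0
    sensitivity = overlap / len(evidence)    # share of evidence bases recovered
    specificity = overlap / len(annotation)  # share of model bases supported
    return 1.0 - (sensitivity + specificity) / 2.0
```
</pre>
<div class="">Under these assumptions, a model at 1-100 with evidence at 51-150 shares 50 bases with it, so both ratios are 0.5 and the AED is 0.5; a model with no overlapping evidence scores exactly 1.</div>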
<div class=""><br class="">
</div>
<div class="">Did you get gene models from SNAP or Augustus? You can gather those with the fasta_merge script. Those should be a good starting place for training ab initio predictors. Instructions for training SNAP can be found here:</div>
<div class=""><a href="http://gmod.org/wiki/MAKER_Tutorial#Training_ab_initio_Gene_Predictors" target="_blank" class="">http://gmod.org/wiki/MAKER_Tutorial#Training_ab_initio_Gene_Predictors</a></div>
<div class=""><br class="">
</div>
<div class="">Augustus can also be trained, but the process is much more involved.</div>
<div class=""><br class="">
</div>
<div class="">~Daniel</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">
<div class="">Daniel Ence<br class="">
Graduate Student<br class="">
Eccles Institute of Human Genetics<br class="">
University of Utah<br class="">
15 North 2030 East, Room 2100<br class="">
Salt Lake City, UT 84112-5330 </div>
<br class="">
<div class="">
<blockquote type="cite" class=""><div class=""><div class="h5">
<div class="">On Dec 11, 2015, at 10:43 AM, Elyssa Garza &lt;<a href="mailto:elyssa_garza@yahoo.com" target="_blank" class="">elyssa_garza@yahoo.com</a>&gt; wrote:</div>
<br class="">
</div></div><div class=""><div class=""><div class="h5">
<div style="word-wrap:break-word" class="">
Hello,
<div class=""><br class="">
</div>
<div class="">I have recently begun running Maker. I am currently trying to annotate my Caulanthus genome (~372 Mb), a relative of Arabidopsis. I am unsure about the parameters I have chosen for my first Maker run, which include:</div>
<div class=""><br class="">
</div>
<div class="">genome=CAB_assembly.fasta (1044 contigs)</div>
<div class="">est=Representative_transcript_loci.fasta (assembled transcripts between 200 and 20,000 bp long)</div>
<div class="">protein=TAIR10pep.fasta (Arabidopsis proteins)</div>
<div class="">—</div>
<div class=""><u class="">Repeat masking</u></div>
<div class="">model_org=arabidopsis</div>
<div class="">rmlib=list of Brassicaceae and common plant repeats</div>
<div class="">repeat_protein=te_proteins.fasta</div>
<div class=""><u class="">Gene Prediction</u></div>
<div class="">snaphmm=A.thaliana.hmm</div>
<div class="">augustus_species=arabidopsis</div>
<div class="">est2genome=1</div>
<div class=""><br class="">
</div>
<div class="">I have run a sample file of scaffolds, as well as the entire genome.</div>
<div class="">In the sample file of scaffolds, I merged the GFFs with gff3_merge and then ran evaluator. I noticed that my AED values are all 1. Is this bad? What should I try next?</div>
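<div class="">One way to check whether every model really has an AED of 1 is to tally the values directly from the merged GFF3. The sketch below assumes that MAKER stores the score as an _AED=... attribute in column 9 of each mRNA feature; the function names are hypothetical and only illustrate the idea.</div>
<pre class="">
```python
import re

# Matches "_AED=0.25" as its own attribute, but not "_eAED=0.25".
AED_PATTERN = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def read_aed_values(gff_lines):
    """Collect the _AED value from every mRNA feature line of a GFF3."""
    values = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != "mRNA":
            continue
        match = AED_PATTERN.search(columns[8])
        if match:
            values.append(float(match.group(1)))
    return values


def summarize_aed(values):
    """Return the model count, how many have AED == 1, and the mean AED."""
    count = len(values)
    if count == 0:
        return {"n": 0, "n_aed_one": 0, "mean": None}
    return {
        "n": count,
        "n_aed_one": sum(1 for value in values if value == 1.0),
        "mean": sum(values) / count,
    }
```
</pre>
<div class="">For example, opening the merged file (here assumed to be named MYGENOME.all.gff) and passing the file handle to read_aed_values, then the result to summarize_aed, shows at a glance whether n_aed_one equals n.</div>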
<div class=""><br class="">
</div>
<div class="">I am also unsure how to create training files and whether this should be done in my case.</div>
<div class=""><br class="">
</div>
<div class="">Can anyone advise me on these issues?</div>
<div class=""><br class="">
</div>
<div class="">-Elyssa</div>
</div></div></div>
_______________________________________________<br class="">
maker-devel mailing list<br class="">
<a href="mailto:maker-devel@box290.bluehost.com" target="_blank" class="">maker-devel@box290.bluehost.com</a><br class="">
<a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org" target="_blank" class="">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a><br class="">
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
<br class=""></blockquote></div><br class=""><br clear="all" class=""><br class="">-- <br class=""><div class="gmail_signature"><div dir="ltr" class=""><div class=""><div dir="ltr" class=""><div class=""><div dir="ltr" class=""><div class="">Xabier Vázquez-Campos, <i class="">PhD</i><br class=""><i class="">Research Associate</i><br class="">Water Research Centre<br class="">School of Civil and Environmental Engineering<br class="">
The University of New South Wales<br class="">Sydney NSW 2052 AUSTRALIA<br class=""></div></div></div></div></div></div></div>
</div>
</div></blockquote></div><br class=""></div></body></html>