[maker-devel] Using Augustus with MAKER
Carson Holt
carsonhh at gmail.com
Tue May 19 15:48:29 MDT 2015
A couple of corrections to the reply below. SNAP doesn’t work well on primates, so you probably don’t want to use it (the mammal HMM is not a good replacement). This suggestion comes directly from the author of SNAP. There are ways to make it work by splitting the genome into isotigs, but it’s a little messy and technical, so just don’t use it on primates.
Here’s a good website on training Augustus (http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html). You need some sort of results to train with. You can either use results from a protein2genome run of MAKER, or a run where you use human as your species together with other evidence in MAKER (the models won’t be perfect, but they will be enough to get training going).
Unless your species is evolutionarily very close to human, you probably don’t want to just stick with the human species file. Because you’re not going to use SNAP, you will need to optimize the one gene predictor you do get to use as much as possible.
You need the models to be in GenBank format for training. There is a roundabout way to do this with GFF3 models. First use the script that comes with MAKER for training SNAP (maker2zff). Then follow the training instructions in SNAP’s README.
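A minimal sketch of switching maker_opts.ctl between those two starting points, assuming the stock layout written by maker -CTL (key=value followed by an optional #comment). The helper name set_ctl_opt is made up for illustration and is not part of MAKER.

```shell
# Hypothetical helper (not part of MAKER): set key=value in a MAKER control
# file while keeping any trailing "#comment" on that line.
# Limitation: the value must not contain '|', '&' or backslashes.
set_ctl_opt() {
    ctl=$1; key=$2; value=$3
    # First expression handles lines with a comment, second handles lines without.
    # sed -i.bak edits in place and leaves a backup copy next to the file.
    sed -i.bak -E \
        -e "s|^${key}=[^#]*#|${key}=${value} #|" -e t \
        -e "s|^${key}=.*|${key}=${value}|" "$ctl"
}

# Option 1: models inferred directly from protein alignments
#   set_ctl_opt maker_opts.ctl protein2genome 1
# Option 2: ab initio predictions using the human Augustus parameters
#   set_ctl_opt maker_opts.ctl augustus_species human
```

Either run gives you a first set of models to train from; pick one rather than stacking both edits blindly.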
Basically, run the following commands (genome.ann and genome.dna are the two files produced by maker2zff):
fathom genome.ann genome.dna -categorize 1000
fathom uni.ann uni.dna -export 1000 -plus
Then, using this script from Jason Stajich, you can convert the export.ann and export.dna files to a GenBank format file:
https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl
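Chained together, the conversion described above might look like the sketch below. Some of it is assumption rather than documented fact: the merged GFF3 name is whatever you pass in, and zff2augustus_gbk.pl is assumed to pick up export.ann and export.dna from the current directory and print GenBank to standard output (check the script itself before relying on that). The function names are made up for illustration. Setting DRY_RUN=1 only prints the commands.

```shell
# Print each command to stderr; execute it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*" >&2
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# Hypothetical wrapper: MAKER GFF3 -> ZFF -> GenBank training file.
make_augustus_training_set() {
    gff=$1   # merged MAKER GFF3 (name chosen by the caller)
    out=$2   # GenBank file to write
    run maker2zff "$gff" &&                              # writes genome.ann / genome.dna
    run fathom genome.ann genome.dna -categorize 1000 &&
    run fathom uni.ann uni.dna -export 1000 -plus &&     # writes export.ann / export.dna
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ perl zff2augustus_gbk.pl > $out" >&2
    else
        # ASSUMPTION: reads export.ann/export.dna from the current directory
        perl zff2augustus_gbk.pl > "$out"
    fi
}

# Example (dry run):
#   DRY_RUN=1 make_augustus_training_set my_genome.all.gff augustus.gbk
```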
Go ahead and run with human as your species first, so you can review the models and see how models and evidence correlate in a viewer like Apollo or IGV. But I would still recommend training Augustus on your species.
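Once you have a GenBank file of models, a bare-bones first training pass with the scripts shipped with Augustus could be generated like this. It is a sketch of the usual Augustus training recipe rather than anything MAKER-specific: the function name, species name, file name, and hold-out size of 100 genes are placeholders, and it assumes AUGUSTUS_CONFIG_PATH points at a writable config directory.

```shell
# Hypothetical helper: print the commands for a first Augustus training pass.
# Names must not contain spaces, since the output is meant to be piped to sh.
augustus_training_plan() {
    species=$1          # new species name (placeholder)
    gbk=$2              # GenBank file of training models
    ntest=${3:-100}     # genes held out for testing (placeholder default)
    cat <<EOF
new_species.pl --species=${species}
randomSplit.pl ${gbk} ${ntest}
etraining --species=${species} ${gbk}.train
augustus --species=${species} ${gbk}.test
EOF
}

# Review the plan, then execute it:
#   augustus_training_plan my_primate augustus.gbk
#   augustus_training_plan my_primate augustus.gbk | sh
```

The last command reports accuracy on the held-out genes. After that, setting augustus_species to the new species name in maker_opts.ctl makes MAKER use the trained parameters.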
—Carson
> On May 19, 2015, at 3:18 PM, Michael Campbell <michael.s.campbell1 at gmail.com> wrote:
>
> Hi Julian,
>
> Since you are annotating a primate, I would use the pre-trained human parameters for Augustus. Here is what I would try first:
>
> genome=data/hsap_contig.fasta # contig file from example data
> est=data/mRNAs.fa # RNAs filtered to just mRNAs
> protein=data/protein.fa
> est2genome=0
> protein2genome=0
> augustus_species=human
>
> You could also use one of the mammal HMMs packaged with SNAP, or use the output from the above to train SNAP. There are tutorials that walk through these steps here:
>
> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Main_Page
>
> There is also a Current Protocols in Bioinformatics article on using MAKER that may help you get started as well.
>
> http://onlinelibrary.wiley.com/doi/10.1002/0471250953.bi0411s48/abstract
>
> Good luck,
> Mike
>
> On Tue, May 19, 2015 at 1:51 PM, Julian Egger <julian.egger at omahazoo.com> wrote:
>
> I am trying to use Augustus in MAKER to help with annotating as many genes as possible from genomic reads of a primate sample. I am new to using gene prediction tools such as SNAP and Augustus, but was told Augustus would be better for primates. I tried using reference mRNAs and protein sequences from NCBI on the sample contig file included with the MAKER software and it ran ok. My question is how do I now use the output to train Augustus iteratively and thus create a file set of annotations from my original input?
>
> After creating the control files with maker -CTL, the only configurations I made to maker_opts.ctl were:
> genome=data/hsap_contig.fasta # contig file from example data
> est=data/mRNAs.fa # RNAs filtered to just mRNAs
> protein=data/protein.fa
> est2genome=1
> protein2genome=1
>
> I will eventually replace the contig file with our scaffolds file from the assembly. I know the output created a gff file along with protein and mRNA files. Do I then need to change the maker_opts file to account for the new files, and if so, how, and what should the maker_opts file look like now? Was Augustus supposed to be set up on the initial maker run, or do I wait until the second run, after est2genome and protein2genome were used to initialize training for Augustus? And how do the configurations change between multiple iterations now that I have a solid annotation set?
>
> Sorry for all the questions, newbie here with a lot of data to work with.
>
> Thanks
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
>
> --
> Michael Campbell MS, RD.
> Doctoral Candidate
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ph:585-3543
>