[maker-devel] Advice for optimizing augustus training on fungal genome?
Carson Holt
carsonhh at gmail.com
Fri Jul 27 10:38:56 MDT 2012
Regarding this older post: if you aren't getting good results using CEGMA
to train augustus for a new species, one option is to make a copy of
Neurospora's species directory and edit the file names and contents so
that it is listed as a different species. Then run the augustus training
steps as before, but this time with the Neurospora copy as the base, so
that augustus optimizes Neurospora's parameters to be more like those of
your species of interest.
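
As a rough sketch of the copy step (an illustration, not a tested recipe),
something like the shell function below could be used. It assumes that
AUGUSTUS_CONFIG_PATH points at your augustus config directory, that the
Neurospora parameters live in a species directory named neurospora_crassa,
and that the files inside carry the species name as a prefix. Check all
three against your own install. The function name clone_species is made up
for this example.

```shell
# clone_species BASE NEW
# Copy the augustus species directory BASE to NEW, rename the files that
# carry the BASE prefix, and update references to BASE inside the .cfg files.
# Assumes $AUGUSTUS_CONFIG_PATH/species/BASE exists (check your install).
clone_species() {
    base=$1
    new=$2
    if [ -z "$AUGUSTUS_CONFIG_PATH" ]; then
        echo "AUGUSTUS_CONFIG_PATH is not set" >&2
        return 1
    fi
    species_dir="$AUGUSTUS_CONFIG_PATH/species"
    if [ ! -d "$species_dir/$base" ]; then
        echo "no such species directory: $species_dir/$base" >&2
        return 1
    fi
    if [ -e "$species_dir/$new" ]; then
        echo "refusing to overwrite: $species_dir/$new" >&2
        return 1
    fi

    cp -r "$species_dir/$base" "$species_dir/$new" || return 1

    # Rename e.g. BASE_parameters.cfg to NEW_parameters.cfg.
    for f in "$species_dir/$new"/*; do
        name=$(basename "$f")
        case $name in
            "$base"*) mv "$f" "$species_dir/$new/$new${name#"$base"}" ;;
        esac
    done

    # Point file references inside the config files at the renamed files.
    for f in "$species_dir/$new"/*.cfg; do
        [ -f "$f" ] || continue
        sed "s/$base/$new/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Example use (species names are assumptions, adjust to your setup):
#   clone_species neurospora_crassa myspecies
```

After that, the etraining and optimize_augustus.pl commands from the
original post can be rerun unchanged with --species=myspecies, so that they
start from the copied Neurospora parameters.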
Thanks,
Carson
On 12-06-28 11:11 AM, "Fourie Joubert" <fourie.joubert at up.ac.za> wrote:
>Hi Everyone
>
>Apologies if this is not the relevant list to mail to.
>
>I am looking for advice in training augustus for a novel fungal genome.
>
>I generated a gene set using CEGMA (below), and have subsequently been
>following the instructions at
>http://www.molecularevolution.org/molevolfiles/exercises/augustus/scipio.html
>and at
>http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html.
>
>My training set is 339 genes and the test set is 100 genes.
>
>My initial output is below.
>
>It does not improve much with optimize_augustus.
>
>When using the training parameters to predict genes in the genome, I seem
>to only find around 2,000 of the known ~16,000 genes. When I use the
>training data from a distantly related fungus (Neurospora), I get
>roughly the correct number of genes.
>
>I am obviously doing something wrong here... (commands below).
>
>I would really appreciate any advice on where to start looking for
>improvement.
>
>Kindest regards!
>
>Fourie
>
>
>
>
>
>Augustus commands (Edited myspecies_parameters.cfg and
>set stopCodonExcludedFromCDS to true.):
>
>> etraining --species=myspecies genes.gb.train
>
>> augustus --species=myspecies genes.gb.test | tee firsttest.out
>
>> grep -A 22 Evaluation firsttest.out
>
>> optimize_augustus.pl --species=myspecies genes.gb.train
>
>> etraining --species=myspecies genes.gb.train
>
>> augustus --species=myspecies genes.gb.test | tee secondtest.out
>
>> grep -A 22 Evaluation secondtest.out
>
>
>
>CEGMA output:
>
># Statistics of the completeness of the genome based on 248 CEGs #
>
>            #Prots  %Completeness  -  #Total  Average  %Ortho
>
> Complete      240          96.77  -     278     1.16   11.67
>
>   Group 1      64          96.97  -      72     1.12    7.81
>   Group 2      54          96.43  -      66     1.22   18.52
>   Group 3      58          95.08  -      70     1.21   13.79
>   Group 4      64          98.46  -      70     1.09    7.81
>
> Partial       245          98.79  -     290     1.18   13.88
>
>   Group 1      65          98.48  -      73     1.12    7.69
>   Group 2      56         100.00  -      70     1.25   21.43
>   Group 3      59          96.72  -      75     1.27   18.64
>   Group 4      65         100.00  -      72     1.11    9.23
>
>
>
>
>Augustus output:
>
>******* Evaluation of gene prediction *******
>
>---------------------------------------------\
>                 | sensitivity | specificity |
>---------------------------------------------|
>nucleotide level |       0.933 |       0.772 |
>---------------------------------------------/
>
>----------------------------------------------------------------------------------------------------------\
>           |  #pred |  #anno |      |    FP = false pos. |    FN = false neg. |             |             |
>           | total/ | total/ |   TP |--------------------|--------------------| sensitivity | specificity |
>           | unique | unique |      | part | ovlp | wrng | part | ovlp | wrng |             |             |
>----------------------------------------------------------------------------------------------------------|
>           |        |        |      |                229 |                 85 |             |             |
>exon level |    475 |    331 |  246 | ------------------ | ------------------ |       0.743 |       0.518 |
>           |    475 |    331 |      |   59 |    9 |  161 |   56 |    2 |   27 |             |             |
>----------------------------------------------------------------------------------------------------------/
>
>----------------------------------------------------------------------------\
>transcript | #pred | #anno |   TP |   FP |   FN | sensitivity | specificity |
>----------------------------------------------------------------------------|
>gene level |   158 |   100 |   45 |  113 |   55 |        0.45 |       0.285 |
>----------------------------------------------------------------------------/
>
>
>
>
>--
>--------------
>Prof Fourie Joubert
>Bioinformatics and Computational Biology Unit
>Department of Biochemistry
>University of Pretoria
>fourie.joubert at up.ac.za
>http://www.bi.up.ac.za
>Tel. +27-12-420-5825
>Fax. +27-12-420-5800
>