[maker-devel] Advice for optimizing augustus training on fungal genome?
Fourie Joubert
fourie.joubert at up.ac.za
Thu Jun 28 09:11:31 MDT 2012
Hi everyone,

Apologies if this is not the right list for this question.

I am looking for advice on training AUGUSTUS for a novel fungal genome.
I generated a gene set using CEGMA (statistics below), and have since been
following the instructions at
http://www.molecularevolution.org/molevolfiles/exercises/augustus/scipio.html
and at
http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html.
My training set contains 339 genes and the test set 100 genes.
The output of my initial test run is below; it does not improve much
after running optimize_augustus.pl.
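The 339/100 split is a random partition of the GenBank-format gene file into training and test records; the tutorials linked above do this with a helper script. The Python below is only a minimal stand-in for that step, assuming (hypothetically) that all genes sit in one GenBank flat file in which every record ends with a line containing just `//`. The function name `split_genbank` and the toy records are illustrative, not part of AUGUSTUS.

```python
import random


def split_genbank(text: str, n_test: int, seed: int = 0) -> tuple[list[str], list[str]]:
    """Randomly partition GenBank records into (train, test) lists.

    Assumes each record ends with a line containing only '//'.
    Any trailing text after the last '//' is ignored.
    """
    records: list[str] = []
    current: list[str] = []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            records.append("".join(current))
            current = []

    # Shuffle record indices with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    indices = list(range(len(records)))
    rng.shuffle(indices)
    test_idx = set(indices[:n_test])

    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test


if __name__ == "__main__":
    # Toy input with 5 records; a real gene file here would hold 439 (339 + 100).
    toy = "".join(f"LOCUS gene{i}\nORIGIN\n//\n" for i in range(5))
    train, test = split_genbank(toy, n_test=2)
    print(len(train), len(test))
```

Writing `"".join(train)` and `"".join(test)` back to disk gives the two files passed to `etraining` and `augustus`; fixing the seed keeps the evaluation comparable between runs.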

When I use the trained parameters to predict genes in the genome, I only
find around 2,000 of the ~16,000 known genes. When I use the training
parameters for a distantly related fungus (Neurospora) instead, I get
roughly the correct number of genes.

I am obviously doing something wrong here (the commands I used are below).
I would really appreciate any advice on where to start looking for
improvements.
Kindest regards!
Fourie

Augustus commands (I edited myspecies_parameters.cfg and set stopCodonExcludedFromCDS to true):
> etraining --species=myspecies genes.gb.train
> augustus --species=myspecies genes.gb.test | tee firsttest.out
> grep -A 22 Evaluation firsttest.out
> optimize_augustus.pl --species=myspecies genes.gb.train
> etraining --species=myspecies genes.gb.train
> augustus --species=myspecies genes.gb.test | tee secondtest.out
> grep -A 22 Evaluation secondtest.out
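The manual edit of myspecies_parameters.cfg mentioned above can also be scripted so that it is repeated consistently on every retraining run. This sketch assumes the file stores one whitespace-separated `key value` pair per line, optionally followed by a `#` comment; `set_cfg_value` is a hypothetical helper, and where the species directory lives depends on your installation.

```python
import re


def set_cfg_value(text: str, key: str, value: str) -> str:
    """Return `text` with the value of the first `key` line replaced by `value`.

    Assumes lines of the form 'key value   # optional comment'.
    Commented-out lines (starting with '#') are not matched.
    Raises KeyError if the key is not present.
    """
    # Group 1 keeps the key and the whitespace after it; group 2 is the old value.
    pattern = re.compile(rf"^(\s*{re.escape(key)}\s+)(\S+)", re.MULTILINE)
    # A function replacement avoids backslash interpretation in `value`.
    new_text, n = pattern.subn(lambda m: m.group(1) + value, text, count=1)
    if n == 0:
        raise KeyError(key)
    return new_text


if __name__ == "__main__":
    sample = "# species parameters\nstopCodonExcludedFromCDS false  # comment\n"
    print(set_cfg_value(sample, "stopCodonExcludedFromCDS", "true"), end="")
```

To apply it, read the species' parameters file, pass its text through the function, and write the result back before running `etraining`.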
CEGMA output:
# Statistics of the completeness of the genome based on 248 CEGs #

            #Prots  %Completeness  -  #Total  Average  %Ortho

Complete       240          96.77  -     278     1.16   11.67
  Group 1       64          96.97  -      72     1.12    7.81
  Group 2       54          96.43  -      66     1.22   18.52
  Group 3       58          95.08  -      70     1.21   13.79
  Group 4       64          98.46  -      70     1.09    7.81

Partial        245          98.79  -     290     1.18   13.88
  Group 1       65          98.48  -      73     1.12    7.69
  Group 2       56         100.00  -      70     1.25   21.43
  Group 3       59          96.72  -      75     1.27   18.64
  Group 4       65         100.00  -      72     1.11    9.23
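When reading the CEGMA table, %Completeness is #Prots as a percentage of the core genes considered (248 overall), and Average is #Total divided by #Prots, i.e. the mean number of hits per detected core gene. The short check below recomputes the Complete and Partial rows from those two definitions; it is plain arithmetic on the numbers above, not part of CEGMA, and `cegma_summary` is a hypothetical name.

```python
TOTAL_CEGS = 248  # number of core eukaryotic genes CEGMA reports against

# (#Prots, #Total) copied from the summary rows of the table above.
ROWS = {
    "Complete": (240, 278),
    "Partial": (245, 290),
}


def cegma_summary(prots: int, total: int, n_cegs: int = TOTAL_CEGS) -> tuple[float, float]:
    """Return (%Completeness, Average hits per detected CEG), rounded to 2 decimals."""
    completeness = round(100 * prots / n_cegs, 2)
    average = round(total / prots, 2)
    return completeness, average


if __name__ == "__main__":
    for name, (prots, total) in ROWS.items():
        print(name, *cegma_summary(prots, total))
```

Both rows reproduce the reported figures (96.77 / 1.16 and 98.79 / 1.18).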
Augustus output:
******* Evaluation of gene prediction *******

---------------------------------------------\
                 | sensitivity | specificity |
---------------------------------------------|
nucleotide level |       0.933 |       0.772 |
---------------------------------------------/

----------------------------------------------------------------------------------------------------------\
           |  #pred |  #anno |      |    FP = false pos. |    FN = false neg. |             |             |
           | total/ | total/ |   TP |--------------------|--------------------| sensitivity | specificity |
           | unique | unique |      | part | ovlp | wrng | part | ovlp | wrng |             |             |
----------------------------------------------------------------------------------------------------------|
           |        |        |      |                229 |                 85 |             |             |
exon level |    475 |    331 |  246 | ------------------ | ------------------ |       0.743 |       0.518 |
           |    475 |    331 |      |   59 |    9 |  161 |   56 |    2 |   27 |             |             |
----------------------------------------------------------------------------------------------------------/

----------------------------------------------------------------------------\
transcript | #pred | #anno |   TP |   FP |   FN | sensitivity | specificity |
----------------------------------------------------------------------------|
gene level |   158 |   100 |   45 |  113 |   55 |        0.45 |       0.285 |
----------------------------------------------------------------------------/
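As a reading aid for the evaluation tables: at each level, sensitivity is TP / #anno, specificity is TP / #pred, FP = #pred - TP and FN = #anno - TP. The sketch below recomputes the exon and gene rows from the counts shown and includes a small parser for the "gene level" row, a scripted alternative to the `grep -A 22 Evaluation` step. `LevelCounts` and `parse_gene_level` are hypothetical names, and the parser assumes the row layout shown above.

```python
from dataclasses import dataclass


@dataclass
class LevelCounts:
    pred: int  # number of predicted features (#pred)
    anno: int  # number of annotated features (#anno)
    tp: int    # true positives (TP)

    @property
    def fp(self) -> int:
        return self.pred - self.tp

    @property
    def fn(self) -> int:
        return self.anno - self.tp

    @property
    def sensitivity(self) -> float:
        return self.tp / self.anno

    @property
    def specificity(self) -> float:
        return self.tp / self.pred


def parse_gene_level(report: str) -> LevelCounts:
    """Extract #pred, #anno and TP from the 'gene level' row of an evaluation report."""
    for line in report.splitlines():
        if line.startswith("gene level"):
            # e.g. ['gene level', '158', '100', '45', '113', '55', '0.45', '0.285', '']
            fields = [f.strip() for f in line.split("|")]
            return LevelCounts(pred=int(fields[1]), anno=int(fields[2]), tp=int(fields[3]))
    raise ValueError("no 'gene level' row found")


# Counts copied from the evaluation output above.
exon = LevelCounts(pred=475, anno=331, tp=246)
gene = LevelCounts(pred=158, anno=100, tp=45)

if __name__ == "__main__":
    for name, c in (("exon", exon), ("gene", gene)):
        print(f"{name}: FP={c.fp} FN={c.fn} "
              f"sens={c.sensitivity:.3f} spec={c.specificity:.3f}")
```

Both rows reproduce the reported values; at the gene level, 113 of the 158 predicted genes do not match an annotated test gene exactly, which is what pulls specificity down to 0.285.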
--
--------------
Prof Fourie Joubert
Bioinformatics and Computational Biology Unit
Department of Biochemistry
University of Pretoria
fourie.joubert at up.ac.za
http://www.bi.up.ac.za
Tel. +27-12-420-5825
Fax. +27-12-420-5800