[maker-devel] Advice for optimizing augustus training on fungal genome?

Carson Holt carsonhh at gmail.com
Fri Jul 27 10:38:56 MDT 2012


Regarding this older post: if you aren't getting good results using CEGMA
to train augustus for a new species, one option is to make a copy of
Neurospora's species directory and edit the necessary file contents so
that it is listed as a different species. Then run the augustus training
steps as before, but this time use the Neurospora copy as the base, so
that augustus optimizes Neurospora's parameters to be more like your
species of interest.
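
The copy-and-rename step could be sketched in shell roughly as follows.
This is only a minimal sketch under stated assumptions, not a recipe
checked against any particular augustus release: the function name
clone_species is hypothetical, and it assumes that species directories
live under <config>/species/<name>/, that the files inside are prefixed
with the species name, and that the Neurospora directory is called
neurospora_crassa. Check your own config directory before relying on it.

```shell
# Sketch: clone an existing augustus species directory under a new name.
# ASSUMPTION: layout is <config_dir>/species/<name>/<name>_* (verify locally).
# clone_species is a hypothetical helper, not part of augustus.
clone_species() {
    config_dir=$1   # e.g. "$AUGUSTUS_CONFIG_PATH"
    src=$2          # existing species, e.g. neurospora_crassa (assumed name)
    dst=$3          # new species name, e.g. myspecies
    src_dir="$config_dir/species/$src"
    dst_dir="$config_dir/species/$dst"

    if [ ! -d "$src_dir" ]; then
        echo "source species directory not found: $src_dir" >&2
        return 1
    fi
    if [ -e "$dst_dir" ]; then
        echo "target already exists: $dst_dir" >&2
        return 1
    fi

    cp -r "$src_dir" "$dst_dir" || return 1

    # Rename each file carrying the old species prefix, and replace any
    # mention of the old name inside it. Species names are assumed to
    # contain only letters, digits and underscores, so they are safe in sed.
    for f in "$dst_dir/$src"*; do
        [ -e "$f" ] || continue
        base=${f##*/}
        suffix=${base#"$src"}
        new="$dst_dir/$dst$suffix"
        mv "$f" "$new" || return 1
        sed "s/$src/$dst/g" "$new" > "$new.tmp" || return 1
        mv "$new.tmp" "$new" || return 1
    done
}
```

For example, clone_species "$AUGUSTUS_CONFIG_PATH" neurospora_crassa myspecies
would leave the original directory untouched and create a myspecies copy,
after which the etraining and optimize_augustus.pl commands quoted below
can be rerun with --species=myspecies.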

Thanks,
Carson



On 12-06-28 11:11 AM, "Fourie Joubert" <fourie.joubert at up.ac.za> wrote:

>Hi Everyone
>
>Apologies if this is not the relevant list to mail to.
>
>I am looking for advice in training augustus for a novel fungal genome.
>
>I generated a gene set using CEGMA (below), and have subsequently been
>following the instructions at
>http://www.molecularevolution.org/molevolfiles/exercises/augustus/scipio.html
>and at
>http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html.
>
>My training set is 339 genes and the test set is 100 genes.
>
>My initial output is below.
>
>It does not improve much with optimize_augustus.
>
>When using the training parameters to predict genes in the genome, I seem
>to only find around 2,000 of the known ~16,000 genes. When I use the
>training data from a distantly related fungus (Neurospora), I get
>roughly the correct number of genes.
>
>I am obviously doing something wrong here... (commands below).
>
>I would really appreciate any advice on where to start looking for
>improvement.
>
>Kindest regards!
>
>Fourie
>
>
>
>
>
>Augustus commands (edited myspecies_parameters.cfg and set
>stopCodonExcludedFromCDS to true):
>
>>  etraining --species=myspecies genes.gb.train
>
>>  augustus --species=myspecies genes.gb.test | tee firsttest.out
>
>>  grep -A 22 Evaluation firsttest.out
>
>>  optimize_augustus.pl --species=myspecies genes.gb.train
>
>>  etraining --species=myspecies genes.gb.train
>
>>  augustus --species=myspecies genes.gb.test | tee secondtest.out
>
>>  grep -A 22 Evaluation secondtest.out
>
>
>
>CEGMA output:
>
>#      Statistics of the completeness of the genome based on 248 CEGs
> #
>
>               #Prots  %Completeness  -  #Total  Average  %Ortho
>
>   Complete      240       96.77      -   278     1.16     11.67
>
>    Group 1       64       96.97      -    72     1.12      7.81
>    Group 2       54       96.43      -    66     1.22     18.52
>    Group 3       58       95.08      -    70     1.21     13.79
>    Group 4       64       98.46      -    70     1.09      7.81
>
>    Partial      245       98.79      -   290     1.18     13.88
>
>    Group 1       65       98.48      -    73     1.12      7.69
>    Group 2       56      100.00      -    70     1.25     21.43
>    Group 3       59       96.72      -    75     1.27     18.64
>    Group 4       65      100.00      -    72     1.11      9.23
>
>
>
>
>Augustus output:
>
>*******      Evaluation of gene prediction     *******
>
>---------------------------------------------\
>                 | sensitivity | specificity |
>---------------------------------------------|
>nucleotide level |       0.933 |       0.772 |
>---------------------------------------------/
>
>----------------------------------------------------------------------------------------------------------\
>           |  #pred |  #anno |      |    FP = false pos. |    FN = false neg. |             |             |
>           | total/ | total/ |   TP |--------------------|--------------------| sensitivity | specificity |
>           | unique | unique |      | part | ovlp | wrng | part | ovlp | wrng |             |             |
>----------------------------------------------------------------------------------------------------------|
>           |        |        |      |                229 |                 85 |             |             |
>exon level |    475 |    331 |  246 | ------------------ | ------------------ |       0.743 |       0.518 |
>           |    475 |    331 |      |   59 |    9 |  161 |   56 |    2 |   27 |             |             |
>----------------------------------------------------------------------------------------------------------/
>
>----------------------------------------------------------------------------\
>transcript | #pred | #anno |   TP |   FP |   FN | sensitivity | specificity |
>----------------------------------------------------------------------------|
>gene level |   158 |   100 |   45 |  113 |   55 |        0.45 |       0.285 |
>----------------------------------------------------------------------------/
>
>
>
>
>-- 
>--------------
>Prof Fourie Joubert
>Bioinformatics and Computational Biology Unit
>Department of Biochemistry
>University of Pretoria
>fourie.joubert at up.ac.za
>http://www.bi.up.ac.za
>Tel. +27-12-420-5825
>Fax. +27-12-420-5800
>





