[maker-devel] Advice for optimizing augustus training on fungal genome?

Fourie Joubert fourie.joubert at up.ac.za
Thu Jun 28 09:11:31 MDT 2012


Hi Everyone

Apologies if this is not the relevant list to mail to.

I am looking for advice in training augustus for a novel fungal genome.

I generated a gene set using CEGMA (below), and have subsequently been 
following the instructions at 
http://www.molecularevolution.org/molevolfiles/exercises/augustus/scipio.html 
and at 
http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html.

My training set is 339 genes and the test set is 100 genes.

My initial output is below.

It does not improve much after running optimize_augustus.pl.

When using the trained parameters to predict genes in the genome, I seem 
to find only around 2,000 of the known ~16,000 genes. When I use the 
training data from a distantly related fungus (Neurospora), I get 
roughly the correct number of genes.
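
A minimal sketch of how such a gene count could be taken from Augustus GFF output, assuming tab-separated lines whose third column is the feature type and comment lines that start with "#". The function name count_genes and the sample lines are illustrative only and are not taken from the actual run.

```python
def count_genes(gff_text):
    """Count 'gene' features in GFF-style text, skipping comments and blanks."""
    count = 0
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Column 3 (index 2) holds the feature type: gene, transcript, CDS, ...
        if len(fields) >= 3 and fields[2] == "gene":
            count += 1
    return count


# Made-up sample lines, for illustration only.
sample = "\n".join([
    "# illustrative comment line",
    "scaffold_1\tAUGUSTUS\tgene\t100\t900\t0.95\t+\t.\tg1",
    "scaffold_1\tAUGUSTUS\ttranscript\t100\t900\t0.95\t+\t.\tg1.t1",
    "scaffold_1\tAUGUSTUS\tCDS\t100\t900\t0.95\t+\t0\tg1.t1",
    "scaffold_2\tAUGUSTUS\tgene\t50\t700\t0.80\t-\t.\tg2",
])
print(count_genes(sample))
```

Running the same count on the predictions from both parameter sets (the new species and Neurospora) would make the 2,000 versus ~16,000 comparison directly reproducible.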

I am obviously doing something wrong here... (commands below).

I would really appreciate any advice on where to start looking for 
improvement.

Kindest regards!

Fourie





Augustus commands (edited myspecies_parameters.cfg and set stopCodonExcludedFromCDS to true):

>  etraining --species=myspecies genes.gb.train

>  augustus --species=myspecies genes.gb.test | tee firsttest.out

>  grep -A 22 Evaluation firsttest.out

>  optimize_augustus.pl --species=myspecies genes.gb.train

>  etraining --species=myspecies genes.gb.train

>  augustus --species=myspecies genes.gb.test | tee secondtest.out

>  grep -A 22 Evaluation secondtest.out



CEGMA output:

#      Statistics of the completeness of the genome based on 248 CEGs      #

               #Prots  %Completeness  -  #Total  Average  %Ortho

   Complete      240       96.77      -   278     1.16     11.67

    Group 1       64       96.97      -    72     1.12      7.81
    Group 2       54       96.43      -    66     1.22     18.52
    Group 3       58       95.08      -    70     1.21     13.79
    Group 4       64       98.46      -    70     1.09      7.81

    Partial      245       98.79      -   290     1.18     13.88

    Group 1       65       98.48      -    73     1.12      7.69
    Group 2       56      100.00      -    70     1.25     21.43
    Group 3       59       96.72      -    75     1.27     18.64
    Group 4       65      100.00      -    72     1.11      9.23




Augustus output:

*******      Evaluation of gene prediction     *******

---------------------------------------------\
                 | sensitivity | specificity |
---------------------------------------------|
nucleotide level |       0.933 |       0.772 |
---------------------------------------------/

----------------------------------------------------------------------------------------------------------\
           |  #pred |  #anno |      |    FP = false pos. |    FN = false neg. |             |             |
           | total/ | total/ |   TP |--------------------|--------------------| sensitivity | specificity |
           | unique | unique |      | part | ovlp | wrng | part | ovlp | wrng |             |             |
----------------------------------------------------------------------------------------------------------|
           |        |        |      |                229 |                 85 |             |             |
exon level |    475 |    331 |  246 | ------------------ | ------------------ |       0.743 |       0.518 |
           |    475 |    331 |      |   59 |    9 |  161 |   56 |    2 |   27 |             |             |
----------------------------------------------------------------------------------------------------------/

----------------------------------------------------------------------------\
transcript | #pred | #anno |   TP |   FP |   FN | sensitivity | specificity |
----------------------------------------------------------------------------|
gene level |   158 |   100 |   45 |  113 |   55 |        0.45 |       0.285 |
----------------------------------------------------------------------------/
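
The exon-level and gene-level figures in the evaluation are consistent with sensitivity = TP / #anno and specificity = TP / #pred (so "specificity" here behaves like what is often called precision). The short sketch below recomputes them from the counts reported in the table; the function names are illustrative and are not part of Augustus.

```python
def sensitivity(tp, annotated):
    """Fraction of annotated features that were predicted exactly."""
    return tp / annotated


def specificity(tp, predicted):
    """Fraction of predicted features that match an annotation exactly."""
    return tp / predicted


# Counts copied from the evaluation table above.
exon_tp, exon_pred, exon_anno = 246, 475, 331
gene_tp, gene_pred, gene_anno = 45, 158, 100

print(round(sensitivity(exon_tp, exon_anno), 3))  # 0.743
print(round(specificity(exon_tp, exon_pred), 3))  # 0.518
print(round(sensitivity(gene_tp, gene_anno), 3))  # 0.45
print(round(specificity(gene_tp, gene_pred), 3))  # 0.285
```

Read this way, the low gene-level specificity reflects 113 of the 158 predicted genes not matching an annotated gene exactly on the 100-gene test set.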




-- 
--------------
Prof Fourie Joubert
Bioinformatics and Computational Biology Unit
Department of Biochemistry
University of Pretoria
fourie.joubert at up.ac.za
http://www.bi.up.ac.za
Tel. +27-12-420-5825
Fax. +27-12-420-5800





