<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hello!<br>

    <br>

    I want to train Augustus for a non-model organism and I have several

    questions about it.<br>

    <br>

    I plan to follow the section "Training ab initio Gene

    predictors".<br>

    <br>

    First, I need to generate gene models using EST data.<br>

    However, how many sequences are needed?<br>

    My genome is 476 Mb and I have millions of RNA-seq sequences, but

    the run takes ages if I use all of them!<br>

    I tried with 1000 sequences and it took 30 min, but is that enough?

    Or should I use more?<br>

    <br>

    Second, the run produces many GFF files. Should they be

    concatenated?<br>

    <br>

    What is the next step after that? The MAKER help explains it for

    SNAP, but I want to use Augustus.<br>

    I found a script called <span itemscope=""

      itemtype="http://schema.org/Answer"><span itemprop="text"><code>autoAug.pl</code></span></span>

    to train Augustus.<br>

    What do you think of it?<br>

    <br>

    Should I use it this way?<br>

    <span itemscope="" itemtype="http://schema.org/Answer"><span

        itemprop="text">

        <p><code>autoAug.pl --singleCPU --useexisting

            --genome=mygenome.fasta --species=myspeciesname

            --cdna=EST.fasta --trainingset=genome.gff3</code></p>

      </span></span><br>

    where EST.fasta is the file I used earlier to generate the gene

    models and genome.gff3 is the resulting gene model file.<br>

    However, I don't think I obtained a GFF3 file from the first

    MAKER run.<br>

    So should I generate GFF3 from the GFF output?<br>

    <br>

    Thanks a lot for your help,<br>

    <br>

    Muriel<br>

    <br>

    <br>

  </body>

</html>