<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hello!<br>
<br>
I want to train Augustus for a non-model organism, and I have several
questions about it.<br>
<br>
I plan to follow the section "Training ab initio Gene
predictors".<br>
<br>
First, I need to generate gene models using EST data.<br>
However, how many sequences are necessary?<br>
My genome is 476 Mb and I have millions of RNA-seq sequences, but
the run takes ages if I use all of them.<br>
I tried with 1000 sequences and it took 30 min, but is that enough?
Or should I use more?<br>
<br>
Second, this step produces many GFF files. Should I concatenate
them?<br>
<br>
And what should I do after that? The MAKER help explains the procedure
for SNAP, but I want to use Augustus.<br>
I found a script called <span itemscope=""
itemtype="http://schema.org/Answer"><span itemprop="text"><code>autoAug.pl</code></span></span>
to train Augustus.<br>
What do you think of it?<br>
<br>
Should I use it this way?<br>
<span itemscope="" itemtype="http://schema.org/Answer"><span
itemprop="text">
<p><code>autoAug.pl --singleCPU --useexisting
--genome=mygenome.fasta --species=myspeciesname
--cdna=EST.fasta --trainingset=genome.gff3</code></p>
</span></span><br>
where EST.fasta is the file I used earlier to generate the gene
models and genome.gff3 is the resulting set of gene models.<br>
However, I don't think I obtained a GFF3 file from the first
MAKER run.<br>
So should I generate GFF3 from the GFF output?<br>
<br>
Thanks a lot for your help,<br>
<br>
Muriel<br>
<br>
<br>
</body>
</html>