<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Muriel -<div class=""><br class=""></div><div class="">It would be best to take the longest models, the ones that have both an ATG and a STOP. My workflow is to assemble the transcripts from the data with Trinity (genome-guided), align these transcripts to the genome with PASA, and then take the longest ORFs using the scripts provided with PASA to generate the best set for gene prediction.<div class=""><br class=""></div><div class=""><br class=""><div class=""><div><blockquote type="cite" class=""><div class="">On Oct 28, 2014, at 3:04 AM, Muriel Gros-Balthazard &lt;<a href="mailto:muriel.grosb@gmail.com" class="">muriel.grosb@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class="">
<meta http-equiv="content-type" content="text/html; charset=utf-8" class="">
<div text="#000000" bgcolor="#FFFFFF" class="">
Hello!<br class="">
<br class="">
I want to train Augustus for a non-model organism and I have several
questions about it!<br class="">
<br class="">
I planned to follow the section "Training ab initio Gene
predictors".<br class="">
<br class="">
So first, I need to generate a gene model using EST data.<br class="">
However, I was wondering how many sequences are necessary?<br class="">
My genome is 476 Mb and I have millions of RNA-seq sequences, but
it takes ages if I use all of them!<br class="">
I tried with 1000 sequences and it takes 30 min, but is that enough?
Or should I take more?<br class="">
<br class="">
Secondly, we then obtain plenty of GFF files. Should we concatenate
them?<br class="">
<br class="">
And then, what should I do? The MAKER help explains the procedure for SNAP,
but I want to use Augustus.<br class="">
I found a script called <span itemscope="" itemtype="http://schema.org/Answer" class=""><span itemprop="text" class=""><code class="">autoAug.pl</code></span></span>
to train Augustus.<br class="">
What do you think of it?<br class="">
<br class="">
Should I use it this way?<br class="">
<span itemscope="" itemtype="http://schema.org/Answer" class=""><p class=""><code class="">autoAug.pl --singleCPU --useexisting
--genome=mygenome.fasta --species=myspeciesname
--cdna=EST.fasta --trainingset=genome.gff3</code></p>
</span><br class="">
where EST.fasta is the file I used earlier to generate the gene
model and genome.gff3 is the result of the gene model.<br class="">
However, I don't think that I obtained a GFF3 file from the first
MAKER run.<br class="">
So should I generate GFF3 from GFF?<br class="">
<br class="">
Thanks a lot for your help,<br class="">
<br class="">
Muriel<br class="">
<br class="">
<br class="">
</div>
_______________________________________________<br class="">maker-devel mailing list<br class=""><a href="mailto:maker-devel@box290.bluehost.com" class="">maker-devel@box290.bluehost.com</a><br class="">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org<br class=""></div></blockquote></div><br class=""></div></div></div></body></html>