[maker-devel] Q on MAKER

Thu Feb 11 15:36:44 MST 2016

Not if you already have Trinity results. Adding the TopHat output will actually decrease the specificity of the run (i.e., it causes false gene calls because of spurious evidence support).

—Carson
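
A minimal sketch of how the second-run settings discussed in this thread could be applied to a maker_opts.ctl file. The option names (genome, est, est_gff, est2genome, protein, protein2genome, snaphmm, augustus_species) are standard maker_opts.ctl keys, and the file names come from the messages in this thread. The helper functions set_opt and configure_second_run and the species name "dwil" are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch only: helper names and the species name "dwil" are illustrative assumptions.

# set_opt KEY VALUE FILE
# Rewrites the value on the "KEY=" line and keeps any trailing " #comment".
# Assumes VALUE has no spaces and none of the characters | & \ (special to sed here).
set_opt() {
    sed "s|^$1=[^ #]*|$1=$2|" "$3" > "$3.tmp" && mv "$3.tmp" "$3"
}

# configure_second_run FILE
# Trinity transcripts are the only evidence; trained SNAP and Augustus do the prediction.
configure_second_run() {
    ctl=$1
    set_opt genome all-chromosome-r1.04.fasta "$ctl" || return 1
    set_opt est Trinity.fasta "$ctl" || return 1
    # Left empty: no tophat2gff output alongside the Trinity assembly.
    set_opt est_gff "" "$ctl" || return 1
    # Gene models now come from the trained predictors, not directly from ESTs.
    set_opt est2genome 0 "$ctl" || return 1
    # No external protein evidence, as requested in the thread.
    set_opt protein "" "$ctl" || return 1
    set_opt protein2genome 0 "$ctl" || return 1
    set_opt snaphmm dwil_genome.hmm "$ctl" || return 1
    # Hypothetical name of the directory copied into augustus/config/species/.
    set_opt augustus_species dwil "$ctl" || return 1
}
```

After defining the functions, running `configure_second_run maker_opts.ctl` on a copy of the control file applies the settings; editing the same keys by hand gives the same result.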

> On Feb 11, 2016, at 3:32 PM, hcma <hcma at uci.edu> wrote:
> 
> Hi Carson,
> 
> Thanks for sharing.
> 
> I assembled the Illumina RNA-seq PE100 reads de novo using Trinity and used the assembly as input to the first run of MAKER to generate a set of genes for training SNAP and Augustus. Now I am planning a second (and perhaps final) run of MAKER for gene prediction, provided that the SNAP and Augustus results look similar to each other.
> 
> I was going to incorporate the GFF result from TopHat into the second run of MAKER for gene prediction, along with the Trinity output, but without external protein annotation. I have already done a separate BLAST analysis to identify orthologous genes, and I prefer to run MAKER without any protein evidence.
> 
> Do you recommend using the output of tophat2gff as input for this second run of MAKER for gene prediction?
> 
> Thanks again for your time and advice.
> 
> Best Regards
> Karen
> 
> 
> 
> On 2016-02-10 18:32, Carson Holt wrote:
>> I find TopHat results to be too noisy and prefer Cufflinks. Both a
>> tophat2gff and a cufflinks2gff script come with MAKER. Also consider
>> assembling the reads with Trinity (my overall preferred method
>> because it yields the highest specificity).
>> --Carson
>> Sent from my iPhone
>>> On Feb 10, 2016, at 3:27 PM, hcma <hcma at uci.edu> wrote:
>>> Hi Mike,
>>> Thanks for the reply. So I can input raw RNA-seq reads to TopHat and feed the output to MAKER?
>>> Thanks.
>>> Best Regards
>>> Karen
>>>> On 2016-02-10 06:17, Michael Campbell wrote:
>>>> Hi Karen,
>>>> In my experience, trimming reads will not make things worse, and it
>>>> generally makes things better. As for the best program to use, no
>>>> single one stands out above the others as far as I can tell.
>>>> However, with paired-end reads it is important to use a trimmer that
>>>> preserves the pairing between the two files (i.e., when an entire read
>>>> is discarded, its mate is moved into a file for singletons).
>>>> Thanks
>>>> Mike
>>>>> On Feb 9, 2016, at 5:35 PM, hcma <hcma at uci.edu> wrote:
>>>>> Hi Carson,
>>>>> For the final annotation run, I would like to incorporate TopHat results from RNA-seq data. In your experience, is it better to feed TopHat raw RNA-seq reads (Illumina paired-end data) or trimmed reads (trimmed using Trimmomatic)? If trimmed, do you recommend a particular program?
>>>>> Thanks for your time.
>>>>> Best Regards
>>>>> Karen
>>>>>> On 2016-02-05 15:33, Carson Holt wrote:
>>>>>> I recommend using both. You probably don't have Augustus installed.
>>>>>> --Carson
>>>>>> Sent from my iPhone
>>>>>>> On Feb 5, 2016, at 4:20 PM, hcma <hcma at uci.edu> wrote:
>>>>>>> Hi Carson,
>>>>>>> Thanks for the instructions. In maker_exe.ctl I only see a path to SNAP, not to Augustus, so my system admin is checking this for me.
>>>>>>> According to a manual I found, people use both SNAP and Augustus when annotating genomes with MAKER. Would you recommend using both, or is one of the two sufficient?
>>>>>>> Thanks for your valuable time and advice.
>>>>>>> Best Regards
>>>>>>> Karen
>>>>>>>> On 2016-02-05 15:03, Carson Holt wrote:
>>>>>>>> You need to find out where the copy of Augustus that MAKER is using is installed.
>>>>>>>> Check the maker_exe.ctl file you are using, or type ‘which augustus’.
>>>>>>>> —Carson
>>>>>>>>> On Feb 5, 2016, at 3:58 PM, hcma <hcma at uci.edu> wrote:
>>>>>>>>> Hi Carson,
>>>>>>>>> This is the list of directories under maker/2.31.8:
>>>>>>>>> bin  data  GMOD  INSTALL  lib  LICENSE  MWAS  perl  README  RELEASE  src
>>>>>>>>> Where can I find augustus/? Or do I have to ask my system admin to install it?
>>>>>>>>> Thanks.
>>>>>>>>> Best Regards
>>>>>>>>> Karen
>>>>>>>>>> On 2016-02-05 14:54, Carson Holt wrote:
>>>>>>>>>> Augustus gives you an entire directory rather than just a single file
>>>>>>>>>> like SNAP.  You have to take the directory and copy it to the
>>>>>>>>>> .../augustus/config/species/ directory.
>>>>>>>>>> Example:
>>>>>>>>>> …/augustus/config/species/arabidopsis/
>>>>>>>>>> Then ‘arabidopsis’ would be the species name to use with MAKER.
>>>>>>>>>> Sometimes you may have to do a second round of both SNAP and Augustus
>>>>>>>>>> training (called bootstrapping). Look at the models you get after the
>>>>>>>>>> first round; if they look good, then the second round is probably
>>>>>>>>>> not going to be beneficial.
>>>>>>>>>> —Carson
>>>>>>>>>>> On Feb 5, 2016, at 3:42 PM, hcma <hcma at uci.edu> wrote:
>>>>>>>>>>> Hi Dr Holt,
>>>>>>>>>>> Thanks for the email. Here is my pipeline; does it seem acceptable? Any comments are welcome and much appreciated.
>>>>>>>>>>> 1. Use MAKER to generate a training gene set:
>>>>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>>>>> est=Trinity.fasta
>>>>>>>>>>> est2genome=1
>>>>>>>>>>> 2. Use the output of MAKER to train SNAP:
>>>>>>>>>>> maker2zff dwil-all-chromosome-r1.04.all.gff
>>>>>>>>>>> fathom genome.ann genome.dna -gene-stats
>>>>>>>>>>> fathom genome.ann genome.dna -categorize 1000
>>>>>>>>>>> fathom genome.ann genome.dna -gene-stats
>>>>>>>>>>> fathom uni.ann uni.dna -export 1000 -plus
>>>>>>>>>>> hmm-assembler.pl genome . > dwil_genome.hmm
>>>>>>>>>>> 3. Use the output of MAKER to train Augustus on its web server:
>>>>>>>>>>> Files used:
>>>>>>>>>>> Upload ‘export.dna’ as the genome file
>>>>>>>>>>> Upload ‘export.aa’ as the protein file
>>>>>>>>>>> 4. Second and final MAKER run:
>>>>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>>>>> est=Trinity.fasta
>>>>>>>>>>> est2genome=0
>>>>>>>>>>> snaphmm=output of step 2
>>>>>>>>>>> How do I incorporate the training output from the Augustus web server into step 4?
>>>>>>>>>>> Thanks for your time.
>>>>>>>>>>> Best Regards
>>>>>>>>>>> Karen
>>>>>>>>>>>> On 2016-02-05 06:36, Carson Holt wrote:
>>>>>>>>>>>> Hi Karen,
>>>>>>>>>>>> There are many ways to train Augustus. I prefer to identify gene
>>>>>>>>>>>> models in MAKER (GFF3) and use those to train both SNAP and Augustus.
>>>>>>>>>>>> Here is a previous post on the topic —>
>>>>>>>>>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ
>>>>>>>>>>>> [1]
>>>>>>>>>>>> In the end you need to look at the SNAP and Augustus models together
>>>>>>>>>>>> with evidence alignments in a genome browser (like desktop Apollo).
>>>>>>>>>>>> When everything is trained well, both SNAP and Augustus models will
>>>>>>>>>>>> look like each other and both seem to look like the evidence
>>>>>>>>>>>> alignments.
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Carson
>>>>>>>>>>>>> On Feb 4, 2016, at 5:52 PM, hcma <hcma at uci.edu> wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> I have a genome sequence and a Trinity assembly for a new species, and
>>>>>>>>>>>>> I am wondering what the best steps are when using MAKER.
>>>>>>>>>>>>> 1. I used the genome sequence and all assembled Trinity sequences for
>>>>>>>>>>>>> a first run of MAKER in order to generate a training set for SNAP and
>>>>>>>>>>>>> Augustus.
>>>>>>>>>>>>> In maker_opts.ctl:
>>>>>>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>>>>>>> est=Trinity.fasta
>>>>>>>>>>>>> est2genome=1
>>>>>>>>>>>>> 2. Train SNAP
>>>>>>>>>>>>> 3. Train Augustus
>>>>>>>>>>>>> When I train Augustus, I only supply the genome and protein files. Should
>>>>>>>>>>>>> I also supply the Trinity file here?
>>>>>>>>>>>>> 4. What are the best parameters to use when running MAKER the second
>>>>>>>>>>>>> time to obtain the final annotation? I would prefer not to use
>>>>>>>>>>>>> any external protein data.
>>>>>>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>>>>>>> est=Trinity.fasta
>>>>>>>>>>>>> est2genome=0
>>>>>>>>>>>>> SNAP
>>>>>>>>>>>>> Augustus
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>> Karen
>>>>>>>>>>>> Links:
>>>>>>>>>>>> ------
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ
>
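
The SNAP training steps listed in the thread can be sketched as one shell function. This is an illustration, not an official MAKER script: the function names run and train_snap and the DRY_RUN switch are made up for this example. The forge step, which the quoted pipeline does not list, is included on the assumption that hmm-assembler.pl needs the parameter files forge writes into the current directory.

```shell
#!/bin/sh
# Sketch only: run, train_snap, and DRY_RUN are illustrative names, not MAKER or SNAP features.

# run CMD [ARGS...]: execute the command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# train_snap GFF3 NAME: train SNAP on MAKER gene models and write NAME.hmm.
train_snap() {
    gff=$1
    name=$2
    # Convert MAKER GFF3 to ZFF; writes genome.ann and genome.dna.
    run maker2zff "$gff" || return 1
    # Print summary statistics for the training genes.
    run fathom genome.ann genome.dna -gene-stats || return 1
    # Split genes into categories with 1000 bp of flanking sequence (uni.* are the unique genes).
    run fathom genome.ann genome.dna -categorize 1000 || return 1
    # Export the unique genes on the plus strand; writes export.ann, export.dna, export.aa, export.tx.
    run fathom uni.ann uni.dna -export 1000 -plus || return 1
    # Estimate model parameters (assumed to be required before hmm-assembler.pl).
    run forge export.ann export.dna || return 1
    # Assemble the parameter files in the current directory into one HMM file.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "hmm-assembler.pl $name . > $name.hmm"
    else
        hmm-assembler.pl "$name" . > "$name.hmm"
    fi
}
```

For example, `DRY_RUN=1 train_snap dwil-all-chromosome-r1.04.all.gff dwil_genome` prints the commands without running them; without DRY_RUN the same call would produce dwil_genome.hmm for the snaphmm option, provided the SNAP and MAKER tools are on the PATH.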