[maker-devel] Q on MAKER

Wed Feb 10 19:32:00 MST 2016

I find tophat results to be too noisy, and prefer cufflinks. MAKER ships with both a tophat2gff3 and a cufflinks2gff3 script. Also consider assembling the reads with Trinity (my overall preferred method because it yields the highest specificity).

--Carson

Sent from my iPhone

> On Feb 10, 2016, at 3:27 PM, hcma <hcma at uci.edu> wrote:
> 
> Hi Mike,
> 
> Thanks for the reply. So I can input raw RNA-seq reads to TopHat and feed the output to MAKER?
> 
> Thanks.
> 
> Best Regards
> Karen
> 
> 
> 
>> On 2016-02-10 06:17, Michael Campbell wrote:
>> Hi Karen,
>> From my experience trimming reads will not make things worse and it
>> generally makes things better. As far as the best program to use, one
>> doesn’t really stand out above the others as far as I can tell.
>> However, with paired end reads it is important to use a trimmer that
>> preserves the pairing between the two files (i.e., when an entire read
>> is discarded the paired read is moved into a file for singletons).
>> Thanks
>> Mike
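
To illustrate what "preserving the pairing" means, here is a minimal Python sketch. It is not the implementation of any particular trimmer. It assumes the reads that survived trimming are held in two dictionaries keyed by a shared read ID; the function name `split_pairs` and the read IDs are made up for illustration.

```python
def split_pairs(forward, reverse):
    """Separate trimmed reads into intact pairs and singletons.

    forward, reverse: dicts mapping read ID -> sequence for the reads that
    survived trimming (a discarded read is simply absent from its dict).
    """
    # IDs present in both files, kept in the order of the forward file.
    paired_ids = [rid for rid in forward if rid in reverse]
    paired_fwd = [(rid, forward[rid]) for rid in paired_ids]
    paired_rev = [(rid, reverse[rid]) for rid in paired_ids]
    # A read whose mate was discarded is moved to the singleton set.
    singletons = [(rid, seq) for rid, seq in forward.items() if rid not in reverse]
    singletons += [(rid, seq) for rid, seq in reverse.items() if rid not in forward]
    return paired_fwd, paired_rev, singletons
```

The two paired lists come back in the same order, so read N of the first file still corresponds to read N of the second.
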
>>> On Feb 9, 2016, at 5:35 PM, hcma <hcma at uci.edu> wrote:
>>> Hi Carson,
>>> For the final annotation run, I would like to incorporate TopHat results from RNA-seq data. From your experience, do you know whether it is better to feed TopHat raw RNA-seq reads (Illumina paired-end) or trimmed reads (for example, trimmed with Trimmomatic)? If trimmed, do you recommend a particular programme?
>>> Thanks for your time.
>>> Best Regards
>>> Karen
>>>> On 2016-02-05 15:33, Carson Holt wrote:
>>>> I recommend using both. You probably don't have Augustus installed.
>>>> --Carson
>>>> Sent from my iPhone
>>>>> On Feb 5, 2016, at 4:20 PM, hcma <hcma at uci.edu> wrote:
>>>>> Hi Carson,
>>>>> Thanks for the instructions. In maker_exe.ctl, I only see a path to snap, but not to augustus, so my system admin is checking this for me.
>>>>> From a manual I found, people use both SNAP and Augustus when annotating genomes with MAKER. Would you recommend using both, or is one of the two sufficient?
>>>>> Thanks for your valuable time and advice.
>>>>> Best Regards
>>>>> Karen
>>>>>> On 2016-02-05 15:03, Carson Holt wrote:
>>>>>> You need to find out where the copy of augustus that MAKER is using is installed.
>>>>>> Check the maker_exe.ctl file you are using, or type ‘which augustus’.
>>>>>> —Carson
>>>>>>> On Feb 5, 2016, at 3:58 PM, hcma <hcma at uci.edu> wrote:
>>>>>>> Hi Carson,
>>>>>>> This is the list of directories under maker/2.31.8:
>>>>>>> bin  data  GMOD  INSTALL  lib  LICENSE  MWAS  perl  README  RELEASE  src
>>>>>>> Where can I find augustus/? Or do I have to ask my system admin to install it?
>>>>>>> Thanks.
>>>>>>> Best Regards
>>>>>>> Karen
>>>>>>>> On 2016-02-05 14:54, Carson Holt wrote:
>>>>>>>> Augustus gives you an entire directory rather than just a single file
>>>>>>>> like SNAP.  You have to take the directory and copy it to the
>>>>>>>> .../augustus/config/species/ directory.
>>>>>>>> Example:
>>>>>>>> …/augustus/config/species/arabidopsis/
>>>>>>>> Then ‘arabidopsis’ would be the species name to use with MAKER.
>>>>>>>> Sometimes you may have to do a second round of both SNAP and Augustus
>>>>>>>> training (called bootstrapping). Look at the models you get after the
>>>>>>>> first round, and if they look good, then the second round is probably
>>>>>>>> not going to be beneficial.
>>>>>>>> —Carson
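
The copy step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `augustus_config` stands for the Augustus config directory (the elided `.../augustus/config` path, which depends on the installation), and the function name `install_species` is hypothetical. As far as I know, Augustus expects the parameter files inside the directory to carry the directory name as a prefix, so it is safest to keep the name the training produced.

```python
import os
import shutil

def install_species(trained_dir, augustus_config, species_name):
    """Copy a trained parameter directory to <augustus_config>/species/<species_name>."""
    dest = os.path.join(augustus_config, "species", species_name)
    # Refuse to overwrite an existing species with the same name.
    if os.path.exists(dest):
        raise FileExistsError(f"{dest} already exists; pick another species name")
    shutil.copytree(trained_dir, dest)
    return dest
```

The value passed as `species_name` is then the species name to give MAKER.
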
>>>>>>>>> On Feb 5, 2016, at 3:42 PM, hcma <hcma at uci.edu> wrote:
>>>>>>>>> Hi Dr Holt,
>>>>>>>>> Thanks for the email. Here is my pipeline; does it seem acceptable? Any comments are welcome and much appreciated.
>>>>>>>>> 1. Use MAKER to generate a training gene set:
>>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>>> est=Trinity.fasta
>>>>>>>>> est2genome=1
>>>>>>>>> 2. Use the output of MAKER to train SNAP:
>>>>>>>>> maker2zff dwil-all-chromosome-r1.04.all.gff
>>>>>>>>> fathom genome.ann genome.dna -gene-stats
>>>>>>>>> fathom genome.ann genome.dna -categorize 1000
>>>>>>>>> fathom genome.ann genome.dna -gene-stats
>>>>>>>>> fathom uni.ann uni.dna -export 1000 -plus
>>>>>>>>> hmm-assembler.pl genome . > dwil_genome.hmm
>>>>>>>>> 3. Use the output of MAKER to train Augustus on its web server:
>>>>>>>>> File used:
>>>>>>>>> Upload ‘export.dna’ as the genome file
>>>>>>>>> Upload ‘export.aa’ as the protein file
>>>>>>>>> 4. Second and final MAKER run:
>>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>>> est=Trinity.fasta
>>>>>>>>> est2genome=0
>>>>>>>>> snaphmm=output of step 2
>>>>>>>>> How do I incorporate the gene training set produced by the Augustus web server into step 4?
>>>>>>>>> Thanks for your time.
>>>>>>>>> Best Regards
>>>>>>>>> Karen
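
For the second run in step 4, the control file changes can be sketched as below. This is a hedged illustration, not part of MAKER: `set_ctl_options` is a hypothetical helper that rewrites `key=value` lines in the text of maker_opts.ctl and keeps any trailing `#` comment. The option names `est2genome`, `snaphmm` and `augustus_species` are the ones used in maker_opts.ctl; the species name `dwil` is an assumption, standing for whatever directory name the Augustus parameters were installed under.

```python
def set_ctl_options(ctl_text, options):
    """Return ctl_text with the given key=value options set, keeping trailing comments."""
    remaining = dict(options)
    out = []
    for line in ctl_text.splitlines():
        key, sep, rest = line.partition("=")
        key = key.strip()
        if sep and key in remaining:
            # Keep the trailing '#...' comment of the line, if it has one.
            _, hash_mark, comment = rest.partition("#")
            value = remaining.pop(key)
            line = f"{key}={value}" + (f" #{comment}" if hash_mark else "")
        out.append(line)
    # Options that were not in the file are appended at the end.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"
```

For example, calling it with `{"est2genome": "0", "snaphmm": "dwil_genome.hmm", "augustus_species": "dwil"}` switches off the EST-based gene models and points MAKER at both trained predictors.
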
>>>>>>>>>> On 2016-02-05 06:36, Carson Holt wrote:
>>>>>>>>>> Hi Karen,
>>>>>>>>>> There are many ways to train Augustus. I prefer to identify gene
>>>>>>>>>> models in MAKER (GFF3) and use those to train both SNAP and Augustus.
>>>>>>>>>> Here is a previous post on the topic:
>>>>>>>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ
>>>>>>>>>> [1]
>>>>>>>>>> In the end you need to look at the SNAP and Augustus models together
>>>>>>>>>> with evidence alignments in a genome browser (like desktop Apollo).
>>>>>>>>>> When everything is trained well, both SNAP and Augustus models will
>>>>>>>>>> look like each other and both seem to look like the evidence
>>>>>>>>>> alignments.
>>>>>>>>>> Thanks,
>>>>>>>>>> Carson
>>>>>>>>>>> On Feb 4, 2016, at 5:52 PM, hcma <hcma at uci.edu> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>> I have a genome sequence and a Trinity assembly for a new species, and
>>>>>>>>>>> I am wondering what the best steps are when using MAKER.
>>>>>>>>>>> 1. I used the genome sequence and all assembled Trinity sequences for
>>>>>>>>>>> a first run of MAKER in order to generate a training set for SNAP and
>>>>>>>>>>> Augustus.
>>>>>>>>>>> In maker_opts.ctl:
>>>>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>>>>> est=Trinity.fasta
>>>>>>>>>>> est2genome=1
>>>>>>>>>>> 2. Train SNAP
>>>>>>>>>>> 3. Train Augustus
>>>>>>>>>>> When I train Augustus, I only supply the genome and protein files; should
>>>>>>>>>>> I also supply the Trinity file here?
>>>>>>>>>>> 4. What are the best parameters to use when running MAKER the second
>>>>>>>>>>> time to obtain the final annotation? I would prefer not to use
>>>>>>>>>>> any external protein data.
>>>>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>>>>> est=Trinity.fasta
>>>>>>>>>>> est2genome=0
>>>>>>>>>>> SNAP
>>>>>>>>>>> Augustus
>>>>>>>>>>> Thanks.
>>>>>>>>>>> Best Regards
>>>>>>>>>>> Karen
>>>>>>>>>> Links:
>>>>>>>>>> ------
>>>>>>>>>> [1]
>>>>>>>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ
>