[maker-devel] Q on MAKER

hcma hcma at uci.edu
Thu Feb 11 15:32:45 MST 2016


Hi Carson,

Thanks for sharing.

I assembled the Illumina RNA-seq PE100 reads de novo using Trinity 
and used the assembly as input to the first MAKER run to generate a set 
of genes to train SNAP and Augustus. I am now planning a second (and 
perhaps final) MAKER run for gene prediction, provided that the SNAP 
and Augustus results look similar to each other.

I was going to incorporate the GFF output from TopHat into the second 
MAKER run, along with the Trinity output, while avoiding external 
protein annotation. I have already done a separate BLAST analysis to 
identify orthologous genes, so I prefer to run MAKER without any protein 
evidence.

Do you recommend using the tophat2gff output as input for this second 
MAKER run?

Thanks again for your time and advice.

Best Regards
Karen
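
A minimal sketch of how the second-run settings described above might look in maker_opts.ctl. The option names (est_gff, snaphmm, augustus_species, protein2genome) are standard MAKER control-file keys; the species name "dwil" is hypothetical and stands for whatever the trained directory under augustus/config/species/ is called, and est_gff is left empty because the choice of aligned RNA-seq evidence is still open.

```
#-----second MAKER run (sketch, not a tested configuration)
genome=all-chromosome-r1.04.fasta
est=Trinity.fasta
est_gff=                  #optional: aligned RNA-seq evidence converted to GFF3
protein=                  #left empty: no external protein evidence
snaphmm=dwil_genome.hmm   #HMM produced by SNAP training
augustus_species=dwil     #hypothetical: name of the trained Augustus species directory
est2genome=0              #gene models now come from the trained predictors
protein2genome=0
```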



On 2016-02-10 18:32, Carson Holt wrote:
> I find tophat results to be too noisy, and prefer cufflinks. There is
> both a tophat2gff and cufflinks2gff script that comes with MAKER. Also
> consider assembling the reads with Trinity (my overall preferred
> method because it yields the highest specificity).
> 
> --Carson
> 
>> On Feb 10, 2016, at 3:27 PM, hcma <hcma at uci.edu> wrote:
>> 
>> Hi Mike,
>> 
>> Thanks for the reply. So I can input raw RNA-seq reads to TopHat and 
>> feed the output to MAKER?
>> 
>> Thanks.
>> 
>> Best Regards
>> Karen
>> 
>> 
>> 
>>> On 2016-02-10 06:17, Michael Campbell wrote:
>>> Hi Karen,
>>> In my experience, trimming reads will not make things worse, and it
>>> generally makes things better. As for the best program to use, no one
>>> trimmer really stands out above the others as far as I can tell.
>>> However, with paired-end reads it is important to use a trimmer that
>>> preserves the pairing between the two files (i.e., when an entire read
>>> is discarded, its mate is moved into a file for singletons).
>>> Thanks
>>> Mike
>>>> On Feb 9, 2016, at 5:35 PM, hcma <hcma at uci.edu> wrote:
>>>> Hi Carson,
>>>> For the final annotation run, I would like to incorporate TopHat 
>>>> results from RNA-seq data. In your experience, is it better to feed 
>>>> TopHat raw RNA-seq reads (Illumina paired-end data) or reads trimmed 
>>>> with Trimmomatic? If trimmed, do you recommend a particular 
>>>> program?
>>>> Thanks for your time.
>>>> Best Regards
>>>> Karen
>>>>> On 2016-02-05 15:33, Carson Holt wrote:
>>>>> I recommend using both.  You probably don't have augustus 
>>>>> installed.
>>>>> --Carson
>>>>>> On Feb 5, 2016, at 4:20 PM, hcma <hcma at uci.edu> wrote:
>>>>>> Hi Carson,
>>>>>> Thanks for the instructions. In maker_exe.ctl, I only see a path 
>>>>>> to SNAP, but not to Augustus, so my system admin is checking this 
>>>>>> for me.
>>>>>> According to some manuals I found, people use both SNAP and 
>>>>>> Augustus when annotating genomes with MAKER. Would you recommend 
>>>>>> using both, or is one of the two sufficient?
>>>>>> Thanks for your valuable time and advice.
>>>>>> Best Regards
>>>>>> Karen
>>>>>>> On 2016-02-05 15:03, Carson Holt wrote:
>>>>>>> You need to find out where the augustus MAKER is using is 
>>>>>>> installed.
>>>>>>> Check the maker_exe.ctl file you are using, or type ‘which 
>>>>>>> augustus’.
>>>>>>> —Carson
>>>>>>>> On Feb 5, 2016, at 3:58 PM, hcma <hcma at uci.edu> wrote:
>>>>>>>> Hi Carson,
>>>>>>>> This is the list of directories under maker/2.31.8:
>>>>>>>> bin  data  GMOD  INSTALL  lib  LICENSE  MWAS  perl  README  
>>>>>>>> RELEASE  src
>>>>>>>> Where can I find augustus/? Or do I have to ask my system admin 
>>>>>>>> to install it?
>>>>>>>> Thanks.
>>>>>>>> Best Regards
>>>>>>>> Karen
>>>>>>>>> On 2016-02-05 14:54, Carson Holt wrote:
>>>>>>>>> Augustus gives you an entire directory rather than just a 
>>>>>>>>> single file
>>>>>>>>> like SNAP.  You have to take the directory and copy it to the
>>>>>>>>> .../augustus/config/species/ directory.
>>>>>>>>> Example:
>>>>>>>>> .../augustus/config/species/arabidopsis/
>>>>>>>>> Then ‘arabidopsis’ would be the species name to use with MAKER.
>>>>>>>>> Sometimes you may have to do a second round of both SNAP and 
>>>>>>>>> Augustus
>>>>>>>>> training (called bootstrapping). Look at the models you get 
>>>>>>>>> after the
>>>>>>>>> first round, and if they look good, then the second round is 
>>>>>>>>> probably
>>>>>>>>> not going to be beneficial.
>>>>>>>>> —Carson
>>>>>>>>>> On Feb 5, 2016, at 3:42 PM, hcma <hcma at uci.edu> wrote:
>>>>>>>>>> Hi Dr Holt,
>>>>>>>>>> Thanks for the email. Here is my pipeline; does it seem 
>>>>>>>>>> acceptable? Any comments are welcome and much appreciated.
>>>>>>>>>> 1. Use MAKER to generate a training gene set:
>>>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>>>> est=Trinity.fasta
>>>>>>>>>> est2genome=1
>>>>>>>>>> 2. Use the MAKER output to train SNAP:
>>>>>>>>>> maker2zff dwil-all-chromosome-r1.04.all.gff
>>>>>>>>>> fathom genome.ann genome.dna -gene-stats
>>>>>>>>>> fathom genome.ann genome.dna -categorize 1000
>>>>>>>>>> fathom genome.ann genome.dna -gene-stats
>>>>>>>>>> fathom uni.ann uni.dna -export 1000 -plus
>>>>>>>>>> hmm-assembler.pl genome . > dwil_genome.hmm
>>>>>>>>>> 3. Use the MAKER output to train Augustus on their web server:
>>>>>>>>>> File used:
>>>>>>>>>> Upload ‘export.dna’ as the genome file
>>>>>>>>>> Upload ‘export.aa’ as the protein file
>>>>>>>>>> 4. Second and final MAKER run:
>>>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>>>> est=Trinity.fasta
>>>>>>>>>> est2genome=0
>>>>>>>>>> snaphmm=output of step 2
>>>>>>>>>> How do I incorporate the training output from the Augustus 
>>>>>>>>>> web server into step 4?
>>>>>>>>>> Thanks for your time.
>>>>>>>>>> Best Regards
>>>>>>>>>> Karen
>>>>>>>>>>> On 2016-02-05 06:36, Carson Holt wrote:
>>>>>>>>>>> Hi Karen,
>>>>>>>>>>> There are many ways to train Augustus. I prefer to identify 
>>>>>>>>>>> gene
>>>>>>>>>>> models in MAKER (GFF3) and use those to train both SNAP and 
>>>>>>>>>>> Augustus.
>>>>>>>>>>> Here is a previous post on the topic —>
>>>>>>>>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ
>>>>>>>>>>> [1]
>>>>>>>>>>> In the end you need to look at the SNAP and Augustus models 
>>>>>>>>>>> together
>>>>>>>>>>> with evidence alignments in a genome browser (like desktop 
>>>>>>>>>>> Apollo).
>>>>>>>>>>> When everything is trained well, both SNAP and Augustus 
>>>>>>>>>>> models will
>>>>>>>>>>> look like each other and both seem to look like the evidence
>>>>>>>>>>> alignments.
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Carson
>>>>>>>>>>>> On Feb 4, 2016, at 5:52 PM, hcma <hcma at uci.edu> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> I have a genome sequence and Trinity assembly for a new 
>>>>>>>>>>>> species and
>>>>>>>>>>>> I am wondering what are the best steps to take when using 
>>>>>>>>>>>> MAKER?
>>>>>>>>>>>> 1. I used the genome sequence and all assembled Trinity 
>>>>>>>>>>>> sequence to
>>>>>>>>>>>> do first run of MAKER in order to generate training set for 
>>>>>>>>>>>> SNAP and
>>>>>>>>>>>> Augustus.
>>>>>>>>>>>> In maker_opts.ctl:
>>>>>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>>>>>> est=Trinity.fasta
>>>>>>>>>>>> est2genome=1
>>>>>>>>>>>> 2. Train SNAP
>>>>>>>>>>>> 3. Train Augustus
>>>>>>>>>>>> When I train Augustus, I only supply the genome and protein 
>>>>>>>>>>>> files; should
>>>>>>>>>>>> I also supply the Trinity file here?
>>>>>>>>>>>> 4. What are the best parameters to use when running MAKER the 
>>>>>>>>>>>> second
>>>>>>>>>>>> time to obtain the final annotation? I would prefer not 
>>>>>>>>>>>> to use
>>>>>>>>>>>> any external protein data.
>>>>>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>>>>>> est=Trinity.fasta
>>>>>>>>>>>> est2genome=0
>>>>>>>>>>>> SNAP
>>>>>>>>>>>> Augustus
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>> Best Regards
>>>>>>>>>>>> Karen
>>>>>>>>>>> Links:
>>>>>>>>>>> ------
>>>>>>>>>>> [1]
>>>>>>>>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ
>> 
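
The SNAP training steps quoted in the thread can be sketched as a small shell helper that prints the command sequence rather than running it, so it can be reviewed first. This is a sketch under stated assumptions, not a tested pipeline: the function name snap_training_commands is made up for illustration, and the `fathom -validate` and `forge` steps are not in the quoted list but are part of the usual SNAP training workflow (forge writes the parameter files that hmm-assembler.pl reads).

```shell
#!/bin/sh
# Print the usual SNAP training commands for a MAKER GFF3 file.
# $1 = MAKER GFF3 output, $2 = name of the HMM file to create.
# (Function name is hypothetical; the tools are those shipped with MAKER/SNAP.)
snap_training_commands() {
    gff=$1
    hmm=$2
    cat <<EOF
maker2zff $gff
fathom genome.ann genome.dna -gene-stats
fathom genome.ann genome.dna -validate
fathom genome.ann genome.dna -categorize 1000
fathom uni.ann uni.dna -export 1000 -plus
forge export.ann export.dna
hmm-assembler.pl genome . > $hmm
EOF
}

# File names taken from the thread.
snap_training_commands dwil-all-chromosome-r1.04.all.gff dwil_genome.hmm
```

To actually run the steps, the printed list could be piped into `sh -e` from an empty working directory, so that a failing step stops the run.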




