[maker-devel] Maker_opts.ctl

Nguyen, Anh-Dao (NIH/NHGRI) [C] nguyenan at mail.nih.gov
Wed Jul 16 13:28:33 MDT 2014


By default, model_org=all. Can I use the de novo repeat library predicted
by RepeatModeler for the rmlib option?

Anh-Dao



On 7/16/14 3:17 PM, "Carson Holt" <carsonhh at gmail.com> wrote:

>No.  You can provide both to MAKER. The options are model_org= and rmlib=.
> By letting MAKER handle repeat masking it will differentiate repeat types
>and use soft masking for some and hard masking for others.  This increases
>sensitivity of evidence alignments while still maintaining specificity.
>
>--Carson
>
>
>
>On 7/16/14, 1:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
><nguyenan at mail.nih.gov> wrote:
>
>>I will run Augustus and FGENESH++ inside of MAKER using the parameter
>>files for Augustus.
>>I could also run RepeatMasker inside of MAKER. However, I ran RM using two
>>options: -lib (de novo) and -species (known). I got ~45% repeats via the
>>de novo library and ~4% repeats via the known-species option. As I
>>understand it, RM inside of MAKER uses only the RepBase repeat library and
>>the RepeatRunner protein database.
>>
>>Anh-Dao
>>
>>
>>On 7/16/14 2:36 PM, "Carson Holt" <carsonhh at gmail.com> wrote:
>>
>>>When you ran Augustus separately, it should have created the parameters
>>>needed to run it.  Now you should be able to run it inside of MAKER using
>>>the species name you just created.
>>>
>>>I'd also recommend letting MAKER run RepeatMasker for you rather than
>>>giving it the results as GFF3.
>>>
>>>--Carson
>>>
>>>
>>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>>><nguyenan at mail.nih.gov> wrote:
>>>
>>>>Thanks Daniel for your quick response.
>>>>
>>>>I did not use the parameter file of another organism when running
>>>>Augustus. I created the parameter file for the genome following their
>>>>instructions.
>>>>There were multiple steps to train and run Augustus (creating gene
>>>>structures for training AUGUSTUS with CEGMA, which creates the parameter
>>>>file; creating hints for AUGUSTUS from EST/cDNA sequences;
>>>>incorporating Illumina RNA-Seq into AUGUSTUS with GSNAP; etc.).
>>>>As I mentioned, I ran Augustus separately because Augustus had not been
>>>>trained for that genome (no parameter file existed). Otherwise I would
>>>>have run Augustus inside MAKER.
>>>> 
>>>>You suggested using the rm_gff option to specify the RepeatMasker output
>>>>(I will convert it to GFF3 files). Can I submit two RM GFF3 files,
>>>>separated by a comma?
>>>>
>>>>Anh-Dao
>>>>
>>>>
>>>>On 7/16/14 2:13 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>>
>>>>>Hi Anh-Dao, 
>>>>>
>>>>>In the maker_opts.ctl file, there are options for est and protein
>>>>>evidence. You'll put all of your fasta est files together in a
>>>>>comma-separated list in the "est" option, and all of your fasta protein
>>>>>files in a comma-separated list in the "protein" option.
>>>>>
>>>>>You'll specify the SNAP and GeneMark files in their respective options
>>>>>in the control file and pass the Augustus and FGENESH predictions in the
>>>>>"pred_gff" option.
>>>>>
>>>>>If you have the RepeatMasker output in GFF3 format, you can give it to
>>>>>MAKER with the "rm_gff" option.
>>>>>
>>>>>If you've converted the Cufflinks output to GFF3, you can give it to
>>>>>MAKER with the "est_gff" option. I'm pretty sure Trinity only gives
>>>>>fasta output, so you would put that in the "est" option, along with all
>>>>>the other est fasta files.
>>>>>
>>>>>If Augustus isn't trained for your particular organism, then you can use
>>>>>another organism that Augustus is already trained for. The list of
>>>>>species that Augustus has parameter files for is in the README.txt that
>>>>>came with Augustus. I really recommend that you run Augustus from inside
>>>>>MAKER, because then you get all the benefits of MAKER passing ext-based
>>>>>hints to Augustus at runtime, which can really improve Augustus'
>>>>>predictive ability.
>>>>>
>>>>>When you ran the Augustus gene prediction separately, did you use another
>>>>>organism's parameter file?
>>>>>
>>>>>Thanks,
>>>>>Daniel
>>>>>
>>>>>
>>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C]
>>>>><nguyenan at mail.nih.gov> wrote:
>>>>>
>>>>>> Hi,
>>>>>> 
>>>>>> I would like to conduct a genome annotation and have the following
>>>>>> data:
>>>>>> - Two separate RepeatMasker outputs (using -lib and -species options)
>>>>>> - ESTs and RACE (fasta)
>>>>>> - proteins (fasta)
>>>>>> - proteins of related organisms (fasta)
>>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to convert to ZFF
>>>>>> format, etc.)
>>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl)
>>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to convert
>>>>>> the outputs to .gff3 files. I ran the Augustus gene prediction
>>>>>> separately because the genome has never been trained for Augustus.
>>>>>> - Cufflinks and Trinity from RNA-Seq
>>>>>> 
>>>>>> Could you please let me know how I can specify parameters in the
>>>>>> maker_opts.ctl file?
>>>>>> Or do you have other suggestions for redoing the data listed above?
>>>>>> 
>>>>>> Thanks.
>>>>>> Anh-Dao
>>>>>> 
>>>>>> _______________________________________________
>>>>>> maker-devel mailing list
>>>>>> maker-devel at box290.bluehost.com
>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
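
The options named in this thread could be laid out in maker_opts.ctl
roughly as follows. This is only a sketch, not a tested configuration:
every file name and the Augustus species name are hypothetical
placeholders, and pointing rmlib at a RepeatModeler library is an
assumption that the thread itself does not confirm. The est, protein,
est_gff, pred_gff, rm_gff, model_org and rmlib options are the ones
mentioned above; snaphmm, gmhmm and augustus_species are the standard
control-file options for the SNAP, GeneMark and Augustus inputs.

```
#-----Evidence (comma-separated lists; file names are hypothetical)
est=ests.fasta,race.fasta,trinity.fasta
protein=proteins.fasta,related_proteins.fasta
est_gff=cufflinks.gff3 #Cufflinks output already converted to GFF3

#-----Repeat masking (MAKER runs RepeatMasker itself)
model_org=all #default; known repeats
rmlib=denovo_repeats.fasta #assumption: de novo library, e.g. from RepeatModeler
rm_gff= #left empty when MAKER does the masking

#-----Gene prediction
snaphmm=my_genome.hmm #SNAP HMM trained via CEGMA
gmhmm=es.mod #GeneMark model from gm_es.pl
augustus_species=my_species #hypothetical name created during Augustus training
pred_gff=fgenesh.gff3 #external predictions passed through as GFF3
```

Leaving rm_gff empty follows the recommendation above to let MAKER handle
repeat masking rather than supplying precomputed RepeatMasker GFF3.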


