[maker-devel] Maker_opts.ctl

Carson Holt carsonhh at gmail.com
Wed Jul 16 13:17:31 MDT 2014


No.  You can provide both to MAKER. The options are model_org= and rmlib=.
If you let MAKER handle repeat masking, it differentiates between repeat
types and soft masks some while hard masking others.  This increases the
sensitivity of evidence alignments while still maintaining specificity.
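
As a rough sketch, the repeat masking section of maker_opts.ctl could then
look something like the fragment below. The library path is a placeholder,
not a file from this thread, and model_org=all is only an illustrative
value; I am assuming you would put there whatever you gave to RepeatMasker's
-species option.

```
#-----Repeat Masking (sketch only; the values are assumptions)
model_org=all #RepBase model organism for RepeatMasker (illustrative value)
rmlib=/path/to/denovo_repeat_library.fa #placeholder path; the library you passed to RepeatMasker -lib
rm_gff= #left empty so that MAKER does the masking itself
```

With both options set, there should be no need to pass the two earlier
RepeatMasker runs in as GFF3.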

--Carson



On 7/16/14, 1:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
<nguyenan at mail.nih.gov> wrote:

>I will run Augustus and FGENESH++ inside of MAKER using the parameter
>files for Augustus.
>I could also run RepeatMasker inside of MAKER. However, I ran RM using two
>options: -lib (de novo) and -species (known). I got ~ 45% repeats via the
>de novo option and ~ 4% repeats via the known option. As I understand it,
>RM inside of MAKER uses only the RepBase repeat library and the
>RepeatRunner protein database.
>
>Anh-Dao
>
>
>On 7/16/14 2:36 PM, "Carson Holt" <carsonhh at gmail.com> wrote:
>
>>When you ran Augustus separately, it should have created the parameters
>>needed to run it.  Now you should be able to run it inside of MAKER using
>>the species name you just created.
>>
>>I'd also recommend letting MAKER run RepeatMasker for you rather than
>>giving it the results as GFF3.
>>
>>--Carson
>>
>>
>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>><nguyenan at mail.nih.gov> wrote:
>>
>>>Thanks Daniel for your quick response.
>>>
>>>I did not use another organism's parameter file when running Augustus.
>>>I created the parameter file for the genome following their
>>>instructions. There were multiple steps to train and run Augustus
>>>(creating gene structures for training AUGUSTUS with CEGMA, which
>>>creates the parameter file; creating hints for AUGUSTUS from ESTs/cDNA
>>>sequences; incorporating Illumina RNAseq into AUGUSTUS with GSNAP,
>>>etc.). As I mentioned, I ran Augustus separately because Augustus has
>>>not been trained for that genome (no parameter file exists). Otherwise
>>>I would run Augustus inside MAKER.
>>>
>>>You suggested using the rm_gff option to specify the RepeatMasker
>>>output (I will of course convert it to .gff3 files). Can I submit two
>>>RM .gff3 files, separated by a comma?
>>>
>>>Anh-Dao
>>>
>>>
>>>On 7/16/14 2:13 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>
>>>>Hi Anh-Dao, 
>>>>
>>>>In the maker_opts.ctl file, there are options for est and protein
>>>>evidence. You'll put all of your fasta est files together in a
>>>>comma-separated list in the "est" option, and all of your fasta
>>>>protein files in a comma-separated list in the "protein" option.
>>>>
>>>>You'll specify the SNAP and Genemark files in their respective options
>>>>in the control file and pass the augustus and fgenesh predictions in
>>>>the "pred_gff" option.
>>>>
>>>>If you have the RepeatMasker output in gff3 format you can give it to
>>>>maker with the "rm_gff" option.
>>>>
>>>>If you've converted the cufflinks output to gff3, you can give it to
>>>>maker with the "est_gff" option. I'm pretty sure Trinity only gives
>>>>fasta output, so you would put that in the "est" option, along with
>>>>all the other est fasta files.
>>>>
>>>>If Augustus isn't trained for your particular organism, then you can
>>>>use another organism that augustus is already trained for. The list of
>>>>species that augustus has parameter files for is in the README.txt that
>>>>came with Augustus. I really recommend that you run Augustus from
>>>>inside maker, because then you get all the benefits of maker passing
>>>>est-based hints to augustus at runtime, which can really improve
>>>>Augustus' predictive ability.
>>>>
>>>>When you ran the augustus gene prediction separately, did you use
>>>>another organism's parameter file?
>>>>
>>>>Thanks,
>>>>Daniel
>>>>
>>>>
>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C]
>>>><nguyenan at mail.nih.gov> wrote:
>>>>
>>>>> Hi,
>>>>> 
>>>>> I would like to conduct a genome annotation and have the following
>>>>> data:
>>>>> - Two separate RepeatMasker outputs (using -lib and -species options)
>>>>> - ESTs and RACE (fasta)
>>>>> - proteins (fasta)
>>>>> - proteins of related organisms (fasta)
>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to convert to
>>>>>   ZFF format, etc.)
>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl)
>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to convert
>>>>>   the outputs to .gff3 files. I ran the Augustus gene prediction
>>>>>   separately because Augustus has never been trained for this genome.
>>>>> - Cufflinks and Trinity from RNA-Seq
>>>>> 
>>>>> Could you please let me know how I can specify these parameters in
>>>>> the maker_opts.ctl file?
>>>>> Or do you have other suggestions for redoing the data listed above?
>>>>> 
>>>>> Thanks.
>>>>> Anh-Dao
>>>>> 
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> 
>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>
>>>
>>>
>>
>>
>





