[maker-devel] Maker_opts.ctl

Nguyen, Anh-Dao (NIH/NHGRI) [C] nguyenan at mail.nih.gov
Wed Jul 16 13:16:43 MDT 2014


I forgot to mention that I ran RepeatModeler on the genome first, then
passed its output to RepeatMasker with the -lib option (de novo).
For the -species option, I used metazoa.

Anh-Dao
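
For reference, the two RepeatMasker runs described above correspond roughly
to the repeat-masking options in maker_opts.ctl. This is a minimal sketch,
not a tested configuration: the library path is hypothetical, and it assumes
the model_org, rmlib, repeat_protein, and rm_gff options as they appear in
the control file generated by "maker -CTL".

```text
# Repeat masking, run by MAKER itself
model_org=metazoa #passed to RepeatMasker as the -species value (known repeats)
rmlib=/path/to/repeatmodeler_library.fa #hypothetical path to the de novo library from RepeatModeler
repeat_protein= #optional FASTA of transposable element proteins for RepeatRunner
rm_gff= #left empty here because MAKER runs RepeatMasker in this sketch
```

With rmlib set, the de novo library is used in addition to the known
repeats selected by model_org, so both runs can be reproduced inside MAKER.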



On 7/16/14 3:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
<nguyenan at mail.nih.gov> wrote:

>I will run Augustus and FGENESH++ inside of MAKER using the parameter
>files for Augustus.
>I could also run RepeatMasker inside of MAKER. However, I ran RM with two
>options: -lib (de novo) and -species (known). I got ~45% repeats with the
>de novo library and ~4% repeats with the known library. As I understand
>it, RM inside of MAKER uses only the RepBase repeat library and the
>RepeatRunner protein database.
>
>Anh-Dao
>
>
>On 7/16/14 2:36 PM, "Carson Holt" <carsonhh at gmail.com> wrote:
>
>>When you ran Augustus separately, it should have created the parameters
>>needed to run it.  Now you should be able to run it inside of MAKER using
>>the species name you just created.
>>
>>I'd also recommend letting MAKER run RepeatMasker for you rather than
>>giving it the results as GFF3.
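
Carson's first suggestion amounts to a single line in maker_opts.ctl. A
minimal sketch, assuming the training run saved its parameters under the
hypothetical species name my_species in the Augustus configuration
directory ($AUGUSTUS_CONFIG_PATH/species/):

```text
augustus_species=my_species #hypothetical name; should match the species directory created during training
```

The value is a species name rather than a file path, which is why the
training step has to finish before Augustus can be run from inside MAKER.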
>>
>>--Carson
>>
>>
>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>><nguyenan at mail.nih.gov> wrote:
>>
>>>Thanks Daniel for your quick response.
>>>
>>>I did not use another organism's parameter file when running Augustus.
>>>I created the parameter file for this genome following their
>>>instructions.
>>>There were multiple steps to train and run Augustus (creating gene
>>>structures for training AUGUSTUS with CEGMA, which produces the
>>>parameter file; creating hints for AUGUSTUS from EST/cDNA sequences;
>>>incorporating Illumina RNA-seq into AUGUSTUS with GSNAP; etc.).
>>>As I mentioned, I ran Augustus separately because Augustus has not
>>>been trained for this genome (no parameter file exists). Otherwise I
>>>would have run Augustus inside MAKER.
>>> 
>>>You suggested using the rm_gff option to specify the RepeatMasker
>>>output (I will convert the files to GFF3 format). Can I submit two RM
>>>.gff3 files, separated by a comma?
>>>
>>>Anh-Dao
>>>
>>>
>>>On 7/16/14 2:13 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>
>>>>Hi Anh-Dao, 
>>>>
>>>>In the maker_opts.ctl file, there are options for EST and protein
>>>>evidence. You'll put all of your FASTA EST files together in a
>>>>comma-separated list in the "est" option, and all of your FASTA protein
>>>>files in a comma-separated list in the "protein" option.
>>>>
>>>>You'll specify the SNAP and GeneMark files in their respective options
>>>>in the control file and pass the Augustus and FGENESH predictions in
>>>>the "pred_gff" option.
>>>>
>>>>If you have the RepeatMasker output in GFF3 format, you can give it to
>>>>MAKER with the "rm_gff" option.
>>>>
>>>>If you've converted the Cufflinks output to GFF3, you can give it to
>>>>MAKER with the "est_gff" option. I'm pretty sure Trinity only gives
>>>>FASTA output, so you would put that in the "est" option, along with all
>>>>the other EST FASTA files.
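
The option mapping Daniel describes can be sketched as a maker_opts.ctl
fragment. All file paths below are hypothetical placeholders, and listing
two files in pred_gff with a comma is an assumption carried over from how
the est and protein options are described; this is an illustration rather
than a tested configuration.

```text
# EST / transcript evidence
est=/path/to/ests.fasta,/path/to/race.fasta,/path/to/trinity.fasta #comma-separated FASTA files
est_gff=/path/to/cufflinks.gff3 #Cufflinks transcripts converted to GFF3

# Protein evidence
protein=/path/to/proteins.fasta,/path/to/related_proteins.fasta #comma-separated FASTA files

# Precomputed repeat annotations
rm_gff=/path/to/repeatmasker.gff3 #RepeatMasker output converted to GFF3

# Gene prediction
snaphmm=/path/to/snap.hmm #SNAP HMM file
gmhmm=/path/to/es.mod #GeneMark model file
pred_gff=/path/to/augustus.gff3,/path/to/fgenesh.gff3 #precomputed predictions in GFF3 (comma list is an assumption)
```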
>>>>
>>>>If Augustus isn't trained for your particular organism, then you can
>>>>use another organism that Augustus is already trained for. The list of
>>>>species that Augustus has parameter files for is in the README.txt that
>>>>came with Augustus. I really recommend that you run Augustus from
>>>>inside MAKER, because then you get all the benefits of MAKER passing
>>>>EST-based hints to Augustus at runtime, which can really improve
>>>>Augustus' predictive ability.
>>>>
>>>>When you ran the Augustus gene prediction separately, did you use
>>>>another organism's parameter file?
>>>>
>>>>Thanks,
>>>>Daniel
>>>>
>>>>
>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C]
>>>><nguyenan at mail.nih.gov> wrote:
>>>>
>>>>> Hi,
>>>>> 
>>>>> I would like to conduct a genome annotation and have the following
>>>>> data:
>>>>> - Two separate RepeatMasker outputs (using -lib and -species options)
>>>>> - ESTs and RACE (fasta)
>>>>> - proteins (fasta)
>>>>> - proteins of related organisms (fasta)
>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to convert to
>>>>>   ZFF format, etc.)
>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl)
>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to convert
>>>>>   the outputs to .gff3 files. I ran the Augustus gene prediction
>>>>>   separately because Augustus has never been trained for this genome.
>>>>> - Cufflinks and Trinity from RNA-Seq
>>>>> 
>>>>> Could you please let me know how I can specify these in the
>>>>> maker_opts.ctl file?
>>>>> Or do you have other suggestions for redoing any of the data listed
>>>>> above?
>>>>> 
>>>>> Thanks.
>>>>> Anh-Dao
>>>>> 
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> 
>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>
>>>
>>>
>>
>>
>


