[maker-devel] maker-devel Digest, Vol 74, Issue 17
Nguyen, Anh-Dao (NIH/NHGRI) [C]
nguyenan at mail.nih.gov
Fri Sep 5 08:43:19 MDT 2014
Hi,
I finished running MAKER as suggested in the thread below.
Then I ran gff3_merge.pl with the -n and -g options to retrieve only the
MAKER annotations, and named the output file maker.gff3.
In maker.gff3 I found some invalid lines that do not conform to the GFF3
format, e.g.:
###
2 +
###
OR
###
.Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Target=species:tRNA-Asn-AAC|genus:tRNA 1 75 +
###
Or, some gene (or mRNA) IDs are not unique: the same ID occurs multiple
times with different values within maker.gff3.
How could this happen? As I understand it, mRNA IDs in a GFF3 file must be
unique.
Thanks
Anh-Dao
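A quick way to locate both kinds of problem is to scan maker.gff3 directly. The Python sketch below is a minimal check, not a full GFF3 validator: it reports non-comment lines that lack the nine tab-separated columns GFF3 requires, and IDs used on several lines that cannot form one feature. GFF3 does allow one ID on several lines when together they describe a single discontinuous feature (for example a CDS split across exons), so the sketch flags a repeated ID only when those lines differ in seqid, type, strand or Parent, or when the type is assumed to occupy a single line. The SINGLE_LINE_TYPES set (gene and mRNA) is an assumption for illustration.

```python
import sys
from collections import defaultdict

# Assumption: these feature types always occupy a single GFF3 line, so any
# repeated ID on them is reported even if the lines otherwise look alike.
SINGLE_LINE_TYPES = {"gene", "mRNA"}


def parse_attributes(column9):
    """Split a column-9 string such as 'ID=a;Parent=b' into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def check_gff3(lines):
    """Return (malformed, suspicious_ids) for an iterable of GFF3 lines."""
    malformed = []
    by_id = defaultdict(list)  # ID -> list of (seqid, type, strand, Parent)
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and directives such as '###'
        columns = line.split("\t")
        if len(columns) != 9:
            malformed.append((number, line))
            continue
        attrs = parse_attributes(columns[8])
        if "ID" in attrs:
            by_id[attrs["ID"]].append(
                (columns[0], columns[2], columns[6], attrs.get("Parent")))
    suspicious = []
    for feature_id, records in by_id.items():
        if len(records) < 2:
            continue
        differ = len(set(records)) > 1  # parts of one feature should agree
        single = any(rec[1] in SINGLE_LINE_TYPES for rec in records)
        if differ or single:
            suspicious.append(feature_id)
    return malformed, sorted(suspicious)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        bad_lines, bad_ids = check_gff3(handle)
    for number, text in bad_lines:
        print(f"line {number}: expected 9 tab-separated columns: {text[:60]}")
    for feature_id in bad_ids:
        print(f"ID used on several lines that are not one feature: {feature_id}")
```

Run it with the file as its only argument, e.g. python check_gff3.py maker.gff3 (the script name is hypothetical).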
On 7/18/14 2:00 PM, "maker-devel-request at yandell-lab.org"
<maker-devel-request at yandell-lab.org> wrote:
>Send maker-devel mailing list submissions to
> maker-devel at yandell-lab.org
>
>To subscribe or unsubscribe via the World Wide Web, visit
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>or, via email, send a message with subject or body 'help' to
> maker-devel-request at yandell-lab.org
>
>You can reach the person managing the list at
> maker-devel-owner at yandell-lab.org
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of maker-devel digest..."
>
>
>Today's Topics:
>
> 1. Re: Maker_opts.ctl (Carson Holt)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Fri, 18 Jul 2014 11:04:09 -0600
>From: Carson Holt <carsonhh at gmail.com>
>To: "Nguyen, Anh-Dao (NIH/NHGRI) [C]" <nguyenan at mail.nih.gov>, Daniel
> Ence <dence at genetics.utah.edu>
>Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>Subject: Re: [maker-devel] Maker_opts.ctl
>Message-ID: <CFEEAF84.DCAF%carsonhh at gmail.com>
>Content-Type: text/plain; charset="UTF-8"
>
>It should just be 'fgenesh'. If it's not there you can still just give
>the GFF3.
>
>--Carson
>
>
>On 7/17/14, 8:19 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
><nguyenan at mail.nih.gov> wrote:
>
>>I am not sure which fgenesh executable I should use.
>>
>>fgenesh= #location of fgenesh executable
>>
>>When I run FGENESH++, I need to run the run_pipe.pl script, and of course
>>you need to specify a list of other executables (such as ppd, ppdn+, etc.).
>>
>>Anh-Dao
>>
>>
>>On 7/16/14 3:32 PM, "Carson Holt" <carsonhh at gmail.com> wrote:
>>
>>>'all' will use the whole of RepBase, or you can use 'metazoa' like your
>>>previous run. Then provide the RepeatModeler file to rmlib=
>>>
>>>--Carson
>>>
>>>
>>>
>>>On 7/16/14, 1:28 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>>><nguyenan at mail.nih.gov> wrote:
>>>
>>>>By default, model_org=all. Can I use the de novo repeat library predicted
>>>>by RepeatModeler for the rmlib option?
>>>>
>>>>Anh-Dao
>>>>
>>>>
>>>>
>>>>On 7/16/14 3:17 PM, "Carson Holt" <carsonhh at gmail.com> wrote:
>>>>
>>>>>No. You can provide both to MAKER. The options are model_org= and rmlib=.
>>>>>If you let MAKER handle repeat masking, it will differentiate repeat types
>>>>>and use soft masking for some and hard masking for others. This increases
>>>>>the sensitivity of evidence alignments while still maintaining specificity.
>>>>>
>>>>>--Carson
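The difference between the two kinds of masking Carson mentions can be shown with a toy Python sketch. Soft masking lowercases repeat bases, so they stay in the sequence and an aligner can still extend an alignment through them even if it avoids starting one there; hard masking replaces them with N, so they cannot contribute to an alignment at all. The sequence, coordinates and soft/hard choice for each repeat below are hypothetical; the sketch does not reproduce which repeat classes MAKER masks in which way.

```python
def mask_repeats(seq, repeats):
    """Apply soft or hard masking to a DNA string.

    repeats: list of (start, end, mode) with 0-based, half-open
    coordinates; mode is "soft" (lowercase the bases) or "hard"
    (replace them with N).
    """
    chars = list(seq.upper())
    for start, end, mode in repeats:
        for i in range(start, end):
            if mode == "soft":
                chars[i] = chars[i].lower()
            elif mode == "hard":
                chars[i] = "N"
            else:
                raise ValueError(f"unknown masking mode: {mode}")
    return "".join(chars)


# Hypothetical example: one repeat soft-masked, another hard-masked.
genome = "ATGCACACACACGGTTAGCTAGCTTAA"
masked = mask_repeats(genome, [(3, 12, "soft"), (16, 24, "hard")])
print(masked)
```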
>>>>>
>>>>>
>>>>>
>>>>>On 7/16/14, 1:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>>>>><nguyenan at mail.nih.gov> wrote:
>>>>>
>>>>>>I will run Augustus and FGENESH++ inside of MAKER using the parameter
>>>>>>files for Augustus.
>>>>>>I could also run RepeatMasker inside of MAKER. However, I ran RM using two
>>>>>>options: -lib (de novo) and -species (known). I got ~45% repeats with the
>>>>>>de novo library and ~4% repeats with the known option. As I understand it,
>>>>>>RM inside of MAKER uses only the RepBase repeat library and the
>>>>>>RepeatRunner protein database.
>>>>>>
>>>>>>Anh-Dao
>>>>>>
>>>>>>
>>>>>>On 7/16/14 2:36 PM, "Carson Holt" <carsonhh at gmail.com> wrote:
>>>>>>
>>>>>>>When you ran Augustus separately, it should have created the parameters
>>>>>>>needed to run it. Now you should be able to run it inside of MAKER using
>>>>>>>the species name you just created.
>>>>>>>
>>>>>>>I'd also recommend letting MAKER run RepeatMasker for you rather than
>>>>>>>giving it the results as GFF3.
>>>>>>>
>>>>>>>--Carson
>>>>>>>
>>>>>>>
>>>>>>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]"
>>>>>>><nguyenan at mail.nih.gov> wrote:
>>>>>>>
>>>>>>>>Thanks Daniel for your quick response.
>>>>>>>>
>>>>>>>>I did not use another organism's parameter file when running
>>>>>>>>Augustus. I created the parameter file for this genome following the
>>>>>>>>Augustus instructions. There were multiple steps to train and run
>>>>>>>>Augustus (creating gene structures for training AUGUSTUS with CEGMA,
>>>>>>>>after which the parameter file is created; creating hints for AUGUSTUS
>>>>>>>>from EST/cDNA sequences; incorporating Illumina RNA-seq into AUGUSTUS
>>>>>>>>with GSNAP; etc.).
>>>>>>>>As I mentioned, I ran Augustus separately because Augustus has not been
>>>>>>>>trained for this genome (no parameter file exists). Otherwise I would
>>>>>>>>run Augustus inside MAKER.
>>>>>>>>
>>>>>>>>You suggested using the rm_gff option to specify the RepeatMasker output
>>>>>>>>(of course I will convert it to GFF3-formatted files). Can I submit two
>>>>>>>>RM GFF3 files, separated by a comma?
>>>>>>>>
>>>>>>>>Anh-Dao
>>>>>>>>
>>>>>>>>
>>>>>>>>On 7/16/14 2:13 PM, "Daniel Ence" <dence at genetics.utah.edu> wrote:
>>>>>>>>
>>>>>>>>>Hi Anh-Dao,
>>>>>>>>>
>>>>>>>>>In the maker_opts.ctl file, there are options for EST and protein
>>>>>>>>>evidence. You'll put all of your FASTA EST files together in a
>>>>>>>>>comma-separated list in the "est" option, and all of your FASTA protein
>>>>>>>>>files in a comma-separated list in the "protein" option.
>>>>>>>>>
>>>>>>>>>You'll specify the SNAP and GeneMark files in their respective options
>>>>>>>>>in the control file and pass the Augustus and FGENESH predictions in the
>>>>>>>>>"pred_gff" option.
>>>>>>>>>
>>>>>>>>>If you have the RepeatMasker output in GFF3 format, you can give it to
>>>>>>>>>MAKER with the "rm_gff" option.
>>>>>>>>>
>>>>>>>>>If you've converted the Cufflinks output to GFF3, you can give it to
>>>>>>>>>MAKER with the "est_gff" option. I'm pretty sure Trinity only gives
>>>>>>>>>FASTA output, so you would put that in the "est" option, along with all
>>>>>>>>>the other EST FASTA files.
>>>>>>>>>
>>>>>>>>>If Augustus isn't trained for your particular organism, you can use
>>>>>>>>>another organism that Augustus is already trained for. The list of
>>>>>>>>>species that Augustus has parameter files for is in the README.txt that
>>>>>>>>>came with Augustus. I really recommend that you run Augustus from inside
>>>>>>>>>MAKER, because then you get all the benefits of MAKER passing EST-based
>>>>>>>>>hints to Augustus at runtime, which can really improve Augustus'
>>>>>>>>>predictive ability.
>>>>>>>>>
>>>>>>>>>When you ran the Augustus gene prediction separately, did you use
>>>>>>>>>another organism's parameter file?
>>>>>>>>>
>>>>>>>>>Thanks,
>>>>>>>>>Daniel
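Daniel's description of the evidence options can be summarized as the control-file lines they produce. The Python sketch below prints one option=value line per maker_opts.ctl option, joining several files with commas as Daniel describes for est= and protein=. All file names are hypothetical placeholders for the data listed in the original question, and giving pred_gff= two comma-separated files is an assumption that the thread implies but does not show.

```python
def ctl_line(option, paths):
    """Format one maker_opts.ctl entry, joining several files with commas."""
    return f"{option}={','.join(paths)}"


# Hypothetical file names standing in for the data listed in this thread.
# Giving pred_gff two comma-separated files is an assumption.
evidence = {
    "est": ["ests_and_race.fasta", "trinity.fasta"],
    "protein": ["proteins.fasta", "related_proteins.fasta"],
    "est_gff": ["cufflinks.gff3"],
    "rm_gff": ["repeatmasker.gff3"],
    "snaphmm": ["snap.hmm"],
    "gmhmm": ["es.mod"],
    "pred_gff": ["augustus.gff3", "fgenesh.gff3"],
}

for option, paths in evidence.items():
    print(ctl_line(option, paths))
```

Each printed line (for example est=ests_and_race.fasta,trinity.fasta) would replace the matching line in maker_opts.ctl. Following Carson's later advice, rm_gff= could instead be left empty and model_org= and rmlib= set so that MAKER runs RepeatMasker itself.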
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C]
>>>>>>>>><nguyenan at mail.nih.gov> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I would like to conduct a genome annotation and have the following
>>>>>>>>>> data:
>>>>>>>>>> - Two separate RepeatMasker outputs (using the -lib and -species
>>>>>>>>>>   options)
>>>>>>>>>> - ESTs and RACE (FASTA)
>>>>>>>>>> - proteins (FASTA)
>>>>>>>>>> - proteins of related organisms (FASTA)
>>>>>>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to convert to
>>>>>>>>>>   ZFF format, etc.)
>>>>>>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl)
>>>>>>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to convert
>>>>>>>>>>   the outputs to .gff3 files. I ran Augustus gene prediction
>>>>>>>>>>   separately because the genome has never been trained for Augustus.
>>>>>>>>>> - Cufflinks and Trinity from RNA-Seq
>>>>>>>>>>
>>>>>>>>>> Could you please let me know how I can specify parameters in the
>>>>>>>>>> maker_opts.ctl file?
>>>>>>>>>> Or do you have other suggestions for redoing the analyses listed
>>>>>>>>>> above?
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>> Anh-Dao
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> maker-devel mailing list
>>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>------------------------------
>
>End of maker-devel Digest, Vol 74, Issue 17
>*******************************************