[maker-devel] Maker2 gff file output

Carson Holt carsonhh at gmail.com
Tue Mar 19 20:43:44 MDT 2013


>I am currently running MAKER2 on a new algal genome and am running
>into a couple of problems that I would like your input on. The genome
>size is ~60Mb and it is currently in ~3100 contigs.
>First, I am having trouble doing multiple iterations of HMM training
>with SNAP because I have so many GFF output files in the
>datastore (1 for each contig in my draft genome), not just the single
>GFF output that appears in the examples and tutorials I have
>followed thus far. Is there a way to combine all of my GFF files
>so I can use them for SNAP HMM training or re-annotation?

Use the gff3_merge script in the .../maker/bin/ directory.
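gff3_merge is normally pointed at the master datastore index log that MAKER writes (for example `gff3_merge -d <name>_master_datastore_index.log`), and that is the supported route. To illustrate what the merge involves, here is a minimal Python sketch, not gff3_merge's actual implementation. It assumes the per-contig files end in `.gff` somewhere under the datastore directory, keeps a single `##gff-version 3` header, and moves every `##FASTA` sequence section to the end, since GFF3 only allows sequence data after all features. The function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def merge_gff3_texts(texts: Iterable[str]) -> str:
    """Merge the contents of several GFF3 files into one GFF3 document.

    Keeps one ##gff-version header, keeps feature and comment lines in
    input order, and collects every ##FASTA section into a single block
    at the end (GFF3 permits sequence data only after the features).
    """
    body: List[str] = []
    fasta: List[str] = []
    for text in texts:
        in_fasta = False
        for line in text.splitlines():
            if line.startswith("##FASTA"):
                in_fasta = True          # the rest of this file is sequence data
            elif in_fasta:
                fasta.append(line)
            elif line.startswith("##gff-version") or not line.strip():
                continue                 # drop per-file headers and blank lines
            else:
                body.append(line)        # features, comments, other directives
    lines = ["##gff-version 3"] + body
    if fasta:
        lines += ["##FASTA"] + fasta
    return "\n".join(lines) + "\n"


def merge_datastore(datastore: Path, out_path: Path) -> int:
    """Merge every *.gff file under `datastore` into `out_path`.

    Returns the number of files merged. The "*.gff" naming pattern is an
    assumption about how the per-contig files are named.
    """
    paths = sorted(datastore.rglob("*.gff"))
    merged = merge_gff3_texts(p.read_text() for p in paths)
    out_path.write_text(merged)
    return len(paths)
```

For example, `merge_datastore(Path("genome.maker.output"), Path("genome.all.gff"))`, where both paths are placeholders. Write the merged file outside the datastore so a rerun does not pick it up as input.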


>
>Second, using multiple lines of evidence (Augustus, GeneMark-ES, RNA-seq
>data, and COGs based on homology searches), I am having a hard time
>getting many MAKER gene calls. The calling seems too
>stringent in many cases. When I view the output of many contigs in
>Apollo, there are many times where 3 or 4 models show nearly
>identical gene structure, but the final MAKER output does not contain
>that gene call. Do you have any suggestions on how to lower the
>stringency of the MAKER output so that more genes will be called? In
>some cases I am getting fewer than 3000 gene calls in the final output,
>whereas an Augustus model trained on Chlamydomonas returns ~15000.

I agree with Mark. You may want to set single_exon=1 to accept single-exon
evidence, try increasing the depth of your protein evidence file as
well, or, if the genome is relatively gene dense, set keep_preds=1. On
some gene-dense genomes (fungi, for example) ab initio predictors
don't have a very high false positive rate, so this can be safe. However,
on more complex genomes doing so can produce more false positives than
there are real genes.
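Both options are set in MAKER's options control file, maker_opts.ctl (the file `maker -CTL` generates), as `key=value` lines that usually carry a trailing `#` comment. The Python sketch below is a hypothetical helper, not part of MAKER. It assumes that line format, changes only the keys you pass, keeps the trailing comments, and raises an error for any key it cannot find, so a misspelled option is not silently ignored.

```python
import re
from typing import Dict, List, Set

# Matches "key=value" with an optional trailing "#comment" (assumed ctl format).
_OPT = re.compile(r"^(\s*)([A-Za-z_]\w*)=([^#]*?)(\s*#.*)?$")


def set_ctl_options(text: str, updates: Dict[str, str]) -> str:
    """Return `text` with the given key=value options replaced.

    Comment lines and trailing comments are preserved. Raises KeyError if
    any key in `updates` does not appear in the file.
    """
    seen: Set[str] = set()
    out: List[str] = []
    for line in text.splitlines():
        m = _OPT.match(line)
        if m and m.group(2) in updates:
            indent, key, _old_value, comment = m.groups()
            line = f"{indent}{key}={updates[key]}{comment or ''}"
            seen.add(key)
        out.append(line)
    missing = set(updates) - seen
    if missing:
        raise KeyError(f"options not found: {sorted(missing)}")
    return "\n".join(out) + "\n"
```

For instance, pass the file's contents with `{"single_exon": "1"}` and write the result back; add `"keep_preds": "1"` only if the gene-density caveat above applies to your genome.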


Thanks,
Carson




On 13-03-19 8:02 PM, "Mark Yandell" <myandell at genetics.utah.edu> wrote:

>Hi Blake,
>
>I've forwarded this on to the maker-devel list; they can help you more
>there.
>
>Regarding your comment 'When I view the output of many contigs in
>Apollo, there are many times where 3 or 4 models show close to identical
>gene structure, but the final maker output does not contain that gene
>call': those calls are in the output files, but they are in a
>different multi-FASTA file, as non-overlapping ab initio models.
>Another option is to set the config flag that allows MAKER to use unspliced EST
>and RNA-seq alignments as evidence.
>
>cheers,
>
>--mark
>
>
>Mark Yandell
>Professor of Human Genetics
>H.A. & Edna Benning Presidential Endowed Chair
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>ph:801-587-7707
>
>________________________________________
>From: Blake Hovde [hovdebt at uw.edu]
>Sent: Tuesday, March 19, 2013 2:35 PM
>To: Mark Yandell
>Subject: Maker2 gff file output
>
>Hi Dr. Yandell,
>
>Thanks very much for your help!
>
>Sincerely,
>Blake Hovde
>Graduate Student
>Department of Genome Sciences
>University of Washington
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org





