[maker-devel] clarification on creating a standard build
Carson Holt
carsonhh at gmail.com
Fri Mar 23 11:20:22 MDT 2018
You will get the ID from the InterProScan report. Then you can take that ID and grep for it in the MAKER GFF3.
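As a rough sketch of the first step, the snippet below collects the protein IDs that carry at least one InterPro accession. It assumes the report is InterProScan's tab-separated (TSV) output with the InterPro accession in column 12, which is only filled in when InterPro lookup is enabled; the function name is made up for illustration.

```python
def ids_with_ipr(tsv_lines):
    """Return the set of protein IDs with at least one InterPro (IPR) accession.

    Assumes InterProScan TSV layout: column 1 is the protein ID and
    column 12 is the InterPro accession (absent or '-' when there is none).
    """
    hits = set()
    for line in tsv_lines:
        cols = line.rstrip("\n").split("\t")
        # Rows without an InterPro annotation are either short or hold '-'.
        if len(cols) > 11 and cols[11].startswith("IPR"):
            hits.add(cols[0])
    return hits
```

With a real report this could be called as `ids_with_ipr(open("report.tsv"))`, where `report.tsv` is a placeholder file name.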
All ab initio predictions have match/match_part features created for them in the GFF3. You can then take those non-gene match/match_part features and provide them to pred_gff.
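The extraction of those features could look something like the sketch below, as an alternative to a plain grep. It assumes that each wanted ID appears as the ID= or Name= attribute of a match feature and that the match_part children point to it through Parent=; check this against your own GFF3 before relying on it. The function name is hypothetical.

```python
def select_prediction_features(gff_lines, wanted_ids):
    """Return match/match_part lines belonging to the wanted predictions."""
    def attrs(col9):
        # Parse 'key=value;key=value' from the GFF3 attributes column.
        return dict(kv.split("=", 1) for kv in col9.strip().split(";") if "=" in kv)

    rows = [l.rstrip("\n").split("\t") for l in gff_lines
            if l.strip() and not l.startswith("#")]
    rows = [r for r in rows if len(r) == 9]  # skips any FASTA section

    # First pass: find the match features named after a wanted ID.
    parents = set()
    for r in rows:
        if r[2] == "match":
            a = attrs(r[8])
            if "ID" in a and (a["ID"] in wanted_ids or a.get("Name") in wanted_ids):
                parents.add(a["ID"])

    # Second pass: keep those match features and their match_part children.
    kept = []
    for r in rows:
        a = attrs(r[8])
        if (r[2] == "match" and a.get("ID") in parents) or \
           (r[2] == "match_part" and a.get("Parent") in parents):
            kept.append("\t".join(r))
    return kept
```

The returned lines can be written to a new file (with a `##gff-version 3` header line) and that file given to pred_gff.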
You then have two alternative ways to get those models into your dataset.
1. Do a second run with only that pred_gff (i.e. turn off all other MAKER options and blank out all evidence including repeat masking options) and set keep_preds=1.
That will simply take the pred_gff match/match_part features and turn them into nicely formatted gene/mRNA/exon/CDS features, together with the associated fasta files. Those can then simply be merged into your current result using gff3_merge.
2. Provide maker_gff (set pred_pass=0 and all other pass options to 1), provide pred_gff, and set keep_preds=1.
This is the same as the previous option, but MAKER will do the merging for you. It will take longer, though, because it uses the maker_gff to rebuild all models and evidence in memory and rescores everything.
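To make the two options concrete, here is a small sketch that rewrites key=value lines of a maker_opts.ctl file. The option names are ones found in a typical maker_opts.ctl, but the list of evidence keys to blank is not exhaustive and should be checked against the control file of your MAKER version. The file names and the helper function are placeholders.

```python
# Option 1: pred_gff only, everything else blanked, keep_preds=1.
OPTION_1 = {
    "pred_gff": "rescued_preds.gff",  # placeholder file name
    "keep_preds": 1,
    # Blank out evidence and repeat masking (partial list, an assumption).
    "est": "", "altest": "", "protein": "",
    "model_org": "", "rmlib": "", "repeat_protein": "",
    "snaphmm": "", "gmhmm": "", "augustus_species": "",
}

# Option 2: pass the earlier run through maker_gff and let MAKER merge.
OPTION_2 = {
    "maker_gff": "current_run.all.gff",  # placeholder file name
    "pred_gff": "rescued_preds.gff",     # placeholder file name
    "keep_preds": 1,
    "pred_pass": 0,
    "est_pass": 1, "altest_pass": 1, "protein_pass": 1,
    "rm_pass": 1, "model_pass": 1, "other_pass": 1,
}

def apply_overrides(ctl_lines, overrides):
    """Return control-file lines with the given keys set to new values."""
    out = []
    for line in ctl_lines:
        line = line.rstrip("\n")
        body, sep, comment = line.partition("#")
        key, eq, _ = body.partition("=")
        key = key.strip()
        if eq and key in overrides:
            new = f"{key}={overrides[key]}"
            # Keep the trailing '#...' description if the line had one.
            line = f"{new} {sep}{comment}" if sep else new
        out.append(line)
    return out
```

For example, `apply_overrides(open("maker_opts.ctl"), OPTION_1)` returns the edited lines, which can be written to a separate control file for the second run.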
—Carson
> On Mar 20, 2018, at 6:48 PM, Valerie Soza <vsoza at uw.edu> wrote:
>
> Hi MAKER community
>
> I am trying to create a standard build as indicated in the Campbell et al. 2014 papers in Plant Physiology and Current Protocols in Bioinformatics. I was following the protocol as outlined in Current Protocols in Bioinformatics, but then came across this thread in the MAKER google forum: https://groups.google.com/forum/#!searchin/maker-devel/quality_filter%7Csort:date/maker-devel/97aNJkT3bgk/mpL7V5QWAAAJ.
>
> I can’t reply to this original thread, but I am trying to follow Carson’s suggestion for a standard build using this protocol instead now:
> "One note I’d like to make, is that doing a second round with keep_preds=1 is the wrong procedure (only do that if you really want to keep everything - i.e. in some fungi or oomycetes). Rather you should use InterProScan to evaluate the rejected models in the non-overlapping.abinit.proteins.fasta file, then grep the ones that have an IPR domain out of the GFF3 (will be match/match_part features) and then pass them to pred_gff in a separate run (just updates the format to gene/mRNA/exon/CDS with proper reading frame). You can then merge the resulting GFF3's and fasta files."
>
> Instead of doing a second round of annotations with keep_preds=1, I am using my original annotations with keep_preds=0. I have used InterProScan on the non-overlapping.abinit.proteins.fasta. I am unclear as to what gff3 file to use to grep for genes with IPR domains from the non-overlapping.abinit.proteins.fasta file. Genes from the non-overlapping.abinit.proteins.fasta file are not in my .all.gff file created by the gff3_merge script.
>
> What gff3 file should I be using to resurrect proteins with IPR domains from the non-overlapping.abinit.proteins.fasta? Should I be doing an annotation with keep_preds=1 as well, and resurrecting genes with IPR domains from this gff3?
>
> Thanks.
>
> -Valerie
>
> Valerie Soza, Ph.D.
> c/o Hall Lab
> Department of Biology
> University of Washington
> Johnson Hall 202A
> Box 351800
> Seattle, WA 98195-1800
> 206-543-6740
> http://staff.washington.edu/vsoza/