[maker-devel] InterProScan protein domain & AED physical evidence filtering
Carson Holt
carsonhh at gmail.com
Tue Nov 1 09:43:21 MDT 2016
One note I'd like to make is that doing a second round with keep_preds=1 is the wrong procedure (only do that if you really want to keep everything, e.g. in some fungi or oomycetes). Instead, you should use InterProScan to evaluate the rejected models in the non-overlapping.abinit.proteins.fasta file, then grep the ones that have an IPR domain out of the GFF3 (they will be match/match_part features) and pass them to pred_gff in a separate run (this just updates the format to gene/mRNA/exon/CDS with the proper reading frame). You can then merge the resulting GFF3 and FASTA files.
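A minimal Python sketch of the selection step described above. It assumes InterProScan 5 TSV output (protein ID in column 1, InterPro accession in column 12), and it assumes the protein IDs in non-overlapping.abinit.proteins.fasta match the ID of the corresponding match feature in the GFF3, with match_part features pointing to it through Parent. The function names are hypothetical; this is not a MAKER accessory script.

```python
from urllib.parse import unquote


def ipr_positive_ids(tsv_lines):
    """Return protein IDs that have at least one InterPro (IPR) accession.

    Assumes InterProScan 5 TSV layout: column 1 is the protein ID and
    column 12 is the InterPro accession ("-" or missing when there is none).
    """
    ids = set()
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 12 and fields[11].startswith("IPR"):
            ids.add(fields[0])
    return ids


def parse_attrs(col9):
    """Parse a GFF3 column-9 string into a dict of decoded values."""
    attrs = {}
    for item in col9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = unquote(value)
    return attrs


def select_matches(gff_lines, keep_ids):
    """Yield match/match_part lines whose model ID is in keep_ids.

    A match is identified by its own ID; a match_part by its Parent(s).
    """
    for line in gff_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 9:
            continue  # skip comments, directives and FASTA lines
        ftype, attrs = fields[2], parse_attrs(fields[8])
        if ftype == "match":
            owners = {attrs.get("ID")}
        elif ftype == "match_part":
            owners = set(attrs.get("Parent", "").split(","))
        else:
            continue
        if owners & keep_ids:
            yield line
```

Something like `keep = ipr_positive_ids(open("iprscan.tsv"))`, then writing the lines from `select_matches(open("rejected.gff3"), keep)` to a new file, would give the input for the separate pred_gff run; the file names are placeholders.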
The reason there are differences between the runs is that some models with AED less than 1 get rejected for other reasons, and keep_preds=1 brings them back. For example, if the only evidence is a protein alignment with deeply overlapping HSPs (an extremely low-complexity alignment), the model will be filtered out even though its AED is not technically equal to 1. Also, if the overlapping protein evidence is in a different reading frame than the model it is supposed to support, the AED will be less than 1 but the eAED (extended AED) will be 1, and the model will be rejected.
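To make the AED/eAED distinction concrete, here is a simplified Python illustration, not MAKER's own code. It uses the AED definition 1 - (SN + SP)/2 computed over overlapping bases, and approximates a frame-aware (eAED-like) score by counting only overlapping bases that sit at the same codon position in both model and evidence. The segment tuples, plus-strand-only coordinates, and function names are hypothetical simplifications.

```python
def coding_positions(segments):
    """Map each genomic position to its codon position (0, 1 or 2).

    Each segment is (start, end, codon_pos_of_start): 1-based, inclusive,
    plus strand only (a simplification).
    """
    positions = {}
    for start, end, first in segments:
        for pos in range(start, end + 1):
            positions[pos] = (first + pos - start) % 3
    return positions


def aed(model, evidence, frame_aware=False):
    """AED = 1 - (SN + SP) / 2 over overlapping bases.

    With frame_aware=True only bases at the same codon position in model
    and evidence count as overlap (a rough stand-in for eAED).
    """
    m = coding_positions(model)
    e = coding_positions(evidence)
    if not m or not e:
        return 1.0
    if frame_aware:
        overlap = sum(1 for p in m if p in e and m[p] == e[p])
    else:
        overlap = sum(1 for p in m if p in e)
    sn = overlap / len(e)  # fraction of evidence covered by the model
    sp = overlap / len(m)  # fraction of the model covered by evidence
    return 1 - (sn + sp) / 2
```

With a 300 bp model `[(1, 300, 0)]` and a 150 bp protein match `[(101, 250, 2)]` that lies inside it but in a different frame, `aed(model, evidence)` gives 0.25 while `aed(model, evidence, frame_aware=True)` gives 1.0, which mirrors the rejected case described above.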
—Carson
>> Hello MAKER google group,
>>
>> For the final round of a MAKER annotation for a de novo plant genome assembly, I ran MAKER twice: once with keep_preds=0, which annotated 20,284 genes, and once with keep_preds=1, which annotated 34,055 genes.
>>
>> I ran the 34,055 genes (the keep_preds=1 set) through InterProScan to search the MAKER predictions for protein domain content, and added the InterProScan output to the MAKER GFF file with the ipr_update_gff accessory script.
>>
>> The game plan is to go through the 34,055 genes and remove any gene model that has neither protein domain content nor physical evidence. I am counting genes with AED=1 as the genes that don't have physical evidence.
>>
>> I have two questions:
>>
>> 1. I count 11,762 genes that have AED=1.0 in the keep_preds=1 annotation set, which leaves 22,293 genes that I'm assuming have some physical evidence (34,055 - 11,762 = 22,293). But when I originally ran MAKER with keep_preds=0, I only counted 20,284 genes. What are the extra ~2,000 genes (22,293 - 20,284 = 2,009) that are annotated in the keep_preds=1 run with an AED score below 1.0 but are not annotated in the keep_preds=0 run?
>>
>> 2. My second question is whether there is an accessory script available that will remove genes that have neither InterProScan protein domains nor physical evidence (AED < 1). This type of gene removal was mentioned in a previous post from 2012 (https://groups.google.com/forum/#!searchin/maker-devel/sorry$20there$27s$20not$20a$20script$20prepackaged$20with$20MAKER$20for$20that$20yet.%7Csort:relevance/maker-devel/VaoXWlGHOjs/EElr_otrK8QJ), and I was wondering whether someone has since written a script that will do this for me.
>>
>> If anyone could offer me any feedback, that would be greatly appreciated!
>>
>> Thank you,
>>
>> Allison
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
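For question 2, a minimal Python sketch of the filter Allison describes: keep an mRNA if its AED is below 1 or it carries an InterPro cross-reference, and keep a gene if any of its mRNAs is kept. It assumes a GFF3 holding only MAKER gene models, with MAKER's `_AED` attribute on mRNA lines and InterPro hits written by ipr_update_gff as `Dbxref` values starting with `InterPro:`; check your own file for the exact attribute names. The function names are hypothetical and this is not a script shipped with MAKER.

```python
from urllib.parse import unquote


def parse_attrs(col9):
    """Parse a GFF3 column-9 string into a dict of decoded values."""
    attrs = {}
    for item in col9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = unquote(value)
    return attrs


def split_ids(value):
    """Split a comma-separated Parent/ID value into a set, dropping blanks."""
    return {v for v in value.split(",") if v}


def mrna_passes(attrs, max_aed=1.0):
    """True if AED < max_aed (evidence support) or an InterPro Dbxref is present."""
    has_evidence = float(attrs.get("_AED", "1")) < max_aed
    dbxrefs = attrs.get("Dbxref", "").split(",")
    has_ipr = any(x.strip().startswith("InterPro:") for x in dbxrefs)
    return has_evidence or has_ipr


def filter_genes(gff_lines):
    """Return the GFF3 feature lines of models with evidence or a domain."""
    rows = []
    for line in gff_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 9:
            continue  # skip comments, directives and FASTA lines
        rows.append((line, fields[2], parse_attrs(fields[8])))

    # mRNAs that pass the filter, and the genes they belong to
    keep_mrnas = {a["ID"] for _, t, a in rows
                  if t == "mRNA" and "ID" in a and mrna_passes(a)}
    keep_genes = set()
    for _, t, a in rows:
        if t == "mRNA" and a.get("ID") in keep_mrnas:
            keep_genes |= split_ids(a.get("Parent", ""))

    kept = []
    for line, t, a in rows:
        if t == "gene":
            keep = a.get("ID") in keep_genes
        elif t == "mRNA":
            keep = a.get("ID") in keep_mrnas
        else:  # exon, CDS, UTRs: keep if any parent mRNA was kept
            keep = bool(split_ids(a.get("Parent", "")) & keep_mrnas)
        if keep:
            kept.append(line)
    return kept
```

Calling `filter_genes(open("maker_ipr.gff3"))` (a placeholder file name) returns the surviving lines; lowering `max_aed` in `mrna_passes` gives a stricter evidence cutoff. Exons shared with a dropped isoform keep their original Parent list, which may need cleanup in multi-isoform annotations.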