[maker-devel] question on gene numbers with quality_filter.pl

Michael Campbell michael.s.campbell1 at gmail.com
Mon Oct 2 07:30:43 MDT 2017


Hi Chris,

This is interesting. The -d option in quality_filter.pl should filter genes based on AED only. Is there a chance that you counted transcripts instead of genes? If a transcript has an AED of 1, quality_filter.pl should remove it but leave the gene and any of its transcripts with AEDs less than 1. I can have a look if you send me one of the genes (in GFF3 format) that was filtered out by quality_filter.pl even though it had an AED less than 1.
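
A quick way to check whether genes or transcripts were counted is to tally both feature types in each GFF3 file. The Python below is only a rough sketch, not part of MAKER or quality_filter.pl: the function name summarize_gff3 is made up for illustration, and it assumes that AED is stored as an _AED=<value> attribute on mRNA features in column 9.

```python
import re
import sys
from collections import Counter

# Assumption: AED is stored on mRNA features as "_AED=<number>" in column 9.
AED_RE = re.compile(r"(?:^|;)_AED=(\d+(?:\.\d+)?)")


def summarize_gff3(lines, max_aed=1.0):
    """Count gene features, mRNA features, and mRNAs with AED below max_aed."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        feature_type, attributes = cols[2], cols[8]
        if feature_type == "gene":
            counts["gene"] += 1
        elif feature_type == "mRNA":
            counts["mRNA"] += 1
            match = AED_RE.search(attributes)
            if match and float(match.group(1)) < max_aed:
                counts["mRNA_below_max_aed"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_features.py filtered.gff
    with open(sys.argv[1]) as handle:
        print(dict(summarize_gff3(handle)))
```

Running this on the GFF3 from before and after filtering and comparing the gene and mRNA tallies should show whether the drop is in genes or only in transcripts.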

Thanks,
Mike


> On Sep 29, 2017, at 1:20 PM, Willett, Christopher S <willett4 at email.unc.edu> wrote:
> 
> Hello-
> 
> We are getting to the final stages (hopefully) of a reannotation of a new assembly of a copepod genome using MAKER, and we had some questions about which set of genes to use. Our latest runs used Pfam domains to define the default vs. standard sets with the quality_filter.pl script, and I had a question about the stringency of the filters in this script. It appears that the default set is more stringent than the output that we get from MAKER without using this script (all with AED max set to 1). Are there additional filters in this script beyond AED that would cause this?
> 
> Here is what we are seeing, if more details would be helpful. Our final MAKER run gives ~21500 predicted genes with keep_pred turned on and 15200 with it turned off. What I was wondering about is why this 15200 is higher than the default set (which gives ~14500 genes) after we filter the gff using the -d setting in quality_filter.pl. For completeness, the standard set (-s setting) retains ~14800 genes, and if I filter the 15200 gff file with the default parameters, that yields ~14100 genes. So I was curious what else is going on in the filter script beyond AED that would trim out genes.
> 
> The gene sets look pretty good overall and seem like reasonable numbers, so we were debating which set to use as our final set. I am also trying a few other analyses in InterProScan to see if that identifies additional genes beyond Pfam for retention, but that seems somewhat independent of the question above.
> 
> Thanks for your help,
> 
> Best,
> 
> Chris Willett
> 
>   
> 
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Research Associate Professor
> Department of Biology
> CB#3280 Coker Hall
> University of North Carolina, Chapel Hill
> Chapel Hill, NC, 27599-3280 
> 
> Office: 2252 Genome Science Building
> phone: 919-843-8663
> fax: 919-962-1625
> 
> http://labs.bio.unc.edu/Willett/
