[maker-devel] question on gene numbers with quality_filter.pl

Willett, Christopher S willett4 at email.unc.edu
Mon Oct 2 13:28:19 MDT 2017


Hi Mike-

Attached is the control file for the last run of MAKER with keep_preds=0, and here is an example of one mRNA with AED=1 that was retained in the GFF file:

Chromosome_6    maker   mRNA    556000  557215  .       +       .       ID=maker-Chromosome_6-exonerate_est2genome-gene-5.3-mRNA-1;Parent=maker-Chromosome_6-exonerate_est2genome-gene-5.3;Name=TCALIF_02833-PA;_AED=1.00;_eAED=1.00;_QI=15|0|0|0|1|1|2|75|338;score=100;Alias=TCALIF_02833-PA
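For readers who want to check a record like the one above, here is a minimal Python sketch that splits the ninth GFF3 column into key=value pairs and reads the _AED value. The helper name parse_gff3_attributes is hypothetical and is not part of MAKER; the sketch also assumes the columns are tab-separated, as GFF3 requires (the archive shows them as spaces).

```python
def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column into a dict (hypothetical helper, not a MAKER API)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


# The mRNA record quoted above, rebuilt with tab separators (assumption).
line = (
    "Chromosome_6\tmaker\tmRNA\t556000\t557215\t.\t+\t.\t"
    "ID=maker-Chromosome_6-exonerate_est2genome-gene-5.3-mRNA-1;"
    "Parent=maker-Chromosome_6-exonerate_est2genome-gene-5.3;"
    "Name=TCALIF_02833-PA;_AED=1.00;_eAED=1.00;"
    "_QI=15|0|0|0|1|1|2|75|338;score=100;Alias=TCALIF_02833-PA"
)

columns = line.split("\t")
attrs = parse_gff3_attributes(columns[8])
aed = float(attrs["_AED"])

# Report the feature type, its ID, and whether its AED is below 1.
print(columns[2], attrs["ID"], aed, "AED < 1:", aed < 1.0)
```

For this record the AED is not below 1, which is the case being asked about in this thread.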

Thanks,

Chris






On Oct 2, 2017, at 3:19 PM, Michael Campbell <michael.s.campbell1 at gmail.com> wrote:

Hi Chris,

Yeah, by default MAKER shouldn’t keep any annotation with an AED of 1. I’ve cc'd the dev list on this to see if anyone else has any idea why you might get AED 1 genes with keep_preds=0. Could you send me the maker_opts.ctl file for the run? There may be something informative in there.

Thanks,
Mike
On Oct 2, 2017, at 2:32 PM, Willett, Christopher S <willett4 at email.unc.edu> wrote:

Hi Mike-

I was looking at the lists of mRNAs, and I think what is happening is that genes with an AED of 1 are still retained in our initial output from MAKER and are then getting trimmed out of the filtered file. If I set the AED threshold to 1 in the control file for the MAKER run, does that retain genes with an AED less than 1, or less than or equal to 1? Should these AED=1 genes be making it into the gene and mRNA pools if we have the keep predictions parameter set to 0?

Thanks for your help,

Best,

Chris



On Oct 2, 2017, at 9:30 AM, Michael Campbell <michael.s.campbell1 at gmail.com> wrote:

Hi Chris,

This is interesting. -d in quality_filter.pl should only filter out genes based on AED. Is there a chance that you counted transcripts instead of genes? If there is a transcript with an AED of 1, then quality_filter.pl should remove it but leave the gene and the transcripts with AEDs less than 1. I can have a look at it if you send me one of the genes (in GFF3 format) that was filtered out by quality_filter.pl even though it had an AED less than 1.
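One way to check the transcript-versus-gene question is to tally both feature types separately. The following is a minimal Python sketch, not the logic of quality_filter.pl itself: it assumes that gene and mRNA records carry ID and Parent attributes, that each mRNA has an _AED attribute as in MAKER output, and that "AED less than 1" is the retention criterion described above. The function name summarize_aed is hypothetical.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def summarize_aed(gff_lines):
    """Count genes and mRNAs, and how many have an AED below 1 (illustrative only)."""
    genes = set()
    aed_by_gene = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature records
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] == "gene":
            genes.add(attrs.get("ID"))
        elif cols[2] == "mRNA" and "_AED" in attrs:
            aed_by_gene[attrs.get("Parent")].append(float(attrs["_AED"]))
    all_aeds = [a for aeds in aed_by_gene.values() for a in aeds]
    return {
        "genes": len(genes),
        "mrnas": len(all_aeds),
        "mrnas_aed_below_1": sum(1 for a in all_aeds if a < 1.0),
        "genes_with_mrna_aed_below_1": sum(
            1 for aeds in aed_by_gene.values() if any(a < 1.0 for a in aeds)
        ),
    }


# Toy records (made up for illustration, not taken from the real annotation).
demo = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1",
    "chr1\tmaker\tmRNA\t1\t100\t.\t+\t.\tID=g1-mRNA-1;Parent=g1;_AED=0.20",
    "chr1\tmaker\tmRNA\t1\t100\t.\t+\t.\tID=g1-mRNA-2;Parent=g1;_AED=1.00",
    "chr1\tmaker\tgene\t200\t300\t.\t+\t.\tID=g2",
    "chr1\tmaker\tmRNA\t200\t300\t.\t+\t.\tID=g2-mRNA-1;Parent=g2;_AED=1.00",
]
summary = summarize_aed(demo)
print(summary)
```

To run it on a real file, pass an open file handle, for example summarize_aed(open("annotations.gff")), where the file name is a placeholder. Comparing the gene and mRNA tallies before and after filtering should show whether the difference comes from transcripts or from whole genes.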

Thanks,
Mike


On Sep 29, 2017, at 1:20 PM, Willett, Christopher S <willett4 at email.unc.edu> wrote:

Hello-

We are getting to the final stages (hopefully) of a reannotation of a new assembly of a copepod genome using MAKER, and we had some questions about which set of genes to use. Our latest runs used Pfam domains to define the default vs. standard sets with the quality_filter.pl script, and I had a question about the stringency of the filters in this script. It appears that the default is more stringent than the output that we get from MAKER without using this script (all with AED max set to 1). Are there additional filters in this script beyond AED that would cause this?

Here is what we are seeing, if more details would be helpful. Our final MAKER run gives ~21500 predicted genes with keep_preds turned on and ~15200 with it turned off. What I was wondering about was why this 15200 is higher than the default set (which gives ~14500 genes) after we filter the GFF using the -d setting in quality_filter.pl. For completeness, the standard set (-s setting) retains ~14800 genes, and if I filter the 15200-gene GFF file with the default parameters, that yields ~14100 genes. So I was curious what else was going on in the filter script beyond AED that would trim out genes?

The gene sets look pretty good overall and seem like reasonable numbers, so we were debating which set to use as our final set. I am also trying a few other analyses in InterProScan to see if that identifies additional genes beyond Pfam for retention, but that seems a bit independent from the question above.

Thanks for your help,

Best,

Chris Willett




~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Research Associate Professor
Department of Biology
CB#3280 Coker Hall
University of North Carolina, Chapel Hill
Chapel Hill, NC, 27599-3280

Office: 2252 Genome Science Building
phone: 919-843-8663
fax: 919-962-1625


http://labs.bio.unc.edu/Willett/





-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl_full8
Type: application/octet-stream
Size: 5617 bytes
Desc: maker_opts.ctl_full8
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20171002/79f27ca5/attachment-0001.obj>

