[maker-devel] Curious pattern in AED distributions
Mark Yandell
myandell at genetics.utah.edu
Sun Apr 7 11:39:16 MDT 2019
Sorry. I’m dyslexic, especially early in the morning. Yes, the good stuff is on the left. As regards single-exon genes, that’s always a hard call, as these have a higher false-positive rate. One thing to consider is how prevalent introns are in your organism. Carson can give more advice on this point, I’m sure.

By ‘final build’, I meant: is this using the ‘Standard build’ or ‘Max Build’ protocol from PMC4286374?
From: Lior Glick <liorglck at gmail.com>
Date: Sunday, April 7, 2019 at 10:29 AM
To: Mark Yandell <myandell at genetics.utah.edu>
Cc: "liorglic at mail.tau.ac.il" <liorglic at mail.tau.ac.il>, "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Curious pattern in AED distributions
Dear Mark,
Thank you for the quick reply. I'm happy to see this ignites your interest and am willing to endure your punishing questions (;
Before I answer them, I just want to make sure we're on the same page - as far as I understand, lower AED scores indicate higher agreement with the evidence, so the "good stuff" is actually left of the 0.5 surge. Am I correct? Otherwise, this is a very poor annotation...
Now for the questions:
1) I have not applied any filtering so far, so single-exon genes are included as well. In fact, I'm exploring the results in order to develop some criteria for filtering the genes. Would you suggest discarding single-exon genes?
2) My evidence consists of assembled transcripts, proteins and predicted gene models (pred_gff).
3) As for repeats, I'm masking based on a repeat library obtained from a previous publication, specific to my organism of interest.
Unfortunately, I didn't understand your final question. Could you please explain what you mean by "final build"?
Hope these answers are helpful, and waiting to hear more thoughts.
Thanks again.
On Sun, Apr 7, 2019, 18:11 Mark Yandell <myandell at genetics.utah.edu> wrote:
Hi Lior,
Fun! The short answer is I don’t know. Obviously, the good stuff is on the right side of 0.5.
That said, I can think of a couple of things to look into to explain the left side of the graph. Are you allowing single-exon genes? Are you using RNA-seq data, protein, or both? What about repeat masking? Are you doing it? Do you have your own library?
My first guess would be low-complexity/repeat sequences generating more or less random blastx hits across the genome… Carson, what do you think?
And finally, what does the AED look like for the genes included in the final build?
Sorry for all the questions, Lior. That’s your punishment for asking an interesting one. 😉
--mark
From: maker-devel <maker-devel-bounces at yandell-lab.org> on behalf of Lior Glick <liorglic at mail.tau.ac.il>
Date: Sunday, April 7, 2019 at 7:26 AM
To: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: [maker-devel] Curious pattern in AED distributions
Hi MAKER users,
Lately I've been performing annotations for multiple genomes from the same species.
When plotting the histogram of AED scores over all genes, I repeatedly see a very specific pattern that looks something like this:
[AED_hist.png]
This pattern is a bit surprising to me, in two aspects:
1) Why is there a surge towards 0.5?
2) Why is there a sudden drop right after that surge?
Has anyone else seen this, or is this a specific outcome of my data/configuration?
Any ideas of what may cause such a distribution?
While this is not necessarily an indication of a problem or bug, it does seem a bit odd, and might imply some bias or artifact.
Would appreciate your comments.
Thank you!
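
A rough sketch of how the histogram discussed in this thread could be tabulated, with each transcript's exon count kept alongside its score so that single-exon models can be examined separately. It assumes the MAKER GFF3 output carries an `_AED=` attribute on `mRNA` features and that `exon` features point to their transcript through `Parent=`. The function names and the bin count of 20 are illustrative choices, not part of MAKER.

```python
import re
from collections import defaultdict

# Attribute patterns for column 9 of a GFF3 line. The leading (?:^|;)
# keeps "_AED=" from matching inside "_eAED=".
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")
ID_RE = re.compile(r"(?:^|;)ID=([^;]+)")
PARENT_RE = re.compile(r"(?:^|;)Parent=([^;]+)")


def read_aed_and_exons(gff_lines):
    """Return {mRNA_id: (aed, exon_count)} from an iterable of GFF3 lines."""
    aed = {}
    exons = defaultdict(int)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, attrs = cols[2], cols[8]
        if ftype == "mRNA":
            m_id, m_aed = ID_RE.search(attrs), AED_RE.search(attrs)
            if m_id and m_aed:
                aed[m_id.group(1)] = float(m_aed.group(1))
        elif ftype == "exon":
            m_parent = PARENT_RE.search(attrs)
            if m_parent:
                # An exon may list several parent transcripts.
                for parent in m_parent.group(1).split(","):
                    exons[parent] += 1
    return {tid: (score, exons[tid]) for tid, score in aed.items()}


def aed_histogram(scores, n_bins=20):
    """Count scores in n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for s in scores:
        # A score of exactly 1.0 is placed in the last bin.
        counts[min(int(s * n_bins), n_bins - 1)] += 1
    return counts
```

Calling `read_aed_and_exons(open("annotations.gff"))` (the file name is hypothetical) and passing the scores of transcripts with `exon_count == 1` and `exon_count > 1` to `aed_histogram` separately would show whether the surge near 0.5 is concentrated in single-exon models.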