[maker-devel] Fewer gene models output with a superset of EST evidence
Bob Zimmermann
robert.zimmermann at univie.ac.at
Thu Oct 19 09:28:17 MDT 2017
Correction to the numbers quoted below: the median transcript lengths are 1414 and 1256, not 1619 and 1450.
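
For anyone who wants to recompute summary figures like these (transcript count, median length, and the quartiles behind an IQR), here is a minimal sketch. It assumes the annotations are in standard GFF3 with `exon` features that carry a `Parent=` attribute, and it takes transcript length to be the sum of exon lengths. The function names and the file name in the usage comment are hypothetical and not part of MAKER.

```python
import statistics
from collections import defaultdict


def transcript_lengths(gff3_lines):
    """Sum exon lengths per parent transcript from an iterable of GFF3 lines.

    Assumption: exon features carry a Parent=<transcript id> attribute.
    Lines without exactly 9 tab-separated columns (comments, blank lines,
    an embedded FASTA section) are skipped.
    """
    lengths = defaultdict(int)
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # An exon shared by several transcripts lists them comma-separated.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                lengths[parent] += end - start + 1  # GFF3 coordinates are 1-based, inclusive
    return dict(lengths)


def summarize(lengths):
    """Return count, median, and first/third quartiles of a list of lengths."""
    values = sorted(lengths)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"n": len(values), "median": statistics.median(values), "q1": q1, "q3": q3}


# Usage (hypothetical file name):
#   with open("single_set.gff3") as handle:
#       print(summarize(list(transcript_lengths(handle).values())))
```

Running the same summary on both runs' output makes the two sets of figures directly comparable. Note that the quartile values depend on the interpolation method, so they may differ slightly from those reported by other tools.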
> On 19 Oct 2017, at 17:25, Bob Zimmermann <robert.zimmermann at univie.ac.at> wrote:
>
> Hi Maker Developers,
>
> I have been experimenting with several data sets as input to annotate our newly reassembled genome. We have three RNA-seq datasets, each assembled into de novo transcripts using Trinity. These are input into the maker pipeline along with protein evidence. What is strange is that when I run maker with the de novo transcripts from a single set, I obtain more maker transcripts than when I run with the combined set (1619 vs 1450 on one chromosome), and they are longer (median transcript length 1619 vs 1450, IQR 872-2160 vs 667-2026). It would make sense for the single-set run to yield more but shorter transcripts if the additional evidence were joining transcripts, but the lengths indicate that this is not the case.
>
> Therefore I’m trying to understand the algorithm. From what I understand, if it finds evidence that agrees with the internal splice junctions of an ab initio prediction, then that prediction is considered for improvement. Why, then, if my combined set is a strict superset of the single set, do I get more transcripts with the single set?
>
> Thanks for your help!
>
> Best,
> Bob
>
> —
>
> Department of Molecular Evolution and Development
> Universität Wien
> Althanstraße 14 (UZA I), Zimmer 2.019
> 1090 Vienna
> Austria
>
> +43 1 427757002
>