[maker-devel] Fused gene problem, improvement in the Maker 2.27?
Carson Holt
carsonhh at gmail.com
Tue May 21 19:39:01 MDT 2013
One more time, but I fixed a few obvious spelling errors -->
1. To run MAKER, we use transcripts produced by the tophat+cufflinks
approach instead of de novo Trinity. Will it avoid the possible merging of
RNA-Seq reads?
No. Trinity would probably be a better approach to avoid merging.
2. If my understanding is correct, the "correct_est_fusion" parameter
needs to be turned off when we don't ask MAKER/prediction algorithms to
predict UTRs? Also, it makes me wonder: in that case, when MAKER has UTR
prediction turned off but our RNA-Seq data has long UTRs, will these UTRs
be added to the MAKER gene model?
MAKER will always try to add UTR if the EST evidence suggests it.
Technically it does a little more than that: it can also add missing exons
and extend CDS. The correct_est_fusion option just causes it to clip really
long UTR if it looks like it was added due to merged evidence and is
probably not really a contiguous part of the gene. The long UTRs that can
result from mRNA-seq are often false. You are basically expanding the UTR by
assembling into exons from the neighboring gene. This is especially common
in organisms like fungi, where the UTRs of neighboring genes often overlap
and mRNA-seq assemblies falsely make it look like one transcript encompasses
1, 2, or more gene loci (you lose the true UTR boundaries).
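To make the clipping idea concrete, here is a minimal Python sketch. It is not MAKER's code. The GeneModel class, the clip_utr_fusion function, and the clipping rule (stop each transcript just short of the neighbor's CDS) are all hypothetical, chosen only to illustrate the case where two neighboring genes overlap in UTR but not in CDS.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical gene model on the forward strand (1-based, inclusive)."""
    name: str
    start: int       # transcript start, including UTR
    end: int         # transcript end, including UTR
    cds_start: int   # first coding base
    cds_end: int     # last coding base


def clip_utr_fusion(left: GeneModel, right: GeneModel) -> Tuple[GeneModel, GeneModel]:
    """Trim UTR so neither transcript runs into the neighbor's CDS.

    Assumes `left` lies upstream of `right`. The rule used here is an
    illustration only; the real MAKER logic may differ.
    """
    if left.cds_end >= right.cds_start:
        # Coding regions overlap, so this is not a UTR-only overlap.
        raise ValueError("CDS regions overlap; UTR clipping does not apply")
    clipped_left = replace(left, end=min(left.end, right.cds_start - 1))
    clipped_right = replace(right, start=max(right.start, left.cds_end + 1))
    return clipped_left, clipped_right


# geneA's 3' UTR was extended by merged evidence deep into geneB's CDS.
gene_a = GeneModel("geneA", start=100, end=2500, cds_start=300, cds_end=900)
gene_b = GeneModel("geneB", start=1200, end=3000, cds_start=1500, cds_end=2800)
for gene in clip_utr_fusion(gene_a, gene_b):
    print(gene)
```

Both gene models survive, and only the suspiciously long UTR is shortened, which matches the trade-off described above: you keep two genes at the cost of some UTR annotation.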
--Carson
From: <Sean.Li at csiro.au>
Date: Tuesday, 21 May, 2013 9:23 PM
To: Carson Holt <carsonhh at gmail.com>, Barry Moore
<barry.moore at genetics.utah.edu>
Cc: <maker-devel at yandell-lab.org>
Subject: RE: [maker-devel] Fused gene problem, improvement in the Maker
2.27?
Thanks Barry and Carson for your detailed explanation. Now I have a better
understanding of "pred_flank".
1. To run MAKER, we use transcripts produced by the tophat+cufflinks
approach instead of de novo Trinity. Will it avoid the possible merging of
RNA-Seq reads?
2. If my understanding is correct, the "correct_est_fusion" parameter
needs to be turned off when we don't ask MAKER/prediction algorithms to
predict UTRs? Also, it makes me wonder: in that case, when MAKER has UTR
prediction turned off but our RNA-Seq data has long UTRs, will these UTRs
be added to the MAKER gene model?
Regards,
Sean
From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: Wednesday, 22 May 2013 10:59 AM
To: Barry Moore; Li, Sean (CMIS, Acton)
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker
2.27?
Yes. Barry gave a good overview. The correct_est_fusion option basically
clips UTR when there are two neighboring genes that only overlap in the UTR
(so you still get both gene models). Since the primary effect of falsely
merged mRNA-seq is overly long UTR, this tends to fix many cases. Of course,
avoiding merging the mRNA-seq reads in the first place also works. So using
Trinity's extra options to control that together with the correct_est_fusion
option in MAKER is probably the way to go.
I think you can lower pred_flank to 100, but below that you might start to
get weird behavior from the gene predictors (they need some upstream and
downstream sequence or the HMMs don't work well).
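As a rough sketch of applying the two settings discussed above, the Python snippet below rewrites key=value lines in a MAKER control file while keeping the trailing comments. The set_ctl_options helper is hypothetical, the sample text is a stand-in rather than a verbatim maker_opts.ctl, and the values (pred_flank=100, correct_est_fusion=1 meaning "on") are assumptions to check against your own MAKER version.

```python
import re


def set_ctl_options(ctl_text: str, updates: dict) -> str:
    """Return ctl_text with the given key=value options replaced.

    Lines look like `key=value #comment`; the comment is preserved.
    Lines whose key is not in `updates` are left untouched.
    """
    out = []
    for line in ctl_text.splitlines():
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if match and match.group(1) in updates:
            key = match.group(1)
            comment = match.group(3) or ""
            line = f"{key}={updates[key]} {comment}".rstrip()
        out.append(line)
    return "\n".join(out) + "\n"


# Stand-in for part of a maker_opts.ctl file (comments are illustrative).
sample = (
    "#-----MAKER Behavior Options\n"
    "pred_flank=200 #flank passed to the gene predictors\n"
    "correct_est_fusion=0 #clip UTR that looks like merged evidence\n"
)

# Assumed values: 100 is the lower bound suggested above; 1 switches the option on.
print(set_ctl_options(sample, {"pred_flank": 100, "correct_est_fusion": 1}))
```

Editing the file by hand works just as well; a script like this is mainly useful when the same changes have to be applied to several annotation runs.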
Thanks,
Carson
From: Barry Moore <barry.moore at genetics.utah.edu>
Date: Tuesday, 21 May, 2013 7:54 PM
To: <Sean.Li at csiro.au>
Cc: <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker
2.27?
Hi Sean,
I think you want to be careful with dropping the pred_flank parameter too
low. This controls how much flanking sequence (for a given cluster of
evidence) MAKER will pass to the gene predictor. Some (maybe all?) of the
gene predictors have an initial state in their HMM for intergenic sequence
and if you do not have some intergenic sequence for them to consider first
they can't transition to their next state. The correct_est_fusion option
can help (at the cost of losing some UTR annotations) - Carson will likely
give you a better description of the intricacies of the correct_est_fusion.
Don't know how you are assembling your RNASeq, but there is an option in
Trinity - I forget the name - that will instruct Trinity to be more
restrictive in merging neighboring clusters of reads into a longer
transcript and this can help as well.
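To picture what pred_flank controls, here is a small Python sketch of how a window around an evidence cluster might be computed. It is only an illustration of the idea described above, not MAKER's implementation. The predictor_window function is hypothetical, and the default of 200 is an assumed value.

```python
from typing import Tuple


def predictor_window(cluster_start: int, cluster_end: int,
                     contig_length: int, pred_flank: int = 200) -> Tuple[int, int]:
    """Extend an evidence cluster by pred_flank bases on each side.

    Coordinates are 1-based and inclusive; the window is clamped to the contig.
    """
    start = max(1, cluster_start - pred_flank)
    end = min(contig_length, cluster_end + pred_flank)
    return start, end


# A cluster in the middle of a contig gets the full flank on both sides.
print(predictor_window(5000, 8000, contig_length=100000, pred_flank=200))

# With a very small flank the predictor sees almost no intergenic sequence.
print(predictor_window(5000, 8000, contig_length=100000, pred_flank=10))
```

The second call shows why a very low value is risky: the window is barely larger than the evidence itself, so an HMM that expects to start in an intergenic state has little sequence to work with.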
B
On May 21, 2013, at 1:36 AM, <Sean.Li at csiro.au>
wrote:
Hi Carson,
We are currently working on the annotation for the Helicoverpa genome
project. MAKER has been chosen as the preliminary tool for the task. When
checking the annotation results from MAKER 2.10, we saw that some loci have
a fusion problem: two separate neighbouring genes are likely to be fused
together and reported as a single candidate by MAKER. If we look further at
the output from each individual de novo algorithm, e.g. augustus or snap,
the prediction is correct. We are also using an RNA-Seq assembly from
cufflinks and some protein evidence data from closely related insects.
We noticed that the parameters "pred_flank" in maker v2.10 and
"correct_est_fusion" in maker v2.27 might be useful for maker to decide when
to merge models or not. If possible, can you please explain what these two
parameters can do with the predicted genes, RNA-Seq and protein evidence?
Also, our current plan is to install MAKER 2.27, train the algorithms to
predict UTRs, enlarge the protein evidence datasets and input our previous
annotations as model_gff. We are facing a critical question: how can we
most effectively reduce the gene fusion problem? 1) Set pred_flank lower
than 100? 2) Turn correct_est_fusion on? 3) Anything else?
Thank you.
With best regards,
Xi (Sean) Li, Ph. D.
Bioinformatics Analyst, Bioinformatics Core,
CSIRO Mathematics, Informatics and Statistics
Phone: +61 2 6216 7138
Address: GPO Box 664, Canberra, ACT 2601
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543