[maker-devel] Fwd: exon/intron boundaries
Carson Holt
carsonhh at gmail.com
Mon Oct 7 07:20:17 MDT 2013
Hi Janna,
There are a couple of things to do. Download the maker-2.29p-beta from the
lab website. This includes some changes made to improve the performance of
correct_est_fusion. Using the correct_est_fusion option also
reduces the influence of ESTs on annotation. So normally you would need to
increase your protein dataset, but given that you have supplied 3 nematode
species and all of UniProt already you should probably be fine. But there is
one more thing you can do. Instead of using cufflinks, try using trinity to
assemble the ESTs. It has a Jaccard clip option that reduces merging
caused by overlapping UTRs. Between trinity and correct_est_fusion you
should be able to really reduce the effect of those ESTs.
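
As an illustration of switching that option on, here is a minimal Python sketch that rewrites key=value lines in a control file such as maker_opts.ctl. The helper name set_ctl_options and the sample file contents (including the worm.hmm path) are hypothetical, and the sketch assumes the usual "key=value #comment" line layout; check it against your own control file before relying on it.

```python
import re

def set_ctl_options(ctl_text, updates):
    """Return ctl_text with each key in `updates` set to its new value.

    Assumes lines look like `key=value #comment`. Trailing comments are
    preserved; keys not already present are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            comment = m.group(3) or ""
            new = f"{key}={remaining.pop(key)}"
            out.append(f"{new} {comment}".rstrip())
        else:
            # Comment-only lines and untouched options pass through as-is.
            out.append(line)
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"

# Toy control-file fragment (hypothetical contents, for illustration only).
example = (
    "#-----Gene Prediction\n"
    "snaphmm=worm.hmm #SNAP HMM file\n"
    "correct_est_fusion=0 #limits use of ESTs to avoid fusion genes\n"
)
print(set_ctl_options(example, {"correct_est_fusion": "1"}))
```

Passing an empty string as the value leaves the option blank, which is how an option is normally left unset in these files.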
If those changes don't work there is one last option. If you take the MAKER
results and filter them for SNAP and Augustus ab initio results
(match/match_part in the GFF3), then you can pass those in to the pred_gff
option. Then turn snaphmm and augustus_species off in the control files.
Basically what this will do is turn MAKER's hint-based prediction off and
force it to filter the ab initio results and select models directly from
there. Since the merging is being caused by bad hints (merged transcripts
from mRNA-seq) this would reduce that effect. You will still need
correct_est_fusion=1 though to trim UTR coming from the merged transcripts,
because even though MAKER can't process hints to rerun SNAP and Augustus
this way, it will still try to add UTR using the EST evidence.
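
The filtering step could be sketched in Python roughly as follows. This is only a sketch under assumptions: the source names snap_masked and augustus_masked in column 2 are an assumption about how the ab initio results are labelled, so check column 2 of your own GFF3 and adjust; the function name filter_ab_initio and the sample records are made up for illustration.

```python
def filter_ab_initio(gff_lines, sources=("snap_masked", "augustus_masked")):
    """Yield only the ab initio match/match_part lines from a GFF3 file.

    Keeps the ##gff-version pragma plus feature lines whose source
    (column 2) is in `sources` and whose type (column 3) is match or
    match_part. Stops at ##FASTA, where embedded sequence may begin.
    """
    keep_types = {"match", "match_part"}
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if line.startswith("##gff-version"):
            yield line
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 9 and cols[1] in sources and cols[2] in keep_types:
            yield line

# Toy records (hypothetical), showing which lines survive the filter.
sample = [
    "##gff-version 3",
    "c1\tsnap_masked\tmatch\t100\t900\t.\t+\t.\tID=m1",
    "c1\tsnap_masked\tmatch_part\t100\t300\t.\t+\t.\tID=p1;Parent=m1",
    "c1\test_gff:cufflinks\tmatch\t50\t2000\t.\t+\t.\tID=e1",
    "c1\tmaker\tgene\t50\t2000\t.\t+\t.\tID=g1",
]
for kept in filter_ab_initio(sample):
    print(kept)
```

In practice you would read the merged MAKER GFF3 from disk, write the kept lines to a new file, and give that file to pred_gff with snaphmm and augustus_species left blank.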
--Carson
From: Janna Fierst <jfierst at uoregon.edu>
Date: Friday, October 4, 2013 1:06 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] Fwd: exon/intron boundaries
Hi,
Thanks for your reply. I have been going through our annotations in detail
and trying different parameter sets, and I think I have identified what is
going on but I'm not sure how to set the MAKER2 parameters for our
situation. We are working with a species of Caenorhabditid worm and there
are long gene-dense blocks that are being incorrectly annotated as large
single genes instead of several smaller closely spaced genes. The protein
alignments (tblastx and protein2genome) show very clearly where the
exon/intron boundaries are and in most cases agree with the augustus
predictions. The assembled cufflinks output (through blastn, est2genome and
est_gff:cufflinks) does not agree in some locations; I think this may be
because in some cases the UTR nearly overlaps adjacent genes.
I have included a screenshot of an annotated region viewed in apollo to try
to show this. The large gene in the middle is actually 7 different genes
that are extremely close together and MAKER2 is collapsing them into a
single gene. I tried running without any RNASeq/cufflinks data and MAKER2
annotates the region as two genes instead of one, but I can't get it to
recognize the 7 as different genes. I have retrained SNAP but we have not
been able to successfully train Augustus; we are currently using the default
caenorhabditis species model. I also included a species-specific repeat
library. I tried setting correct_est_fusion=1 and reducing pred_flank but
these changes appear to really alter the annotations and we end up
annotating almost nothing. I also tried setting est2genome=0 to decrease the
influence of the cufflinks assembly but it didn't appear to help. There are
some very large introns in these genes so I haven't yet tried decreasing the
maximum intron size because I'm concerned this may generate too many split
genes instead of our current merged gene problem. Thanks for your help; any
advice is greatly appreciated!
-Janna Fierst
On Mon, Aug 26, 2013 at 12:21 PM, Carson Holt <carsonhh at gmail.com> wrote:
> Are you getting gene fusions or just more exons? Gene fusions can be reduced
> by setting correct_est_fusion=1, or reducing pred_flank, although reducing
> pred_flank can cause other issues (but those generally only appear if setting
> the value below 150). Also if you have the maximum intron size set too
> high (split_hit option), you may also be generating bridging alignments that
> make evidence align across distant paralogous genes as well (this can result
> in gene merging).
>
> You should also look at your results manually in a viewer like Apollo. Then
> see if the extra exons are supported by something such as protein alignments
> from another species. If this is the case, you may have a poorly annotated
> protein set that is being used as evidence that is carrying over its
> erroneous exons into the species you are annotating. If the extra exons are
> supported by EST evidence, then perhaps you should try to rebuild the EST
> assembly (for example trinity has an option to use a Jaccard similarity
> coefficient to avoid fusing transcripts).
>
> Another option is to retrain SNAP or Augustus. MAKER does not actually
> produce any of the models itself (it is a pipeline, not a predictor). The
> models are all generated using these other algorithms; MAKER just feeds them
> hints based on protein and transcript alignments, so making sure training is
> sufficient is important for those programs to produce their best models.
>
> Finally, make sure your repeat database is sufficient; you may need to generate
> a species-specific repeat library using something like RepeatModeler. Repeats
> can end up being included as extra exons in gene models because they may
> contain reading frames that do code for proteins (e.g. reverse transcriptases).
>
> If you have any questions on any of the above, just let us know.
>
> Thanks,
> Carson
>
>
> From: Janna Fierst <jfierst at uoregon.edu>
> Date: Monday, August 26, 2013 2:54 PM
> To: <maker-devel at yandell-lab.org>
> Subject: [maker-devel] exon/intron boundaries
>
> Hi,
>
> I am using MAKER 2.28 to annotate a Caenorhabditid worm genome, and the
> initial results appear fairly good but we seem to be annotating too many
> exons for multiple genes. I was wondering which parameters should be tuned to
> change the threshold for exon/intron boundaries? Thanks for your help
> -Janna Fierst
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org