[maker-devel] Fwd: exon/intron boundaries
Janna Fierst
jfierst at uoregon.edu
Sat Oct 5 16:18:52 MDT 2013
Hi,
Thanks for your reply. I have been going through our annotations in detail
and trying different parameter sets, and I think I have identified what is
going on but I'm not sure how to set the MAKER2 parameters for our
situation. We are working with a species of Caenorhabditid worm and there
are long gene-dense blocks that are being incorrectly annotated as large
single genes instead of several smaller closely spaced genes. The protein
alignments (tblastx and protein2genome) show very clearly where the
exon/intron boundaries are and in most cases agree with the augustus
predictions. The assembled cufflinks output (through blastn, est2genome and
est_gff:cufflinks) does not agree in some locations; I think this may be
because in some cases the UTR nearly overlaps adjacent genes.
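One rough way to list such loci is to cluster the protein alignments by position and flag any gene model that covers more than one cluster. Below is a minimal Python sketch of that idea. The coordinates are invented, the function names are hypothetical, and it assumes the intervals for a single strand of a single contig have already been extracted from the MAKER GFF3 output.

```python
def cluster_intervals(intervals):
    """Merge overlapping (start, end) intervals into sorted clusters."""
    clusters = []
    for start, end in sorted(intervals):
        if clusters and start <= clusters[-1][1]:
            # Overlaps the current cluster: extend it if needed.
            clusters[-1][1] = max(clusters[-1][1], end)
        else:
            clusters.append([start, end])
    return [tuple(c) for c in clusters]


def candidate_fusions(genes, protein_hits, min_clusters=2):
    """Return IDs of gene models that fully contain at least
    min_clusters separate protein-alignment clusters."""
    clusters = cluster_intervals(protein_hits)
    flagged = []
    for gene_id, (g_start, g_end) in genes.items():
        inside = [c for c in clusters if g_start <= c[0] and c[1] <= g_end]
        if len(inside) >= min_clusters:
            flagged.append(gene_id)
    return flagged


# Invented coordinates: each tuple is the full span of one protein alignment.
protein_hits = [(100, 900), (150, 880), (1000, 1800), (1900, 2600)]
genes = {"gene_A": (90, 2650), "gene_B": (3000, 3900)}

print(candidate_fusions(genes, protein_hits))
```

A gene flagged this way is only a candidate for manual review in a viewer, since a single protein can also align in separate pieces.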
I have included a screenshot of an annotated region viewed in apollo to try
to show this. The large gene in the middle is actually 7 different genes
that are extremely close together and MAKER2 is collapsing them into a
single gene. I tried running without any RNASeq/cufflinks data and MAKER2
annotates the region as two genes instead of one, but I can't get it to
recognize the 7 as different genes. I have retrained SNAP but we have not
been able to successfully train Augustus, so we are currently using the
default caenorhabditis species model. I also included a species-specific
repeat library. I tried setting correct_est_fusion=1 and reducing
pred_flank but these changes appear to really alter the annotations and we
end up annotating almost nothing. I also tried setting est2genome=0 to
decrease the influence of the cufflinks assembly but it didn't appear to
help. There are some very large introns in these genes, so I haven't yet
tried decreasing the maximum intron size because I'm concerned this may
generate too many split genes instead of our current merged gene problem.
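For anyone comparing parameter sets the same way, here is a minimal Python sketch of applying the options mentioned in this thread to a control file. It assumes the `key=value #comment` layout of maker_opts.ctl. The function name `set_ctl_options`, the sample excerpt and the values are illustrative only, not recommended settings.

```python
import re


def set_ctl_options(ctl_text, overrides):
    """Return ctl_text with the given key=value options replaced.

    Assumes each option sits on its own line as `key=value #comment`.
    Trailing comments are preserved; keys not found are appended.
    """
    remaining = dict(overrides)
    out = []
    for line in ctl_text.splitlines():
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            comment = m.group(3) or ""
            new = f"{key}={remaining.pop(key)}"
            out.append(f"{new} {comment}".rstrip())
        else:
            out.append(line)
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Hypothetical excerpt of a control file; values are placeholders.
example = (
    "est2genome=1 #infer gene predictions directly from ESTs\n"
    "pred_flank=200 #flank for extending evidence clusters\n"
    "split_hit=10000 #expected max intron size for evidence alignments\n"
    "correct_est_fusion=0 #limits use of ESTs in annotation\n"
)

updated = set_ctl_options(
    example,
    {"correct_est_fusion": 1, "pred_flank": 150, "est2genome": 0},
)
print(updated)
```

Writing each variant to its own run directory makes it easier to compare the resulting annotations side by side.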
Thanks for your help, any advice is greatly appreciated! -Janna Fierst
On Mon, Aug 26, 2013 at 12:21 PM, Carson Holt <carsonhh at gmail.com> wrote:
> Are you getting gene fusions or just more exons? Gene fusions can be
> reduced by setting correct_est_fusion=1, or reducing pred_flank, although
> reducing pred_flank can cause other issues (but those generally only appear
> if setting the value below 150). Also, if you have the maximum intron
> size set too high (split_hit option), you may be generating bridging
> alignments that make evidence align across distant paralogous genes as well
> (this can result in gene merging).
>
> You should also look at your results manually in a viewer like Apollo.
> Then see if the extra exons are supported by something such as protein
> alignments from another species. If this is the case, you may have a
> poorly annotated protein set that is being used as evidence that is
> carrying over its erroneous exons into the species you are annotating. If
> the extra exons are supported by EST evidence, then perhaps you should try
> to rebuild the EST assembly (for example, Trinity has an option to use a
> Jaccard similarity coefficient to avoid fusing transcripts).
>
> Another option is to retrain SNAP or Augustus. MAKER does not actually
> produce any of the models itself (it is a pipeline not a predictor). The
> models are all generated using these other algorithms, MAKER just feeds
> them hints based on protein and transcript alignments, so making sure
> training is sufficient is important for those programs to produce their
> best models.
>
> Finally, make sure your repeat database is sufficient; you may need to
> generate a species-specific repeat library using something like
> RepeatModeler. Repeats can end up being included as extra exons in gene
> models because they may contain reading frames that do code for proteins
> (e.g. reverse transcriptases).
>
> If you have any questions on any of the above, just let us know.
>
> Thanks,
> Carson
>
>
> From: Janna Fierst <jfierst at uoregon.edu>
> Date: Monday, August 26, 2013 2:54 PM
> To: <maker-devel at yandell-lab.org>
> Subject: [maker-devel] exon/intron boundaries
>
> Hi,
>
> I am using MAKER 2.28 to annotate a Caenorhabditid worm genome, and the
> initial results appear fairly good but we seem to be annotating too many
> exons for multiple genes. I was wondering which parameters should be tuned
> to change the threshold for exon/intron boundaries? Thanks for your help
> -Janna Fierst
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1.tiff
Type: image/tiff
Size: 115362 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20131005/c530b020/attachment-0003.tiff>