[maker-devel] tblastn Cleanup?
Carson Holt
carsonhh at gmail.com
Fri Jun 1 11:01:50 MDT 2012
MAKER does this in the GI library file, in the function GI::polish_exonerate.
You might have to do some coding to modify it to work in your own scripts.
It has multiple dependencies on other MAKER library files including
exonerate::splice_info, Widget::exonerate::est2genome, and
Widget::exonerate::protein2genome.
Thanks,
Carson
From: Yogesh <yogeshp08 at gmail.com>
Date: Friday, 1 June, 2012 12:56 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] tblastn Cleanup?
Hi Carson,
Thanks a lot. This is helpful.
I also wanted to ask how I can follow this up with exonerate polishing. Is
there a module in MAKER that can run this separately on my BLAST results?
-Yogesh
On Friday, May 18, 2012 at 9:22 AM, Carson Holt wrote:
>
>
> There are several things. I set several filtering options directly on the
> BLAST command line: a maximum intron length, an e-value filter, and simple
> repeat filtering (called the dust filter in NCBI BLAST and the seg filter in
> WU-BLAST).
>
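A minimal sketch of the command-line filters described above, assuming NCBI
BLAST+ tblastn. The function name, the file names, and the threshold values
are placeholders for illustration, not MAKER's own settings. Because a
tblastn query is protein, the sketch uses the `-seg` low-complexity filter.

```python
def build_tblastn_command(query_fasta, db, evalue=1e-6, max_intron=10000):
    """Return an argv list for NCBI BLAST+ tblastn with basic filters.

    The default e-value and intron length are illustrative assumptions,
    not values taken from MAKER.
    """
    return [
        "tblastn",
        "-query", query_fasta,
        "-db", db,
        # Discard alignments with an e-value above this cutoff.
        "-evalue", str(evalue),
        # Filter low-complexity regions of the protein query.
        "-seg", "yes",
        # Largest intron allowed when linking separate alignments.
        "-max_intron_length", str(max_intron),
        # Tabular output with the query length appended as the last column.
        "-outfmt",
        "6 qseqid sseqid pident length qstart qend sstart send "
        "evalue bitscore qlen",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once a
BLAST database has been built for the (ideally repeat-masked) genome.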
> I also run RepeatMasker over the genome first. This allows simple and
> complex repeats to be masked before running BLAST (otherwise you get many
> false alignments).
>
> Last, I filter the results based on percent identity and on percent coverage
> of the hit relative to the original database sequence. I think you can set
> percent identity as a flag in BLAST, but the percent coverage filter is
> calculated by MAKER. To do this outside of MAKER, you would need to write
> your own filtering script that compares the length of the alignment to the
> length of the sequence in the database.
>
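The coverage and identity filter described above could be sketched roughly as
follows. This is not MAKER's code. It assumes tab-separated BLAST output with
the columns qseqid, sseqid, pident, length, qstart, qend, sstart, send,
evalue, bitscore, qlen, where the query is the protein. The thresholds and
function names are hypothetical, and coverage is computed per row (per HSP)
rather than over a whole multi-HSP hit.

```python
def passes_filters(pident, qstart, qend, qlen,
                   min_identity=40.0, min_coverage=0.5):
    """Check one alignment row against identity and coverage cutoffs.

    Coverage is the aligned span on the protein divided by the protein
    length. Both cutoffs are illustrative assumptions.
    """
    aligned = abs(qend - qstart) + 1
    coverage = aligned / qlen
    return pident >= min_identity and coverage >= min_coverage


def filter_tabular(lines, min_identity=40.0, min_coverage=0.5):
    """Yield the rows of tabular BLAST output that pass both filters."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        pident = float(fields[2])
        qstart, qend = int(fields[4]), int(fields[5])
        qlen = int(fields[10])
        if passes_filters(pident, qstart, qend, qlen,
                          min_identity, min_coverage):
            yield line
```

A file handle can be passed directly, for example
`for row in filter_tabular(open("hits.tsv")): print(row)`.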
> I also have an HSP depth overlap filter. This removes weird low complexity
> hits that escape repeat masking. They show up as multiple HSPs overlapping
> multiple times in the same region (usually very high numbers, like 90 HSPs
> all 100 bp long in the same region). I calculate the number of base pairs in
> the alignment on the hit, then divide by the number of base pairs in the
> query alignment. If the ratio is greater than 3, I throw the hit out.
>
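One possible reading of the HSP depth filter described above is sketched
here. It is an interpretation, not MAKER's implementation: depth is taken as
the total number of aligned bases summed over all HSPs, divided by the number
of distinct positions those HSPs cover. The function names are hypothetical.
Only the cutoff of 3 comes from the description.

```python
def hsp_depth(hsps):
    """Return summed HSP length divided by the distinct bases covered.

    hsps is a list of (start, end) pairs, 1-based and inclusive, all on
    the same sequence. This ratio is an assumed reading of the filter.
    """
    if not hsps:
        return 0.0
    spans = sorted((min(s, e), max(s, e)) for s, e in hsps)
    total = sum(e - s + 1 for s, e in spans)

    # Merge overlapping or adjacent spans to count each base only once.
    covered = 0
    cur_start, cur_end = spans[0]
    for start, end in spans[1:]:
        if start <= cur_end + 1:
            cur_end = max(cur_end, end)
        else:
            covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    covered += cur_end - cur_start + 1
    return total / covered


def is_hsp_pileup(hsps, max_depth=3.0):
    """Flag a hit whose HSPs stack up more than max_depth times."""
    return hsp_depth(hsps) > max_depth
```

With 90 HSPs that each span positions 1 to 100, the total is 9000 bases over
100 covered positions, so the depth is 90 and the hit is flagged.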
> Thanks,
> Carson
>
>
>
> From: Yogesh <yogeshp08 at gmail.com>
> Date: Tuesday, 15 May, 2012 12:07 PM
> To: <maker-devel at yandell-lab.org>
> Subject: [maker-devel] tblastn Cleanup?
>
>
> Hello,
>
> I have a few tblastn alignments with a lot of low-quality hits that I need
> to clean up. Can you explain how the MAKER pipeline does this? Also, can I
> run that step directly on my data without going through the whole pipeline?
>
> Thanks,
>
> -Yogesh