[maker-devel] tblastn Cleanup?

Carson Holt carsonhh at gmail.com
Fri Jun 1 11:01:50 MDT 2012


MAKER does this in the GI library file (the function is GI::polish_exonerate).
You will probably have to do some coding to adapt it for use in your own
scripts, because it depends on several other MAKER library files, including
exonerate::splice_info, Widget::exonerate::est2genome, and
Widget::exonerate::protein2genome.

Thanks,
Carson

From:  Yogesh <yogeshp08 at gmail.com>
Date:  Friday, 1 June, 2012 12:56 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] tblastn Cleanup?

 
Hi Carson,

Thanks a lot. This is helpful.

I also wanted to ask how I can follow this up with exonerate polishing. Is
there a module in MAKER that can run this separately on my BLAST results?

-Yogesh

  

On Friday, May 18, 2012 at 9:22 AM, Carson Holt wrote:
> 
>  
> There are several things.  I set several filtering options directly on the
> BLAST command line.  These are things like maximum intron length, an e-value
> filter, and simple repeat filtering (called dust filter in NCBI blast and seg
> filter in WUBLAST).
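
For anyone reproducing the command-line filters outside of MAKER, a minimal sketch with NCBI BLAST+ tblastn might look like the following. The function name, file names, e-value, and intron length are placeholders chosen for illustration, not MAKER's settings. Because the query is protein, the low-complexity filter here is the `-seg` option.

```python
import shlex
from typing import List


def build_tblastn_command(query_fasta: str, genome_fasta: str,
                          evalue: float = 1e-6,
                          max_intron: int = 10000) -> List[str]:
    """Build a tblastn argument list (hypothetical helper, placeholder values)."""
    return [
        "tblastn",
        "-query", query_fasta,        # protein sequences
        "-subject", genome_fasta,     # repeat-masked genome
        "-evalue", str(evalue),       # e-value filter
        "-seg", "yes",                # low-complexity filter on the protein query
        "-max_intron_length", str(max_intron),
        # tabular output that includes the query length for later coverage checks
        "-outfmt", "6 qseqid sseqid pident length qlen qstart qend sstart send evalue",
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before it is run.
    print(shlex.join(build_tblastn_command("proteins.fasta", "genome.masked.fasta")))
```

The list can be passed to `subprocess.run` once the paths point at real files.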
> 
> I also run RepeatMasker over the genome first.  This allows simple and
> complex repeats to be removed before running BLAST (otherwise you get many
> false alignments).
> 
> Last, I filter the results based on percent coverage of the hit to the original
> database sequence and percent identity.  I think you can set percent identity
> as a flag in BLAST, but the percent coverage filter is being calculated by
> MAKER, so to do this outside of MAKER would require that you write your own
> filtering script to compare the length of the alignment to the length of the
> sequence in the database.
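
Outside of MAKER, the coverage and identity filter described above could be sketched roughly as follows. This is only an illustration: the function names, the (start, end, percent identity) tuple layout, and the 0.5 and 40.0 thresholds are assumptions made for the example, not values taken from this thread or from MAKER's code. Overlapping HSPs are merged so that the same residues are not counted twice.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def merged_length(intervals: List[Interval]) -> int:
    """Count positions covered by at least one interval."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def passes_filter(hsps: List[Tuple[int, int, float]], protein_length: int,
                  min_coverage: float = 0.5, min_identity: float = 40.0) -> bool:
    """hsps: (protein_start, protein_end, percent_identity) for one hit.

    Thresholds are hypothetical placeholders.
    """
    if not hsps or protein_length <= 0:
        return False
    # Fraction of the database protein covered by the alignment
    coverage = merged_length([(s, e) for s, e, _ in hsps]) / protein_length
    # Length-weighted mean percent identity across the HSPs
    aligned = sum(abs(e - s) + 1 for s, e, _ in hsps)
    identity = sum((abs(e - s) + 1) * pid for s, e, pid in hsps) / aligned
    return coverage >= min_coverage and identity >= min_identity
```

The protein length has to come from the BLAST output (for example a query-length column) or from the FASTA file itself.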
> 
> I also have an HSP depth overlap filter.  This removes weird low complexity
> hits that escape repeatmasking.  They show up as multiple HSPs overlapping
> multiple times in the same region (usually very high numbers like 90 HSPs all
> 100 bp long in the same region).  I calculate the number of base pairs in the
> alignment on the hit then divide by the number of base pairs in the query
> alignment.  If it's greater than 3, I throw the hit out.
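
One possible reading of the HSP depth filter is sketched below. It assumes that "depth" means the summed length of all HSPs divided by the length of the region they actually occupy once overlaps are merged. That interpretation, the function names, and the (start, end) tuple layout are assumptions for illustration and may differ from what MAKER computes. Only the cutoff of 3 comes from the description above.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def footprint_length(intervals: List[Interval]) -> int:
    """Length of the union of the intervals (overlaps counted once)."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def hsp_depth(hsps: List[Interval]) -> float:
    """Summed HSP length divided by the merged footprint (assumed definition)."""
    footprint = footprint_length(hsps)
    if footprint == 0:
        return 0.0
    summed = sum(abs(e - s) + 1 for s, e in hsps)
    return summed / footprint


def is_pileup(hsps: List[Interval], max_depth: float = 3.0) -> bool:
    """True if the hit should be discarded as a low-complexity pileup."""
    return hsp_depth(hsps) > max_depth
```

Under this reading, 90 HSPs of 100 bp stacked on the same 100 bp region give a depth of 90 and are discarded, while a normal multi-exon hit with non-overlapping HSPs has a depth of 1.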
> 
> Thanks,
> Carson
> 
> 
> 
> From:  Yogesh <yogeshp08 at gmail.com>
> Date:  Tuesday, 15 May, 2012 12:07 PM
> To:  <maker-devel at yandell-lab.org>
> Subject:  [maker-devel] tblastn Cleanup?
> 
>  
> Hello,
> 
> I have a few tblastn alignments with a lot of low-quality hits that I need
> to clean up. Can you please suggest how the MAKER pipeline does this? Also, can
> I run that step directly on my data without going through the whole pipeline?
> 
> Thanks,
> 
> -Yogesh

