[maker-devel] Repeats...
Panos Ioannidis
panos.ioannidis at gmail.com
Mon Jul 13 03:45:08 MDT 2015
Hi Daniel,
Thanks for the reply. No, I'm not using GeneMark. I haven't yet checked for
overlap with RepeatMasker elements or transcript/protein evidence, though.
Since this is such an unexpected finding, I decided to try something
simpler first. I took all 750 transposases with the same InterPro annotation
(IS4 family transposases) and clustered their amino acid sequences with
CD-HIT. At a 90% similarity threshold each transposase goes into its own
cluster. At 80% I get 748 clusters... This means that even though these
transposases belong to the same family, they have diverged so much that
they are no longer recognized as "repeat elements". This would explain why
they were not filtered out by RepeatMasker and made it into the final gene
set.
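
For anyone who wants to reproduce the cluster counts, here is a minimal
sketch of how they could be tallied from the .clstr file that CD-HIT writes
next to its output FASTA. It assumes the usual .clstr layout (a ">Cluster N"
header line followed by one line per member sequence); the function names
and the command-line usage are hypothetical, not part of CD-HIT or MAKER.

```python
import sys
from typing import Dict, Iterable, Tuple


def cluster_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Map each cluster name to its member count.

    Assumes the standard CD-HIT .clstr layout: a header line starting
    with '>Cluster', then one line per member sequence.
    """
    sizes: Dict[str, int] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]          # e.g. "Cluster 0"
            sizes[current] = 0
        elif current is not None:
            sizes[current] += 1         # every other line is one member
    return sizes


def summarize(sizes: Dict[str, int]) -> Tuple[int, int, int]:
    """Return (number of clusters, number of sequences, singleton clusters)."""
    n_clusters = len(sizes)
    n_sequences = sum(sizes.values())
    n_singletons = sum(1 for n in sizes.values() if n == 1)
    return n_clusters, n_sequences, n_singletons


if __name__ == "__main__":
    # Hypothetical usage: python count_clusters.py transposases_c80.clstr
    with open(sys.argv[1]) as handle:
        clusters, sequences, singletons = summarize(cluster_sizes(handle))
    print(f"{sequences} sequences in {clusters} clusters "
          f"({singletons} singletons)")
```

If nearly every cluster is a singleton, as in the 80% run above, the
sequences share too little identity to collapse into a single consensus.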
On Fri, Jul 10, 2015 at 5:00 PM, Daniel Ence <dence at genetics.utah.edu>
wrote:
> Hi Panos, Without knowing how you made the species-specific repeat
> library, I can't speak to why it's giving hits against RepBase. As to the
> 800 transposases, are they overlapped by RepeatMasker elements? Are they
> supported by EST or protein evidence? Are you using GeneMark? That
> ab initio predictor runs on the unmasked genome sequence, so if the
> transposases are present in your evidence set, they could still show up as
> gene models.
>
> ~Daniel
>
> On Jul 10, 2015, at 5:45 AM, Panos Ioannidis <panos.ioannidis at gmail.com>
> wrote:
>
> An additional question related to the previous.
>
> I searched my species-specific repeat library with InterProScan and can't
> find a single sequence with similarity to a transposable element...
>
> I would expect it to find at least a few transposases. Is there an
> explanation for this, or has something gone wrong?
>
> Thanks,
> P
>
>
> On Fri, Jul 10, 2015 at 11:50 AM, Panos Ioannidis <
> panos.ioannidis at gmail.com> wrote:
>
>> Hi guys,
>>
>> I have finished running MAKER on my genome, but I get >800 genes (out of
>> ~20,000) that have similarity to transposases. Apart from RepBase, I have
>> also built a species-specific repeat library, so it's weird that I still
>> have quite a few transposases in my gene set...
>>
>> The repeat-masking parameters in my maker_opts.ctl file are:
>>
>> model_org=all #select a model organism for RepBase masking in RepeatMasker
>> rmlib=consensi.fa.classified #provide an organism specific repeat library in fasta format for RepeatMasker
>> repeat_protein=/Home/pioannid/Programs/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
>> rm_gff= #pre-identified repeat elements from an external GFF3 file
>> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
>> softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
>>
>> Does anyone have an idea why I'm getting so many transposases?
>>
>> Thanks,
>> Panos
>>
>