[maker-devel] FW: extract repeats from maker output

Michael Campbell michael.s.campbell1 at gmail.com
Wed Jan 8 13:51:03 MST 2014


Hi Stefan,

MAKER does write the RepeatMasker output to the final GFF3 file. However,
to save space and avoid redundancy when repeats are nested or overlapping,
the repeat regions are collapsed and named after the longest repeat
contributing to each region. This means that the portion of the genome
being masked is easy to calculate, but the lengths of the masked regions
reported in the MAKER output will not always represent the lengths of
individual repetitive elements.
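A minimal sketch of the masked-portion calculation in Python, assuming repeat features in the MAKER GFF3 can be selected by the source column (column 2) being `repeatmasker`. Check this against your own file; the function name `masked_bases_from_gff3` and that filter are illustrative assumptions, not part of MAKER. Overlapping features are merged per sequence so no base is counted twice. Because the regions are collapsed as described above, this gives the masked total, not individual element lengths.

```python
from collections import defaultdict


def masked_bases_from_gff3(lines, source="repeatmasker"):
    """Count non-overlapping masked bases per sequence from GFF3 lines.

    Hypothetical helper: assumes repeat features are identified by the
    GFF3 source column (column 2); adjust `source` to match your file.
    """
    intervals = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # any sequence data appended after this pragma is not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and other pragmas
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[1] != source:
            continue
        # GFF3 coordinates are 1-based and inclusive
        intervals[cols[0]].append((int(cols[3]), int(cols[4])))

    masked = {}
    for seqid, ivs in intervals.items():
        ivs.sort()
        total = 0
        cur_start, cur_end = ivs[0]
        for start, end in ivs[1:]:
            if start <= cur_end + 1:  # overlapping or adjacent: extend the run
                cur_end = max(cur_end, end)
            else:  # gap: close the current run and start a new one
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        masked[seqid] = total
    return masked
```

Dividing `sum(masked.values())` by the total genome length (which you supply) gives the masked fraction.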

To get the true length distribution, you could run RepeatMasker on the
genome outside of MAKER, or look for the RepeatMasker output in the void
directories of the datastore for each scaffold.
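Once you have RepeatMasker output in its standard `.out` table (for example, from a standalone run), a sketch like the following could pull out per-element lengths and bin them. It assumes the usual whitespace-separated `.out` layout, where data lines start with an integer Smith-Waterman score, columns 6 and 7 are the query begin/end, and columns 10 and 11 are the repeat name and class/family. The helper names are hypothetical.

```python
from collections import Counter


def repeat_lengths_from_rm_out(lines):
    """Return (repeat_name, class_family, length) for each hit in a .out table."""
    hits = []
    for line in lines:
        fields = line.split()
        # Data lines begin with an integer SW score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])  # 1-based, inclusive
        hits.append((fields[9], fields[10], end - begin + 1))
    return hits


def length_histogram(lengths, bin_size=100):
    """Count hits per length bin; bin k covers [k*bin_size, (k+1)*bin_size)."""
    return Counter((n // bin_size) * bin_size for n in lengths)
```

Passing `[length for _, _, length in hits]` to `length_histogram` gives counts per 100 bp bin. Grouping the hits by the class/family field instead would give a per-family breakdown.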

Also, make sure that you are using the most recent version of MAKER. A
recent bug fix affected the RepeatMasker output written to the GFF3 file;
in some cases, that bug caused the portion of the genome masked prior to
annotation to be underestimated.

I've copied this to the MAKER dev list just in case anyone else has
additional insights.

Mike


>
> Dear Mark Yandell,
>
>  I wanted to ask if it's possible to extract a file with all repeats found
> from the maker output?
>
> I would like to see how much of the genome is repeats and check their
> length distribution.
>
> All the best,
> Stefan Prost
>