[maker-devel] MAKER and RepeatModeler
Shaun Jackman
sjackman at gmail.com
Fri Jul 17 14:29:49 MDT 2015
Hi, Carson.
It seems that RepeatModeler is not deterministic. I ran it five times on the same sequence and got very different outputs. Two of these five runs mark atp8 as a repeat, which is why I have genes blinking in and out of existence. How do folk deal with this situation? It seems absurd. What’s the cause of the non-determinism? The random number generator? Threading? Can I get deterministic behaviour if I set the seed of the random number generator and run it single-threaded? I don’t see how I can implement a reproducible pipeline with the situation as it is.
This has become a RepeatModeler question more than a MAKER question, but I thought I’d continue this thread that I’d started here.
n   n:1  L50  min  N80   N50    N20    E-size  max    sum    name
6   6    1    289  7667  12403  12403  9102    12403  24293  RepeatModeler1.fa
6   6    1    332  4023  14769  14769  10920   14769  21738  RepeatModeler2.fa
6   6    1    244  370   2731   2731   1765    2731   4688   RepeatModeler3.fa
10  10   1    354  2114  17134  17134  11354   17134  30782  RepeatModeler4.fa
8   8    3    538  1093  1750   2526   1706    2526   10713  RepeatModeler5.fa
My command line is
BuildDatabase -name x -engine ncbi x.fa
RepeatModeler -database x
cp -a RM_*/consensi.fa.classified RepeatModeler.fa
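One way to quantify the variation reported above is to repeat the same three commands in separate directories and count how many distinct outputs result. The following is a minimal sketch, not part of the original message: the function names run_repeatmodeler and count_distinct and the run1, run2, ... directory layout are hypothetical, and it assumes BuildDatabase and RepeatModeler are on PATH with x.fa in the current directory. Comparing checksums is deliberately strict, so any byte difference counts as a different result.

```shell
#!/bin/sh

# Hypothetical helper: count how many distinct files (by checksum) were given.
# Prints 1 when all files are byte-identical.
count_distinct() {
    for f in "$@"; do
        cksum < "$f"
    done | sort -u | wc -l | tr -d ' '
}

# Hypothetical driver: repeat the pipeline from this message n times,
# each replicate in its own directory (run1, run2, ...).
# Assumes BuildDatabase and RepeatModeler are on PATH and ./x.fa exists.
run_repeatmodeler() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        mkdir -p "run$i"
        (
            cd "run$i" || exit 1
            BuildDatabase -name x -engine ncbi ../x.fa
            RepeatModeler -database x
            cp -a RM_*/consensi.fa.classified "../RepeatModeler$i.fa"
        )
        i=$((i + 1))
    done
}

# Usage (not executed here):
#   run_repeatmodeler 5
#   count_distinct RepeatModeler[1-5].fa
```

With this sketch, count_distinct prints 1 only if all five libraries are byte-identical. Running each replicate in its own directory keeps the RM_* glob unambiguous, since each directory then holds a single RepeatModeler working directory.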
I installed the following software using Homebrew on a Mac.
repeatmodeler 1.0.8
recon 1.07
repeatmasker 4.0.5
repeatscout 1.0.5
rmblast 2.2.28
trf 4.07b
Cheers,
Shaun
--
http://sjackman.ca/
On 2015-July-17 at 10:36:50 , Carson Holt (carsonhh at gmail.com) wrote:
The subset is actually built off of a taxonomy, so you can extract all repeats for a species or genus, for example. If a term doesn’t match the internal taxonomy, it throws an error.
—Carson
On Jul 17, 2015, at 11:24 AM, Carson Holt <carsonhh at gmail.com> wrote:
Yes. It takes a subset of RepBase. If runtime isn’t an issue and you really want to mask as much as possible, you can also set model_org=all. Most of whatever else is in RepBase probably won’t align anywhere, but it may give you marginally better sensitivity.
—Carson
On Jul 17, 2015, at 11:20 AM, Shaun Jackman <sjackman at gmail.com> wrote:
Hi, Carson.
I set model_org=picea. I see that it created a new database in the RepeatModeler folder Libraries/20140131/picea/specieslib. What is the effect of the model_org option? Does it extract sequences from RepBase that match the string picea?
Cheers,
Shaun
--
http://sjackman.ca/
On 2015-July-17 at 9:40:58 , Carson Holt (carsonhh at gmail.com) wrote:
That is weird.
One thought though. When you run MAKER do you supply both rmlib and model_org or just rmlib? If you are only supplying rmlib, you could try supplying both together (RepeatMasker will then run twice). That way some of the edge cases might better be identified.
—Carson
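As an illustration of the suggestion above, the repeat-masking entries in maker_opts.ctl might look like the fragment below when both options are supplied. This is a sketch only: the value picea and the file name RepeatModeler.fa are taken from elsewhere in this thread and may not match the actual configuration that was used.

```
#-----Repeat Masking (illustrative fragment, values assumed from this thread)
model_org=picea #RepBase subset selected by taxonomy term
rmlib=RepeatModeler.fa #custom repeat library produced by RepeatModeler
```

With both set, RepeatMasker runs twice, as the message above describes.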
On Jul 16, 2015, at 5:25 PM, Shaun Jackman <sjackman at gmail.com> wrote:
Hi, Carson.
I removed two small contaminant contigs (~7 kbp) from the assembly (~6 Mbp), and MAKER found four fewer genes (four copies of the same atp8 gene), but these genes were not in the contaminant contigs. I figured out that it’s because I’m running RepeatModeler to create the rmlib for MAKER. When I remove the contaminant contigs, RepeatModeler now identifies this atp8 gene as an LTR/Gypsy repeat.
Any thoughts on why removing two contigs would cause RepeatModeler to identify new repeats?
Cheers,
Shaun
--
http://sjackman.ca/