[maker-devel] MAKER and RepeatModeler

Shaun Jackman sjackman at gmail.com
Fri Jul 17 14:29:49 MDT 2015


Hi, Carson.

It seems that RepeatModeler is not deterministic. I ran it five times on the same sequence and got very different outputs. Two of these five runs mark atp8 as a repeat, which is why I have genes blinking in and out of existence. How do folk deal with this situation? It seems absurd. What’s the cause of the non-determinism? The random number generator? Threading? Can I get deterministic behaviour if I set the seed of the random number generator and run it single-threaded? I don’t see how I can implement a reproducible pipeline with the situation as it is.

This has become a RepeatModeler question more than a MAKER question, but I thought I’d continue this thread that I’d started here.

n	n:1	L50	min	N80	N50	N20	E-size	max	sum	name
6	6	1	289	7667	12403	12403	9102	12403	24293	RepeatModeler1.fa
6	6	1	332	4023	14769	14769	10920	14769	21738	RepeatModeler2.fa
6	6	1	244	370	2731	2731	1765	2731	4688	RepeatModeler3.fa
10	10	1	354	2114	17134	17134	11354	17134	30782	RepeatModeler4.fa
8	8	3	538	1093	1750	2526	1706	2526	10713	RepeatModeler5.fa
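
For anyone wanting to recompute a few of these columns without the original tool, here is a minimal Python sketch. The message does not say which program produced the table, so the definitions are assumptions: N50 is taken as the largest length L such that sequences of length L or longer cover at least half of the total bases, and the function names are made up for illustration.

```python
def fasta_lengths(lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nxx(lengths, fraction):
    """Length at which the longest sequences first cover `fraction` of all bases.

    Assumed definition: walk the sequences from longest to shortest and
    return the length at which the running sum reaches the target.
    """
    target = sum(lengths) * fraction
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0


def summarize(lengths):
    """Collect a subset of the table's columns for one library."""
    return {
        "n": len(lengths),
        "min": min(lengths, default=0),
        "N50": nxx(lengths, 0.5),
        "max": max(lengths, default=0),
        "sum": sum(lengths),
    }


# Example use (file name taken from the table above):
# with open("RepeatModeler1.fa") as handle:
#     print(summarize(fasta_lengths(handle)))
```

Summary statistics only show that the runs differ in size; they say nothing about which repeat families changed between runs.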

My command line is:

    BuildDatabase -name x -engine ncbi x.fa
    RepeatModeler -database x
    cp -a RM_*/consensi.fa.classified RepeatModeler.fa
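
To see how much two runs actually overlap, one rough approach is to compare the consensus sequences by content rather than by name. The sketch below is only that, a sketch: it assumes family names are not stable between runs, it treats two consensi as the same only if they match exactly (ignoring case), and it does not handle reverse complements or near-identical sequences. The function names are hypothetical.

```python
def read_fasta(lines):
    """Parse an iterable of FASTA lines into a {header: sequence} dict."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None and line:
            # Upper-case so soft-masked and unmasked copies compare equal.
            records[name].append(line.upper())
    return {header: "".join(parts) for header, parts in records.items()}


def compare_runs(run_a, run_b):
    """Count consensus sequences shared by, or unique to, two libraries.

    Only exact sequence identity counts as a match (an assumption made
    for simplicity); headers are ignored.
    """
    seqs_a = set(run_a.values())
    seqs_b = set(run_b.values())
    return {
        "shared": len(seqs_a & seqs_b),
        "only_a": len(seqs_a - seqs_b),
        "only_b": len(seqs_b - seqs_a),
    }


# Example use (file names taken from the table above):
# with open("RepeatModeler1.fa") as a, open("RepeatModeler2.fa") as b:
#     print(compare_runs(read_fasta(a), read_fasta(b)))
```

If two runs share few or no exact consensi, an alignment-based comparison would be the next step, since consensus sequences from the same family can differ by a few bases.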

I installed the following software using Homebrew on a Mac:

repeatmodeler 1.0.8
recon 1.07
repeatmasker 4.0.5
repeatscout 1.0.5
rmblast 2.2.28
trf 4.07b

Cheers,
Shaun



-- 
http://sjackman.ca/

On 2015-July-17 at 10:36:50 , Carson Holt (carsonhh at gmail.com) wrote:

The subset is actually built from a taxonomy, so you can extract all repeats for a species or genus, for example. If a term doesn’t match the internal taxonomy, it throws an error.

—Carson

On Jul 17, 2015, at 11:24 AM, Carson Holt <carsonhh at gmail.com> wrote:

Yes. It takes a subset of RepBase. If runtime isn’t an issue and you really want to mask as much as possible, you can also set model_org=all. Most of whatever else is in RepBase probably won’t align anywhere, but it may give you marginally better sensitivity.

—Carson



On Jul 17, 2015, at 11:20 AM, Shaun Jackman <sjackman at gmail.com> wrote:

Hi, Carson.

I set model_org=picea. I see that it created a new database in the RepeatModeler folder Libraries/20140131/picea/specieslib. What is the effect of the model_org option? Does it extract sequences from RepBase that match the string picea?

Cheers,
Shaun




-- 
http://sjackman.ca/

On 2015-July-17 at 9:40:58 , Carson Holt (carsonhh at gmail.com) wrote:

That is weird.

One thought though.  When you run MAKER do you supply both rmlib and model_org or just rmlib? If you are only supplying rmlib, you could try supplying both together (RepeatMasker will then run twice).  That way some of the edge cases might better be identified.
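
As a rough illustration, supplying both options together would look something like the fragment below. The file name maker_opts.ctl is an assumption (it is not named in this thread); the values are the ones mentioned elsewhere in this thread, namely model_org=picea and the RepeatModeler.fa library copied out of the RepeatModeler run directory.

```
# Fragment of the MAKER options file (assumed to be maker_opts.ctl)
model_org=picea          # RepBase subset selected by taxonomy
rmlib=RepeatModeler.fa   # custom repeat library built with RepeatModeler
```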

—Carson



On Jul 16, 2015, at 5:25 PM, Shaun Jackman <sjackman at gmail.com> wrote:

Hi, Carson.

I removed two small contaminant contigs (~7 kbp) from the assembly (~6 Mbp), and MAKER found four fewer genes, four copies of the same atp8 gene, but these genes were not in the contaminant contigs. I figured out that it’s because I’m running RepeatModeler to create the rmlib for MAKER. When I remove the contaminant contigs, RepeatModeler now identifies this atp8 gene as an LTR/Gypsy repeat.
Any thoughts on why removing two contigs would cause RepeatModeler to identify new repeats?

Cheers,
Shaun




-- 
http://sjackman.ca/
