<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Hi Florian, I don’t think you want est2genome or protein2genome turned on for this run. The est2genome option is usually used only if you don’t have any ab initio predictors trained; protein2genome should only be used if you have good reason not to expect any introns
at all (for example, a prokaryotic genome).
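<div class="">The rule of thumb above can be sketched as a small check over a maker_opts.ctl-style file. This is a hypothetical helper, not part of MAKER: it assumes simple "key=value #comment" lines and treats non-empty snaphmm, gmhmm or augustus_species values as "ab initio predictors are trained". The file names in the example are made up.</div>

```python
# Hypothetical helper (not part of MAKER): parse "key=value #comment" lines
# from a maker_opts.ctl-style file and apply the rule of thumb above.

def parse_ctl(text):
    """Return a dict of option -> value, ignoring comments and blank lines."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def evidence_mode_warnings(opts):
    """Flag est2genome/protein2genome settings that are likely unwanted."""
    # Assumption: a non-empty value for any of these means a trained predictor.
    predictors = [k for k in ("snaphmm", "gmhmm", "augustus_species") if opts.get(k)]
    warnings = []
    if predictors and opts.get("est2genome") == "1":
        warnings.append("est2genome=1 although ab initio predictors are set: "
                        + ", ".join(predictors))
    if opts.get("protein2genome") == "1":
        warnings.append("protein2genome=1: only advisable if you expect no introns")
    return warnings


# Example with made-up file names, mirroring the setup in the question.
example = """
snaphmm=cegma_snap.hmm #SNAP HMM trained from CEGMA output
gmhmm=gmhmm.mod #GeneMark model
augustus_species=my_species #Augustus species model
est2genome=1
protein2genome=1
"""

for warning in evidence_mode_warnings(parse_ctl(example)):
    print(warning)
```

<div class="">With both options set to 0 and the predictors left as they are, the check returns no warnings.</div>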
<div class=""><br class="">
</div>
<div class="">Also, you set the max_dna_len parameter for 2.1Mbp, which is larger than your N50. Setting this too large prevents MAKER from speeding up it’s analysis by splitting contigs/scaffolds across multiple processors. There’s usually no reason to change
this from the default setting.</div>
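<div class="">To make the max_dna_len point concrete, here is a rough sketch that counts how many pieces a scaffold would be cut into for a given max_dna_len, using the scaffold figures from the question. It is a simplification, not MAKER's actual chunking code: it ignores any overlap between chunks. The comparison value of 100,000 bp is, as far as I know, the default in maker_opts.ctl, but check your own control file.</div>

```python
import math

# Scaffold lengths (bp) taken from the question: longest, N50 and median.
SCAFFOLDS = {"longest": 2_056_324, "N50": 272_065, "median": 34_136}


def n_chunks(length, max_dna_len):
    """Approximate number of chunks for one contig (simplification: no overlap)."""
    return math.ceil(length / max_dna_len)


for max_dna_len in (2_100_000, 100_000):
    counts = {name: n_chunks(bp, max_dna_len) for name, bp in SCAFFOLDS.items()}
    print(max_dna_len, counts)
```

<div class="">With max_dna_len=2100000 every scaffold, including the longest, stays a single piece, while at 100,000 bp the longest scaffold becomes 21 pieces and an N50-sized one becomes 3, giving MAKER more units of work to spread across processes.</div>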
<div class=""><br class="">
</div>
<div class="">With a good N50 like you have, you’ll probably get good results. </div>
<div class=""><br class="">
</div>
<div class="">~Daniel</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
<div class="">Daniel Ence<br class="">
Graduate Student<br class="">
Eccles Institute of Human Genetics<br class="">
University of Utah<br class="">
15 North 2030 East, Room 2100<br class="">
Salt Lake City, UT 84112-5330 </div>
<br class="">
<div>
<blockquote type="cite" class="">
<div class="">On Feb 16, 2016, at 3:10 AM, Florian <<a href="mailto:fdolze@students.uni-mainz.de" class="">fdolze@students.uni-mainz.de</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div text="#000000" bgcolor="#FFFFFF" class="">Hi all,<br class="">
<br class="">
I am trying to run MAKER on a project of mine, and since this is the first time I have used MAKER, I'd like to ask some more experienced users what I can expect with regard to resource consumption and runtime.<br class="">
<br class="">
My genome data is:<br class="">
<br class="">
<ul class="">
<li class="">180.652.019 bp genome length<br class="">
</li><li class="">5.292 Scaffolds </li><li class="">34.136 bp median scaffold length<br class="">
</li><li class="">2.056.324 bp longest <br class="">
</li><li class="">272.065 bp N50 </li></ul>
- I use a 73mb transcriptome assembly as EST Evidence<br class="">
- SwissProt as Protein Homology Evidence<br class="">
- 60kb custom repeat library for RepeatMasker<br class="">
<br class="">
<br class="">
<br class="">
For gene prediction I am using a SNAP HMM that I generated with CEGMA, plus GeneMark and Augustus trained via their web service.<br class="">
I have the est2genome and protein2genome options turned on (=1), and I am also using tRNAscan and snoscan. My other options are as follows:
<br class="">
<br class="">
<small class="">#-----MAKER Behavior Options<br class="">
max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage) &lt;--- Is this reasonable?<br class="">
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)<br class="">
<br class="">
pred_flank=200 #flank for extending evidence clusters sent to gene predictors<br class="">
pred_stats=0 #report AED and QI statistics for all predictions as well as models<br class="">
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)<br class="">
min_protein=0 #require at least this many amino acids in predicted proteins<br class="">
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no<br class="">
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no<br class="">
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no<br class="">
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)<br class="">
<br class="">
split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)<br class="">
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no<br class="">
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'<br class="">
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes<br class="">
</small><br class="">
The maker_bopts.ctl file is unchanged.<br class="">
<br class="">
(Basically, I am following this guide: <a class="moz-txt-link-freetext" href="https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md">
https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md</a>)<br class="">
<br class="">
<br class="">
At the moment I am running this with Open MPI as follows: <br class="">
<br class="">
mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_run1 -fix_nucleotides<br class="">
<br class="">
on 128 cores with 130GB of memory.<br class="">
<br class="">
<br class="">
First of all, are the options I am using reasonable?<br class="">
<br class="">
Is it possible to estimate roughly how long the run will take? 5 days? 20 days? And would it help to use additional cores, or would that not benefit much?<br class="">
<br class="">
Thanks for your insights,<br class="">
Florian<br class="">
<br class="">
<br class="">
<br class="">
<br class="">
</div>
_______________________________________________<br class="">
maker-devel mailing list<br class="">
<a href="mailto:maker-devel@box290.bluehost.com" class="">maker-devel@box290.bluehost.com</a><br class="">
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org<br class="">
</div>
</blockquote>
</div>
<br class="">
</div>
</body>
</html>