<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">

Hi Florian, I don’t think you want est2genome or protein2genome turned on for this run. The est2genome option is usually only used when you don’t have any ab initio predictors trained; protein2genome should only be used when you have good reason not to expect any introns at all (for example, in a prokaryotic genome).

<div class=""><br class="">

</div>

<div class="">Also, you set the max_dna_len parameter to 2.1 Mbp, which is larger than your N50 and even larger than your longest scaffold. Setting this too large prevents MAKER from speeding up its analysis by splitting contigs/scaffolds across multiple processors. There’s usually no reason to change this from the default setting.</div>
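<div class=""><br class="">
</div>
<div class="">As a rough sketch of what I mean, the relevant lines in maker_opts.ctl would look something like the fragment below. The value 100000 for max_dna_len is what I recall the default being in the 2.31 control files, so treat it as an assumption and check it against the file that maker -CTL generates for your version.
<pre class="">#-----Gene Prediction
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)</pre>
Everything else in your options list can stay as it is for a first run.</div>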

<div class=""><br class="">

</div>

<div class="">With an N50 as good as yours, you’ll probably get good results. </div>

<div class=""><br class="">

</div>

<div class="">~Daniel</div>

<div class=""><br class="">

</div>


<div class=""><br class="">

<div class="">Daniel Ence<br class="">

Graduate Student<br class="">

Eccles Institute of Human Genetics<br class="">

University of Utah<br class="">

15 North 2030 East, Room 2100<br class="">

Salt Lake City, UT 84112-5330 </div>

<br class="">

<div>

<blockquote type="cite" class="">

<div class="">On Feb 16, 2016, at 3:10 AM, Florian &lt;<a href="mailto:fdolze@students.uni-mainz.de" class="">fdolze@students.uni-mainz.de</a>&gt; wrote:</div>

<br class="Apple-interchange-newline">

<div class="">

<div text="#000000" bgcolor="#FFFFFF" class="">Hi all,<br class="">

<br class="">

I am trying to run MAKER on a project of mine, and since this is my first time using MAKER, I'd like to ask some more experienced users what I can expect in terms of resource consumption and runtime.<br class="">

<br class="">

My genome data is:<br class="">

<br class="">

<ul class="">

<li class="">180.652.019 bp genome length<br class="">

</li><li class="">5.292 Scaffolds </li><li class="">34.136 bp median scaffold length<br class="">

</li><li class="">2.056.324 bp longest <br class="">

</li><li class="">272.065 bp N50 </li></ul>

- I use a 73mb transcriptome assembly as EST Evidence<br class="">

- SwissProt as Protein Homology Evidence<br class="">

- 60kb custom repeat library for RepeatMasker<br class="">

<br class="">

<br class="">

<br class="">

For gene prediction I am running with a SNAP hmm I generated using CEGMA, GeneMark, and Augustus trained by their webservice.<br class="">

I have the est2genome and protein2genome options turned on (=1) and use tRNAscan and snoscan. The other options are as follows:

<br class="">

<br class="">

<small class="">#-----MAKER Behavior Options<br class="">

max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage)   &lt;--- Is this reasonable?<br class="">

min_contig=1 #skip genome contigs below this length (under 10kb are often useless)<br class="">

<br class="">

pred_flank=200 #flank for extending evidence clusters sent to gene predictors<br class="">

pred_stats=0 #report AED and QI statistics for all predictions as well as models<br class="">

AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)<br class="">

min_protein=0 #require at least this many amino acids in predicted proteins<br class="">

alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no<br class="">

always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no<br class="">

map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no<br class="">

keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)<br class="">

<br class="">

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)<br class="">

single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no<br class="">

single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'<br class="">

correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes<br class="">

</small><br class="">

The maker_bopts.ctl file is unchanged.<br class="">

<br class="">

(Basically I follow this guide <a class="moz-txt-link-freetext" href="https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md">

https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md</a>)<br class="">

<br class="">

<br class="">

At the moment I am running this with openMPI as: <br class="">

<br class="">

mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_run1 -fix_nucleotides<br class="">

<br class="">

on 128 cores with 130GB of memory.<br class="">

<br class="">

<br class="">

First of all, are the options I am using viable?<br class="">

<br class="">

Is it possible to guesstimate the runtime I can expect? 5 days? 20 days? And is it reasonable to use additional cores or will this not benefit much?<br class="">

<br class="">

Thanks for your insights,<br class="">

Florian<br class="">

<br class="">

<br class="">

<br class="">

<br class="">

</div>

_______________________________________________<br class="">

maker-devel mailing list<br class="">

<a href="mailto:maker-devel@box290.bluehost.com" class="">maker-devel@box290.bluehost.com</a><br class="">

http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org<br class="">

</div>

</blockquote>

</div>

<br class="">

</div>

</body>

</html>