<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi all,<br>
<br>
I am trying to run MAKER on a project of mine, and since this is the
first time I have used MAKER, I'd like to ask more experienced users
what I can expect in terms of resource consumption and runtime.<br>
<br>
My genome data is:<br>
<br>
<ul>
<li>180,652,019 bp genome length<br>
</li>
<li>5,292 scaffolds</li>
<li>34,136 bp median scaffold length<br>
</li>
<li>2,056,324 bp longest scaffold<br>
</li>
<li>272,065 bp N50</li>
</ul>
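<br>
As a side note, summary figures of this kind can be recomputed from a plain list of scaffold lengths. The sketch below is only illustrative: the function name n50 and the toy lengths are assumptions made for the example, not values from this assembly. It uses the common definition of N50, i.e. the scaffold length at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly.<br>
<pre>
```python
from statistics import median


def n50(lengths):
    """Return the N50: the length L such that scaffolds of length L or more
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no scaffold lengths given")


# Hypothetical toy lengths in bp, not taken from the real assembly.
toy_lengths = [10, 20, 30, 40, 100]
print("total:", sum(toy_lengths))
print("median:", median(toy_lengths))
print("longest:", max(toy_lengths))
print("N50:", n50(toy_lengths))
```
</pre>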
- I use a 73 MB transcriptome assembly as EST evidence<br>
- SwissProt as protein homology evidence<br>
- a 60 kb custom repeat library for RepeatMasker<br>
<br>
<br>
<br>
For gene prediction I am running with a SNAP HMM I generated using
CEGMA, GeneMark, and Augustus trained through their web service.<br>
I have the est2genome and protein2genome options turned on (=1) and use
tRNAscan and snoscan. The other options are as follows: <br>
<br>
<small>#-----MAKER Behavior Options<br>
max_dna_len=2100000 #length for dividing up contigs into chunks
(increases/decreases memory usage) &lt;--- Is this reasonable?<br>
min_contig=1 #skip genome contigs below this length (under 10kb
are often useless)<br>
<br>
pred_flank=200 #flank for extending evidence clusters sent to gene
predictors<br>
pred_stats=0 #report AED and QI statistics for all predictions as
well as models<br>
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound
by 0 and 1)<br>
min_protein=0 #require at least this many amino acids in predicted
proteins<br>
alt_splice=0 #Take extra steps to try and find alternative
splicing, 1 = yes, 0 = no<br>
always_complete=0 #extra steps to force start and stop codons, 1 =
yes, 0 = no<br>
map_forward=0 #map names and attributes forward from old GFF3
genes, 1 = yes, 0 = no<br>
keep_preds=1 #Concordance threshold to add unsupported gene
prediction (bound by 0 and 1)<br>
<br>
split_hit=10000 #length for the splitting of hits (expected max
intron size for evidence alignments)<br>
single_exon=1 #consider single exon EST evidence when generating
annotations, 1 = yes, 0 = no<br>
single_length=250 #min length required for single exon ESTs if
'single_exon is enabled'<br>
correct_est_fusion=0 #limits use of ESTs in annotation to avoid
fusion genes<br>
</small><br>
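On the max_dna_len question above, a rough way to see how many pieces a scaffold would be cut into is a ceiling division. This is only a sketch under the assumption of simple fixed-size splitting without overlap; MAKER's actual chunking may differ, and chunk_count is a hypothetical helper name. Under that assumption, with the longest scaffold at 2,056,324 bp and max_dna_len=2100000, no scaffold would be split at all.<br>
<pre>
```python
import math


def chunk_count(scaffold_len, max_dna_len):
    """Number of pieces if a scaffold is cut into consecutive pieces of at most
    max_dna_len bp (assumed simple splitting; MAKER's real logic may differ)."""
    return max(1, math.ceil(scaffold_len / max_dna_len))


longest = 2056324  # longest scaffold from the post, in bp

print(chunk_count(longest, 2100000))  # setting used in the post
print(chunk_count(longest, 100000))   # hypothetical smaller setting, for comparison
```
</pre>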
The maker_bopts.ctl file is unchanged.<br>
<br>
(Basically I follow this guide
<a class="moz-txt-link-freetext" href="https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md">https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md</a>)<br>
<br>
<br>
At the moment I am running this with Open MPI as follows: <br>
<br>
mpiexec -mca btl ^openib -n 128
/project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base
maker_run1 -fix_nucleotides<br>
<br>
on 128 cores with 130GB of memory.<br>
<br>
<br>
First of all, are the options I use reasonable?<br>
<br>
Is it possible to guesstimate the runtime I can expect? 5 days? 20
days? And would using additional cores help, or would the benefit be
small?<br>
<br>
Thanks for your insights,<br>
Florian<br>
</body>
</html>