<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hi all,<br>

    <br>

    I am trying to run MAKER on a project of mine, and since this is the
    first time I am using MAKER, I would like to ask more experienced users
    what I can expect in terms of resource consumption and runtime.<br>

    <br>

    My genome data is:<br>

    <br>

    <ul>

      <li>180,652,019 bp genome length<br>
      </li>
      <li>5,292 scaffolds</li>
      <li>34,136 bp mean scaffold length<br>
      </li>
      <li>2,056,324 bp longest scaffold<br>
      </li>
      <li>272,065 bp N50</li>

    </ul>

    - I use a 73 MB transcriptome assembly as EST evidence<br>
    - SwissProt as protein homology evidence<br>
    - a 60 kb custom repeat library for RepeatMasker<br>

    <br>

    <br>

    <br>

    For gene prediction I am running SNAP with an HMM I generated using
    CEGMA, as well as GeneMark and Augustus, the latter trained via its
    web service.<br>
    I have the est2genome and protein2genome options turned on (=1) and use
    tRNAscan and snoscan. The other options are as follows:<br>

    <br>

    <small>#-----MAKER Behavior Options<br>

      max_dna_len=2100000 #length for dividing up contigs into chunks
      (increases/decreases memory usage)   &lt;--- Is this reasonable?<br>

      min_contig=1 #skip genome contigs below this length (under 10kb

      are often useless)<br>

      <br>

      pred_flank=200 #flank for extending evidence clusters sent to gene

      predictors<br>

      pred_stats=0 #report AED and QI statistics for all predictions as

      well as models<br>

      AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound

      by 0 and 1)<br>

      min_protein=0 #require at least this many amino acids in predicted

      proteins<br>

      alt_splice=0 #Take extra steps to try and find alternative

      splicing, 1 = yes, 0 = no<br>

      always_complete=0 #extra steps to force start and stop codons, 1 =

      yes, 0 = no<br>

      map_forward=0 #map names and attributes forward from old GFF3

      genes, 1 = yes, 0 = no<br>

      keep_preds=1 #Concordance threshold to add unsupported gene

      prediction (bound by 0 and 1)<br>

      <br>

      split_hit=10000 #length for the splitting of hits (expected max

      intron size for evidence alignments)<br>

      single_exon=1 #consider single exon EST evidence when generating

      annotations, 1 = yes, 0 = no<br>

      single_length=250 #min length required for single exon ESTs if

      'single_exon is enabled'<br>

      correct_est_fusion=0 #limits use of ESTs in annotation to avoid

      fusion genes<br>

    </small><br>
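    <br>
    As a rough check on max_dna_len, here is a small Python sketch. It
    assumes that each scaffold is simply cut into consecutive pieces of at
    most max_dna_len bp; MAKER's real chunking may differ in detail, and
    chunks_for is only a helper name made up for this illustration. Under
    that assumption a value of 2100000 leaves even my longest scaffold in a
    single chunk, whereas a smaller value such as 100000 (an arbitrary
    example) would cut it into 21 pieces.<br>
<pre>
```python
import math

# Values taken from the genome statistics and options in this post.
MAX_DNA_LEN = 2_100_000
LONGEST_SCAFFOLD = 2_056_324
GENOME_LENGTH = 180_652_019
N_SCAFFOLDS = 5_292


def chunks_for(length, max_dna_len=MAX_DNA_LEN):
    """Number of pieces a sequence of `length` bp is cut into.

    Assumption: plain ceiling division; overlaps or other details of
    MAKER's actual chunking are not modelled here.
    """
    return max(1, math.ceil(length / max_dna_len))


print(chunks_for(LONGEST_SCAFFOLD))           # prints 1
print(chunks_for(LONGEST_SCAFFOLD, 100_000))  # prints 21
# Mean scaffold length implied by the totals above, rounded to whole bp.
print(round(GENOME_LENGTH / N_SCAFFOLDS))     # prints 34137
```
</pre>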

    The maker_bopts.ctl file is unchanged.<br>

    <br>

    (Basically, I am following this guide:

<a class="moz-txt-link-freetext" href="https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md">https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md</a>)<br>

    <br>

    <br>

    At the moment I am running this with Open MPI as follows: <br>

    <br>

    mpiexec -mca btl ^openib -n 128

    /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base

    maker_run1 -fix_nucleotides<br>

    <br>

    on 128 cores with 130 GB of memory.<br>
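    <br>
    To put the memory figure in perspective, here is a back-of-the-envelope
    calculation in Python. It assumes that the 130 GB is the total for the
    whole job and is shared evenly by all 128 MPI processes; if the figure
    is per node or per core, the numbers change accordingly. The function
    name is made up for this illustration.<br>
<pre>
```python
TOTAL_MEMORY_GB = 130  # assumed to be the total for the whole job
N_PROCESSES = 128      # value passed to mpiexec via -n


def memory_per_process_gb(total_gb, n_processes):
    """Average memory per MPI process, assuming an even split."""
    return total_gb / n_processes


per_process = memory_per_process_gb(TOTAL_MEMORY_GB, N_PROCESSES)
print(f"{per_process:.2f} GB per process")  # prints 1.02 GB per process
```
</pre>
    So on average each process would have roughly 1 GB available.<br>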

    <br>

    <br>

    First of all, are the options I am using sensible?<br>

    <br>

    Is it possible to estimate the runtime I can expect? 5 days? 20
    days? And would additional cores help, or would the benefit be
    small?<br>

    <br>

    Thanks for your insights,<br>

    Florian<br>


  </body>

</html>