<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">

Maybe. Those two options can produce a lot of partial models. Setting always_complete=1 will also help somewhat.

<div class=""><br class="">

</div>

<div class="">Models without M at the start are generally partial models. There is often something about the contig that keeps it from being a whole model (single basepair error breaks ORF or splice site, or a string of NNN’s overlap part of an exon). You can

 also try identifying InterPro domain and dropping any model without a defined domain (i.e. if it’s going to be partial, at least make sure it’s useful in its partial form).<br class="">

<br class="">

—Carson</div>

<div class=""><br class="">

</div>

<div class=""><br class="">

<br class="">

<div>

<blockquote type="cite" class="">

<div class="">On Mar 29, 2017, at 4:23 AM, Dario Copetti <<a href="mailto:dcopetti@email.arizona.edu" class="">dcopetti@email.arizona.edu</a>> wrote:</div>

<br class="Apple-interchange-newline">

<div class="">

<div bgcolor="#FFFFFF" text="#000000" class="">

<p class=""><font face="Arial" class="">Looking at the config file again I notice this:<br class="">

<font face="Courier New, Courier, monospace" size="-2" class="">est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no<br class="">

protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no</font><br class="">

</font>I usually turn them on only to get EST-based models for training Augustus and SNAP. Do you think that leaving these parameters on during the final annotation produced the non-M models?<br class="">

If so, do you think that re-running MAKER with them turned off and using the MAKER-derived GFF3 will clean out these models?</p>

<p class="">Can you elaborate a bit more on the usage of these two parameters?<br class="">

Thanks,</p>

<p class="">Dario<br class="">

</p>

<br class="">

<div class="moz-cite-prefix">On 3/29/2017 12:07 PM, Dario Copetti wrote:<br class="">

</div>

<blockquote cite="mid:717138b6-fc7f-8f23-e550-c3019c4f96ec@email.arizona.edu" type="cite" class="">

<p class=""><font face="Arial" class="">Hi Carson,</font></p>

<p class=""><font face="Arial" class="">We are ready to submit several different sets of annotations but we are now stuck with the issue of having models which protein sequence does not start with Met, and NCBI is picky about that.

<br class="">

Below I paste an example of a genome we are working on: as you see, most (95%) of the models start with M, but a significant fraction (almost 1500 models!) does not.</font></p>

<p class=""><font face="Arial" class="">We used MAKER 2.31.8, specifying the option of having models that only start with M. We realize that this issue may not be easy to fix - and also that there are indeed isoforms that do not start with M - but how would

 you fix this? Within or outside MAKER I mean, any help will be appreciated.</font></p>

<p class=""><font face="Arial" class="">Some time ago, Josh and Sharon (cc'd) fixed the models by having the CDS start at the first M that was in frame with the exon, and wrote a script for that.<br class="">

Has this issue perhaps been fixed in a newer version of MAKER? How else would you fix it, or how would you handle it with the NCBI genomes staff?<br class="">

Thanks,</font></p>

<p class=""><font face="Arial" class="">Dario</font></p>

<p class=""><font face="Arial" class=""><br class="">

</font></p>

<p class=""><font face="Arial" class=""><font face="Courier New, Courier, monospace" size="-2" class="">grep -A1 ">" maker_proteins_161026.fasta | grep -v ">" | grep -v "\-\-" | cut -c1 | sort | uniq -c<br class="">

    106 A<br class="">

     33 C<br class="">

     69 D<br class="">

     88 E<br class="">

     53 F<br class="">

     94 G<br class="">

     34 H<br class="">

     86 I<br class="">

     77 K<br class="">

    144 L<br class="">

  28245 M<br class="">

     58 N<br class="">

     72 P<br class="">

     44 Q<br class="">

     95 R<br class="">

    142 S<br class="">

     80 T<br class="">

    114 V<br class="">

     29 W<br class="">

      6 X<br class="">

     53 Y</font><br class="">

<br class="">

</font></p>
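<p class=""><font face="Arial" class="">To see which models are affected, and not only how many, the same check can be extended to print the IDs. A minimal sketch, assuming the model ID is the first whitespace-delimited token of each FASTA header; the function name list_non_m_ids is hypothetical:</font></p>
<pre class=""># Hypothetical helper: print the ID of every protein whose first residue is not M.
list_non_m_ids() {
  awk '
    /^>/ { id = substr($1, 2); pending = 1; next }   # header: remember the ID
    pending == 1 {
      if (NF == 0) next                              # skip blank lines after the header
      if (substr($0, 1, 1) != "M") print id          # first residue is not M
      pending = 0                                    # ignore the rest of this record
    }
  ' "$1"
}</pre>
<p class=""><font face="Arial" class="">For example: list_non_m_ids maker_proteins_161026.fasta > non_M_ids.txt. The resulting list could then be used to filter or inspect those models in the GFF3.</font></p>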

<p class=""><br class="">

</p>

<br class="">

<pre class="moz-signature" cols="72">-- 

Dario Copetti, PhD

Research Associate | Arizona Genomics Institute

University of Arizona | BIO5

1657 E. Helen St.

Tucson, AZ  85721, USA

<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="http://www.genome.arizona.edu/">www.genome.arizona.edu</a></pre>

</blockquote>

<br class="">

<pre class="moz-signature" cols="72">-- 

Dario Copetti, PhD

Research Associate | Arizona Genomics Institute

University of Arizona | BIO5

1657 E. Helen St.

Tucson, AZ  85721, USA

<a class="moz-txt-link-abbreviated" href="http://www.genome.arizona.edu/">www.genome.arizona.edu</a></pre>

</div>

</div>

</blockquote>

</div>

<br class="">

</div>

</body>

</html>