<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div><blockquote style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 40px; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: none; border-width: initial; border-color: initial; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; "><div>It is a mixture of cases, and I can only look at some examples to say that. There are cases where all 3 used ab initio predictors provide models, there are blastx hits, or both blastx and protein2genome, but no EST evidence, thus no model is retained. I guess my default parameters could be responsible for these cases at least.</div></blockquote><div><br></div><div>There are only two ways you should be able to get BLASTX overlap and still not get a model for the region: 1. The protein alignment is in a different reading frame than your models for every single base pair of the alignment (in which case it's not true overlap). 2. The BLASTX HSPs are stacked on each other again and again in weird rearranged overlaps, producing a very deep alignment, which would mean this is a repetitive region and not really a significant alignment. Otherwise this should not happen unless you have AED_threshold set to a value that makes MAKER ignore genes lacking a minimum amount of support (by default this option is always off). The first two possibilities can be tested by just looking at the alignments manually in Apollo. Also take a look at the AED and eAED values for your missing genes. 
Anything with a value below 1 should always be kept by MAKER by default, because it has at least some evidence support.</div><div><br></div><blockquote style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 40px; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: none; border-width: initial; border-color: initial; padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; "><div>Which fasta do you refer to? The proteins file I use as evidence contains all proteins I can actually use.</div></blockquote><div><br></div><div>If they are already in your current run, ignore this. </div></div><div><br></div><div>Barry provided detailed instructions on how to configure MAKER for your particular case, so just follow his excellent instructions.</div><div><br></div><div>Thanks,</div><div>Carson</div><div><br></div><div><br></div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> Barry Moore &lt;<a href="mailto:barry.moore@genetics.utah.edu">barry.moore@genetics.utah.edu</a>&gt;<br><span style="font-weight:bold">Date: </span> Friday, 27 April, 2012 7:57 AM<br><span style="font-weight:bold">To: </span> Anastasia Gioti &lt;<a href="mailto:anastasia.gioti@scilifelab.se">anastasia.gioti@scilifelab.se</a>&gt;<br><span style="font-weight:bold">Cc: </span> Carson Holt &lt;<a href="mailto:carsonhh@gmail.com">carsonhh@gmail.com</a>&gt;, &lt;<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>&gt;<br><span style="font-weight:bold">Subject: </span> Re: [maker-devel] Use pass-through system to add missing genes<br></div><div><br></div><div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; 
-webkit-line-break: after-white-space; ">Hi Anastasia,<div><br></div><div><div><div>On Apr 27, 2012, at 2:43 AM, Anastasia Gioti wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Carson, <div><div><div>Thanks for your help!</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>The way you proceed depends on why the genes are not there to begin with. Are they not there because of a lack of evidence? </div></div></blockquote><br>It is a mixture of cases, and I can only look at some examples to say that. There are cases where all 3 used ab initio predictors provide models, there are blastx hits, or both blastx and protein2genome, but no EST evidence, thus no model is retained. I guess my default parameters could be responsible for these cases at least.</div><div><br></div></div></div></blockquote><div><br></div><div>This doesn't sound right. If there are predicted models and blastx protein evidence overlapping them, you should get a model retained. 
I know that EST evidence has to support a splice site before a model is promoted, and I can't remember whether protein evidence works the same way. But certainly, if you pass back those protein2genome predictions and the original proteins as evidence, they will be retained as models.</div><div><br></div><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>If that's the case, just adding the new fasta file should do the trick. </div></div></blockquote><br>Which fasta do you refer to? The proteins file I use as evidence contains all proteins I can actually use.</div><div><br></div></div></div></blockquote><div><br></div><div>Yes, I mean using the protein fasta from the closely related species as evidence. I think you said you've already done that, right?</div><div><br></div><br><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>Or are they not there because an assembly error makes it impossible to get a logical model for the region (i.e. reading frame breaks)? </div></div></blockquote><br>This is not the case in general.</div><div><br><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>Are there ab initio models already called in those regions that could just be promoted to the annotation tier? 
You can test that one by blasting against the nonoverlaping_abinits.fasta files.</div></div></blockquote><br>I have not done this, will do!<br><br><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div><br></div><div>For any of the cases described, you can provide the existing annotation set as the input in GFF3 format, and previous models will be maintained preferentially. </div></div></blockquote><br>You mean in a new MAKER run? Is this possible with the old MAKER as well, not only MAKER2?</div><div><br></div></div></div></blockquote><div><br></div><div>Yes, the original MAKER will do this.</div><div><br></div><br><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>If you know which ab initio predictions you want to add (i.e. the ab initio promoting scenario I described), you can provide those predictions via the pred_gff option and then set keep_preds=1, and they will be maintained even without evidence. Attached is a script that would make selecting those easier. It takes the MAKER-generated GFF3 and a list of predictions to keep (one name per line). These might be the results of a BLAST analysis, for example. 
It will then return the GFF3 entries for just those models selected.</div></div></blockquote><br>The thing is, for the few cases I have looked at, I cannot really decide which model is the best, and the 3 models from the ab initio predictors do not agree on the exact intron-exon junctions or the start and stop codons.</div></div></div></blockquote><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div><br></div><div>If the situation is more complex, just provide more detail, and I am sure we can help you come up with a plan.</div><div><br></div></div></blockquote>What I was thinking of doing was to provide a GFF file of alignments (e.g. by exonerate) to the proteins of the closely related species that I am missing, and somehow keep the previous annotations and get the extra ones from this GFF file. But I am not sure exactly how MAKER should be run to do this. If I want to keep the previous annotations I need the GFF file of the last MAKER run as input, but then how do I distinguish it from the exonerate GFF file? And which mode of prediction should be on, and with which parameters? You mention <span class="Apple-style-span" style="font-size: 14px; font-family: Calibri, sans-serif; ">keep_preds=1 for the existing annotations, but how do I also promote evidence from alignments in the same way in the same run?</span></div><div><font class="Apple-style-span" face="Calibri,sans-serif" size="4">Looks feasible though. 
</font><span class="Apple-style-span" style="font-family: Calibri, sans-serif; ">Thanks again,</span></div><div>Anastasia</div><div><br></div></div></div></blockquote><div><br></div><div>Let me just restate what you've said so that I can be sure that I am correct about what you've already done. You have run MAKER with SNAP, GeneMark and Augustus using ESTs from a closely related species (passed to altest) and protein evidence from other fungi. You are missing about 1,000 genes compared to the species that provided the EST alignments. You say there is good evidence from the alignments that these genes exist, and I assume by this you mean the EST/protein alignments that MAKER produced.</div><div><br></div><div>1) Is the closely related fungus annotated, and if so, have you included its proteins in the evidence set that you provided to MAKER? If you haven't provided these proteins as evidence to MAKER, then you should do this. You can re-run MAKER, passing your original models back through, like this:</div><div><br></div><div><div>#-----Re-annotation Using MAKER Derived GFF3</div><div>genome_gff=original_maker_annotations.gff3</div><div>est_pass=1</div><div>altest_pass=1</div><div>protein_pass=1</div><div>rm_pass=1</div><div>model_pass=1</div><div>pred_pass=1</div><div>other_pass=1</div></div><div><div><br></div><div>#-----Protein Homology Evidence (for best results provide a file for at least one)</div><div>protein=proteins_from_closely_related.fasta</div><div>## OR it sounds like you've already aligned these with exonerate?</div><div>protein_gff=proteins_from_closely_related_already_aligned.gff</div><div><br></div><div>2) If you've already included those closely related species proteins but still didn't get the 1,000 genes, then take your <span class="Apple-style-span" style="font-size: 14px; font-family: Calibri, sans-serif; ">nonoverlaping_abinits.fasta sequences and blast them directly against your closely related proteins. 
Presumably they don't hit too well, because if they did they should have been promoted to gene models by MAKER the first time, but here you can decide for yourself which thresholds to use when keeping the ab initio predictions that hit the closely related species proteins. If you filter your BLAST hits the way you want and keep the names of the ab initio predictions that pass your filter, the script Carson attached will generate an ab initio prediction GFF file with only the predictions you selected. You can then pass those predictions back to MAKER and force it to keep them, and MAKER will turn them from predictions (match/match_part) into gene models.</span></div><div><span class="Apple-style-span" style="font-size: 14px; font-family: Calibri, sans-serif; "><br></span></div><div><span class="Apple-style-span" style="font-size: 14px; font-family: Calibri, sans-serif; "><div style="font-family: Helvetica; font-size: medium; ">#-----Re-annotation Using MAKER Derived GFF3</div><div style="font-family: Helvetica; font-size: medium; ">genome_gff=original_maker_annotations.gff3</div><div style="font-family: Helvetica; font-size: medium; ">est_pass=1</div><div style="font-family: Helvetica; font-size: medium; ">altest_pass=1</div><div style="font-family: Helvetica; font-size: medium; ">protein_pass=1</div><div style="font-family: Helvetica; font-size: medium; ">rm_pass=1</div><div style="font-family: Helvetica; font-size: medium; ">model_pass=1</div><div style="font-family: Helvetica; font-size: medium; ">pred_pass=0</div><div style="font-family: Helvetica; font-size: medium; ">other_pass=1</div></span></div><div><br></div></div><div><div>#-----Gene Prediction</div><div>snaphmm=</div><div>gmhmm=</div><div>augustus_species=</div><div>fgenesh_par_file=</div><div>pred_gff=ab_init_predictions_rescued_by_blast.gff</div><div><br></div><div><div>keep_preds=1</div><div><br></div></div><div>Barry</div></div><div><br></div><blockquote type="cite"><div style="word-wrap: break-word; 
-webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-family: Calibri, sans-serif; "><div>Thanks,</div><div style="font-size: 14px; ">Carson</div><div style="font-size: 14px; "><br></div><span id="OLK_SRC_BODY_SECTION" style="font-size: 14px; "><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> Anastasia Gioti &lt;<a href="mailto:anastasia.gioti@scilifelab.se">anastasia.gioti@scilifelab.se</a>&gt;<br><span style="font-weight:bold">Date: </span> Wed, 25 Apr 2012 11:09:36 +0200<br><span style="font-weight:bold">To: </span> &lt;<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>&gt;<br><span style="font-weight:bold">Subject: </span> [maker-devel] Use pass-through system to add missing genes<br></div><div><br></div><div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi, <div>I have a set of predicted proteins from the genome of a fungus annotated by MAKER using EST data from a closely related species and 3 ab initio predictors (SNAP iteratively trained 3 times, GeneMark trained directly on the assembly, and Augustus with a model from a less closely related species), along with a set of fungal proteins. I am missing ~1000 proteins when I compare to the species I used EST data from, and there is good evidence from alignments that these genes exist. The question is how to proceed from BLAST hits to actual gene models here. The idea would be to add these genes to the existing dataset, rather than reannotate the genome. 
I believe that reannotating it without any further evidence, such as RNA-seq from the species itself, would not change much, and I'd rather stick with the current predictions, which I trust and have used in subsequent analyses. I can accept annotating the 1000 genes in a less stringent and less reliable way than MAKER; I just want to add them so that the difference in gene count gets corrected.</div><div>I was reading the MAKER2 paper and was wondering whether I can use the legacy annotation scheme to do it, by providing a GFF3 of the alignments between the two species in the regions where genes were missed. But as I said, I would not like to reannotate the whole genome, and running MAKER2 might cause slight changes that I'd like to avoid. Is this possible? First, is it possible to provide a GFF3 file of specific locations and not the entire genome alignment? (I guess so.) Second, how can I tag the existing annotations as 'not to be changed' or, alternatively, tag only the new models? How should I run MAKER2, with which predictors on and which off?</div><div>Thanks, </div><div>Anastasia</div><div><br></div><div><div apple-content-edited="true">Anastasia Gioti<br>Post-doctoral Researcher<br><br><a href="mailto:anastasia.gioti@scilifelab.se">anastasia.gioti@scilifelab.se</a><br><a href="mailto:anastasia.gioti@ebc.uu.se">anastasia.gioti@ebc.uu.se</a><br><br><a href="http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/">http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/</a><br><br><br></div><br></div></div></div>_______________________________________________
<br>maker-devel mailing list<br>
<a href="mailto:maker-devel@box290.bluehost.com">maker-devel@box290.bluehost.com</a><br><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a></span></div><span>&lt;gff3_select&gt;</span></blockquote></div><br><div>
Anastasia Gioti<br>Post-doctoral Researcher<br><br><a href="mailto:anastasia.gioti@scilifelab.se">anastasia.gioti@scilifelab.se</a><br><a href="mailto:anastasia.gioti@ebc.uu.se">anastasia.gioti@ebc.uu.se</a><br><br><a href="http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/">http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/</a><br><br><br></div><br></div></div></blockquote></div><br><div apple-content-edited="true"><div><div style="font-family: Arial; font-size: 12px; ">Barry Moore</div><div style="font-family: Arial; font-size: 12px; ">Research Scientist</div><div style="font-family: Arial; font-size: 12px; ">Dept. of Human Genetics</div><div style="font-family: Arial; font-size: 12px; ">University of Utah</div><div style="font-family: Arial; font-size: 12px; ">Salt Lake City, UT 84112</div><div style="font-family: Arial; font-size: 12px; ">--------------------------------------------</div><div style="font-family: Arial; font-size: 12px; ">(801) 585-3543</div><div style="font-family: Arial; font-size: 12px; "><br class="khtml-block-placeholder"></div></div><div><br></div><br class="Apple-interchange-newline"></div><br></div></div></div></span></body></html>