<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;"><div>Could you send me the file without using 'head' to trim it? It’s being cut off before it reaches the part I’m interested in.</div><div><br></div><div>—Carson</div><div><br></div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> dhivya arasappan <<a href="mailto:darasappan@gmail.com">darasappan@gmail.com</a>><br><span style="font-weight:bold">Date: </span> Thursday, February 6, 2014 at 10:01 AM<br><span style="font-weight:bold">To: </span> Carson Holt <<a href="mailto:carsonhh@gmail.com">carsonhh@gmail.com</a>><br><span style="font-weight:bold">Cc: </span> Daniel Ence <<a href="mailto:dence@genetics.utah.edu">dence@genetics.utah.edu</a>>, "<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>" <<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> Re: [maker-devel] maker annotation with cufflinks output<br></div><div><br></div><div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Oh yes, I did. I took just the non-sequence entries in the GFF file and used that as my input. I will rerun SNAP with the GFF file containing the sequences as well. 
<div><br></div><div>I'm attaching a snippet of the gff file that I used as input to maker2zff.</div><div><br></div><div>Thanks for your help</div><div>Dhivya</div><div><br></div><div></div></div></div><div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><br><div><br></div><div><br><div><div>On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;"><div>Your genome.dna file has no sequence? Did you by any chance strip the fasta sequence from the GFF3 you are using as input to maker2zff? There should be fasta sequence at the end of that file. Also can I see the GFF3 file you are using as input to maker2zff.</div><div><br></div><div>Thanks,</div><div>Carson</div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> dhivya arasappan <<a href="mailto:darasappan@gmail.com">darasappan@gmail.com</a>><br><span style="font-weight:bold">Date: </span> Thursday, February 6, 2014 at 7:47 AM<br><span style="font-weight:bold">To: </span> Carson Holt <<a href="mailto:carsonhh@gmail.com">carsonhh@gmail.com</a>><br><span style="font-weight:bold">Cc: </span> Daniel Ence <<a href="mailto:dence@genetics.utah.edu">dence@genetics.utah.edu</a>>, "<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>" <<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> Re: [maker-devel] maker annotation with cufflinks 
output<br></div><div><br></div><div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hello,<div><br></div><div>It does appear that my genome.ann file from the maker2zff script has data in it. However, the SNAP steps after that have created empty files. The following are all empty:</div><div><br></div><div>alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna</div><div><div>alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann</div><div><br></div><div>When I try to get gene stats or validate genome.ann, I get errors like this for all of them:</div><div><br></div><div>fathom genome.ann genome.dna -gene-stats |more</div><div><div>MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds</div><div>MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds exon-1:out_of_bounds</div><div>MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds</div><div>MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds exon-21:out_of_bounds</div></div><div><br></div><div>I'm not sure why the annotations I'm seeing in genome.ann are all showing up as errors. I realize this may be an issue with SNAP, but are you familiar with anything like this? 
My genome.ann file is attached for reference.</div><div><br></div><div>Thanks</div><div>Dhivya</div><div></div></div></div></div><div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div></div></div></div></div><div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div></div><div><br></div><div><div>On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;"><div>Do you have any features of type snap in your results from step 3? We’ve had a couple of recent posts where, after training, snap was giving no results, and as a result maker couldn’t give any genes. One cause of something like that may be your step 2. Make sure the ZFF you used to train with wasn’t empty. The maker2zff script uses filters to put only the best genes in the ZFF file, and if all your genes fail the filtering then you are training with an empty ZFF.</div><div><br></div><div>Also, you should use proteins from a related species as your protein file. I see that your protein matches are varying wildly from run to run? So is your contig count? 
Was the subset of contigs you have results for long enough to contain genes?</div><div><br></div><div>—Carson</div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> dhivya arasappan <<a href="mailto:darasappan@gmail.com">darasappan@gmail.com</a>><br><span style="font-weight:bold">Date: </span> Monday, February 3, 2014 at 9:31 AM<br><span style="font-weight:bold">To: </span> Daniel Ence <<a href="mailto:dence@genetics.utah.edu">dence@genetics.utah.edu</a>><br><span style="font-weight:bold">Cc: </span> "<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>" <<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> Re: [maker-devel] maker annotation with cufflinks output<br></div><div><br></div><div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Daniel,<div><br></div><div>I was able to check on some of those questions.</div><div><br></div><div><b>1. From trinity assembly</b>: I started with 102000 contigs. I used trinotate to annotate proteins in this.</div><div><br></div><div>I ran maker on this data with est2genome set to 1. The output looks like this (most important parts on top):</div><div><br></div><div><div><b> 6653 gene</b></div><div><b> 46675 exon</b></div><div><div><div> 280534 protein_match</div><div>59934 CDS</div></div></div><div> 969 contig</div><div> 105388 expressed_sequence_match</div><div> 12584 five_prime_UTR</div><div> 78565 match</div><div>1401369 match_part</div><div> 10180 mRNA</div><div> 11545 three_prime_UTR</div></div><div><br></div><div><b>2. 
From cufflinks assembly: </b>I started with 133380 entries (out of which there are 29,000 transcripts). I used the protein sequences from trinity assembly.</div><div><br></div><div>I ran maker on this data with est2genome set to 1. The output looks like this:</div><div><div><b> 29 gene</b></div><div><b><div> 75 exon</div></b><div> 573659 protein_match</div><div>67 CDS</div></div><div> 1099 contig</div><div> 269298 expressed_sequence_match</div><div> 23 five_prime_UTR</div><div> 173844 match</div><div>2221846 match_part</div><div> 29 mRNA</div><div> 23 three_prime_UTR</div></div><div><br></div><div>The number of genes annotated using the trinity assembly is lower than expected, so I went the cufflinks route. I don't understand why even fewer genes are being found when using the cufflinks transcripts.</div><div><br></div><div><b>3. Training SNAP: </b>I used the results of maker from 1 to train SNAP. I then used that training set to rerun maker:</div><div>snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm</div><div>est2genome=0</div><div><br></div><div>And again I got results with no entries for gene, exon, CDS etc.</div><div><div>957 contig</div><div> 46555 expressed_sequence_match</div><div> 43651 match</div><div> 553633 match_part</div><div> 113738 protein_match</div></div><div><br></div><div>As I mentioned in another email, cegma results indicated that the genome was more than 90% complete. Any suggestions would be helpful.</div><div><br></div><div>Thank you</div><div>Dhivya</div><div><br></div><div><br></div><div><br></div><div><br></div><div><div>On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>Hi Dhivya, <br><br>I think there are a few numbers that could be helpful to understand what's happening here. <br><br>How many transcripts did Trinity assemble the RNA-seq data into? 
Also, you had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it the cufflinks data. How many transcripts did MAKER identify with the cufflinks data? Did you still get more than the 10,000 transcripts that you found with just the Trinity data?<br><br>A key part of MAKER's approach to genome annotation that might be affecting its performance is that it only annotates a gene where there is both evidence (like your RNA-seq data) and an ab-initio prediction. If a prediction is unsupported by the evidence, then MAKER won't annotate a gene, and if evidence aligns where there's no prediction, MAKER won't annotate a gene either. What ab-initio predictors are you using, and have they been trained for this specific genome?<br><br>You can force MAKER to automatically promote evidence alignments to a gene model by setting the est2genome option to 1, but that will usually give you many false positives. <br><br>Try rerunning it with either the Trinity data or the Cufflinks data and with est2genome set to 1, and let us know how that affects the MAKER results. <br><br>Thanks,<br>Daniel<br><br>Daniel Ence<br>Graduate Student<br>Eccles Institute of Human Genetics<br>University of Utah<br>15 North 2030 East, Room 2100<br>Salt Lake City, UT 84112-5330<br>________________________________________<br>From: maker-devel [<a href="mailto:maker-devel-bounces@yandell-lab.org">maker-devel-bounces@yandell-lab.org</a>] on behalf of dhivya arasappan [<a href="mailto:darasappan@gmail.com">darasappan@gmail.com</a>]<br>Sent: Thursday, January 30, 2014 11:18 AM<br>To: <a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a><br>Subject: [maker-devel] maker annotation with cufflinks output<br><br>Hello,<br><br>I am trying to annotate a 200 mb plant genome for which I have a very<br>good assembly.<br><br>I tried to denovo assemble RNA-seq data using trinity and ran maker<br>using my genome assembly and the trinity results. 
I did not get as<br>many transcripts as expected, around 10,000 transcripts.<br><br>So, I decided to try a different approach. I did a genome assisted<br>assembly of the RNA-seq data using tophat/cufflinks. This pipeline<br>generated 21,000 genes, 29,000 transcripts. I then ran maker using my<br>genome assembly and the cufflinks result. I get far fewer<br>transcripts as a result.<br><br>If cufflinks found 29000 transcripts by mapping to the genome, I'm<br>confused as to why maker is not finding the same.<br><br>Any suggestions would be appreciated.<br><br>Thanks<br>Dhivya<br><br><br>_______________________________________________<br>maker-devel mailing list<br><a href="mailto:maker-devel@box290.bluehost.com">maker-devel@box290.bluehost.com</a><br><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a><br></div></blockquote></div><br></div></div></span></div></blockquote></div><br></div></div></div></span></div></blockquote></div><br></div></div></div></div></span></body></html>