From carsonhh at gmail.com Tue Sep 4 17:51:07 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 4 Sep 2018 16:51:07 -0600
Subject: [maker-devel] Re-annotation of a previous annotation with "est2genome=1"
In-Reply-To:
References:
Message-ID:

> Is it possible to correct this mistake without starting from the begin? I start a new run with "est2genome=0" and using the previous gff output in several options, but it seems like it will take forever to finish.

If you run in the same directory as a previous run, it will reuse archived raw reports from blast, etc.

> Also, would it be necessary some filtering/edition in the "all.gff file" when put it in the options like "est_gff" and "rm_gff"?

You can try that, but you do lose some extra info that is in the raw alignment report and not in the GFF3. So it's usually better to let MAKER do the alignment from fasta and only use GFF3 passthrough for datasets that you no longer have access to.

--Carson

From carson.holt at genetics.utah.edu Tue Sep 11 11:18:00 2018
From: carson.holt at genetics.utah.edu (Carson Hinton Holt)
Date: Tue, 11 Sep 2018 16:18:00 +0000
Subject: [maker-devel] Plant and Animal Genome Conference 2019
Message-ID:

Hello MAKER e-mail list,

I just wanted to let you know I am organizing the "Next Generation Genome Annotation and Analysis" workshop at PAG in San Diego (Jan 12-16). If you are interested in presenting an annotation-related tool or annotation project at this workshop, contact me directly with your presentation proposal. Projects do not need to be MAKER related; rather, we like presenters to share their experience with genome annotation. This provides practical examples of annotation that can help other researchers who may be preparing for their own annotation projects and are looking for advice as well as tools.

Thanks,
Carson Holt

From anthony.bretaudeau at inria.fr Tue Sep 25 07:10:13 2018
From: anthony.bretaudeau at inria.fr (Anthony Bretaudeau)
Date: Tue, 25 Sep 2018 14:10:13 +0200
Subject: [maker-devel] Segfault with OpenMPI
Message-ID:

An HTML attachment was scrubbed...
URL:

From dandence at gmail.com Fri Sep 28 12:21:35 2018
From: dandence at gmail.com (Daniel Ence)
Date: Fri, 28 Sep 2018 13:21:35 -0400
Subject: [maker-devel] NCBI now accepts GFF
Message-ID:

Hi all,

NCBI now accepts genome annotations in gff format.

https://www.ncbi.nlm.nih.gov/genbank/genomes_gff/

No more converting to NCBI table format!

~Daniel Ence
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1356 bytes
Desc: not available
URL:

From liorglic at mail.tau.ac.il Sun Sep 30 13:27:20 2018
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Sun, 30 Sep 2018 21:27:20 +0300
Subject: [maker-devel] Help debugging a MAKER result
Message-ID:

Hi MAKER users,

I am new to Maker and have just finished running my first annotations. Although the results make sense in general, I have reasons to suspect some gene models are wrong and would like your help in understanding and optimizing the results.

My research project involves the annotation of multiple tomato varieties (individuals) which are a bit different from the published reference genome. To this end, I created de-novo assemblies of these genomes and also generated an evidence set to be used as input for Maker. The evidence consists of a large set of transcripts from various tomato varieties and conditions, as well as full protein sets from 6 plant species, including the proteins derived from the annotation of the reference, called ITAG.

For an initial QA, I tried annotating the reference genome using my evidence data and Augustus as gene predictor. This should allow me to compare my result to the ITAG annotation, which I assume to be the "correct" answer, and see how well I'm doing. I should mention that the ITAG annotation was also created using Maker, followed by manual curation.

I started by comparing the protein sets from my result and the ITAG set. Specifically, I ran an all-vs-all blast and took the top hits. I discovered that only about 70% of the ITAG proteins are covered by a protein from my result with a high quality alignment (evalue < 10e-5, coverage > 90%). I further investigated by running BUSCO on both protein sets and looking at BUSCOs found in ITAG but missing in my result.

Attached is a screenshot from a genome browser where you can see such a case. The top track is the ITAG gene model; below it is my result. The third track is the protein evidence alignments (i.e. blastx and protein2genome features), and the bottom track shows masked repeats. As you can see, there seem to be two issues with my result:

1. The two genes in ITAG were fused into one. I guess this is a difficult case as the genes are really close together.
2. The last (3') CDS of the ITAG gene was predicted to be the 3' UTR in my result. This is in fact the reason I ended up with a truncated protein and a missing BUSCO. This is a bit surprising to me, since there seems to be quite a lot of protein evidence supporting this region as a CDS.

Can you help me figure out why the result is so? Could it be due to the small repeats detected in this region? Any ideas on how my result can be improved without manual curation?

Many thanks!
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker.png
Type: image/png
Size: 30422 bytes
Desc: not available
URL:
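
The protein-set comparison described in the last message (all-vs-all blast, top hit per ITAG protein, then an e-value and coverage cutoff) could be sketched roughly as below. This is only an illustration and not the poster's actual script. It assumes BLAST+ tabular output written with `-outfmt "6 qseqid sseqid evalue qstart qend qlen"`, that hits at or below the e-value cutoff are the ones kept, and that coverage is taken from the single best HSP rather than merged HSPs. The function name `covered_fraction` and its parameters are hypothetical.

```python
import csv
import io


def covered_fraction(blast_tsv, n_queries, max_evalue=1e-5, min_coverage=0.9):
    """Fraction of query proteins whose best hit passes both cutoffs.

    blast_tsv: file-like object with tab-separated columns
               qseqid, sseqid, evalue, qstart, qend, qlen (assumed layout).
    n_queries: total number of query proteins; queries without any hit
               do not appear in BLAST output, so the total is passed in.
    """
    best = {}  # qseqid -> (evalue, coverage) of its lowest-e-value HSP
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        qseqid, _sseqid, evalue, qstart, qend, qlen = row
        evalue = float(evalue)
        # Query coverage of this single HSP (simplification: HSPs are not merged).
        coverage = (int(qend) - int(qstart) + 1) / int(qlen)
        if qseqid not in best or evalue < best[qseqid][0]:
            best[qseqid] = (evalue, coverage)
    passing = sum(
        1 for e, c in best.values() if e <= max_evalue and c >= min_coverage
    )
    return passing / n_queries


# Made-up example rows, for illustration only (not real tomato data).
example = io.StringIO(
    "p1\tm1\t1e-50\t1\t100\t100\n"   # full-length, strong hit: passes
    "p1\tm2\t1e-10\t1\t50\t100\n"    # weaker second hit for p1: ignored
    "p2\tm3\t1e-20\t1\t40\t100\n"    # only 40% coverage: fails
    "p3\tm4\t0.5\t1\t100\t100\n"     # e-value too high: fails
)
print(covered_fraction(example, n_queries=4))  # p4 has no hit at all
```

Listing the query IDs that fail either cutoff, instead of only counting them, would give the set of ITAG proteins to inspect in the genome browser.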