From hcma at uci.edu Thu Feb 4 18:52:12 2016
From: hcma at uci.edu (hcma)
Date: Thu, 04 Feb 2016 16:52:12 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To: <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>
Message-ID: <99f6989955acdf6fd6b0875affbeefa9@uci.edu>

Hi,

I have a genome sequence and a Trinity assembly for a new species, and I am wondering what the best steps are when using MAKER.

1. I used the genome sequence and all assembled Trinity sequences for a first run of MAKER, in order to generate a training set for SNAP and Augustus.

In maker_opts.ctl:
genome=all-chromosome-r1.04.fasta
est=Trinity.fasta
est2genome=1

2. Train SNAP

3. Train Augustus

When I train Augustus, I only supply the genome and protein files. Should I also supply the Trinity file here?

4. What are the best parameters to use when running MAKER the second time to obtain the final annotation? I would prefer not to use any external protein data.

genome=all-chromosome-r1.04.fasta
est=Trinity.fasta
est2genome=0
SNAP
Augustus

Thanks.

Best Regards
Karen

From carsonhh at gmail.com Fri Feb 5 08:36:06 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 5 Feb 2016 07:36:06 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To: <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
Message-ID:

Hi Karen,

There are many ways to train Augustus. I prefer to identify gene models in MAKER (GFF3) and use those to train both SNAP and Augustus. Here is a previous post on the topic -> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ

In the end you need to look at the SNAP and Augustus models together with the evidence alignments in a genome browser (like desktop Apollo). When everything is trained well, the SNAP and Augustus models will resemble each other, and both will resemble the evidence alignments.

Thanks,
Carson

From hcma at uci.edu Fri Feb 5 16:42:37 2016
From: hcma at uci.edu (hcma)
Date: Fri, 05 Feb 2016 14:42:37 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
Message-ID: <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>

Hi Dr Holt,

Thanks for the email. Here is my pipeline; does it seem acceptable? Any comments are welcome and much appreciated.

1. Use MAKER to generate a training gene set:

genome=all-chromosome-r1.04.fasta
est=Trinity.fasta
est2genome=1

2. Use the output of MAKER to train SNAP:

maker2zff dwil-all-chromosome-r1.04.all.gff
fathom genome.ann genome.dna -gene-stats
fathom genome.ann genome.dna -categorize 1000
fathom genome.ann genome.dna -gene-stats
fathom uni.ann uni.dna -export 1000 -plus
hmm-assembler.pl genome . > dwil_genome.hmm

3. Use the output of MAKER to train Augustus on their web server:

Files used:
Upload "export.dna" as the genome file
Upload "export.aa" as the protein file

4. Second and final MAKER run:

genome=all-chromosome-r1.04.fasta
est=Trinity.fasta
est2genome=0
snaphmm=output of step 2

How do I incorporate the training output from the Augustus web server into step 4?

Thanks for your time.

Best Regards
Karen

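The two MAKER runs in the pipeline above differ only in a handful of maker_opts.ctl settings, so the switch can be sketched as a small script. This is a minimal sketch, not part of MAKER: the helper `set_ctl_options` is hypothetical, and `my_species` is a placeholder for whatever name the Augustus training produces. The option names `genome`, `est`, `est2genome`, `snaphmm` and `augustus_species` are the ones used in maker_opts.ctl.

```python
import re


def set_ctl_options(ctl_text, overrides):
    """Return ctl_text with `key=value` lines updated from overrides.

    A trailing `#comment` on an updated line is preserved. Keys that do not
    already appear in the text are appended at the end.
    """
    remaining = dict(overrides)
    out = []
    for line in ctl_text.splitlines():
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            comment = match.group(3) or ""
            out.append(f"{key}={remaining.pop(key)} {comment}".rstrip())
        else:
            out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Round 1: build gene models directly from the Trinity alignments.
ROUND1 = {
    "genome": "all-chromosome-r1.04.fasta",
    "est": "Trinity.fasta",
    "est2genome": "1",
}

# Round 2: turn est2genome off and let the trained predictors call genes.
# "dwil_genome.hmm" is the SNAP file from step 2; "my_species" is a
# hypothetical Augustus species name, not a value from the thread.
ROUND2 = {
    "genome": "all-chromosome-r1.04.fasta",
    "est": "Trinity.fasta",
    "est2genome": "0",
    "snaphmm": "dwil_genome.hmm",
    "augustus_species": "my_species",
}
```

Calling `set_ctl_options(open("maker_opts.ctl").read(), ROUND2)` and writing the result back would prepare the second run. Editing the file by hand is equivalent.
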
From carsonhh at gmail.com Fri Feb 5 16:54:58 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 5 Feb 2016 15:54:58 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To: <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
Message-ID:

Augustus gives you an entire directory rather than just a single file like SNAP. You have to take the directory and copy it to the .../augustus/config/species/ directory.

Example:
.../augustus/config/species/arabidopsis/

Then "arabidopsis" would be the species name to use with MAKER.

Sometimes you may have to do a second round of both SNAP and Augustus training (called bootstrapping). Look at the models you get after the first round; if they look good, then the second round is probably not going to be beneficial.

--Carson

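The copy step described above can be sketched in a few lines. This is an illustration under stated assumptions, not an official procedure: `install_augustus_species` is a hypothetical helper, and it assumes the Augustus config directory is either passed in or given by the `AUGUSTUS_CONFIG_PATH` environment variable. The directory name is kept unchanged because it is the species name MAKER will be given.

```python
import os
import shutil
from pathlib import Path


def install_augustus_species(trained_dir, config_dir=None):
    """Copy a trained species directory into <config_dir>/species/<name>/.

    Returns the species name (the directory name), which is the value to
    use for augustus_species in maker_opts.ctl.
    """
    config_dir = config_dir or os.environ.get("AUGUSTUS_CONFIG_PATH")
    if not config_dir:
        raise ValueError("pass config_dir or set AUGUSTUS_CONFIG_PATH")
    source = Path(trained_dir)
    target = Path(config_dir) / "species" / source.name
    if target.exists():
        # Refuse to overwrite an existing parameter set.
        raise FileExistsError(f"{target} already exists")
    shutil.copytree(source, target)
    return source.name
```

On a shared cluster the config directory is often not writable by ordinary users, in which case a private copy of the config directory (pointed to by `AUGUSTUS_CONFIG_PATH`) or help from the system administrator may be needed.
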
From hcma at uci.edu Fri Feb 5 16:58:56 2016
From: hcma at uci.edu (hcma)
Date: Fri, 05 Feb 2016 14:58:56 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
Message-ID: <4b6492c5148151cc52c91f2d56c6532b@uci.edu>

Hi Carson,

This is the list of directories under maker/2.31.8:

bin data GMOD INSTALL lib LICENSE MWAS perl README RELEASE src

Where can I find augustus/? Or do I have to ask my system admin to install it?

Thanks.

Best Regards
Karen

From carsonhh at gmail.com Fri Feb 5 17:03:56 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 5 Feb 2016 16:03:56 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To: <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
Message-ID: <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>

You need to find out where the copy of Augustus that MAKER is using is installed. Check the maker_exe.ctl file you are using, or type "which augustus".

--Carson

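Both checks suggested above can be combined in one short script. This is a rough sketch: the function names are hypothetical, and it assumes maker_exe.ctl uses `key=value #comment` lines with an `augustus=` entry, as in the control files MAKER generates. An empty `augustus=` line is treated as "not configured", after which the script falls back to searching PATH, which is what `which augustus` does.

```python
import shutil


def augustus_from_exe_ctl(ctl_text):
    """Return the path on the `augustus=` line of maker_exe.ctl text, or None."""
    for line in ctl_text.splitlines():
        setting = line.split("#", 1)[0].strip()  # drop the trailing comment
        key, sep, value = setting.partition("=")
        if sep and key.strip() == "augustus":
            return value.strip() or None
    return None


def find_augustus(ctl_text=""):
    """Prefer the path configured for MAKER, then fall back to PATH."""
    return augustus_from_exe_ctl(ctl_text) or shutil.which("augustus")
```

If `find_augustus` returns `None`, neither MAKER's control file nor the shell knows about an Augustus executable, which usually means it is not installed or its module is not loaded.
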
From hcma at uci.edu Fri Feb 5 17:20:26 2016
From: hcma at uci.edu (hcma)
Date: Fri, 05 Feb 2016 15:20:26 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To: <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
Message-ID: <5a40b7af9947dc8297046ba52620569e@uci.edu>

Hi Carson,

Thanks for the instructions. In maker_exe.ctl I only see a path to snap, not to augustus, so my system admin is checking this for me.

According to a manual I found, people use both SNAP and Augustus when annotating genomes with MAKER. Would you recommend using both, or is one of the two sufficient?

Thanks for your valuable time and advice.

Best Regards
Karen

From carsonhh at gmail.com Fri Feb 5 17:33:23 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 5 Feb 2016 16:33:23 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To: <5a40b7af9947dc8297046ba52620569e@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu>
Message-ID: <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>

I recommend using both. You probably don't have Augustus installed.

--Carson

>>>>>>> genome=all-chromosome-r1.04.fasta >>>>>>> est=Trinity.fasta >>>>>>> est2genome=0 >>>>>>> SNAP >>>>>>> Augustus >>>>>>> Thanks. >>>>>>> Best Regards >>>>>>> KAren >>>>>> Links: >>>>>> ------ >>>>>> [1] >>>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ > From dcard at uta.edu Mon Feb 8 10:05:21 2016 From: dcard at uta.edu (Card, Daren C) Date: Mon, 8 Feb 2016 10:05:21 -0600 Subject: [maker-devel] Most scaffolds fail with BadParameter error, Maker on TACC Lonestar Message-ID: <38614065-4DEF-47B4-8100-BD18901D2592@uta.edu> Hello, I?ve tried to run Maker on TACC Lonestar (4, trying to squeeze some last things in before deprecation), but I haven?t had much success. I get Maker to run, but only 28 proteins/transcripts are annotated and most scaffolds fail to finish properly, according to the ?master_datastore_index.log. In my STDERR, I see a consistent error show up for many scaffolds: ------------- EXCEPTION: Bio::Root::BadParameter ------------- MSG: ' 7.5' is not a valid score VALUE: 7.5 STACK: Error::throw STACK: Bio::Root::Root::throw /opt/apps/maker/2.30/bin/../perl/lib/Bio/Root/Root.pm:486 STACK: Bio::SeqFeature::Generic::score /opt/apps/maker/2.30/bin/../perl/lib/Bio/SeqFeature/Generic.pm:468 STACK: GFFDB::_ary_to_features /opt/apps/maker/2.30/bin/../lib/GFFDB.pm:891 STACK: GFFDB::phathits_on_chunk /opt/apps/maker/2.30/bin/../lib/GFFDB.pm:534 STACK: Process::MpiChunk::_go /opt/apps/maker/2.30/bin/../lib/Process/MpiChunk.pm:756 STACK: Process::MpiChunk::run /opt/apps/maker/2.30/bin/../lib/Process/MpiChunk.pm:341 STACK: main::node_thread /opt/apps/maker/2.30/bin/maker:1433 STACK: threads::new /opt/apps/maker/2.30/bin/../perl/lib/forks.pm:799 STACK: /opt/apps/maker/2.30/bin/maker:901 -------------------------------------------------------------- --> rank=18, hostname=c304-113.ls4.tacc.utexas.edu ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED 
CONTIG:scaffold279|size418813 The ?7.5? value can vary between errors, but other than that and the scaffold ID, the rest of the error message is the same. I obviously don?t have the expertise to diagnose the issue here, but I?m hoping someone can help me sort this out. A quick, unrelated question, is whether the Yandell lab (or anyone else) has a script that will produce a CDS file (multi-FASTA file) from a GFF annotation and FASTA genome sequence. I?m trying to produce a CDS from some NCBI genomes (annoying that it isn?t already included from NCBI), but the script I produced to do this is giving some suspect results. I figured if anyone had a well-tested script for this purpose, it would be someone on this list. Best, Daren Daren Card Ph.D. Candidate Castoe Lab University of Texas at Arlington dcard at uta.edu www.darencard.net From carsonhh at gmail.com Mon Feb 8 10:31:08 2016 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 8 Feb 2016 09:31:08 -0700 Subject: [maker-devel] Most scaffolds fail with BadParameter error, Maker on TACC Lonestar In-Reply-To: <38614065-4DEF-47B4-8100-BD18901D2592@uta.edu> References: <38614065-4DEF-47B4-8100-BD18901D2592@uta.edu> Message-ID: <9BA957A0-DD0F-4920-A778-65D0DE10F1ED@gmail.com> It?s failing because there is something wrong with the format of the input GFF file. It might not be GFF3, it may be GTF format, it may have mixed types (not just gene/mRNA/exon/CDS models), or it may have a missing Parent= or ID= tag required to generate the proper feature relationship. You can try and use GAL (http://www.sequenceontology.org/software/GAL.html ) to help validate of convert the format. Also note the message ?> MSG: ' 7.5' is not a valid score There is an extra whitespace inside the single quotes which probably means you have contaminating whitespace before the value. GFF3 is tab delimited, space characters are not permitted, and if required must be escaped following URI escaping convention. 
--Carson

From hcma at uci.edu Tue Feb 9 16:35:13 2016
From: hcma at uci.edu (hcma)
Date: Tue, 09 Feb 2016 14:35:13 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To: <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
 <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>
 <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
 <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
 <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
 <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
 <5a40b7af9947dc8297046ba52620569e@uci.edu>
 <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
Message-ID: <7e4d6f2773f654f8530155936b648832@uci.edu>

Hi Carson,

For the final annotation run, I would like to incorporate TopHat results from RNA-seq data. In your experience, is it better to feed TopHat raw RNA-seq reads (Illumina paired-end data) or trimmed reads (trimmed using Trimmomatic)? If trimmed, do you recommend a particular programme?

Thanks for your time.

Best Regards
KAren

On 2016-02-05 15:33, Carson Holt wrote:
> I recommend using both. You probably don't have augustus installed.
>
> --Carson
>
> Sent from my iPhone
>
>> On Feb 5, 2016, at 4:20 PM, hcma wrote:
>>
>> Hi Carlson,
>>
>> Thanks for the instruction and in maker_exe.ctl, i only see path to
>> snap, but not to augustus, so my system admin is checking this for me.
>>
>> From some manual i found, people use both snap and augustus when using
>> MAKER to annotate genomes. Would you recommend using both or one of
>> the 2 is sufficient?
>>
>> Thanks for your valuable time and advise.
>>
>> Best Regards
>> Karen
>>
>>> On 2016-02-05 15:03, Carson Holt wrote:
>>> You need to find out where the augustus MAKER is using is installed.
>>> Check the maker_exe.ctl file you are using, or type 'which augustus'.
>>> --Carson
>>>> On Feb 5, 2016, at 3:58 PM, hcma wrote:
>>>> Hi Carlson,
>>>> These are the list of directories under maker/2.31.8
>>>> bin data GMOD INSTALL lib LICENSE MWAS perl README RELEASE src
>>>> Where can i find augustus/? Or i have to ask my system admin to
>>>> install this?
>>>> Thanks.
>>>> Best Regards
>>>> Karen
>>>>> On 2016-02-05 14:54, Carson Holt wrote:
>>>>> Augustus gives you an entire directory rather than just a single
>>>>> file like SNAP. You have to take the directory and copy it to the
>>>>> .../augustus/config/species/ directory.
>>>>> Example:
>>>>> .../augustus/config/species/arabidopsis/
>>>>> Then 'arabidopsis' would be the species name to use with MAKER.
>>>>> Sometimes you may have to do a second round of both SNAP and
>>>>> Augustus training (called bootstrapping). Look at the models you
>>>>> get after the first round, and if they look good then, the second
>>>>> round is probably not going to be beneficial.
>>>>> --Carson
>>>>>> On Feb 5, 2016, at 3:42 PM, hcma wrote:
>>>>>> Hi Dr Holt,
>>>>>> Thanks for the email. Here is my pipeline, does it seems
>>>>>> acceptable? Any comments is welcome and much appreciated.
>>>>>> 1. Use maker to generate training gene set:
>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>> est=Trinity.fasta
>>>>>> est2genome=1
>>>>>> 2. Use output of Maker to train SNAP:
>>>>>> maker2zff dwil-all-chromosome-r1.04.all.gff
>>>>>> fathom genome.ann genome.dna -gene-stats
>>>>>> fathom genome.ann genome.dna -categorize 1000
>>>>>> fathom genome.ann genome.dna -gene-stats
>>>>>> fathom uni.ann uni.dna -export 1000 -plus
>>>>>> hmm-assembler.pl genome . > dwil_genome.hmm
>>>>>> 3. Use output of Maker to train Augustus on their webserver:
>>>>>> File used:
>>>>>> Upload 'export.dna' as the genome file
>>>>>> Upload 'export.aa' as the protein file
>>>>>> 4. second and final Maker run:
>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>> est=Trinity.fasta
>>>>>> est2genome=0
>>>>>> Snaphmm=output of 2
>>>>>> How do i incorporate the output of training set of gene from
>>>>>> Augustus web server here into this step 4?
>>>>>> Thanks for your time.
>>>>>> Best Regards
>>>>>> Karen
>>>>>>> On 2016-02-05 06:36, Carson Holt wrote:
>>>>>>> Hi Karen,
>>>>>>> There are many ways to train Augustus. I prefer to identify gene
>>>>>>> models in MAKER (GFF3) and use those to train both SNAP and
>>>>>>> Augustus.
>>>>>>> Here is a previous post on the topic -->
>>>>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ
>>>>>>> [1]
>>>>>>> In the end you need to look at the SNAP and Augustus models
>>>>>>> together with evidence alignments in a genome browser (like
>>>>>>> desktop Apollo).
>>>>>>> When everything is trained well, both SNAP and Augustus models
>>>>>>> will look like each other and both seem to look like the
>>>>>>> evidence alignments.
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>>> On Feb 4, 2016, at 5:52 PM, hcma wrote:
>>>>>>>> Hi,
>>>>>>>> I have a genome sequence and Trinity assembly for a new species
>>>>>>>> and I am wondering what are the best steps to take when using
>>>>>>>> MAKER?
>>>>>>>> 1. I used the genome sequence and all assembled Trinity sequence
>>>>>>>> to do first run of MAKER in order to generate training set for
>>>>>>>> SNAP and Augustus.
>>>>>>>> In maker_opts.ctl:
>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>> est=Trinity.fasta
>>>>>>>> est2genome=1
>>>>>>>> 2. Train SNAP
>>>>>>>> 3. Train Augustus
>>>>>>>> When i train Augustus, i only supply genome and protein file,
>>>>>>>> should i also supply the trinity file here?
>>>>>>>> 4. what's the best parameter to use when running MAKER the
>>>>>>>> second time for obtaining the final annotation? I would prefer
>>>>>>>> not to use any external protein data.
>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>> est=Trinity.fasta
>>>>>>>> est2genome=0
>>>>>>>> SNAP
>>>>>>>> Augustus
>>>>>>>> Thanks.
>>>>>>>> Best Regards
>>>>>>>> KAren
>>>>>>> Links:
>>>>>>> ------
>>>>>>> [1]
>>>>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ
>>

From jgallant at msu.edu Tue Feb 9 20:36:51 2016
From: jgallant at msu.edu (Jason Gallant)
Date: Wed, 10 Feb 2016 02:36:51 +0000
Subject: [maker-devel] Extract FASTA Sequences from "Maker Standard" Build
Message-ID:

Hi Everyone,

Quick question: I've run through Mike Campbell's tutorial on building 'Maker Standard', 'Maker Default' and 'Maker Max' datasets. I've decided that the 'Maker Standard' data (transcripts with evidence and/or IPR scan hits) makes the most sense for what we're trying to do.

Is there an easy way to create the FASTA files associated with the Maker Standard build? fasta_merge typically outputs a variety of .fasta files, which I've been able to create following this protocol for the 'Maker Max' dataset. I'd like to get these for the 'Maker Standard' build.

Currently, the datastore contains the data for the 'Maker Max' dataset.
One way, I suppose, would be to re-run MAKER with the Maker Standard GFF file, but that seems like an overly complicated way of doing it.

Any suggestions, Mike (or others)? Has anyone written a script to do this automagically?

Best,
Jason Gallant

From michael.s.campbell1 at gmail.com Wed Feb 10 08:03:29 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 10 Feb 2016 09:03:29 -0500
Subject: [maker-devel] Extract FASTA Sequences from "Maker Standard" Build
In-Reply-To:
References:
Message-ID: <2F89E4BC-C473-40A9-AE81-EAA2323B17D0@gmail.com>

Hi Jason,

Rerunning MAKER with the standard gff3 file would work, but for speed I would use the fasta_tool accessory script that is bundled with MAKER. All you need to make is a file with the list of transcript names from the standard gff3. Then you can use fasta_tool with the --select option to return all of the FASTA sequences that are in the list. The command would look like this:

PATH_TO_MAKER/maker/bin/fasta_tool --select id_file.txt max_transcripts.fasta | PATH_TO_MAKER/maker/bin/fasta_tool --wrap 80 > standard_transcripts.fasta

fasta_tool outputs unwrapped FASTA by default, so I generally pipe the output back through fasta_tool to wrap the text. The above command line wraps the sequence at 80 characters.

You can use a perl one-liner like this one to make the id file:

perl -lane 'if ($F[2] eq "mRNA") { my ($id) = $_ =~ /Name=(\S+?);/; print $id if defined $id; }' maker_standard.gff > id_file.txt

If you use these command lines, make sure you type them out yourself; email programs have a tendency to change characters slightly, making copy/pasted commands fail.

Thanks,
Mike
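
For readers who want to see the logic of the two steps above in one place, here is a rough sketch of the same idea in Python. It is not a replacement for fasta_tool and makes several assumptions: the function names mrna_names and select_fasta are hypothetical, the GFF3 is tab delimited with nine columns, and the first word of each FASTA header is assumed to match the Name= value of the corresponding mRNA feature.

```python
import re


def mrna_names(gff_lines):
    """Collect the Name= attribute of every mRNA feature in GFF3 lines."""
    names = set()
    for line in gff_lines:
        if line.startswith("#"):
            continue  # skip comments and directives
        columns = line.rstrip("\n").split("\t")
        if len(columns) == 9 and columns[2] == "mRNA":
            match = re.search(r"(?:^|;)Name=([^;]+)", columns[8])
            if match:
                names.add(match.group(1))
    return names


def select_fasta(fasta_lines, wanted, width=80):
    """Return FASTA lines (wrapped at width) for records whose ID is in wanted."""
    records = []  # each entry is [header, [sequence chunks]]
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            records.append([line, []])
        elif line and records:
            records[-1][1].append(line)

    selected = []
    for header, chunks in records:
        words = header[1:].split()
        if words and words[0] in wanted:
            selected.append(header)
            sequence = "".join(chunks)
            selected.extend(
                sequence[start:start + width]
                for start in range(0, len(sequence), width)
            )
    return selected
```

A possible use, with the file names taken from the commands above: read maker_standard.gff with mrna_names, then pass the resulting set and the lines of max_transcripts.fasta to select_fasta and write the returned lines to standard_transcripts.fasta. The sketch holds every record in memory, so for very large transcript sets the bundled fasta_tool is likely the better choice.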
From michael.s.campbell1 at gmail.com Wed Feb 10 08:17:11 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 10 Feb 2016 09:17:11 -0500
Subject: [maker-devel] Q on MAKER
In-Reply-To: <7e4d6f2773f654f8530155936b648832@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
 <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>
 <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
 <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
 <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
 <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
 <5a40b7af9947dc8297046ba52620569e@uci.edu>
 <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
 <7e4d6f2773f654f8530155936b648832@uci.edu>
Message-ID: <7495272C-476A-4985-8D49-30D991410535@gmail.com>

Hi Karen,

In my experience, trimming reads will not make things worse and generally makes things better. As for the best program to use, no single one really stands out above the others as far as I can tell. However, with paired-end reads it is important to use a trimmer that preserves the pairing between the two files (i.e. when an entire read is discarded, the paired read is moved into a file for singletons).

Thanks
Mike
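
To illustrate what "preserving the pairing" means in practice, here is a small sketch of the bookkeeping a pair-aware trimmer has to do. It is not how any particular trimmer is implemented; the function name split_pairs, the (name, sequence) tuple layout, and the length cutoff in the usage note are all assumptions made for illustration.

```python
def split_pairs(pairs, keep):
    """Route read pairs to paired or singleton output.

    pairs is an iterable of (read_1, read_2) tuples in file order, and keep
    is a function that returns True when a read survives trimming.
    """
    paired_1, paired_2, singletons = [], [], []
    for read_1, read_2 in pairs:
        ok_1, ok_2 = keep(read_1), keep(read_2)
        if ok_1 and ok_2:
            # Both mates survive, so they stay at the same position in both files.
            paired_1.append(read_1)
            paired_2.append(read_2)
        elif ok_1:
            singletons.append(read_1)  # mate was discarded
        elif ok_2:
            singletons.append(read_2)  # mate was discarded
        # If neither mate survives, the pair is dropped entirely.
    return paired_1, paired_2, singletons
```

For example, with reads represented as (name, sequence) tuples, keep could be lambda read: len(read[1]) >= 36, where the cutoff of 36 is an arbitrary illustrative value. The key property is that paired_1 and paired_2 always have the same length and order, which is what a paired-end aligner expects.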
From hcma at uci.edu Wed Feb 10 16:27:41 2016
From: hcma at uci.edu (hcma)
Date: Wed, 10 Feb 2016 14:27:41 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To: <7495272C-476A-4985-8D49-30D991410535@gmail.com>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
 <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>
 <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
 <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
 <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
 <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
 <5a40b7af9947dc8297046ba52620569e@uci.edu>
 <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
 <7e4d6f2773f654f8530155936b648832@uci.edu>
 <7495272C-476A-4985-8D49-30D991410535@gmail.com>
Message-ID: <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>

Hi Mike,

Thanks for the reply. So I can input raw RNA-seq reads to TopHat and feed the output to MAKER?

Thanks.

Best Regards
KAren
From carsonhh at gmail.com Wed Feb 10 20:32:00 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 10 Feb 2016 19:32:00 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To: <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
 <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>
 <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
 <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
 <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
 <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
 <5a40b7af9947dc8297046ba52620569e@uci.edu>
 <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
 <7e4d6f2773f654f8530155936b648832@uci.edu>
 <7495272C-476A-4985-8D49-30D991410535@gmail.com>
 <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
Message-ID:

I find TopHat results to be too noisy, and prefer Cufflinks. MAKER comes with both a tophat2gff3 and a cufflinks2gff3 script. Also consider assembling the reads with Trinity (my overall preferred method because it yields the highest specificity).

--Carson

Sent from my iPhone
From fdolze at students.uni-mainz.de Thu Feb 11 04:43:51 2016
From: fdolze at students.uni-mainz.de (Florian)
Date: Thu, 11 Feb 2016 11:43:51 +0100
Subject: [maker-devel] Maker-run with no clean finish on openMPI-cluster
Message-ID: <56BC65E7.6000904@students.uni-mainz.de>

Hi all,

I am no expert for MPI so maybe this is something very trivial or maybe not caused by MAKER at all but I'd be glad to have your thoughts on this.

I installed MAKER 2.31.8 with MPI support (openMPI 1.8.1) on our cluster. I ran maker with the options attached and the command in bsub_maker, and I _think_ it worked fine. Here is the last output of maker:

running exonerate search.
#--------- command -------------#
Widget::exonerate::protein2genome:
/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/exe/exonerate/bin/exonerate -q /project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_files/maker_yZhQlA/49/sp%7CQ4JHE0%7CXB36_ORYSJ.for.114901-115619.49.fasta -t /project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_files/maker_yZhQlA/49/scaffold299_size115619.114901-115619.49.fasta -Q protein -T dna -m protein2genome --softmasktarget --percent 20 --showcigar > /project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_files/maker_yZhQlA/49/scaffold299_size115619.114901-115619.sp%7CQ4JHE0%7CXB36_ORYSJ.p.exonerate
#-------------------------------#
cleaning blastx...
in cluster::shadow_cluster...
...finished clustering.
in cluster::shadow_cluster...
...finished clustering.
cleaning clusters....
total clusters:37 now processing 0
...processing 0 of 11
...processing 1 of 11
...processing 2 of 11
...processing 3 of 11
...
...processing 174 of 177
...processing 175 of 177
...processing 176 of 177
flattening protein clusters
prepare section files

Maker is now finished!!!
Start_time: 1454700985
End_time: 1455023070
Elapsed: 322085

But my cluster job didn't finish here. Instead, I got the following errors until my runtime limit of 5 days was reached:

Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached
SIGTERM received
Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached
SIGTERM received
Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached
SIGTERM received
SIGTERM received
Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached
SIGTERM received
Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached
SIGTERM received
Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
SIGTERM received
Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached
SIGTERM received
SIGTERM received
Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
[a0238:09542] *** Process received signal ***
[a0238:09542] Signal: Segmentation fault (11)
[a0238:09542] Signal code: Address not mapped (1)
[a0238:09542] Failing at address: 0xa80
[a0238:09542] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2ba955727710]
[a0238:09542] [ 1] /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x2ba954715002]
[a0238:09542] [ 2] /lib64/libpthread.so.0(+0xf710)[0x2ba955727710]
[a0238:09542] [ 3] /lib64/libc.so.6(__poll+0x53)[0x2ba955a170d3]
[a0238:09542] [ 4] /cluster/mpi/gcc_4.4.7/OpenMPI-1.8.1/lib/libopen-pal.so.6(+0x6cfca)[0x2ba955fb4fca]
[a0238:09542] [ 5] /cluster/mpi/gcc_4.4.7/OpenMPI-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x2ba955fabf11]
[a0238:09542] [ 6] /cluster/mpi/gcc_4.4.7/OpenMPI-1.8.1/lib/libopen-rte.so.7(+0x376ae)[0x2ba955d076ae]
[a0238:09542] [ 7] /lib64/libpthread.so.0(+0x79d1)[0x2ba95571f9d1]
[a0238:09542] [ 8] /lib64/libc.so.6(clone+0x6d)[0x2ba955a208fd]
[a0238:09542] *** End of error message ***
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
SIGTERM received
...

Maybe someone has experienced something similar before, or can give me a hint whether this is caused by my setup or by MAKER.

Kind regards,
Florian Dolze
-------------- next part --------------
#-----BLAST and Exonerate Statistics Thresholds
blast_type=ncbi+ #set to 'ncbi+', 'ncbi' or 'wublast'

pcov_blastn=0.8 #Blastn Percent Coverage Threhold EST-Genome Alignments
pid_blastn=0.85 #Blastn Percent Identity Threshold EST-Genome Aligments
eval_blastn=1e-10 #Blastn eval cutoff
bit_blastn=40 #Blastn bit cutoff
depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff)

pcov_blastx=0.5 #Blastx Percent Coverage Threhold Protein-Genome Alignments
pid_blastx=0.4 #Blastx Percent Identity Threshold Protein-Genome Aligments
eval_blastx=1e-06 #Blastx eval cutoff
bit_blastx=30 #Blastx bit cutoff
depth_blastx=0 #Blastx depth cutoff (0 to disable cutoff)

pcov_tblastx=0.8 #tBlastx Percent Coverage Threhold alt-EST-Genome Alignments
pid_tblastx=0.85 #tBlastx Percent Identity Threshold alt-EST-Genome Aligments
eval_tblastx=1e-10 #tBlastx eval cutoff
bit_tblastx=40 #tBlastx bit cutoff
depth_tblastx=0 #tBlastx depth cutoff (0 to disable cutoff)

pcov_rm_blastx=0.5 #Blastx Percent Coverage Threhold For Transposable Element Masking
pid_rm_blastx=0.4 #Blastx Percent Identity Threshold For Transposbale Element Masking
eval_rm_blastx=1e-06 #Blastx eval cutoff for transposable element masking
bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking

ep_score_limit=20 #Exonerate protein percent of maximal score threshold
en_score_limit=20 #Exonerate nucleotide percent of maximal score threshold
-------------- next part --------------
#-----Location of Executables Used by MAKER/EVALUATOR
makeblastdb=/cluster/Apps/bioinf/BLAST/2.2.28/bin/makeblastdb #location of NCBI+ makeblastdb executable
blastn=/cluster/Apps/bioinf/BLAST/2.2.28/bin/blastn #location of NCBI+ blastn executable
blastx=/cluster/Apps/bioinf/BLAST/2.2.28/bin/blastx #location of NCBI+ blastx executable
tblastx=/cluster/Apps/bioinf/BLAST/2.2.28/bin/tblastx #location of NCBI+ tblastx executable
formatdb= #location of NCBI formatdb executable
blastall= #location of NCBI blastall executable
xdformat= #location of WUBLAST xdformat executable
blasta= #location of WUBLAST blasta executable
RepeatMasker=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/bin/../exe/RepeatMasker/RepeatMasker #location of RepeatMasker executable
exonerate=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/bin/../exe/exonerate/bin/exonerate #location of exonerate executable

#-----Ab-initio Gene Prediction Algorithms
snap=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/bin/../exe/snap/snap #location of snap executable
gmhmme3=/project/molgen/Maker_additional_tools/genemark-4.32/gmhmme3 #location of eukaryotic genemark executable
gmhmmp= #location of prokaryotic genemark executable
augustus=/project/molgen/Maker_additional_tools/augustus-3.2.1/bin/augustus #location of augustus executable
fgenesh= #location of fgenesh executable
tRNAscan-SE=/project/molgen/Maker_additional_tools/tRNAscan/bin/tRNAscan-SE #location of trnascan executable
snoscan=/project/molgen/Maker_additional_tools/snoscan/bin/snoscan #location of snoscan executable

#-----Other Algorithms
probuild=/project/molgen/Maker_additional_tools/genemark-4.32/probuild #location of probuild executable (required for genemark)
-------------- next part --------------
#-----Genome (these are always required)
genome= /project/molgen/workbench_Florian/riparius_MAKER_v2/Crip_genome_v20_newHead.fa
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=/project/molgen/workbench_Florian/riparius_MAKER_v2/riparius_cDNA_formatedHeader.fa #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/project/molgen/workbench_Florian/riparius_MAKER_v2/uniprot_sprot.fasta #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib=/project/molgen/workbench_Florian/riparius_MAKER_v2/20151208_Custom_Crip_repeat_library_final.fas #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm=/project/molgen/workbench_Florian/riparius_MAKER_v2/cegmasnap.hmm #SNAP HMM file
gmhmm=/project/molgen/workbench_Florian/riparius_MAKER_v2/gmhmm.mod #GeneMark HMM file
augustus_species=Riparius_Neu #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=1 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna=/project/molgen/workbench_Florian/riparius_MAKER_v2/C.thummi_28S_rDNA_gene.fasta #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP=/project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_files #specify a directory other than the system default temporary directory for temporary files
-------------- next part --------------
#!/bin/bash

#BSUB -n 128
#BSUB -q long
#BSUB -W 7200
#BSUB -o mogon_maker_MPIrun_5_feb.log
#BSUB -J riparius_makerMPI
#BSUB -app Reserve1G

mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_MPIrun3 -fix_nucleotides

From hcma at uci.edu Thu Feb 11 16:32:45 2016
From: hcma at uci.edu (hcma)
Date: Thu, 11 Feb 2016 14:32:45 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu> <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com> <7e4d6f2773f654f8530155936b648832@uci.edu> <7495272C-476A-4985-8D49-30D991410535@gmail.com> <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
Message-ID:

Hi Carlson,

Thanks for sharing.

I did assemble the Illumina RNA-seq PE100 reads de novo using Trinity, and I used that assembly as input to the first run of MAKER to generate a set of genes for training SNAP and Augustus. Now I am planning a second (and perhaps final) run of MAKER for gene prediction, provided that the SNAP and Augustus results look similar to each other.

I was going to incorporate the GFF result from TopHat into the second run of MAKER, along with the Trinity output, but without any external protein annotation. I already did a separate BLAST analysis to identify orthologous genes, and I prefer to run MAKER without any protein evidence.

Do you recommend using the output of tophat2gff as input for this second run of MAKER?

Thanks again for your time and advice.

Best Regards
Karen

From carsonhh at gmail.com Thu Feb 11 16:36:44 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 Feb 2016 15:36:44 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu> <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com> <7e4d6f2773f654f8530155936b648832@uci.edu> <7495272C-476A-4985-8D49-30D991410535@gmail.com> <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
Message-ID: <56F1935F-F6BA-4755-92F2-17EE81909619@gmail.com>

Not if you already have Trinity results. Adding the tophat2gff output will actually decrease the specificity of the run (i.e. it causes false gene calls because of spurious evidence support).

--Carson

From hcma at uci.edu Thu Feb 11 18:18:43 2016
From: hcma at uci.edu (hcma)
Date: Thu, 11 Feb 2016 16:18:43 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
Message-ID:

Hi Carson,

I have downloaded Apollo. What format does Apollo take for the SNAP and Augustus models? Do I need to reformat the SNAP.hmm file, and which Augustus output should I use if I train Augustus manually?

Thanks again for your time.

Best Regards
Karen

> Here is a previous post on the topic ?> > https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ > [1] > > In the end you need to look at the SNAP and Augustus models together > with evidence alignments in a genome browser (like desktop Apollo). > When everything is trained well, both SNAP and Augustus models will > look like each other and both seem to look like the evidence > alignments. > > Thanks, > Carson > >> On Feb 4, 2016, at 5:52 PM, hcma wrote: >> >> Hi, >> >> I have a genome sequence and Trinity assembly for a new species and >> I am wondering what are the best steps to take when using MAKER? >> >> 1. I used the genome sequence and all assembled Trinity sequence to >> do first run of MAKER in order to generate training set for SNAP and >> Augustus. >> >> In maker_opts.ctl: >> genome=all-chromosome-r1.04.fasta >> est=Trinity.fasta >> est2genome=1 >> >> 2. Train SNAP >> >> 3. Train Augustus >> >> When i train Augustus, i only supply genome and protein file, should >> i also supply the trinity file here? >> >> 4. what's the best parameter to use when running MAKER the second >> time for obtaining the final annotation? I would prefer not to use >> any external protein data. >> >> genome=all-chromosome-r1.04.fasta >> est=Trinity.fasta >> est2genome=0 >> SNAP >> Augustus >> >> Thanks. >> >> Best Regards >> KAren > > > > Links: > ------ > [1] > https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ From panos.ioannidis at gmail.com Fri Feb 12 02:35:49 2016 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Fri, 12 Feb 2016 09:35:49 +0100 Subject: [maker-devel] GFF features from Maker Message-ID: Hi guys, I have a few questions regarding annotated features in the GFF file built by Maker. 1) I'm a bit confused about the annotations coming from "est2genome" and "blastn", because they both give "expressed_sequence_match" features. 
So, what's the difference between them? How do the EST matches from est2genome differ from those from blastn? 2) Same goes for "protein2genome" and "blastx", since they both give "protein_match" features. 3) Last, what is the difference between the partial matches and full-length matches? For example, in almost all cases where est2genome gives an "expressed_sequence_match" feature for a genomic area, it also gives a "match_part" feature for sub-areas within this area. What is the meaning of this? I'm pasting one such area, below. scaffold3|size1771164 est2genome expressed_sequence_match 21953 22276 949 + . ID=scaffold3|size1771164:hit:1901:3.2.0.0;Name=C24476_a_3_0_l_241 scaffold3|size1771164 est2genome match_part 21953 22035 949 + . ID=scaffold3|size1771164:hsp:1902:3.2.0.0;Parent=scaffold3|size1771164:hit:1901:3.2.0.0;Target=C24476_a_3_0_l_241 1 83 +;Gap=M83 scaffold3|size1771164 est2genome match_part 22148 22276 949 + . ID=scaffold3|size1771164:hsp:1903:3.2.0.0;Parent=scaffold3|size1771164:hit:1901:3.2.0.0;Target=C24476_a_3_0_l_241 84 215 +;Gap=M104 D2 M7 I4 M8 I1 M8 Thanks, Panos -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Feb 12 08:48:46 2016 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 12 Feb 2016 07:48:46 -0700 Subject: [maker-devel] GFF features from Maker In-Reply-To: References: Message-ID: <806D9F3C-13AF-4EDE-ACA8-DA981255E5DD@gmail.com> Hi Panos, Terms used are governed by the sequence ontology (http://www.sequenceontology.org ), and specific definitions can be found there. Terms have a Parent/Child relationship with lower levels being more specific than higher levels. 
The match feature is used for ab initio reference results rather than the potentially better term predicted_gene, because match is already handled correctly by most software, and most databases like FlyBase already use it for that purpose (in part because predicted_gene was a latecomer to the ontology list and it is used more often to distinguish accepted models without human curation rather than reference predictions). Since match is an experimental_feature, it matches the expected separation between genes (biological_region) and analysis results (experimental_feature).

It's rather boring and technical, but it's all the result of careful selection using the Sequence Ontology inheritance levels and term definitions. Example in attached image.

--Carson

> On Feb 12, 2016, at 1:35 AM, Panos Ioannidis wrote:
>
> Hi guys,
>
> I have a few questions regarding annotated features in the GFF file built by Maker.
>
> 1) I'm a bit confused about the annotations coming from "est2genome" and "blastn", because they both give "expressed_sequence_match" features. So, what's the difference between them? How do the EST matches from est2genome differ from those from blastn?
>
> 2) Same goes for "protein2genome" and "blastx", since they both give "protein_match" features.
>
> 3) Last, what is the difference between the partial matches and full-length matches? For example, in almost all cases where est2genome gives an "expressed_sequence_match" feature for a genomic area, it also gives a "match_part" feature for sub-areas within this area. What is the meaning of this? I'm pasting one such area, below.
>
> scaffold3|size1771164 est2genome expressed_sequence_match 21953 22276 949 + . ID=scaffold3|size1771164:hit:1901:3.2.0.0;Name=C24476_a_3_0_l_241
> scaffold3|size1771164 est2genome match_part 21953 22035 949 + . ID=scaffold3|size1771164:hsp:1902:3.2.0.0;Parent=scaffold3|size1771164:hit:1901:3.2.0.0;Target=C24476_a_3_0_l_241 1 83 +;Gap=M83
> scaffold3|size1771164 est2genome match_part 22148 22276 949 + . ID=scaffold3|size1771164:hsp:1903:3.2.0.0;Parent=scaffold3|size1771164:hit:1901:3.2.0.0;Target=C24476_a_3_0_l_241 84 215 +;Gap=M104 D2 M7 I4 M8 I1 M8
>
> Thanks,
> Panos
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SO-0000102.png
Type: image/png
Size: 7720 bytes
Desc: not available
URL:

From carsonhh at gmail.com Fri Feb 12 08:56:41 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Feb 2016 07:56:41 -0700
Subject: [maker-devel] GFF features from Maker
In-Reply-To:
References:
Message-ID: <1B5D7E98-850C-4D16-A5C1-5BE1EB5B8735@gmail.com>

Also, BLAST vs. Exonerate is an algorithmic difference. BLAST aligns using traditional Smith-Waterman, resulting in potentially out-of-order sub-alignments called HSPs. Exonerate does splice-aware alignments (in order and correctly trimmed for splice sites). More info on polishing alignments is on the wiki page here -->
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Polishing_Evidence_Alignments

--Carson

> On Feb 12, 2016, at 1:35 AM, Panos Ioannidis wrote:
>
> Hi guys,
>
> I have a few questions regarding annotated features in the GFF file built by Maker.
>
> 1) I'm a bit confused about the annotations coming from "est2genome" and "blastn", because they both give "expressed_sequence_match" features. So, what's the difference between them? How do the EST matches from est2genome differ from those from blastn?
> > 2) Same goes for "protein2genome" and "blastx", since they both give "protein_match" features. > > 3) Last, what is the difference between the partial matches and full-length matches? For example, in almost all cases where est2genome gives an "expressed_sequence_match" feature for a genomic area, it also gives a "match_part" feature for sub-areas within this area. What is the meaning of this? I'm pasting one such area, below. > > scaffold3|size1771164 est2genome expressed_sequence_match 21953 22276 949 + . ID=scaffold3|size1771164:hit:1901:3.2.0.0;Name=C24476_a_3_0_l_241 > scaffold3|size1771164 est2genome match_part 21953 22035 949 + . ID=scaffold3|size1771164:hsp:1902:3.2.0.0;Parent=scaffold3|size1771164:hit:1901:3.2.0.0;Target=C24476_a_3_0_l_241 1 83 +;Gap=M83 > scaffold3|size1771164 est2genome match_part 22148 22276 949 + . ID=scaffold3|size1771164:hsp:1903:3.2.0.0;Parent=scaffold3|size1771164:hit:1901:3.2.0.0;Target=C24476_a_3_0_l_241 84 215 +;Gap=M104 D2 M7 I4 M8 I1 M8 > > Thanks, > Panos > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From panos.ioannidis at gmail.com Fri Feb 12 08:59:05 2016 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Fri, 12 Feb 2016 15:59:05 +0100 Subject: [maker-devel] GFF features from Maker In-Reply-To: <1B5D7E98-850C-4D16-A5C1-5BE1EB5B8735@gmail.com> References: <1B5D7E98-850C-4D16-A5C1-5BE1EB5B8735@gmail.com> Message-ID: Thanks for all the info Carson! Panos On Fri, Feb 12, 2016 at 3:56 PM, Carson Holt wrote: > Also BLAST vs Exonerate is an algorithmic difference. BLAST aligns using > traditional Smith Watmerman resulting in potenially out of order sub > alignments called HSPs. Exonerate does spice aware alignments (in order and > correctly trimmed for splice sites). 
More info on polishing alignments on > wiki page here ?> > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Polishing_Evidence_Alignments > > ?Carson > > > > On Feb 12, 2016, at 1:35 AM, Panos Ioannidis > wrote: > > Hi guys, > > I have a few questions regarding annotated features in the GFF file built > by Maker. > > 1) I'm a bit confused about the annotations coming from "est2genome" and > "blastn", because they both give "expressed_sequence_match" features. So, > what's the difference between them? How do the EST matches from est2genome > differ from those from blastn? > > 2) Same goes for "protein2genome" and "blastx", since they both give > "protein_match" features. > > 3) Last, what is the difference between the partial matches and > full-length matches? For example, in almost all cases where est2genome > gives an "expressed_sequence_match" feature for a genomic area, it also > gives a "match_part" feature for sub-areas within this area. What is the > meaning of this? I'm pasting one such area, below. > > scaffold3|size1771164 est2genome expressed_sequence_match > 21953 22276 949 + . > ID=scaffold3|size1771164:hit:1901:3.2.0.0;Name=C24476_a_3_0_l_241 > scaffold3|size1771164 est2genome match_part 21953 22035 > 949 + . > ID=scaffold3|size1771164:hsp:1902:3.2.0.0;Parent=scaffold3|size1771164:hit:1901:3.2.0.0;Target=C24476_a_3_0_l_241 > 1 83 +;Gap=M83 > scaffold3|size1771164 est2genome match_part 22148 22276 > 949 + . > ID=scaffold3|size1771164:hsp:1903:3.2.0.0;Parent=scaffold3|size1771164:hit:1901:3.2.0.0;Target=C24476_a_3_0_l_241 > 84 215 +;Gap=M104 D2 M7 I4 M8 I1 M8 > > Thanks, > Panos > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... 
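A minimal sketch of how the hit and HSP rows pasted in this thread relate to each other, assuming plain tab-separated GFF3: each match_part row carries a Parent attribute naming the ID of its expressed_sequence_match row, so the parts can be grouped under their hit. The function names are hypothetical and not part of MAKER; the three rows are the ones from the question, with the columns split by hand.

```python
from collections import defaultdict

# The three est2genome rows from the question, one tuple of GFF3 columns each.
ROWS = [
    ("scaffold3|size1771164", "est2genome", "expressed_sequence_match",
     "21953", "22276", "949", "+", ".",
     "ID=scaffold3|size1771164:hit:1901:3.2.0.0;Name=C24476_a_3_0_l_241"),
    ("scaffold3|size1771164", "est2genome", "match_part",
     "21953", "22035", "949", "+", ".",
     "ID=scaffold3|size1771164:hsp:1902:3.2.0.0;"
     "Parent=scaffold3|size1771164:hit:1901:3.2.0.0;"
     "Target=C24476_a_3_0_l_241 1 83 +;Gap=M83"),
    ("scaffold3|size1771164", "est2genome", "match_part",
     "22148", "22276", "949", "+", ".",
     "ID=scaffold3|size1771164:hsp:1903:3.2.0.0;"
     "Parent=scaffold3|size1771164:hit:1901:3.2.0.0;"
     "Target=C24476_a_3_0_l_241 84 215 +;Gap=M104 D2 M7 I4 M8 I1 M8"),
]
GFF3_TEXT = "\n".join("\t".join(row) for row in ROWS)


def parse_attributes(column9):
    """Turn 'key=value;key=value' (GFF3 column 9) into a dict."""
    return dict(item.split("=", 1) for item in column9.split(";") if item)


def group_parts_by_hit(gff3_text):
    """Return {hit ID: [(start, end), ...]} built from match_part rows."""
    parts = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        cols = line.split("\t")
        if cols[2] == "match_part":
            attrs = parse_attributes(cols[8])
            parts[attrs["Parent"]].append((int(cols[3]), int(cols[4])))
    return dict(parts)


hits = group_parts_by_hit(GFF3_TEXT)
for hit_id, spans in hits.items():
    spans.sort()
    # For a spliced alignment, the space between consecutive aligned
    # blocks would correspond to an inferred intron.
    introns = [(a_end + 1, b_start - 1)
               for (_, a_end), (b_start, _) in zip(spans, spans[1:])]
    print(hit_id, spans, introns)
```

Read this way, the expressed_sequence_match row is the whole alignment of one EST, and the match_part rows are its individual aligned blocks.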
URL:

From carsonhh at gmail.com Fri Feb 12 13:14:16 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Feb 2016 12:14:16 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
Message-ID:

You need to view the output the programs produce, not the HMM. You can run them through MAKER and then view the GFF3 files produced.

Here is a MAKER tutorial where this is done, which you can follow along with if you wish -->
http://gmod.org/wiki/MAKER_Tutorial_2013#Training_ab_initio_Gene_Predictors

For Augustus training, there are a number of threads on how to do that in the MAKER mailing list archives -->
https://groups.google.com/forum/#!searchin/maker-devel/augustus

There are also other resources online -->
http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html

--Carson

> On Feb 11, 2016, at 5:18 PM, hcma wrote:
>
> Hi Carson,
>
> I have downloaded Apollo and what format of the SNAP and Augustus models does Apollo take? Do i need to reformat the SNAP.hmm and which output of Augustus to use if I train Augustus manually?
>
> Thanks again for your time.
>
> Best Regards
> Karen
>
> On 2016-02-05 06:36, Carson Holt wrote:
>> Hi Karen,
>> There are many ways to train Augustus. I prefer to identify gene
>> models in MAKER (GFF3) and use those to train both SNAP and Augustus.
>> Here is a previous post on the topic -->
>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ
>> [1]
>> In the end you need to look at the SNAP and Augustus models together
>> with evidence alignments in a genome browser (like desktop Apollo).
>> When everything is trained well, both SNAP and Augustus models will
>> look like each other and both seem to look like the evidence
>> alignments.
>> Thanks, >> Carson >>> On Feb 4, 2016, at 5:52 PM, hcma wrote: >>> Hi, >>> I have a genome sequence and Trinity assembly for a new species and >>> I am wondering what are the best steps to take when using MAKER? >>> 1. I used the genome sequence and all assembled Trinity sequence to >>> do first run of MAKER in order to generate training set for SNAP and >>> Augustus. >>> In maker_opts.ctl: >>> genome=all-chromosome-r1.04.fasta >>> est=Trinity.fasta >>> est2genome=1 >>> 2. Train SNAP >>> 3. Train Augustus >>> When i train Augustus, i only supply genome and protein file, should >>> i also supply the trinity file here? >>> 4. what's the best parameter to use when running MAKER the second >>> time for obtaining the final annotation? I would prefer not to use >>> any external protein data. >>> genome=all-chromosome-r1.04.fasta >>> est=Trinity.fasta >>> est2genome=0 >>> SNAP >>> Augustus >>> Thanks. >>> Best Regards >>> KAren >> Links: >> ------ >> [1] >> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdolze at students.uni-mainz.de Tue Feb 16 04:10:03 2016 From: fdolze at students.uni-mainz.de (Florian) Date: Tue, 16 Feb 2016 11:10:03 +0100 Subject: [maker-devel] Estimated runtime on 180mb genome @ 128 cores? In-Reply-To: <56BC65E7.6000904@students.uni-mainz.de> References: <56BC65E7.6000904@students.uni-mainz.de> Message-ID: <56C2F57B.8020208@students.uni-mainz.de> Hi all, I am trying to run MAKER on a project of mine and since this is the first time I use MAKER I'd like to ask some more experienced users what I can expect in regard to resource consumption and runtime of MAKER. 
My genome data is:

* 180.652.019 bp genome length
* 5.292 scaffolds
* 34.136 bp median scaffold length
* 2.056.324 bp longest scaffold
* 272.065 bp N50

- I use a 73mb transcriptome assembly as EST evidence
- SwissProt as protein homology evidence
- 60kb custom repeat library for RepeatMasker

For gene prediction I am running with a SNAP HMM I generated using CEGMA, GeneMark, and Augustus trained by their web service. I have the options est2genome and protein2genome turned on (=1) and use tRNAscan and snoscan. The other options are as follows:

#-----MAKER Behavior Options
max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage) <--- Is this reasonable?
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

The maker_bopts.ctl file is unchanged.
(Basically I follow this guide https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md)

At the moment I am running this with openMPI as:

mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_run1 -fix_nucleotides

on 128 cores with 130GB of memory.

First of all, are those options I use viable?

Is it possible to guesstimate the runtime I can expect? 5 days? 20 days? And is it reasonable to use additional cores or will this not benefit much?

Thanks for your insights,
Florian

From dence at genetics.utah.edu Tue Feb 16 10:42:51 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 16 Feb 2016 16:42:51 +0000
Subject: [maker-devel] Estimated runtime on 180mb genome @ 128 cores?
In-Reply-To: <56C2F57B.8020208@students.uni-mainz.de>
References: <56BC65E7.6000904@students.uni-mainz.de> <56C2F57B.8020208@students.uni-mainz.de>
Message-ID:

Hi Florian,

I don't think you want est2genome or protein2genome turned on for this run. est2genome is usually only used if you don't have any ab initio predictors trained; protein2genome should only be used if you have good reason not to expect any introns at all (for example, a prokaryotic genome).

Also, you set the max_dna_len parameter to 2.1 Mbp, which is larger than your N50. Setting this too large prevents MAKER from speeding up its analysis by splitting contigs/scaffolds across multiple processors. There's usually no reason to change this from the default setting.

With a good N50 like you have, you'll probably get good results.
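To make the parallelization point concrete, here is a rough sketch of how the number of independent work units changes with max_dna_len. It assumes (this is not confirmed in the thread) a default max_dna_len of 100000, and it treats chunking as simple non-overlapping windows, which is a simplification of whatever MAKER does internally. The three scaffold lengths are the longest, N50 and median values reported in the question, used as stand-ins for a real assembly; count_chunks is a hypothetical helper, not a MAKER function.

```python
import math


def count_chunks(scaffold_lengths, max_dna_len):
    """Count the pieces produced if every scaffold is cut into
    windows of at most max_dna_len bp (simplified model)."""
    return sum(math.ceil(length / max_dna_len) for length in scaffold_lengths)


# Longest scaffold, N50 and median length from the question (bp).
example_lengths = [2_056_324, 272_065, 34_136]

ASSUMED_DEFAULT = 100_000   # assumption: check your own maker_opts.ctl
AS_CONFIGURED = 2_100_000   # value set in the question

for label, value in [("assumed default", ASSUMED_DEFAULT),
                     ("as configured", AS_CONFIGURED)]:
    print(f"max_dna_len={value} ({label}): "
          f"{count_chunks(example_lengths, value)} chunks")
```

Under this simplified model, a window larger than the longest scaffold means no scaffold is ever split, so the largest scaffold is handled as a single piece.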
~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On Feb 16, 2016, at 3:10 AM, Florian > wrote: Hi all, I am trying to run MAKER on a project of mine and since this is the first time I use MAKER I'd like to ask some more experienced users what I can expect in regard to resource consumption and runtime of MAKER. My genome data is: * 180.652.019 bp genome length * 5.292 Scaffolds * 34.136 bp median scaffold length * 2.056.324 bp longest * 272.065 bp N50 - I use a 73mb transcriptome assembly as EST Evidence - SwissProt as Protein Homology Evidence - 60kb custom repeat library for RepeatMasker For gene prediction I am running with a SNAP hmm I generated using CEGMA, GeneMark, and Augustus trained by their webservice. I have options est2genome and protein2genome turned on (=1) and use tRNAscan and snoscan. And other options as following: #-----MAKER Behavior Options max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage) <--- Is this reasonable? 
min_contig=1 #skip genome contigs below this length (under 10kb are often useless) pred_flank=200 #flank for extending evidence clusters sent to gene predictors pred_stats=0 #report AED and QI statistics for all predictions as well as models AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) min_protein=0 #require at least this many amino acids in predicted proteins alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments) single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes The maker_bopts.ctl file is unchanged. (Basically I follow this guide https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md) At the moment I am running this with openMPI as: mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_run1 -fix_nucleotides on 128 cores with 130GB of memory. First of all, are those options I use viable? Is it possible to guesstimate the runtime I can expect? 5 days? 20 days? And is it reasonable to use additional cores or will this not benefit much? Thanks for your insights, Florian _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From carsonhh at gmail.com Tue Feb 16 10:53:55 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 Feb 2016 09:53:55 -0700
Subject: [maker-devel] Estimated runtime on 180mb genome @ 128 cores?
In-Reply-To:
References: <56BC65E7.6000904@students.uni-mainz.de> <56C2F57B.8020208@students.uni-mainz.de>
Message-ID:

Agree. 500,000 is about the highest you ever want to go with max_dna_len. Increasing the value decreases parallelization and increases memory usage. The only biological reason to ever increase it is if genes are really long and don't fit into windows of this size.

Also test out the mpiexec command with something like 'hostname' to make sure it works. Example -->

mpiexec -mca btl ^openib -n 128 hostname

This should print out 128 lines identifying all hosts in the communication ring. If it prints out the same host ID every time, then there is a problem and you may need to provide a hostfile to let mpiexec know all the hosts it can run across.

--Carson

> On Feb 16, 2016, at 9:42 AM, Daniel Ence wrote:
>
> Hi Florian, I don't think you want est2genome or protein2genome turned on for this run. Est2genome is usually only used if you don't have any ab-initio predictors trained; protein2genome should only be used if you have good reason not to expect any introns at all (for example, a prokaryotic genome).
>
> Also, you set the max_dna_len parameter for 2.1Mbp, which is larger than your N50. Setting this too large prevents MAKER from speeding up its analysis by splitting contigs/scaffolds across multiple processors. There's usually no reason to change this from the default setting.
>
> With a good N50 like you have, you'll probably get good results.
> > ~Daniel > > > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > >> On Feb 16, 2016, at 3:10 AM, Florian > wrote: >> >> Hi all, >> >> I am trying to run MAKER on a project of mine and since this is the first time I use MAKER I'd like to ask some more experienced users what I can expect in regard to resource consumption and runtime of MAKER. >> >> My genome data is: >> >> 180.652.019 bp genome length >> 5.292 Scaffolds >> 34.136 bp median scaffold length >> 2.056.324 bp longest >> 272.065 bp N50 >> - I use a 73mb transcriptome assembly as EST Evidence >> - SwissProt as Protein Homology Evidence >> - 60kb custom repeat library for RepeatMasker >> >> >> >> For gene prediction I am running with a SNAP hmm I generated using CEGMA, GeneMark, and Augustus trained by their webservice. >> I have options est2genome and protein2genome turned on (=1) and use tRNAscan and snoscan. And other options as following: >> >> #-----MAKER Behavior Options >> max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage) <--- Is this reasonable? 
>> min_contig=1 #skip genome contigs below this length (under 10kb are often useless) >> >> pred_flank=200 #flank for extending evidence clusters sent to gene predictors >> pred_stats=0 #report AED and QI statistics for all predictions as well as models >> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) >> min_protein=0 #require at least this many amino acids in predicted proteins >> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no >> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no >> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no >> keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) >> >> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments) >> single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no >> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' >> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes >> >> The maker_bopts.ctl file is unchanged. >> >> (Basically I follow this guide https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md ) >> >> >> At the moment I am running this with openMPI as: >> >> mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_run1 -fix_nucleotides >> >> on 128 cores with 130GB of memory. >> >> >> First of all, are those options I use viable? >> >> Is it possible to guesstimate the runtime I can expect? 5 days? 20 days? And is it reasonable to use additional cores or will this not benefit much? 
>> >> Thanks for your insights, >> Florian >> >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From alejocn5 at gmail.com Tue Feb 16 15:17:40 2016 From: alejocn5 at gmail.com (=?UTF-8?Q?Alejandro_Cer=C3=B3n_Noriega?=) Date: Tue, 16 Feb 2016 16:17:40 -0500 Subject: [maker-devel] problem with the example Message-ID: hello i am Alejandro I have tried to follow the tutorial MAKER 1-I Copy the files in the data directories to a temporary directory where i run an example file. 2-I Type maker -CTL to generate generic MAKER control files (foto_1) 3-I edit the control files to include the path of the genome file ( hsap_contig.fasta from the example) (foto_2) then I give the paht maker maker_exe.ctl maker_opts.ctl maker_bopts.ctl (foto 3) that generate a expected folder hsap_contig.maker.output but when i whatn to look for the gff file i dont find it, inside the /data/hsap_contig.maker.output/hsap_contig_datastore, i dont find the all subdirectories seq_name.gff - a gff file that can be loaded into GMOD, GBROWSE, or Apollo * seq_name.maker.transcripts.fasta - a fasta file of the MAKER annotated transcript sequences * seq_name.maker.proteins.fasta - a fasta file of the MAKER annotated protein sequences * seq_name.maker.XXX.transcript.fasta - a fasta file of ab-initio predicted transcript sequences from program XXX * seq_name.maker.XXX.proteins.fasta - a fasta file of ab-inito predicted protein sequences from program XXX * seq_name.maker.non_overlapping_ab_initio.transcripts.fasta - a fasta file of filtered ab-inito transcript sequences that don't overlap maker 
annotations
* seq_name.maker.non_overlapping_ab_initio.proteins.fasta - a fasta file of filtered ab-initio protein sequences that don't overlap maker annotations
* theVoid.seq_name/ - a directory containing all of the raw output files produced by MAKER, including BLAST reports, SNAP output, exonerate output and the masked genomic sequence.

I only find a directory named 80 (foto_4).

I don't know if I did something wrong.

I also tried changing the path of the EST (foto_5).

Thanks for your attention.

--
Alejandro Cerón Noriega, B.Sc
MSc. Candidate Bioinformatics

-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_1.png
Type: image/png
Size: 67330 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_2.png
Type: image/png
Size: 257578 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Foto_3.png
Type: image/png
Size: 213241 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_4.png
Type: image/png
Size: 129352 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_5.png
Type: image/png
Size: 255944 bytes
Desc: not available
URL:

From carsonhh at gmail.com Thu Feb 18 13:36:13 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 18 Feb 2016 12:36:13 -0700
Subject: [maker-devel] problem with the example
In-Reply-To:
References:
Message-ID: <4CD9B36B-8C9D-4E48-B1B6-ACAFF28DF3B2@gmail.com>

To access files for individual sequences, use the datastore index:

/scratchsan/caceronn/Results/MAKER/data/hsap_contig.maker.output/hsap_contig_master_datastore_index.log

Look in that file to find the location of individual contig results.
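A minimal sketch of reading such an index, under the assumption that each line holds three tab-separated fields (sequence name, datastore directory, status) and that completed sequences are marked FINISHED. That layout is an assumption to check against your own file; the sample lines, contig names and the finished_dirs helper are made up for illustration and are not MAKER output or a MAKER API.

```python
def finished_dirs(index_text):
    """Map sequence name -> datastore directory for entries whose
    status column reads FINISHED (assumed three-column layout)."""
    dirs = {}
    for line in index_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # ignore anything that does not fit the assumed layout
        name, directory, status = fields
        if status == "FINISHED":
            dirs[name] = directory
    return dirs


# Hypothetical index content; real paths and names will differ.
SAMPLE_INDEX = "\n".join([
    "contig_A\thsap_contig_datastore/80/1B/contig_A/\tSTARTED",
    "contig_A\thsap_contig_datastore/80/1B/contig_A/\tFINISHED",
    "contig_B\thsap_contig_datastore/3C/44/contig_B/\tSTARTED",
])

for name, directory in finished_dirs(SAMPLE_INDEX).items():
    print(name, "->", directory)
```

With a real file, the text would come from reading the index log named above, and the per-sequence GFF and FASTA files would be looked for inside each listed directory, which is nested below hashed subdirectories such as the one named 80.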
For merged results you have to use the gff3_merge script together with the datastore index.

Here is a nice tutorial with step-by-step instructions and a video to easily follow along -->
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014

--Carson

> On Feb 16, 2016, at 2:17 PM, Alejandro Cerón Noriega wrote:
>
> hello i am Alejandro
>
> I have tried to follow the tutorial MAKER
>
> 1-I Copy the files in the data directories to a temporary directory where i run an example file.
> 2-I Type maker -CTL to generate generic MAKER control files (foto_1)
> 3-I edit the control files to include the path of the genome file ( hsap_contig.fasta from the example) (foto_2)
> then I give the paht maker maker_exe.ctl maker_opts.ctl maker_bopts.ctl (foto 3)
>
> that generate a expected folder
> hsap_contig.maker.output
>
> but when i whatn to look for the gff file i dont find it, inside the /data/hsap_contig.maker.output/hsap_contig_datastore, i dont find the all subdirectories
>
> seq_name.gff - a gff file that can be loaded into GMOD, GBROWSE,
> or Apollo
> * seq_name.maker.transcripts.fasta - a fasta file of the MAKER
> annotated transcript sequences
> * seq_name.maker.proteins.fasta - a fasta file of the MAKER
> annotated protein sequences
> * seq_name.maker.XXX.transcript.fasta - a fasta file of ab-initio
> predicted transcript sequences from program XXX
> * seq_name.maker.XXX.proteins.fasta - a fasta file of ab-inito
> predicted protein sequences from program XXX
> * seq_name.maker.non_overlapping_ab_initio.transcripts.fasta - a
> fasta file of filtered ab-inito transcript sequences that don't
> overlap maker annotations
> * seq_name.maker.non_overlapping_ab_initio.proteins.fasta - a
> fasta file of filtered ab-inito protein sequences that don't
> overlap maker annotations
> * theVoid.seq_name/ - a directory containing all of the raw
> output files produced by MAKER, including BLAST reports, SNAP
> output, exonnerate output and
the masked genomeic sequence. > > i only find a directorie named 80 (foto 4) > > i dont know if a make somthing wrong, > > also try to change the path of the EST (foto_5) > > > thanks for your attention > > > -- > Alejandro Cer?n Noriega, B.Sc > MSc. Candidate Bioinformatics > K ??? > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdolze at students.uni-mainz.de Fri Feb 26 08:16:10 2016 From: fdolze at students.uni-mainz.de (Florian) Date: Fri, 26 Feb 2016 15:16:10 +0100 Subject: [maker-devel] Possible to redirect maker output? Message-ID: <56D05E2A.1040201@students.uni-mainz.de> Hi all, I am trying to run maker on a cluster (2 nodes with 64 cores each), to speed things up I copied all input files to a ramdisk to reduce I/O time, but all subsequent results are still written to hdd. Is there a way I can tell maker to write the maker.results files to ramdisk (or generally any other directory than the current working dir) too? (are they actually used for the current run or are only files in the temp files location used?) Is anybody experienced with running maker on a similar setup and could tell me how you are handling this? thanks, Florian From scott at scottcain.net Fri Feb 26 11:50:06 2016 From: scott at scottcain.net (Scott Cain) Date: Fri, 26 Feb 2016 12:50:06 -0500 Subject: [maker-devel] GMOD 2016 meeting Message-ID: Hello all, I am pleased to announce that details have been finalized for the 2016 GMOD meeting. It will take place immediately following the Galaxy Community Conference at Indiana University in Bloomington, IN on June 30 and July 1. We're still working on agenda details, so if you have suggestions or would like to present, please let me know. 
For registration information, please see: https://gmod2016.eventbrite.com And for other information about the meeting, keep an eye on: http://gmod.org/wiki/Jun_2016_GMOD_Meeting I look forward to seeing you there! Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... URL: From gloriarendon at gmail.com Fri Feb 26 16:14:26 2016 From: gloriarendon at gmail.com (Gloria Rendon) Date: Fri, 26 Feb 2016 16:14:26 -0600 Subject: [maker-devel] MAKER/3.00.0-beta: missing some accessory scripts Message-ID: Hello, My name is Gloria Rendon. I work at the Carle Woese Institute for Genomic Biology at the University of Illinois at Urbana-Champaign. In recent months we used MAKER/3.00.0-beta to generate annotations (in GFF3 format) for a de-novo assembly that we produced in-house for the Taro plant. As part of the same project, I need to run now an analysis with RNA seq data for the same Tarospecies. I am going to use STAR for the alignment step and I need to provide the annotations file in GTF format, not in GFF3 format as I currently have. In order to perform the GFF3->GTF conversion I was planning to run some of the accessory scripts that come with MAKER add_utr_start_stop_gff gff3_2_gtf However I just noticed that my installation of MAKER is missing those two scripts. 
This is how the MAKER/bin folder looks now:

$ ls /home/groups/hpcbio/apps/maker/maker-3.00.0-beta/bin/
AED_cdf_generator.pl   ipr_update_gff          maker_map_ids
cegma2zff              iprscan2gff3            map2assembly
chado2gff3             maker                   map_data_ids
compare_gff3_to_chado  maker2chado             map_fasta_ids
cufflinks2gff3         maker2eval_gtf          map_gff_ids
evaluator              maker2jbrowse           match2gene.pl
fasta_merge            maker2wap               quality_filter.pl
fasta_tool             maker2zff               tophat2gff3
genemark_gtf2gff3      maker_functional_fasta
gff3_merge             maker_functional_gff

By the way, earlier versions of MAKER that are also installed on our cluster are also missing those scripts.

Could you please tell me how to remedy the situation?
Do you have executables of the two scripts that you can share with me?
OR
Do I need to re-install MAKER with special configuration options?

Thank you very much for the attention to this matter.

Sincerely,

Gloria
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Feb 29 13:09:14 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 29 Feb 2016 12:09:14 -0700
Subject: [maker-devel] Possible to redirect maker output?
In-Reply-To: <56D05E2A.1040201@students.uni-mainz.de>
References: <56D05E2A.1040201@students.uni-mainz.de>
Message-ID: <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com>

You can try setting TMP= in the control files to a RAM disk location (you will need a lot of RAM though, perhaps 500 GB). Even then some components used by MAKER may not function properly with tmpfs, but you can try. If it doesn't work you'll get an error. The main output directory, on the other hand, must be globally accessible to all nodes if working with MPI, and a RAM disk will only exist and be accessible on a single node (even though a directory with the same name may exist on multiple nodes, they will actually be separate and distinct locations, i.e. /dev/shm).
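A minimal sketch of the TMP= suggestion above, assuming a Linux node where /dev/shm is a tmpfs mount. The helper name `maker_tmp_line` and the `maker_tmp_<pid>` directory name are hypothetical choices for illustration only. The sketch creates a node-local scratch directory and prints the line to paste into maker_opts.ctl; it does not move the main output directory, which still has to live on storage that every MPI node can see.

```shell
#!/bin/sh
# Hypothetical helper: create a scratch directory and print the TMP= line
# for maker_opts.ctl.
# $1 = base directory for node-local scratch (assumed tmpfs, e.g. /dev/shm).
maker_tmp_line() {
    base=${1:-/dev/shm}
    scratch="$base/maker_tmp_$$"   # per-process name, an illustrative choice
    mkdir -p "$scratch" || return 1
    printf 'TMP=%s\n' "$scratch"
}

# Only touch /dev/shm when it exists and is writable on this node.
if [ -d /dev/shm ] && [ -w /dev/shm ]; then
    df -h /dev/shm          # check free space before relying on the RAM disk
    maker_tmp_line /dev/shm
fi
```

Because each node has its own /dev/shm, the directory would have to be created on every node that runs MAKER processes, for example at the start of the job script.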
--Carson

> On Feb 26, 2016, at 7:16 AM, Florian wrote:
>
> Hi all,
>
> I am trying to run maker on a cluster (2 nodes with 64 cores each), to speed things up I copied all input files to a ramdisk to reduce I/O time, but all subsequent results are still written to hdd.
>
> Is there a way I can tell maker to write the maker.results files to ramdisk (or generally any other directory than the current working dir) too? (are they actually used for the current run or are only files in the temp files location used?)
>
> Is anybody experienced with running maker on a similar setup and could tell me how you are handling this?
>
>
> thanks,
> Florian
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Mon Feb 29 13:17:29 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 29 Feb 2016 12:17:29 -0700
Subject: [maker-devel] MAKER/3.00.0-beta: missing some accessory scripts
In-Reply-To:
References:
Message-ID:

You should be using maker2eval_gtf. The scripts you mention were actually deprecated in MAKER 2.10 onwards (about 5 years ago). You may be looking at old documentation.

--Carson

> On Feb 26, 2016, at 3:14 PM, Gloria Rendon wrote:
>
> Hello,
>
> My name is Gloria Rendon. I work at the Carle Woese Institute for Genomic Biology at the University of Illinois at Urbana-Champaign.
>
> In recent months we used MAKER/3.00.0-beta to generate annotations (in GFF3 format) for a de-novo assembly that we produced in-house for the Taro plant.
>
> As part of the same project, I need to run now an analysis with RNA seq data for the same Tarospecies.
> I am going to use STAR for the alignment step and I need to provide the annotations file in GTF format, not in GFF3 format as I currently have.
> > In order to perform the GFF3->GTF conversion I was planning to run some of the accessory scripts that come with MAKER > > add_utr_start_stop_gff > gff3_2_gtf > > However I just noticed that my installation of MAKER is missing those two scripts. > This is how the MAKER/bin folder looks like now: > > $ ls /home/groups/hpcbio/apps/maker/maker-3.00.0-beta/bin/ > AED_cdf_generator.pl ipr_update_gff maker_map_ids > cegma2zff iprscan2gff3 map2assembly > chado2gff3 maker map_data_ids > compare_gff3_to_chado maker2chado map_fasta_ids > cufflinks2gff3 maker2eval_gtf map_gff_ids > evaluator maker2jbrowse match2gene.pl > fasta_merge maker2wap quality_filter.pl > fasta_tool maker2zff tophat2gff3 > genemark_gtf2gff3 maker_functional_fasta > gff3_merge maker_functional_gff > > > btw, earlier versions of MAKER that are also installed on our cluster as also missing those scripts. > > Could you please tell me how to remedy the situation? > Do you have executables of the two scripts that you can share with me? > OR > Do I need to re-install MAKER with special configuration options? > > Thank you very much for the attention to this matter. > > Sincerely, > > Gloria > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From hcma at uci.edu Thu Feb 4 17:52:12 2016 From: hcma at uci.edu (hcma) Date: Thu, 04 Feb 2016 16:52:12 -0800 Subject: [maker-devel] Q on MAKER In-Reply-To: <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> Message-ID: <99f6989955acdf6fd6b0875affbeefa9@uci.edu> Hi, I have a genome sequence and Trinity assembly for a new species and I am wondering what are the best steps to take when using MAKER? 1. 
I used the genome sequence and all assembled Trinity sequence to do first run of MAKER in order to generate training set for SNAP and Augustus. In maker_opts.ctl: genome=all-chromosome-r1.04.fasta est=Trinity.fasta est2genome=1 2. Train SNAP 3. Train Augustus When i train Augustus, i only supply genome and protein file, should i also supply the trinity file here? 4. what's the best parameter to use when running MAKER the second time for obtaining the final annotation? I would prefer not to use any external protein data. genome=all-chromosome-r1.04.fasta est=Trinity.fasta est2genome=0 SNAP Augustus Thanks. Best Regards KAren From carsonhh at gmail.com Fri Feb 5 07:36:06 2016 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 5 Feb 2016 07:36:06 -0700 Subject: [maker-devel] Q on MAKER In-Reply-To: <99f6989955acdf6fd6b0875affbeefa9@uci.edu> References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> Message-ID: Hi Karen, There are many ways to train Augustus. I prefer to identify gene models in MAKER (GFF3) and use those to train both SNAP and Augustus. Here is a previous post on the topic ?> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ In the end you need to look at the SNAP and Augustus models together with evidence alignments in a genome browser (like desktop Apollo). When everything is trained well, both SNAP and Augustus models will look like each other and both seem to look like the evidence alignments. Thanks, Carson > On Feb 4, 2016, at 5:52 PM, hcma wrote: > > Hi, > > I have a genome sequence and Trinity assembly for a new species and I am wondering what are the best steps to take when using MAKER? > > 1. I used the genome sequence and all assembled Trinity sequence to do first run of MAKER in order to generate training set for SNAP and Augustus. 
> > In maker_opts.ctl:
> genome=all-chromosome-r1.04.fasta
> est=Trinity.fasta
> est2genome=1
>
>
> 2. Train SNAP
>
> 3. Train Augustus
>
> When i train Augustus, i only supply genome and protein file, should i also supply the trinity file here?
>
>
> 4. what's the best parameter to use when running MAKER the second time for obtaining the final annotation? I would prefer not to use any external protein data.
>
> genome=all-chromosome-r1.04.fasta
> est=Trinity.fasta
> est2genome=0
> SNAP
> Augustus
>
> Thanks.
>
> Best Regards
> KAren
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hcma at uci.edu Fri Feb 5 15:42:37 2016
From: hcma at uci.edu (hcma)
Date: Fri, 05 Feb 2016 14:42:37 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
Message-ID: <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>

Hi Dr Holt,

Thanks for the email. Here is my pipeline; does it seem acceptable? Any comments are welcome and much appreciated.

1. Use MAKER to generate the training gene set:

genome=all-chromosome-r1.04.fasta
est=Trinity.fasta
est2genome=1

2. Use the output of MAKER to train SNAP:

maker2zff dwil-all-chromosome-r1.04.all.gff
fathom genome.ann genome.dna -gene-stats
fathom genome.ann genome.dna -categorize 1000
fathom genome.ann genome.dna -gene-stats
fathom uni.ann uni.dna -export 1000 -plus
hmm-assembler.pl genome . > dwil_genome.hmm

3. Use the output of MAKER to train Augustus on their webserver:

Files used:

Upload "export.dna" as the genome file
Upload "export.aa" as the protein file

4. Second and final MAKER run:

genome=all-chromosome-r1.04.fasta
est=Trinity.fasta
est2genome=0
Snaphmm=output of 2

How do I incorporate the output of the training gene set from the Augustus web server into this step 4?

Thanks for your time.
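The SNAP steps listed above could be scripted roughly like this. It is only a sketch following the usual SNAP training sequence (maker2zff, fathom, forge, hmm-assembler.pl). Note one assumption that goes beyond the message: it adds a `forge export.ann export.dna` step, which the list above does not show, because forge is what writes the parameter files that hmm-assembler.pl then assembles. The function name `snap_train` and the `RUN` switch are hypothetical; by default the commands are only printed so they can be reviewed first.

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) the SNAP training commands in order.
# $1 = MAKER GFF3 file passed to maker2zff, $2 = label for the resulting HMM.
snap_train() {
    gff=$1
    name=$2
    while IFS= read -r cmd; do
        echo "$cmd"
        if [ "${RUN:-0}" = 1 ]; then
            sh -c "$cmd" || return 1   # stop at the first failing step
        fi
    done <<EOF
maker2zff $gff
fathom genome.ann genome.dna -gene-stats
fathom genome.ann genome.dna -categorize 1000
fathom uni.ann uni.dna -export 1000 -plus
forge export.ann export.dna
hmm-assembler.pl $name . > $name.hmm
EOF
}

# Dry run with the file names from the message above.
snap_train dwil-all-chromosome-r1.04.all.gff dwil_genome
```

Running it with `RUN=1` in an empty working directory would execute the steps for real; forge writes many small files into the current directory, so a dedicated training directory keeps things tidy.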
Best Regards Karen On 2016-02-05 06:36, Carson Holt wrote: > Hi Karen, > > There are many ways to train Augustus. I prefer to identify gene > models in MAKER (GFF3) and use those to train both SNAP and Augustus. > Here is a previous post on the topic ?> > https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ > [1] > > In the end you need to look at the SNAP and Augustus models together > with evidence alignments in a genome browser (like desktop Apollo). > When everything is trained well, both SNAP and Augustus models will > look like each other and both seem to look like the evidence > alignments. > > Thanks, > Carson > >> On Feb 4, 2016, at 5:52 PM, hcma wrote: >> >> Hi, >> >> I have a genome sequence and Trinity assembly for a new species and >> I am wondering what are the best steps to take when using MAKER? >> >> 1. I used the genome sequence and all assembled Trinity sequence to >> do first run of MAKER in order to generate training set for SNAP and >> Augustus. >> >> In maker_opts.ctl: >> genome=all-chromosome-r1.04.fasta >> est=Trinity.fasta >> est2genome=1 >> >> 2. Train SNAP >> >> 3. Train Augustus >> >> When i train Augustus, i only supply genome and protein file, should >> i also supply the trinity file here? >> >> 4. what's the best parameter to use when running MAKER the second >> time for obtaining the final annotation? I would prefer not to use >> any external protein data. >> >> genome=all-chromosome-r1.04.fasta >> est=Trinity.fasta >> est2genome=0 >> SNAP >> Augustus >> >> Thanks. 
>>
>> Best Regards
>> KAren
>
>
>
> Links:
> ------
> [1]
> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ

From carsonhh at gmail.com Fri Feb 5 15:54:58 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 5 Feb 2016 15:54:58 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To: <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
Message-ID:

Augustus gives you an entire directory rather than just a single file like SNAP. You have to take the directory and copy it to the .../augustus/config/species/ directory.

Example:
.../augustus/config/species/arabidopsis/

Then "arabidopsis" would be the species name to use with MAKER.

Sometimes you may have to do a second round of both SNAP and Augustus training (called bootstrapping). Look at the models you get after the first round, and if they look good, then the second round is probably not going to be beneficial.

--Carson

> On Feb 5, 2016, at 3:42 PM, hcma wrote:
>
> Hi Dr Holt,
>
> Thanks for the email. Here is my pipeline, does it seems acceptable? Any comments is welcome and much appreciated.
>
>
> 1. Use maker to generate training gene set:
>
> genome=all-chromosome-r1.04.fasta
> est=Trinity.fasta
> est2genome=1
>
>
> 2. Use output of Maker to train SNAP:
>
> maker2zff dwil-all-chromosome-r1.04.all.gff
> fathom genome.ann genome.dna -gene-stats
> fathom genome.ann genome.dna -categorize 1000
> fathom genome.ann genome.dna -gene-stats
> fathom uni.ann uni.dna -export 1000 -plus
> hmm-assembler.pl genome . > dwil_genome.hmm
>
>
> 3. Use output of Maker to train Augustus on their webserver:
>
> File used:
>
> Upload "export.dna" as the genome file
> Upload "export.aa" as the protein file
>
>
>
> 4.
second and final Maker run: > > > genome=all-chromosome-r1.04.fasta > est=Trinity.fasta > est2genome=0 > Snaphmm=output of 2 > > How do i incorporate the output of training set of gene from Augustus web server here into this step 4? > > Thanks for your time. > > Best Regards > Karen > > > > > > > > > > > > On 2016-02-05 06:36, Carson Holt wrote: >> Hi Karen, >> There are many ways to train Augustus. I prefer to identify gene >> models in MAKER (GFF3) and use those to train both SNAP and Augustus. >> Here is a previous post on the topic ?> >> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >> [1] >> In the end you need to look at the SNAP and Augustus models together >> with evidence alignments in a genome browser (like desktop Apollo). >> When everything is trained well, both SNAP and Augustus models will >> look like each other and both seem to look like the evidence >> alignments. >> Thanks, >> Carson >>> On Feb 4, 2016, at 5:52 PM, hcma wrote: >>> Hi, >>> I have a genome sequence and Trinity assembly for a new species and >>> I am wondering what are the best steps to take when using MAKER? >>> 1. I used the genome sequence and all assembled Trinity sequence to >>> do first run of MAKER in order to generate training set for SNAP and >>> Augustus. >>> In maker_opts.ctl: >>> genome=all-chromosome-r1.04.fasta >>> est=Trinity.fasta >>> est2genome=1 >>> 2. Train SNAP >>> 3. Train Augustus >>> When i train Augustus, i only supply genome and protein file, should >>> i also supply the trinity file here? >>> 4. what's the best parameter to use when running MAKER the second >>> time for obtaining the final annotation? I would prefer not to use >>> any external protein data. >>> genome=all-chromosome-r1.04.fasta >>> est=Trinity.fasta >>> est2genome=0 >>> SNAP >>> Augustus >>> Thanks. 
>>> Best Regards >>> KAren >> Links: >> ------ >> [1] >> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ > From hcma at uci.edu Fri Feb 5 15:58:56 2016 From: hcma at uci.edu (hcma) Date: Fri, 05 Feb 2016 14:58:56 -0800 Subject: [maker-devel] Q on MAKER In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> Message-ID: <4b6492c5148151cc52c91f2d56c6532b@uci.edu> Hi Carlson, These are the list of directories under maker/2.31.8 bin data GMOD INSTALL lib LICENSE MWAS perl README RELEASE src Where can i find augustus/? Or i have to ask my system admin to install this? Thanks. Best Regards Karen On 2016-02-05 14:54, Carson Holt wrote: > Augustus gives you an entire directory rather than just a single file > like SNAP. You have to take the directory and copy it to the > .../augustus/config/species/ directory. > > Example: > ?/augustus/config/species/arabidopsis/ > > Then ?arabidopsis? would be the species name to use with MAKER. > > Sometimes you may have to do a second round of both SNAP and Augustus > training (called bootstrapping). Look at the models you get after the > first round, and if they look good then, the second round is probably > not going top be beneficial. > > ?Carson > > > >> On Feb 5, 2016, at 3:42 PM, hcma wrote: >> >> Hi Dr Holt, >> >> Thanks for the email. Here is my pipeline, does it seems acceptable? >> Any comments is welcome and much appreciated. >> >> >> 1. Use maker to generate training gene set: >> >> genome=all-chromosome-r1.04.fasta >> est=Trinity.fasta >> est2genome=1 >> >> >> 2. 
Use output of Maker to train SNAP: >> >> maker2zff dwil-all-chromosome-r1.04.all.gff >> fathom genome.ann genome.dna ?gene-stats >> fathom genome.ann genome.dna ?categorize 1000 >> fathom genome.ann genome.dna ?gene-stats >> fathom uni.ann uni.dna ?export 1000 ?plus >> hmm-assembler.pl genome . > dwil_genome.hmm >> >> >> 3. Use output of Maker to train Augustus on their webserver: >> >> File used: >> >> Upload ?export.dna? as the genome file >> Upload ?export.aa? as the protein file >> >> >> >> 4. second and final Maker run: >> >> >> genome=all-chromosome-r1.04.fasta >> est=Trinity.fasta >> est2genome=0 >> Snaphmm=output of 2 >> >> How do i incorporate the output of training set of gene from Augustus >> web server here into this step 4? >> >> Thanks for your time. >> >> Best Regards >> Karen >> >> >> >> >> >> >> >> >> >> >> >> On 2016-02-05 06:36, Carson Holt wrote: >>> Hi Karen, >>> There are many ways to train Augustus. I prefer to identify gene >>> models in MAKER (GFF3) and use those to train both SNAP and Augustus. >>> Here is a previous post on the topic ?> >>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >>> [1] >>> In the end you need to look at the SNAP and Augustus models together >>> with evidence alignments in a genome browser (like desktop Apollo). >>> When everything is trained well, both SNAP and Augustus models will >>> look like each other and both seem to look like the evidence >>> alignments. >>> Thanks, >>> Carson >>>> On Feb 4, 2016, at 5:52 PM, hcma wrote: >>>> Hi, >>>> I have a genome sequence and Trinity assembly for a new species and >>>> I am wondering what are the best steps to take when using MAKER? >>>> 1. I used the genome sequence and all assembled Trinity sequence to >>>> do first run of MAKER in order to generate training set for SNAP and >>>> Augustus. >>>> In maker_opts.ctl: >>>> genome=all-chromosome-r1.04.fasta >>>> est=Trinity.fasta >>>> est2genome=1 >>>> 2. 
Train SNAP
>>>> 3. Train Augustus
>>>> When i train Augustus, i only supply genome and protein file, should
>>>> i also supply the trinity file here?
>>>> 4. what's the best parameter to use when running MAKER the second
>>>> time for obtaining the final annotation? I would prefer not to use
>>>> any external protein data.
>>>> genome=all-chromosome-r1.04.fasta
>>>> est=Trinity.fasta
>>>> est2genome=0
>>>> SNAP
>>>> Augustus
>>>> Thanks.
>>>> Best Regards
>>>> KAren
>>> Links:
>>> ------
>>> [1]
>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ
>>

From carsonhh at gmail.com Fri Feb 5 16:03:56 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 5 Feb 2016 16:03:56 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To: <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
Message-ID: <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>

You need to find out where the augustus MAKER is using is installed. Check the maker_exe.ctl file you are using, or type "which augustus".

--Carson

> On Feb 5, 2016, at 3:58 PM, hcma wrote:
>
> Hi Carlson,
>
> These are the list of directories under maker/2.31.8
>
> bin data GMOD INSTALL lib LICENSE MWAS perl README RELEASE src
>
>
> Where can i find augustus/? Or i have to ask my system admin to install this?
>
> Thanks.
>
> Best Regards
> Karen
>
>
>
>
> On 2016-02-05 14:54, Carson Holt wrote:
>> Augustus gives you an entire directory rather than just a single file
>> like SNAP. You have to take the directory and copy it to the
>> .../augustus/config/species/ directory.
>> Example:
>> .../augustus/config/species/arabidopsis/
>> Then "arabidopsis" would be the species name to use with MAKER.
>> Sometimes you may have to do a second round of both SNAP and Augustus >> training (called bootstrapping). Look at the models you get after the >> first round, and if they look good then, the second round is probably >> not going top be beneficial. >> ?Carson >>> On Feb 5, 2016, at 3:42 PM, hcma wrote: >>> Hi Dr Holt, >>> Thanks for the email. Here is my pipeline, does it seems acceptable? Any comments is welcome and much appreciated. >>> 1. Use maker to generate training gene set: >>> genome=all-chromosome-r1.04.fasta >>> est=Trinity.fasta >>> est2genome=1 >>> 2. Use output of Maker to train SNAP: >>> maker2zff dwil-all-chromosome-r1.04.all.gff >>> fathom genome.ann genome.dna ?gene-stats >>> fathom genome.ann genome.dna ?categorize 1000 >>> fathom genome.ann genome.dna ?gene-stats >>> fathom uni.ann uni.dna ?export 1000 ?plus >>> hmm-assembler.pl genome . > dwil_genome.hmm >>> 3. Use output of Maker to train Augustus on their webserver: >>> File used: >>> Upload ?export.dna? as the genome file >>> Upload ?export.aa? as the protein file >>> 4. second and final Maker run: >>> genome=all-chromosome-r1.04.fasta >>> est=Trinity.fasta >>> est2genome=0 >>> Snaphmm=output of 2 >>> How do i incorporate the output of training set of gene from Augustus web server here into this step 4? >>> Thanks for your time. >>> Best Regards >>> Karen >>> On 2016-02-05 06:36, Carson Holt wrote: >>>> Hi Karen, >>>> There are many ways to train Augustus. I prefer to identify gene >>>> models in MAKER (GFF3) and use those to train both SNAP and Augustus. >>>> Here is a previous post on the topic ?> >>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >>>> [1] >>>> In the end you need to look at the SNAP and Augustus models together >>>> with evidence alignments in a genome browser (like desktop Apollo). 
>>>> When everything is trained well, both SNAP and Augustus models will
>>>> look like each other and both seem to look like the evidence
>>>> alignments.
>>>> Thanks,
>>>> Carson
>>>>> On Feb 4, 2016, at 5:52 PM, hcma wrote:
>>>>> Hi,
>>>>> I have a genome sequence and Trinity assembly for a new species and
>>>>> I am wondering what are the best steps to take when using MAKER?
>>>>> 1. I used the genome sequence and all assembled Trinity sequence to
>>>>> do first run of MAKER in order to generate training set for SNAP and
>>>>> Augustus.
>>>>> In maker_opts.ctl:
>>>>> genome=all-chromosome-r1.04.fasta
>>>>> est=Trinity.fasta
>>>>> est2genome=1
>>>>> 2. Train SNAP
>>>>> 3. Train Augustus
>>>>> When i train Augustus, i only supply genome and protein file, should
>>>>> i also supply the trinity file here?
>>>>> 4. what's the best parameter to use when running MAKER the second
>>>>> time for obtaining the final annotation? I would prefer not to use
>>>>> any external protein data.
>>>>> genome=all-chromosome-r1.04.fasta
>>>>> est=Trinity.fasta
>>>>> est2genome=0
>>>>> SNAP
>>>>> Augustus
>>>>> Thanks.
>>>>> Best Regards
>>>>> KAren
>>>> Links:
>>>> ------
>>>> [1]
>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ
>

From hcma at uci.edu Fri Feb 5 16:20:26 2016
From: hcma at uci.edu (hcma)
Date: Fri, 05 Feb 2016 15:20:26 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To: <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
Message-ID: <5a40b7af9947dc8297046ba52620569e@uci.edu>

Hi Carson,

Thanks for the instructions. In maker_exe.ctl I only see a path to snap, but not to augustus, so my system admin is checking this for me.
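A rough way to run the same check from the shell is sketched below. It assumes that maker_exe.ctl records the executable on a line of the form `augustus=/path #comment`, and that Augustus uses its usual layout with `bin/` and `config/` side by side unless the AUGUSTUS_CONFIG_PATH environment variable points elsewhere. The function name `find_augustus` is hypothetical.

```shell
#!/bin/sh
# Report where MAKER would find Augustus and where species directories go.
# $1 = path to maker_exe.ctl (defaults to the file in the current directory).
find_augustus() {
    ctl=${1:-maker_exe.ctl}
    # 1. Path recorded in the control file, with any trailing comment removed.
    exe=$(sed -n 's/^augustus=\([^#]*\).*/\1/p' "$ctl" 2>/dev/null | tr -d ' \t')
    # 2. Fall back to whatever is on PATH.
    [ -n "$exe" ] || exe=$(command -v augustus 2>/dev/null)
    if [ -z "$exe" ]; then
        echo "augustus: not found"
        return 1
    fi
    echo "augustus: $exe"
    # Assumed layout: <prefix>/bin/augustus next to <prefix>/config, unless
    # AUGUSTUS_CONFIG_PATH overrides the config location.
    cfg=${AUGUSTUS_CONFIG_PATH:-$(dirname "$(dirname "$exe")")/config}
    echo "species dir: $cfg/species"
}

find_augustus maker_exe.ctl || true
```

If a species directory is reported, the directory returned by the Augustus training would be copied into it (for example with `cp -r`), and its name is then the value to give to `augustus_species=` in maker_opts.ctl, as described in the earlier reply.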
From some manual i found, people use both snap and augustus when using MAKER to annotate genomes. Would you recommend using both or one of the 2 is sufficient? Thanks for your valuable time and advise. Best Regards Karen On 2016-02-05 15:03, Carson Holt wrote: > You need to find out where the augustus MAKER is using is installed. > Check the maker_exe.ctl file you are using, or type ?which augustus?. > > ?Carson > > >> On Feb 5, 2016, at 3:58 PM, hcma wrote: >> >> Hi Carlson, >> >> These are the list of directories under maker/2.31.8 >> >> bin data GMOD INSTALL lib LICENSE MWAS perl README RELEASE >> src >> >> >> Where can i find augustus/? Or i have to ask my system admin to >> install this? >> >> Thanks. >> >> Best Regards >> Karen >> >> >> >> >> On 2016-02-05 14:54, Carson Holt wrote: >>> Augustus gives you an entire directory rather than just a single file >>> like SNAP. You have to take the directory and copy it to the >>> .../augustus/config/species/ directory. >>> Example: >>> ?/augustus/config/species/arabidopsis/ >>> Then ?arabidopsis? would be the species name to use with MAKER. >>> Sometimes you may have to do a second round of both SNAP and Augustus >>> training (called bootstrapping). Look at the models you get after the >>> first round, and if they look good then, the second round is probably >>> not going top be beneficial. >>> ?Carson >>>> On Feb 5, 2016, at 3:42 PM, hcma wrote: >>>> Hi Dr Holt, >>>> Thanks for the email. Here is my pipeline, does it seems acceptable? >>>> Any comments is welcome and much appreciated. >>>> 1. Use maker to generate training gene set: >>>> genome=all-chromosome-r1.04.fasta >>>> est=Trinity.fasta >>>> est2genome=1 >>>> 2. 
Use output of Maker to train SNAP: >>>> maker2zff dwil-all-chromosome-r1.04.all.gff >>>> fathom genome.ann genome.dna ?gene-stats >>>> fathom genome.ann genome.dna ?categorize 1000 >>>> fathom genome.ann genome.dna ?gene-stats >>>> fathom uni.ann uni.dna ?export 1000 ?plus >>>> hmm-assembler.pl genome . > dwil_genome.hmm >>>> 3. Use output of Maker to train Augustus on their webserver: >>>> File used: >>>> Upload ?export.dna? as the genome file >>>> Upload ?export.aa? as the protein file >>>> 4. second and final Maker run: >>>> genome=all-chromosome-r1.04.fasta >>>> est=Trinity.fasta >>>> est2genome=0 >>>> Snaphmm=output of 2 >>>> How do i incorporate the output of training set of gene from >>>> Augustus web server here into this step 4? >>>> Thanks for your time. >>>> Best Regards >>>> Karen >>>> On 2016-02-05 06:36, Carson Holt wrote: >>>>> Hi Karen, >>>>> There are many ways to train Augustus. I prefer to identify gene >>>>> models in MAKER (GFF3) and use those to train both SNAP and >>>>> Augustus. >>>>> Here is a previous post on the topic ?> >>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >>>>> [1] >>>>> In the end you need to look at the SNAP and Augustus models >>>>> together >>>>> with evidence alignments in a genome browser (like desktop Apollo). >>>>> When everything is trained well, both SNAP and Augustus models will >>>>> look like each other and both seem to look like the evidence >>>>> alignments. >>>>> Thanks, >>>>> Carson >>>>>> On Feb 4, 2016, at 5:52 PM, hcma wrote: >>>>>> Hi, >>>>>> I have a genome sequence and Trinity assembly for a new species >>>>>> and >>>>>> I am wondering what are the best steps to take when using MAKER? >>>>>> 1. I used the genome sequence and all assembled Trinity sequence >>>>>> to >>>>>> do first run of MAKER in order to generate training set for SNAP >>>>>> and >>>>>> Augustus. 
>>>>>> In maker_opts.ctl: >>>>>> genome=all-chromosome-r1.04.fasta >>>>>> est=Trinity.fasta >>>>>> est2genome=1 >>>>>> 2. Train SNAP >>>>>> 3. Train Augustus >>>>>> When i train Augustus, i only supply genome and protein file, >>>>>> should >>>>>> i also supply the trinity file here? >>>>>> 4. what's the best parameter to use when running MAKER the second >>>>>> time for obtaining the final annotation? I would prefer not to use >>>>>> any external protein data. >>>>>> genome=all-chromosome-r1.04.fasta >>>>>> est=Trinity.fasta >>>>>> est2genome=0 >>>>>> SNAP >>>>>> Augustus >>>>>> Thanks. >>>>>> Best Regards >>>>>> KAren >>>>> Links: >>>>> ------ >>>>> [1] >>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >>

From carsonhh at gmail.com Fri Feb 5 16:33:23 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 5 Feb 2016 16:33:23 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To: <5a40b7af9947dc8297046ba52620569e@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu>
Message-ID: <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>

I recommend using both. You probably don't have augustus installed.

--Carson

Sent from my iPhone

From dcard at uta.edu Mon Feb 8 09:05:21 2016
From: dcard at uta.edu (Card, Daren C)
Date: Mon, 8 Feb 2016 10:05:21 -0600
Subject: [maker-devel] Most scaffolds fail with BadParameter error, Maker on TACC Lonestar
Message-ID: <38614065-4DEF-47B4-8100-BD18901D2592@uta.edu>

Hello,

I've tried to run Maker on TACC Lonestar (4, trying to squeeze some last things in before deprecation), but I haven't had much success. I get Maker to run, but only 28 proteins/transcripts are annotated and most scaffolds fail to finish properly, according to the ...master_datastore_index.log. In my STDERR, I see a consistent error show up for many scaffolds:

------------- EXCEPTION: Bio::Root::BadParameter -------------
MSG: ' 7.5' is not a valid score
VALUE: 7.5
STACK: Error::throw
STACK: Bio::Root::Root::throw /opt/apps/maker/2.30/bin/../perl/lib/Bio/Root/Root.pm:486
STACK: Bio::SeqFeature::Generic::score /opt/apps/maker/2.30/bin/../perl/lib/Bio/SeqFeature/Generic.pm:468
STACK: GFFDB::_ary_to_features /opt/apps/maker/2.30/bin/../lib/GFFDB.pm:891
STACK: GFFDB::phathits_on_chunk /opt/apps/maker/2.30/bin/../lib/GFFDB.pm:534
STACK: Process::MpiChunk::_go /opt/apps/maker/2.30/bin/../lib/Process/MpiChunk.pm:756
STACK: Process::MpiChunk::run /opt/apps/maker/2.30/bin/../lib/Process/MpiChunk.pm:341
STACK: main::node_thread /opt/apps/maker/2.30/bin/maker:1433
STACK: threads::new /opt/apps/maker/2.30/bin/../perl/lib/forks.pm:799
STACK: /opt/apps/maker/2.30/bin/maker:901
--------------------------------------------------------------
--> rank=18, hostname=c304-113.ls4.tacc.utexas.edu
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED
CONTIG:scaffold279|size418813

The '7.5' value can vary between errors, but other than that and the scaffold ID, the rest of the error message is the same. I obviously don't have the expertise to diagnose the issue here, but I'm hoping someone can help me sort this out.

A quick, unrelated question is whether the Yandell lab (or anyone else) has a script that will produce a CDS file (multi-FASTA file) from a GFF annotation and FASTA genome sequence. I'm trying to produce a CDS file from some NCBI genomes (annoying that it isn't already included from NCBI), but the script I wrote to do this is giving some suspect results. I figured if anyone had a well-tested script for this purpose, it would be someone on this list.

Best,
Daren

Daren Card
Ph.D. Candidate
Castoe Lab
University of Texas at Arlington
dcard at uta.edu
www.darencard.net

From carsonhh at gmail.com Mon Feb 8 09:31:08 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 8 Feb 2016 09:31:08 -0700
Subject: [maker-devel] Most scaffolds fail with BadParameter error, Maker on TACC Lonestar
In-Reply-To: <38614065-4DEF-47B4-8100-BD18901D2592@uta.edu>
References: <38614065-4DEF-47B4-8100-BD18901D2592@uta.edu>
Message-ID: <9BA957A0-DD0F-4920-A778-65D0DE10F1ED@gmail.com>

It's failing because there is something wrong with the format of the input GFF file. It might not be GFF3, it may be GTF format, it may have mixed types (not just gene/mRNA/exon/CDS models), or it may be missing a Parent= or ID= tag required to generate the proper feature relationship. You can try to use GAL (http://www.sequenceontology.org/software/GAL.html) to help validate or convert the format.

Also note the message --> MSG: ' 7.5' is not a valid score

There is an extra whitespace character inside the single quotes, which probably means you have contaminating whitespace before the value. GFF3 is tab delimited, space characters are not permitted, and if required must be escaped following URI escaping convention.
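
The whitespace problem described above can be checked for directly. The following is only a minimal sketch, assuming the GFF3 file given to MAKER is available locally; the function name check_gff3_scores and the file name input.gff are hypothetical and are not part of MAKER or GAL. It reports feature lines whose score column (column 6) contains a space character, and it is not a substitute for a full validator such as GAL.

```shell
#!/bin/sh
# check_gff3_scores FILE   (hypothetical helper, not shipped with MAKER)
# Prints one message for every nine-column feature line whose score
# field (column 6) contains a space, e.g. ' 7.5' instead of '7.5'.
check_gff3_scores() {
    awk -F '\t' '
        /^#/   { next }   # skip comments and ## directives
        NF < 9 { next }   # skip anything that is not a feature line (e.g. FASTA)
        $6 ~ / / {
            printf "line %d: score field [%s] contains whitespace\n", NR, $6
        }
    ' "$1"
}
```

Usage would look like `check_gff3_scores input.gff`; no output means no score field with embedded spaces was found, which says nothing about other format problems such as missing ID= or Parent= tags.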
--Carson

> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From hcma at uci.edu Tue Feb 9 15:35:13 2016
From: hcma at uci.edu (hcma)
Date: Tue, 09 Feb 2016 14:35:13 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To: <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu> <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
Message-ID: <7e4d6f2773f654f8530155936b648832@uci.edu>

Hi Carson,

For the final run of annotation, I would like to incorporate tophat results from RNA-seq data. From your experience, do you know if it is better to use raw RNA-seq (Illumina paired-end data) or trimmed (trimmed using Trimmomatic) data for feeding into tophat? If trimmed, do you recommend a particular programme?

Thanks for your time.

Best Regards
Karen

From jgallant at msu.edu Tue Feb 9 19:36:51 2016
From: jgallant at msu.edu (Jason Gallant)
Date: Wed, 10 Feb 2016 02:36:51 +0000
Subject: [maker-devel] Extract FASTA Sequences from "Maker Standard" Build
Message-ID:

Hi Everyone,

Quick question... I've run through Mike Campbell's tutorial on building "Maker Standard", "Maker Default" and "Maker Max" datasets. I've decided that the "Maker Standard" data (Transcripts with Evidence and/or IPR scan hits) makes the most sense for what we're trying to do.

Is there an easy way to create the fasta files associated with the maker standard build? Fasta_merge typically outputs a variety of .fasta files, which I've been able to create following this protocol for the "maker max" dataset. I'd like to get these for the "maker standard" build.

Currently, the datastore contains the data for the "maker max" data.
One way, I suppose, would be to re-run MAKER with the maker standard gff file, but it seems like an overly complicated way of doing it...

Any suggestions, Mike (or others)? Has anyone written a script to do this automagically?

Best,
Jason Gallant

From michael.s.campbell1 at gmail.com Wed Feb 10 07:03:29 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 10 Feb 2016 09:03:29 -0500
Subject: [maker-devel] Extract FASTA Sequences from "Maker Standard" Build
In-Reply-To:
References:
Message-ID: <2F89E4BC-C473-40A9-AE81-EAA2323B17D0@gmail.com>

Hi Jason,

Rerunning MAKER with the standard gff3 file would work, but for speed I would use the fasta_tool accessory script that is bundled with MAKER. All you need to make is a file with the list of transcript names from the standard gff3. Then you can use fasta_tool with the --select option to return all of the FASTA sequences that are in the list. The command would look like this:

PATH_TO_MAKER/maker/bin/fasta_tool --select id_file.txt max_transcripts.fasta | PATH_TO_MAKER/maker/bin/fasta_tool --wrap 80 > standard_transcripts.fasta

fasta_tool outputs unwrapped FASTA by default, so I generally pipe the output back through fasta_tool to wrap the text. The above command line wraps the sequence at 80 characters.

You can use a perl one-liner like this one to make the id file:

perl -lane 'if ($F[2] eq "mRNA") { my ($id) = $_ =~ /Name=(\S+?);/; print $id if defined $id; }' maker_standard.gff > id_file.txt

If you use these command lines, make sure you type them out yourself; email programs have a tendency to change characters slightly, making copy/pasted commands fail.

Thanks,
Mike

From michael.s.campbell1 at gmail.com Wed Feb 10 07:17:11 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 10 Feb 2016 09:17:11 -0500
Subject: [maker-devel] Q on MAKER
In-Reply-To: <7e4d6f2773f654f8530155936b648832@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu> <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com> <7e4d6f2773f654f8530155936b648832@uci.edu>
Message-ID: <7495272C-476A-4985-8D49-30D991410535@gmail.com>

Hi Karen,

From my experience, trimming reads will not make things worse and it generally makes things better. As far as the best program to use, one doesn't really stand out above the others as far as I can tell.
However, with paired end reads it is important to use a trimmer that preserves the pairing between the two files (i.e. when an entire read is discarded, the paired read is moved into a file for singletons).

Thanks
Mike

From hcma at uci.edu Wed Feb 10 15:27:41 2016
From: hcma at uci.edu (hcma)
Date: Wed, 10 Feb 2016 14:27:41 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To: <7495272C-476A-4985-8D49-30D991410535@gmail.com>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu> <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com> <7e4d6f2773f654f8530155936b648832@uci.edu> <7495272C-476A-4985-8D49-30D991410535@gmail.com>
Message-ID: <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>

Hi Mike,

Thanks for the reply. So I can input raw RNA-seq reads to Tophat and feed the output to maker?

Thanks.

Best Regards
Karen

From carsonhh at gmail.com Wed Feb 10 19:32:00 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 10 Feb 2016 19:32:00 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To: <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu> <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com> <7e4d6f2773f654f8530155936b648832@uci.edu> <7495272C-476A-4985-8D49-30D991410535@gmail.com> <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
Message-ID:

I find tophat results to be too noisy, and prefer cufflinks. There is both a tophat2gff and cufflinks2gff script that comes with MAKER. Also consider assembling the reads with Trinity (my overall preferred method because it yields the highest specificity).

--Carson

Sent from my iPhone

> On Feb 10, 2016, at 3:27 PM, hcma wrote: > > Hi Mike, > > Thanks for the reply. So i can input raw RNA-seq reads to Tophat and feed the output to maker? > > Thanks. > > Best Regards > KAren > > > >> On 2016-02-10 06:17, Michael Campbell wrote: >> HI Karen, >> From my experience trimming reads will not make things worse and it >> generally makes things better. As far as the best program to use, one >> doesn't really stand out above the others as far as I can tell. >> However, with paired end reads it is important to use a trimmer that >> preserves the pairing between the two files (i.e when an entire read >> is discarded the paired read is moved into a file for singletons).
>> Thanks >> Mike >>> On Feb 9, 2016, at 5:35 PM, hcma wrote: >>> Hi Carson, >>> For the final run of annotation, I would like to incorporate tophat results from RNA-seq data, from your experience, do you know if it is better to use raw RNA-seq (Illumina paired-end data) or trimmed (trimmed using Trimmomatuc) data for feeding into tophat? If trimmed, do you recommend a particular programme? >>> Thanks for your time. >>> Best Regards >>> KAren >>>> On 2016-02-05 15:33, Carson Holt wrote: >>>> I recommend using both. You probably don't have augustus installed. >>>> --Carson >>>> Sent from my iPhone >>>>> On Feb 5, 2016, at 4:20 PM, hcma wrote: >>>>> Hi Carlson, >>>>> Thanks for the instruction and in maker_exe.ctl, i only see path to snap, but not to augustus, so my system admin is checking this for me. >>>>> From some manual i found, people use both snap and augustus when using MAKER to annotate genomes. Would you recommend using both or one of the 2 is sufficient? >>>>> Thanks for your valuable time and advise. >>>>> Best Regards >>>>> Karen >>>>>> On 2016-02-05 15:03, Carson Holt wrote: >>>>>> You need to find out where the augustus MAKER is using is installed. >>>>>> Check the maker_exe.ctl file you are using, or type ?which augustus?. >>>>>> ?Carson >>>>>>> On Feb 5, 2016, at 3:58 PM, hcma wrote: >>>>>>> Hi Carlson, >>>>>>> These are the list of directories under maker/2.31.8 >>>>>>> bin data GMOD INSTALL lib LICENSE MWAS perl README RELEASE src >>>>>>> Where can i find augustus/? Or i have to ask my system admin to install this? >>>>>>> Thanks. >>>>>>> Best Regards >>>>>>> Karen >>>>>>>> On 2016-02-05 14:54, Carson Holt wrote: >>>>>>>> Augustus gives you an entire directory rather than just a single file >>>>>>>> like SNAP. You have to take the directory and copy it to the >>>>>>>> .../augustus/config/species/ directory. >>>>>>>> Example: >>>>>>>> ?/augustus/config/species/arabidopsis/ >>>>>>>> Then ?arabidopsis? would be the species name to use with MAKER. 
>>>>>>>> Sometimes you may have to do a second round of both SNAP and Augustus >>>>>>>> training (called bootstrapping). Look at the models you get after the >>>>>>>> first round, and if they look good then, the second round is probably >>>>>>>> not going top be beneficial. >>>>>>>> ?Carson >>>>>>>>> On Feb 5, 2016, at 3:42 PM, hcma wrote: >>>>>>>>> Hi Dr Holt, >>>>>>>>> Thanks for the email. Here is my pipeline, does it seems acceptable? Any comments is welcome and much appreciated. >>>>>>>>> 1. Use maker to generate training gene set: >>>>>>>>> genome=all-chromosome-r1.04.fasta >>>>>>>>> est=Trinity.fasta >>>>>>>>> est2genome=1 >>>>>>>>> 2. Use output of Maker to train SNAP: >>>>>>>>> maker2zff dwil-all-chromosome-r1.04.all.gff >>>>>>>>> fathom genome.ann genome.dna ?gene-stats >>>>>>>>> fathom genome.ann genome.dna ?categorize 1000 >>>>>>>>> fathom genome.ann genome.dna ?gene-stats >>>>>>>>> fathom uni.ann uni.dna ?export 1000 ?plus >>>>>>>>> hmm-assembler.pl genome . > dwil_genome.hmm >>>>>>>>> 3. Use output of Maker to train Augustus on their webserver: >>>>>>>>> File used: >>>>>>>>> Upload ?export.dna? as the genome file >>>>>>>>> Upload ?export.aa? as the protein file >>>>>>>>> 4. second and final Maker run: >>>>>>>>> genome=all-chromosome-r1.04.fasta >>>>>>>>> est=Trinity.fasta >>>>>>>>> est2genome=0 >>>>>>>>> Snaphmm=output of 2 >>>>>>>>> How do i incorporate the output of training set of gene from Augustus web server here into this step 4? >>>>>>>>> Thanks for your time. >>>>>>>>> Best Regards >>>>>>>>> Karen >>>>>>>>>> On 2016-02-05 06:36, Carson Holt wrote: >>>>>>>>>> Hi Karen, >>>>>>>>>> There are many ways to train Augustus. I prefer to identify gene >>>>>>>>>> models in MAKER (GFF3) and use those to train both SNAP and Augustus. 
>>>>>>>>>> Here is a previous post on the topic ?> >>>>>>>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >>>>>>>>>> [1] >>>>>>>>>> In the end you need to look at the SNAP and Augustus models together >>>>>>>>>> with evidence alignments in a genome browser (like desktop Apollo). >>>>>>>>>> When everything is trained well, both SNAP and Augustus models will >>>>>>>>>> look like each other and both seem to look like the evidence >>>>>>>>>> alignments. >>>>>>>>>> Thanks, >>>>>>>>>> Carson >>>>>>>>>>> On Feb 4, 2016, at 5:52 PM, hcma wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> I have a genome sequence and Trinity assembly for a new species and >>>>>>>>>>> I am wondering what are the best steps to take when using MAKER? >>>>>>>>>>> 1. I used the genome sequence and all assembled Trinity sequence to >>>>>>>>>>> do first run of MAKER in order to generate training set for SNAP and >>>>>>>>>>> Augustus. >>>>>>>>>>> In maker_opts.ctl: >>>>>>>>>>> genome=all-chromosome-r1.04.fasta >>>>>>>>>>> est=Trinity.fasta >>>>>>>>>>> est2genome=1 >>>>>>>>>>> 2. Train SNAP >>>>>>>>>>> 3. Train Augustus >>>>>>>>>>> When i train Augustus, i only supply genome and protein file, should >>>>>>>>>>> i also supply the trinity file here? >>>>>>>>>>> 4. what's the best parameter to use when running MAKER the second >>>>>>>>>>> time for obtaining the final annotation? I would prefer not to use >>>>>>>>>>> any external protein data. >>>>>>>>>>> genome=all-chromosome-r1.04.fasta >>>>>>>>>>> est=Trinity.fasta >>>>>>>>>>> est2genome=0 >>>>>>>>>>> SNAP >>>>>>>>>>> Augustus >>>>>>>>>>> Thanks. 
>>>>>>>>>>> Best Regards >>>>>>>>>>> KAren >>>>>>>>>> Links: >>>>>>>>>> ------ >>>>>>>>>> [1] >>>>>>>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ > From fdolze at students.uni-mainz.de Thu Feb 11 03:43:51 2016 From: fdolze at students.uni-mainz.de (Florian) Date: Thu, 11 Feb 2016 11:43:51 +0100 Subject: [maker-devel] Maker-run with no clean finish on openMPI-cluster Message-ID: <56BC65E7.6000904@students.uni-mainz.de> Hi all, I am no expert for MPI so maybe this is something very trivial or maybe not caused by MAKER at all but I'd be glad to have your thoughts on this. I installed MAKER 2.31.8 with MPI support (openMPI 1.8.1) on our cluster. I ran maker with the options attached and the command in bsub_maker, and I _think_ it worked fine. Here is the last output of maker: running exonerate search. #--------- command -------------# Widget::exonerate::protein2genome: /gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/exe/exonerate/bin/exonerate -q /project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_fil es/maker_yZhQlA/49/sp%7CQ4JHE0%7CXB36_ORYSJ.for.114901-115619.49.fasta -t /project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_fil es/maker_yZhQlA/49/scaffold299_size115619.114901-115619.49.fasta -Q protein -T dna -m protein2genome --softmasktarget --percent 20 - -showcigar > /project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_files/maker_yZhQlA/49/scaffold299_size115619.114901-115619.sp%7 CQ4JHE0%7CXB36_ORYSJ.p.exonerate #-------------------------------# cleaning blastx... in cluster::shadow_cluster... ...finished clustering. in cluster::shadow_cluster... ...finished clustering. cleaning clusters.... total clusters:37 now processing 0 ...processing 0 of 11 ...processing 1 of 11 ...processing 2 of 11 ...processing 3 of 11 ... ...processing 174 of 177 ...processing 175 of 177 ...processing 176 of 177 flattening protein clusters prepare section files Maker is now finished!!! 
Start_time: 1454700985
End_time: 1455023070
Elapsed: 322085

but my cluster job didn't finish here, instead I got the following errors until my runtime limit of 5 days was reached:

Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
[a0238:09542] *** Process received signal ***
[a0238:09542] Signal: Segmentation fault (11)
[a0238:09542] Signal code: Address not mapped (1)
[a0238:09542] Failing at address: 0xa80
[a0238:09542] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2ba955727710]
[a0238:09542] [ 1] /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x2ba954715002]
[a0238:09542] [ 2] /lib64/libpthread.so.0(+0xf710)[0x2ba955727710]
[a0238:09542] [ 3] /lib64/libc.so.6(__poll+0x53)[0x2ba955a170d3]
[a0238:09542] [ 4] /cluster/mpi/gcc_4.4.7/OpenMPI-1.8.1/lib/libopen-pal.so.6(+0x6cfca)[0x2ba955fb4fca]
[a0238:09542] [ 5] /cluster/mpi/gcc_4.4.7/OpenMPI-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x2ba955fabf11]
[a0238:09542] [ 6] /cluster/mpi/gcc_4.4.7/OpenMPI-1.8.1/lib/libopen-rte.so.7(+0x376ae)[0x2ba955d076ae]
[a0238:09542] [ 7] /lib64/libpthread.so.0(+0x79d1)[0x2ba95571f9d1]
[a0238:09542] [ 8] /lib64/libc.so.6(clone+0x6d)[0x2ba955a208fd]
[a0238:09542] *** End of error message ***
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
SIGTERM received
...

maybe someone experienced something similar before or can give me some hint if this is caused by my setup or by maker.

kind regards,
Florian Dolze
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
#-----BLAST and Exonerate Statistics Thresholds
blast_type=ncbi+ #set to 'ncbi+', 'ncbi' or 'wublast'
pcov_blastn=0.8 #Blastn Percent Coverage Threhold EST-Genome Alignments
pid_blastn=0.85 #Blastn Percent Identity Threshold EST-Genome Aligments
eval_blastn=1e-10 #Blastn eval cutoff
bit_blastn=40 #Blastn bit cutoff
depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff)
pcov_blastx=0.5 #Blastx Percent Coverage Threhold Protein-Genome Alignments
pid_blastx=0.4 #Blastx Percent Identity Threshold Protein-Genome Aligments
eval_blastx=1e-06 #Blastx eval cutoff
bit_blastx=30 #Blastx bit cutoff
depth_blastx=0 #Blastx depth cutoff (0 to disable cutoff)
pcov_tblastx=0.8 #tBlastx Percent Coverage Threhold alt-EST-Genome Alignments
pid_tblastx=0.85 #tBlastx Percent Identity Threshold alt-EST-Genome Aligments
eval_tblastx=1e-10 #tBlastx eval cutoff
bit_tblastx=40 #tBlastx bit cutoff
depth_tblastx=0 #tBlastx depth cutoff (0 to disable cutoff)
pcov_rm_blastx=0.5 #Blastx Percent Coverage Threhold For Transposable Element Masking
pid_rm_blastx=0.4 #Blastx Percent Identity Threshold For Transposbale Element Masking
eval_rm_blastx=1e-06 #Blastx eval cutoff for transposable element masking
bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking
ep_score_limit=20 #Exonerate protein percent of maximal score threshold
en_score_limit=20 #Exonerate nucleotide percent of maximal score threshold
-------------- next part --------------
#-----Location of Executables Used by MAKER/EVALUATOR
makeblastdb=/cluster/Apps/bioinf/BLAST/2.2.28/bin/makeblastdb #location of NCBI+ makeblastdb executable
blastn=/cluster/Apps/bioinf/BLAST/2.2.28/bin/blastn #location of NCBI+ blastn executable
blastx=/cluster/Apps/bioinf/BLAST/2.2.28/bin/blastx #location of NCBI+ blastx executable
tblastx=/cluster/Apps/bioinf/BLAST/2.2.28/bin/tblastx #location of NCBI+ tblastx executable
formatdb= #location of NCBI formatdb executable
blastall= #location of NCBI blastall executable
xdformat= #location of WUBLAST xdformat executable
blasta= #location of WUBLAST blasta executable
RepeatMasker=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/bin/../exe/RepeatMasker/RepeatMasker #location of RepeatMasker executable
exonerate=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/bin/../exe/exonerate/bin/exonerate #location of exonerate executable

#-----Ab-initio Gene Prediction Algorithms
snap=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/bin/../exe/snap/snap #location of snap executable
gmhmme3=/project/molgen/Maker_additional_tools/genemark-4.32/gmhmme3 #location of eukaryotic genemark executable
gmhmmp= #location of prokaryotic genemark executable
augustus=/project/molgen/Maker_additional_tools/augustus-3.2.1/bin/augustus #location of augustus executable
fgenesh= #location of fgenesh executable
tRNAscan-SE=/project/molgen/Maker_additional_tools/tRNAscan/bin/tRNAscan-SE #location of trnascan executable
snoscan=/project/molgen/Maker_additional_tools/snoscan/bin/snoscan #location of snoscan executable

#-----Other Algorithms
probuild=/project/molgen/Maker_additional_tools/genemark-4.32/probuild #location of probuild executable (required for genemark)
-------------- next part --------------
#-----Genome (these are always required)
genome= /project/molgen/workbench_Florian/riparius_MAKER_v2/Crip_genome_v20_newHead.fa
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=/project/molgen/workbench_Florian/riparius_MAKER_v2/riparius_cDNA_formatedHeader.fa #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/project/molgen/workbench_Florian/riparius_MAKER_v2/uniprot_sprot.fasta #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib=/project/molgen/workbench_Florian/riparius_MAKER_v2/20151208_Custom_Crip_repeat_library_final.fas #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm=/project/molgen/workbench_Florian/riparius_MAKER_v2/cegmasnap.hmm #SNAP HMM file
gmhmm=/project/molgen/workbench_Florian/riparius_MAKER_v2/gmhmm.mod #GeneMark HMM file
augustus_species=Riparius_Neu #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=1 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna=/project/molgen/workbench_Florian/riparius_MAKER_v2/C.thummi_28S_rDNA_gene.fasta #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP=/project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_files #specify a directory other than the system default temporary directory for temporary files
-------------- next part --------------
#!/bin/bash
#BSUB -n 128
#BSUB -q long
#BSUB -W 7200
#BSUB -o mogon_maker_MPIrun_5_feb.log
#BSUB -J riparius_makerMPI
#BSUB -app Reserve1G

mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_MPIrun3 -fix_nucleotides

From hcma at uci.edu Thu Feb 11 15:32:45 2016
From: hcma at uci.edu (hcma)
Date: Thu, 11 Feb 2016 14:32:45 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To: 
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
 <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>
 <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
 <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
 <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
 <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
 <5a40b7af9947dc8297046ba52620569e@uci.edu>
 <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
 <7e4d6f2773f654f8530155936b648832@uci.edu>
 <7495272C-476A-4985-8D49-30D991410535@gmail.com>
 <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
Message-ID: 

Hi Carlson,

Thanks for sharing.
I did assemble the Illumina RNA-seq PE100 reads de novo using Trinity and i input this to the 1st run of maker for generating a set of genes to train SNAP and augustus. Now, i am planning to run a 2nd run (and perhaps final run) of maker for gene prediction, provided that the result of Snap and Augustus looks similar to each other. I was going to incorporate the GFF result from tophat into 2nd run of maker for gene prediction, along with Trinity output, but avoiding external protein annotation. I already did a separate blast analysis to identify orthologous genes and i prefer to run maker without any protein evidence. Do you recommend to input the output of tophat2gff for this second run of maker for gene prediction? Thanks again for your time and advise. Best Regards Karen On 2016-02-10 18:32, Carson Holt wrote: > I find tophat results to be too noisy, and prefer cufflinks. There is > both a tophat2gff and cufflinks2gff script that comes with MAKER. Also > consider assembling the reads with Trinity (my overall preferred > method because it yields the highest specificity). > > --Carson > > Sent from my iPhone > >> On Feb 10, 2016, at 3:27 PM, hcma wrote: >> >> Hi Mike, >> >> Thanks for the reply. So i can input raw RNA-seq reads to Tophat and >> feed the output to maker? >> >> Thanks. >> >> Best Regards >> KAren >> >> >> >>> On 2016-02-10 06:17, Michael Campbell wrote: >>> HI Karen, >>> From my experience trimming reads will not make things worse and it >>> generally makes things better. As far as the best program to use, one >>> doesn?t really stand out above the others as far as I can tell. >>> However, with paired end reads it is important to use a trimmer that >>> preserves the pairing between the two files (i.e when an entire read >>> is discarded the paired read is moved into a file for singletons). 
>>> Thanks >>> Mike >>>> On Feb 9, 2016, at 5:35 PM, hcma wrote: >>>> Hi Carson, >>>> For the final run of annotation, I would like to incorporate tophat >>>> results from RNA-seq data, from your experience, do you know if it >>>> is better to use raw RNA-seq (Illumina paired-end data) or trimmed >>>> (trimmed using Trimmomatuc) data for feeding into tophat? If >>>> trimmed, do you recommend a particular programme? >>>> Thanks for your time. >>>> Best Regards >>>> KAren >>>>> On 2016-02-05 15:33, Carson Holt wrote: >>>>> I recommend using both. You probably don't have augustus >>>>> installed. >>>>> --Carson >>>>> Sent from my iPhone >>>>>> On Feb 5, 2016, at 4:20 PM, hcma wrote: >>>>>> Hi Carlson, >>>>>> Thanks for the instruction and in maker_exe.ctl, i only see path >>>>>> to snap, but not to augustus, so my system admin is checking this >>>>>> for me. >>>>>> From some manual i found, people use both snap and augustus when >>>>>> using MAKER to annotate genomes. Would you recommend using both or >>>>>> one of the 2 is sufficient? >>>>>> Thanks for your valuable time and advise. >>>>>> Best Regards >>>>>> Karen >>>>>>> On 2016-02-05 15:03, Carson Holt wrote: >>>>>>> You need to find out where the augustus MAKER is using is >>>>>>> installed. >>>>>>> Check the maker_exe.ctl file you are using, or type ?which >>>>>>> augustus?. >>>>>>> ?Carson >>>>>>>> On Feb 5, 2016, at 3:58 PM, hcma wrote: >>>>>>>> Hi Carlson, >>>>>>>> These are the list of directories under maker/2.31.8 >>>>>>>> bin data GMOD INSTALL lib LICENSE MWAS perl README >>>>>>>> RELEASE src >>>>>>>> Where can i find augustus/? Or i have to ask my system admin to >>>>>>>> install this? >>>>>>>> Thanks. >>>>>>>> Best Regards >>>>>>>> Karen >>>>>>>>> On 2016-02-05 14:54, Carson Holt wrote: >>>>>>>>> Augustus gives you an entire directory rather than just a >>>>>>>>> single file >>>>>>>>> like SNAP. You have to take the directory and copy it to the >>>>>>>>> .../augustus/config/species/ directory. 
>>>>>>>>> Example: >>>>>>>>> ?/augustus/config/species/arabidopsis/ >>>>>>>>> Then ?arabidopsis? would be the species name to use with MAKER. >>>>>>>>> Sometimes you may have to do a second round of both SNAP and >>>>>>>>> Augustus >>>>>>>>> training (called bootstrapping). Look at the models you get >>>>>>>>> after the >>>>>>>>> first round, and if they look good then, the second round is >>>>>>>>> probably >>>>>>>>> not going top be beneficial. >>>>>>>>> ?Carson >>>>>>>>>> On Feb 5, 2016, at 3:42 PM, hcma wrote: >>>>>>>>>> Hi Dr Holt, >>>>>>>>>> Thanks for the email. Here is my pipeline, does it seems >>>>>>>>>> acceptable? Any comments is welcome and much appreciated. >>>>>>>>>> 1. Use maker to generate training gene set: >>>>>>>>>> genome=all-chromosome-r1.04.fasta >>>>>>>>>> est=Trinity.fasta >>>>>>>>>> est2genome=1 >>>>>>>>>> 2. Use output of Maker to train SNAP: >>>>>>>>>> maker2zff dwil-all-chromosome-r1.04.all.gff >>>>>>>>>> fathom genome.ann genome.dna ?gene-stats >>>>>>>>>> fathom genome.ann genome.dna ?categorize 1000 >>>>>>>>>> fathom genome.ann genome.dna ?gene-stats >>>>>>>>>> fathom uni.ann uni.dna ?export 1000 ?plus >>>>>>>>>> hmm-assembler.pl genome . > dwil_genome.hmm >>>>>>>>>> 3. Use output of Maker to train Augustus on their webserver: >>>>>>>>>> File used: >>>>>>>>>> Upload ?export.dna? as the genome file >>>>>>>>>> Upload ?export.aa? as the protein file >>>>>>>>>> 4. second and final Maker run: >>>>>>>>>> genome=all-chromosome-r1.04.fasta >>>>>>>>>> est=Trinity.fasta >>>>>>>>>> est2genome=0 >>>>>>>>>> Snaphmm=output of 2 >>>>>>>>>> How do i incorporate the output of training set of gene from >>>>>>>>>> Augustus web server here into this step 4? >>>>>>>>>> Thanks for your time. >>>>>>>>>> Best Regards >>>>>>>>>> Karen >>>>>>>>>>> On 2016-02-05 06:36, Carson Holt wrote: >>>>>>>>>>> Hi Karen, >>>>>>>>>>> There are many ways to train Augustus. 
I prefer to identify >>>>>>>>>>> gene >>>>>>>>>>> models in MAKER (GFF3) and use those to train both SNAP and >>>>>>>>>>> Augustus. >>>>>>>>>>> Here is a previous post on the topic ?> >>>>>>>>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >>>>>>>>>>> [1] >>>>>>>>>>> In the end you need to look at the SNAP and Augustus models >>>>>>>>>>> together >>>>>>>>>>> with evidence alignments in a genome browser (like desktop >>>>>>>>>>> Apollo). >>>>>>>>>>> When everything is trained well, both SNAP and Augustus >>>>>>>>>>> models will >>>>>>>>>>> look like each other and both seem to look like the evidence >>>>>>>>>>> alignments. >>>>>>>>>>> Thanks, >>>>>>>>>>> Carson >>>>>>>>>>>> On Feb 4, 2016, at 5:52 PM, hcma wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> I have a genome sequence and Trinity assembly for a new >>>>>>>>>>>> species and >>>>>>>>>>>> I am wondering what are the best steps to take when using >>>>>>>>>>>> MAKER? >>>>>>>>>>>> 1. I used the genome sequence and all assembled Trinity >>>>>>>>>>>> sequence to >>>>>>>>>>>> do first run of MAKER in order to generate training set for >>>>>>>>>>>> SNAP and >>>>>>>>>>>> Augustus. >>>>>>>>>>>> In maker_opts.ctl: >>>>>>>>>>>> genome=all-chromosome-r1.04.fasta >>>>>>>>>>>> est=Trinity.fasta >>>>>>>>>>>> est2genome=1 >>>>>>>>>>>> 2. Train SNAP >>>>>>>>>>>> 3. Train Augustus >>>>>>>>>>>> When i train Augustus, i only supply genome and protein >>>>>>>>>>>> file, should >>>>>>>>>>>> i also supply the trinity file here? >>>>>>>>>>>> 4. what's the best parameter to use when running MAKER the >>>>>>>>>>>> second >>>>>>>>>>>> time for obtaining the final annotation? I would prefer not >>>>>>>>>>>> to use >>>>>>>>>>>> any external protein data. >>>>>>>>>>>> genome=all-chromosome-r1.04.fasta >>>>>>>>>>>> est=Trinity.fasta >>>>>>>>>>>> est2genome=0 >>>>>>>>>>>> SNAP >>>>>>>>>>>> Augustus >>>>>>>>>>>> Thanks. 
>>>>>>>>>>>> Best Regards >>>>>>>>>>>> KAren >>>>>>>>>>> Links: >>>>>>>>>>> ------ >>>>>>>>>>> [1] >>>>>>>>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >> From carsonhh at gmail.com Thu Feb 11 15:36:44 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 11 Feb 2016 15:36:44 -0700 Subject: [maker-devel] Q on MAKER In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu> <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com> <7e4d6f2773f654f8530155936b648832@uci.edu> <7495272C-476A-4985-8D49-30D991410535@gmail.com> <7870d65f86546a8b486faf98c1f6fcc0@uci.edu> Message-ID: <56F1935F-F6BA-4755-92F2-17EE81909619@gmail.com> Not if you already have trinity results. It will actually decrease the specificity of the run (i.e. causes false gene calls because of spurious evidence support). ?Carson > On Feb 11, 2016, at 3:32 PM, hcma wrote: > > Hi Carlson, > > Thanks for sharing. > > I did assemble the Illumina RNA-seq PE100 reads de novo using Trinity and i input this to the 1st run of maker for generating a set of genes to train SNAP and augustus. Now, i am planning to run a 2nd run (and perhaps final run) of maker for gene prediction, provided that the result of Snap and Augustus looks similar to each other. > > I was going to incorporate the GFF result from tophat into 2nd run of maker for gene prediction, along with Trinity output, but avoiding external protein annotation. I already did a separate blast analysis to identify orthologous genes and i prefer to run maker without any protein evidence. > > Do you recommend to input the output of tophat2gff for this second run of maker for gene prediction? 
> > Thanks again for your time and advise. > > Best Regards > Karen > > > > On 2016-02-10 18:32, Carson Holt wrote: >> I find tophat results to be too noisy, and prefer cufflinks. There is >> both a tophat2gff and cufflinks2gff script that comes with MAKER. Also >> consider assembling the reads with Trinity (my overall preferred >> method because it yields the highest specificity). >> --Carson >> Sent from my iPhone >>> On Feb 10, 2016, at 3:27 PM, hcma wrote: >>> Hi Mike, >>> Thanks for the reply. So i can input raw RNA-seq reads to Tophat and feed the output to maker? >>> Thanks. >>> Best Regards >>> KAren >>>> On 2016-02-10 06:17, Michael Campbell wrote: >>>> HI Karen, >>>> From my experience trimming reads will not make things worse and it >>>> generally makes things better. As far as the best program to use, one >>>> doesn?t really stand out above the others as far as I can tell. >>>> However, with paired end reads it is important to use a trimmer that >>>> preserves the pairing between the two files (i.e when an entire read >>>> is discarded the paired read is moved into a file for singletons). >>>> Thanks >>>> Mike >>>>> On Feb 9, 2016, at 5:35 PM, hcma wrote: >>>>> Hi Carson, >>>>> For the final run of annotation, I would like to incorporate tophat results from RNA-seq data, from your experience, do you know if it is better to use raw RNA-seq (Illumina paired-end data) or trimmed (trimmed using Trimmomatuc) data for feeding into tophat? If trimmed, do you recommend a particular programme? >>>>> Thanks for your time. >>>>> Best Regards >>>>> KAren >>>>>> On 2016-02-05 15:33, Carson Holt wrote: >>>>>> I recommend using both. You probably don't have augustus installed. >>>>>> --Carson >>>>>> Sent from my iPhone >>>>>>> On Feb 5, 2016, at 4:20 PM, hcma wrote: >>>>>>> Hi Carlson, >>>>>>> Thanks for the instruction and in maker_exe.ctl, i only see path to snap, but not to augustus, so my system admin is checking this for me. 
>>>>>>> From some manual i found, people use both snap and augustus when using MAKER to annotate genomes. Would you recommend using both or one of the 2 is sufficient? >>>>>>> Thanks for your valuable time and advise. >>>>>>> Best Regards >>>>>>> Karen >>>>>>>> On 2016-02-05 15:03, Carson Holt wrote: >>>>>>>> You need to find out where the augustus MAKER is using is installed. >>>>>>>> Check the maker_exe.ctl file you are using, or type ?which augustus?. >>>>>>>> ?Carson >>>>>>>>> On Feb 5, 2016, at 3:58 PM, hcma wrote: >>>>>>>>> Hi Carlson, >>>>>>>>> These are the list of directories under maker/2.31.8 >>>>>>>>> bin data GMOD INSTALL lib LICENSE MWAS perl README RELEASE src >>>>>>>>> Where can i find augustus/? Or i have to ask my system admin to install this? >>>>>>>>> Thanks. >>>>>>>>> Best Regards >>>>>>>>> Karen >>>>>>>>>> On 2016-02-05 14:54, Carson Holt wrote: >>>>>>>>>> Augustus gives you an entire directory rather than just a single file >>>>>>>>>> like SNAP. You have to take the directory and copy it to the >>>>>>>>>> .../augustus/config/species/ directory. >>>>>>>>>> Example: >>>>>>>>>> ?/augustus/config/species/arabidopsis/ >>>>>>>>>> Then ?arabidopsis? would be the species name to use with MAKER. >>>>>>>>>> Sometimes you may have to do a second round of both SNAP and Augustus >>>>>>>>>> training (called bootstrapping). Look at the models you get after the >>>>>>>>>> first round, and if they look good then, the second round is probably >>>>>>>>>> not going top be beneficial. >>>>>>>>>> ?Carson >>>>>>>>>>> On Feb 5, 2016, at 3:42 PM, hcma wrote: >>>>>>>>>>> Hi Dr Holt, >>>>>>>>>>> Thanks for the email. Here is my pipeline, does it seems acceptable? Any comments is welcome and much appreciated. >>>>>>>>>>> 1. Use maker to generate training gene set: >>>>>>>>>>> genome=all-chromosome-r1.04.fasta >>>>>>>>>>> est=Trinity.fasta >>>>>>>>>>> est2genome=1 >>>>>>>>>>> 2. 
Use the output of MAKER to train SNAP:
>>>>>>>>>>> maker2zff dwil-all-chromosome-r1.04.all.gff
>>>>>>>>>>> fathom genome.ann genome.dna -gene-stats
>>>>>>>>>>> fathom genome.ann genome.dna -categorize 1000
>>>>>>>>>>> fathom genome.ann genome.dna -gene-stats
>>>>>>>>>>> fathom uni.ann uni.dna -export 1000 -plus
>>>>>>>>>>> hmm-assembler.pl genome . > dwil_genome.hmm
>>>>>>>>>>> 3. Use the output of MAKER to train Augustus on their web server:
>>>>>>>>>>> Files used:
>>>>>>>>>>> Upload "export.dna" as the genome file
>>>>>>>>>>> Upload "export.aa" as the protein file
>>>>>>>>>>> 4. Second and final MAKER run:
>>>>>>>>>>> genome=all-chromosome-r1.04.fasta
>>>>>>>>>>> est=Trinity.fasta
>>>>>>>>>>> est2genome=0
>>>>>>>>>>> snaphmm=output of step 2
>>>>>>>>>>> How do I incorporate the gene training output from the Augustus web server into step 4?
>>>>>>>>>>> Thanks for your time.
>>>>>>>>>>> Best Regards
>>>>>>>>>>> Karen

From hcma at uci.edu Thu Feb 11 17:18:43 2016
From: hcma at uci.edu (hcma)
Date: Thu, 11 Feb 2016 16:18:43 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
Message-ID:

Hi Carson,

I have downloaded Apollo. What format does Apollo take for the SNAP and Augustus models? Do I need to reformat the SNAP .hmm file, and which Augustus output should I use if I train Augustus manually?

Thanks again for your time.

Best Regards
Karen

From panos.ioannidis at gmail.com Fri Feb 12 01:35:49 2016
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Fri, 12 Feb 2016 09:35:49 +0100
Subject: [maker-devel] GFF features from Maker
Message-ID:

Hi guys,

I have a few questions regarding annotated features in the GFF file built by Maker.

1) I'm a bit confused about the annotations coming from "est2genome" and "blastn", because they both give "expressed_sequence_match" features.
So, what's the difference between them? How do the EST matches from est2genome differ from those from blastn?

2) The same goes for "protein2genome" and "blastx", since they both give "protein_match" features.

3) Last, what is the difference between the partial matches and full-length matches? For example, in almost all cases where est2genome gives an "expressed_sequence_match" feature for a genomic region, it also gives "match_part" features for sub-regions within it. What is the meaning of this? I'm pasting one such region below.

scaffold3|size1771164 est2genome expressed_sequence_match 21953 22276 949 + . ID=scaffold3|size1771164:hit:1901:3.2.0.0;Name=C24476_a_3_0_l_241
scaffold3|size1771164 est2genome match_part 21953 22035 949 + . ID=scaffold3|size1771164:hsp:1902:3.2.0.0;Parent=scaffold3|size1771164:hit:1901:3.2.0.0;Target=C24476_a_3_0_l_241 1 83 +;Gap=M83
scaffold3|size1771164 est2genome match_part 22148 22276 949 + . ID=scaffold3|size1771164:hsp:1903:3.2.0.0;Parent=scaffold3|size1771164:hit:1901:3.2.0.0;Target=C24476_a_3_0_l_241 84 215 +;Gap=M104 D2 M7 I4 M8 I1 M8

Thanks,
Panos

From carsonhh at gmail.com Fri Feb 12 07:48:46 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Feb 2016 07:48:46 -0700
Subject: [maker-devel] GFF features from Maker
In-Reply-To:
References:
Message-ID: <806D9F3C-13AF-4EDE-ACA8-DA981255E5DD@gmail.com>

Hi Panos,

The terms used are governed by the Sequence Ontology (http://www.sequenceontology.org), and specific definitions can be found there. Terms have a parent/child relationship, with lower levels being more specific than higher levels.
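
Separately from the ontology hierarchy, the GFF3 Parent attribute is what ties each match_part row to its hit. As a minimal sketch for question 3 (this is not part of MAKER; the helper names parse_attributes and group_match_parts are made up for illustration, and it assumes tab-separated columns as in a real GFF3 file), the three pasted rows can be grouped like this:

```python
# Three rows copied from the question above. The columns are joined with
# tabs here because GFF3 is a tab-delimited format.
SEQ = "scaffold3|size1771164"
HIT_ID = SEQ + ":hit:1901:3.2.0.0"
ROWS = [
    [SEQ, "est2genome", "expressed_sequence_match",
     "21953", "22276", "949", "+", ".",
     "ID=" + HIT_ID + ";Name=C24476_a_3_0_l_241"],
    [SEQ, "est2genome", "match_part",
     "21953", "22035", "949", "+", ".",
     "ID=" + SEQ + ":hsp:1902:3.2.0.0;Parent=" + HIT_ID
     + ";Target=C24476_a_3_0_l_241 1 83 +;Gap=M83"],
    [SEQ, "est2genome", "match_part",
     "22148", "22276", "949", "+", ".",
     "ID=" + SEQ + ":hsp:1903:3.2.0.0;Parent=" + HIT_ID
     + ";Target=C24476_a_3_0_l_241 84 215 +;Gap=M104 D2 M7 I4 M8 I1 M8"],
]
GFF3_TEXT = "\n".join("\t".join(row) for row in ROWS)


def parse_attributes(field):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    return dict(item.split("=", 1) for item in field.split(";") if item)


def group_match_parts(gff3_text):
    """Return (hits, parts): hit ID -> span, and hit ID -> list of part spans."""
    hits, parts = {}, {}
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        cols = line.split("\t")
        attrs = parse_attributes(cols[8])
        span = (int(cols[3]), int(cols[4]))
        if cols[2] == "match_part":
            # A match_part points at its hit through the Parent attribute.
            parts.setdefault(attrs["Parent"], []).append(span)
        else:
            hits[attrs["ID"]] = span
    return hits, parts


hits, parts = group_match_parts(GFF3_TEXT)
for hit_id, span in hits.items():
    print(hit_id, span, parts.get(hit_id, []))
```

In these rows the parent feature spans 21953-22276 and its two match_part children cover 21953-22035 and 22148-22276. The stretch in between (22036-22147) is not aligned to the EST, which for a spliced alignment would typically be an intron.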
The match feature is used for ab initio reference results rather than the potentially better term predicted_gene, because match is already handled correctly by most software, and most databases such as FlyBase already use it for that purpose. This is in part because predicted_gene was a latecomer to the ontology, and it is used more often to distinguish accepted models without human curation than to label reference predictions. Since match is an experimental_feature, it fits the expected separation between genes (biological_region) and analysis results (experimental_feature).

It's rather boring and technical, but it's all the result of careful selection using the Sequence Ontology inheritance levels and term definitions. Example in attached image.

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SO-0000102.png
Type: image/png
Size: 7720 bytes
Desc: not available
URL:

From carsonhh at gmail.com Fri Feb 12 07:56:41 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Feb 2016 07:56:41 -0700
Subject: [maker-devel] GFF features from Maker
In-Reply-To:
References:
Message-ID: <1B5D7E98-850C-4D16-A5C1-5BE1EB5B8735@gmail.com>

Also, BLAST vs. Exonerate is an algorithmic difference. BLAST produces traditional local alignments (Smith-Waterman style), which can result in out-of-order sub-alignments called HSPs. Exonerate produces splice-aware alignments (in order and correctly trimmed at splice sites). More info on polishing alignments is on the wiki page here ->
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Polishing_Evidence_Alignments

--Carson

From panos.ioannidis at gmail.com Fri Feb 12 07:59:05 2016
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Fri, 12 Feb 2016 15:59:05 +0100
Subject: [maker-devel] GFF features from Maker
In-Reply-To: <1B5D7E98-850C-4D16-A5C1-5BE1EB5B8735@gmail.com>
References: <1B5D7E98-850C-4D16-A5C1-5BE1EB5B8735@gmail.com>
Message-ID:

Thanks for all the info Carson!

Panos

From carsonhh at gmail.com Fri Feb 12 12:14:16 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Feb 2016 12:14:16 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
Message-ID:

You need to view the output the programs produce, not the HMM. You can run them through MAKER and then view the GFF3 files produced. Here is a MAKER tutorial where this is done, which you can follow along with if you wish ->
http://gmod.org/wiki/MAKER_Tutorial_2013#Training_ab_initio_Gene_Predictors

For Augustus training, there are a number of related threads in the MAKER mailing list archives ->
https://groups.google.com/forum/#!searchin/maker-devel/augustus

There are also other resources online ->
http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html

--Carson

From fdolze at students.uni-mainz.de Tue Feb 16 03:10:03 2016
From: fdolze at students.uni-mainz.de (Florian)
Date: Tue, 16 Feb 2016 11:10:03 +0100
Subject: [maker-devel] Estimated runtime on 180mb genome @ 128 cores?
In-Reply-To: <56BC65E7.6000904@students.uni-mainz.de>
References: <56BC65E7.6000904@students.uni-mainz.de>
Message-ID: <56C2F57B.8020208@students.uni-mainz.de>

Hi all,

I am trying to run MAKER on a project of mine, and since this is the first time I use MAKER, I'd like to ask some more experienced users what I can expect with regard to resource consumption and runtime of MAKER.
My genome data is:

* 180.652.019 bp genome length
* 5.292 scaffolds
* 34.136 bp median scaffold length
* 2.056.324 bp longest scaffold
* 272.065 bp N50

- I use a 73 Mb transcriptome assembly as EST evidence
- SwissProt as protein homology evidence
- 60 kb custom repeat library for RepeatMasker

For gene prediction I am running with a SNAP HMM I generated using CEGMA, GeneMark, and Augustus trained by their web service.
I have the options est2genome and protein2genome turned on (=1) and use tRNAscan and snoscan. The other options are as follows:

#-----MAKER Behavior Options
max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage) <--- Is this reasonable?
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

The maker_bopts.ctl file is unchanged.

(Basically I follow this guide: https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md)

At the moment I am running this with Open MPI as:

mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_run1 -fix_nucleotides

on 128 cores with 130 GB of memory.

First of all, are the options I use viable?

Is it possible to guesstimate the runtime I can expect? 5 days? 20 days? And is it reasonable to use additional cores, or will this not benefit much?

Thanks for your insights,
Florian

From dence at genetics.utah.edu Tue Feb 16 09:42:51 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 16 Feb 2016 16:42:51 +0000
Subject: [maker-devel] Estimated runtime on 180mb genome @ 128 cores?
In-Reply-To: <56C2F57B.8020208@students.uni-mainz.de>
References: <56BC65E7.6000904@students.uni-mainz.de> <56C2F57B.8020208@students.uni-mainz.de>
Message-ID:

Hi Florian,

I don't think you want est2genome or protein2genome turned on for this run. est2genome is usually only used if you don't have any ab initio predictors trained; protein2genome should only be used if you have good reason not to expect any introns at all (for example, a prokaryotic genome).

Also, you set the max_dna_len parameter to 2.1 Mbp, which is larger than your N50. Setting this too large prevents MAKER from speeding up its analysis by splitting contigs/scaffolds across multiple processors. There's usually no reason to change this from the default setting.

With a good N50 like you have, you'll probably get good results.
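
To put rough numbers on the max_dna_len point, here is a small Python sketch using the scaffold statistics from the question. It is only an illustration under a simplified model (a scaffold is cut into pieces of at most max_dna_len bases); how MAKER really divides sequences may differ in detail. The value 100000 is assumed here to be the stock default in maker_opts.ctl, so check your own control file, and chunks_needed is a made-up helper name.

```python
import math

# Scaffold statistics quoted in the question (in base pairs).
LONGEST_SCAFFOLD = 2_056_324
N50 = 272_065

# 2_100_000 is the setting from the question.
# 100_000 is ASSUMED to be the stock default in maker_opts.ctl.
QUESTION_SETTING = 2_100_000
ASSUMED_DEFAULT = 100_000


def chunks_needed(scaffold_len, max_dna_len):
    """Pieces produced if a scaffold is cut every max_dna_len bases.

    Simplified model for illustration only; it ignores any overlap or
    other detail of how MAKER really divides sequences.
    """
    return max(1, math.ceil(scaffold_len / max_dna_len))


for label, value in (("question", QUESTION_SETTING),
                     ("assumed default", ASSUMED_DEFAULT)):
    print(label, value,
          "| longest scaffold:", chunks_needed(LONGEST_SCAFFOLD, value), "chunk(s)",
          "| N50-sized scaffold:", chunks_needed(N50, value), "chunk(s)")
```

Under this model, with max_dna_len=2100000 even the longest scaffold (2.056.324 bp) stays in one piece, so nothing is split. With a value of 100000 the same scaffold would be cut into 21 pieces, and an N50-sized scaffold into 3.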
~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Tue Feb 16 09:53:55 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 Feb 2016 09:53:55 -0700
Subject: [maker-devel] Estimated runtime on 180mb genome @ 128 cores?
In-Reply-To:
References: <56BC65E7.6000904@students.uni-mainz.de> <56C2F57B.8020208@students.uni-mainz.de>
Message-ID:

Agree. 500,000 is about the highest you ever want to go with max_dna_len. Increasing the value decreases parallelization and increases memory usage. The only biological reason to ever increase it is if genes are really long and don't fit into windows of this size.

Also test out the mpiexec command with something like "hostname" to make sure it works.

Example ->
mpiexec -mca btl ^openib -n 128 hostname

This should print out 128 lines identifying all hosts in the communication ring. If it prints out the same host ID every time, then there is a problem, and you may need to provide a hostfile to let mpiexec know all the hosts it can run across.

--Carson

From alejocn5 at gmail.com Tue Feb 16 14:17:40 2016
From: alejocn5 at gmail.com (Alejandro Cerón Noriega)
Date: Tue, 16 Feb 2016 16:17:40 -0500
Subject: [maker-devel] problem with the example
Message-ID:

Hello, I am Alejandro.

I have tried to follow the MAKER tutorial:

1. I copy the files in the data directory to a temporary directory where I run an example file.
2. I type maker -CTL to generate generic MAKER control files (foto_1).
3. I edit the control files to include the path of the genome file (hsap_contig.fasta from the example) (foto_2).

Then I run maker maker_exe.ctl maker_opts.ctl maker_bopts.ctl (foto 3).

That generates the expected folder
hsap_contig.maker.output

But when I look for the GFF file I don't find it. Inside /data/hsap_contig.maker.output/hsap_contig_datastore, I don't find all of these files and subdirectories:

* seq_name.gff - a gff file that can be loaded into GMOD, GBROWSE, or Apollo
* seq_name.maker.transcripts.fasta - a fasta file of the MAKER annotated transcript sequences
* seq_name.maker.proteins.fasta - a fasta file of the MAKER annotated protein sequences
* seq_name.maker.XXX.transcript.fasta - a fasta file of ab-initio predicted transcript sequences from program XXX
* seq_name.maker.XXX.proteins.fasta - a fasta file of ab-initio predicted protein sequences from program XXX
* seq_name.maker.non_overlapping_ab_initio.transcripts.fasta - a fasta file of filtered ab-initio transcript sequences that don't overlap maker annotations
* seq_name.maker.non_overlapping_ab_initio.proteins.fasta - a fasta file of filtered ab-initio protein sequences that don't overlap maker annotations
* theVoid.seq_name/ - a directory containing all of the raw output files produced by MAKER, including BLAST reports, SNAP output, exonerate output and the masked genomic sequence.

I only find a directory named 80 (foto 4).

I don't know if I did something wrong.

I also tried changing the path of the EST file (foto_5).

Thanks for your attention.

--
Alejandro Cerón Noriega, B.Sc
MSc. Candidate Bioinformatics

-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_1.png
Type: image/png
Size: 67330 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_2.png
Type: image/png
Size: 257578 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Foto_3.png
Type: image/png
Size: 213241 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_4.png
Type: image/png
Size: 129352 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_5.png
Type: image/png
Size: 255944 bytes
Desc: not available
URL:

From carsonhh at gmail.com Thu Feb 18 12:36:13 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 18 Feb 2016 12:36:13 -0700
Subject: [maker-devel] problem with the example
In-Reply-To:
References:
Message-ID: <4CD9B36B-8C9D-4E48-B1B6-ACAFF28DF3B2@gmail.com>

To access files for individual sequences, use the datastore index:

/scratchsan/caceronn/Results/MAKER/data/hsap_contig.maker.output/hsap_contig_master_datastore_index.log

Look in that file to find the location of individual contig results.
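
As a rough sketch of what looking in that file involves, the Python below assumes each line of the index log has three tab-separated fields: sequence name, output directory relative to the .maker.output folder, and a status. That layout is an assumption to check against your own log. The sample text is made up, the contig names and paths are hypothetical, and latest_status is not a MAKER function.

```python
# Made-up example of index log contents. The contig names and paths are
# hypothetical, and the three-column layout is an assumption.
SAMPLE_INDEX = (
    "contig_A\thsap_contig_datastore/80/4B/contig_A/\tSTARTED\n"
    "contig_A\thsap_contig_datastore/80/4B/contig_A/\tFINISHED\n"
    "contig_B\thsap_contig_datastore/1C/F0/contig_B/\tSTARTED\n"
)


def latest_status(index_text):
    """Return {sequence name: (directory, last status seen in the log)}."""
    results = {}
    for line in index_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, directory, status = fields
        # Later lines overwrite earlier ones, so the last status wins.
        results[name] = (directory, status)
    return results


for name, (directory, status) in latest_status(SAMPLE_INDEX).items():
    print(name, status, directory)
```

For a real run, the file contents could be passed in with something like latest_status(open(path_to_index_log).read()). A directory such as 80 inside the datastore is likely just the first level of nested paths like the hypothetical ones above, with the per-contig files further down.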
For merged results you have to use the gff3_merge script together with the datastore index.

Here is a nice tutorial with step-by-step instructions and a video to easily follow along -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014

--Carson

> On Feb 16, 2016, at 2:17 PM, Alejandro Cerón Noriega wrote:
>
> Hello, I am Alejandro.
>
> I have tried to follow the MAKER tutorial:
>
> 1- I copied the files in the data directories to a temporary directory where I run an example file.
> 2- I typed maker -CTL to generate generic MAKER control files (foto_1).
> 3- I edited the control files to include the path of the genome file (hsap_contig.fasta from the example) (foto_2).
>
> Then I ran maker maker_exe.ctl maker_opts.ctl maker_bopts.ctl (foto 3).
>
> That generated the expected folder hsap_contig.maker.output, but when I look for the GFF file I don't find it. Inside /data/hsap_contig.maker.output/hsap_contig_datastore I don't find any of the expected files and subdirectories:
>
> * seq_name.gff - a gff file that can be loaded into GMOD, GBROWSE,
> or Apollo
> * seq_name.maker.transcripts.fasta - a fasta file of the MAKER
> annotated transcript sequences
> * seq_name.maker.proteins.fasta - a fasta file of the MAKER
> annotated protein sequences
> * seq_name.maker.XXX.transcript.fasta - a fasta file of ab-initio
> predicted transcript sequences from program XXX
> * seq_name.maker.XXX.proteins.fasta - a fasta file of ab-initio
> predicted protein sequences from program XXX
> * seq_name.maker.non_overlapping_ab_initio.transcripts.fasta - a
> fasta file of filtered ab-initio transcript sequences that don't
> overlap maker annotations
> * seq_name.maker.non_overlapping_ab_initio.proteins.fasta - a
> fasta file of filtered ab-initio protein sequences that don't
> overlap maker annotations
> * theVoid.seq_name/ - a directory containing all of the raw
> output files produced by MAKER, including BLAST reports, SNAP
> output, exonerate output and
> the masked genomic sequence.
>
> I only find a directory named 80 (foto 4).
>
> I don't know if I did something wrong. I also tried changing the path of the EST (foto_5).
>
> Thanks for your attention.
>
> --
> Alejandro Cerón Noriega, B.Sc
> MSc. Candidate Bioinformatics
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From fdolze at students.uni-mainz.de Fri Feb 26 07:16:10 2016
From: fdolze at students.uni-mainz.de (Florian)
Date: Fri, 26 Feb 2016 15:16:10 +0100
Subject: [maker-devel] Possible to redirect maker output?
Message-ID: <56D05E2A.1040201@students.uni-mainz.de>

Hi all,

I am trying to run maker on a cluster (2 nodes with 64 cores each). To speed things up, I copied all input files to a ramdisk to reduce I/O time, but all subsequent results are still written to HDD.

Is there a way I can tell maker to write the maker.results files to the ramdisk (or generally any directory other than the current working dir) too? (Are they actually used for the current run, or are only files in the temp files location used?)

Is anybody experienced with running maker on a similar setup, and could you tell me how you are handling this?

thanks,
Florian

From scott at scottcain.net Fri Feb 26 10:50:06 2016
From: scott at scottcain.net (Scott Cain)
Date: Fri, 26 Feb 2016 12:50:06 -0500
Subject: [maker-devel] GMOD 2016 meeting
Message-ID:

Hello all,

I am pleased to announce that details have been finalized for the 2016 GMOD meeting. It will take place immediately following the Galaxy Community Conference at Indiana University in Bloomington, IN on June 30 and July 1. We're still working on agenda details, so if you have suggestions or would like to present, please let me know.
For registration information, please see:

https://gmod2016.eventbrite.com

And for other information about the meeting, keep an eye on:

http://gmod.org/wiki/Jun_2016_GMOD_Meeting

I look forward to seeing you there!

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gloriarendon at gmail.com Fri Feb 26 15:14:26 2016
From: gloriarendon at gmail.com (Gloria Rendon)
Date: Fri, 26 Feb 2016 16:14:26 -0600
Subject: [maker-devel] MAKER/3.00.0-beta: missing some accessory scripts
Message-ID:

Hello,

My name is Gloria Rendon. I work at the Carl Woese Institute for Genomic Biology at the University of Illinois at Urbana-Champaign.

In recent months we used MAKER/3.00.0-beta to generate annotations (in GFF3 format) for a de novo assembly that we produced in-house for the Taro plant.

As part of the same project, I now need to run an analysis with RNA-seq data for the same Taro species. I am going to use STAR for the alignment step, and I need to provide the annotations file in GTF format, not in GFF3 format as I currently have.

In order to perform the GFF3->GTF conversion I was planning to run some of the accessory scripts that come with MAKER:

add_utr_start_stop_gff
gff3_2_gtf

However, I just noticed that my installation of MAKER is missing those two scripts.
This is what the MAKER/bin folder looks like now:

$ ls /home/groups/hpcbio/apps/maker/maker-3.00.0-beta/bin/
AED_cdf_generator.pl     ipr_update_gff           maker_map_ids
cegma2zff                iprscan2gff3             map2assembly
chado2gff3               maker                    map_data_ids
compare_gff3_to_chado    maker2chado              map_fasta_ids
cufflinks2gff3           maker2eval_gtf           map_gff_ids
evaluator                maker2jbrowse            match2gene.pl
fasta_merge              maker2wap                quality_filter.pl
fasta_tool               maker2zff                tophat2gff3
genemark_gtf2gff3        maker_functional_fasta
gff3_merge               maker_functional_gff

By the way, earlier versions of MAKER that are also installed on our cluster are also missing those scripts.

Could you please tell me how to remedy the situation?
Do you have executables of the two scripts that you can share with me?
OR
Do I need to re-install MAKER with special configuration options?

Thank you very much for the attention to this matter.

Sincerely,

Gloria
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Feb 29 12:09:14 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 29 Feb 2016 12:09:14 -0700
Subject: [maker-devel] Possible to redirect maker output?
In-Reply-To: <56D05E2A.1040201@students.uni-mainz.de>
References: <56D05E2A.1040201@students.uni-mainz.de>
Message-ID: <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com>

You can try setting TMP= in the control files to a RAM disk location (you will need a lot of RAM though, perhaps 500 GB). Even then, some components used by MAKER may not function properly with tmpfs, but you can try. If it doesn't work, you'll get an error.

The main output directory, on the other hand, must be globally accessible to all nodes if working with MPI, and a RAM disk will only exist and be accessible on a single node (even though a directory with the same name may exist on multiple nodes, they will actually be separate and distinct locations, e.g. /dev/shm).
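
A minimal sketch of the TMP= change described above (an editorial illustration, not part of the original reply). It assumes maker_opts.ctl contains a line starting with TMP=, optionally followed by a #comment, and that /dev/shm/maker_tmp is a node-local tmpfs directory you created yourself. set_maker_tmp is a hypothetical helper, not a MAKER tool; it prints the edited control file to standard output so the original stays untouched.

```shell
# Hypothetical helper: print a copy of a MAKER control file with the
# TMP= entry pointing at the given directory.
# $1 = control file (e.g. maker_opts.ctl), $2 = temporary directory.
# Note: the directory must not contain '|' or '&', which sed would interpret.
set_maker_tmp() {
    ctl=$1
    dir=$2
    # Replace everything between "TMP=" and an optional "#comment".
    sed -E "s|^TMP=[^#]*|TMP=${dir} |" "$ctl"
}

# Example call (file names are illustrative only):
# set_maker_tmp maker_opts.ctl /dev/shm/maker_tmp > maker_opts.ramdisk.ctl
```

Only the temporary directory is moved this way; the main output directory still has to sit on storage that every node can reach.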
--Carson

> On Feb 26, 2016, at 7:16 AM, Florian wrote:
>
> Hi all,
>
> I am trying to run maker on a cluster (2 nodes with 64 cores each). To speed things up, I copied all input files to a ramdisk to reduce I/O time, but all subsequent results are still written to HDD.
>
> Is there a way I can tell maker to write the maker.results files to the ramdisk (or generally any directory other than the current working dir) too? (Are they actually used for the current run, or are only files in the temp files location used?)
>
> Is anybody experienced with running maker on a similar setup, and could you tell me how you are handling this?
>
> thanks,
> Florian
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Mon Feb 29 12:17:29 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 29 Feb 2016 12:17:29 -0700
Subject: [maker-devel] MAKER/3.00.0-beta: missing some accessory scripts
In-Reply-To:
References:
Message-ID:

You should be using maker2eval_gtf. The scripts you mention were actually deprecated from MAKER 2.10 onwards (about 5 years ago). You may be looking at old documentation.

--Carson

> On Feb 26, 2016, at 3:14 PM, Gloria Rendon wrote:
>
> Hello,
>
> My name is Gloria Rendon. I work at the Carl Woese Institute for Genomic Biology at the University of Illinois at Urbana-Champaign.
>
> In recent months we used MAKER/3.00.0-beta to generate annotations (in GFF3 format) for a de novo assembly that we produced in-house for the Taro plant.
>
> As part of the same project, I now need to run an analysis with RNA-seq data for the same Taro species.
> I am going to use STAR for the alignment step, and I need to provide the annotations file in GTF format, not in GFF3 format as I currently have.
> > In order to perform the GFF3->GTF conversion I was planning to run some of the accessory scripts that come with MAKER > > add_utr_start_stop_gff > gff3_2_gtf > > However I just noticed that my installation of MAKER is missing those two scripts. > This is how the MAKER/bin folder looks like now: > > $ ls /home/groups/hpcbio/apps/maker/maker-3.00.0-beta/bin/ > AED_cdf_generator.pl ipr_update_gff maker_map_ids > cegma2zff iprscan2gff3 map2assembly > chado2gff3 maker map_data_ids > compare_gff3_to_chado maker2chado map_fasta_ids > cufflinks2gff3 maker2eval_gtf map_gff_ids > evaluator maker2jbrowse match2gene.pl > fasta_merge maker2wap quality_filter.pl > fasta_tool maker2zff tophat2gff3 > genemark_gtf2gff3 maker_functional_fasta > gff3_merge maker_functional_gff > > > btw, earlier versions of MAKER that are also installed on our cluster as also missing those scripts. > > Could you please tell me how to remedy the situation? > Do you have executables of the two scripts that you can share with me? > OR > Do I need to re-install MAKER with special configuration options? > > Thank you very much for the attention to this matter. > > Sincerely, > > Gloria > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From hcma at uci.edu Thu Feb 4 17:52:12 2016 From: hcma at uci.edu (hcma) Date: Thu, 04 Feb 2016 16:52:12 -0800 Subject: [maker-devel] Q on MAKER In-Reply-To: <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> Message-ID: <99f6989955acdf6fd6b0875affbeefa9@uci.edu> Hi, I have a genome sequence and Trinity assembly for a new species and I am wondering what are the best steps to take when using MAKER? 1. 
I used the genome sequence and all assembled Trinity sequence to do first run of MAKER in order to generate training set for SNAP and Augustus. In maker_opts.ctl: genome=all-chromosome-r1.04.fasta est=Trinity.fasta est2genome=1 2. Train SNAP 3. Train Augustus When i train Augustus, i only supply genome and protein file, should i also supply the trinity file here? 4. what's the best parameter to use when running MAKER the second time for obtaining the final annotation? I would prefer not to use any external protein data. genome=all-chromosome-r1.04.fasta est=Trinity.fasta est2genome=0 SNAP Augustus Thanks. Best Regards KAren From carsonhh at gmail.com Fri Feb 5 07:36:06 2016 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 5 Feb 2016 07:36:06 -0700 Subject: [maker-devel] Q on MAKER In-Reply-To: <99f6989955acdf6fd6b0875affbeefa9@uci.edu> References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> Message-ID: Hi Karen, There are many ways to train Augustus. I prefer to identify gene models in MAKER (GFF3) and use those to train both SNAP and Augustus. Here is a previous post on the topic ?> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ In the end you need to look at the SNAP and Augustus models together with evidence alignments in a genome browser (like desktop Apollo). When everything is trained well, both SNAP and Augustus models will look like each other and both seem to look like the evidence alignments. Thanks, Carson > On Feb 4, 2016, at 5:52 PM, hcma wrote: > > Hi, > > I have a genome sequence and Trinity assembly for a new species and I am wondering what are the best steps to take when using MAKER? > > 1. I used the genome sequence and all assembled Trinity sequence to do first run of MAKER in order to generate training set for SNAP and Augustus. 
> > In maker_opts.ctl: > genome=all-chromosome-r1.04.fasta > est=Trinity.fasta > est2genome=1 > > > 2. Train SNAP > > 3. Train Augustus > > When i train Augustus, i only supply genome and protein file, should i also supply the trinity file here? > > > 4. what's the best parameter to use when running MAKER the second time for obtaining the final annotation? I would prefer not to use any external protein data. > > genome=all-chromosome-r1.04.fasta > est=Trinity.fasta > est2genome=0 > SNAP > Augustus > > Thanks. > > Best Regards > KAren -------------- next part -------------- An HTML attachment was scrubbed... URL: From hcma at uci.edu Fri Feb 5 15:42:37 2016 From: hcma at uci.edu (hcma) Date: Fri, 05 Feb 2016 14:42:37 -0800 Subject: [maker-devel] Q on MAKER In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> Message-ID: <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> Hi Dr Holt, Thanks for the email. Here is my pipeline, does it seems acceptable? Any comments is welcome and much appreciated. 1. Use maker to generate training gene set: genome=all-chromosome-r1.04.fasta est=Trinity.fasta est2genome=1 2. Use output of Maker to train SNAP: maker2zff dwil-all-chromosome-r1.04.all.gff fathom genome.ann genome.dna ?gene-stats fathom genome.ann genome.dna ?categorize 1000 fathom genome.ann genome.dna ?gene-stats fathom uni.ann uni.dna ?export 1000 ?plus hmm-assembler.pl genome . > dwil_genome.hmm 3. Use output of Maker to train Augustus on their webserver: File used: Upload ?export.dna? as the genome file Upload ?export.aa? as the protein file 4. second and final Maker run: genome=all-chromosome-r1.04.fasta est=Trinity.fasta est2genome=0 Snaphmm=output of 2 How do i incorporate the output of training set of gene from Augustus web server here into this step 4? Thanks for your time. 
Best Regards Karen On 2016-02-05 06:36, Carson Holt wrote: > Hi Karen, > > There are many ways to train Augustus. I prefer to identify gene > models in MAKER (GFF3) and use those to train both SNAP and Augustus. > Here is a previous post on the topic ?> > https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ > [1] > > In the end you need to look at the SNAP and Augustus models together > with evidence alignments in a genome browser (like desktop Apollo). > When everything is trained well, both SNAP and Augustus models will > look like each other and both seem to look like the evidence > alignments. > > Thanks, > Carson > >> On Feb 4, 2016, at 5:52 PM, hcma wrote: >> >> Hi, >> >> I have a genome sequence and Trinity assembly for a new species and >> I am wondering what are the best steps to take when using MAKER? >> >> 1. I used the genome sequence and all assembled Trinity sequence to >> do first run of MAKER in order to generate training set for SNAP and >> Augustus. >> >> In maker_opts.ctl: >> genome=all-chromosome-r1.04.fasta >> est=Trinity.fasta >> est2genome=1 >> >> 2. Train SNAP >> >> 3. Train Augustus >> >> When i train Augustus, i only supply genome and protein file, should >> i also supply the trinity file here? >> >> 4. what's the best parameter to use when running MAKER the second >> time for obtaining the final annotation? I would prefer not to use >> any external protein data. >> >> genome=all-chromosome-r1.04.fasta >> est=Trinity.fasta >> est2genome=0 >> SNAP >> Augustus >> >> Thanks. 
>> >> Best Regards >> KAren > > > > Links: > ------ > [1] > https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ From carsonhh at gmail.com Fri Feb 5 15:54:58 2016 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 5 Feb 2016 15:54:58 -0700 Subject: [maker-devel] Q on MAKER In-Reply-To: <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> Message-ID: Augustus gives you an entire directory rather than just a single file like SNAP. You have to take the directory and copy it to the .../augustus/config/species/ directory. Example: ?/augustus/config/species/arabidopsis/ Then ?arabidopsis? would be the species name to use with MAKER. Sometimes you may have to do a second round of both SNAP and Augustus training (called bootstrapping). Look at the models you get after the first round, and if they look good then, the second round is probably not going top be beneficial. ?Carson > On Feb 5, 2016, at 3:42 PM, hcma wrote: > > Hi Dr Holt, > > Thanks for the email. Here is my pipeline, does it seems acceptable? Any comments is welcome and much appreciated. > > > 1. Use maker to generate training gene set: > > genome=all-chromosome-r1.04.fasta > est=Trinity.fasta > est2genome=1 > > > 2. Use output of Maker to train SNAP: > > maker2zff dwil-all-chromosome-r1.04.all.gff > fathom genome.ann genome.dna ?gene-stats > fathom genome.ann genome.dna ?categorize 1000 > fathom genome.ann genome.dna ?gene-stats > fathom uni.ann uni.dna ?export 1000 ?plus > hmm-assembler.pl genome . > dwil_genome.hmm > > > 3. Use output of Maker to train Augustus on their webserver: > > File used: > > Upload ?export.dna? as the genome file > Upload ?export.aa? as the protein file > > > > 4. 
second and final Maker run: > > > genome=all-chromosome-r1.04.fasta > est=Trinity.fasta > est2genome=0 > Snaphmm=output of 2 > > How do i incorporate the output of training set of gene from Augustus web server here into this step 4? > > Thanks for your time. > > Best Regards > Karen > > > > > > > > > > > > On 2016-02-05 06:36, Carson Holt wrote: >> Hi Karen, >> There are many ways to train Augustus. I prefer to identify gene >> models in MAKER (GFF3) and use those to train both SNAP and Augustus. >> Here is a previous post on the topic ?> >> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >> [1] >> In the end you need to look at the SNAP and Augustus models together >> with evidence alignments in a genome browser (like desktop Apollo). >> When everything is trained well, both SNAP and Augustus models will >> look like each other and both seem to look like the evidence >> alignments. >> Thanks, >> Carson >>> On Feb 4, 2016, at 5:52 PM, hcma wrote: >>> Hi, >>> I have a genome sequence and Trinity assembly for a new species and >>> I am wondering what are the best steps to take when using MAKER? >>> 1. I used the genome sequence and all assembled Trinity sequence to >>> do first run of MAKER in order to generate training set for SNAP and >>> Augustus. >>> In maker_opts.ctl: >>> genome=all-chromosome-r1.04.fasta >>> est=Trinity.fasta >>> est2genome=1 >>> 2. Train SNAP >>> 3. Train Augustus >>> When i train Augustus, i only supply genome and protein file, should >>> i also supply the trinity file here? >>> 4. what's the best parameter to use when running MAKER the second >>> time for obtaining the final annotation? I would prefer not to use >>> any external protein data. >>> genome=all-chromosome-r1.04.fasta >>> est=Trinity.fasta >>> est2genome=0 >>> SNAP >>> Augustus >>> Thanks. 
>>> Best Regards >>> KAren >> Links: >> ------ >> [1] >> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ > From hcma at uci.edu Fri Feb 5 15:58:56 2016 From: hcma at uci.edu (hcma) Date: Fri, 05 Feb 2016 14:58:56 -0800 Subject: [maker-devel] Q on MAKER In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> Message-ID: <4b6492c5148151cc52c91f2d56c6532b@uci.edu> Hi Carlson, These are the list of directories under maker/2.31.8 bin data GMOD INSTALL lib LICENSE MWAS perl README RELEASE src Where can i find augustus/? Or i have to ask my system admin to install this? Thanks. Best Regards Karen On 2016-02-05 14:54, Carson Holt wrote: > Augustus gives you an entire directory rather than just a single file > like SNAP. You have to take the directory and copy it to the > .../augustus/config/species/ directory. > > Example: > ?/augustus/config/species/arabidopsis/ > > Then ?arabidopsis? would be the species name to use with MAKER. > > Sometimes you may have to do a second round of both SNAP and Augustus > training (called bootstrapping). Look at the models you get after the > first round, and if they look good then, the second round is probably > not going top be beneficial. > > ?Carson > > > >> On Feb 5, 2016, at 3:42 PM, hcma wrote: >> >> Hi Dr Holt, >> >> Thanks for the email. Here is my pipeline, does it seems acceptable? >> Any comments is welcome and much appreciated. >> >> >> 1. Use maker to generate training gene set: >> >> genome=all-chromosome-r1.04.fasta >> est=Trinity.fasta >> est2genome=1 >> >> >> 2. 
Use output of Maker to train SNAP: >> >> maker2zff dwil-all-chromosome-r1.04.all.gff >> fathom genome.ann genome.dna ?gene-stats >> fathom genome.ann genome.dna ?categorize 1000 >> fathom genome.ann genome.dna ?gene-stats >> fathom uni.ann uni.dna ?export 1000 ?plus >> hmm-assembler.pl genome . > dwil_genome.hmm >> >> >> 3. Use output of Maker to train Augustus on their webserver: >> >> File used: >> >> Upload ?export.dna? as the genome file >> Upload ?export.aa? as the protein file >> >> >> >> 4. second and final Maker run: >> >> >> genome=all-chromosome-r1.04.fasta >> est=Trinity.fasta >> est2genome=0 >> Snaphmm=output of 2 >> >> How do i incorporate the output of training set of gene from Augustus >> web server here into this step 4? >> >> Thanks for your time. >> >> Best Regards >> Karen >> >> >> >> >> >> >> >> >> >> >> >> On 2016-02-05 06:36, Carson Holt wrote: >>> Hi Karen, >>> There are many ways to train Augustus. I prefer to identify gene >>> models in MAKER (GFF3) and use those to train both SNAP and Augustus. >>> Here is a previous post on the topic ?> >>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >>> [1] >>> In the end you need to look at the SNAP and Augustus models together >>> with evidence alignments in a genome browser (like desktop Apollo). >>> When everything is trained well, both SNAP and Augustus models will >>> look like each other and both seem to look like the evidence >>> alignments. >>> Thanks, >>> Carson >>>> On Feb 4, 2016, at 5:52 PM, hcma wrote: >>>> Hi, >>>> I have a genome sequence and Trinity assembly for a new species and >>>> I am wondering what are the best steps to take when using MAKER? >>>> 1. I used the genome sequence and all assembled Trinity sequence to >>>> do first run of MAKER in order to generate training set for SNAP and >>>> Augustus. >>>> In maker_opts.ctl: >>>> genome=all-chromosome-r1.04.fasta >>>> est=Trinity.fasta >>>> est2genome=1 >>>> 2. 
Train SNAP >>>> 3. Train Augustus >>>> When i train Augustus, i only supply genome and protein file, should >>>> i also supply the trinity file here? >>>> 4. what's the best parameter to use when running MAKER the second >>>> time for obtaining the final annotation? I would prefer not to use >>>> any external protein data. >>>> genome=all-chromosome-r1.04.fasta >>>> est=Trinity.fasta >>>> est2genome=0 >>>> SNAP >>>> Augustus >>>> Thanks. >>>> Best Regards >>>> KAren >>> Links: >>> ------ >>> [1] >>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >> From carsonhh at gmail.com Fri Feb 5 16:03:56 2016 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 5 Feb 2016 16:03:56 -0700 Subject: [maker-devel] Q on MAKER In-Reply-To: <4b6492c5148151cc52c91f2d56c6532b@uci.edu> References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> Message-ID: <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> You need to find out where the augustus MAKER is using is installed. Check the maker_exe.ctl file you are using, or type ?which augustus?. ?Carson > On Feb 5, 2016, at 3:58 PM, hcma wrote: > > Hi Carlson, > > These are the list of directories under maker/2.31.8 > > bin data GMOD INSTALL lib LICENSE MWAS perl README RELEASE src > > > Where can i find augustus/? Or i have to ask my system admin to install this? > > Thanks. > > Best Regards > Karen > > > > > On 2016-02-05 14:54, Carson Holt wrote: >> Augustus gives you an entire directory rather than just a single file >> like SNAP. You have to take the directory and copy it to the >> .../augustus/config/species/ directory. >> Example: >> ?/augustus/config/species/arabidopsis/ >> Then ?arabidopsis? would be the species name to use with MAKER. 
>> Sometimes you may have to do a second round of both SNAP and Augustus >> training (called bootstrapping). Look at the models you get after the >> first round, and if they look good then, the second round is probably >> not going top be beneficial. >> ?Carson >>> On Feb 5, 2016, at 3:42 PM, hcma wrote: >>> Hi Dr Holt, >>> Thanks for the email. Here is my pipeline, does it seems acceptable? Any comments is welcome and much appreciated. >>> 1. Use maker to generate training gene set: >>> genome=all-chromosome-r1.04.fasta >>> est=Trinity.fasta >>> est2genome=1 >>> 2. Use output of Maker to train SNAP: >>> maker2zff dwil-all-chromosome-r1.04.all.gff >>> fathom genome.ann genome.dna ?gene-stats >>> fathom genome.ann genome.dna ?categorize 1000 >>> fathom genome.ann genome.dna ?gene-stats >>> fathom uni.ann uni.dna ?export 1000 ?plus >>> hmm-assembler.pl genome . > dwil_genome.hmm >>> 3. Use output of Maker to train Augustus on their webserver: >>> File used: >>> Upload ?export.dna? as the genome file >>> Upload ?export.aa? as the protein file >>> 4. second and final Maker run: >>> genome=all-chromosome-r1.04.fasta >>> est=Trinity.fasta >>> est2genome=0 >>> Snaphmm=output of 2 >>> How do i incorporate the output of training set of gene from Augustus web server here into this step 4? >>> Thanks for your time. >>> Best Regards >>> Karen >>> On 2016-02-05 06:36, Carson Holt wrote: >>>> Hi Karen, >>>> There are many ways to train Augustus. I prefer to identify gene >>>> models in MAKER (GFF3) and use those to train both SNAP and Augustus. >>>> Here is a previous post on the topic ?> >>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >>>> [1] >>>> In the end you need to look at the SNAP and Augustus models together >>>> with evidence alignments in a genome browser (like desktop Apollo). 
>>>> When everything is trained well, both SNAP and Augustus models will >>>> look like each other and both seem to look like the evidence >>>> alignments. >>>> Thanks, >>>> Carson >>>>> On Feb 4, 2016, at 5:52 PM, hcma wrote: >>>>> Hi, >>>>> I have a genome sequence and Trinity assembly for a new species and >>>>> I am wondering what are the best steps to take when using MAKER? >>>>> 1. I used the genome sequence and all assembled Trinity sequence to >>>>> do first run of MAKER in order to generate training set for SNAP and >>>>> Augustus. >>>>> In maker_opts.ctl: >>>>> genome=all-chromosome-r1.04.fasta >>>>> est=Trinity.fasta >>>>> est2genome=1 >>>>> 2. Train SNAP >>>>> 3. Train Augustus >>>>> When i train Augustus, i only supply genome and protein file, should >>>>> i also supply the trinity file here? >>>>> 4. what's the best parameter to use when running MAKER the second >>>>> time for obtaining the final annotation? I would prefer not to use >>>>> any external protein data. >>>>> genome=all-chromosome-r1.04.fasta >>>>> est=Trinity.fasta >>>>> est2genome=0 >>>>> SNAP >>>>> Augustus >>>>> Thanks. >>>>> Best Regards >>>>> KAren >>>> Links: >>>> ------ >>>> [1] >>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ > From hcma at uci.edu Fri Feb 5 16:20:26 2016 From: hcma at uci.edu (hcma) Date: Fri, 05 Feb 2016 15:20:26 -0800 Subject: [maker-devel] Q on MAKER In-Reply-To: <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> Message-ID: <5a40b7af9947dc8297046ba52620569e@uci.edu> Hi Carlson, Thanks for the instruction and in maker_exe.ctl, i only see path to snap, but not to augustus, so my system admin is checking this for me. 
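
One way to run that check from the shell (an editorial sketch, not part of the original message). It assumes maker_exe.ctl uses key=value lines with an optional trailing #comment; ctl_path is a hypothetical helper name. An empty result for augustus would suggest MAKER did not find the program when the control file was generated.

```shell
# Hypothetical helper: print the path recorded for one program in a
# MAKER control file. $1 = control file, $2 = key (e.g. augustus).
# Assumes lines of the form "key=/some/path #comment".
ctl_path() {
    awk -F'[=#]' -v key="$2" '
        $1 == key {
            gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim surrounding whitespace
            print $2
        }' "$1"
}

# Example calls (illustrative only):
# ctl_path maker_exe.ctl snap
# ctl_path maker_exe.ctl augustus
# command -v augustus    # is augustus on the PATH at all?
```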
From some manual i found, people use both snap and augustus when using MAKER to annotate genomes. Would you recommend using both or one of the 2 is sufficient? Thanks for your valuable time and advise. Best Regards Karen On 2016-02-05 15:03, Carson Holt wrote: > You need to find out where the augustus MAKER is using is installed. > Check the maker_exe.ctl file you are using, or type ?which augustus?. > > ?Carson > > >> On Feb 5, 2016, at 3:58 PM, hcma wrote: >> >> Hi Carlson, >> >> These are the list of directories under maker/2.31.8 >> >> bin data GMOD INSTALL lib LICENSE MWAS perl README RELEASE >> src >> >> >> Where can i find augustus/? Or i have to ask my system admin to >> install this? >> >> Thanks. >> >> Best Regards >> Karen >> >> >> >> >> On 2016-02-05 14:54, Carson Holt wrote: >>> Augustus gives you an entire directory rather than just a single file >>> like SNAP. You have to take the directory and copy it to the >>> .../augustus/config/species/ directory. >>> Example: >>> ?/augustus/config/species/arabidopsis/ >>> Then ?arabidopsis? would be the species name to use with MAKER. >>> Sometimes you may have to do a second round of both SNAP and Augustus >>> training (called bootstrapping). Look at the models you get after the >>> first round, and if they look good then, the second round is probably >>> not going top be beneficial. >>> ?Carson >>>> On Feb 5, 2016, at 3:42 PM, hcma wrote: >>>> Hi Dr Holt, >>>> Thanks for the email. Here is my pipeline, does it seems acceptable? >>>> Any comments is welcome and much appreciated. >>>> 1. Use maker to generate training gene set: >>>> genome=all-chromosome-r1.04.fasta >>>> est=Trinity.fasta >>>> est2genome=1 >>>> 2. 
Use output of Maker to train SNAP: >>>> maker2zff dwil-all-chromosome-r1.04.all.gff >>>> fathom genome.ann genome.dna ?gene-stats >>>> fathom genome.ann genome.dna ?categorize 1000 >>>> fathom genome.ann genome.dna ?gene-stats >>>> fathom uni.ann uni.dna ?export 1000 ?plus >>>> hmm-assembler.pl genome . > dwil_genome.hmm >>>> 3. Use output of Maker to train Augustus on their webserver: >>>> File used: >>>> Upload ?export.dna? as the genome file >>>> Upload ?export.aa? as the protein file >>>> 4. second and final Maker run: >>>> genome=all-chromosome-r1.04.fasta >>>> est=Trinity.fasta >>>> est2genome=0 >>>> Snaphmm=output of 2 >>>> How do i incorporate the output of training set of gene from >>>> Augustus web server here into this step 4? >>>> Thanks for your time. >>>> Best Regards >>>> Karen >>>> On 2016-02-05 06:36, Carson Holt wrote: >>>>> Hi Karen, >>>>> There are many ways to train Augustus. I prefer to identify gene >>>>> models in MAKER (GFF3) and use those to train both SNAP and >>>>> Augustus. >>>>> Here is a previous post on the topic ?> >>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >>>>> [1] >>>>> In the end you need to look at the SNAP and Augustus models >>>>> together >>>>> with evidence alignments in a genome browser (like desktop Apollo). >>>>> When everything is trained well, both SNAP and Augustus models will >>>>> look like each other and both seem to look like the evidence >>>>> alignments. >>>>> Thanks, >>>>> Carson >>>>>> On Feb 4, 2016, at 5:52 PM, hcma wrote: >>>>>> Hi, >>>>>> I have a genome sequence and Trinity assembly for a new species >>>>>> and >>>>>> I am wondering what are the best steps to take when using MAKER? >>>>>> 1. I used the genome sequence and all assembled Trinity sequence >>>>>> to >>>>>> do first run of MAKER in order to generate training set for SNAP >>>>>> and >>>>>> Augustus. 
From carsonhh at gmail.com Fri Feb 5 16:33:23 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 5 Feb 2016 16:33:23 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To: <5a40b7af9947dc8297046ba52620569e@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
 <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>
 <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
 <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
 <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
 <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
 <5a40b7af9947dc8297046ba52620569e@uci.edu>
Message-ID: <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>

I recommend using both. You probably don't have augustus installed.

--Carson

Sent from my iPhone
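A rough sketch of the installation check discussed in this thread (look for the augustus executable, then for the trained species directory under .../augustus/config/species/). It assumes the AUGUSTUS_CONFIG_PATH environment variable points at the Augustus config directory, which may not be true on every system; the helper names are hypothetical and "arabidopsis" is only the example species name used earlier in the thread.

```python
import os
import shutil
from typing import Optional


def find_augustus() -> Optional[str]:
    """Return the path of the `augustus` executable on PATH, or None."""
    return shutil.which("augustus")


def species_dir(config_path: str, species: str) -> Optional[str]:
    """Return <config_path>/species/<species> if that directory exists."""
    candidate = os.path.join(config_path, "species", species)
    return candidate if os.path.isdir(candidate) else None


if __name__ == "__main__":
    exe = find_augustus()
    print("augustus executable:", exe or "not found on PATH")

    # Assumption: AUGUSTUS_CONFIG_PATH is set for this installation.
    config = os.environ.get("AUGUSTUS_CONFIG_PATH")
    if config:
        found = species_dir(config, "arabidopsis")
        print("species directory:", found or "missing")
    else:
        print("AUGUSTUS_CONFIG_PATH is not set")
```

If the executable is not found, the path in maker_exe.ctl is the other place to look, as suggested earlier in the thread.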
From dcard at uta.edu Mon Feb 8 09:05:21 2016
From: dcard at uta.edu (Card, Daren C)
Date: Mon, 8 Feb 2016 10:05:21 -0600
Subject: [maker-devel] Most scaffolds fail with BadParameter error, Maker on TACC Lonestar
Message-ID: <38614065-4DEF-47B4-8100-BD18901D2592@uta.edu>

Hello,

I've tried to run Maker on TACC Lonestar (4, trying to squeeze some last things in before deprecation), but I haven't had much success. I get Maker to run, but only 28 proteins/transcripts are annotated and most scaffolds fail to finish properly, according to the ...master_datastore_index.log. In my STDERR, I see a consistent error show up for many scaffolds:

------------- EXCEPTION: Bio::Root::BadParameter -------------
MSG: ' 7.5' is not a valid score
VALUE: 7.5
STACK: Error::throw
STACK: Bio::Root::Root::throw /opt/apps/maker/2.30/bin/../perl/lib/Bio/Root/Root.pm:486
STACK: Bio::SeqFeature::Generic::score /opt/apps/maker/2.30/bin/../perl/lib/Bio/SeqFeature/Generic.pm:468
STACK: GFFDB::_ary_to_features /opt/apps/maker/2.30/bin/../lib/GFFDB.pm:891
STACK: GFFDB::phathits_on_chunk /opt/apps/maker/2.30/bin/../lib/GFFDB.pm:534
STACK: Process::MpiChunk::_go /opt/apps/maker/2.30/bin/../lib/Process/MpiChunk.pm:756
STACK: Process::MpiChunk::run /opt/apps/maker/2.30/bin/../lib/Process/MpiChunk.pm:341
STACK: main::node_thread /opt/apps/maker/2.30/bin/maker:1433
STACK: threads::new /opt/apps/maker/2.30/bin/../perl/lib/forks.pm:799
STACK: /opt/apps/maker/2.30/bin/maker:901
--------------------------------------------------------------
--> rank=18, hostname=c304-113.ls4.tacc.utexas.edu
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:scaffold279|size418813

The "7.5" value can vary between errors, but other than that and the scaffold ID, the rest of the error message is the same. I obviously don't have the expertise to diagnose the issue here, but I'm hoping someone can help me sort this out.

A quick, unrelated question: does the Yandell lab (or anyone else) have a script that will produce a CDS file (multi-FASTA file) from a GFF annotation and a FASTA genome sequence? I'm trying to produce a CDS file from some NCBI genomes (annoying that it isn't already included from NCBI), but the script I wrote to do this is giving some suspect results. I figured if anyone had a well-tested script for this purpose, it would be someone on this list.

Best,
Daren

Daren Card
Ph.D. Candidate
Castoe Lab
University of Texas at Arlington
dcard at uta.edu
www.darencard.net

From carsonhh at gmail.com Mon Feb 8 09:31:08 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 8 Feb 2016 09:31:08 -0700
Subject: [maker-devel] Most scaffolds fail with BadParameter error, Maker on TACC Lonestar
In-Reply-To: <38614065-4DEF-47B4-8100-BD18901D2592@uta.edu>
References: <38614065-4DEF-47B4-8100-BD18901D2592@uta.edu>
Message-ID: <9BA957A0-DD0F-4920-A778-65D0DE10F1ED@gmail.com>

It's failing because there is something wrong with the format of the input GFF file. It might not be GFF3, it may be GTF format, it may have mixed types (not just gene/mRNA/exon/CDS models), or it may have a missing Parent= or ID= tag required to generate the proper feature relationship. You can try to use GAL (http://www.sequenceontology.org/software/GAL.html) to help validate or convert the format.

Also note the message --> MSG: ' 7.5' is not a valid score

There is an extra whitespace inside the single quotes, which probably means you have contaminating whitespace before the value. GFF3 is tab delimited; space characters are not permitted and, if required, must be escaped following the URI escaping convention.
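As a quick way to locate the kind of stray whitespace described above, here is a minimal sketch (not part of MAKER or GAL; the function names are hypothetical) that flags feature lines whose first eight columns carry leading or trailing whitespace, or whose score column is not numeric. It assumes a plain-text GFF3 file and stops at a ##FASTA section if one is present.

```python
from typing import Iterable, List, Tuple


def check_gff3_line(line: str) -> List[str]:
    """Return a list of problems found in one GFF3 feature line."""
    problems = []
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        problems.append(f"expected 9 tab-separated columns, found {len(fields)}")
        return problems
    # Columns 1-8 should carry no padding around their values.
    for i, value in enumerate(fields[:8], start=1):
        if value != value.strip():
            problems.append(f"column {i} has leading/trailing whitespace: {value!r}")
    # The score (column 6) must be "." or a number.
    score = fields[5].strip()
    if score != ".":
        try:
            float(score)
        except ValueError:
            problems.append(f"score is not numeric: {fields[5]!r}")
    return problems


def check_gff3(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs for every offending feature line."""
    report = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; nothing more to check
        if not line.strip() or line.startswith("#"):
            continue
        for problem in check_gff3_line(line):
            report.append((number, problem))
    return report


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for number, problem in check_gff3(handle):
            print(f"line {number}: {problem}")
```

A line with a score field of ' 7.5' would be reported as whitespace in column 6. This only catches whitespace and score problems; missing ID=/Parent= tags or GTF-style attributes still need a proper validator such as GAL.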
--Carson

From hcma at uci.edu Tue Feb 9 15:35:13 2016
From: hcma at uci.edu (hcma)
Date: Tue, 09 Feb 2016 14:35:13 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To: <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
 <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>
 <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
 <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
 <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
 <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
 <5a40b7af9947dc8297046ba52620569e@uci.edu>
 <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
Message-ID: <7e4d6f2773f654f8530155936b648832@uci.edu>

Hi Carson,

For the final annotation run, I would like to incorporate TopHat results from RNA-seq data. From your experience, is it better to feed raw RNA-seq reads (Illumina paired-end data) or trimmed reads (trimmed using Trimmomatic) into TopHat? If trimmed, do you recommend a particular program?

Thanks for your time.

Best Regards
Karen
From jgallant at msu.edu Tue Feb 9 19:36:51 2016
From: jgallant at msu.edu (Jason Gallant)
Date: Wed, 10 Feb 2016 02:36:51 +0000
Subject: [maker-devel] Extract FASTA Sequences from "Maker Standard" Build
Message-ID:

Hi Everyone,

Quick question... I've run through Mike Campbell's tutorial on building "Maker Standard", "Maker Default" and "Maker Max" datasets. I've decided that the "Maker Standard" data (transcripts with evidence and/or IPR scan hits) makes the most sense for what we're trying to do.

Is there an easy way to create the FASTA files associated with the maker standard build? Fasta_merge typically outputs a variety of .fasta files, which I've been able to create following this protocol for the "maker max" dataset. I'd like to get these for the "maker standard" build.

Currently, the datastore contains the data for the "maker max" data.
One way, I suppose, would be to re-run MAKER with the maker standard gff file, but it seems like an overly complicated way of doing it?

Any suggestions, Mike (or others)? Has anyone written a script to do this automagically?

Best,
Jason Gallant

From michael.s.campbell1 at gmail.com Wed Feb 10 07:03:29 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 10 Feb 2016 09:03:29 -0500
Subject: [maker-devel] Extract FASTA Sequences from "Maker Standard" Build
In-Reply-To:
References:
Message-ID: <2F89E4BC-C473-40A9-AE81-EAA2323B17D0@gmail.com>

Hi Jason,

Rerunning MAKER with the standard gff3 file would work, but for speed I would use the fasta_tool accessory script that is bundled with MAKER. All you need to make is a file with the list of transcript names from the standard gff3. Then you can use fasta_tool with the --select option to return all of the FASTA sequences that are in the list. The command would look like this:

  PATH_TO_MAKER/maker/bin/fasta_tool --select id_file.txt max_transcripts.fasta | PATH_TO_MAKER/maker/bin/fasta_tool --wrap 80 > standard_transcripts.fasta

fasta_tool outputs unwrapped FASTA by default, so I generally pipe the output back through fasta_tool to wrap the text. The above command line wraps the sequence at 80 characters.

You can use a perl one-liner like this one to make the id file:

  perl -lane 'if ($F[2] eq "mRNA") { my ($id) = $_ =~ /Name=(\S+?);/; print $id; }' maker_standard.gff > id_file.txt

If you use these command lines, make sure you type them out yourself; email programs have a tendency to change characters slightly, making copy/pasted commands fail.

Thanks,
Mike
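For anyone who would rather not retype the perl one-liner, the ID-list step can also be sketched in Python. This is only an illustration under the same assumption the one-liner makes, namely that the FASTA headers in the transcripts file match the Name= attribute of the mRNA features; the script and function names are hypothetical, and fasta_tool itself is still what does the selection.

```python
import re
import sys
from typing import Iterable, List

# Matches the Name attribute in GFF3 column 9, e.g. "...;Name=gene-mRNA-1;..."
NAME_RE = re.compile(r"(?:^|;)Name=([^;]+)")


def mrna_names(gff_lines: Iterable[str]) -> List[str]:
    """Collect the Name= value of every mRNA feature in a GFF3 file."""
    names = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "mRNA":
            continue
        match = NAME_RE.search(fields[8])
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    # Usage: python mrna_names.py maker_standard.gff > id_file.txt
    with open(sys.argv[1]) as handle:
        for name in mrna_names(handle):
            print(name)
```

Unlike the one-liner, this skips mRNA lines that have no Name= attribute instead of printing an empty line for them.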
From michael.s.campbell1 at gmail.com Wed Feb 10 07:17:11 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 10 Feb 2016 09:17:11 -0500
Subject: [maker-devel] Q on MAKER
In-Reply-To: <7e4d6f2773f654f8530155936b648832@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
 <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>
 <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
 <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
 <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
 <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
 <5a40b7af9947dc8297046ba52620569e@uci.edu>
 <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
 <7e4d6f2773f654f8530155936b648832@uci.edu>
Message-ID: <7495272C-476A-4985-8D49-30D991410535@gmail.com>

Hi Karen,

From my experience, trimming reads will not make things worse and it generally makes things better. As far as the best program to use, one doesn't really stand out above the others as far as I can tell.
However, with paired-end reads it is important to use a trimmer that preserves the pairing between the two files (i.e., when an entire read is discarded, the paired read is moved into a file for singletons).

Thanks
Mike
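To make the pairing point concrete, here is a toy sketch of the bookkeeping a pair-aware trimmer has to do. It is not how any particular trimmer is implemented; the read representation, the function name and the length-based filter in the usage note are assumptions for illustration only.

```python
from typing import Callable, Iterable, List, Tuple

Read = Tuple[str, str]  # (read name, sequence) after trimming


def split_pairs(
    pairs: Iterable[Tuple[Read, Read]],
    keep: Callable[[Read], bool],
) -> Tuple[List[Tuple[Read, Read]], List[Read]]:
    """Separate read pairs into intact pairs and orphaned singletons.

    A pair stays paired only if both mates pass `keep`. If exactly one
    mate passes, it goes to the singleton list so the two paired files
    stay in the same order and length. If neither passes, both are dropped.
    """
    paired = []
    singletons = []
    for mate1, mate2 in pairs:
        ok1, ok2 = keep(mate1), keep(mate2)
        if ok1 and ok2:
            paired.append((mate1, mate2))
        elif ok1:
            singletons.append(mate1)
        elif ok2:
            singletons.append(mate2)
    return paired, singletons
```

For example, with a hypothetical filter such as `keep = lambda read: len(read[1]) >= 36`, a pair whose second mate was trimmed down to 20 bases would contribute only its first mate, and that mate would land in the singleton list rather than in the paired output.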
From hcma at uci.edu Wed Feb 10 15:27:41 2016
From: hcma at uci.edu (hcma)
Date: Wed, 10 Feb 2016 14:27:41 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To: <7495272C-476A-4985-8D49-30D991410535@gmail.com>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
 <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>
 <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
 <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
 <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
 <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
 <5a40b7af9947dc8297046ba52620569e@uci.edu>
 <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
 <7e4d6f2773f654f8530155936b648832@uci.edu>
 <7495272C-476A-4985-8D49-30D991410535@gmail.com>
Message-ID: <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>

Hi Mike,

Thanks for the reply. So I can input raw RNA-seq reads to TopHat and feed the output to MAKER?

Thanks.

Best Regards
Karen
From carsonhh at gmail.com Wed Feb 10 19:32:00 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 10 Feb 2016 19:32:00 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To: <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
 <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>
 <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
 <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
 <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
 <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
 <5a40b7af9947dc8297046ba52620569e@uci.edu>
 <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
 <7e4d6f2773f654f8530155936b648832@uci.edu>
 <7495272C-476A-4985-8D49-30D991410535@gmail.com>
 <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
Message-ID:

I find tophat results to be too noisy, and prefer cufflinks. There is
both a tophat2gff and cufflinks2gff script that comes with MAKER. Also
consider assembling the reads with Trinity (my overall preferred
method because it yields the highest specificity).

--Carson

Sent from my iPhone

> On Feb 10, 2016, at 3:27 PM, hcma wrote:
>
> So i can input raw RNA-seq reads to Tophat and feed the output to maker?
From fdolze at students.uni-mainz.de Thu Feb 11 03:43:51 2016
From: fdolze at students.uni-mainz.de (Florian)
Date: Thu, 11 Feb 2016 11:43:51 +0100
Subject: [maker-devel] Maker-run with no clean finish on openMPI-cluster
Message-ID: <56BC65E7.6000904@students.uni-mainz.de>

Hi all,

I am no expert for MPI so maybe this is something very trivial or maybe
not caused by MAKER at all but I'd be glad to have your thoughts on this.

I installed MAKER 2.31.8 with MPI support (openMPI 1.8.1) on our
cluster. I ran maker with the options attached and the command in
bsub_maker, and I _think_ it worked fine. Here is the last output of
maker:

running exonerate search.
#--------- command -------------#
Widget::exonerate::protein2genome:
/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/exe/exonerate/bin/exonerate -q /project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_files/maker_yZhQlA/49/sp%7CQ4JHE0%7CXB36_ORYSJ.for.114901-115619.49.fasta -t /project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_files/maker_yZhQlA/49/scaffold299_size115619.114901-115619.49.fasta -Q protein -T dna -m protein2genome --softmasktarget --percent 20 --showcigar > /project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_files/maker_yZhQlA/49/scaffold299_size115619.114901-115619.sp%7CQ4JHE0%7CXB36_ORYSJ.p.exonerate
#-------------------------------#
cleaning blastx...
in cluster::shadow_cluster...
...finished clustering.
in cluster::shadow_cluster...
...finished clustering.
cleaning clusters....
total clusters:37 now processing 0
...processing 0 of 11
...processing 1 of 11
...processing 2 of 11
...processing 3 of 11
...
...processing 174 of 177
...processing 175 of 177
...processing 176 of 177
flattening protein clusters
prepare section files
Maker is now finished!!!
Start_time: 1454700985
End_time: 1455023070
Elapsed: 322085

But my cluster job didn't finish here; instead I got the following
errors until my runtime limit of 5 days was reached:

Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
[a0238:09542] *** Process received signal ***
[a0238:09542] Signal: Segmentation fault (11)
[a0238:09542] Signal code: Address not mapped (1)
[a0238:09542] Failing at address: 0xa80
[a0238:09542] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2ba955727710]
[a0238:09542] [ 1] /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x2ba954715002]
[a0238:09542] [ 2] /lib64/libpthread.so.0(+0xf710)[0x2ba955727710]
[a0238:09542] [ 3] /lib64/libc.so.6(__poll+0x53)[0x2ba955a170d3]
[a0238:09542] [ 4] /cluster/mpi/gcc_4.4.7/OpenMPI-1.8.1/lib/libopen-pal.so.6(+0x6cfca)[0x2ba955fb4fca]
[a0238:09542] [ 5] /cluster/mpi/gcc_4.4.7/OpenMPI-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x2ba955fabf11]
[a0238:09542] [ 6] /cluster/mpi/gcc_4.4.7/OpenMPI-1.8.1/lib/libopen-rte.so.7(+0x376ae)[0x2ba955d076ae]
[a0238:09542] [ 7] /lib64/libpthread.so.0(+0x79d1)[0x2ba95571f9d1]
[a0238:09542] [ 8] /lib64/libc.so.6(clone+0x6d)[0x2ba955a208fd]
[a0238:09542] *** End of error message ***
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
SIGTERM received
...

Maybe someone has experienced something similar before or can give me a
hint whether this is caused by my setup or by MAKER.

Kind regards,
Florian Dolze
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
#-----BLAST and Exonerate Statistics Thresholds
blast_type=ncbi+ #set to 'ncbi+', 'ncbi' or 'wublast'
pcov_blastn=0.8 #Blastn Percent Coverage Threhold EST-Genome Alignments
pid_blastn=0.85 #Blastn Percent Identity Threshold EST-Genome Aligments
eval_blastn=1e-10 #Blastn eval cutoff
bit_blastn=40 #Blastn bit cutoff
depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff)
pcov_blastx=0.5 #Blastx Percent Coverage Threhold Protein-Genome Alignments
pid_blastx=0.4 #Blastx Percent Identity Threshold Protein-Genome Aligments
eval_blastx=1e-06 #Blastx eval cutoff
bit_blastx=30 #Blastx bit cutoff
depth_blastx=0 #Blastx depth cutoff (0 to disable cutoff)
pcov_tblastx=0.8 #tBlastx Percent Coverage Threhold alt-EST-Genome Alignments
pid_tblastx=0.85 #tBlastx Percent Identity Threshold alt-EST-Genome Aligments
eval_tblastx=1e-10 #tBlastx eval cutoff
bit_tblastx=40 #tBlastx bit cutoff
depth_tblastx=0 #tBlastx depth cutoff (0 to disable cutoff)
pcov_rm_blastx=0.5 #Blastx Percent Coverage Threhold For Transposable Element Masking
pid_rm_blastx=0.4 #Blastx Percent Identity Threshold For Transposbale Element Masking
eval_rm_blastx=1e-06 #Blastx eval cutoff for transposable element masking
bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking
ep_score_limit=20 #Exonerate protein percent of maximal score threshold
en_score_limit=20 #Exonerate nucleotide percent of maximal score threshold
-------------- next part --------------
#-----Location of Executables Used by MAKER/EVALUATOR
makeblastdb=/cluster/Apps/bioinf/BLAST/2.2.28/bin/makeblastdb #location of NCBI+ makeblastdb executable
blastn=/cluster/Apps/bioinf/BLAST/2.2.28/bin/blastn #location of NCBI+ blastn executable
blastx=/cluster/Apps/bioinf/BLAST/2.2.28/bin/blastx #location of NCBI+ blastx executable
tblastx=/cluster/Apps/bioinf/BLAST/2.2.28/bin/tblastx #location of NCBI+ tblastx executable
formatdb= #location of NCBI formatdb executable
blastall= #location of NCBI blastall executable
xdformat= #location of WUBLAST xdformat executable
blasta= #location of WUBLAST blasta executable
RepeatMasker=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/bin/../exe/RepeatMasker/RepeatMasker #location of RepeatMasker executable
exonerate=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/bin/../exe/exonerate/bin/exonerate #location of exonerate executable

#-----Ab-initio Gene Prediction Algorithms
snap=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/bin/../exe/snap/snap #location of snap executable
gmhmme3=/project/molgen/Maker_additional_tools/genemark-4.32/gmhmme3 #location of eukaryotic genemark executable
gmhmmp= #location of prokaryotic genemark executable
augustus=/project/molgen/Maker_additional_tools/augustus-3.2.1/bin/augustus #location of augustus executable
fgenesh= #location of fgenesh executable
tRNAscan-SE=/project/molgen/Maker_additional_tools/tRNAscan/bin/tRNAscan-SE #location of trnascan executable
snoscan=/project/molgen/Maker_additional_tools/snoscan/bin/snoscan #location of snoscan executable

#-----Other Algorithms
probuild=/project/molgen/Maker_additional_tools/genemark-4.32/probuild #location of probuild executable (required for genemark)
-------------- next part --------------
#-----Genome (these are always required)
genome= /project/molgen/workbench_Florian/riparius_MAKER_v2/Crip_genome_v20_newHead.fa
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=/project/molgen/workbench_Florian/riparius_MAKER_v2/riparius_cDNA_formatedHeader.fa #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/project/molgen/workbench_Florian/riparius_MAKER_v2/uniprot_sprot.fasta #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib=/project/molgen/workbench_Florian/riparius_MAKER_v2/20151208_Custom_Crip_repeat_library_final.fas #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm=/project/molgen/workbench_Florian/riparius_MAKER_v2/cegmasnap.hmm #SNAP HMM file
gmhmm=/project/molgen/workbench_Florian/riparius_MAKER_v2/gmhmm.mod #GeneMark HMM file
augustus_species=Riparius_Neu #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=1 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna=/project/molgen/workbench_Florian/riparius_MAKER_v2/C.thummi_28S_rDNA_gene.fasta #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP=/project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_files #specify a directory other than the system default temporary directory for temporary files
-------------- next part --------------
#!/bin/bash
#BSUB -n 128
#BSUB -q long
#BSUB -W 7200
#BSUB -o mogon_maker_MPIrun_5_feb.log
#BSUB -J riparius_makerMPI
#BSUB -app Reserve1G

mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_MPIrun3 -fix_nucleotides

From hcma at uci.edu Thu Feb 11 15:32:45 2016
From: hcma at uci.edu (hcma)
Date: Thu, 11 Feb 2016 14:32:45 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
 <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>
 <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
 <308df3bd49b7098b6f0d36cb2675a0de@uci.edu>
 <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
 <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
 <5a40b7af9947dc8297046ba52620569e@uci.edu>
 <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
 <7e4d6f2773f654f8530155936b648832@uci.edu>
 <7495272C-476A-4985-8D49-30D991410535@gmail.com>
 <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
Message-ID:

Hi Carlson,

Thanks for sharing.

I did assemble the Illumina RNA-seq PE100 reads de novo using Trinity,
and I used that assembly as input to the first run of MAKER to generate
a set of genes for training SNAP and Augustus. Now I am planning a
second (and perhaps final) run of MAKER for gene prediction, provided
that the SNAP and Augustus results look similar to each other.

I was going to incorporate the GFF result from tophat into the second
run of MAKER for gene prediction, along with the Trinity output, but
avoiding external protein annotation. I already did a separate BLAST
analysis to identify orthologous genes, and I prefer to run MAKER
without any protein evidence.

Do you recommend supplying the output of tophat2gff for this second run
of MAKER for gene prediction?

Thanks again for your time and advice.

Best Regards
Karen

From carsonhh at gmail.com Thu Feb 11 15:36:44 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 Feb 2016 15:36:44 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu> <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com> <7e4d6f2773f654f8530155936b648832@uci.edu> <7495272C-476A-4985-8D49-30D991410535@gmail.com> <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
Message-ID: <56F1935F-F6BA-4755-92F2-17EE81909619@gmail.com>

Not if you already have Trinity results. Adding the tophat2gff output on top of them will actually decrease the specificity of the run (i.e. it causes false gene calls because of spurious evidence support).

--Carson

From hcma at uci.edu Thu Feb 11 17:18:43 2016
From: hcma at uci.edu (hcma)
Date: Thu, 11 Feb 2016 16:18:43 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
Message-ID:

Hi Carson,

I have downloaded Apollo. What format of the SNAP and Augustus models does Apollo take? Do I need to reformat the SNAP.hmm, and which Augustus output should I use if I train Augustus manually?

Thanks again for your time.

Best Regards
Karen

From panos.ioannidis at gmail.com Fri Feb 12 01:35:49 2016
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Fri, 12 Feb 2016 09:35:49 +0100
Subject: [maker-devel] GFF features from Maker
Message-ID:

Hi guys,

I have a few questions regarding annotated features in the GFF file built by Maker.

1) I'm a bit confused about the annotations coming from "est2genome" and "blastn", because they both give "expressed_sequence_match" features.
So, what's the difference between them? How do the EST matches from est2genome differ from those from blastn?

2) Same goes for "protein2genome" and "blastx", since they both give "protein_match" features.

3) Last, what is the difference between the partial matches and full-length matches? For example, in almost all cases where est2genome gives an "expressed_sequence_match" feature for a genomic area, it also gives a "match_part" feature for sub-areas within this area. What is the meaning of this? I'm pasting one such area below.

scaffold3|size1771164 est2genome expressed_sequence_match 21953 22276 949 + . ID=scaffold3|size1771164:hit:1901:3.2.0.0;Name=C24476_a_3_0_l_241
scaffold3|size1771164 est2genome match_part 21953 22035 949 + . ID=scaffold3|size1771164:hsp:1902:3.2.0.0;Parent=scaffold3|size1771164:hit:1901:3.2.0.0;Target=C24476_a_3_0_l_241 1 83 +;Gap=M83
scaffold3|size1771164 est2genome match_part 22148 22276 949 + . ID=scaffold3|size1771164:hsp:1903:3.2.0.0;Parent=scaffold3|size1771164:hit:1901:3.2.0.0;Target=C24476_a_3_0_l_241 84 215 +;Gap=M104 D2 M7 I4 M8 I1 M8

Thanks,
Panos

From carsonhh at gmail.com Fri Feb 12 07:48:46 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Feb 2016 07:48:46 -0700
Subject: [maker-devel] GFF features from Maker
In-Reply-To:
References:
Message-ID: <806D9F3C-13AF-4EDE-ACA8-DA981255E5DD@gmail.com>

Hi Panos,

Terms used are governed by the Sequence Ontology (http://www.sequenceontology.org), and specific definitions can be found there. Terms have a Parent/Child relationship, with lower levels being more specific than higher levels.
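
To make Panos's third question concrete: in the rows he pasted, each match_part row names the expressed_sequence_match row in its Parent attribute, so the "full-length" feature is the whole hit and the parts are its aligned blocks. The sketch below is only an illustration, not anything MAKER ships. It re-types the three rows with tab-separated columns (an assumption, since the archive flattened the whitespace), and the helper names `parse_attributes` and `group_parts_by_hit` are made up for this example.

```python
from collections import defaultdict

SEQID = "scaffold3|size1771164"
HIT_ID = SEQID + ":hit:1901:3.2.0.0"

# The three est2genome rows from the message, rebuilt with tab-separated columns.
GFF_ROWS = [
    "\t".join([SEQID, "est2genome", "expressed_sequence_match",
               "21953", "22276", "949", "+", ".",
               "ID=" + HIT_ID + ";Name=C24476_a_3_0_l_241"]),
    "\t".join([SEQID, "est2genome", "match_part",
               "21953", "22035", "949", "+", ".",
               "ID=" + SEQID + ":hsp:1902:3.2.0.0;Parent=" + HIT_ID
               + ";Target=C24476_a_3_0_l_241 1 83 +;Gap=M83"]),
    "\t".join([SEQID, "est2genome", "match_part",
               "22148", "22276", "949", "+", ".",
               "ID=" + SEQID + ":hsp:1903:3.2.0.0;Parent=" + HIT_ID
               + ";Target=C24476_a_3_0_l_241 84 215 +;Gap=M104 D2 M7 I4 M8 I1 M8"]),
]


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (no URL-decoding; sketch only)."""
    attrs = {}
    for pair in column9.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = value
    return attrs


def group_parts_by_hit(rows):
    """Return (hits, parts): top-level features by ID, child spans by Parent ID."""
    hits = {}
    parts = defaultdict(list)
    for row in rows:
        cols = row.split("\t")
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if "Parent" in attrs:
            parts[attrs["Parent"]].append((start, end))
        else:
            hits[attrs["ID"]] = (ftype, start, end)
    return hits, dict(parts)


hits, parts = group_parts_by_hit(GFF_ROWS)
for hit_id, (ftype, start, end) in hits.items():
    print(ftype, start, end, "->", parts.get(hit_id, []))
```

The stretch between the two parts (22036 to 22147) is not covered by any match_part; for a spliced EST alignment that is presumably an intron.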

The match feature is used for ab initio reference results rather than the potentially better term predicted_gene, because match is already handled correctly by most software and most databases like FlyBase already use it for that purpose (in part because predicted_gene was a latecomer to the ontology list, and it is used more often to distinguish accepted models without human curation than to label reference predictions). Since match is an experimental_feature, it matches the expected separation between genes (biological_region) and analysis results (experimental_feature). It's rather boring and technical, but it's all the result of careful selection using the Sequence Ontology inheritance levels and term definitions. Example in attached image.

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SO-0000102.png
Type: image/png
Size: 7720 bytes
Desc: not available
URL:

From carsonhh at gmail.com Fri Feb 12 07:56:41 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Feb 2016 07:56:41 -0700
Subject: [maker-devel] GFF features from Maker
In-Reply-To:
References:
Message-ID: <1B5D7E98-850C-4D16-A5C1-5BE1EB5B8735@gmail.com>

Also, BLAST vs. Exonerate is an algorithmic difference. BLAST produces traditional Smith-Waterman-style local alignments, resulting in potentially out-of-order sub-alignments called HSPs. Exonerate does splice-aware alignments (in order and correctly trimmed for splice sites). More info on polishing alignments is on the wiki page here ->
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Polishing_Evidence_Alignments

--Carson

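
The est2genome rows quoted in this thread also carry a Gap attribute, which spells out the alignment inside each match_part. As a rough sketch (the function name `gap_lengths` is invented here, and only the M, I and D operations that appear in the thread are handled), the GFF3 convention is that M consumes both sequences, I is extra sequence in the target (the EST), and D is extra sequence in the reference (the genome). The totals can be checked against the coordinates in the pasted rows.

```python
import re


def gap_lengths(gap):
    """Return (reference_bases, target_bases) consumed by a GFF3 Gap string."""
    ref = tgt = 0
    for op, count in re.findall(r"([A-Z])(\d+)", gap):
        n = int(count)
        if op == "M":      # aligned column: consumes genome and EST
            ref += n
            tgt += n
        elif op == "I":    # gap in the reference: extra bases in the EST
            tgt += n
        elif op == "D":    # gap in the target: extra bases in the genome
            ref += n
        else:
            raise ValueError("operation not handled in this sketch: " + op)
    return ref, tgt


# Second match_part from the thread: genome 22148..22276, Target 84..215
ref_len, tgt_len = gap_lengths("M104 D2 M7 I4 M8 I1 M8")
print(ref_len, 22276 - 22148 + 1)   # both are the genomic span of the part
print(tgt_len, 215 - 84 + 1)        # both are the EST span of the part
```

The first match_part (Gap=M83) is an ungapped 83-base block on both sequences, which agrees with its genome span 21953..22035 and Target 1..83.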
From panos.ioannidis at gmail.com Fri Feb 12 07:59:05 2016
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Fri, 12 Feb 2016 15:59:05 +0100
Subject: [maker-devel] GFF features from Maker
In-Reply-To: <1B5D7E98-850C-4D16-A5C1-5BE1EB5B8735@gmail.com>
References: <1B5D7E98-850C-4D16-A5C1-5BE1EB5B8735@gmail.com>
Message-ID:

Thanks for all the info Carson!

Panos

From carsonhh at gmail.com Fri Feb 12 12:14:16 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Feb 2016 12:14:16 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
Message-ID:

You need to view the output the programs produce, not the HMM. You can run them through MAKER and then view the GFF3 files produced. Here is a MAKER tutorial where this is done that you can follow along with if you wish ->
http://gmod.org/wiki/MAKER_Tutorial_2013#Training_ab_initio_Gene_Predictors

For Augustus training there are a number of threads on how to do that in the MAKER mailing list archives ->
https://groups.google.com/forum/#!searchin/maker-devel/augustus

There are also other resources online ->
http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html

--Carson

From fdolze at students.uni-mainz.de Tue Feb 16 03:10:03 2016
From: fdolze at students.uni-mainz.de (Florian)
Date: Tue, 16 Feb 2016 11:10:03 +0100
Subject: [maker-devel] Estimated runtime on 180mb genome @ 128 cores?
In-Reply-To: <56BC65E7.6000904@students.uni-mainz.de>
References: <56BC65E7.6000904@students.uni-mainz.de>
Message-ID: <56C2F57B.8020208@students.uni-mainz.de>

Hi all,

I am trying to run MAKER on a project of mine, and since this is the first time I use MAKER I'd like to ask some more experienced users what I can expect in regard to resource consumption and runtime of MAKER.

My genome data is:

* 180.652.019 bp genome length
* 5.292 Scaffolds
* 34.136 bp median scaffold length
* 2.056.324 bp longest
* 272.065 bp N50

- I use a 73 Mb transcriptome assembly as EST Evidence
- SwissProt as Protein Homology Evidence
- 60 kb custom repeat library for RepeatMasker

For gene prediction I am running with a SNAP hmm I generated using CEGMA, GeneMark, and Augustus trained by their webservice.
I have options est2genome and protein2genome turned on (=1) and use tRNAscan and snoscan. And other options as following:

#-----MAKER Behavior Options
max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage) <--- Is this reasonable?
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

The maker_bopts.ctl file is unchanged.
(Basically I follow this guide https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md)

At the moment I am running this with openMPI as:

mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_run1 -fix_nucleotides

on 128 cores with 130GB of memory.

First of all, are those options I use viable?

Is it possible to guesstimate the runtime I can expect? 5 days? 20 days? And is it reasonable to use additional cores or will this not benefit much?

Thanks for your insights,
Florian

From dence at genetics.utah.edu Tue Feb 16 09:42:51 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 16 Feb 2016 16:42:51 +0000
Subject: [maker-devel] Estimated runtime on 180mb genome @ 128 cores?
In-Reply-To: <56C2F57B.8020208@students.uni-mainz.de>
References: <56BC65E7.6000904@students.uni-mainz.de> <56C2F57B.8020208@students.uni-mainz.de>
Message-ID:

Hi Florian,

I don't think you want est2genome or protein2genome turned on for this run. Est2genome is usually only used if you don't have any ab-initio predictors trained; protein2genome should only be used if you have good reason not to expect any introns at all (for example, a prokaryotic genome).

Also, you set the max_dna_len parameter to 2.1 Mbp, which is larger than your N50. Setting this too large prevents MAKER from speeding up its analysis by splitting contigs/scaffolds across multiple processors. There's usually no reason to change this from the default setting.

With a good N50 like you have, you'll probably get good results.
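
The points raised in this reply can be checked mechanically before a long run is submitted. What follows is a minimal sketch, not a MAKER utility: `read_ctl` and `review` are hypothetical helper names, the sample text is reconstructed from Florian's description (he states est2genome and protein2genome are set to 1 but did not paste those lines), and the N50 is the 272,065 bp figure from his message.

```python
def read_ctl(text):
    """Parse 'key=value #comment' lines from a MAKER control file into a dict."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        if "=" in line:
            key, value = line.split("=", 1)
            opts[key.strip()] = value.strip()
    return opts


def review(opts, n50):
    """Collect notes for the settings questioned in the reply above."""
    notes = []
    max_len = int(opts.get("max_dna_len") or 0)
    if max_len > n50:
        notes.append("max_dna_len=%d exceeds the assembly N50 (%d)" % (max_len, n50))
    for key in ("est2genome", "protein2genome"):
        if opts.get(key) == "1":
            notes.append(key + "=1 is on; reconsider if trained predictors are in use")
    return notes


# Reconstructed from the message, not copied from a real control file.
SAMPLE_CTL = """\
est2genome=1
protein2genome=1
max_dna_len=2100000 #length for dividing up contigs into chunks
"""

for note in review(read_ctl(SAMPLE_CTL), n50=272065):
    print(note)
```

In practice the text would come from the maker_opts.ctl file of the run, and the N50 from whatever assembly statistics tool is already in use.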

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Tue Feb 16 09:53:55 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 Feb 2016 09:53:55 -0700
Subject: [maker-devel] Estimated runtime on 180mb genome @ 128 cores?
In-Reply-To:
References: <56BC65E7.6000904@students.uni-mainz.de> <56C2F57B.8020208@students.uni-mainz.de>
Message-ID:

Agree. 500,000 is about the highest you ever want to go with max_dna_len. Increasing the value decreases parallelization and increases memory usage. The only biological reason to ever increase it is if genes are really long and don't fit into windows of this size.

Also test out the mpiexec command with something like 'hostname' to make sure it works. Example ->

mpiexec -mca btl ^openib -n 128 hostname

This should print out 128 lines identifying all hosts in the communication ring. If it prints out the same host ID every time, then there is a problem and you may need to provide a hostfile to let mpiexec know all the hosts it can run across.

--Carson

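
With 128 lines of output, the hostname test is easier to read as a tally. A small sketch under stated assumptions: `summarize_hosts` is a made-up helper, the node names in the sample are invented, and the sample string stands in for the real output of the `mpiexec ... hostname` command from the thread.

```python
from collections import Counter


def summarize_hosts(output):
    """Count how many MPI ranks reported each hostname."""
    return Counter(line.strip() for line in output.splitlines() if line.strip())


# Stand-in for real output; on a cluster it could be captured with e.g.
#   subprocess.run(["mpiexec", "-mca", "btl", "^openib", "-n", "128", "hostname"],
#                  capture_output=True, text=True).stdout
sample_output = "node01\n" * 64 + "node02\n" * 64

counts = summarize_hosts(sample_output)
print(sum(counts.values()), "ranks on", len(counts), "distinct host(s)")
for host, n in sorted(counts.items()):
    print(host, n)

if len(counts) == 1:
    # Only a concern when the job was meant to span several machines.
    print("All ranks report the same host; a hostfile may be needed.")
```

A single distinct host is only a warning sign when the job is supposed to span several nodes; on one large shared-memory machine it is the expected result.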
> > ~Daniel > > > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > >> On Feb 16, 2016, at 3:10 AM, Florian > wrote: >> >> Hi all, >> >> I am trying to run MAKER on a project of mine and since this is the first time I use MAKER I'd like to ask some more experienced users what I can expect in regard to resource consumption and runtime of MAKER. >> >> My genome data is: >> >> 180.652.019 bp genome length >> 5.292 Scaffolds >> 34.136 bp median scaffold length >> 2.056.324 bp longest >> 272.065 bp N50 >> - I use a 73mb transcriptome assembly as EST Evidence >> - SwissProt as Protein Homology Evidence >> - 60kb custom repeat library for RepeatMasker >> >> >> >> For gene prediction I am running with a SNAP hmm I generated using CEGMA, GeneMark, and Augustus trained by their webservice. >> I have options est2genome and protein2genome turned on (=1) and use tRNAscan and snoscan. And other options as following: >> >> #-----MAKER Behavior Options >> max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage) <--- Is this reasonable? 
>> min_contig=1 #skip genome contigs below this length (under 10kb are often useless) >> >> pred_flank=200 #flank for extending evidence clusters sent to gene predictors >> pred_stats=0 #report AED and QI statistics for all predictions as well as models >> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) >> min_protein=0 #require at least this many amino acids in predicted proteins >> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no >> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no >> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no >> keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) >> >> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments) >> single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no >> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' >> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes >> >> The maker_bopts.ctl file is unchanged. >> >> (Basically I follow this guide https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md ) >> >> >> At the moment I am running this with openMPI as: >> >> mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_run1 -fix_nucleotides >> >> on 128 cores with 130GB of memory. >> >> >> First of all, are those options I use viable? >> >> Is it possible to guesstimate the runtime I can expect? 5 days? 20 days? And is it reasonable to use additional cores or will this not benefit much? 
>> >> Thanks for your insights, >> Florian >> >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From alejocn5 at gmail.com Tue Feb 16 14:17:40 2016 From: alejocn5 at gmail.com (=?UTF-8?Q?Alejandro_Cer=C3=B3n_Noriega?=) Date: Tue, 16 Feb 2016 16:17:40 -0500 Subject: [maker-devel] problem with the example Message-ID: hello i am Alejandro I have tried to follow the tutorial MAKER 1-I Copy the files in the data directories to a temporary directory where i run an example file. 2-I Type maker -CTL to generate generic MAKER control files (foto_1) 3-I edit the control files to include the path of the genome file ( hsap_contig.fasta from the example) (foto_2) then I give the paht maker maker_exe.ctl maker_opts.ctl maker_bopts.ctl (foto 3) that generate a expected folder hsap_contig.maker.output but when i whatn to look for the gff file i dont find it, inside the /data/hsap_contig.maker.output/hsap_contig_datastore, i dont find the all subdirectories seq_name.gff - a gff file that can be loaded into GMOD, GBROWSE, or Apollo * seq_name.maker.transcripts.fasta - a fasta file of the MAKER annotated transcript sequences * seq_name.maker.proteins.fasta - a fasta file of the MAKER annotated protein sequences * seq_name.maker.XXX.transcript.fasta - a fasta file of ab-initio predicted transcript sequences from program XXX * seq_name.maker.XXX.proteins.fasta - a fasta file of ab-inito predicted protein sequences from program XXX * seq_name.maker.non_overlapping_ab_initio.transcripts.fasta - a fasta file of filtered ab-inito transcript sequences that don't overlap maker 
annotations
* seq_name.maker.non_overlapping_ab_initio.proteins.fasta - a fasta file of filtered ab-initio protein sequences that don't overlap maker annotations
* theVoid.seq_name/ - a directory containing all of the raw output files produced by MAKER, including BLAST reports, SNAP output, exonerate output and the masked genomic sequence.

I only find a directory named 80 (foto_4).

I don't know if I did something wrong.

I also tried changing the path of the EST file (foto_5).

Thanks for your attention

--
Alejandro Cerón Noriega, B.Sc
MSc. Candidate Bioinformatics
K ???
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_1.png
Type: image/png
Size: 67330 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_2.png
Type: image/png
Size: 257578 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Foto_3.png
Type: image/png
Size: 213241 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_4.png
Type: image/png
Size: 129352 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_5.png
Type: image/png
Size: 255944 bytes
Desc: not available
URL:

From carsonhh at gmail.com Thu Feb 18 12:36:13 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 18 Feb 2016 12:36:13 -0700
Subject: [maker-devel] problem with the example
In-Reply-To:
References:
Message-ID: <4CD9B36B-8C9D-4E48-B1B6-ACAFF28DF3B2@gmail.com>

To access files for individual sequences, use the datastore index:

/scratchsan/caceronn/Results/MAKER/data/hsap_contig.maker.output/hsap_contig_master_datastore_index.log

Look in that file to find the location of individual contig results.
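
As a rough illustration of reading that index, here is a minimal Python sketch. It assumes each line of the master datastore index log holds three tab-separated fields: the sequence name, the output directory for that sequence, and a status such as STARTED or FINISHED. That layout is an assumption to check against your own log, and the sequence names and paths in the sample are made up for the example.

```python
def finished_locations(index_text):
    """Map sequence name -> output directory for entries marked FINISHED.

    Assumes three tab-separated fields per line: name, directory, status.
    """
    locations = {}
    for line in index_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank lines or lines that do not fit the assumed layout
        name, directory, status = (field.strip() for field in fields)
        if status == "FINISHED":
            locations[name] = directory
    return locations


# Made-up sample in the assumed layout (not taken from a real run).
sample = (
    "seqA\texample_datastore/AA/BB/seqA/\tSTARTED\n"
    "seqA\texample_datastore/AA/BB/seqA/\tFINISHED\n"
    "seqB\texample_datastore/CC/DD/seqB/\tSTARTED\n"
)

if __name__ == "__main__":
    for name, directory in finished_locations(sample).items():
        print(name, directory)
```

On a real run you would read the contents of the index log and pass them to the function; the directories it returns are where the per-sequence files (the GFF and FASTA outputs) should be.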
For merged results you have to use the gff3_merge script together with the datastore index.

Here is a nice tutorial with step-by-step instructions and a video that is easy to follow along with -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014

--Carson

> On Feb 16, 2016, at 2:17 PM, Alejandro Cerón Noriega wrote:
>
> hello i am Alejandro
>
> I have tried to follow the tutorial MAKER
>
> 1-I Copy the files in the data directories to a temporary directory where i run an example file.
> 2-I Type maker -CTL to generate generic MAKER control files (foto_1)
> 3-I edit the control files to include the path of the genome file ( hsap_contig.fasta from the example) (foto_2)
> then I give the path maker maker_exe.ctl maker_opts.ctl maker_bopts.ctl (foto 3)
>
> that generate a expected folder
> hsap_contig.maker.output
>
> but when i want to look for the gff file i don't find it, inside the /data/hsap_contig.maker.output/hsap_contig_datastore, i don't find the all subdirectories
>
> seq_name.gff - a gff file that can be loaded into GMOD, GBROWSE,
> or Apollo
> * seq_name.maker.transcripts.fasta - a fasta file of the MAKER
> annotated transcript sequences
> * seq_name.maker.proteins.fasta - a fasta file of the MAKER
> annotated protein sequences
> * seq_name.maker.XXX.transcript.fasta - a fasta file of ab-initio
> predicted transcript sequences from program XXX
> * seq_name.maker.XXX.proteins.fasta - a fasta file of ab-initio
> predicted protein sequences from program XXX
> * seq_name.maker.non_overlapping_ab_initio.transcripts.fasta - a
> fasta file of filtered ab-initio transcript sequences that don't
> overlap maker annotations
> * seq_name.maker.non_overlapping_ab_initio.proteins.fasta - a
> fasta file of filtered ab-initio protein sequences that don't
> overlap maker annotations
> * theVoid.seq_name/ - a directory containing all of the raw
> output files produced by MAKER, including BLAST reports, SNAP
> output, exonerate output and
the masked genomeic sequence. > > i only find a directorie named 80 (foto 4) > > i dont know if a make somthing wrong, > > also try to change the path of the EST (foto_5) > > > thanks for your attention > > > -- > Alejandro Cer?n Noriega, B.Sc > MSc. Candidate Bioinformatics > K ??? > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdolze at students.uni-mainz.de Fri Feb 26 07:16:10 2016 From: fdolze at students.uni-mainz.de (Florian) Date: Fri, 26 Feb 2016 15:16:10 +0100 Subject: [maker-devel] Possible to redirect maker output? Message-ID: <56D05E2A.1040201@students.uni-mainz.de> Hi all, I am trying to run maker on a cluster (2 nodes with 64 cores each), to speed things up I copied all input files to a ramdisk to reduce I/O time, but all subsequent results are still written to hdd. Is there a way I can tell maker to write the maker.results files to ramdisk (or generally any other directory than the current working dir) too? (are they actually used for the current run or are only files in the temp files location used?) Is anybody experienced with running maker on a similar setup and could tell me how you are handling this? thanks, Florian From scott at scottcain.net Fri Feb 26 10:50:06 2016 From: scott at scottcain.net (Scott Cain) Date: Fri, 26 Feb 2016 12:50:06 -0500 Subject: [maker-devel] GMOD 2016 meeting Message-ID: Hello all, I am pleased to announce that details have been finalized for the 2016 GMOD meeting. It will take place immediately following the Galaxy Community Conference at Indiana University in Bloomington, IN on June 30 and July 1. We're still working on agenda details, so if you have suggestions or would like to present, please let me know. 
For registration information, please see:

https://gmod2016.eventbrite.com

And for other information about the meeting, keep an eye on:

http://gmod.org/wiki/Jun_2016_GMOD_Meeting

I look forward to seeing you there!

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.  scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)  216-392-3087
Ontario Institute for Cancer Research
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gloriarendon at gmail.com Fri Feb 26 15:14:26 2016
From: gloriarendon at gmail.com (Gloria Rendon)
Date: Fri, 26 Feb 2016 16:14:26 -0600
Subject: [maker-devel] MAKER/3.00.0-beta: missing some accessory scripts
Message-ID:

Hello,

My name is Gloria Rendon. I work at the Carle Woese Institute for Genomic Biology at the University of Illinois at Urbana-Champaign.

In recent months we used MAKER/3.00.0-beta to generate annotations (in GFF3 format) for a de-novo assembly that we produced in-house for the Taro plant.

As part of the same project, I now need to run an analysis with RNA-seq data for the same Taro species.
I am going to use STAR for the alignment step and I need to provide the annotations file in GTF format, not in GFF3 format as I currently have.

In order to perform the GFF3->GTF conversion I was planning to run some of the accessory scripts that come with MAKER:

add_utr_start_stop_gff
gff3_2_gtf

However, I just noticed that my installation of MAKER is missing those two scripts.
This is how the MAKER/bin folder looks now:

$ ls /home/groups/hpcbio/apps/maker/maker-3.00.0-beta/bin/
AED_cdf_generator.pl   ipr_update_gff          maker_map_ids
cegma2zff              iprscan2gff3            map2assembly
chado2gff3             maker                   map_data_ids
compare_gff3_to_chado  maker2chado             map_fasta_ids
cufflinks2gff3         maker2eval_gtf          map_gff_ids
evaluator              maker2jbrowse           match2gene.pl
fasta_merge            maker2wap               quality_filter.pl
fasta_tool             maker2zff               tophat2gff3
genemark_gtf2gff3      maker_functional_fasta
gff3_merge             maker_functional_gff

By the way, earlier versions of MAKER that are installed on our cluster are also missing those scripts.

Could you please tell me how to remedy the situation?
Do you have executables of the two scripts that you can share with me?
OR
Do I need to re-install MAKER with special configuration options?

Thank you very much for your attention to this matter.

Sincerely,

Gloria
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Feb 29 12:09:14 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 29 Feb 2016 12:09:14 -0700
Subject: [maker-devel] Possible to redirect maker output?
In-Reply-To: <56D05E2A.1040201@students.uni-mainz.de>
References: <56D05E2A.1040201@students.uni-mainz.de>
Message-ID: <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com>

You can try setting TMP= in the control files to a RAM disk location (you will need a lot of RAM though, perhaps 500 GB). Even then, some components used by MAKER may not function properly with tmpfs, but you can try. If it doesn't work you'll get an error.

The main output directory, on the other hand, must be globally accessible to all nodes if working with MPI, and a RAM disk will only exist and be accessible on a single node (even though a directory with the same name, e.g. /dev/shm, may exist on multiple nodes, they will actually be separate and distinct locations).
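
Before pointing TMP= at a RAM disk, it may help to check that the location is writable and has enough free space. The Python sketch below is only an illustration under stated assumptions: the candidate path /dev/shm and the 500 GB figure are taken from the message above as rough values, and the helper name tmp_line_if_enough_space is hypothetical, not part of MAKER. It uses only the standard library and produces the TMP= line you would then paste into the control file by hand.

```python
import os
import shutil


def tmp_line_if_enough_space(path, min_free_bytes):
    """Return a 'TMP=<path>' control-file line if `path` is a writable
    directory with at least `min_free_bytes` free; otherwise return None."""
    if not os.path.isdir(path) or not os.access(path, os.W_OK):
        return None
    free_bytes = shutil.disk_usage(path).free
    if free_bytes < min_free_bytes:
        return None
    return "TMP=" + os.path.abspath(path)


if __name__ == "__main__":
    candidate = "/dev/shm"        # assumption: a tmpfs mount on this node
    needed = 500 * 1024 ** 3      # assumption: rough 500 GB estimate from the thread
    line = tmp_line_if_enough_space(candidate, needed)
    print(line if line else "Not enough space; keep the default temporary directory")
```

This only checks the node it runs on. Under MPI every node has its own RAM disk, so the same check would have to pass on each node.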
--Carson

> On Feb 26, 2016, at 7:16 AM, Florian wrote:
>
> Hi all,
>
> I am trying to run maker on a cluster (2 nodes with 64 cores each), to speed things up I copied all input files to a ramdisk to reduce I/O time, but all subsequent results are still written to hdd.
>
> Is there a way I can tell maker to write the maker.results files to ramdisk (or generally any other directory than the current working dir) too? (are they actually used for the current run or are only files in the temp files location used?)
>
> Is anybody experienced with running maker on a similar setup and could tell me how you are handling this?
>
>
> thanks,
> Florian

From carsonhh at gmail.com Mon Feb 29 12:17:29 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 29 Feb 2016 12:17:29 -0700
Subject: [maker-devel] MAKER/3.00.0-beta: missing some accessory scripts
In-Reply-To:
References:
Message-ID:

You should be using maker2eval_gtf. The scripts you mention were deprecated from MAKER 2.10 onwards (about 5 years ago). You may be looking at old documentation.

--Carson

> On Feb 26, 2016, at 3:14 PM, Gloria Rendon wrote:
>
> Hello,
>
> My name is Gloria Rendon. I work at the Carle Woese Institute for Genomic Biology at the University of Illinois at Urbana-Champaign.
>
> In recent months we used MAKER/3.00.0-beta to generate annotations (in GFF3 format) for a de-novo assembly that we produced in-house for the Taro plant.
>
> As part of the same project, I now need to run an analysis with RNA-seq data for the same Taro species.
> I am going to use STAR for the alignment step and I need to provide the annotations file in GTF format, not in GFF3 format as I currently have.
> > In order to perform the GFF3->GTF conversion I was planning to run some of the accessory scripts that come with MAKER > > add_utr_start_stop_gff > gff3_2_gtf > > However I just noticed that my installation of MAKER is missing those two scripts. > This is how the MAKER/bin folder looks like now: > > $ ls /home/groups/hpcbio/apps/maker/maker-3.00.0-beta/bin/ > AED_cdf_generator.pl ipr_update_gff maker_map_ids > cegma2zff iprscan2gff3 map2assembly > chado2gff3 maker map_data_ids > compare_gff3_to_chado maker2chado map_fasta_ids > cufflinks2gff3 maker2eval_gtf map_gff_ids > evaluator maker2jbrowse match2gene.pl > fasta_merge maker2wap quality_filter.pl > fasta_tool maker2zff tophat2gff3 > genemark_gtf2gff3 maker_functional_fasta > gff3_merge maker_functional_gff > > > btw, earlier versions of MAKER that are also installed on our cluster as also missing those scripts. > > Could you please tell me how to remedy the situation? > Do you have executables of the two scripts that you can share with me? > OR > Do I need to re-install MAKER with special configuration options? > > Thank you very much for the attention to this matter. > > Sincerely, > > Gloria > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From hcma at uci.edu Thu Feb 4 17:52:12 2016 From: hcma at uci.edu (hcma) Date: Thu, 04 Feb 2016 16:52:12 -0800 Subject: [maker-devel] Q on MAKER In-Reply-To: <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> Message-ID: <99f6989955acdf6fd6b0875affbeefa9@uci.edu> Hi, I have a genome sequence and Trinity assembly for a new species and I am wondering what are the best steps to take when using MAKER? 1. 
I used the genome sequence and all assembled Trinity sequence to do first run of MAKER in order to generate training set for SNAP and Augustus. In maker_opts.ctl: genome=all-chromosome-r1.04.fasta est=Trinity.fasta est2genome=1 2. Train SNAP 3. Train Augustus When i train Augustus, i only supply genome and protein file, should i also supply the trinity file here? 4. what's the best parameter to use when running MAKER the second time for obtaining the final annotation? I would prefer not to use any external protein data. genome=all-chromosome-r1.04.fasta est=Trinity.fasta est2genome=0 SNAP Augustus Thanks. Best Regards KAren From carsonhh at gmail.com Fri Feb 5 07:36:06 2016 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 5 Feb 2016 07:36:06 -0700 Subject: [maker-devel] Q on MAKER In-Reply-To: <99f6989955acdf6fd6b0875affbeefa9@uci.edu> References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> Message-ID: Hi Karen, There are many ways to train Augustus. I prefer to identify gene models in MAKER (GFF3) and use those to train both SNAP and Augustus. Here is a previous post on the topic ?> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ In the end you need to look at the SNAP and Augustus models together with evidence alignments in a genome browser (like desktop Apollo). When everything is trained well, both SNAP and Augustus models will look like each other and both seem to look like the evidence alignments. Thanks, Carson > On Feb 4, 2016, at 5:52 PM, hcma wrote: > > Hi, > > I have a genome sequence and Trinity assembly for a new species and I am wondering what are the best steps to take when using MAKER? > > 1. I used the genome sequence and all assembled Trinity sequence to do first run of MAKER in order to generate training set for SNAP and Augustus. 
> > In maker_opts.ctl: > genome=all-chromosome-r1.04.fasta > est=Trinity.fasta > est2genome=1 > > > 2. Train SNAP > > 3. Train Augustus > > When i train Augustus, i only supply genome and protein file, should i also supply the trinity file here? > > > 4. what's the best parameter to use when running MAKER the second time for obtaining the final annotation? I would prefer not to use any external protein data. > > genome=all-chromosome-r1.04.fasta > est=Trinity.fasta > est2genome=0 > SNAP > Augustus > > Thanks. > > Best Regards > KAren -------------- next part -------------- An HTML attachment was scrubbed... URL: From hcma at uci.edu Fri Feb 5 15:42:37 2016 From: hcma at uci.edu (hcma) Date: Fri, 05 Feb 2016 14:42:37 -0800 Subject: [maker-devel] Q on MAKER In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> Message-ID: <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> Hi Dr Holt, Thanks for the email. Here is my pipeline, does it seems acceptable? Any comments is welcome and much appreciated. 1. Use maker to generate training gene set: genome=all-chromosome-r1.04.fasta est=Trinity.fasta est2genome=1 2. Use output of Maker to train SNAP: maker2zff dwil-all-chromosome-r1.04.all.gff fathom genome.ann genome.dna ?gene-stats fathom genome.ann genome.dna ?categorize 1000 fathom genome.ann genome.dna ?gene-stats fathom uni.ann uni.dna ?export 1000 ?plus hmm-assembler.pl genome . > dwil_genome.hmm 3. Use output of Maker to train Augustus on their webserver: File used: Upload ?export.dna? as the genome file Upload ?export.aa? as the protein file 4. second and final Maker run: genome=all-chromosome-r1.04.fasta est=Trinity.fasta est2genome=0 Snaphmm=output of 2 How do i incorporate the output of training set of gene from Augustus web server here into this step 4? Thanks for your time. 
Best Regards Karen On 2016-02-05 06:36, Carson Holt wrote: > Hi Karen, > > There are many ways to train Augustus. I prefer to identify gene > models in MAKER (GFF3) and use those to train both SNAP and Augustus. > Here is a previous post on the topic ?> > https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ > [1] > > In the end you need to look at the SNAP and Augustus models together > with evidence alignments in a genome browser (like desktop Apollo). > When everything is trained well, both SNAP and Augustus models will > look like each other and both seem to look like the evidence > alignments. > > Thanks, > Carson > >> On Feb 4, 2016, at 5:52 PM, hcma wrote: >> >> Hi, >> >> I have a genome sequence and Trinity assembly for a new species and >> I am wondering what are the best steps to take when using MAKER? >> >> 1. I used the genome sequence and all assembled Trinity sequence to >> do first run of MAKER in order to generate training set for SNAP and >> Augustus. >> >> In maker_opts.ctl: >> genome=all-chromosome-r1.04.fasta >> est=Trinity.fasta >> est2genome=1 >> >> 2. Train SNAP >> >> 3. Train Augustus >> >> When i train Augustus, i only supply genome and protein file, should >> i also supply the trinity file here? >> >> 4. what's the best parameter to use when running MAKER the second >> time for obtaining the final annotation? I would prefer not to use >> any external protein data. >> >> genome=all-chromosome-r1.04.fasta >> est=Trinity.fasta >> est2genome=0 >> SNAP >> Augustus >> >> Thanks. 
>> >> Best Regards >> KAren > > > > Links: > ------ > [1] > https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ From carsonhh at gmail.com Fri Feb 5 15:54:58 2016 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 5 Feb 2016 15:54:58 -0700 Subject: [maker-devel] Q on MAKER In-Reply-To: <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> Message-ID: Augustus gives you an entire directory rather than just a single file like SNAP. You have to take the directory and copy it to the .../augustus/config/species/ directory. Example: ?/augustus/config/species/arabidopsis/ Then ?arabidopsis? would be the species name to use with MAKER. Sometimes you may have to do a second round of both SNAP and Augustus training (called bootstrapping). Look at the models you get after the first round, and if they look good then, the second round is probably not going top be beneficial. ?Carson > On Feb 5, 2016, at 3:42 PM, hcma wrote: > > Hi Dr Holt, > > Thanks for the email. Here is my pipeline, does it seems acceptable? Any comments is welcome and much appreciated. > > > 1. Use maker to generate training gene set: > > genome=all-chromosome-r1.04.fasta > est=Trinity.fasta > est2genome=1 > > > 2. Use output of Maker to train SNAP: > > maker2zff dwil-all-chromosome-r1.04.all.gff > fathom genome.ann genome.dna ?gene-stats > fathom genome.ann genome.dna ?categorize 1000 > fathom genome.ann genome.dna ?gene-stats > fathom uni.ann uni.dna ?export 1000 ?plus > hmm-assembler.pl genome . > dwil_genome.hmm > > > 3. Use output of Maker to train Augustus on their webserver: > > File used: > > Upload ?export.dna? as the genome file > Upload ?export.aa? as the protein file > > > > 4. 
second and final Maker run: > > > genome=all-chromosome-r1.04.fasta > est=Trinity.fasta > est2genome=0 > Snaphmm=output of 2 > > How do i incorporate the output of training set of gene from Augustus web server here into this step 4? > > Thanks for your time. > > Best Regards > Karen > > > > > > > > > > > > On 2016-02-05 06:36, Carson Holt wrote: >> Hi Karen, >> There are many ways to train Augustus. I prefer to identify gene >> models in MAKER (GFF3) and use those to train both SNAP and Augustus. >> Here is a previous post on the topic ?> >> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >> [1] >> In the end you need to look at the SNAP and Augustus models together >> with evidence alignments in a genome browser (like desktop Apollo). >> When everything is trained well, both SNAP and Augustus models will >> look like each other and both seem to look like the evidence >> alignments. >> Thanks, >> Carson >>> On Feb 4, 2016, at 5:52 PM, hcma wrote: >>> Hi, >>> I have a genome sequence and Trinity assembly for a new species and >>> I am wondering what are the best steps to take when using MAKER? >>> 1. I used the genome sequence and all assembled Trinity sequence to >>> do first run of MAKER in order to generate training set for SNAP and >>> Augustus. >>> In maker_opts.ctl: >>> genome=all-chromosome-r1.04.fasta >>> est=Trinity.fasta >>> est2genome=1 >>> 2. Train SNAP >>> 3. Train Augustus >>> When i train Augustus, i only supply genome and protein file, should >>> i also supply the trinity file here? >>> 4. what's the best parameter to use when running MAKER the second >>> time for obtaining the final annotation? I would prefer not to use >>> any external protein data. >>> genome=all-chromosome-r1.04.fasta >>> est=Trinity.fasta >>> est2genome=0 >>> SNAP >>> Augustus >>> Thanks. 
>>> Best Regards >>> KAren >> Links: >> ------ >> [1] >> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ > From hcma at uci.edu Fri Feb 5 15:58:56 2016 From: hcma at uci.edu (hcma) Date: Fri, 05 Feb 2016 14:58:56 -0800 Subject: [maker-devel] Q on MAKER In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> Message-ID: <4b6492c5148151cc52c91f2d56c6532b@uci.edu> Hi Carlson, These are the list of directories under maker/2.31.8 bin data GMOD INSTALL lib LICENSE MWAS perl README RELEASE src Where can i find augustus/? Or i have to ask my system admin to install this? Thanks. Best Regards Karen On 2016-02-05 14:54, Carson Holt wrote: > Augustus gives you an entire directory rather than just a single file > like SNAP. You have to take the directory and copy it to the > .../augustus/config/species/ directory. > > Example: > ?/augustus/config/species/arabidopsis/ > > Then ?arabidopsis? would be the species name to use with MAKER. > > Sometimes you may have to do a second round of both SNAP and Augustus > training (called bootstrapping). Look at the models you get after the > first round, and if they look good then, the second round is probably > not going top be beneficial. > > ?Carson > > > >> On Feb 5, 2016, at 3:42 PM, hcma wrote: >> >> Hi Dr Holt, >> >> Thanks for the email. Here is my pipeline, does it seems acceptable? >> Any comments is welcome and much appreciated. >> >> >> 1. Use maker to generate training gene set: >> >> genome=all-chromosome-r1.04.fasta >> est=Trinity.fasta >> est2genome=1 >> >> >> 2. 
Use output of Maker to train SNAP: >> >> maker2zff dwil-all-chromosome-r1.04.all.gff >> fathom genome.ann genome.dna ?gene-stats >> fathom genome.ann genome.dna ?categorize 1000 >> fathom genome.ann genome.dna ?gene-stats >> fathom uni.ann uni.dna ?export 1000 ?plus >> hmm-assembler.pl genome . > dwil_genome.hmm >> >> >> 3. Use output of Maker to train Augustus on their webserver: >> >> File used: >> >> Upload ?export.dna? as the genome file >> Upload ?export.aa? as the protein file >> >> >> >> 4. second and final Maker run: >> >> >> genome=all-chromosome-r1.04.fasta >> est=Trinity.fasta >> est2genome=0 >> Snaphmm=output of 2 >> >> How do i incorporate the output of training set of gene from Augustus >> web server here into this step 4? >> >> Thanks for your time. >> >> Best Regards >> Karen >> >> >> >> >> >> >> >> >> >> >> >> On 2016-02-05 06:36, Carson Holt wrote: >>> Hi Karen, >>> There are many ways to train Augustus. I prefer to identify gene >>> models in MAKER (GFF3) and use those to train both SNAP and Augustus. >>> Here is a previous post on the topic ?> >>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >>> [1] >>> In the end you need to look at the SNAP and Augustus models together >>> with evidence alignments in a genome browser (like desktop Apollo). >>> When everything is trained well, both SNAP and Augustus models will >>> look like each other and both seem to look like the evidence >>> alignments. >>> Thanks, >>> Carson >>>> On Feb 4, 2016, at 5:52 PM, hcma wrote: >>>> Hi, >>>> I have a genome sequence and Trinity assembly for a new species and >>>> I am wondering what are the best steps to take when using MAKER? >>>> 1. I used the genome sequence and all assembled Trinity sequence to >>>> do first run of MAKER in order to generate training set for SNAP and >>>> Augustus. >>>> In maker_opts.ctl: >>>> genome=all-chromosome-r1.04.fasta >>>> est=Trinity.fasta >>>> est2genome=1 >>>> 2. 
Train SNAP
>>>> 3. Train Augustus
>>>> When i train Augustus, i only supply genome and protein file, should
>>>> i also supply the trinity file here?
>>>> 4. what's the best parameter to use when running MAKER the second
>>>> time for obtaining the final annotation? I would prefer not to use
>>>> any external protein data.
>>>> genome=all-chromosome-r1.04.fasta
>>>> est=Trinity.fasta
>>>> est2genome=0
>>>> SNAP
>>>> Augustus
>>>> Thanks.
>>>> Best Regards
>>>> KAren
>>> Links:
>>> ------
>>> [1]
>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ
>>

From carsonhh at gmail.com Fri Feb 5 16:03:56 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 5 Feb 2016 16:03:56 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To: <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu>
Message-ID: <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>

You need to find out where the copy of Augustus that MAKER is using is installed. Check the maker_exe.ctl file you are using, or type 'which augustus'.

--Carson

> On Feb 5, 2016, at 3:58 PM, hcma wrote:
>
> Hi Carlson,
>
> These are the list of directories under maker/2.31.8
>
> bin data GMOD INSTALL lib LICENSE MWAS perl README RELEASE src
>
>
> Where can i find augustus/? Or i have to ask my system admin to install this?
>
> Thanks.
>
> Best Regards
> Karen
>
>
>
>
> On 2016-02-05 14:54, Carson Holt wrote:
>> Augustus gives you an entire directory rather than just a single file
>> like SNAP. You have to take the directory and copy it to the
>> .../augustus/config/species/ directory.
>> Example:
>> .../augustus/config/species/arabidopsis/
>> Then 'arabidopsis' would be the species name to use with MAKER.
>> Sometimes you may have to do a second round of both SNAP and Augustus >> training (called bootstrapping). Look at the models you get after the >> first round, and if they look good then, the second round is probably >> not going top be beneficial. >> ?Carson >>> On Feb 5, 2016, at 3:42 PM, hcma wrote: >>> Hi Dr Holt, >>> Thanks for the email. Here is my pipeline, does it seems acceptable? Any comments is welcome and much appreciated. >>> 1. Use maker to generate training gene set: >>> genome=all-chromosome-r1.04.fasta >>> est=Trinity.fasta >>> est2genome=1 >>> 2. Use output of Maker to train SNAP: >>> maker2zff dwil-all-chromosome-r1.04.all.gff >>> fathom genome.ann genome.dna ?gene-stats >>> fathom genome.ann genome.dna ?categorize 1000 >>> fathom genome.ann genome.dna ?gene-stats >>> fathom uni.ann uni.dna ?export 1000 ?plus >>> hmm-assembler.pl genome . > dwil_genome.hmm >>> 3. Use output of Maker to train Augustus on their webserver: >>> File used: >>> Upload ?export.dna? as the genome file >>> Upload ?export.aa? as the protein file >>> 4. second and final Maker run: >>> genome=all-chromosome-r1.04.fasta >>> est=Trinity.fasta >>> est2genome=0 >>> Snaphmm=output of 2 >>> How do i incorporate the output of training set of gene from Augustus web server here into this step 4? >>> Thanks for your time. >>> Best Regards >>> Karen >>> On 2016-02-05 06:36, Carson Holt wrote: >>>> Hi Karen, >>>> There are many ways to train Augustus. I prefer to identify gene >>>> models in MAKER (GFF3) and use those to train both SNAP and Augustus. >>>> Here is a previous post on the topic ?> >>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >>>> [1] >>>> In the end you need to look at the SNAP and Augustus models together >>>> with evidence alignments in a genome browser (like desktop Apollo). 
From hcma at uci.edu Fri Feb 5 16:20:26 2016
From: hcma at uci.edu (hcma)
Date: Fri, 05 Feb 2016 15:20:26 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To: <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com>
Message-ID: <5a40b7af9947dc8297046ba52620569e@uci.edu>

Hi Carson,

Thanks for the instructions. In maker_exe.ctl I only see a path to snap, not to augustus, so my system admin is checking this for me.
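
The check suggested earlier in this thread (look at maker_exe.ctl, or run "which augustus") can be sketched in Python. This is only a sketch under assumptions: it assumes the control file uses "key=value #comment" lines, and the function name find_augustus is made up for illustration; it is not part of MAKER.

```python
import shutil


def find_augustus(exe_ctl_text):
    """Return the augustus path set in maker_exe.ctl-style text,
    else whatever `augustus` is on PATH (None if there is none).

    Assumption: entries look like "augustus=/path #comment".
    """
    for raw_line in exe_ctl_text.splitlines():
        # Drop any trailing "#..." comment before parsing the entry.
        entry = raw_line.split("#", 1)[0].strip()
        key, _, value = entry.partition("=")
        if key.strip() == "augustus" and value.strip():
            return value.strip()
    # Nothing configured: fall back to the equivalent of `which augustus`.
    return shutil.which("augustus")
```

Called on the contents of the maker_exe.ctl in use, a result of None would suggest that Augustus is not installed, or at least not visible to MAKER from that environment.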

According to a manual I found, people use both SNAP and Augustus when annotating genomes with MAKER. Would you recommend using both, or is one of the two sufficient?

Thanks for your valuable time and advice.

Best Regards
Karen
From carsonhh at gmail.com Fri Feb 5 16:33:23 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 5 Feb 2016 16:33:23 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To: <5a40b7af9947dc8297046ba52620569e@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu>
Message-ID: <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>

I recommend using both. You probably don't have augustus installed.

--Carson
From dcard at uta.edu Mon Feb 8 09:05:21 2016
From: dcard at uta.edu (Card, Daren C)
Date: Mon, 8 Feb 2016 10:05:21 -0600
Subject: [maker-devel] Most scaffolds fail with BadParameter error, Maker on TACC Lonestar
Message-ID: <38614065-4DEF-47B4-8100-BD18901D2592@uta.edu>

Hello,

I've tried to run Maker on TACC Lonestar (4, trying to squeeze some last things in before deprecation), but I haven't had much success. I get Maker to run, but only 28 proteins/transcripts are annotated and most scaffolds fail to finish properly, according to the master_datastore_index.log. In my STDERR, I see a consistent error show up for many scaffolds:

------------- EXCEPTION: Bio::Root::BadParameter -------------
MSG: ' 7.5' is not a valid score
VALUE: 7.5
STACK: Error::throw
STACK: Bio::Root::Root::throw /opt/apps/maker/2.30/bin/../perl/lib/Bio/Root/Root.pm:486
STACK: Bio::SeqFeature::Generic::score /opt/apps/maker/2.30/bin/../perl/lib/Bio/SeqFeature/Generic.pm:468
STACK: GFFDB::_ary_to_features /opt/apps/maker/2.30/bin/../lib/GFFDB.pm:891
STACK: GFFDB::phathits_on_chunk /opt/apps/maker/2.30/bin/../lib/GFFDB.pm:534
STACK: Process::MpiChunk::_go /opt/apps/maker/2.30/bin/../lib/Process/MpiChunk.pm:756
STACK: Process::MpiChunk::run /opt/apps/maker/2.30/bin/../lib/Process/MpiChunk.pm:341
STACK: main::node_thread /opt/apps/maker/2.30/bin/maker:1433
STACK: threads::new /opt/apps/maker/2.30/bin/../perl/lib/forks.pm:799
STACK: /opt/apps/maker/2.30/bin/maker:901
--------------------------------------------------------------
--> rank=18, hostname=c304-113.ls4.tacc.utexas.edu
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:scaffold279|size418813

The "7.5" value can vary between errors, but other than that and the scaffold ID, the rest of the error message is the same. I obviously don't have the expertise to diagnose the issue here, but I'm hoping someone can help me sort this out.

A quick, unrelated question: does the Yandell lab (or anyone else) have a script that will produce a CDS file (multi-FASTA file) from a GFF annotation and a FASTA genome sequence? I'm trying to produce CDS sequences from some NCBI genomes (annoyingly, NCBI doesn't already include them), but the script I wrote to do this is giving some suspect results. I figured if anyone had a well-tested script for this purpose, it would be someone on this list.

Best,
Daren

Daren Card
Ph.D. Candidate
Castoe Lab
University of Texas at Arlington
dcard at uta.edu
www.darencard.net

From carsonhh at gmail.com Mon Feb 8 09:31:08 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 8 Feb 2016 09:31:08 -0700
Subject: [maker-devel] Most scaffolds fail with BadParameter error, Maker on TACC Lonestar
In-Reply-To: <38614065-4DEF-47B4-8100-BD18901D2592@uta.edu>
References: <38614065-4DEF-47B4-8100-BD18901D2592@uta.edu>
Message-ID: <9BA957A0-DD0F-4920-A778-65D0DE10F1ED@gmail.com>

It's failing because there is something wrong with the format of the input GFF file. It might not be GFF3, it may be GTF format, it may have mixed types (not just gene/mRNA/exon/CDS models), or it may be missing a Parent= or ID= tag required to generate the proper feature relationship. You can try to use GAL (http://www.sequenceontology.org/software/GAL.html) to help validate or convert the format.

Also note the message:

MSG: ' 7.5' is not a valid score

There is an extra whitespace character inside the single quotes, which probably means you have contaminating whitespace before the value. GFF3 is tab delimited, space characters are not permitted, and if required must be escaped following the URI escaping convention.
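
One way to locate the offending lines before rerunning MAKER is a small scan of the score column (column 6). The following is a minimal Python sketch, not a replacement for a real validator such as GAL; the function name find_bad_scores is hypothetical, and it only checks the column count and the score field, nothing else about GFF3.

```python
def find_bad_scores(gff_lines):
    """Return (line_number, raw_score) for feature lines whose score
    column is neither "." nor a clean number. Lines that do not have
    nine tab-separated columns are reported with a score of None.
    """
    problems = []
    for number, line in enumerate(gff_lines, start=1):
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # blank lines, comments, and directives
        columns = line.split("\t")
        if len(columns) != 9:
            problems.append((number, None))
            continue
        score = columns[5]
        if score == ".":
            continue
        try:
            float(score)
            # float() silently accepts padding such as " 7.5",
            # so surrounding whitespace is checked separately.
            clean = score == score.strip()
        except ValueError:
            clean = False
        if not clean:
            problems.append((number, score))
    return problems
```

Passing an open file handle works, since iterating over it yields lines; each reported line number can then be inspected or fixed in the input GFF before it is handed back to MAKER.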

--Carson

From hcma at uci.edu Tue Feb 9 15:35:13 2016
From: hcma at uci.edu (hcma)
Date: Tue, 09 Feb 2016 14:35:13 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To: <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu> <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com>
Message-ID: <7e4d6f2773f654f8530155936b648832@uci.edu>

Hi Carson,

For the final annotation run, I would like to incorporate TopHat results from RNA-seq data. In your experience, is it better to feed TopHat the raw RNA-seq reads (Illumina paired-end data) or trimmed reads (trimmed using Trimmomatic)? If trimmed, do you recommend a particular programme?

Thanks for your time.

Best Regards
Karen
From jgallant at msu.edu Tue Feb 9 19:36:51 2016
From: jgallant at msu.edu (Jason Gallant)
Date: Wed, 10 Feb 2016 02:36:51 +0000
Subject: [maker-devel] Extract FASTA Sequences from "Maker Standard" Build
Message-ID:

Hi Everyone,

Quick question. I've run through Mike Campbell's tutorial on building "Maker Standard", "Maker Default" and "Maker Max" datasets. I've decided that the "Maker Standard" data (transcripts with evidence and/or IPR scan hits) makes the most sense for what we're trying to do.

Is there an easy way to create the FASTA files associated with the Maker Standard build? fasta_merge typically outputs a variety of .fasta files, which I've been able to create following this protocol for the "Maker Max" dataset. I'd like to get these for the "Maker Standard" build.

Currently, the datastore contains the data for the "Maker Max" dataset. One way, I suppose, would be to re-run MAKER with the Maker Standard GFF file, but that seems like an overly complicated way of doing it.

Any suggestions, Mike (or others)? Has anyone written a script to do this automagically?

Best,
Jason Gallant

From michael.s.campbell1 at gmail.com Wed Feb 10 07:03:29 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 10 Feb 2016 09:03:29 -0500
Subject: [maker-devel] Extract FASTA Sequences from "Maker Standard" Build
In-Reply-To:
References:
Message-ID: <2F89E4BC-C473-40A9-AE81-EAA2323B17D0@gmail.com>

Hi Jason,

Rerunning MAKER with the standard gff3 file would work, but for speed I would use the fasta_tool accessory script that is bundled with MAKER. All you need to make is a file with the list of transcript names from the standard gff3. Then you can use fasta_tool with the --select option to return all of the FASTA sequences that are in the list. The command would look like this:

PATH_TO_MAKER/maker/bin/fasta_tool --select id_file.txt max_transcripts.fasta | PATH_TO_MAKER/maker/bin/fasta_tool --wrap 80 > standard_transcripts.fasta

fasta_tool outputs unwrapped FASTA by default, so I generally pipe the output back through fasta_tool to wrap the text. The above command line wraps the sequence at 80 characters.

You can use a Perl one-liner like this one to make the ID file:

perl -lane 'print $1 if $F[2] eq "mRNA" and /Name=([^;\s]+)/' maker_standard.gff > id_file.txt

If you use these command lines, make sure you type them out yourself; email programs have a tendency to change characters slightly, making copy/pasted commands fail.

Thanks,
Mike
From michael.s.campbell1 at gmail.com Wed Feb 10 07:17:11 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 10 Feb 2016 09:17:11 -0500
Subject: [maker-devel] Q on MAKER
In-Reply-To: <7e4d6f2773f654f8530155936b648832@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu> <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com> <7e4d6f2773f654f8530155936b648832@uci.edu>
Message-ID: <7495272C-476A-4985-8D49-30D991410535@gmail.com>

Hi Karen,

In my experience, trimming reads will not make things worse, and it generally makes things better. As for the best program to use, none really stands out above the others as far as I can tell. However, with paired-end reads it is important to use a trimmer that preserves the pairing between the two files (i.e., when an entire read is discarded, its mate is moved into a file for singletons).

Thanks,
Mike
You have to take the directory and copy it to the >>>>>> .../augustus/config/species/ directory. >>>>>> Example: >>>>>> ?/augustus/config/species/arabidopsis/ >>>>>> Then ?arabidopsis? would be the species name to use with MAKER. >>>>>> Sometimes you may have to do a second round of both SNAP and Augustus >>>>>> training (called bootstrapping). Look at the models you get after the >>>>>> first round, and if they look good then, the second round is probably >>>>>> not going top be beneficial. >>>>>> ?Carson >>>>>>> On Feb 5, 2016, at 3:42 PM, hcma wrote: >>>>>>> Hi Dr Holt, >>>>>>> Thanks for the email. Here is my pipeline, does it seems acceptable? Any comments is welcome and much appreciated. >>>>>>> 1. Use maker to generate training gene set: >>>>>>> genome=all-chromosome-r1.04.fasta >>>>>>> est=Trinity.fasta >>>>>>> est2genome=1 >>>>>>> 2. Use output of Maker to train SNAP: >>>>>>> maker2zff dwil-all-chromosome-r1.04.all.gff >>>>>>> fathom genome.ann genome.dna ?gene-stats >>>>>>> fathom genome.ann genome.dna ?categorize 1000 >>>>>>> fathom genome.ann genome.dna ?gene-stats >>>>>>> fathom uni.ann uni.dna ?export 1000 ?plus >>>>>>> hmm-assembler.pl genome . > dwil_genome.hmm >>>>>>> 3. Use output of Maker to train Augustus on their webserver: >>>>>>> File used: >>>>>>> Upload ?export.dna? as the genome file >>>>>>> Upload ?export.aa? as the protein file >>>>>>> 4. second and final Maker run: >>>>>>> genome=all-chromosome-r1.04.fasta >>>>>>> est=Trinity.fasta >>>>>>> est2genome=0 >>>>>>> Snaphmm=output of 2 >>>>>>> How do i incorporate the output of training set of gene from Augustus web server here into this step 4? >>>>>>> Thanks for your time. >>>>>>> Best Regards >>>>>>> Karen >>>>>>>> On 2016-02-05 06:36, Carson Holt wrote: >>>>>>>> Hi Karen, >>>>>>>> There are many ways to train Augustus. I prefer to identify gene >>>>>>>> models in MAKER (GFF3) and use those to train both SNAP and Augustus. 
>>>>>>>> Here is a previous post on the topic ?> >>>>>>>> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ >>>>>>>> [1] >>>>>>>> In the end you need to look at the SNAP and Augustus models together >>>>>>>> with evidence alignments in a genome browser (like desktop Apollo). >>>>>>>> When everything is trained well, both SNAP and Augustus models will >>>>>>>> look like each other and both seem to look like the evidence >>>>>>>> alignments. >>>>>>>> Thanks, >>>>>>>> Carson >>>>>>>>> On Feb 4, 2016, at 5:52 PM, hcma wrote: >>>>>>>>> Hi, >>>>>>>>> I have a genome sequence and Trinity assembly for a new species and >>>>>>>>> I am wondering what are the best steps to take when using MAKER? >>>>>>>>> 1. I used the genome sequence and all assembled Trinity sequence to >>>>>>>>> do first run of MAKER in order to generate training set for SNAP and >>>>>>>>> Augustus. >>>>>>>>> In maker_opts.ctl: >>>>>>>>> genome=all-chromosome-r1.04.fasta >>>>>>>>> est=Trinity.fasta >>>>>>>>> est2genome=1 >>>>>>>>> 2. Train SNAP >>>>>>>>> 3. Train Augustus >>>>>>>>> When i train Augustus, i only supply genome and protein file, should >>>>>>>>> i also supply the trinity file here? >>>>>>>>> 4. what's the best parameter to use when running MAKER the second >>>>>>>>> time for obtaining the final annotation? I would prefer not to use >>>>>>>>> any external protein data. >>>>>>>>> genome=all-chromosome-r1.04.fasta >>>>>>>>> est=Trinity.fasta >>>>>>>>> est2genome=0 >>>>>>>>> SNAP >>>>>>>>> Augustus >>>>>>>>> Thanks. 
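The point about keeping the two read files in sync can be sketched in a few lines of Python. This is only an illustration of the idea, not how Trimmomatic or any other trimmer is implemented; the function name `split_pairs`, the `min_len` cutoff, and the in-memory (name, sequence) tuples are all assumptions made for the example.

```python
def split_pairs(reads_1, reads_2, min_len=36):
    """Filter two parallel lists of (name, sequence) reads.

    A pair is kept only if both mates pass the length cutoff. When just one
    mate passes, it goes to the singleton list, so the two paired lists stay
    the same length and in the same order.
    """
    paired_1, paired_2, singletons = [], [], []
    for (name_1, seq_1), (name_2, seq_2) in zip(reads_1, reads_2):
        ok_1 = len(seq_1) >= min_len
        ok_2 = len(seq_2) >= min_len
        if ok_1 and ok_2:
            paired_1.append((name_1, seq_1))
            paired_2.append((name_2, seq_2))
        elif ok_1:
            singletons.append((name_1, seq_1))
        elif ok_2:
            singletons.append((name_2, seq_2))
        # if neither mate passes, both reads are dropped
    return paired_1, paired_2, singletons
```

An aligner that expects paired input can then be given the two paired lists, with the singletons handled as unpaired reads or left out.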
From hcma at uci.edu  Wed Feb 10 15:27:41 2016
From: hcma at uci.edu (hcma)
Date: Wed, 10 Feb 2016 14:27:41 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To: <7495272C-476A-4985-8D49-30D991410535@gmail.com>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu> <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com> <7e4d6f2773f654f8530155936b648832@uci.edu> <7495272C-476A-4985-8D49-30D991410535@gmail.com>
Message-ID: <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>

Hi Mike,

Thanks for the reply. So I can input raw RNA-seq reads to Tophat and feed the output to maker?

Thanks.

Best Regards
Karen
From carsonhh at gmail.com  Wed Feb 10 19:32:00 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 10 Feb 2016 19:32:00 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To: <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu> <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com> <7e4d6f2773f654f8530155936b648832@uci.edu> <7495272C-476A-4985-8D49-30D991410535@gmail.com> <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
Message-ID:

I find tophat results to be too noisy, and prefer cufflinks. There are both tophat2gff and cufflinks2gff scripts that come with MAKER. Also consider assembling the reads with Trinity (my overall preferred method because it yields the highest specificity).

--Carson
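For readers following the thread, here is one way the settings for a second, protein-free MAKER run could be written into a copy of maker_opts.ctl. It is a rough sketch in Python, not a recommendation from anyone in the thread. The helper `set_ctl_options` is a made-up name; `cufflinks.gff3` (standing in for output of the cufflinks2gff script mentioned above) and `my_species` are hypothetical; `dwil_genome.hmm` is the SNAP file name from Karen's earlier message. The option names themselves (est, est_gff, est2genome, protein2genome, snaphmm, augustus_species) are the ones that appear in the maker_opts.ctl attached elsewhere in this archive.

```python
def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the given key=value settings replaced.

    Any trailing '#comment' on a line is preserved. Keys that do not occur
    in the file are ignored rather than appended.
    """
    out = []
    for line in ctl_text.splitlines():
        key, sep, rest = line.partition("=")
        if sep and key in updates and not line.startswith("#"):
            _, hash_mark, comment = rest.partition("#")
            new_line = f"{key}={updates[key]}"
            if hash_mark:
                new_line += f" #{comment}"
            out.append(new_line)
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical second-run settings: transcript evidence only, no proteins.
second_run = {
    "est": "Trinity.fasta",
    "est_gff": "cufflinks.gff3",       # hypothetical file name
    "est2genome": "0",
    "protein2genome": "0",
    "snaphmm": "dwil_genome.hmm",
    "augustus_species": "my_species",  # hypothetical species name
}
```

Calling `set_ctl_options(open("maker_opts.ctl").read(), second_run)` returns the edited text, which can be reviewed before it is written back to disk.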
From fdolze at students.uni-mainz.de  Thu Feb 11 03:43:51 2016
From: fdolze at students.uni-mainz.de (Florian)
Date: Thu, 11 Feb 2016 11:43:51 +0100
Subject: [maker-devel] Maker-run with no clean finish on openMPI-cluster
Message-ID: <56BC65E7.6000904@students.uni-mainz.de>

Hi all,

I am no expert for MPI so maybe this is something very trivial or maybe not caused by MAKER at all but I'd be glad to have your thoughts on this.

I installed MAKER 2.31.8 with MPI support (openMPI 1.8.1) on our cluster. I ran maker with the options attached and the command in bsub_maker, and I _think_ it worked fine. Here is the last output of maker:

running exonerate search.
#--------- command -------------#
Widget::exonerate::protein2genome:
/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/exe/exonerate/bin/exonerate -q /project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_files/maker_yZhQlA/49/sp%7CQ4JHE0%7CXB36_ORYSJ.for.114901-115619.49.fasta -t /project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_files/maker_yZhQlA/49/scaffold299_size115619.114901-115619.49.fasta -Q protein -T dna -m protein2genome --softmasktarget --percent 20 --showcigar > /project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_files/maker_yZhQlA/49/scaffold299_size115619.114901-115619.sp%7CQ4JHE0%7CXB36_ORYSJ.p.exonerate
#-------------------------------#
cleaning blastx...
in cluster::shadow_cluster...
...finished clustering.
in cluster::shadow_cluster...
...finished clustering.
cleaning clusters....
total clusters:37 now processing 0
...processing 0 of 11
...processing 1 of 11
...processing 2 of 11
...processing 3 of 11
...
...processing 174 of 177
...processing 175 of 177
...processing 176 of 177
flattening protein clusters
prepare section files

Maker is now finished!!!
Start_time: 1454700985
End_time: 1455023070
Elapsed: 322085

But my cluster job didn't finish here. Instead I got the following errors until my runtime limit of 5 days was reached:

Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
SIGTERM received
SIGTERM received
Perl exited with active threads:
        1 running and unjoined
        0 finished and unjoined
        0 running and detached
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
[a0238:09542] *** Process received signal ***
[a0238:09542] Signal: Segmentation fault (11)
[a0238:09542] Signal code: Address not mapped (1)
[a0238:09542] Failing at address: 0xa80
[a0238:09542] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2ba955727710]
[a0238:09542] [ 1] /usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x2ba954715002]
[a0238:09542] [ 2] /lib64/libpthread.so.0(+0xf710)[0x2ba955727710]
[a0238:09542] [ 3] /lib64/libc.so.6(__poll+0x53)[0x2ba955a170d3]
[a0238:09542] [ 4] /cluster/mpi/gcc_4.4.7/OpenMPI-1.8.1/lib/libopen-pal.so.6(+0x6cfca)[0x2ba955fb4fca]
[a0238:09542] [ 5] /cluster/mpi/gcc_4.4.7/OpenMPI-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x2ba955fabf11]
[a0238:09542] [ 6] /cluster/mpi/gcc_4.4.7/OpenMPI-1.8.1/lib/libopen-rte.so.7(+0x376ae)[0x2ba955d076ae]
[a0238:09542] [ 7] /lib64/libpthread.so.0(+0x79d1)[0x2ba95571f9d1]
[a0238:09542] [ 8] /lib64/libc.so.6(clone+0x6d)[0x2ba955a208fd]
[a0238:09542] *** End of error message ***
Argument "ALRM" isn't numeric in exit at /home/fdolze/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm line 2184.
SIGTERM received
SIGTERM received
...

Maybe someone has experienced something similar before, or can give me a hint as to whether this is caused by my setup or by MAKER.

Kind regards,
Florian Dolze
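The timing in this report can be checked with a little arithmetic. The sketch below (Python; the helper `format_elapsed` is a name made up for illustration) converts MAKER's Elapsed value into days and hours and compares it with the wall-clock limit. It assumes the `#BSUB -W 7200` line in the attached submission script is in minutes (7200 minutes is 5 days, which matches the limit described in the message), and it ignores any queueing or start-up time before MAKER wrote Start_time.

```python
def format_elapsed(seconds):
    """Format a number of seconds as days, hours, minutes and seconds."""
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


start_time = 1454700985
end_time = 1455023070
elapsed = end_time - start_time   # 322085, the value MAKER reports as Elapsed
wall_limit = 7200 * 60            # assumed: -W 7200 means 7200 minutes

print(format_elapsed(elapsed))               # 3d 17h 28m 5s
print(format_elapsed(wall_limit - elapsed))  # 1d 6h 31m 55s
```

Under those assumptions, MAKER reported completion after roughly 3.7 days, and the job then sat for a bit more than a day producing the signal messages above before the scheduler's limit ended it.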
-------------- next part --------------
#-----BLAST and Exonerate Statistics Thresholds
blast_type=ncbi+ #set to 'ncbi+', 'ncbi' or 'wublast'
pcov_blastn=0.8 #Blastn Percent Coverage Threhold EST-Genome Alignments
pid_blastn=0.85 #Blastn Percent Identity Threshold EST-Genome Aligments
eval_blastn=1e-10 #Blastn eval cutoff
bit_blastn=40 #Blastn bit cutoff
depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff)
pcov_blastx=0.5 #Blastx Percent Coverage Threhold Protein-Genome Alignments
pid_blastx=0.4 #Blastx Percent Identity Threshold Protein-Genome Aligments
eval_blastx=1e-06 #Blastx eval cutoff
bit_blastx=30 #Blastx bit cutoff
depth_blastx=0 #Blastx depth cutoff (0 to disable cutoff)
pcov_tblastx=0.8 #tBlastx Percent Coverage Threhold alt-EST-Genome Alignments
pid_tblastx=0.85 #tBlastx Percent Identity Threshold alt-EST-Genome Aligments
eval_tblastx=1e-10 #tBlastx eval cutoff
bit_tblastx=40 #tBlastx bit cutoff
depth_tblastx=0 #tBlastx depth cutoff (0 to disable cutoff)
pcov_rm_blastx=0.5 #Blastx Percent Coverage Threhold For Transposable Element Masking
pid_rm_blastx=0.4 #Blastx Percent Identity Threshold For Transposbale Element Masking
eval_rm_blastx=1e-06 #Blastx eval cutoff for transposable element masking
bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking
ep_score_limit=20 #Exonerate protein percent of maximal score threshold
en_score_limit=20 #Exonerate nucleotide percent of maximal score threshold
-------------- next part --------------
#-----Location of Executables Used by MAKER/EVALUATOR
makeblastdb=/cluster/Apps/bioinf/BLAST/2.2.28/bin/makeblastdb #location of NCBI+ makeblastdb executable
blastn=/cluster/Apps/bioinf/BLAST/2.2.28/bin/blastn #location of NCBI+ blastn executable
blastx=/cluster/Apps/bioinf/BLAST/2.2.28/bin/blastx #location of NCBI+ blastx executable
tblastx=/cluster/Apps/bioinf/BLAST/2.2.28/bin/tblastx #location of NCBI+ tblastx executable
formatdb= #location of NCBI formatdb executable
blastall= #location of NCBI blastall executable
xdformat= #location of WUBLAST xdformat executable
blasta= #location of WUBLAST blasta executable
RepeatMasker=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/bin/../exe/RepeatMasker/RepeatMasker #location of RepeatMasker executable
exonerate=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/bin/../exe/exonerate/bin/exonerate #location of exonerate executable

#-----Ab-initio Gene Prediction Algorithms
snap=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/bin/../exe/snap/snap #location of snap executable
gmhmme3=/project/molgen/Maker_additional_tools/genemark-4.32/gmhmme3 #location of eukaryotic genemark executable
gmhmmp= #location of prokaryotic genemark executable
augustus=/project/molgen/Maker_additional_tools/augustus-3.2.1/bin/augustus #location of augustus executable
fgenesh= #location of fgenesh executable
tRNAscan-SE=/project/molgen/Maker_additional_tools/tRNAscan/bin/tRNAscan-SE #location of trnascan executable
snoscan=/project/molgen/Maker_additional_tools/snoscan/bin/snoscan #location of snoscan executable

#-----Other Algorithms
probuild=/project/molgen/Maker_additional_tools/genemark-4.32/probuild #location of probuild executable (required for genemark)
-------------- next part --------------
#-----Genome (these are always required)
genome= /project/molgen/workbench_Florian/riparius_MAKER_v2/Crip_genome_v20_newHead.fa
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=/project/molgen/workbench_Florian/riparius_MAKER_v2/riparius_cDNA_formatedHeader.fa #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/project/molgen/workbench_Florian/riparius_MAKER_v2/uniprot_sprot.fasta #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib=/project/molgen/workbench_Florian/riparius_MAKER_v2/20151208_Custom_Crip_repeat_library_final.fas #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/gpfs/fs1/cluster/Apps/bioinf/maker/2.31.8/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm=/project/molgen/workbench_Florian/riparius_MAKER_v2/cegmasnap.hmm #SNAP HMM file
gmhmm=/project/molgen/workbench_Florian/riparius_MAKER_v2/gmhmm.mod #GeneMark HMM file
augustus_species=Riparius_Neu #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=1 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna=/project/molgen/workbench_Florian/riparius_MAKER_v2/C.thummi_28S_rDNA_gene.fasta #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP=/project/molgen/workbench_Florian/riparius_MAKER_v2/tmp_files #specify a directory other than the system default temporary directory for temporary files
-------------- next part --------------
#!/bin/bash
#BSUB -n 128
#BSUB -q long
#BSUB -W 7200
#BSUB -o mogon_maker_MPIrun_5_feb.log
#BSUB -J riparius_makerMPI
#BSUB -app Reserve1G

mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_MPIrun3 -fix_nucleotides

From hcma at uci.edu  Thu Feb 11 15:32:45 2016
From: hcma at uci.edu (hcma)
Date: Thu, 11 Feb 2016 14:32:45 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu> <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com> <7e4d6f2773f654f8530155936b648832@uci.edu> <7495272C-476A-4985-8D49-30D991410535@gmail.com> <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
Message-ID:

Hi Carlson,

Thanks for sharing.
I assembled the Illumina RNA-seq PE100 reads de novo with Trinity and used that assembly as input to the first MAKER run, to generate a set of genes for training SNAP and Augustus. I am now planning a second (and perhaps final) MAKER run for gene prediction, provided that the SNAP and Augustus results look similar to each other.

I was going to incorporate the GFF result from TopHat into the second MAKER run, along with the Trinity output, but without external protein annotation. I already did a separate BLAST analysis to identify orthologous genes, and I prefer to run MAKER without any protein evidence.

Do you recommend using the output of tophat2gff as input for this second MAKER run?

Thanks again for your time and advice.

Best Regards
Karen

On 2016-02-10 18:32, Carson Holt wrote:
> I find tophat results to be too noisy, and prefer cufflinks. There is
> both a tophat2gff and cufflinks2gff script that comes with MAKER. Also
> consider assembling the reads with Trinity (my overall preferred
> method because it yields the highest specificity).
>
> --Carson

From carsonhh at gmail.com Thu Feb 11 15:36:44 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 11 Feb 2016 15:36:44 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu> <308df3bd49b7098b6f0d36cb2675a0de@uci.edu> <4b6492c5148151cc52c91f2d56c6532b@uci.edu> <54B64F8D-D4A6-4E0F-9693-3AC4055CEA02@gmail.com> <5a40b7af9947dc8297046ba52620569e@uci.edu> <9449FD1E-5D22-4B0B-A61A-644D7B63335B@gmail.com> <7e4d6f2773f654f8530155936b648832@uci.edu> <7495272C-476A-4985-8D49-30D991410535@gmail.com> <7870d65f86546a8b486faf98c1f6fcc0@uci.edu>
Message-ID: <56F1935F-F6BA-4755-92F2-17EE81909619@gmail.com>

Not if you already have Trinity results. It will actually decrease the specificity of the run (i.e., it causes false gene calls because of spurious evidence support).

--Carson

From hcma at uci.edu Thu Feb 11 17:18:43 2016
From: hcma at uci.edu (hcma)
Date: Thu, 11 Feb 2016 16:18:43 -0800
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
Message-ID:

Hi Carson,

I have downloaded Apollo. What format does Apollo take for the SNAP and Augustus models? Do I need to reformat the SNAP .hmm file, and which Augustus output should I use if I train Augustus manually?

Thanks again for your time.

Best Regards
Karen

On 2016-02-05 06:36, Carson Holt wrote:
> Hi Karen,
>
> There are many ways to train Augustus. I prefer to identify gene
> models in MAKER (GFF3) and use those to train both SNAP and Augustus.
> Here is a previous post on the topic ->
> https://groups.google.com/forum/#!searchin/maker-devel/augustus/maker-devel/FWMSTdqWQqI/lC3miQtiCpwJ
>
> In the end you need to look at the SNAP and Augustus models together
> with evidence alignments in a genome browser (like desktop Apollo).
> When everything is trained well, both SNAP and Augustus models will
> look like each other and both seem to look like the evidence
> alignments.
>
> Thanks,
> Carson

From panos.ioannidis at gmail.com Fri Feb 12 01:35:49 2016
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Fri, 12 Feb 2016 09:35:49 +0100
Subject: [maker-devel] GFF features from Maker
Message-ID:

Hi guys,

I have a few questions regarding annotated features in the GFF file built by Maker.

1) I'm a bit confused about the annotations coming from "est2genome" and "blastn", because they both give "expressed_sequence_match" features.
So, what's the difference between them? How do the EST matches from est2genome differ from those from blastn?

2) Same goes for "protein2genome" and "blastx", since they both give "protein_match" features.

3) Last, what is the difference between the partial matches and full-length matches? For example, in almost all cases where est2genome gives an "expressed_sequence_match" feature for a genomic area, it also gives a "match_part" feature for sub-areas within this area. What is the meaning of this? I'm pasting one such area below.

scaffold3|size1771164 est2genome expressed_sequence_match 21953 22276 949 + . ID=scaffold3|size1771164:hit:1901:3.2.0.0;Name=C24476_a_3_0_l_241
scaffold3|size1771164 est2genome match_part 21953 22035 949 + . ID=scaffold3|size1771164:hsp:1902:3.2.0.0;Parent=scaffold3|size1771164:hit:1901:3.2.0.0;Target=C24476_a_3_0_l_241 1 83 +;Gap=M83
scaffold3|size1771164 est2genome match_part 22148 22276 949 + . ID=scaffold3|size1771164:hsp:1903:3.2.0.0;Parent=scaffold3|size1771164:hit:1901:3.2.0.0;Target=C24476_a_3_0_l_241 84 215 +;Gap=M104 D2 M7 I4 M8 I1 M8

Thanks,
Panos

From carsonhh at gmail.com Fri Feb 12 07:48:46 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Feb 2016 07:48:46 -0700
Subject: [maker-devel] GFF features from Maker
In-Reply-To:
References:
Message-ID: <806D9F3C-13AF-4EDE-ACA8-DA981255E5DD@gmail.com>

Hi Panos,

Terms used are governed by the Sequence Ontology (http://www.sequenceontology.org), and specific definitions can be found there. Terms have a Parent/Child relationship, with lower levels being more specific than higher levels.

The match feature is used for ab initio reference results rather than the potentially better term predicted_gene because match is already handled correctly by most software, and most databases like FlyBase already use it for that purpose (in part because predicted_gene was a latecomer to the ontology list, and it is used more often to distinguish accepted models without human curation rather than reference predictions). Since match is an experimental_feature, it matches the expected separation between genes (biological_region) and analysis results (experimental_feature).

It's rather boring and technical, but it's all the result of careful selection using the Sequence Ontology inheritance levels and term definitions. Example in attached image.

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SO-0000102.png
Type: image/png
Size: 7720 bytes
Desc: not available
URL:

From carsonhh at gmail.com Fri Feb 12 07:56:41 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Feb 2016 07:56:41 -0700
Subject: [maker-devel] GFF features from Maker
In-Reply-To:
References:
Message-ID: <1B5D7E98-850C-4D16-A5C1-5BE1EB5B8735@gmail.com>

Also, BLAST vs. Exonerate is an algorithmic difference. BLAST aligns using traditional Smith-Waterman-style local alignment, resulting in potentially out-of-order sub-alignments called HSPs. Exonerate does splice-aware alignments (in order and correctly trimmed for splice sites). More info on polishing alignments is on the wiki page here -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Polishing_Evidence_Alignments

--Carson

From panos.ioannidis at gmail.com Fri Feb 12 07:59:05 2016
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Fri, 12 Feb 2016 15:59:05 +0100
Subject: [maker-devel] GFF features from Maker
In-Reply-To: <1B5D7E98-850C-4D16-A5C1-5BE1EB5B8735@gmail.com>
References: <1B5D7E98-850C-4D16-A5C1-5BE1EB5B8735@gmail.com>
Message-ID:

Thanks for all the info Carson!

Panos
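As an aside on the hit/HSP structure discussed in this thread: in the pasted GFF3, each match_part line points back to its expressed_sequence_match through the Parent attribute, so the sub-alignments can be grouped under the hit they belong to. Below is a minimal Python sketch of that grouping. It assumes tab-separated GFF3 columns (the archive shows them with spaces), and the helper names parse_attributes and group_match_parts are made up for this example; they are not part of MAKER.

```python
from collections import defaultdict

SEQ = "scaffold3|size1771164"
HIT = SEQ + ":hit:1901:3.2.0.0"

# The three est2genome records from the question, rebuilt with tab-separated columns.
GFF3_LINES = [
    "\t".join([SEQ, "est2genome", "expressed_sequence_match", "21953", "22276",
               "949", "+", ".", "ID=" + HIT + ";Name=C24476_a_3_0_l_241"]),
    "\t".join([SEQ, "est2genome", "match_part", "21953", "22035", "949", "+", ".",
               "ID=" + SEQ + ":hsp:1902:3.2.0.0;Parent=" + HIT
               + ";Target=C24476_a_3_0_l_241 1 83 +;Gap=M83"]),
    "\t".join([SEQ, "est2genome", "match_part", "22148", "22276", "949", "+", ".",
               "ID=" + SEQ + ":hsp:1903:3.2.0.0;Parent=" + HIT
               + ";Target=C24476_a_3_0_l_241 84 215 +;Gap=M104 D2 M7 I4 M8 I1 M8"]),
]


def parse_attributes(column9):
    """Split the GFF3 attribute column into a dict (no URL-unescaping is done)."""
    attrs = {}
    for field in column9.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def group_match_parts(lines):
    """Map each parent hit ID to the sorted (start, end) spans of its match_part children."""
    parts = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "match_part":
            continue  # only the sub-alignment rows carry a Parent we care about
        attrs = parse_attributes(cols[8])
        parts[attrs["Parent"]].append((int(cols[3]), int(cols[4])))
    return {parent: sorted(spans) for parent, spans in parts.items()}


if __name__ == "__main__":
    for parent, spans in group_match_parts(GFF3_LINES).items():
        print(parent, spans)
```

Read this way, the two match_part spans (21953-22035 and 22148-22276) are the aligned blocks of a single EST hit, and the stretch between them is presumably an intron in the spliced alignment.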
From carsonhh at gmail.com Fri Feb 12 12:14:16 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Feb 2016 12:14:16 -0700
Subject: [maker-devel] Q on MAKER
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> <99f6989955acdf6fd6b0875affbeefa9@uci.edu>
Message-ID:

You need to view the output the programs produce, not the HMM. You can run them through MAKER and then view the GFF3 files produced. Here is a MAKER tutorial where this is done that you can follow along with if you wish -> http://gmod.org/wiki/MAKER_Tutorial_2013#Training_ab_initio_Gene_Predictors

For Augustus training, there are a number of threads on how to do that in the MAKER mailing list archives -> https://groups.google.com/forum/#!searchin/maker-devel/augustus

There are also other resources online -> http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html

--Carson

From fdolze at students.uni-mainz.de Tue Feb 16 03:10:03 2016
From: fdolze at students.uni-mainz.de (Florian)
Date: Tue, 16 Feb 2016 11:10:03 +0100
Subject: [maker-devel] Estimated runtime on 180mb genome @ 128 cores?
In-Reply-To: <56BC65E7.6000904@students.uni-mainz.de>
References: <56BC65E7.6000904@students.uni-mainz.de>
Message-ID: <56C2F57B.8020208@students.uni-mainz.de>

Hi all,

I am trying to run MAKER on a project of mine, and since this is the first time I am using MAKER, I'd like to ask some more experienced users what I can expect in terms of MAKER's resource consumption and runtime.

My genome data is:

* 180.652.019 bp genome length
* 5.292 Scaffolds
* 34.136 bp median scaffold length
* 2.056.324 bp longest
* 272.065 bp N50

- I use a 73mb transcriptome assembly as EST Evidence
- SwissProt as Protein Homology Evidence
- 60kb custom repeat library for RepeatMasker

For gene prediction I am running with a SNAP hmm I generated using CEGMA, GeneMark, and Augustus trained by their webservice.
I have options est2genome and protein2genome turned on (=1) and use tRNAscan and snoscan. And other options as following:

#-----MAKER Behavior Options
max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage) <--- Is this reasonable?
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

The maker_bopts.ctl file is unchanged.

(Basically I follow this guide https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md)

At the moment I am running this with openMPI as:

mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_run1 -fix_nucleotides

on 128 cores with 130GB of memory.

First of all, are the options I use viable?

Is it possible to guesstimate the runtime I can expect? 5 days? 20 days? And is it reasonable to use additional cores, or will this not benefit much?

Thanks for your insights,
Florian

From dence at genetics.utah.edu Tue Feb 16 09:42:51 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 16 Feb 2016 16:42:51 +0000
Subject: [maker-devel] Estimated runtime on 180mb genome @ 128 cores?
In-Reply-To: <56C2F57B.8020208@students.uni-mainz.de>
References: <56BC65E7.6000904@students.uni-mainz.de> <56C2F57B.8020208@students.uni-mainz.de>
Message-ID:

Hi Florian,

I don't think you want est2genome or protein2genome turned on for this run. est2genome is usually only used if you don't have any ab initio predictors trained; protein2genome should only be used if you have good reason not to expect any introns at all (for example, a prokaryotic genome).

Also, you set the max_dna_len parameter to 2.1 Mbp, which is larger than your N50. Setting this too large prevents MAKER from speeding up its analysis by splitting contigs/scaffolds across multiple processors. There's usually no reason to change this from the default setting.

With a good N50 like you have, you'll probably get good results.
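
The two points above can be checked mechanically before a long run is launched. What follows is only a minimal sketch, assuming the control file keeps the "key=value #comment" layout shown in the quoted options; the helper names ctl_value and check_ctl are hypothetical and not part of MAKER, and the acceptable max_dna_len limit is left to the caller.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAKER): print the value of one option
# from a control file that uses "key=value #comment" lines.
ctl_value() {
    # $1 = option name, $2 = control file
    sed -n "s/^$1=\([^#]*\).*/\1/p" "$2" | sed 's/[[:space:]]*$//' | head -n 1
}

# Hypothetical helper: warn about the settings discussed in this reply.
check_ctl() {
    # $1 = control file, $2 = largest max_dna_len you consider acceptable
    for opt in est2genome protein2genome; do
        if [ "$(ctl_value "$opt" "$1")" = "1" ]; then
            echo "WARN: $opt=1"
        fi
    done
    len=$(ctl_value max_dna_len "$1")
    # Only compare when the option is actually present in the file.
    if [ -n "$len" ] && [ "$len" -gt "$2" ]; then
        echo "WARN: max_dna_len=$len exceeds $2"
    fi
}
```

For example, check_ctl maker_opts.ctl 272065 would compare max_dna_len against the N50 reported in the original message; which limit makes sense is a judgment call, and the script only reports settings without changing them.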

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Tue Feb 16 09:53:55 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 Feb 2016 09:53:55 -0700
Subject: [maker-devel] Estimated runtime on 180mb genome @ 128 cores?
In-Reply-To:
References: <56BC65E7.6000904@students.uni-mainz.de> <56C2F57B.8020208@students.uni-mainz.de>
Message-ID:

Agree. 500,000 is about the highest you ever want to go with max_dna_len. Increasing the value decreases parallelization and increases memory usage. The only biological reason to ever increase it is if genes are really long and don't fit into windows of this size.

Also test out the mpiexec command with something like 'hostname' to make sure it works. Example ->

mpiexec -mca btl ^openib -n 128 hostname

This should print out 128 lines identifying all hosts in the communication ring. If it prints out the same host ID every time, then there is a problem and you may need to provide a hostfile to let mpiexec know all the hosts it can run across.

--Carson

From alejocn5 at gmail.com Tue Feb 16 14:17:40 2016
From: alejocn5 at gmail.com (Alejandro Cerón Noriega)
Date: Tue, 16 Feb 2016 16:17:40 -0500
Subject: [maker-devel] problem with the example
Message-ID:

Hello, I am Alejandro.

I have tried to follow the MAKER tutorial:

1- I copied the files in the data directories to a temporary directory where I run an example file.
2- I typed maker -CTL to generate generic MAKER control files (foto_1)
3- I edited the control files to include the path of the genome file (hsap_contig.fasta from the example) (foto_2)

Then I give the path: maker maker_exe.ctl maker_opts.ctl maker_bopts.ctl (foto 3)

That generates the expected folder
hsap_contig.maker.output

But when I want to look for the gff file I don't find it. Inside /data/hsap_contig.maker.output/hsap_contig_datastore, I don't find all the subdirectories:

* seq_name.gff - a gff file that can be loaded into GMOD, GBROWSE, or Apollo
* seq_name.maker.transcripts.fasta - a fasta file of the MAKER annotated transcript sequences
* seq_name.maker.proteins.fasta - a fasta file of the MAKER annotated protein sequences
* seq_name.maker.XXX.transcript.fasta - a fasta file of ab-initio predicted transcript sequences from program XXX
* seq_name.maker.XXX.proteins.fasta - a fasta file of ab-initio predicted protein sequences from program XXX
* seq_name.maker.non_overlapping_ab_initio.transcripts.fasta - a fasta file of filtered ab-initio transcript sequences that don't overlap maker
annotations
* seq_name.maker.non_overlapping_ab_initio.proteins.fasta - a fasta file of filtered ab-initio protein sequences that don't overlap maker annotations
* theVoid.seq_name/ - a directory containing all of the raw output files produced by MAKER, including BLAST reports, SNAP output, exonerate output and the masked genomic sequence.

I only find a directory named 80 (foto 4).

I don't know if I did something wrong.

I also tried to change the path of the EST (foto_5).

Thanks for your attention

--
Alejandro Cerón Noriega, B.Sc
MSc. Candidate Bioinformatics

-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_1.png
Type: image/png
Size: 67330 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_2.png
Type: image/png
Size: 257578 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Foto_3.png
Type: image/png
Size: 213241 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_4.png
Type: image/png
Size: 129352 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: foto_5.png
Type: image/png
Size: 255944 bytes

From carsonhh at gmail.com Thu Feb 18 12:36:13 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 18 Feb 2016 12:36:13 -0700
Subject: [maker-devel] problem with the example
In-Reply-To:
References:
Message-ID: <4CD9B36B-8C9D-4E48-B1B6-ACAFF28DF3B2@gmail.com>

To access files for individual sequences, use the datastore index:

/scratchsan/caceronn/Results/MAKER/data/hsap_contig.maker.output/hsap_contig_master_datastore_index.log

Look in that file to find the location of individual contig results.
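
Looking up a contig in the index can be sketched as a one-line filter. This is a minimal sketch under a stated assumption: that each line of the index log holds three tab-separated columns (sequence name, result directory, status) and that completed contigs carry the status FINISHED. The function name finished_dirs is hypothetical and not a MAKER script.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAKER): list the result directory of
# every sequence marked FINISHED in a master datastore index log.
# Assumes three tab-separated columns: sequence name, directory, status.
finished_dirs() {
    # $1 = path to the *_master_datastore_index.log file
    awk -F '\t' '$3 == "FINISHED" { print $1 "\t" $2 }' "$1"
}
```

Called as finished_dirs hsap_contig_master_datastore_index.log, it prints one line per finished contig; the listed directories are presumably relative to the hsap_contig.maker.output folder, which would explain why only hashed subdirectories such as 80 are visible at the top of the datastore.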

For merged results you have to use the gff3_merge script together with the datastore index.

Here is a nice tutorial with step-by-step instructions and a video to easily follow along ->
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014

--Carson

From fdolze at students.uni-mainz.de Fri Feb 26 07:16:10 2016
From: fdolze at students.uni-mainz.de (Florian)
Date: Fri, 26 Feb 2016 15:16:10 +0100
Subject: [maker-devel] Possible to redirect maker output?
Message-ID: <56D05E2A.1040201@students.uni-mainz.de>

Hi all,

I am trying to run maker on a cluster (2 nodes with 64 cores each). To speed things up I copied all input files to a ramdisk to reduce I/O time, but all subsequent results are still written to hdd.

Is there a way I can tell maker to write the maker.results files to the ramdisk (or generally any directory other than the current working dir) too? (Are they actually used for the current run, or are only files in the temp files location used?)

Is anybody experienced with running maker on a similar setup, and could you tell me how you are handling this?

thanks,
Florian

From scott at scottcain.net Fri Feb 26 10:50:06 2016
From: scott at scottcain.net (Scott Cain)
Date: Fri, 26 Feb 2016 12:50:06 -0500
Subject: [maker-devel] GMOD 2016 meeting
Message-ID:

Hello all,

I am pleased to announce that details have been finalized for the 2016 GMOD meeting. It will take place immediately following the Galaxy Community Conference at Indiana University in Bloomington, IN on June 30 and July 1. We're still working on agenda details, so if you have suggestions or would like to present, please let me know.

For registration information, please see:

https://gmod2016.eventbrite.com

And for other information about the meeting, keep an eye on:

http://gmod.org/wiki/Jun_2016_GMOD_Meeting

I look forward to seeing you there!

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From gloriarendon at gmail.com Fri Feb 26 15:14:26 2016
From: gloriarendon at gmail.com (Gloria Rendon)
Date: Fri, 26 Feb 2016 16:14:26 -0600
Subject: [maker-devel] MAKER/3.00.0-beta: missing some accessory scripts
Message-ID:

Hello,

My name is Gloria Rendon. I work at the Carl Woese Institute for Genomic Biology at the University of Illinois at Urbana-Champaign.

In recent months we used MAKER/3.00.0-beta to generate annotations (in GFF3 format) for a de novo assembly that we produced in-house for the Taro plant.

As part of the same project, I now need to run an analysis with RNA-seq data for the same Taro species.
I am going to use STAR for the alignment step, and I need to provide the annotations file in GTF format, not in GFF3 format as I currently have.

In order to perform the GFF3->GTF conversion I was planning to run some of the accessory scripts that come with MAKER:

add_utr_start_stop_gff
gff3_2_gtf

However, I just noticed that my installation of MAKER is missing those two scripts.
This is what the MAKER/bin folder looks like now:

$ ls /home/groups/hpcbio/apps/maker/maker-3.00.0-beta/bin/
AED_cdf_generator.pl   ipr_update_gff          maker_map_ids
cegma2zff              iprscan2gff3            map2assembly
chado2gff3             maker                   map_data_ids
compare_gff3_to_chado  maker2chado             map_fasta_ids
cufflinks2gff3         maker2eval_gtf          map_gff_ids
evaluator              maker2jbrowse           match2gene.pl
fasta_merge            maker2wap               quality_filter.pl
fasta_tool             maker2zff               tophat2gff3
genemark_gtf2gff3      maker_functional_fasta
gff3_merge             maker_functional_gff

By the way, earlier versions of MAKER that are also installed on our cluster are also missing those scripts.

Could you please tell me how to remedy the situation?
Do you have executables of the two scripts that you can share with me?
OR
Do I need to re-install MAKER with special configuration options?

Thank you very much for your attention to this matter.

Sincerely,

Gloria

From carsonhh at gmail.com Mon Feb 29 12:09:14 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 29 Feb 2016 12:09:14 -0700
Subject: [maker-devel] Possible to redirect maker output?
In-Reply-To: <56D05E2A.1040201@students.uni-mainz.de>
References: <56D05E2A.1040201@students.uni-mainz.de>
Message-ID: <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com>

You can try setting TMP= in the control files to a RAM disk location (you will need a lot of RAM though, perhaps 500 GB). Even then, some components used by MAKER may not function properly with tmpfs, but you can try. If it doesn't work you'll get an error.

The main output directory, on the other hand, must be globally accessible to all nodes if working with MPI, and a RAM disk will only exist and be accessible on a single node (even though a directory with the same name, e.g. /dev/shm, may exist on multiple nodes, the copies will actually be separate and distinct locations).
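
Setting TMP= by script could look like the following. It is a minimal sketch, assuming the control file contains a line beginning with "TMP=" followed by an optional "#comment"; the function name set_ctl_tmp and the example path /dev/shm/maker_tmp are hypothetical, and the directory is assumed not to contain the characters | or &.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAKER): point TMP= in a control file at
# a given directory. The edited file goes to stdout, so the original
# control file is left untouched.
set_ctl_tmp() {
    # $1 = control file, $2 = temporary directory (e.g. a tmpfs mount)
    sed "s|^TMP=[^#]*|TMP=$2 |" "$1"
}
```

For example, set_ctl_tmp maker_opts.ctl /dev/shm/maker_tmp > maker_opts.ramdisk.ctl writes a modified copy to review before use. Only the temporary files move; as noted above, the main output directory still has to sit on storage that every node can reach.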

--Carson

From carsonhh at gmail.com Mon Feb 29 12:17:29 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 29 Feb 2016 12:17:29 -0700
Subject: [maker-devel] MAKER/3.00.0-beta: missing some accessory scripts
In-Reply-To:
References:
Message-ID:

You should be using maker2eval_gtf. The scripts you mention were actually deprecated in MAKER 2.10 onwards (about 5 years ago). You may be looking at old documentation.

--Carson