From FeatherstonJ at arc.agric.za Tue Nov 1 10:12:46 2016
From: FeatherstonJ at arc.agric.za (Jonathan Featherston)
Date: Tue, 1 Nov 2016 15:12:46 +0000
Subject: [maker-devel] [Caution: Message contains Redirect URL content] InterProScan protein domain & AED physical evidence filtering
In-Reply-To:
References:
Message-ID: <0C2463EA-53FD-4C9B-853A-BE933973E1FA@arc.agric.za>

Dear Allison

I'm not sure about your extra gene models, but here is the script to perform quality filtering. It is a Perl script I got from the forum somewhere (changed to .txt in case it gets removed by the mail server).

Regards
Jonathan

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: quality_filter.txt
URL:

From carsonhh at gmail.com Tue Nov 1 10:43:21 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 1 Nov 2016 09:43:21 -0600
Subject: [maker-devel] [Caution: Message contains Redirect URL content] InterProScan protein domain & AED physical evidence filtering
In-Reply-To: <0C2463EA-53FD-4C9B-853A-BE933973E1FA@arc.agric.za>
References: <0C2463EA-53FD-4C9B-853A-BE933973E1FA@arc.agric.za>
Message-ID:

One note I'd like to make is that doing a second round with keep_preds=1 is the wrong procedure (only do that if you really want to keep everything - i.e. in some fungi or oomycetes). Rather, you should use InterProScan to evaluate the rejected models in the non-overlapping.abinit.proteins.fasta file, then grep the ones that have an IPR domain out of the GFF3 (they will be match/match_part features), and then pass them to pred_gff in a separate run (this just updates the format to gene/mRNA/exon/CDS with proper reading frame). You can then merge the resulting GFF3 and FASTA files.
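The selection step described above (keep only the rejected ab initio models that carry an InterPro domain, then pull their match/match_part features out of the GFF3) could be sketched in Python roughly as follows. This is only an illustration under assumptions the thread does not state: InterProScan was run with TSV output, where column 1 is the protein ID and column 12, when present, holds the InterPro accession (IPR...); the protein IDs in the FASTA file equal the Name (or ID) attribute of the corresponding match feature; and the function names are invented for this example.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().strip(";").split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def ids_with_ipr(tsv_lines):
    """Return protein IDs with at least one InterPro accession.

    Assumes InterProScan TSV output: column 1 = protein ID,
    column 12 = InterPro accession (may be missing or '-').
    """
    keep = set()
    for line in tsv_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 12 and cols[11].startswith("IPR"):
            keep.add(cols[0])
    return keep


def filter_gff3(gff_lines, keep_names):
    """Keep match features named in keep_names plus their match_part children."""
    rows = [l.rstrip("\n") for l in gff_lines
            if l.strip() and not l.startswith("#")]
    # First pass: collect the IDs of the match features to keep.
    kept_ids = set()
    for row in rows:
        cols = row.split("\t")
        if len(cols) == 9 and cols[2] == "match":
            attrs = parse_attributes(cols[8])
            feature_id = attrs.get("ID")
            if feature_id and (attrs.get("Name") in keep_names
                               or feature_id in keep_names):
                kept_ids.add(feature_id)
    # Second pass: emit kept match lines and their match_part lines.
    kept_rows = []
    for row in rows:
        cols = row.split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] == "match" and attrs.get("ID") in kept_ids:
            kept_rows.append(row)
        elif cols[2] == "match_part" and attrs.get("Parent") in kept_ids:
            kept_rows.append(row)
    return kept_rows
```

The returned lines would be written to a new GFF3 file and supplied through pred_gff in the separate run; how the IDs line up in a given MAKER output should be checked by hand before trusting the filter.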
The reason there are differences between the runs is that there are models with AED less than 1 that get rejected for other reasons and are brought back with keep_preds=1. For example, if the only evidence is a protein alignment that has deep overlapping HSPs (an extremely low complexity alignment), it will be filtered out even though AED is not technically equal to 1. Also, if the overlapping protein evidence is in a different reading frame than the model it is supposed to support, then the AED will be less than 1 but eAED (extended AED) will be 1, and the model will be rejected.

--Carson

>> Hello MAKER google group,
>>
>> For the final round of a MAKER annotation for a de novo plant genome assembly, I ran MAKER twice: once with keep_preds=0, which annotated 20,284 genes, and once with keep_preds=1, which annotated 34,055 genes.
>>
>> I ran the 34,055 genes (the keep_preds=1 set) through InterProScan to search the MAKER predictions for protein domain content and added this IPRScan output into the MAKER gff file with the ipr_update_gff accessory script.
>>
>> The game plan is to go through the 34,055 genes and remove any gene model that doesn't have either protein domain content or physical evidence. I am counting genes that have an AED=1 as the genes that don't have physical evidence.
>>
>> I have two questions:
>>
>> 1. I count 11,762 genes that have AED=1.0 in the keep_preds=1 annotation set, which leaves me with 22,293 genes that I'm assuming have some physical evidence (34,055-11,762=22,293). But when I ran MAKER with keep_preds=0 originally, I only count 20,284 genes. What are the extra ~2,000 genes that are being annotated in the keep_preds=1 run that have an AED score of less than 1.0, but are not being annotated in the keep_preds=0 run?
>>
>> 2. My second question is whether there is an accessory script available that will remove genes that lack both the IPRScan protein domains and physical evidence (AED < 1)? This type of gene removal was mentioned in a previous post from 2012 (https://groups.google.com/forum/#!searchin/maker-devel/sorry$20there$27s$20not$20a$20script$20prepackaged$20with$20MAKER$20for$20that$20yet.%7Csort:relevance/maker-devel/VaoXWlGHOjs/EElr_otrK8QJ ) and I was just wondering if someone has since written a script that will do this for me.
>>
>> If anyone could offer me any feedback, that would be greatly appreciated!
>>
>> Thank you,
>>
>> Allison
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From jacques.dainat at bils.se Tue Nov 1 11:08:45 2016
From: jacques.dainat at bils.se (Jacques Dainat)
Date: Tue, 1 Nov 2016 17:08:45 +0100
Subject: [maker-devel] est_gff input does not provide any gene model
In-Reply-To:
References:
Message-ID: <29E6299A-EA5F-4768-88CD-202ABB05AF89@bils.se>

Thank you for the quick confirmation!

Just for clarification, what I provided to MAKER was a correct GFF3 file that does contain gene, mRNA and exon types but does not contain any CDS.

I haven't seen any information about the particular GFF3 feature types expected for the est_gff files supplied. I think you should communicate more about it (within the maker_opt.ctl?).
It would be nice to stop the pipeline if the file provided contains no information (and also when the file provided doesn't exist; the warning is not obvious to catch when launching on a cluster...).

A last question: are the scores from the score column of the est_gff file used by MAKER?

Jacques

> On 01 Nov 2016, at 04:24, Carson Holt wrote:
>
> Evidence such as est_gff has to follow the alignment format used by GFF3 (i.e. match/match_part) whereas you are providing gene models (i.e. gene/mRNA/exon/CDS).
> Note that match/match_part are two-level features whereas gene models have three levels. You need to reformat to match/match_part.
>
> --Carson
>
>> On Oct 31, 2016, at 4:51 AM, Jacques Dainat wrote:
>>
>> Hello,
>>
>> I usually use Cufflinks output to feed MAKER through the est_gff parameter; combined with the est2genome=1 parameter, I get the wanted output.
>> This time I used StringTie output to feed MAKER, but I don't get any gene model predicted using the est2genome parameter.
>>
>> Any explanation? Is it due to the GFF3 format differences between these two files?
>>
>> Cufflinks output example:
>> Pnalgiovense_4592 Cufflinks match 363 977 17.844829 - . ID=1:s3_c1_r1.4.2;Name=1:s3_c1_r1.4.2;
>> Pnalgiovense_4592 Cufflinks match_part 363 666 17.844829 - . ID=1:s3_c1_r1.4.2:exon-1;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 1 304 +;
>> Pnalgiovense_4592 Cufflinks match_part 743 977 17.844829 - . ID=1:s3_c1_r1.4.2:exon-2;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 305 539 +;
>>
>> StringTie output example:
>> Pnalgiovense_112 StringTie gene 20 1256 1000 + . ID=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1
>> Pnalgiovense_112 StringTie mRNA 20 1256 1000 + . ID=HtMm_All.12253.1;Parent=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1
>> Pnalgiovense_112 StringTie exon 20 1256 1000 + . ID=HtMm_All.12253.1-exon-1;Parent=HtMm_All.12253.1;cov=8.028295;exon_number=1;gene_id=HtMm_All.12253;transcript_id=HtMm_All.12253.1
>>
>> If it's the StringTie output that is problematic, how can I fix it? Is removing gene, changing mRNA to match and exon to match_part enough?
>>
>> Best regards,
>>
>> Jacques Dainat, PhD
>> NBIS (National Bioinformatics Infrastructure Sweden)
>> Genome Annotation Service
>>
>> Address: (room E10:4204 - last floor)
>> Uppsala University, BMC
>> Department of Medical Biochemistry Microbiology, Genomics
>> Husargatan 3, box 582
>> S-75123 Uppsala Sweden
>> Phone: 01 84 71 46 25

From carsonhh at gmail.com Tue Nov 1 11:25:36 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 1 Nov 2016 10:25:36 -0600
Subject: [maker-devel] est_gff input does not provide any gene model
In-Reply-To: <29E6299A-EA5F-4768-88CD-202ABB05AF89@bils.se>
References: <29E6299A-EA5F-4768-88CD-202ABB05AF89@bils.se>
Message-ID: <923C15DF-D705-416C-BCB8-CB87F1309797@gmail.com>

The score will be ignored. The format to be used for evidence alignments is specified in the GFF3 spec (https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md ). An EST alignment example is also given as part of the GFF3 spec.

--Carson
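The reformatting asked about in this thread (drop the gene level, turn mRNA into match and exon into match_part) could be sketched in Python as below. This is a minimal illustration, not a MAKER accessory script: the function names are made up, tab-separated GFF3 input is assumed, and the thread does not confirm whether further attributes such as Target are needed, so the result should be checked against the EST alignment example in the GFF3 spec.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().strip(";").split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def to_match_format(gff_lines):
    """Convert gene/mRNA/exon lines to two-level match/match_part lines.

    gene lines (and any other feature types) are dropped; comment lines
    are passed through unchanged.
    """
    converted = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            converted.append(line)
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        attrs = parse_attributes(cols[8])
        if cols[2] in ("mRNA", "transcript"):
            # The transcript becomes the top-level match feature, so its
            # Parent (the gene) is no longer referenced.
            cols[2] = "match"
            cols[8] = "ID={0};Name={0}".format(attrs["ID"])
        elif cols[2] == "exon":
            cols[2] = "match_part"
            parts = []
            if "ID" in attrs:
                parts.append("ID=" + attrs["ID"])
            parts.append("Parent=" + attrs["Parent"])
            cols[8] = ";".join(parts)
        else:
            continue
        converted.append("\t".join(cols))
    return converted
```

Since the score column is ignored, the sketch leaves it untouched rather than recomputing anything.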
From mohamed.amine.chebbi at univ-poitiers.fr Wed Nov 2 13:09:54 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (Mohamed Amine Chebbi)
Date: Wed, 2 Nov 2016 19:09:54 +0100 (CET)
Subject: [maker-devel] ProtExcluder1.2 Error
Message-ID: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>

Hi!

I am working on creating a custom repeat library and I want to use ProtExcluder1.2 to trim potential genes from my repeat sequences.
My BLAST version is BLAST 2.2.30+.

I get this error message:

Can not open the seqfile test.lib_blast_results.txt.fnolowm50seq
mergeunmatchedregion.pl seqfile
Illegal division by zero at ProtExcluder1.2/GCcontent.pl line 122.

I wonder if you can help me to fix this.

Thank you.

Amine

From michael.s.campbell1 at gmail.com Thu Nov 3 12:57:35 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Thu, 3 Nov 2016 13:57:35 -0400
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>
Message-ID:

Hi Amine,

That script is maintained by Ning Jiang and Kevin Childs.
They know best what this script is expecting. I've cc'd them on this email in the hope that they can provide some direction.

Thanks,
Mike

From psh65 at cornell.edu Thu Nov 3 13:14:17 2016
From: psh65 at cornell.edu (Prashant S Hosmani)
Date: Thu, 3 Nov 2016 18:14:17 +0000
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To:
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>
Message-ID:

Hi Amine,

I was getting a similar error. You need to be careful with the BLAST versions. Try using the same BLAST version for makeblastdb. I was using BLAST 2.2.29+. After recreating the BLAST database with the same version, it worked for me.

Hope this helps.
Prashant

Prashant Hosmani
Sol Genomics Network
Boyce Thompson Institute, Ithaca, NY, USA

From scott at scottcain.net Fri Nov 4 14:25:02 2016
From: scott at scottcain.net (Scott Cain)
Date: Fri, 4 Nov 2016 15:25:02 -0400
Subject: [maker-devel] Last Call for GMOD talks at PAG
Message-ID:

Time is short! If you want to attend PAG and would like to present on a topic that would be of interest to the GMOD community, please send an abstract or at least a descriptive title to help at gmod.org. Types of talks typically include updates on GMOD software projects, usage stories for successful sites, proposals for new GMOD projects and descriptions of plugins for existing GMOD software projects like Tripal, JBrowse and Galaxy. Please consider giving a talk and sharing your experience and ideas!

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
From mohamed.amine.chebbi at univ-poitiers.fr Thu Nov 3 18:40:18 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (chebbi mohamed amine)
Date: Fri, 4 Nov 2016 00:40:18 +0100 (CET)
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <20161103185405.183337t1yq0no6x9@mail.msu.edu>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr> <20161103185405.183337t1yq0no6x9@mail.msu.edu>
Message-ID: <1641376945.6912938.1478216418712.JavaMail.zimbra@univ-poitiers.fr>

Hi!

Thank you Prashant for sharing your experience. Indeed, using the same BLAST version (2.2.29) for makeblastdb seems to resolve the problem. It appears to work fine for all the sequences except one, for which I get the message below:

Fatal exception (source file ../../easel/esl_sqio_ascii.c, line 2001):
Failed to fetch subsequence residues -- corrupt coords?
sh: line 1: 46520 Aborted (core dumped) /hmmer-3.1b2-linux-intel-x86_64/binaries/esl-sfetch -c 1242..19031 all-te.lib rnd-4_family-1731#DNA >> blastx_results-all-te.txt.fnolowm50seq

Did you encounter this problem before?

Thank you for your help.

Amine

From: jiangn at msu.edu
To: "Prashant S Hosmani"
Cc: "Michael Campbell", "Mohamed Amine Chebbi"
Sent: Thursday 3 November 2016 23:54:05
Subject: Re: [maker-devel] ProtExcluder1.2 Error

Hi Prashant,

Thank you so much for sharing your experience. It is important to keep everything in the same version. I will remind users about this when we update it and I may need to bother you then.

Best regards,

Ning
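The version mismatch discussed in this thread (a database built by one BLAST+ release and searched with another) can be caught before running ProtExcluder by comparing the version strings of the two programs. The Python sketch below only parses and compares such strings; it assumes, without confirmation from the thread, that `blastx -version` and `makeblastdb -version` print a first line of the form `blastx: 2.2.29+`, and the function names are invented for the example.

```python
import re


def parse_blast_version(text):
    """Extract (major, minor, patch) from text such as 'blastx: 2.2.29+'.

    Returns None when no version number is found.
    """
    match = re.search(r"\b(\d+)\.(\d+)\.(\d+)\+?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def versions_match(search_output, makeblastdb_output):
    """True only if both outputs carry the same, parseable version."""
    search_version = parse_blast_version(search_output)
    db_version = parse_blast_version(makeblastdb_output)
    return search_version is not None and search_version == db_version
```

In practice the two strings would come from capturing the output of the installed programs, for example with `subprocess.run([...], capture_output=True, text=True)`.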
From mohamed.amine.chebbi at univ-poitiers.fr Fri Nov 4 05:44:02 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (chebbi mohamed amine)
Date: Fri, 4 Nov 2016 11:44:02 +0100 (CET)
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <1827195032.6913929.1478217820889.JavaMail.zimbra@univ-poitiers.fr>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr> <20161103185405.183337t1yq0no6x9@mail.msu.edu> <1641376945.6912938.1478216418712.JavaMail.zimbra@univ-poitiers.fr> <20161103195409.76212s1yy72mv95t@mail.msu.edu> <1827195032.6913929.1478217820889.JavaMail.zimbra@univ-poitiers.fr>
Message-ID: <838628537.7128959.1478256242111.JavaMail.zimbra@univ-poitiers.fr>

Hi Jiangn!

I made some modifications to the script ProtExcluder1.2/mspesl-sfetch.pl, replacing:

"esl-sfetch --index $ARGV[0] " by "samtools faidx $ARGV[0]"
and
"esl-sfetch -c $from..$to $ARGV[0] $line[7] >> $ARGV[3]" by "samtools faidx $ARGV[0] $line[7]:$from-$to >> $ARGV[3]"

It works fine now and the script can extract the subsequences correctly.

Best regards,
Amine

From: "chebbi mohamed amine"
To: "jiangn"
Sent: Friday 4 November 2016 01:03:40
Subject: Re: [maker-devel] ProtExcluder1.2 Error

Hi Jiangn,

In fact, this sequence has a size of 19031 bases.
When I try the command /hmmer-3.1b2-linux-intel-x86_64/binaries/esl-sfetch -c 1242..19031 all-te.lib rnd-4_family-1731#DNA I get the error; however, when testing with end coordinates below 19031 it works fine. I think that it's a problem related to hmmer. I will try to add the subsequence to the .fnolowm50seq file manually.

Thank you
Amine

From: "jiangn"
To: "chebbi mohamed amine"
Cc: "Prashant S Hosmani", "Michael Campbell"
Sent: Friday 4 November 2016 00:54:09
Subject: Re: [maker-devel] ProtExcluder1.2 Error

Hi Amine,

I don't have this kind of experience. If only one sequence failed, I would suspect there might be some format issue for that specific sequence.

Regards,

Ning

From jiangn at msu.edu Fri Nov 4 15:35:43 2016
From: jiangn at msu.edu (jiangn at msu.edu)
Date: Fri, 04 Nov 2016 16:35:43 -0400
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <838628537.7128959.1478256242111.JavaMail.zimbra@univ-poitiers.fr>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr> <20161103185405.183337t1yq0no6x9@mail.msu.edu> <1641376945.6912938.1478216418712.JavaMail.zimbra@univ-poitiers.fr> <20161103195409.76212s1yy72mv95t@mail.msu.edu> <1827195032.6913929.1478217820889.JavaMail.zimbra@univ-poitiers.fr> <838628537.7128959.1478256242111.JavaMail.zimbra@univ-poitiers.fr>
Message-ID: <20161104163543.98626jb6y81eis67@mail.msu.edu>

Hi Amine,

That's good to know. Thank you!

Ning
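The workaround reported in this thread replaces the two esl-sfetch calls with `samtools faidx`, once to index the library and once to fetch a `name:from-to` region. A small Python sketch of how those two command lines are put together is shown below; it is an illustration only, the function names are hypothetical, and it covers forward coordinates (from <= to) since the thread does not say how ProtExcluder treats reversed ones.

```python
def faidx_index_command(fasta_path):
    """Command that builds the .fai index for a FASTA file."""
    return ["samtools", "faidx", fasta_path]


def faidx_fetch_command(fasta_path, seq_id, start, end):
    """Command that extracts seq_id:start-end (1-based, inclusive).

    Only forward coordinates are handled in this sketch.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end, got %d..%d" % (start, end))
    region = "{}:{}-{}".format(seq_id, start, end)
    return ["samtools", "faidx", fasta_path, region]


if __name__ == "__main__":
    # Example with the sequence and coordinates quoted in the thread.
    print(" ".join(faidx_index_command("all-te.lib")))
    print(" ".join(faidx_fetch_command(
        "all-te.lib", "rnd-4_family-1731#DNA", 1242, 19031)))
```

Passing the argument list to `subprocess.run` (rather than a shell string) and appending its standard output to the output file would mirror the `>>` redirection used in the modified Perl script.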
From pierre-francois.bert at inra.fr Tue Nov 8 06:13:55 2016
From: pierre-francois.bert at inra.fr (Pierre-Francois Bert)
Date: Tue, 8 Nov 2016 12:13:55 +0000
Subject: [maker-devel] Maker-P
Message-ID: <1478607235425.40152@inra.fr>

Hello,
I'm interested in using MAKER-P but I can't find it within the latest version 3, nor can I find v2.29 to download.
Can you please tell me how to proceed?
Best wishes.
Pierre-François Bert

From carsonhh at gmail.com Wed Nov 9 13:00:08 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 9 Nov 2016 12:00:08 -0700
Subject: [maker-devel] Maker-P
In-Reply-To: <1478607235425.40152@inra.fr>
References: <1478607235425.40152@inra.fr>
Message-ID:

MAKER-P's features and accessory scripts were integrated into MAKER with versions 2.29 and above, as stated on the MAKER-P page. There is no longer a separate MAKER-P download and it is not a separate executable. You just download MAKER 2.29 or above and run .../maker/bin/maker

--Carson

> On Nov 8, 2016, at 5:13 AM, Pierre-Francois Bert wrote:
>
> Hello,
> I'm interested in using MAKER-P but I can't find it within the latest version 3, nor can I find v2.29 to download.
> Can you please tell me how to proceed?
> Best wishes.
> Pierre-François Bert

From jcornel3 at asu.edu Thu Nov 10 16:43:56 2016
From: jcornel3 at asu.edu (John Cornelius)
Date: Thu, 10 Nov 2016 15:43:56 -0700
Subject: [maker-devel] Error running MAKER
Message-ID:

Hello,

I'm using MAKER to annotate a tetraploid genome and while running it, I encountered the following error:

#--------- command -------------#
Widget::exonerate::est2genome:
/packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/15/TRINITY_GG_19079_c1670_g1_i1.for.84770203-84771247.15.fasta -t /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.15.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.TRINITY_GG_19079_c1670_g1_i1.e.exonerate
#-------------------------------#
running est2genome search.
#--------- command -------------#
Widget::exonerate::est2genome:
/packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/10/TRINITY_GG_87963_c9694_g10_i12.for.49475083-49475985.10.fasta -t /tmp/maker_08Elxf/10/chr6L.49475083-49475985.10.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/10/chr6L.49475083-49475985.TRINITY_GG_87963_c9694_g10_i12.e.exonerate
#-------------------------------#

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 132376 RUNNING AT pnap-pe7-s03
= EXIT CODE: 135
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

The command I ran was the following:

#PBS -l walltime=240:00:00
#PBS -N MAKER
#PBS -l nodes=1:ppn=16
##PBS -q hmem
#PBS -j oe
#PBS -m abe
#PBS -M jcornelius at tgen.org
#PBS -A tgen-205000
#PBS -o /scratch/jcornelius/xenopus_laevis/maker_run

# --- load required modules --- #

module load maker

# --- run maker --- #

cd /scratch/jcornelius/xenopus_laevis/maker_run
mpiexec -n 16 maker -base XLNEURO.run1 -fix_nucleotides

I'm not sure what could be causing this error, and any help would be much appreciated. Thanks.

--
John Cornelius
MCB PhD Candidate
Arizona State University

From carsonhh at gmail.com Fri Nov 11 15:59:54 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 11 Nov 2016 14:59:54 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To:
References:
Message-ID: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com>

The cause of the error is probably further back in the STDERR.
With MPI, so many processes are producing status messages and notes that you can get several seconds of output after a failure. If you kept the whole STDERR, I can help you look through it. Searching for "ERROR" in all caps is usually where you will see it. Also, MAKER keeps a log of progress, so even on failure you can just restart it and it will pick up the analysis from the last successful step.

--Carson
From lmeunier at ulg.ac.be Mon Nov 14 02:50:50 2016
From: lmeunier at ulg.ac.be (Loïc)
Date: Mon, 14 Nov 2016 09:50:50 +0100
Subject: [maker-devel] Predictions without evidence
Message-ID:

Hello,

I am a Ph.D. student, and I am using MAKER to automate gene prediction for many genomes as part of a genome mining project, so I don't provide any evidence. If I understood correctly, when using multiple gene predictors, AED is used to select the prediction that best matches the evidence.

So, as I don't use evidence, does MAKER still make a choice when working with multiple gene predictors? If yes, how does it work? Also, I have not fully understood whether the selection of the gene predictor is made for every gene.

Sorry for asking if the answer is obvious, but after reading your papers and looking through the archived posts, I have not found the answer.

By the way, I also have a question about your paper on MAKER2 (Holt and Yandell, 2011). It says many times that gene predictors used within the MAKER pipeline give better results than when used alone, but I have not understood why. Can you explain this?

Best regards,
Loïc Meunier

From jacques.dainat at bils.se Mon Nov 14 02:55:06 2016
From: jacques.dainat at bils.se (Jacques Dainat)
Date: Mon, 14 Nov 2016 09:55:06 +0100
Subject: [maker-devel] strand of single exon EST from fasta
Message-ID: <2E91C252-D244-47A2-B896-99EE0F69EBBA@bils.se>

Hello,

I'm annotating several strains of the same fungus, and I have stranded RNAseq for all of them. I'm using MAKER3.
Let's say I'm annotating species1 using its species-specific assembled transcripts, which are in GFF. I know that MAKER cannot do anything about the strand coming from the est_gff. To check that everything went fine during my transcriptome assembly and that the strands were correctly defined, I checked the annotation within a browser.
I can see that the strands of my transcripts in GFF format were perfect (they match the protein strands and the ab initio prediction strands, and the ORFs are OK).

As I wanted to take advantage of the RNAseq from my other strains, I decided to use it in this annotation. As the transcriptome assemblies of these RNAseq data sets were built on their corresponding genomes, I cannot use the GFF files: the locations do not correspond to the genome of my species1. So I decided to extract the sequences in FASTA format to feed to MAKER (alt_est parameter).
When I visualised those transcript alignments, I was really surprised by the strands chosen by MAKER. They seem completely random, even though all the EST FASTA sequences from the same locus are given in the same strand.

So, I have two questions:
1) How is the strand decided for single-exon ESTs provided in FASTA format? (I thought it was based on the longest ORF.)
2) Is it normal that the second annotation, using these alt_est, is worse (far fewer gene models) than the previous one? (I thought the strand of my single-exon alt_ests would not play a role during the annotation process. Or maybe it's another bias from these alt_est => loci less well defined?)

Here are 3 examples: the top green track has the correct strand and is based on the GFF file. The bottom green cluster tracks are FASTA sequences from the other strains aligned through MAKER. (I don't know if it could play a role, but all sequences from the same locus were sent to MAKER in the same strand.)

Thank you very much for your help,

Jacques Dainat
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2016-11-13 at 13.05.24.png
Type: image/png
Size: 52019 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2016-11-13 at 13.05.44.png
Type: image/png
Size: 26966 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2016-11-13 at 13.07.13.png
Type: image/png
Size: 24338 bytes
Desc: not available
URL:

From carsonhh at gmail.com Mon Nov 14 14:08:13 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 14 Nov 2016 13:08:13 -0700
Subject: [maker-devel] strand of single exon EST from fasta
In-Reply-To: <2E91C252-D244-47A2-B896-99EE0F69EBBA@bils.se>
References: <2E91C252-D244-47A2-B896-99EE0F69EBBA@bils.se>
Message-ID:

The strand of single-exon ESTs and alt-ESTs is based on the longest ORF. In the event of a tie, whatever strand was assigned by the aligner is kept. alt-ESTs are less likely to align or produce a model than ESTs. If you have competing models on opposite strands for the same CDS, then support from ab initio predictions, spliced ESTs, or exonerate protein alignments will be needed for the model.

--Carson
From carsonhh at gmail.com Mon Nov 14 14:18:26 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 14 Nov 2016 13:18:26 -0700
Subject: [maker-devel] Predictions without evidence
In-Reply-To:
References:
Message-ID: <7BDEAAF4-230C-4315-B353-43381237BCB0@gmail.com>

Gene predictors have to be trained on each organism to generate a matched HMM. If they are not trained, they will not work well.
MAKER also sends hints to the predictor based on the evidence alignments, which further alter the probabilities used by the predictor so that it better matches the evidence. Evidence is also used in final filtering. All models without evidence will have an AED of 1, which means no support. Not using evidence will result in very poor models, especially if you don't have an HMM built specifically for the organism. The main problem will be overprediction. Note the behavior of SNAP alone in the MAKER2 paper: the result is tens of thousands of false-positive gene models.

If you only run multiple gene predictors without evidence, the final model will be whichever model has the best consensus structure for the set. If the set consists of two models, then there is no consensus and the longest one is kept.

--Carson
From carsonhh at gmail.com Thu Nov 17 15:05:53 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 17 Nov 2016 14:05:53 -0700
Subject: [maker-devel] About split genes problem in Maker annotations
In-Reply-To: <75508AB460A77C4798EC49425637E292194A0DA6@PETREL-MA.imcb.a-star.edu.sg>
References: <75508AB460A77C4798EC49425637E292194A0DA6@PETREL-MA.imcb.a-star.edu.sg>
Message-ID: <36BBB195-EEB4-4B3A-9463-3E4171731390@gmail.com>

est2genome and protein2genome should only be used for initial training. They are not predictors; rather, they take an EST/protein alignment, find the longest ORF, and turn the ORF directly into a gene model. That is good enough to build a training dataset, but the models will almost always be partial and fragmented. Also, because the alignments both produce and support the models, they always score well, so their AED values are meaningless. Once you have a predictor trained, you should turn est2genome and protein2genome off.

With a trained predictor, the alignments then serve as hints to Augustus as to where introns/exons are likely to be, and this will give the desired behavior. Note that Augustus will attempt to build the most probable model given the hints and the assembly sequence. If there are any assembly issues affecting the ORF, the predictor will often skip exons or split the model in the locus.

Also make sure you have built a species-specific repeat library to add to the default repeat libraries used by MAKER (you can use tools like RepeatModeler to do this). Otherwise you will get spurious alignments for much of your evidence, and Augustus will generate false-positive results. You may also want to add a large dataset like UniProt/Swiss-Prot to the protein evidence.
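
As a rough sketch of the settings described above, a second-round maker_opts.ctl might contain something like the following. The option names are the ones I recall from the control files generated by `maker -CTL` in MAKER 2.31 and should be checked against your own copy; the file names and the Augustus species name are hypothetical placeholders, not values from this thread.

```
#-----EST Evidence
est=transcripts.fasta            # hypothetical file: assembled RNAseq transcripts

#-----Protein Homology Evidence
protein=proteins.fasta           # hypothetical file: related-species proteins plus UniProt/Swiss-Prot

#-----Repeat Masking
rmlib=species_repeats.fasta      # hypothetical file: species-specific library, e.g. from RepeatModeler

#-----Gene Prediction
augustus_species=my_species      # hypothetical name of the trained Augustus model
est2genome=0                     # turned off once a predictor has been trained
protein2genome=0                 # turned off once a predictor has been trained
```

With est2genome and protein2genome at 0, the EST and protein alignments are still computed, but they act only as hints and supporting evidence for the Augustus models rather than being converted into models themselves.
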
The best way to evaluate annotations and performance is to visually review the annotation in tools like Apollo. It will allow you to see whether evidence, gene predictions, and final models achieve consensus, whether alignments don't match (spurious alignment generally suggests a repeat masking issue or an evidence quality issue), or whether raw ab initio predictions don't match (which indicates insufficient training or underlying assembly issues).

--Carson

> On Nov 16, 2016, at 8:01 PM, Prashant Narendra SHINGATE wrote:
>
> Hi Carson,
>
> We are annotating the genome of a fish with a relatively small genome (~450 Mb) using MAKER and are encountering many genes that are split and predicted as multiple genes. We are using Augustus for de novo prediction. Fortunately we have full-length RNAseq for about 4000 genes (and ~50k transcripts in total) from the same species, and whole-genome protein sequences from a very closely related species.
>
> First we trained Augustus using the ~4000 full-length RNAseq transcripts from the same species. This trained Augustus model was used in the MAKER annotation pipeline along with the ~50k RNAseq transcripts (>1000 bp) and whole-genome protein sequences from a closely related species.
>
> We first tried annotating using the options est2genome=1, protein2genome=1 and Augustus ON. We found that several genes were split, and the program seemed to give weight to the Augustus prediction in spite of having full-length RNAseq and protein sequences aligned to the predicted gene loci (visualized using JBrowse).
>
> In the next trial we used est2genome=1, protein2genome=1 and Augustus OFF in the first step. In the second step we reiterated with est2genome=0, protein2genome=0 and Augustus ON. The output still contained split genes.
>
> In the third trial we used est2genome=1, protein2genome=1 and Augustus OFF and checked the output. In this output full-length genes were predicted whenever full-length RNAseq and/or protein sequences were available.
> This seems to suggest that when we use Augustus, more weight is given to the Augustus de novo prediction and the synthesis of evidence from RNAseq and protein sequences is not happening.
>
> Can you please let us know why we are getting split genes in spite of having full-length RNAseq and/or protein sequences? What changes would you suggest to the protocol to overcome this problem?
>
> We thank you very much for your help and time.
>
> Regards,
> Prashant Shingate, PhD :: Research Fellow :: Comparative and Medical Genomics Lab :: Institute of Molecular and Cell Biology (IMCB) :: Agency for Science, Technology and Research (A*STAR)
> 61 Biopolis Drive :: #05-04 Proteos :: Singapore 138673 :: DID (+65) 6586 9570 :: Fax (+65) 6779 1117 :: http://www.imcb.a-star.edu.sg/

From carsonhh at gmail.com Thu Nov 17 22:04:31 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 17 Nov 2016 21:04:31 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To:
References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com>
Message-ID: <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com>

To use less RAM, try lowering max_dna_len=, setting the blast_depth= parameters to 20 or 30 in maker_bopts.ctl (the default is no limit), or, when using MPI, starting fewer processes per node (this requires manipulating the hostfile or using the round-robin distribution flag for MPI flavors where it is available).

The memory issue could be causing the lock failure as well.
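
A minimal sketch of the memory-related settings mentioned above, assuming the option names I recall from the MAKER 2.31 control files: `max_dna_len` in maker_opts.ctl and the `depth_*` cutoffs in maker_bopts.ctl, which appear to be what "blast_depth=" refers to. The numeric values are illustrative only; check the names and defaults against the files generated by `maker -CTL`.

```
# maker_opts.ctl
max_dna_len=50000    # illustrative value; shorter chunks need less memory per process

# maker_bopts.ctl
depth_blastn=20      # illustrative cutoff; 0 disables the cutoff
depth_blastx=20      # illustrative cutoff
depth_tblastx=20     # illustrative cutoff
```

For the single-node job script earlier in this thread (nodes=1:ppn=16), requesting fewer MPI processes, for example `mpiexec -n 8` instead of `-n 16`, would be one way to start fewer processes on the node; that value is likewise only an example.
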
--Carson

> On Nov 17, 2016, at 7:53 PM, John Cornelius wrote:
>
> Ok, so I went and searched one of the output logs for all the lines that say ERROR and I got 44 lines with the following message:
>
> ERROR: Lock broken in runlog
>
> With these lines found at the end:
>
> ERROR: Failed while polishig ESTs
> ERROR: Chunk failed at level:2, tier_type:3
> ERROR: Could not query process table: Cannot allocate memory at /packages/maker/2.31.8/bin/../lib/Proc/ProcessTable_simple.pm line 62.
>
> From that last line it looks like the process is running out of RAM. Would that be right? Thanks.
From jcornel3 at asu.edu Fri Nov 18 13:14:52 2016
From: jcornel3 at asu.edu (John Cornelius)
Date: Fri, 18 Nov 2016 12:14:52 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To: <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com>
References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com> <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com>
Message-ID:

Would the lock failure cause problems with the annotation? It looks like MAKER is still progressing, just not as quickly as I thought it would.
--
John Cornelius
MCB PhD Candidate
Arizona State University

From mohamed.amine.chebbi at univ-poitiers.fr Thu Nov 24 15:45:01 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (Mohamed Amine Chebbi)
Date: Thu, 24 Nov 2016 22:45:01 +0100 (CET)
Subject: [maker-devel] map_fasta_ids : No mapping available...
Message-ID: <773569486.15711466.1480023901276.JavaMail.zimbra@univ-poitiers.fr>

Hello!

I'm attempting to rename the genes in maker.proteins.fasta for GenBank submission using the map_fasta_ids script. It seems to work correctly for the majority of gene models, except for those that produce the warning below:

WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.3-mRNA-1
WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.0-mRNA-1
WARNING: No mapping available for maker-scaffold_1710-snap-gene-0.6-mRNA-1
WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.4-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.1-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.2-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.0-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.5-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.6-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.15-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.16-mRNA-1

Looking into the maker.gff file, these gene names are missing and may be replaced by
other ones that differ in the numbers following the gene predictor name. I wonder if you can explain the reason for these warning messages and how to resolve them.

Thank you,
Best,
Amine

From carsonhh at gmail.com Thu Nov 24 20:04:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 24 Nov 2016 19:04:59 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To:
References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com> <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com>
Message-ID: <3C668404-EA3C-46B4-9676-8F95E2AFB64F@gmail.com>

A lock failure can become an issue if two separate jobs are running simultaneously. They may both try to process the same contig at the same time (modifying each other's files), which will cause one or both to fail. On failure, MAKER should always retry at some later point, so it can usually recover from this. If you see any partial lines in the resulting GFF3, then it did not recover and you just need to rerun whatever contig this happened on.

--Carson
> > ?Carson > > > >> On Nov 17, 2016, at 7:53 PM, John Cornelius > wrote: >> >> Ok, so I went and searched one of the output logs for all the lines that say ERROR and I got 44 lines with the following message: >> >> ERROR: Lock broken in runlog >> >> With these lines found at the end: >> >> ERROR: Failed while polishig ESTs >> ERROR: Chunk failed at level:2, tier_type:3 >> ERROR: Could not query process table: Cannot allocate memory at /packages/maker/2.31.8/bin/../lib/Proc/ProcessTable_simple.pm line 62. >> >> From that last line it looks like the process is running out of RAM would that be right? Thanks. >> >> On Fri, Nov 11, 2016 at 2:59 PM, Carson Holt > wrote: >> The cause of the error is probably further back in the STDERR. With MPI so many processes are producing status and notes, that you can get several seconds of output after ta failure. If you kept the whole STDERR, I can help you look through it. searching for ?ERROR? all caps is usually where you will see it. Also MAKER keeps a log of progress, so even on failure, you can just restart it and it will pick up the analysis from the last successful step. >> >> ?Carson >> >> >>> On Nov 10, 2016, at 3:43 PM, John Cornelius > wrote: >>> >>> Hello, I'm using MAKER to annotate a tetraploid genome and while running it, I encountered the following error: >>> >>> #--------- command -------------# >>> Widget::exonerate::est2genome: >>> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/15/TRINITY_GG_19079_c1670_g1_i1.for.84770203-84771247.15.fasta -t /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.15.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.TRINITY_GG_19079_c1670_g1_i1.e.exonerate >>> #-------------------------------# >>> running est2genome search. 
>>> #--------- command -------------# >>> Widget::exonerate::est2genome: >>> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/10/TRINITY_GG_87963_c9694_g10_i12.for.49475083-49475985.10.fasta -t /tmp/maker_08Elxf/10/chr6L.49475083-49475985.10.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/10/chr6L.49475083-49475985.TRINITY_GG_87963_c9694_g10_i12.e.exonerate >>> #-------------------------------# >>> >>> =================================================================================== >>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >>> = PID 132376 RUNNING AT pnap-pe7-s03 >>> = EXIT CODE: 135 >>> = CLEANING UP REMAINING PROCESSES >>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >>> =================================================================================== >>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7) >>> This typically refers to a problem with your application. >>> Please see the FAQ page for debugging suggestions >>> >>> The the command I ran was the following: >>> >>> #PBS -l walltime=240:00:00 >>> #PBS -N MAKER >>> #PBS -l nodes=1:ppn=16 >>> ##PBS -q hmem >>> #PBS -j oe >>> #PBS -m abe >>> #PBS -M jcornelius at tgen.org >>> #PBS -A tgen-205000 >>> #PBS -o /scratch/jcornelius/xenopus_laevis/maker_run >>> >>> # --- load required modules --- # >>> >>> module load maker >>> >>> # --- run maker --- # >>> >>> cd /scratch/jcornelius/xenopus_laevis/maker_run >>> mpiexec -n 16 maker -base XLNEURO.run1 -fix_nucleotides >>> >>> I'm not sure what could be causing this error and any help would be much appreciated. Thanks. 
>>> --
>>> John Cornelius
>>> MCB PhD Candidate
>>> Arizona State University
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>> --
>> John Cornelius
>> MCB PhD Candidate
>> Arizona State University
>
> --
> John Cornelius
> MCB PhD Candidate
> Arizona State University
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Mon Nov 28 10:26:40 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 28 Nov 2016 09:26:40 -0700
Subject: [maker-devel] map_fasta_ids : No mapping available...
In-Reply-To: <773569486.15711466.1480023901276.JavaMail.zimbra@univ-poitiers.fr>
References: <773569486.15711466.1480023901276.JavaMail.zimbra@univ-poitiers.fr>
Message-ID: <401400E0-7581-4407-A30E-A787485B0E86@gmail.com>

The map file you run with has two columns (old_id and new_id). If the input file has IDs that do not match anything in the old_id column, then it throws the warning. It means there is a mismatch between the map file being used and the fasta file. This can occur if you did downstream manipulation of the fasta file, are using the wrong fasta file, or if you used GFF3 as input to a maker step that has generated an ID mismatch.

--Carson

> On Nov 24, 2016, at 2:45 PM, Mohamed Amine Chebbi wrote:
>
> Hello !
>
> I'am attempting to rename genes of maker.proteins.fasta for Genebank submission using the map_fasta_ids script.
It seems to work correctly for the major of gene models, except to those ones having the below warning message : > > WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.3-mRNA-1 > WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.0-mRNA-1 > WARNING: No mapping available for maker-scaffold_1710-snap-gene-0.6-mRNA-1 > WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.4-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.1-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.2-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.0-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.5-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.6-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.15-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.16-mRNA-1 > > Looking into the maker.gff file, these gene names are missing and may be replaced by other ones which differ by the numbers following the gene predictor. > > I wounder if you can explain me the reason of these warning message and how to resolve it. > > Thank you , > > Best, > Amine > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From parulk at caltech.edu Tue Nov 29 11:13:06 2016 From: parulk at caltech.edu (Kudtarkar, Parul V.) Date: Tue, 29 Nov 2016 17:13:06 +0000 Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file Message-ID: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu> Dear Maker developers, 1. 
We use assembled RNA-seq (from the same species) and protein evidence (from evolutionarily close species) to generate training gene structures (1st iteration, est2genome=1, protein2genome=1).

2. This is then used to train the ab initio gene predictors SNAP and AUGUSTUS.

3. GeneMark-ES (version: GeneMark-ES / ET v.4.32) is used to produce a training data set with the command

gmes_petap.pl --sequence pmin_jelly.fa

4. We will then predict genes using results from SNAP, GeneMark and AUGUSTUS (2nd iteration, est2genome=0, protein2genome=0).

I have a couple of questions relating to GeneMark and AUGUSTUS.

1. AUGUSTUS

We do not have a species file for our species of interest or for an evolutionarily close species. The following command is used to generate the species file:

/autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting

AUGUSTUS is taking too long to compute the species file; is there a solution for this issue? Using a species file from another organism might generate false positives. Is it advisable in such situations not to use the AUGUSTUS model?

2. GeneMark

I used the gmhmm file generated in the GeneMark output directory; however, I encounter the following error:

-------------------------
STATUS: Parsing control files...
ERROR: You have failed to provide a value for 'gmhmme3' in the control files.
ERROR: You have failed to provide a value for 'probuild' in the control files.
---------------------

FYI
-----
maker_opts.ctl

#-----Gene Prediction
snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file
gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file
-----

Using SNAP to train the gene model yields 6000-7000 additional genes. The model has a good cumulative AED value.

I was hoping that, in addition to SNAP, I could use AUGUSTUS and GeneMark to train the gene model and fuse dispersed models so that the gene count is within the expected range.
Thanks and regards,
Parul

Sent from my iPhone
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dence at genetics.utah.edu Tue Nov 29 11:28:33 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 29 Nov 2016 17:28:33 +0000
Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file
In-Reply-To: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu>
References: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu>
Message-ID: <359BAE14-18C2-4B91-A628-9613F94C8468@genetics.utah.edu>

Hi Parul,

Training Augustus does take a long time, much longer than for the other two predictors that you mentioned. Have you tried using the webAugustus web portal? The team that made Augustus runs it and can probably help you with troubleshooting their page for creating training sets: http://bioinf.uni-greifswald.de/webaugustus/training/create

The error that you got regarding GeneMark is saying that MAKER can't find the gmhmme3 and probuild executable files. These are specified in the maker_exe.ctl file, not the "opts" file. You need to put valid paths to those executable files in for the given parameters. This is something that is usually specified during installation of MAKER.

Hope that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Nov 29, 2016, at 10:13 AM, Kudtarkar, Parul V. wrote:

Dear Maker developers,

1. We use assembled RNAseq(from same species) and protein evidence(from evolutionary close species) to generate training gene structure(1st iteration, est2genome=1,protein2genome=1 ).

2. This is than used to train abinito gene predictors, SNAP and AUGUSTUS.

3. GeneMarkES( version: GeneMark-ES / ET v.4.32) is used to produce training data-set with the command

gmes_petap.pl --sequence pmin_jelly.fa

4.
We would be predicting genes using results from SNAP, Genemark and AUGUSTUS(2nd iteration, est2genome=0, protein2genome=0) I have couple of questions relating to Genemark and AUGUSTUS 1. AUGUSTUS We do not have a species file for species file of our interest or evolutionary closer species following command is used to generate species file /autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting AUGUSTUS is taking too long to compute species file, is there a solution for this issue. Using species file from other organism might generate false positives. Is it advised in such situations to not used AUGUSTUS model? 2. Genemark I used the gmhmm file generated in the genemark output directory, however I encounter following error ------------------------- STATUS: Parsing control files... ERROR: You have failed to provide a value for 'gmhmme3' in the control files. ERROR: You have failed to provide a value for 'probuild' in the control files. --------------------- FYI ----- maker_opts.ctl #-----Gene Prediction snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file ----- Using SNAP for training gene model yields over 6000-7000 additional gene. The model has good cumulative AED value. I was hoping in addition to SNAP, if I could use AUGUSTUS and GeneMark to train the gene model to fuse dispersed models so that the gene count is within the expected range. Thanks and regards, Parul Sent from my iPhone _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
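Editorial note on the 'gmhmme3' and 'probuild' errors quoted above: the replies in this thread point to empty entries in maker_exe.ctl. The following is only a minimal sketch of a pre-flight check, assuming the control file uses the same `key=value #comment` layout as the maker_opts.ctl excerpt in the message; the function name `missing_ctl_values` is hypothetical and not part of MAKER.

```python
def missing_ctl_values(ctl_text, required=("gmhmme3", "probuild")):
    """Return the required keys that have no value in a MAKER-style control file.

    Assumes each setting is written as `key=value #comment` (as in the
    maker_opts.ctl excerpt in this thread); this helper is illustrative only.
    """
    values = {}
    for raw in ctl_text.splitlines():
        # Drop the trailing comment, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    # A key counts as missing if it is absent or set to an empty string.
    return [key for key in required if not values.get(key)]
```

Running it on the text of maker_exe.ctl before launching MAKER would list any of the two entries that still need a path; which path is correct depends on the local GeneMark installation.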
URL:
From carsonhh at gmail.com Tue Nov 29 11:34:31 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 Nov 2016 10:34:31 -0700
Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file
In-Reply-To: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu>
References: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu>
Message-ID: <596EAC73-4DB5-4144-A8EA-0E955AA0E028@gmail.com>

How to train Augustus -> http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html

Step 2 shows how to create an empty species to start training with. Then Step 4 (optimize_augustus.pl) is the step that takes a while.

Then for GeneMark, you must set the location of the necessary GeneMark executables in the maker_exe.ctl file.

After getting all predictors trained, and running a few contigs, take a moment to review the predictor performance by manually reviewing them in something like Apollo. It is not uncommon that one or more perform poorly on an organism (they should each produce similar predictions). If one is significantly off relative to the other predictors and the evidence, it should be dropped. A bad behaving predictor will reduce the overall annotation performance.

--Carson

> On Nov 29, 2016, at 10:13 AM, Kudtarkar, Parul V. wrote:
>
>> Dear Maker developers,
>>
>> 1. We use assembled RNAseq(from same species) and protein evidence(from evolutionary close species) to generate training gene structure(1st iteration, est2genome=1,protein2genome=1 ).
>>
>> 2. This is than used to train abinito gene predictors, SNAP and AUGUSTUS.
>>
>> 3. GeneMarkES( version: GeneMark-ES / ET v.4.32) is used to produce training data-set with the command
>>
>> gmes_petap.pl --sequence pmin_jelly.fa
>>
>> 4. We would be predicting genes using results from SNAP, Genemark and AUGUSTUS(2nd iteration, est2genome=0, protein2genome=0)
>>
>> I have couple of questions relating to Genemark and AUGUSTUS
>>
>> 1.
AUGUSTUS >> >> We do not have a species file for species file of our interest or evolutionary closer species >> >> following command is used to generate species file >> >> >> /autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting >> AUGUSTUS is taking too long to compute species file, is there a solution for this issue. Using species file from other organism might generate false positives. Is it advised in such situations to not used AUGUSTUS model? >> >> 2. Genemark >> >> I used the gmhmm file generated in the genemark output directory, however I encounter following error >> >> >> ------------------------- >> >> STATUS: Parsing control files... >> ERROR: You have failed to provide a value for 'gmhmme3' in the control files. >> ERROR: You have failed to provide a value for 'probuild' in the control files. >> --------------------- >> FYI >> >> ----- >> >> maker_opts.ctl >> >> >> #-----Gene Prediction >> snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file >> gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file >> >> ----- >> >> Using SNAP for training gene model yields over 6000-7000 additional gene. The model has good cumulative AED value. >> >> I was hoping in addition to SNAP, if I could use AUGUSTUS and GeneMark to train the gene model to fuse dispersed models so that the gene count is within the expected range. >> >> >> Thanks and regards, >> >> Parul >> > > Sent from my iPhone -------------- next part -------------- An HTML attachment was scrubbed... URL: From parulk at caltech.edu Tue Nov 29 17:40:30 2016 From: parulk at caltech.edu (Kudtarkar, Parul V.) 
Date: Tue, 29 Nov 2016 23:40:30 +0000
Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file
In-Reply-To: <596EAC73-4DB5-4144-A8EA-0E955AA0E028@gmail.com>
References: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu>, <596EAC73-4DB5-4144-A8EA-0E955AA0E028@gmail.com>
Message-ID:

Dear Carson and Daniel,

Thanks for getting back to me promptly.

Adding the path to the GeneMark executables in maker_exe.ctl fixes the error.

Hopefully optimize_augustus.pl runs quicker than autoAug.pl (which has been running for almost a week now).

It will be interesting, and we look forward to evaluating which model best matches our expected gene count and AED values and has recognizable domains.

PS. We think BUSCO has helped us to evaluate gene model completeness.

Thanks,
Parul

----
Parul Kudtarkar
Bioinformatician
Biology and Biological Engineering
Office: 278 Beckman Institute
California Institute of Technology
MC 139-74
Pasadena CA 91125
http://www.echinobase.org

________________________________
From: Carson Holt
Sent: Tuesday, November 29, 2016 9:34:31 AM
To: Kudtarkar, Parul V.
Cc: maker-devel at yandell-lab.org
Subject: Re: error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file

How to train Augustus -> http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html

Step 2 shows how to create an empty species to start training with. Then Step 4 (optimize_augustus.pl) is the step that takes a while.

Then for GeneMark, you must set the location of the necessary GeneMark executables in the maker_exe.ctl file.

After getting all predictors trained, and running a few contigs, take a moment to review the predictor performance by manually reviewing them in something like Apollo. It is not uncommon that one or more perform poorly on an organism (they should each produce similar predictions).
If one is significantly off relative to the other predictors and the evidence, it should be dropped. A bad behaving predictor will reduce the overall annotation performance. -Carson On Nov 29, 2016, at 10:13 AM, Kudtarkar, Parul V. > wrote: Dear Maker developers, 1. We use assembled RNAseq(from same species) and protein evidence(from evolutionary close species) to generate training gene structure(1st iteration, est2genome=1,protein2genome=1 ). 2. This is than used to train abinito gene predictors, SNAP and AUGUSTUS. 3. GeneMarkES( version: GeneMark-ES / ET v.4.32) is used to produce training data-set with the command gmes_petap.pl --sequence pmin_jelly.fa 4. We would be predicting genes using results from SNAP, Genemark and AUGUSTUS(2nd iteration, est2genome=0, protein2genome=0) I have couple of questions relating to Genemark and AUGUSTUS 1. AUGUSTUS We do not have a species file for species file of our interest or evolutionary closer species following command is used to generate species file /autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting AUGUSTUS is taking too long to compute species file, is there a solution for this issue. Using species file from other organism might generate false positives. Is it advised in such situations to not used AUGUSTUS model? 2. Genemark I used the gmhmm file generated in the genemark output directory, however I encounter following error ------------------------- STATUS: Parsing control files... ERROR: You have failed to provide a value for 'gmhmme3' in the control files. ERROR: You have failed to provide a value for 'probuild' in the control files. --------------------- FYI ----- maker_opts.ctl #-----Gene Prediction snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file ----- Using SNAP for training gene model yields over 6000-7000 additional gene. 
The model has good cumulative AED value.

I was hoping in addition to SNAP, if I could use AUGUSTUS and GeneMark to train the gene model to fuse dispersed models so that the gene count is within the expected range.

Thanks and regards,
Parul

Sent from my iPhone
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Wed Nov 30 13:24:36 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Nov 2016 12:24:36 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To:
References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com> <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com> <3C668404-EA3C-46B4-9676-8F95E2AFB64F@gmail.com>
Message-ID:

Yes. You can either separate out the contig using fasta_tool or find the contig in the datastore directory (failed contigs will have a fasta file created there just for the failed contig). Then you can use 'maker -g contig.fasta -base original_base_name' (the -g and -base options) to specify that you want it to use the new contig fasta but write results to the given base directory (i.e. the same as the previous output directory). Remember to set -t (or tries in the maker_opts.ctl file) to a higher count when doing this.

--Carson

> On Nov 30, 2016, at 12:11 PM, John Cornelius wrote:
>
> Awesome! Thanks for the help. MAKER finally finished it's initial run today however, I noticed that there was still one large sequence that failed. Would it be possible to run MAKER on just that sequence and then combine the result of that run with the output of my main maker run?
>
> On Thu, Nov 24, 2016 at 7:04 PM, Carson Holt wrote:
> A lock failure can become an issue if two separate jobs are running simultaneously. They may both try to process the same contig at the same time (modifying each others files) which will cause one or both to fail. On failure, it should always retry at some later point. So it can usually recover from this.
If you see any partial lines in the resulting GFF3, then it did not recover and you need to just rerun whatever contig this happened on. > > ?Carson > > > >> On Nov 18, 2016, at 12:14 PM, John Cornelius > wrote: >> >> Would the lock failure cause problems with the annotation? It looks like Maker is still progressing, just not as quickly as I thought it would be. >> >> On Thu, Nov 17, 2016 at 9:04 PM, Carson Holt > wrote: >> To use less RAM, try lowering max_dna_len=, setting blast_depth= parameters to 20 pr 30 in maker_bopts.ctl (default is limitless), or when using MPI, starting fewer processes per node (requires manipulation of hostfile or using round robin distribution flag for MPI flavors where it is available). >> >> The memory issue could be causing the lock failure as well. >> >> ?Carson >> >> >> >>> On Nov 17, 2016, at 7:53 PM, John Cornelius > wrote: >>> >>> Ok, so I went and searched one of the output logs for all the lines that say ERROR and I got 44 lines with the following message: >>> >>> ERROR: Lock broken in runlog >>> >>> With these lines found at the end: >>> >>> ERROR: Failed while polishig ESTs >>> ERROR: Chunk failed at level:2, tier_type:3 >>> ERROR: Could not query process table: Cannot allocate memory at /packages/maker/2.31.8/bin/../lib/Proc/ProcessTable_simple.pm line 62. >>> >>> From that last line it looks like the process is running out of RAM would that be right? Thanks. >>> >>> On Fri, Nov 11, 2016 at 2:59 PM, Carson Holt > wrote: >>> The cause of the error is probably further back in the STDERR. With MPI so many processes are producing status and notes, that you can get several seconds of output after ta failure. If you kept the whole STDERR, I can help you look through it. searching for ?ERROR? all caps is usually where you will see it. Also MAKER keeps a log of progress, so even on failure, you can just restart it and it will pick up the analysis from the last successful step. 
>>> >>> ?Carson >>> >>> >>>> On Nov 10, 2016, at 3:43 PM, John Cornelius > wrote: >>>> >>>> Hello, I'm using MAKER to annotate a tetraploid genome and while running it, I encountered the following error: >>>> >>>> #--------- command -------------# >>>> Widget::exonerate::est2genome: >>>> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/15/TRINITY_GG_19079_c1670_g1_i1.for.84770203-84771247.15.fasta -t /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.15.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.TRINITY_GG_19079_c1670_g1_i1.e.exonerate >>>> #-------------------------------# >>>> running est2genome search. >>>> #--------- command -------------# >>>> Widget::exonerate::est2genome: >>>> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/10/TRINITY_GG_87963_c9694_g10_i12.for.49475083-49475985.10.fasta -t /tmp/maker_08Elxf/10/chr6L.49475083-49475985.10.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/10/chr6L.49475083-49475985.TRINITY_GG_87963_c9694_g10_i12.e.exonerate >>>> #-------------------------------# >>>> >>>> =================================================================================== >>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >>>> = PID 132376 RUNNING AT pnap-pe7-s03 >>>> = EXIT CODE: 135 >>>> = CLEANING UP REMAINING PROCESSES >>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >>>> =================================================================================== >>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7) >>>> This typically refers to a problem with your application. 
>>>> Please see the FAQ page for debugging suggestions >>>> >>>> The the command I ran was the following: >>>> >>>> #PBS -l walltime=240:00:00 >>>> #PBS -N MAKER >>>> #PBS -l nodes=1:ppn=16 >>>> ##PBS -q hmem >>>> #PBS -j oe >>>> #PBS -m abe >>>> #PBS -M jcornelius at tgen.org >>>> #PBS -A tgen-205000 >>>> #PBS -o /scratch/jcornelius/xenopus_laevis/maker_run >>>> >>>> # --- load required modules --- # >>>> >>>> module load maker >>>> >>>> # --- run maker --- # >>>> >>>> cd /scratch/jcornelius/xenopus_laevis/maker_run >>>> mpiexec -n 16 maker -base XLNEURO.run1 -fix_nucleotides >>>> >>>> I'm not sure what could be causing this error and any help would be much appreciated. Thanks. >>>> -- >>>> John Cornelius >>>> MCB PhD Candidate >>>> Arizona State University >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >>> >>> -- >>> John Cornelius >>> MCB PhD Candidate >>> Arizona State University >> >> >> >> >> -- >> John Cornelius >> MCB PhD Candidate >> Arizona State University > > > > > -- > John Cornelius > MCB PhD Candidate > Arizona State University -------------- next part -------------- An HTML attachment was scrubbed... URL: From FeatherstonJ at arc.agric.za Tue Nov 1 09:12:46 2016 From: FeatherstonJ at arc.agric.za (Jonathan Featherston) Date: Tue, 1 Nov 2016 15:12:46 +0000 Subject: [maker-devel] [Caution: Message contains Redirect URL content] InterProScan protein domain & AED physical evidence filtering In-Reply-To: References: Message-ID: <0C2463EA-53FD-4C9B-853A-BE933973E1FA@arc.agric.za> Dear Allison I'm not sure about your extra gene models but here is the script to perform quality filtering. A perl script I got from the forum somewhere (changed to txt in case it gets removed by mail server. Regards Jonathan -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: quality_filter.txt
URL:
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Tue Nov 1 09:43:21 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 1 Nov 2016 09:43:21 -0600
Subject: [maker-devel] [Caution: Message contains Redirect URL content] InterProScan protein domain & AED physical evidence filtering
In-Reply-To: <0C2463EA-53FD-4C9B-853A-BE933973E1FA@arc.agric.za>
References: <0C2463EA-53FD-4C9B-853A-BE933973E1FA@arc.agric.za>
Message-ID:

One note I'd like to make is that doing a second round with keep_preds=1 is the wrong procedure (only do that if you really want to keep everything - i.e. in some fungi or oomycetes). Rather, you should use InterProScan to evaluate the rejected models in the non-overlapping.abinit.proteins.fasta file, then grep the ones that have an IPR domain out of the GFF3 (they will be match/match_part features), and then pass them to pred_gff in a separate run (this just updates the format to gene/mRNA/exon/CDS with proper reading frame). You can then merge the resulting GFF3 and fasta files.

The reason there are differences between the runs is that there are models with AED less than 1 that get rejected for other reasons and that are brought back with keep_preds=1. For example, if the only evidence is a protein alignment that has deep overlapping HSPs (an extremely low complexity alignment), it will be filtered out even though AED is not technically equal to 1. Also, if the overlapping protein evidence is in a different reading frame than the model it is supposed to support, then the AED will be less than 1 but eAED (extended AED) will be 1, and the model will be rejected.
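Editorial note: the selection step described above (keep only the rejected ab initio models that carry an IPR domain, then pull their match/match_part features out of the GFF3) could be sketched roughly as follows. This is a hedged illustration, not a MAKER accessory script. It assumes InterProScan TSV output with InterPro lookup enabled (InterPro accession in column 12), assumes the protein FASTA IDs equal the Name attribute of the match features, and assumes each match line precedes its match_part lines. The function names `ids_with_ipr` and `select_matches` are hypothetical.

```python
def ids_with_ipr(tsv_lines):
    """Collect protein IDs that have an InterPro accession in column 12 of an InterProScan TSV."""
    keep = set()
    for line in tsv_lines:
        cols = line.rstrip("\n").split("\t")
        # Column 12 is expected to hold an accession such as IPR000001, or "-" when absent.
        if len(cols) >= 12 and cols[11].startswith("IPR"):
            keep.add(cols[0])
    return keep


def select_matches(gff3_lines, keep_names):
    """Keep match features whose Name is in keep_names, plus their match_part children."""
    kept_ids = set()
    selected = []
    for line in gff3_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip comments, blank lines and any embedded FASTA
        attrs = dict(a.split("=", 1) for a in cols[8].split(";") if "=" in a)
        if cols[2] == "match" and attrs.get("Name") in keep_names:
            if "ID" in attrs:
                kept_ids.add(attrs["ID"])
            selected.append("\t".join(cols))
        elif cols[2] == "match_part" and attrs.get("Parent") in kept_ids:
            selected.append("\t".join(cols))
    return selected
```

The selected lines would then be written to a file and supplied through pred_gff in a separate run, as described in the message; whether the ID/Name assumption holds should be checked against the actual GFF3 first.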
?Carson >> Hello MAKER google group, >> >> >> For the final round of a MAKER annotation for a de novo plant genome assembly, I ran MAKER twice: once with keep_preds=0 which annotated 20,284 genes and once with keep_preds=1 which annotated 34,055 genes. >> >> >> I ran the 34,055 genes (the keep_preds=1 set) through InterProScan to search the MAKER predictions for protein domain content and added this IPRScan output into the MAKER gff file with the ipr_update_gff accessory script. >> >> >> The game plan is to go through the 34,055 genes and remove any gene model that doesn? have either protein domain content or physical evidence. I am counting genes that have an AED=1 as the genes that don? have physical evidence. >> >> >> I have two questions: >> >> >> >> 1. I count 11,762 genes that have AED=1.0 in the keep_preds=1 annotation set, which leaves me with 22,293 genes that I? assuming have some physical evidence (34,055-11,762=22,293). But when I ran MAKER with keep_preds=0 originally, I only count 20,284 genes. What are the extra ?2,000 genes that are being annotated in the keep_preds=1 run that have and AED score of less than 1.0, but are not being annotated in the keep_preds=0 run? >> >> >> 2. My second question is if there is an accessory script available that will remove genes that lack either the IPRScan protein domains or physical evidence (AED < 1)? This type of gene removal was mentioned in a previous post from 2012 (https://groups.google.com/forum/#!searchin/maker-devel/sorry$20there$27s$20not$20a$20script$20prepackaged$20with$20MAKER$20for$20that$20yet.%7Csort:relevance/maker-devel/VaoXWlGHOjs/EElr_otrK8QJ ) and I was just wondering if since then someone wrote a script that will do this for me. >> >> >> >> If anyone could offer me any feedback, that would be greatly appreciated! 
>>
>> Thank you,
>>
>> Allison
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jacques.dainat at bils.se Tue Nov 1 10:08:45 2016
From: jacques.dainat at bils.se (Jacques Dainat)
Date: Tue, 1 Nov 2016 17:08:45 +0100
Subject: [maker-devel] est_gff input does not provide any gene model
In-Reply-To:
References:
Message-ID: <29E6299A-EA5F-4768-88CD-202ABB05AF89@bils.se>

Thank you for the quick confirmation!

Just for clarification, what I provided to MAKER was a valid GFF3 file that does contain gene, mRNA and exon features but no CDS.

I haven't seen any information about the particular GFF3 feature types expected in the est_gff files supplied. I think this should be documented more clearly (perhaps within maker_opts.ctl?).
It would be nice to stop the pipeline if the file provided contains no usable information (and also when the file provided doesn't exist; the warning is not easy to catch when launching on a cluster...).

A last question: are the scores from the score column of the est_gff file used by MAKER?

Jacques

> On 01 Nov 2016, at 04:24, Carson Holt wrote:
>
> Evidence such as est_gff has to follow the alignment format used by GFF3 (i.e. match/match_part) whereas you are providing gene models (i.e. gene/mRNA/exon/CDS). Note that match/match_part are two level features whereas gene models are 3 levels. You need to reformat to match/match_part.
>
> --Carson
>
>> On Oct 31, 2016, at 4:51 AM, Jacques Dainat wrote:
>>
>> Hello,
>>
>> I'm using usually Cufflinks output to feed Maker through the est_gff parameter, combined with the est2genome=1 parameter I get the wanted output.
>> This time I used Stringtie output to feed Maker, but I don't have any gene model predicted using the est2genome parameter.
>>
>> Any explanation? Is it due to the gff3 format differences between these two files?
>>
>> Cufflinks output example:
>> Pnalgiovense_4592 Cufflinks match 363 977 17.844829 - . ID=1:s3_c1_r1.4.2;Name=1:s3_c1_r1.4.2;
>> Pnalgiovense_4592 Cufflinks match_part 363 666 17.844829 - . ID=1:s3_c1_r1.4.2:exon-1;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 1 304 +;
>> Pnalgiovense_4592 Cufflinks match_part 743 977 17.844829 - . ID=1:s3_c1_r1.4.2:exon-2;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 305 539 +;
>>
>> Stringtie output example:
>> Pnalgiovense_112 StringTie gene 20 1256 1000 + . ID=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1
>> Pnalgiovense_112 StringTie mRNA 20 1256 1000 + . ID=HtMm_All.12253.1;Parent=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1
>> Pnalgiovense_112 StringTie exon 20 1256 1000 + . ID=HtMm_All.12253.1-exon-1;Parent=HtMm_All.12253.1;cov=8.028295;exon_number=1;gene_id=HtMm_All.12253;transcript_id=HtMm_All.12253.1
>>
>> If it's the Stringtie output that is problematic, how can I fix it? Is it enough to remove the gene features and change mRNA to match and exon to match_part?
>>
>> Best regards,
>>
>> Jacques Dainat, PhD
>> NBIS (National Bioinformatics Infrastructure Sweden)
>> Genome Annotation Service
>>
>> Address: (room E10:4204 - last floor)
>> Uppsala University, BMC
>> Department of Medical Biochemistry Microbiology, Genomics
>> Husargatan 3, box 582
>> S-75123 Uppsala Sweden
>> Phone: 01 84 71 46 25
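The reformatting discussed in this thread (drop the gene rows, turn mRNA into match and exon into match_part) could be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions, not a MAKER accessory script: the function name gene_model_to_match is hypothetical, the input is assumed to be tab-separated GFF3, the Parent attribute is removed from the new top-level match rows, and no Target attribute (as seen in the Cufflinks example) is added. Whether MAKER needs anything beyond this is not stated in the thread.

```python
def gene_model_to_match(lines):
    """Yield GFF3 lines rewritten from gene/mRNA/exon to match/match_part.

    Hypothetical helper for illustration only:
    - comment and blank lines are passed through unchanged,
    - gene rows are dropped,
    - mRNA rows become match rows without their Parent attribute,
    - exon rows become match_part rows (Parent still points at the old mRNA ID),
    - any other feature type (for example CDS) is dropped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            yield line
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            # Not a standard nine-column GFF3 row; skip it rather than guess.
            continue
        feature_type = cols[2]
        if feature_type == "mRNA":
            cols[2] = "match"
            # The parent gene row is removed, so the match must be top level.
            attrs = [a for a in cols[8].split(";")
                     if a and not a.startswith("Parent=")]
            cols[8] = ";".join(attrs)
        elif feature_type == "exon":
            cols[2] = "match_part"
        else:
            # gene, CDS and everything else have no place in match/match_part.
            continue
        yield "\t".join(cols)
```

One way to use it would be to open the StringTie GFF3 file, pass the file object to gene_model_to_match, and write each yielded line to a new file that is then given to est_gff.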
From carsonhh at gmail.com Tue Nov 1 10:25:36 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 1 Nov 2016 10:25:36 -0600
Subject: [maker-devel] est_gff input does not provide any gene model
In-Reply-To: <29E6299A-EA5F-4768-88CD-202ABB05AF89@bils.se>
References: <29E6299A-EA5F-4768-88CD-202ABB05AF89@bils.se>
Message-ID: <923C15DF-D705-416C-BCB8-CB87F1309797@gmail.com>

The score will be ignored. The format to be used for evidence alignments is specified in the GFF3 spec (https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md ). An EST alignment example is also given as part of the GFF3 spec.

--Carson

From mohamed.amine.chebbi at univ-poitiers.fr Wed Nov 2 12:09:54 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (Mohamed Amine Chebbi)
Date: Wed, 2 Nov 2016 19:09:54 +0100 (CET)
Subject: [maker-devel] ProtExcluder1.2 Error
Message-ID: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>

Hi!

I am working on creating a custom repeat library and I want to use ProtExcluder1.2 to trim potential genes from my repeat sequences.
My BLAST version is 2.2.30+.

I get this error message:

Can not open the seqfile test.lib_blast_results.txt.fnolowm50seq
mergeunmatchedregion.pl seqfile
Illegal division by zero at ProtExcluder1.2/GCcontent.pl line 122.

I wonder if you can help me fix this.

Thank you.

Amine

From michael.s.campbell1 at gmail.com Thu Nov 3 11:57:35 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Thu, 3 Nov 2016 13:57:35 -0400
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>
Message-ID:

Hi Amine,

That script is maintained by Ning Jiang and Kevin Childs. They know best what this script is expecting.
I've cc'd them on this email in the hope that they can provide some direction.

Thanks,
Mike

From psh65 at cornell.edu Thu Nov 3 12:14:17 2016
From: psh65 at cornell.edu (Prashant S Hosmani)
Date: Thu, 3 Nov 2016 18:14:17 +0000
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To:
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>
Message-ID:

Hi Amine,

I was getting a similar error. You need to be careful with the BLAST versions. Try using the same BLAST version for makeblastdb. I was using BLAST 2.2.29+. After recreating the BLAST database with the same version, it worked for me.

Hope this helps.
Prashant

Prashant Hosmani
Sol Genomics Network
Boyce Thompson Institute, Ithaca, NY, USA

From scott at scottcain.net Fri Nov 4 13:25:02 2016
From: scott at scottcain.net (Scott Cain)
Date: Fri, 4 Nov 2016 15:25:02 -0400
Subject: [maker-devel] Last Call for GMOD talks at PAG
Message-ID:

Time is short! If you want to attend PAG and would like to present on a topic that would be of interest to the GMOD community, please send an abstract or at least a descriptive title to help at gmod.org. Types of talks typically include updates on GMOD software projects, usage stories for successful sites, proposals for new GMOD projects, and descriptions of plugins for existing GMOD software projects like Tripal, JBrowse and Galaxy.

Please consider giving a talk and sharing your experience and ideas!

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
From mohamed.amine.chebbi at univ-poitiers.fr Thu Nov 3 17:40:18 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (chebbi mohamed amine)
Date: Fri, 4 Nov 2016 00:40:18 +0100 (CET)
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <20161103185405.183337t1yq0no6x9@mail.msu.edu>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr> <20161103185405.183337t1yq0no6x9@mail.msu.edu>
Message-ID: <1641376945.6912938.1478216418712.JavaMail.zimbra@univ-poitiers.fr>

Hi!

Thank you, Prashant, for sharing your experience. Indeed, using the same BLAST version (2.2.29) for makeblastdb seems to resolve the problem. It now seems to work fine for all the sequences except one, for which I get the message below:

Fatal exception (source file ../../easel/esl_sqio_ascii.c, line 2001):
Failed to fetch subsequence residues -- corrupt coords?
sh: line 1: 46520 Aborted (core dumped) /hmmer-3.1b2-linux-intel-x86_64/binaries/esl-sfetch -c 1242..19031 all-te.lib rnd-4_family-1731#DNA >> blastx_results-all-te.txt.fnolowm50seq

Did you encounter this problem before?

Thank you for your help.

Amine

From: jiangn at msu.edu
To: "Prashant S Hosmani"
Cc: "Michael Campbell", "Mohamed Amine Chebbi"
Sent: Thursday, 3 November 2016 23:54:05
Subject: Re: [maker-devel] ProtExcluder1.2 Error

Hi Prashant,

Thank you so much for sharing your experience. It is important to keep everything in the same version. I will remind users about this when we update it and I may need to bother you then.

Best regards,

Ning
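The version mismatch described in this thread could be checked before rebuilding the database, for example with a small Python sketch like the one below. It is an illustration under assumptions, not part of ProtExcluder: the function names are hypothetical, it assumes the BLAST+ programs are on the PATH and accept -version, and it assumes the version appears in the output as a three-part number such as 2.2.30. It only compares the installed programs; it cannot tell which version created an existing database.

```python
import re
import subprocess

# Three-part version number such as 2.2.29 (a trailing "+" is ignored).
VERSION_RE = re.compile(r"(\d+\.\d+\.\d+)")


def parse_blast_version(text):
    """Return the first X.Y.Z version found in text, or None if there is none."""
    found = VERSION_RE.search(text)
    return found.group(1) if found else None


def installed_version(tool):
    """Run '<tool> -version' and return the parsed version string.

    Assumes the tool is on the PATH; raises if it cannot be run.
    """
    result = subprocess.run([tool, "-version"],
                            capture_output=True, text=True, check=True)
    return parse_blast_version(result.stdout)


def same_version(tools=("blastx", "makeblastdb")):
    """Return (all_equal, {tool: version}) for the given BLAST+ programs."""
    versions = {tool: installed_version(tool) for tool in tools}
    return len(set(versions.values())) == 1, versions
```

Calling same_version() on the machine where ProtExcluder runs would show whether blastx and makeblastdb come from the same BLAST+ release.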
From mohamed.amine.chebbi at univ-poitiers.fr Fri Nov 4 04:44:02 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (chebbi mohamed amine)
Date: Fri, 4 Nov 2016 11:44:02 +0100 (CET)
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <1827195032.6913929.1478217820889.JavaMail.zimbra@univ-poitiers.fr>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr> <20161103185405.183337t1yq0no6x9@mail.msu.edu> <1641376945.6912938.1478216418712.JavaMail.zimbra@univ-poitiers.fr> <20161103195409.76212s1yy72mv95t@mail.msu.edu> <1827195032.6913929.1478217820889.JavaMail.zimbra@univ-poitiers.fr>
Message-ID: <838628537.7128959.1478256242111.JavaMail.zimbra@univ-poitiers.fr>

Hi Jiangn!

I made some modifications to the script ProtExcluder1.2/mspesl-sfetch.pl by replacing:

"esl-sfetch --index $ARGV[0] " by "samtools faidx $ARGV[0]"
and
"esl-sfetch -c $from..$to $ARGV[0] $line[7] >> $ARGV[3]" by "samtools faidx $ARGV[0] $line[7]:$from-$to >> $ARGV[3]"

It works fine now and the script can extract the subsequences correctly.

Best regards,
Amine

From: "chebbi mohamed amine"
To: "jiangn"
Sent: Friday, 4 November 2016 01:03:40
Subject: Re: [maker-devel] ProtExcluder1.2 Error

Hi Jiangn,

In fact, this sequence is 19031 bases long.
When I run the command /hmmer-3.1b2-linux-intel-x86_64/binaries/esl-sfetch -c 1242..19031 all-te.lib rnd-4_family-1731#DNA I get the error; however, when testing with coordinates below 19031 it works fine. I think the problem is related to hmmer. I will try to add the subsequence manually to the .fnolowm50seq file.

Thank you
Amine

From: "jiangn"
To: "chebbi mohamed amine"
Cc: "Prashant S Hosmani", "Michael Campbell"
Sent: Friday, 4 November 2016 00:54:09
Subject: Re: [maker-devel] ProtExcluder1.2 Error

Hi Amine,

I don't have this kind of experience. If only one sequence failed, I would suspect there might be some format issue with that specific sequence.

Regards,

Ning

From jiangn at msu.edu Fri Nov 4 14:35:43 2016
From: jiangn at msu.edu (jiangn at msu.edu)
Date: Fri, 04 Nov 2016 16:35:43 -0400
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <838628537.7128959.1478256242111.JavaMail.zimbra@univ-poitiers.fr>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr> <20161103185405.183337t1yq0no6x9@mail.msu.edu> <1641376945.6912938.1478216418712.JavaMail.zimbra@univ-poitiers.fr> <20161103195409.76212s1yy72mv95t@mail.msu.edu> <1827195032.6913929.1478217820889.JavaMail.zimbra@univ-poitiers.fr> <838628537.7128959.1478256242111.JavaMail.zimbra@univ-poitiers.fr>
Message-ID: <20161104163543.98626jb6y81eis67@mail.msu.edu>

Hi Amine,

That's good to know. Thank you!

Ning

From pierre-francois.bert at inra.fr Tue Nov 8 05:13:55 2016
From: pierre-francois.bert at inra.fr (Pierre-Francois Bert)
Date: Tue, 8 Nov 2016 12:13:55 +0000
Subject: [maker-devel] Maker-P
Message-ID: <1478607235425.40152@inra.fr>

Hello,
I'm interested in using maker-p, but I can't find it in the latest version 3, nor can I find v2.29 to download.
Can you please tell me how to proceed?
Best wishes.
Pierre-François Bert

From carsonhh at gmail.com Wed Nov 9 12:00:08 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 9 Nov 2016 12:00:08 -0700
Subject: [maker-devel] Maker-P
In-Reply-To: <1478607235425.40152@inra.fr>
References: <1478607235425.40152@inra.fr>
Message-ID:

MAKER-P's features and accessory scripts were integrated into MAKER with versions 2.29 and above, as stated on the MAKER-P page. There is no longer a separate MAKER-P download and it is not a separate executable. You just download MAKER 2.29 or above and run .../maker/bin/maker

--Carson

> On Nov 8, 2016, at 5:13 AM, Pierre-Francois Bert wrote:
>
> Hello,
> I'm interested in using maker-p, but I can't find it in the latest version 3, nor can I find v2.29 to download.
> Can you please tell me how to proceed?
> Best wishes.
> Pierre-François Bert

From jcornel3 at asu.edu Thu Nov 10 15:43:56 2016
From: jcornel3 at asu.edu (John Cornelius)
Date: Thu, 10 Nov 2016 15:43:56 -0700
Subject: [maker-devel] Error running MAKER
Message-ID:

Hello, I'm using MAKER to annotate a tetraploid genome and, while running it, I encountered the following error:

#--------- command -------------#
Widget::exonerate::est2genome:
/packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/15/TRINITY_GG_19079_c1670_g1_i1.for.84770203-84771247.15.fasta -t /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.15.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.TRINITY_GG_19079_c1670_g1_i1.e.exonerate
#-------------------------------#
running est2genome search.
#--------- command -------------#
Widget::exonerate::est2genome:
/packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/10/TRINITY_GG_87963_c9694_g10_i12.for.49475083-49475985.10.fasta -t /tmp/maker_08Elxf/10/chr6L.49475083-49475985.10.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/10/chr6L.49475083-49475985.TRINITY_GG_87963_c9694_g10_i12.e.exonerate
#-------------------------------#

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 132376 RUNNING AT pnap-pe7-s03
= EXIT CODE: 135
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

The job script I ran was the following:

#PBS -l walltime=240:00:00
#PBS -N MAKER
#PBS -l nodes=1:ppn=16
##PBS -q hmem
#PBS -j oe
#PBS -m abe
#PBS -M jcornelius at tgen.org
#PBS -A tgen-205000
#PBS -o /scratch/jcornelius/xenopus_laevis/maker_run

# --- load required modules --- #

module load maker

# --- run maker --- #

cd /scratch/jcornelius/xenopus_laevis/maker_run
mpiexec -n 16 maker -base XLNEURO.run1 -fix_nucleotides

I'm not sure what could be causing this error and any help would be much appreciated. Thanks.
--
John Cornelius
MCB PhD Candidate
Arizona State University

From carsonhh at gmail.com Fri Nov 11 14:59:54 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 11 Nov 2016 14:59:54 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To:
References:
Message-ID: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com>

The cause of the error is probably further back in the STDERR.
With MPI, so many processes are producing status messages and notes that you can get several seconds of output after a failure. If you kept the whole STDERR, I can help you look through it. Searching for "ERROR" in all caps is usually how you will find it.

Also, MAKER keeps a log of progress, so even after a failure you can just restart it and it will pick up the analysis from the last successful step.

--Carson
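The suggestion to search the saved STDERR for an all-caps ERROR can be done with grep, or with a few lines of Python as in this minimal sketch. The function name find_error_lines and the context parameter are hypothetical and chosen only for illustration; nothing here is part of MAKER. Lines before and after each hit are kept because, as noted in the reply, output from other MPI processes can follow the real failure.

```python
def find_error_lines(lines, context=2):
    """Return a list of (line_number, snippet) for lines containing 'ERROR'.

    The match is case sensitive, so only all-caps ERROR counts.
    line_number is 1-based; snippet is the list of lines from `context`
    lines before the hit to `context` lines after it.
    """
    lines = [line.rstrip("\n") for line in lines]
    hits = []
    for index, line in enumerate(lines):
        if "ERROR" in line:
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            hits.append((index + 1, lines[start:end]))
    return hits
```

To use it, open the job output file and pass the file object to find_error_lines; which file holds the STDERR depends on the scheduler settings (the job script above uses "#PBS -j oe", which is assumed here to merge STDERR into the job's output file).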
From lmeunier at ulg.ac.be Mon Nov 14 01:50:50 2016
From: lmeunier at ulg.ac.be (Loïc)
Date: Mon, 14 Nov 2016 09:50:50 +0100
Subject: [maker-devel] Predictions without evidence
Message-ID:

Hello,

I am a Ph.D. student, and I am using MAKER to automate gene prediction for many genomes as part of a genome mining project, so I don't provide any evidence.

If I understood correctly, when multiple gene predictors are used, AED is used to pick the prediction that best matches the evidence. So, as I don't use evidence, does MAKER still make a choice when working with multiple gene predictors? If yes, how does it work? Also, I have not quite understood whether the selection of the gene predictor is made separately for every gene.

Sorry for asking if the answer is obvious, but after reading your papers and looking through the archived posts, I have not found the answer.

By the way, I also have a question about your paper on MAKER2 (Holt and Yandell, 2011). It is said many times that gene predictors used within the MAKER pipeline give better results than when used alone, but I have not understood why. Can you explain this?

Best regards,

Loïc Meunier

From jacques.dainat at bils.se Mon Nov 14 01:55:06 2016
From: jacques.dainat at bils.se (Jacques Dainat)
Date: Mon, 14 Nov 2016 09:55:06 +0100
Subject: [maker-devel] strand of single exon EST from fasta
Message-ID: <2E91C252-D244-47A2-B896-99EE0F69EBBA@bils.se>

Hello,

I'm annotating several strains of the same fungus, and I have stranded RNAseq for all of them. I'm using MAKER3.
Let's say I'm annotating species1 using its species-specific assembled transcripts, which are in gff format. I know that MAKER cannot do anything about the strand coming from the est_gff. To check that everything went fine during my transcriptome assembly and that the strands were correctly defined, I checked the annotation in a browser.
I can see that the strands of my transcripts in GFF format were perfect (they match the protein strands and the ab initio prediction strands, and the ORFs are OK).

As I wanted to take advantage of the RNAseq from my other strains, I decided to use it in this annotation. As the transcriptome assemblies of these RNAseq data were built on their corresponding genomes, I cannot use the GFF files. Indeed, the locations do not correspond to the genome of my species1. So I decided to extract the sequences in fasta format and feed them to MAKER (alt_est parameter).
When I visualised those transcript alignments I was really surprised by the strands chosen by MAKER. They seem completely random, even though all the EST fasta sequences from a given locus are provided in the same strand.

So, I have two questions:
1) How is the strand decided for a single-exon EST provided in fasta format? (I thought it was based on the longest ORF.)
2) Is it normal that the second annotation using these alt_est is worse (far fewer gene models) than the previous one? (I thought the strand of my single-exon alt_ests would not play a role during the annotation process. Or maybe it's another bias from these alt_est => loci less well defined?)

Here are 3 examples: the top green track has the correct strand and is based on the GFF file. The bottom green cluster tracks are fasta sequences from the other strains aligned through MAKER. (I don't know if it could play a role, but all sequences from a given locus were sent to MAKER in the same strand.)

Thank you very much for your help,

Jacques Dainat

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2016-11-13 at 13.05.24.png
Type: image/png
Size: 52019 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2016-11-13 at 13.05.44.png
Type: image/png
Size: 26966 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2016-11-13 at 13.07.13.png
Type: image/png
Size: 24338 bytes
Desc: not available
URL:

From carsonhh at gmail.com Mon Nov 14 13:08:13 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 14 Nov 2016 13:08:13 -0700
Subject: [maker-devel] strand of single exon EST from fasta
In-Reply-To: <2E91C252-D244-47A2-B896-99EE0F69EBBA@bils.se>
References: <2E91C252-D244-47A2-B896-99EE0F69EBBA@bils.se>
Message-ID:

The strand of single-exon ESTs and alt-ESTs is based on the longest ORF. In the event of a tie, whatever strand was assigned by the aligner is kept.

alt-ESTs are less likely to align or produce a model than ESTs. If you have competing models on opposite strands for the same CDS, then support from ab initio predictions, spliced ESTs, or exonerate protein alignments will be needed for the model.

--Carson

> On Nov 14, 2016, at 1:55 AM, Jacques Dainat wrote:
>
> Hello,
>
> I'm annotating several strains of a same fungus, and I have stranded RNAseq for all of them. I'm using MAKER3.
> Let's say I'm annotating the species1 using its species-specific assembled transcripts that are in gff. I know that MAKER cannot do anything about the strand coming from the est_gff. In order to check that everything went fine during my transcriptome assembly and the strands correctly defined, I checked the annotation within a browser. I can see the strands from my transcripts in gff format were perfect (match with the proteins strands / and with abinitio prediction strands / and ORFs are OK).
>
> As I wanted to take advantage on my other strains RNAseq I decided to use them within this annotation. As the transcriptome assemblies of these RNAseq have been done based on their corresponding genomes, I cannot use the gff files.
Indeed, the location are not corresponding to the genome of my species1. So I decided to extract the sequences in fasta format to feed MAKER with (alt_est parameter). > When I visualise those transcript alignements I was really surprised by the strands decided by MAKER. It seems completely random, while all the est fasta sequences from a same locus are given in the same strand. > > So, I have two questions: > 1) How the strand is decided for single exon EST provided in fasta format ? (I thought it was based on the longest ORF) > 2) Is it normal that the second annotation using these alt_est is worse (far less gene models) than the previous one ? (I thought the strand of my single exon alt_ests would not play a role during the the annotation process. Or maybe it?s another biais from these alt_est => loci less well defined ?) > > > > Here 3 examples: The top green track has the correct strand and is based on the gff file. The bottom green cluster tracks are fasta sequences from the other strains aligned through MAKER. (I dont?t know if it could play a role but all sequences from a same locus have been sent to MAKER in the same strand). > > > Thank you very much for your help, > > Jacques Dainat > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Nov 14 13:18:26 2016 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 14 Nov 2016 13:18:26 -0700 Subject: [maker-devel] Predictions without evidence In-Reply-To: References: Message-ID: <7BDEAAF4-230C-4315-B353-43381237BCB0@gmail.com> Gene predictors have to be trained on each organism to generate a matched HMM. If they are not trained, they will not work well. 
MAKER also sends hints to the predictor based on the evidence alignments, which further alter the probabilities used by the predictor so that it better matches the evidence. Evidence is also used in final filtering. All models without evidence will have an AED of 1, which means no support. Not using evidence will result in very poor models, especially if you don't have an HMM built specifically for the organism. The main problem will be overprediction. Note the behavior of SNAP alone in the MAKER2 paper: the result is tens of thousands of false positive gene models.

If you only run multiple gene predictors without evidence, the final model will be whichever model has the best consensus structure for the set. If the set consists of two models, then there is no consensus and the longest one is kept.

--Carson

> On Nov 14, 2016, at 1:50 AM, Loïc wrote:
>
> Hello,
>
> I am a Ph. D. student, and I am using MAKER to automate gene prediction for many genomes as part of a genome mining work, so I don't include evidence for its use.
> If I understood well, when exploiting multiple gene predictor softwares, AED is used to define the prediction which matches the best the evidence.
>
> So, as I don't use evidence, is there a choice made by MAKER when working with multiple gene predictors? If yes, how does it work?
> Also, I have not well understood, if the selection of the gene predictor to use is made for every gene?
>
> Sorry to asking if the answer is obvious, but after reading your papers and looking on the archived posts, I have not found the answer.
>
> By the way, I have also a question about your paper on MAKER2 (Holt and Yandell, 2011). It is said many times that gene predictors used in MAKER pipeline give better results than when used alone, but I have not understand why. Can you explain this fact?
>
> Best regards,
>
> Loïc Meunier

From carsonhh at gmail.com Thu Nov 17 14:05:53 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 17 Nov 2016 14:05:53 -0700
Subject: [maker-devel] About split genes problem in Maker annotations
In-Reply-To: <75508AB460A77C4798EC49425637E292194A0DA6@PETREL-MA.imcb.a-star.edu.sg>
References: <75508AB460A77C4798EC49425637E292194A0DA6@PETREL-MA.imcb.a-star.edu.sg>
Message-ID: <36BBB195-EEB4-4B3A-9463-3E4171731390@gmail.com>

est2genome and protein2genome should only be used for initial training. They are not predictors; rather, they take an EST/protein alignment, find the longest ORF, and then turn the ORF directly into a gene model. This is good enough to build a training dataset, but the models will almost always be partial and fragmented. Also, because the alignments both produce and support the models, they always score well, so their AED values are meaningless.

Once you have a predictor trained, you should turn est2genome and protein2genome off. With a trained predictor, the alignments will then serve as hints to Augustus as to where introns/exons are likely to be, and this will give the desired behavior. Note that Augustus will attempt to build the most probable model given the hints and the assembly sequence. If there are any assembly issues affecting the ORF, the predictor will often skip exons or split the model in the locus.

Also make sure you have built a species-specific repeat library to add to the default repeat libraries used by MAKER (you can use tools like RepeatModeler to do this). Otherwise you will get spurious alignments of much of your evidence, and Augustus will generate false positive results. You may also want to add a large dataset like UniProt/Swiss-Prot to the protein evidence.
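As a rough sketch, the settings described above might look like this in maker_opts.ctl for the rounds that follow initial training. The species name and all file paths are placeholders, and the parameter names (augustus_species, est2genome, protein2genome, rmlib, protein) are assumed to match the control files generated by your MAKER version, so check them against your own maker_opts.ctl.

```
#-----Gene Prediction (rounds after the initial training run)
augustus_species=my_fish #placeholder: name of the Augustus species you trained
est2genome=0 #off once a predictor has been trained
protein2genome=0 #off once a predictor has been trained

#-----Repeat Masking
rmlib=/path/to/species_repeats.fa #placeholder: library built with e.g. RepeatModeler

#-----Protein Homology Evidence
#placeholder paths; a comma-separated list of files is assumed to be accepted here
protein=/path/to/related_species.fa,/path/to/uniprot_sprot.fasta
```

The EST and protein alignments stay in the run as evidence; only the direct alignment-to-model conversion is switched off.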
The best way to evaluate annotations and performance is to visually review the annotation in tools like Apollo. This will allow you to see whether evidence, gene predictions, and final models achieve consensus, whether alignments don't match (spurious alignment generally suggests a repeat masking issue or an evidence quality issue), or whether raw ab initio predictions don't match (which indicates insufficient training or an underlying assembly issue).

--Carson

> On Nov 16, 2016, at 8:01 PM, Prashant Narendra SHINGATE wrote:
>
> Hi Carson,
>
> We are annotating the genome of a fish with a relatively small genome (~450Mb) using Maker and encountering many genes that are split and predicted as multiple genes. We are using Augustus for de novo prediction. Fortunately we have full-length RNAseq for about 4000 genes (and total ~50k transcripts) from the same species, and whole-genome protein sequences from a very closely related species.
>
> First we trained Augustus using ~4000 full length RNAseq transcript from the same species. This trained Augustus model was used in the Maker annotation pipeline along with ~50k RNAseq transcripts (>1000bp) and whole-genome proteins sequences from a closely related species.
>
> We first tried annotating using the options est2genome=1, protein2genome=1 and Augustus ON. We found several genes were split and the program seemed to give weight to Augustus prediction in spite of having full-length RNAseq and protein sequences aligned to the gene predicted loci (visualized using Jbrowser).
>
> In the next trial we used est2genome=1, protein2genome=1 and Augustus OFF in the first step. In the second step we did reiteration by est2genome=0, protein2genome=0 and Augustus ON. Still the output contained split genes.
>
> In the third trial we used est2genome=1, protein2genome=1 and Augustus OFF and checked the output. In this output full-length genes were predicted whenever full-length RNAseq and/or protein sequences were available.
> This seems to suggest that when we use Augustus, more weight is given to Augustus de novo prediction and the synthesis of evidence from RNAseq and protein sequences is not happening.
>
> Can you please let us know why we are getting split genes in spite of having full-length RNAseq and/or protein sequences? What changes would you suggest to the protocol to overcome this problem?
>
> We thank you very much for your help and time.
>
> Regards,
> Prashant Shingate, PhD :: Research Fellow :: Comparative and Medical Genomics Lab :: Institute of Molecular and Cell Biology (IMCB) :: Agency for Science, Technology and Research (A*STAR)
> 61 Biopolis Drive :: #05-04 Proteos :: Singapore 138673 :: DID (+65) 6586 9570 :: Fax (+65) 6779 1117:: http://www.imcb.a-star.edu.sg/
> We advance science and develop innovative technology to further economic growth and improve lives.
>
> Note: This message may contain confidential information. If this Email/Fax has been sent to you by mistake, please notify the sender and delete it immediately. Thank you.

From carsonhh at gmail.com Thu Nov 17 21:04:31 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 17 Nov 2016 21:04:31 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To:
References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com>
Message-ID: <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com>

To use less RAM, try lowering max_dna_len=, setting the blast_depth= parameters to 20 or 30 in maker_bopts.ctl (default is limitless), or, when using MPI, starting fewer processes per node (this requires manipulating the hostfile or using the round-robin distribution flag for MPI flavors where it is available).

The memory issue could be causing the lock failure as well.
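When the same change has to be applied across many run directories, the control-file edits above can be scripted. The following is only a minimal sketch: the helper name set_ctl_values is hypothetical, the option names depth_blastn, depth_blastx and depth_tblastx are assumed to be what maker_bopts.ctl calls the blast depth parameters in your MAKER version, and the values 50000 and 30 are illustrative rather than recommended.

```python
import re


def set_ctl_values(ctl_text, updates):
    """Return control-file text with the given key=value lines updated.

    Any trailing '#comment' on an updated line is preserved.
    """
    pending = {key: str(value) for key, value in updates.items()}
    out_lines = []
    for line in ctl_text.splitlines():
        # Matches 'key=value #optional comment'; pure comment lines do not match.
        match = re.match(r"^(\w+)=([^#]*?)(\s*#.*)?$", line)
        if match and match.group(1) in pending:
            key = match.group(1)
            comment = match.group(3) or ""
            line = f"{key}={pending.pop(key)}{comment}"
        out_lines.append(line)
    if pending:
        # Refuse to silently skip an option that is missing from the file.
        raise KeyError(f"options not found: {sorted(pending)}")
    return "\n".join(out_lines) + "\n"


# Illustrative excerpts only; real control files contain many more lines.
opts_ctl = "max_dna_len=100000 #length for dividing up contigs into chunks\n"
bopts_ctl = (
    "depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff)\n"
    "depth_blastx=0 #Blastx depth cutoff (0 to disable cutoff)\n"
    "depth_tblastx=0 #tBlastx depth cutoff (0 to disable cutoff)\n"
)

new_opts = set_ctl_values(opts_ctl, {"max_dna_len": 50000})
new_bopts = set_ctl_values(
    bopts_ctl, {"depth_blastn": 30, "depth_blastx": 30, "depth_tblastx": 30}
)
print(new_opts, end="")
print(new_bopts, end="")
```

In practice the text would be read from and written back to maker_opts.ctl and maker_bopts.ctl in the run directory. The helper raises an error rather than appending unknown keys, so a misspelled option name is caught before MAKER is launched.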
?Carson > On Nov 17, 2016, at 7:53 PM, John Cornelius wrote: > > Ok, so I went and searched one of the output logs for all the lines that say ERROR and I got 44 lines with the following message: > > ERROR: Lock broken in runlog > > With these lines found at the end: > > ERROR: Failed while polishig ESTs > ERROR: Chunk failed at level:2, tier_type:3 > ERROR: Could not query process table: Cannot allocate memory at /packages/maker/2.31.8/bin/../lib/Proc/ProcessTable_simple.pm line 62. > > From that last line it looks like the process is running out of RAM would that be right? Thanks. > > On Fri, Nov 11, 2016 at 2:59 PM, Carson Holt > wrote: > The cause of the error is probably further back in the STDERR. With MPI so many processes are producing status and notes, that you can get several seconds of output after ta failure. If you kept the whole STDERR, I can help you look through it. searching for ?ERROR? all caps is usually where you will see it. Also MAKER keeps a log of progress, so even on failure, you can just restart it and it will pick up the analysis from the last successful step. > > ?Carson > > >> On Nov 10, 2016, at 3:43 PM, John Cornelius > wrote: >> >> Hello, I'm using MAKER to annotate a tetraploid genome and while running it, I encountered the following error: >> >> #--------- command -------------# >> Widget::exonerate::est2genome: >> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/15/TRINITY_GG_19079_c1670_g1_i1.for.84770203-84771247.15.fasta -t /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.15.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.TRINITY_GG_19079_c1670_g1_i1.e.exonerate >> #-------------------------------# >> running est2genome search. 
>> #--------- command -------------# >> Widget::exonerate::est2genome: >> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/10/TRINITY_GG_87963_c9694_g10_i12.for.49475083-49475985.10.fasta -t /tmp/maker_08Elxf/10/chr6L.49475083-49475985.10.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/10/chr6L.49475083-49475985.TRINITY_GG_87963_c9694_g10_i12.e.exonerate >> #-------------------------------# >> >> =================================================================================== >> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >> = PID 132376 RUNNING AT pnap-pe7-s03 >> = EXIT CODE: 135 >> = CLEANING UP REMAINING PROCESSES >> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >> =================================================================================== >> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7) >> This typically refers to a problem with your application. >> Please see the FAQ page for debugging suggestions >> >> The the command I ran was the following: >> >> #PBS -l walltime=240:00:00 >> #PBS -N MAKER >> #PBS -l nodes=1:ppn=16 >> ##PBS -q hmem >> #PBS -j oe >> #PBS -m abe >> #PBS -M jcornelius at tgen.org >> #PBS -A tgen-205000 >> #PBS -o /scratch/jcornelius/xenopus_laevis/maker_run >> >> # --- load required modules --- # >> >> module load maker >> >> # --- run maker --- # >> >> cd /scratch/jcornelius/xenopus_laevis/maker_run >> mpiexec -n 16 maker -base XLNEURO.run1 -fix_nucleotides >> >> I'm not sure what could be causing this error and any help would be much appreciated. Thanks. 
>> -- >> John Cornelius >> MCB PhD Candidate >> Arizona State University >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > -- > John Cornelius > MCB PhD Candidate > Arizona State University -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcornel3 at asu.edu Fri Nov 18 12:14:52 2016 From: jcornel3 at asu.edu (John Cornelius) Date: Fri, 18 Nov 2016 12:14:52 -0700 Subject: [maker-devel] Error running MAKER In-Reply-To: <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com> References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com> <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com> Message-ID: Would the lock failure cause problems with the annotation? It looks like Maker is still progressing, just not as quickly as I thought it would be. On Thu, Nov 17, 2016 at 9:04 PM, Carson Holt wrote: > To use less RAM, try lowering max_dna_len=, setting blast_depth= > parameters to 20 pr 30 in maker_bopts.ctl (default is limitless), or when > using MPI, starting fewer processes per node (requires manipulation of > hostfile or using round robin distribution flag for MPI flavors where it is > available). > > The memory issue could be causing the lock failure as well. > > ?Carson > > > > On Nov 17, 2016, at 7:53 PM, John Cornelius wrote: > > Ok, so I went and searched one of the output logs for all the lines that > say ERROR and I got 44 lines with the following message: > > ERROR: Lock broken in runlog > > With these lines found at the end: > > ERROR: Failed while polishig ESTs > ERROR: Chunk failed at level:2, tier_type:3 > ERROR: Could not query process table: Cannot allocate memory at > /packages/maker/2.31.8/bin/../lib/Proc/ProcessTable_simple.pm line 62. > > From that last line it looks like the process is running out of RAM would > that be right? Thanks. 
> > On Fri, Nov 11, 2016 at 2:59 PM, Carson Holt wrote: > >> The cause of the error is probably further back in the STDERR. With MPI >> so many processes are producing status and notes, that you can get several >> seconds of output after ta failure. If you kept the whole STDERR, I can >> help you look through it. searching for ?ERROR? all caps is usually where >> you will see it. Also MAKER keeps a log of progress, so even on failure, >> you can just restart it and it will pick up the analysis from the last >> successful step. >> >> ?Carson >> >> >> On Nov 10, 2016, at 3:43 PM, John Cornelius wrote: >> >> Hello, I'm using MAKER to annotate a tetraploid genome and while running >> it, I encountered the following error: >> >> #--------- command -------------# >> Widget::exonerate::est2genome: >> /packages/exonerate-2.2.0/bin/exonerate -q >> /tmp/maker_08Elxf/15/TRINITY_GG_19079_c1670_g1_i1.for.84770203-84771247.15.fasta >> -t /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.15.fasta -Q dna -T >> dna --model est2genome --minintron 20 --maxintron 10000 --showcigar >> --percent 20 > /tmp/maker_08Elxf/15/chr9_10L. >> 84770203-84771247.TRINITY_GG_19079_c1670_g1_i1.e.exonerate >> #-------------------------------# >> running est2genome search. 
>> #--------- command -------------# >> Widget::exonerate::est2genome: >> /packages/exonerate-2.2.0/bin/exonerate -q >> /tmp/maker_08Elxf/10/TRINITY_GG_87963_c9694_g10_i12.for.49475083-49475985.10.fasta >> -t /tmp/maker_08Elxf/10/chr6L.49475083-49475985.10.fasta -Q dna -T dna >> --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent >> 20 > /tmp/maker_08Elxf/10/chr6L.49475083-49475985.TRINITY_GG_8796 >> 3_c9694_g10_i12.e.exonerate >> #-------------------------------# >> >> ============================================================ >> ======================= >> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >> = PID 132376 RUNNING AT pnap-pe7-s03 >> = EXIT CODE: 135 >> = CLEANING UP REMAINING PROCESSES >> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >> ============================================================ >> ======================= >> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7) >> This typically refers to a problem with your application. >> Please see the FAQ page for debugging suggestions >> >> The the command I ran was the following: >> >> #PBS -l walltime=240:00:00 >> #PBS -N MAKER >> #PBS -l nodes=1:ppn=16 >> ##PBS -q hmem >> #PBS -j oe >> #PBS -m abe >> #PBS -M jcornelius at tgen.org >> #PBS -A tgen-205000 >> #PBS -o /scratch/jcornelius/xenopus_laevis/maker_run >> >> # --- load required modules --- # >> >> module load maker >> >> # --- run maker --- # >> >> cd /scratch/jcornelius/xenopus_laevis/maker_run >> mpiexec -n 16 maker -base XLNEURO.run1 -fix_nucleotides >> >> I'm not sure what could be causing this error and any help would be much >> appreciated. Thanks. 
>> --
>> John Cornelius
>> MCB PhD Candidate
>> Arizona State University
>
> --
> John Cornelius
> MCB PhD Candidate
> Arizona State University

--
John Cornelius
MCB PhD Candidate
Arizona State University

From mohamed.amine.chebbi at univ-poitiers.fr Thu Nov 24 14:45:01 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (Mohamed Amine Chebbi)
Date: Thu, 24 Nov 2016 22:45:01 +0100 (CET)
Subject: [maker-devel] map_fasta_ids : No mapping available...
Message-ID: <773569486.15711466.1480023901276.JavaMail.zimbra@univ-poitiers.fr>

Hello!

I am attempting to rename the genes in maker.proteins.fasta for GenBank submission using the map_fasta_ids script. It seems to work correctly for the majority of gene models, except for those that produce the warning message below:

WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.3-mRNA-1
WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.0-mRNA-1
WARNING: No mapping available for maker-scaffold_1710-snap-gene-0.6-mRNA-1
WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.4-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.1-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.2-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.0-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.5-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.6-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.15-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.16-mRNA-1

Looking into the maker.gff file, these gene names are missing and may be replaced by
other ones which differ by the numbers following the gene predictor.

I wonder if you could explain the reason for these warning messages and how to resolve them.

Thank you,

Best,
Amine

From carsonhh at gmail.com Thu Nov 24 19:04:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 24 Nov 2016 19:04:59 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To:
References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com> <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com>
Message-ID: <3C668404-EA3C-46B4-9676-8F95E2AFB64F@gmail.com>

A lock failure can become an issue if two separate jobs are running simultaneously. They may both try to process the same contig at the same time (modifying each other's files), which will cause one or both to fail. On failure, MAKER should always retry at some later point, so it can usually recover from this. If you see any partial lines in the resulting GFF3, then it did not recover and you just need to rerun whatever contig this happened on.

--Carson

> On Nov 18, 2016, at 12:14 PM, John Cornelius wrote:
>
> Would the lock failure cause problems with the annotation? It looks like Maker is still progressing, just not as quickly as I thought it would be.
>
> On Thu, Nov 17, 2016 at 9:04 PM, Carson Holt wrote:
> To use less RAM, try lowering max_dna_len=, setting blast_depth= parameters to 20 pr 30 in maker_bopts.ctl (default is limitless), or when using MPI, starting fewer processes per node (requires manipulation of hostfile or using round robin distribution flag for MPI flavors where it is available).
>
> The memory issue could be causing the lock failure as well.
> > ?Carson > > > >> On Nov 17, 2016, at 7:53 PM, John Cornelius > wrote: >> >> Ok, so I went and searched one of the output logs for all the lines that say ERROR and I got 44 lines with the following message: >> >> ERROR: Lock broken in runlog >> >> With these lines found at the end: >> >> ERROR: Failed while polishig ESTs >> ERROR: Chunk failed at level:2, tier_type:3 >> ERROR: Could not query process table: Cannot allocate memory at /packages/maker/2.31.8/bin/../lib/Proc/ProcessTable_simple.pm line 62. >> >> From that last line it looks like the process is running out of RAM would that be right? Thanks. >> >> On Fri, Nov 11, 2016 at 2:59 PM, Carson Holt > wrote: >> The cause of the error is probably further back in the STDERR. With MPI so many processes are producing status and notes, that you can get several seconds of output after ta failure. If you kept the whole STDERR, I can help you look through it. searching for ?ERROR? all caps is usually where you will see it. Also MAKER keeps a log of progress, so even on failure, you can just restart it and it will pick up the analysis from the last successful step. >> >> ?Carson >> >> >>> On Nov 10, 2016, at 3:43 PM, John Cornelius > wrote: >>> >>> Hello, I'm using MAKER to annotate a tetraploid genome and while running it, I encountered the following error: >>> >>> #--------- command -------------# >>> Widget::exonerate::est2genome: >>> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/15/TRINITY_GG_19079_c1670_g1_i1.for.84770203-84771247.15.fasta -t /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.15.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.TRINITY_GG_19079_c1670_g1_i1.e.exonerate >>> #-------------------------------# >>> running est2genome search. 
>>> #--------- command -------------# >>> Widget::exonerate::est2genome: >>> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/10/TRINITY_GG_87963_c9694_g10_i12.for.49475083-49475985.10.fasta -t /tmp/maker_08Elxf/10/chr6L.49475083-49475985.10.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/10/chr6L.49475083-49475985.TRINITY_GG_87963_c9694_g10_i12.e.exonerate >>> #-------------------------------# >>> >>> =================================================================================== >>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >>> = PID 132376 RUNNING AT pnap-pe7-s03 >>> = EXIT CODE: 135 >>> = CLEANING UP REMAINING PROCESSES >>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >>> =================================================================================== >>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7) >>> This typically refers to a problem with your application. >>> Please see the FAQ page for debugging suggestions >>> >>> The the command I ran was the following: >>> >>> #PBS -l walltime=240:00:00 >>> #PBS -N MAKER >>> #PBS -l nodes=1:ppn=16 >>> ##PBS -q hmem >>> #PBS -j oe >>> #PBS -m abe >>> #PBS -M jcornelius at tgen.org >>> #PBS -A tgen-205000 >>> #PBS -o /scratch/jcornelius/xenopus_laevis/maker_run >>> >>> # --- load required modules --- # >>> >>> module load maker >>> >>> # --- run maker --- # >>> >>> cd /scratch/jcornelius/xenopus_laevis/maker_run >>> mpiexec -n 16 maker -base XLNEURO.run1 -fix_nucleotides >>> >>> I'm not sure what could be causing this error and any help would be much appreciated. Thanks. 
>>> --
>>> John Cornelius
>>> MCB PhD Candidate
>>> Arizona State University
>>
>> --
>> John Cornelius
>> MCB PhD Candidate
>> Arizona State University
>
> --
> John Cornelius
> MCB PhD Candidate
> Arizona State University

From carsonhh at gmail.com Mon Nov 28 09:26:40 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 28 Nov 2016 09:26:40 -0700
Subject: [maker-devel] map_fasta_ids : No mapping available...
In-Reply-To: <773569486.15711466.1480023901276.JavaMail.zimbra@univ-poitiers.fr>
References: <773569486.15711466.1480023901276.JavaMail.zimbra@univ-poitiers.fr>
Message-ID: <401400E0-7581-4407-A30E-A787485B0E86@gmail.com>

The map file you run with has two columns (old_id and new_id). If the input file has IDs that do not match anything in the old_id column, then the script throws the warning. It means there is a mismatch between the map file being used and the fasta file. This can occur if you did downstream manipulation of the fasta file, are using the wrong fasta file, or used GFF3 as input to a MAKER step that has generated an ID mismatch.

--Carson

> On Nov 24, 2016, at 2:45 PM, Mohamed Amine Chebbi wrote:
>
> Hello !
>
> I'am attempting to rename genes of maker.proteins.fasta for Genebank submission using the map_fasta_ids script.
It seems to work correctly for the major of gene models, except to those ones having the below warning message : > > WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.3-mRNA-1 > WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.0-mRNA-1 > WARNING: No mapping available for maker-scaffold_1710-snap-gene-0.6-mRNA-1 > WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.4-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.1-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.2-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.0-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.5-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.6-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.15-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.16-mRNA-1 > > Looking into the maker.gff file, these gene names are missing and may be replaced by other ones which differ by the numbers following the gene predictor. > > I wounder if you can explain me the reason of these warning message and how to resolve it. > > Thank you , > > Best, > Amine > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From parulk at caltech.edu Tue Nov 29 10:13:06 2016 From: parulk at caltech.edu (Kudtarkar, Parul V.) Date: Tue, 29 Nov 2016 17:13:06 +0000 Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file Message-ID: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu> Dear Maker developers, 1. 
We use assembled RNAseq(from same species) and protein evidence(from evolutionary close species) to generate training gene structure(1st iteration, est2genome=1,protein2genome=1 ). 2. This is than used to train abinito gene predictors, SNAP and AUGUSTUS. 3. GeneMarkES( version: GeneMark-ES / ET v.4.32) is used to produce training data-set with the command gmes_petap.pl --sequence pmin_jelly.fa 4. We would be predicting genes using results from SNAP, Genemark and AUGUSTUS(2nd iteration, est2genome=0, protein2genome=0) I have couple of questions relating to Genemark and AUGUSTUS 1. AUGUSTUS We do not have a species file for species file of our interest or evolutionary closer species following command is used to generate species file /autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting AUGUSTUS is taking too long to compute species file, is there a solution for this issue. Using species file from other organism might generate false positives. Is it advised in such situations to not used AUGUSTUS model? 2. Genemark I used the gmhmm file generated in the genemark output directory, however I encounter following error ------------------------- STATUS: Parsing control files... ERROR: You have failed to provide a value for 'gmhmme3' in the control files. ERROR: You have failed to provide a value for 'probuild' in the control files. --------------------- FYI ----- maker_opts.ctl #-----Gene Prediction snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file ----- Using SNAP for training gene model yields over 6000-7000 additional gene. The model has good cumulative AED value. I was hoping in addition to SNAP, if I could use AUGUSTUS and GeneMark to train the gene model to fuse dispersed models so that the gene count is within the expected range. 
Thanks and regards, Parul Sent from my iPhone -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Tue Nov 29 10:28:33 2016 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 29 Nov 2016 17:28:33 +0000 Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file In-Reply-To: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu> References: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu> Message-ID: <359BAE14-18C2-4B91-A628-9613F94C8468@genetics.utah.edu> HI Parul, Training augustus does take a long time. Much longer than for the other two predictors that you mentioned. Have you tried using the webAugustus web portal? The team that made augustus run it and can probably help you with trouble-shooting their page for creating training sets: http://bioinf.uni-greifswald.de/webaugustus/training/create The error that you got regarding genemark is saying that maker can?t find the genemark and probuild executable files. These are specified in the maker_exe.ctl file, not the ?opts? file. You need to put valid paths to those executable files in for the given parameters. This is something that is usually specified during installation of MAKER. Hope that helps, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On Nov 29, 2016, at 10:13 AM, Kudtarkar, Parul V. > wrote: Dear Maker developers, 1. We use assembled RNAseq(from same species) and protein evidence(from evolutionary close species) to generate training gene structure(1st iteration, est2genome=1,protein2genome=1 ). 2. This is than used to train abinito gene predictors, SNAP and AUGUSTUS. 3. GeneMarkES( version: GeneMark-ES / ET v.4.32) is used to produce training data-set with the command gmes_petap.pl --sequence pmin_jelly.fa 4. 
We would be predicting genes using results from SNAP, Genemark and AUGUSTUS(2nd iteration, est2genome=0, protein2genome=0) I have couple of questions relating to Genemark and AUGUSTUS 1. AUGUSTUS We do not have a species file for species file of our interest or evolutionary closer species following command is used to generate species file /autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting AUGUSTUS is taking too long to compute species file, is there a solution for this issue. Using species file from other organism might generate false positives. Is it advised in such situations to not used AUGUSTUS model? 2. Genemark I used the gmhmm file generated in the genemark output directory, however I encounter following error ------------------------- STATUS: Parsing control files... ERROR: You have failed to provide a value for 'gmhmme3' in the control files. ERROR: You have failed to provide a value for 'probuild' in the control files. --------------------- FYI ----- maker_opts.ctl #-----Gene Prediction snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file ----- Using SNAP for training gene model yields over 6000-7000 additional gene. The model has good cumulative AED value. I was hoping in addition to SNAP, if I could use AUGUSTUS and GeneMark to train the gene model to fuse dispersed models so that the gene count is within the expected range. Thanks and regards, Parul Sent from my iPhone _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
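The GeneMark fix described in this thread amounts to two entries in maker_exe.ctl. A minimal sketch, assuming GeneMark-ES is unpacked under a hypothetical /opt/genemark directory (substitute the real install location; only the key names gmhmme3 and probuild come from the error message quoted above):

```
#-----maker_exe.ctl (fragment; the paths below are hypothetical placeholders)
gmhmme3=/opt/genemark/gmhmme3 #GeneMark executable named in the first ERROR line
probuild=/opt/genemark/probuild #probuild executable named in the second ERROR line
```

The gmhmm= line in maker_opts.ctl stays as it is, since it points at the trained model file rather than at an executable.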
URL: From carsonhh at gmail.com Tue Nov 29 10:34:31 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 29 Nov 2016 10:34:31 -0700 Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file In-Reply-To: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu> References: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu> Message-ID: <596EAC73-4DB5-4144-A8EA-0E955AA0E028@gmail.com> How to train Augustus ?> http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html Step 2 shows how to create an empty species to start training with. Then Step 4 (optimize_augustus.pl) is the step that takes a while. Then for GeneMark, you must set the location of the necessary GeneMark executables in the maker_exe.ctl file. After getting all predictors trained, and running a few contigs, take a moment to review the predictor performance by manually reviewing them in something like Apollo. It is not uncommon that one or more perform poorly on an organism (they should each produce similar predictions). If one is significantly off relative to the other predictors and the evidence, it should be dropped. A bad behaving predictor will reduce the overall annotation performance. ?Carson > On Nov 29, 2016, at 10:13 AM, Kudtarkar, Parul V. wrote: > > >> Dear Maker developers, >> >> 1. We use assembled RNAseq(from same species) and protein evidence(from evolutionary close species) to generate training gene structure(1st iteration, est2genome=1,protein2genome=1 ). >> >> 2. This is than used to train abinito gene predictors, SNAP and AUGUSTUS. >> >> 3. GeneMarkES( version: GeneMark-ES / ET v.4.32) is used to produce training data-set with the command >> >> gmes_petap.pl --sequence pmin_jelly.fa >> >> 4. We would be predicting genes using results from SNAP, Genemark and AUGUSTUS(2nd iteration, est2genome=0, protein2genome=0) >> >> I have couple of questions relating to Genemark and AUGUSTUS >> >> 1. 
AUGUSTUS >> >> We do not have a species file for species file of our interest or evolutionary closer species >> >> following command is used to generate species file >> >> >> /autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting >> AUGUSTUS is taking too long to compute species file, is there a solution for this issue. Using species file from other organism might generate false positives. Is it advised in such situations to not used AUGUSTUS model? >> >> 2. Genemark >> >> I used the gmhmm file generated in the genemark output directory, however I encounter following error >> >> >> ------------------------- >> >> STATUS: Parsing control files... >> ERROR: You have failed to provide a value for 'gmhmme3' in the control files. >> ERROR: You have failed to provide a value for 'probuild' in the control files. >> --------------------- >> FYI >> >> ----- >> >> maker_opts.ctl >> >> >> #-----Gene Prediction >> snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file >> gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file >> >> ----- >> >> Using SNAP for training gene model yields over 6000-7000 additional gene. The model has good cumulative AED value. >> >> I was hoping in addition to SNAP, if I could use AUGUSTUS and GeneMark to train the gene model to fuse dispersed models so that the gene count is within the expected range. >> >> >> Thanks and regards, >> >> Parul >> > > Sent from my iPhone -------------- next part -------------- An HTML attachment was scrubbed... URL: From parulk at caltech.edu Tue Nov 29 16:40:30 2016 From: parulk at caltech.edu (Kudtarkar, Parul V.) 
Date: Tue, 29 Nov 2016 23:40:30 +0000 Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file In-Reply-To: <596EAC73-4DB5-4144-A8EA-0E955AA0E028@gmail.com> References: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu>, <596EAC73-4DB5-4144-A8EA-0E955AA0E028@gmail.com> Message-ID: Dear Carson and Daniel, Thanks for getting back to me promptly. Adding the path to genemark executable in maker_exe.ctl fixes the error. Hopefully optimize_augustus.pl runs quicker compared to autoAug.pl (which has been running for almost a week now) It would be interesting and we look forward to evaluate which model optimizes our expected gene count, AED values and has recognizable domains. PS. We think BUSCO has helped us to evaluate gene model completeness. Thanks, Parul ---- Parul Kudtarkar Bioinformatician Biology and Biological Engineering Office: 278 Beckman Institute California Institute of Technology MC 139-74 Pasadena CA 91125 http://www.echinobase.org ________________________________ From: Carson Holt Sent: Tuesday, November 29, 2016 9:34:31 AM To: Kudtarkar, Parul V. Cc: maker-devel at yandell-lab.org Subject: Re: error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file How to train Augustus -> http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html Step 2 shows how to create an empty species to start training with. Then Step 4 (optimize_augustus.pl) is the step that takes a while. Then for GeneMark, you must set the location of the necessary GeneMark executables in the maker_exe.ctl file. After getting all predictors trained, and running a few contigs, take a moment to review the predictor performance by manually reviewing them in something like Apollo. It is not uncommon that one or more perform poorly on an organism (they should each produce similar predictions). 
If one is significantly off relative to the other predictors and the evidence, it should be dropped. A bad behaving predictor will reduce the overall annotation performance. -Carson On Nov 29, 2016, at 10:13 AM, Kudtarkar, Parul V. > wrote: Dear Maker developers, 1. We use assembled RNAseq(from same species) and protein evidence(from evolutionary close species) to generate training gene structure(1st iteration, est2genome=1,protein2genome=1 ). 2. This is than used to train abinito gene predictors, SNAP and AUGUSTUS. 3. GeneMarkES( version: GeneMark-ES / ET v.4.32) is used to produce training data-set with the command gmes_petap.pl --sequence pmin_jelly.fa 4. We would be predicting genes using results from SNAP, Genemark and AUGUSTUS(2nd iteration, est2genome=0, protein2genome=0) I have couple of questions relating to Genemark and AUGUSTUS 1. AUGUSTUS We do not have a species file for species file of our interest or evolutionary closer species following command is used to generate species file /autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting AUGUSTUS is taking too long to compute species file, is there a solution for this issue. Using species file from other organism might generate false positives. Is it advised in such situations to not used AUGUSTUS model? 2. Genemark I used the gmhmm file generated in the genemark output directory, however I encounter following error ------------------------- STATUS: Parsing control files... ERROR: You have failed to provide a value for 'gmhmme3' in the control files. ERROR: You have failed to provide a value for 'probuild' in the control files. --------------------- FYI ----- maker_opts.ctl #-----Gene Prediction snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file ----- Using SNAP for training gene model yields over 6000-7000 additional gene. 
The model has good cumulative AED value.

I was hoping in addition to SNAP, if I could use AUGUSTUS and GeneMark to train the gene model to fuse dispersed models so that the gene count is within the expected range.

Thanks and regards,
Parul

Sent from my iPhone
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carsonhh at gmail.com Wed Nov 30 12:24:36 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Nov 2016 12:24:36 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To: 
References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com> <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com> <3C668404-EA3C-46B4-9676-8F95E2AFB64F@gmail.com>
Message-ID: 

Yes. You can either separate out the contig using fasta_tool or find the contig in the datastore directory (failed contigs will have fasta created there just for the failed contig). Then you can use 'maker -g contig.fasta -base original_base_name' (-g and -base options) to specify that you want it to use the new contig fasta but write results to the given base directory (i.e. same as previous output directory). Remember to set -t (or tries in the maker_opts.ctl file) to a higher count when doing this.

-Carson

> On Nov 30, 2016, at 12:11 PM, John Cornelius wrote:
>
> Awesome! Thanks for the help. MAKER finally finished its initial run today; however, I noticed that there was still one large sequence that failed. Would it be possible to run MAKER on just that sequence and then combine the result of that run with the output of my main maker run?
>
> On Thu, Nov 24, 2016 at 7:04 PM, Carson Holt wrote:
> A lock failure can become an issue if two separate jobs are running simultaneously. They may both try to process the same contig at the same time (modifying each other's files) which will cause one or both to fail. On failure, it should always retry at some later point. So it can usually recover from this.
If you see any partial lines in the resulting GFF3, then it did not recover and you need to just rerun whatever contig this happened on.
>
> -Carson
>
>> On Nov 18, 2016, at 12:14 PM, John Cornelius wrote:
>>
>> Would the lock failure cause problems with the annotation? It looks like Maker is still progressing, just not as quickly as I thought it would be.
>>
>> On Thu, Nov 17, 2016 at 9:04 PM, Carson Holt wrote:
>> To use less RAM, try lowering max_dna_len=, setting blast_depth= parameters to 20 or 30 in maker_bopts.ctl (default is limitless), or when using MPI, starting fewer processes per node (requires manipulation of hostfile or using round robin distribution flag for MPI flavors where it is available).
>>
>> The memory issue could be causing the lock failure as well.
>>
>> -Carson
>>
>>> On Nov 17, 2016, at 7:53 PM, John Cornelius wrote:
>>>
>>> Ok, so I went and searched one of the output logs for all the lines that say ERROR and I got 44 lines with the following message:
>>>
>>> ERROR: Lock broken in runlog
>>>
>>> With these lines found at the end:
>>>
>>> ERROR: Failed while polishig ESTs
>>> ERROR: Chunk failed at level:2, tier_type:3
>>> ERROR: Could not query process table: Cannot allocate memory at /packages/maker/2.31.8/bin/../lib/Proc/ProcessTable_simple.pm line 62.
>>>
>>> From that last line it looks like the process is running out of RAM. Would that be right? Thanks.
>>>
>>> On Fri, Nov 11, 2016 at 2:59 PM, Carson Holt wrote:
>>> The cause of the error is probably further back in the STDERR. With MPI so many processes are producing status and notes that you can get several seconds of output after a failure. If you kept the whole STDERR, I can help you look through it. Searching for "ERROR" in all caps is usually where you will see it. Also MAKER keeps a log of progress, so even on failure, you can just restart it and it will pick up the analysis from the last successful step.
>>> >>> ?Carson >>> >>> >>>> On Nov 10, 2016, at 3:43 PM, John Cornelius > wrote: >>>> >>>> Hello, I'm using MAKER to annotate a tetraploid genome and while running it, I encountered the following error: >>>> >>>> #--------- command -------------# >>>> Widget::exonerate::est2genome: >>>> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/15/TRINITY_GG_19079_c1670_g1_i1.for.84770203-84771247.15.fasta -t /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.15.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.TRINITY_GG_19079_c1670_g1_i1.e.exonerate >>>> #-------------------------------# >>>> running est2genome search. >>>> #--------- command -------------# >>>> Widget::exonerate::est2genome: >>>> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/10/TRINITY_GG_87963_c9694_g10_i12.for.49475083-49475985.10.fasta -t /tmp/maker_08Elxf/10/chr6L.49475083-49475985.10.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/10/chr6L.49475083-49475985.TRINITY_GG_87963_c9694_g10_i12.e.exonerate >>>> #-------------------------------# >>>> >>>> =================================================================================== >>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >>>> = PID 132376 RUNNING AT pnap-pe7-s03 >>>> = EXIT CODE: 135 >>>> = CLEANING UP REMAINING PROCESSES >>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >>>> =================================================================================== >>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7) >>>> This typically refers to a problem with your application. 
>>>> Please see the FAQ page for debugging suggestions >>>> >>>> The the command I ran was the following: >>>> >>>> #PBS -l walltime=240:00:00 >>>> #PBS -N MAKER >>>> #PBS -l nodes=1:ppn=16 >>>> ##PBS -q hmem >>>> #PBS -j oe >>>> #PBS -m abe >>>> #PBS -M jcornelius at tgen.org >>>> #PBS -A tgen-205000 >>>> #PBS -o /scratch/jcornelius/xenopus_laevis/maker_run >>>> >>>> # --- load required modules --- # >>>> >>>> module load maker >>>> >>>> # --- run maker --- # >>>> >>>> cd /scratch/jcornelius/xenopus_laevis/maker_run >>>> mpiexec -n 16 maker -base XLNEURO.run1 -fix_nucleotides >>>> >>>> I'm not sure what could be causing this error and any help would be much appreciated. Thanks. >>>> -- >>>> John Cornelius >>>> MCB PhD Candidate >>>> Arizona State University >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >>> >>> -- >>> John Cornelius >>> MCB PhD Candidate >>> Arizona State University >> >> >> >> >> -- >> John Cornelius >> MCB PhD Candidate >> Arizona State University > > > > > -- > John Cornelius > MCB PhD Candidate > Arizona State University -------------- next part -------------- An HTML attachment was scrubbed... URL: From FeatherstonJ at arc.agric.za Tue Nov 1 09:12:46 2016 From: FeatherstonJ at arc.agric.za (Jonathan Featherston) Date: Tue, 1 Nov 2016 15:12:46 +0000 Subject: [maker-devel] [Caution: Message contains Redirect URL content] InterProScan protein domain & AED physical evidence filtering In-Reply-To: References: Message-ID: <0C2463EA-53FD-4C9B-853A-BE933973E1FA@arc.agric.za> Dear Allison I'm not sure about your extra gene models but here is the script to perform quality filtering. A perl script I got from the forum somewhere (changed to txt in case it gets removed by mail server. Regards Jonathan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
-------------- next part --------------
#!/usr/bin/perl -w
use strict;
#use lib ('/home/mcampbell/lib');
#use PostData;
use Getopt::Std;
use vars qw($opt_s $opt_d $opt_a $opt_p $opt_c $opt_m $opt_u);
getopts('sda:pcmu');
use FileHandle;
#-----------------------------------------------------------------------------
#----------------------------------- MAIN ------------------------------------
#-----------------------------------------------------------------------------
my $usage = "\n\nquality_filter.pl: generates default and standard gene builds from a maker generated gff3_file with iprscan data pushed onto column 9 using ipr_update_gff.\n
USAGE: quality_filter.pl -[options]\n
OPTIONS: -d Prints transcripts with an AED <1 (MAKER default)
         -s Prints transcripts with an AED <1 and/or Pfam domain if in gff3 (MAKER Standard)
         -a Prints transcripts with an AED < the given value\n\n";

my $FILE1 = $ARGV[0];
#my $FILE2 = $ARGV[1];
die($usage) unless $ARGV[0] && ($opt_a || $opt_s || $opt_d);

# Lookup tables of the gene IDs and transcript IDs that pass the filter
my %LU_G;
my %LU_T;

build_lus($FILE1);
#build_lu_tid($FILE2);
filter($FILE1);
#PostData(\%LU_G);
#PostData(\%LU_T);
#-----------------------------------------------------------------------------
#---------------------------------- SUBS -------------------------------------
#-----------------------------------------------------------------------------
# First pass: record every mRNA (and its parent gene) that meets the
# criterion selected with -s, -d or -a.
sub build_lus{
    my $file = shift;
    my %data;
    my $fh = new FileHandle;
    $fh->open($file);

    while (defined(my $line = <$fh>)){
        chomp($line);
        last if $line =~ /^\#\#FASTA/;
        next if $line =~ /^\#/;
        my @array = split(/\t/, $line);
        next unless $array[2] =~ /mRNA/;
        my ($tid) = $array[8] =~ /ID\=(.+?);.*/;
        my ($gid) = $array[8] =~ /Parent\=(.+?);.*/;
        my @c9 = split(/\;/, $array[8]);
        foreach my $x (@c9){
            my ($k,$v) = $x =~ /(.+)\=(.+)/;
            $data{$k}=$v;
        }
        #load the LU
        if ($opt_s && (($data{'Dbxref'} && $data{'Dbxref'} =~ /Pfam/) || $data{'_AED'} < 1)){
            $LU_G{$gid}=1;
            $LU_T{$tid}=1;
        }
        elsif ($opt_d && $data{'_AED'} < 1){
            $LU_G{$gid}=1;
            $LU_T{$tid}=1;
        }
        elsif ($opt_a && $data{'_AED'} < $opt_a){
            $LU_G{$gid}=1;
            $LU_T{$tid}=1;
        }
        undef %data;
    }
}
#-----------------------------------------------------------------------------
# Second pass: print only the features that belong to a retained gene or
# transcript; all other feature types are passed through unchanged.
sub filter{
    my $file = shift;
    my $fh = new FileHandle;
    $fh->open($file);
    print "##gff-version 3\n";
    while (defined(my $line = <$fh>)){
        chomp($line);
        last if $line =~ /^\#\#FASTA/;
        next if $line =~ /^\#/;
        my @array = split(/\t/, $line);

        if ($array[2] eq 'gene'){
            my ($id) = $array[8] =~ /ID=(\S+?);/;
            print $line."\n" if defined($LU_G{$id});
        }
        elsif ($array[2] eq 'mRNA'){
            my ($id) = $array[8] =~ /ID=(\S+?);/;
            print $line."\n" if defined($LU_T{$id});
        }
        elsif ($array[2] eq 'exon' || $array[2] eq 'CDS' || $array[2] eq 'three_prime_UTR' || $array[2] eq 'five_prime_UTR'){
            my $bool = 0;
            my ($ids) = $array[8] =~ /Parent=(\S+);?/;
            # my ($ids) = $array[8] =~ /Parent=(\S+?);/;
            $ids =~ s/;//;
            my @ids_array = split(/,/, $ids);
            foreach my $x (@ids_array){
                if (defined($LU_T{$x})){
                    $bool++;
                }
                else{
                    $line =~ s/$x[^:]//;
                }
            }
            print $line."\n" if $bool;
        }
        else{print $line."\n"}
    }
    $fh->close();
}
#-----------------------------------------------------------------------------
sub build_lu_gid{
    my $file = shift;
    my $fh = new FileHandle;
    $fh->open($file);
    while (defined(my $line = <$fh>)){
        chomp($line);
        $LU_G{$line}=1;
    }
    $fh->close();
}
#-----------------------------------------------------------------------------
sub build_lu_tid{
    my $file = shift;
    my $fh = new FileHandle;
    $fh->open($file);
    while (defined(my $line = <$fh>)){
        chomp($line);
        last if $line =~ /^\#\#FASTA/;
        next if $line =~ /^\#/;
        my @array = split(/\t/, $line);
        if ($array[2] =~ 'mRNA'){
            my ($tid) = $line =~ /ID=(.+?);/;
            my ($gid) = $line =~ /Parent=(.+?);/;
            if (defined($LU_G{$gid})){
                $LU_T{$tid}=1;
            }
        }
    }
    $fh->close();
}
#-----------------------------------------------------------------------------
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carsonhh at gmail.com Tue Nov 1 09:43:21 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 1 Nov 2016 09:43:21 -0600
Subject: [maker-devel] [Caution: Message contains Redirect URL content] InterProScan protein domain & AED physical evidence filtering
In-Reply-To: <0C2463EA-53FD-4C9B-853A-BE933973E1FA@arc.agric.za>
References: <0C2463EA-53FD-4C9B-853A-BE933973E1FA@arc.agric.za>
Message-ID: 

One note I'd like to make is that doing a second round with keep_preds=1 is the wrong procedure (only do that if you really want to keep everything - i.e. in some fungi or oomycetes). Rather you should use InterProScan to evaluate the rejected models in the non-overlapping.abinit.proteins.fasta file, then grep the ones that have an IPR domain out of the GFF3 (will be match/match_part features) and then pass them to pred_gff in a separate run (just updates the format to gene/mRNA/exon/CDS with proper reading frame). You can then merge the resulting GFF3 and fasta files.

The reason there are differences between the runs is that there are models with AED less than 1 that get rejected for other reasons and are brought back with keep_preds=1. For example, if the only evidence is a protein alignment that has deep overlapping HSPs (extremely low complexity alignment), it will be filtered out even though AED is not technically equal to 1. Also, if the overlapping protein evidence is in a different reading frame than the model it is supposed to support, then the AED will be less than 1 but eAED (extended AED) will be 1, and the model will be rejected.

-Carson

>> Hello MAKER google group,
>>
>> For the final round of a MAKER annotation for a de novo plant genome assembly, I ran MAKER twice: once with keep_preds=0 which annotated 20,284 genes and once with keep_preds=1 which annotated 34,055 genes.
>>
>> I ran the 34,055 genes (the keep_preds=1 set) through InterProScan to search the MAKER predictions for protein domain content and added this IPRScan output into the MAKER gff file with the ipr_update_gff accessory script.
>>
>> The game plan is to go through the 34,055 genes and remove any gene model that doesn't have either protein domain content or physical evidence. I am counting genes that have an AED=1 as the genes that don't have physical evidence.
>>
>> I have two questions:
>>
>> 1. I count 11,762 genes that have AED=1.0 in the keep_preds=1 annotation set, which leaves me with 22,293 genes that I'm assuming have some physical evidence (34,055-11,762=22,293). But when I ran MAKER with keep_preds=0 originally, I only count 20,284 genes. What are the extra ~2,000 genes that are being annotated in the keep_preds=1 run that have an AED score of less than 1.0, but are not being annotated in the keep_preds=0 run?
>>
>> 2. My second question is if there is an accessory script available that will remove genes that lack either the IPRScan protein domains or physical evidence (AED < 1)? This type of gene removal was mentioned in a previous post from 2012 (https://groups.google.com/forum/#!searchin/maker-devel/sorry$20there$27s$20not$20a$20script$20prepackaged$20with$20MAKER$20for$20that$20yet.%7Csort:relevance/maker-devel/VaoXWlGHOjs/EElr_otrK8QJ ) and I was just wondering if since then someone wrote a script that will do this for me.
>>
>> If anyone could offer me any feedback, that would be greatly appreciated!
>>
>> Thank you,
>>
>> Allison
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
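The rescue step Carson describes (keep only the rejected ab initio models that carry an IPR domain, then hand them to pred_gff) could be sketched roughly as below. This is an illustration under stated assumptions, not a MAKER accessory script: it assumes a tab-separated InterProScan report with the InterPro accession in column 12, and it assumes that the Name attribute of each match feature equals the protein ID in non-overlapping.abinit.proteins.fasta. The function names are hypothetical; check both assumptions against your own files before relying on it.

```python
def parse_attrs(col9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def ids_with_ipr(tsv_lines):
    """Collect protein IDs that have at least one InterPro accession.

    Assumption: column 1 is the protein ID and column 12 holds the
    InterPro accession (IPRxxxxxx) when one was assigned.
    """
    ids = set()
    for line in tsv_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 12 and cols[11].startswith("IPR"):
            ids.add(cols[0])
    return ids


def select_predictions(gff_lines, keep_ids):
    """Return match/match_part lines whose match Name is in keep_ids.

    Assumption: each match line precedes its match_part lines.
    """
    kept = []
    kept_match_ids = set()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attrs(cols[8])
        if cols[2] == "match" and attrs.get("Name") in keep_ids:
            match_id = attrs.get("ID")
            if match_id:
                kept_match_ids.add(match_id)
            kept.append(line)
        elif cols[2] == "match_part" and attrs.get("Parent") in kept_match_ids:
            kept.append(line)
    return kept
```

The selected lines would then be written to a file and supplied through pred_gff in a separate run, as described in the message above.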
URL: From jacques.dainat at bils.se Tue Nov 1 10:08:45 2016 From: jacques.dainat at bils.se (Jacques Dainat) Date: Tue, 1 Nov 2016 17:08:45 +0100 Subject: [maker-devel] est_gff input does not provide any gene model In-Reply-To: References: Message-ID: <29E6299A-EA5F-4768-88CD-202ABB05AF89@bils.se> Thank you for the quick confirmation ! Just for clarification, what I provided to Maker was a correct gff3 file that indeed contain gene,mRNA,exon types but does not contain any CDS. I haven?t seen any information about the particular gff3 feature types expected for the est_gff files supplied. I think you should communicate more about it (within the maker_opt.ctl ?). It would be nice to stop the pipeline if the file provided contains no information. (When the file provided doesn?t exits too. The warning is not obvious to catch when launching on a cluster...) A last question. do the scores from the score column are used by MAKER from the est_gff file ? Jacques > On 01 Nov 2016, at 04:24, Carson Holt wrote: > > Evidence such as est_gff has to follow the alignment format used by GFF3 (i.e. match/match_part) whereas you are providing gene models (i.e. gene/mRNA/exon/CDS). Note that match/match_part are two level features whereas gene models are 3 levels. You need to reformat to match/match_part. > > ?Carson > > >> On Oct 31, 2016, at 4:51 AM, Jacques Dainat > wrote: >> >> Hello, >> >> I?m using usually Cufflinks output to feed Maker through the est_gff parameter, combined with the est2genome=1 parameter I get the wanted output. >> This time I used Stringtie output to feed Maker, but I don?t have any gene model predicted using the est2genome parameter. >> >> Any explanation ? Is it due to the gff3 format differences between these two file ? >> >> Cufflinks output example: >> Pnalgiovense_4592 Cufflinks match 363 977 17.844829 - . ID=1:s3_c1_r1.4.2;Name=1:s3_c1_r1.4.2; >> Pnalgiovense_4592 Cufflinks match_part 363 666 17.844829 - . 
ID=1:s3_c1_r1.4.2:exon-1;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 1 304 +; >> Pnalgiovense_4592 Cufflinks match_part 743 977 17.844829 - . ID=1:s3_c1_r1.4.2:exon-2;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 305 539 +; >> >> Stringtie output example: >> Pnalgiovense_112 StringTie gene 20 1256 1000 + . ID=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1 >> Pnalgiovense_112 StringTie mRNA 20 1256 1000 + . ID=HtMm_All.12253.1;Parent=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1 >> Pnalgiovense_112 StringTie exon 20 1256 1000 + . ID=HtMm_All.12253.1-exon-1;Parent=HtMm_All.12253.1;cov=8.028295;exon_number=1;gene_id=HtMm_All.12253;transcript_id=HtMm_All.12253.1 >> >> >> If it?s the Stringtie output that is problematic how can I fix it ? Removing gene, changing mRNA by match and exons by match_part is enough ? >> >> Best regards, >> >> >> Jacques Dainat, PhD >> NBIS (National Bioinformatics Infrastructure Sweden) >> Genome Annotation Service >> >> Address: (room E10:4204 - last floor) >> Uppsala University, BMC >> Department of Medical Biochemistry Microbiology, Genomics >> Husargatan 3, box 582 >> S-75123 Uppsala Sweden >> Phone: 01 84 71 46 25 >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
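The reformatting discussed in this thread (three-level gene/mRNA/exon models turned into two-level match/match_part alignments for est_gff) could look something like the sketch below. It is a minimal illustration, assuming tab-separated GFF3 input in which every mRNA and exon line carries an ID and exons carry a Parent, as in the StringTie example quoted above. The function names are hypothetical, the optional Target attribute seen in the Cufflinks example is not generated, and whether MAKER needs it is not stated in the thread.

```python
def parse_attrs(col9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def to_match_features(lines):
    """Convert mRNA/exon records into match/match_part records.

    gene lines (and any other feature types) are dropped, because the
    alignment format has only two levels.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attrs(cols[8])
        if cols[2] == "mRNA":
            # The transcript becomes the top-level alignment feature.
            cols[2] = "match"
            cols[8] = "ID={0};Name={0}".format(attrs["ID"])
        elif cols[2] == "exon":
            # Each exon becomes one aligned block of its transcript.
            cols[2] = "match_part"
            cols[8] = "ID={};Parent={}".format(attrs["ID"], attrs["Parent"])
        else:
            continue
        out.append("\t".join(cols))
    return out
```

Calling to_match_features on the lines of a StringTie GFF3 file and writing the result out gives a file with the same coordinates and scores but only match and match_part rows.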
From carsonhh at gmail.com Tue Nov 1 10:25:36 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 1 Nov 2016 10:25:36 -0600
Subject: [maker-devel] est_gff input does not provide any gene model
In-Reply-To: <29E6299A-EA5F-4768-88CD-202ABB05AF89@bils.se>
References: <29E6299A-EA5F-4768-88CD-202ABB05AF89@bils.se>
Message-ID: <923C15DF-D705-416C-BCB8-CB87F1309797@gmail.com>

The score will be ignored. The format to be used for evidence alignments is specified in the GFF3 spec (https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md). An EST alignment example is also given as part of the GFF3 spec.

--Carson

> On Nov 1, 2016, at 10:08 AM, Jacques Dainat wrote:
>
> Thank you for the quick confirmation!
>
> Just for clarification, what I provided to MAKER was a correct GFF3 file that does contain gene, mRNA and exon types but does not contain any CDS.
>
> I haven't seen any information about the particular GFF3 feature types expected for the est_gff files supplied. I think you should communicate more about it (within maker_opts.ctl?).
> It would be nice to stop the pipeline if the file provided contains no information. (Also when the file provided doesn't exist. The warning is not easy to catch when launching on a cluster...)
>
> A last question: are the scores from the score column of the est_gff file used by MAKER?
>
> Jacques
>
>> On 01 Nov 2016, at 04:24, Carson Holt wrote:
>>
>> Evidence such as est_gff has to follow the alignment format used by GFF3 (i.e. match/match_part) whereas you are providing gene models (i.e. gene/mRNA/exon/CDS). Note that match/match_part are two-level features whereas gene models are three levels. You need to reformat to match/match_part.
>>
>> --Carson
>>
>>> On Oct 31, 2016, at 4:51 AM, Jacques Dainat wrote:
>>>
>>> Hello,
>>>
>>> I usually use Cufflinks output to feed MAKER through the est_gff parameter; combined with the est2genome=1 parameter I get the wanted output.
>>> This time I used Stringtie output to feed MAKER, but I don't have any gene model predicted using the est2genome parameter.
>>>
>>> Any explanation? Is it due to the GFF3 format differences between these two files?
>>>
>>> Cufflinks output example:
>>> Pnalgiovense_4592 Cufflinks match 363 977 17.844829 - . ID=1:s3_c1_r1.4.2;Name=1:s3_c1_r1.4.2;
>>> Pnalgiovense_4592 Cufflinks match_part 363 666 17.844829 - . ID=1:s3_c1_r1.4.2:exon-1;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 1 304 +;
>>> Pnalgiovense_4592 Cufflinks match_part 743 977 17.844829 - . ID=1:s3_c1_r1.4.2:exon-2;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 305 539 +;
>>>
>>> Stringtie output example:
>>> Pnalgiovense_112 StringTie gene 20 1256 1000 + . ID=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1
>>> Pnalgiovense_112 StringTie mRNA 20 1256 1000 + . ID=HtMm_All.12253.1;Parent=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1
>>> Pnalgiovense_112 StringTie exon 20 1256 1000 + . ID=HtMm_All.12253.1-exon-1;Parent=HtMm_All.12253.1;cov=8.028295;exon_number=1;gene_id=HtMm_All.12253;transcript_id=HtMm_All.12253.1
>>>
>>> If it's the Stringtie output that is problematic how can I fix it? Removing gene, changing mRNA by match and exons by match_part is enough?
>>>
>>> Best regards,
>>>
>>> Jacques Dainat, PhD
>>> NBIS (National Bioinformatics Infrastructure Sweden)
>>> Genome Annotation Service
>>>
>>> Address: (room E10:4204 - last floor)
>>> Uppsala University, BMC
>>> Department of Medical Biochemistry Microbiology, Genomics
>>> Husargatan 3, box 582
>>> S-75123 Uppsala Sweden
>>> Phone: 01 84 71 46 25

From mohamed.amine.chebbi at univ-poitiers.fr Wed Nov 2 12:09:54 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (Mohamed Amine Chebbi)
Date: Wed, 2 Nov 2016 19:09:54 +0100 (CET)
Subject: [maker-devel] ProtExcluder1.2 Error
Message-ID: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>

Hi!

I am working on creating a custom repeat library and I want to use ProtExcluder1.2 to trim potential genes from my repeat sequences.
My BLAST version is BLAST 2.2.30+.

I get this error message:

Can not open the seqfile test.lib_blast_results.txt.fnolowm50seq
mergeunmatchedregion.pl seqfile
Illegal division by zero at ProtExcluder1.2/GCcontent.pl line 122.

I wonder if you can help me to fix this.

Thank you.

Amine

From michael.s.campbell1 at gmail.com Thu Nov 3 11:57:35 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Thu, 3 Nov 2016 13:57:35 -0400
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>

Hi Amine,

That script is maintained by Ning Jiang and Kevin Childs. They know best what this script is expecting. I've cc'd them on this email in the hope that they can provide some direction.

Thanks,
Mike

From psh65 at cornell.edu Thu Nov 3 12:14:17 2016
From: psh65 at cornell.edu (Prashant S Hosmani)
Date: Thu, 3 Nov 2016 18:14:17 +0000
Subject: [maker-devel] ProtExcluder1.2 Error
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>

Hi Amine,

I was getting a similar error. You need to be careful with the BLAST versions. Try using the same BLAST version for makeblastdb. I was using BLAST 2.2.29+. After recreating the BLAST database with the same version, it worked for me.

Hope this helps.
Prashant

Prashant Hosmani
Sol Genomics Network
Boyce Thompson Institute, Ithaca, NY, USA

From scott at scottcain.net Fri Nov 4 13:25:02 2016
From: scott at scottcain.net (Scott Cain)
Date: Fri, 4 Nov 2016 15:25:02 -0400
Subject: [maker-devel] Last Call for GMOD talks at PAG

Time is short! If you want to attend PAG and would like to present on a topic that would be of interest to the GMOD community, please send an abstract or at least a descriptive title to help at gmod.org. Types of talks typically include updates on GMOD software projects, usage stories for successful sites, proposals for new GMOD projects and descriptions of plugins for existing GMOD software projects like Tripal, JBrowse and Galaxy.

Please consider giving a talk and sharing your experience and ideas!

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                      scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)     216-392-3087
Ontario Institute for Cancer Research
From mohamed.amine.chebbi at univ-poitiers.fr Thu Nov 3 17:40:18 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (chebbi mohamed amine)
Date: Fri, 4 Nov 2016 00:40:18 +0100 (CET)
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <20161103185405.183337t1yq0no6x9@mail.msu.edu>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr> <20161103185405.183337t1yq0no6x9@mail.msu.edu>
Message-ID: <1641376945.6912938.1478216418712.JavaMail.zimbra@univ-poitiers.fr>

Hi!

Thank you Prashant for sharing your experience. Indeed, using the same BLAST version (2.2.29) for makeblastdb seems to resolve the problem. It now appears to work fine for all the sequences except one, for which I get the message below:

Fatal exception (source file ../../easel/esl_sqio_ascii.c, line 2001):
Failed to fetch subsequence residues -- corrupt coords?
sh: line 1: 46520 Aborted (core dumped) /hmmer-3.1b2-linux-intel-x86_64/binaries/esl-sfetch -c 1242..19031 all-te.lib rnd-4_family-1731#DNA >> blastx_results-all-te.txt.fnolowm50seq

Did you encounter this problem before?

Thank you for your help.

Amine

From: jiangn at msu.edu
To: "Prashant S Hosmani"
Cc: "Michael Campbell", "Mohamed Amine Chebbi"
Sent: Thursday 3 November 2016 23:54:05
Subject: Re: [maker-devel] ProtExcluder1.2 Error

Hi Prashant,

Thank you so much for sharing your experience. It is important to keep everything in the same version. I will remind users about this when we update it and I may need to bother you then.

Best regards,

Ning
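The esl-sfetch failure reported above asks "corrupt coords?", so one quick thing to rule out is a requested range that runs past the end of the sequence. The following is a minimal sketch, assuming the repeat library is a plain FASTA file and that the -c from..to range is 1-based and inclusive. The function names are illustrative and are not part of ProtExcluder or HMMER.

```python
def fasta_lengths(lines):
    """Return {sequence name: length} for FASTA text given as an iterable of lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]  # the name is the first word of the header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def check_range(lengths, name, start, end):
    """Describe whether the 1-based inclusive range start..end fits in the sequence."""
    if name not in lengths:
        return "sequence not found"
    if start < 1 or end < start:
        return "invalid range"
    if end > lengths[name]:
        return "end exceeds sequence length {}".format(lengths[name])
    return "ok"
```

For the case in this thread one would call check_range(fasta_lengths(open("all-te.lib")), "rnd-4_family-1731#DNA", 1242, 19031). If that reports "ok", the coordinates themselves are not the problem and the indexing or fetching tool is the next suspect.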
From mohamed.amine.chebbi at univ-poitiers.fr Fri Nov 4 04:44:02 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (chebbi mohamed amine)
Date: Fri, 4 Nov 2016 11:44:02 +0100 (CET)
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <1827195032.6913929.1478217820889.JavaMail.zimbra@univ-poitiers.fr>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr> <20161103185405.183337t1yq0no6x9@mail.msu.edu> <1641376945.6912938.1478216418712.JavaMail.zimbra@univ-poitiers.fr> <20161103195409.76212s1yy72mv95t@mail.msu.edu> <1827195032.6913929.1478217820889.JavaMail.zimbra@univ-poitiers.fr>
Message-ID: <838628537.7128959.1478256242111.JavaMail.zimbra@univ-poitiers.fr>

Hi Jiangn!

I made some modifications to the script ProtExcluder1.2/mspesl-sfetch.pl by replacing:

"esl-sfetch --index $ARGV[0] " by "samtools faidx $ARGV[0]"
and
"esl-sfetch -c $from..$to $ARGV[0] $line[7] >> $ARGV[3]" by "samtools faidx $ARGV[0] $line[7]:$from-$to >> $ARGV[3]"

It works fine now and the script can extract the subsequences correctly.

Best regards,
Amine

From: "chebbi mohamed amine"
To: "jiangn"
Sent: Friday 4 November 2016 01:03:40
Subject: Re: [maker-devel] ProtExcluder1.2 Error

Hi Jiangn,

In fact, this sequence has a size of 19031 bases.
When I try the command /hmmer-3.1b2-linux-intel-x86_64/binaries/esl-sfetch -c 1242..19031 all-te.lib rnd-4_family-1731#DNA I get the error; however, when testing with end coordinates below 19031 it works fine. I think the problem is related to hmmer. I will try to add the subsequence manually to the .fnolowm50seq file.

Thank you
Amine

From: "jiangn"
To: "chebbi mohamed amine"
Cc: "Prashant S Hosmani", "Michael Campbell"
Sent: Friday 4 November 2016 00:54:09
Subject: Re: [maker-devel] ProtExcluder1.2 Error

Hi Amine,

I don't have this kind of experience. If only one sequence failed, I would suspect there might be some format issue for that specific sequence.

Regards,

Ning

From jiangn at msu.edu Fri Nov 4 14:35:43 2016
From: jiangn at msu.edu (jiangn at msu.edu)
Date: Fri, 04 Nov 2016 16:35:43 -0400
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <838628537.7128959.1478256242111.JavaMail.zimbra@univ-poitiers.fr>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr> <20161103185405.183337t1yq0no6x9@mail.msu.edu> <1641376945.6912938.1478216418712.JavaMail.zimbra@univ-poitiers.fr> <20161103195409.76212s1yy72mv95t@mail.msu.edu> <1827195032.6913929.1478217820889.JavaMail.zimbra@univ-poitiers.fr> <838628537.7128959.1478256242111.JavaMail.zimbra@univ-poitiers.fr>
Message-ID: <20161104163543.98626jb6y81eis67@mail.msu.edu>

Hi Amine,

That's good to know. Thank you!

Ning

From pierre-francois.bert at inra.fr Tue Nov 8 05:13:55 2016
From: pierre-francois.bert at inra.fr (Pierre-Francois Bert)
Date: Tue, 8 Nov 2016 12:13:55 +0000
Subject: [maker-devel] Maker-P
Message-ID: <1478607235425.40152@inra.fr>

Hello,
I'm interested in using MAKER-P, but I can't find it in the latest version 3, nor can I find v2.29 to download.
Can you please tell me how to proceed?
Best wishes.
Pierre-François Bert

From carsonhh at gmail.com Wed Nov 9 12:00:08 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 9 Nov 2016 12:00:08 -0700
Subject: [maker-devel] Maker-P
In-Reply-To: <1478607235425.40152@inra.fr>
References: <1478607235425.40152@inra.fr>

MAKER-P's features and accessory scripts were integrated into MAKER with versions 2.29 and above, as stated on the MAKER-P page. There is no longer a separate MAKER-P download and it is not a separate executable. You just download MAKER 2.29 or above and run .../maker/bin/maker

--Carson

From jcornel3 at asu.edu Thu Nov 10 15:43:56 2016
From: jcornel3 at asu.edu (John Cornelius)
Date: Thu, 10 Nov 2016 15:43:56 -0700
Subject: [maker-devel] Error running MAKER

Hello, I'm using MAKER to annotate a tetraploid genome and while running it, I encountered the following error:

#--------- command -------------#
Widget::exonerate::est2genome:
/packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/15/TRINITY_GG_19079_c1670_g1_i1.for.84770203-84771247.15.fasta -t /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.15.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.TRINITY_GG_19079_c1670_g1_i1.e.exonerate
#-------------------------------#
running est2genome search.
#--------- command -------------#
Widget::exonerate::est2genome:
/packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/10/TRINITY_GG_87963_c9694_g10_i12.for.49475083-49475985.10.fasta -t /tmp/maker_08Elxf/10/chr6L.49475083-49475985.10.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/10/chr6L.49475083-49475985.TRINITY_GG_87963_c9694_g10_i12.e.exonerate
#-------------------------------#

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 132376 RUNNING AT pnap-pe7-s03
= EXIT CODE: 135
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

The job script I ran was the following:

#PBS -l walltime=240:00:00
#PBS -N MAKER
#PBS -l nodes=1:ppn=16
##PBS -q hmem
#PBS -j oe
#PBS -m abe
#PBS -M jcornelius at tgen.org
#PBS -A tgen-205000
#PBS -o /scratch/jcornelius/xenopus_laevis/maker_run

# --- load required modules --- #

module load maker

# --- run maker --- #

cd /scratch/jcornelius/xenopus_laevis/maker_run
mpiexec -n 16 maker -base XLNEURO.run1 -fix_nucleotides

I'm not sure what could be causing this error and any help would be much appreciated. Thanks.
--
John Cornelius
MCB PhD Candidate
Arizona State University

From carsonhh at gmail.com Fri Nov 11 14:59:54 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 11 Nov 2016 14:59:54 -0700
Subject: [maker-devel] Error running MAKER
Message-ID: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com>

The cause of the error is probably further back in the STDERR. With MPI, so many processes are producing status messages and notes that you can get several seconds of output after a failure. If you kept the whole STDERR, I can help you look through it. Searching for "ERROR" in all caps is usually how you will find it.

Also, MAKER keeps a log of progress, so even on failure you can just restart it and it will pick up the analysis from the last successful step.

--Carson
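The suggestion above, searching the saved STDERR for "ERROR" in capitals, can be sketched as a small script that also prints a few preceding lines for context, since with MPI the relevant message may sit well before the final "BAD TERMINATION" banner. This is only an illustration; the function name and the amount of context are arbitrary choices, and nothing here is part of MAKER.

```python
from collections import deque


def find_errors(lines, before=5):
    """Yield (line_number, context) for each line containing 'ERROR' in capitals.

    context is a list holding up to `before` preceding lines plus the hit itself.
    """
    recent = deque(maxlen=before)  # the last `before` lines seen so far
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if "ERROR" in text:
            yield number, list(recent) + [text]
        recent.append(text)
```

With a saved log (file name hypothetical), `for n, ctx in find_errors(open("maker_run.log")): print(n, *ctx, sep="\n")` lists each hit with its context. The match is case-sensitive, so the lowercase "Bus error" line from the MPI launcher is deliberately not reported.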
From lmeunier at ulg.ac.be Mon Nov 14 01:50:50 2016
From: lmeunier at ulg.ac.be (Loïc)
Date: Mon, 14 Nov 2016 09:50:50 +0100
Subject: [maker-devel] Predictions without evidence

Hello,

I am a Ph.D. student, and I am using MAKER to automate gene prediction for many genomes as part of a genome mining project, so I don't include any evidence when running it.
If I understood correctly, when several gene predictors are used, AED is used to select the prediction that best matches the evidence.

So, as I don't use evidence, does MAKER still make a choice when working with multiple gene predictors? If yes, how does it work?
Also, I have not fully understood whether the selection of the gene predictor is made for every gene.

Sorry for asking if the answer is obvious, but after reading your papers and looking through the archived posts, I have not found the answer.

By the way, I also have a question about your paper on MAKER2 (Holt and Yandell, 2011). It says many times that gene predictors used within the MAKER pipeline give better results than when used alone, but I have not understood why. Can you explain this?

Best regards,
Loïc Meunier

From jacques.dainat at bils.se Mon Nov 14 01:55:06 2016
From: jacques.dainat at bils.se (Jacques Dainat)
Date: Mon, 14 Nov 2016 09:55:06 +0100
Subject: [maker-devel] strand of single exon EST from fasta
Message-ID: <2E91C252-D244-47A2-B896-99EE0F69EBBA@bils.se>

Hello,

I'm annotating several strains of the same fungus, and I have stranded RNA-seq for all of them. I'm using MAKER3.
Let's say I'm annotating species1 using its species-specific assembled transcripts, which are in GFF. I know that MAKER cannot do anything about the strand coming from the est_gff. To check that everything went fine during my transcriptome assembly and that the strands were correctly defined, I checked the annotation in a browser. I can see that the strands of my transcripts in GFF format were perfect (they match the protein strands and the ab initio prediction strands, and the ORFs are OK).

As I wanted to take advantage of the RNA-seq from my other strains, I decided to use it in this annotation. As the transcriptome assemblies of these RNA-seq data were done against their corresponding genomes, I cannot use the GFF files: the locations do not correspond to the genome of my species1. So I decided to extract the sequences in FASTA format to feed MAKER with (alt_est parameter).
When I visualised those transcript alignments I was really surprised by the strands decided by MAKER. They seem completely random, even though all the EST FASTA sequences from the same locus are given in the same strand.

So, I have two questions:
1) How is the strand decided for a single-exon EST provided in FASTA format? (I thought it was based on the longest ORF.)
2) Is it normal that the second annotation using these alt_est is worse (far fewer gene models) than the previous one? (I thought the strand of my single-exon alt_ests would not play a role during the annotation process. Or maybe it's another bias from these alt_est => loci less well defined?)

Here are 3 examples: the top green track has the correct strand and is based on the GFF file. The bottom green cluster tracks are FASTA sequences from the other strains aligned through MAKER. (I don't know if it could play a role, but all sequences from the same locus were sent to MAKER in the same strand.)

Thank you very much for your help,

Jacques Dainat

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2016-11-13 at 13.05.24.png
Type: image/png
Size: 52019 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2016-11-13 at 13.05.44.png
Type: image/png
Size: 26966 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2016-11-13 at 13.07.13.png
Type: image/png
Size: 24338 bytes

From carsonhh at gmail.com Mon Nov 14 13:08:13 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 14 Nov 2016 13:08:13 -0700
Subject: [maker-devel] strand of single exon EST from fasta
In-Reply-To: <2E91C252-D244-47A2-B896-99EE0F69EBBA@bils.se>
References: <2E91C252-D244-47A2-B896-99EE0F69EBBA@bils.se>

The strand of single-exon ESTs and alt-ESTs is based on the longest ORF. In the event of a tie, whatever strand was assigned by the aligner is kept. alt-ESTs are less likely to align or produce a model than the ESTs. If you have competing models on opposite strands for the same CDS, then support from ab initio, spliced EST, or exonerate protein alignments will be needed for the model.

--Carson
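The longest-ORF rule with an aligner tie-break described above can be illustrated with a minimal sketch. This is not MAKER's code. Assumptions: "ORF" is taken here to mean the longest run of codons without a stop codon in any of the three frames (MAKER's own definition may differ, for example by requiring a start codon), and seq is the transcript sequence as read along the forward strand of the genome. All names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Length in codons of the longest stop-free run over the three forward frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                run = 0  # a stop codon ends the current open stretch
            else:
                run += 1
                best = max(best, run)
    return best


def choose_strand(seq, aligner_strand):
    """Pick '+' or '-' by longest ORF; keep the aligner's strand on a tie."""
    plus = longest_orf(seq)
    minus = longest_orf(revcomp(seq))
    if plus > minus:
        return "+"
    if minus > plus:
        return "-"
    return aligner_strand
```

Under this definition short or low-complexity single-exon transcripts can easily tie or flip between strands, which is one plausible reason such alignments may look inconsistently stranded in a browser.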
> Indeed, the locations do not correspond to the genome of my species1. So I decided to extract the sequences in fasta format to feed MAKER with (alt_est parameter).
> When I visualise those transcript alignments I was really surprised by the strands decided by MAKER. It seems completely random, while all the est fasta sequences from the same locus are given in the same strand.
>
> So, I have two questions:
> 1) How is the strand decided for a single exon EST provided in fasta format? (I thought it was based on the longest ORF)
> 2) Is it normal that the second annotation using these alt_est is worse (far fewer gene models) than the previous one? (I thought the strand of my single exon alt_ests would not play a role during the annotation process. Or maybe it's another bias from these alt_est => loci less well defined?)
>
> Here 3 examples: The top green track has the correct strand and is based on the gff file. The bottom green cluster tracks are fasta sequences from the other strains aligned through MAKER. (I don't know if it could play a role but all sequences from the same locus have been sent to MAKER in the same strand).
>
> Thank you very much for your help,
>
> Jacques Dainat
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Mon Nov 14 13:18:26 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 14 Nov 2016 13:18:26 -0700
Subject: [maker-devel] Predictions without evidence
In-Reply-To: 
References: 
Message-ID: <7BDEAAF4-230C-4315-B353-43381237BCB0@gmail.com>

Gene predictors have to be trained on each organism to generate a matched HMM. If they are not trained, they will not work well.
MAKER also sends hints to the predictor, based on the evidence alignments, that further alter the probabilities used by the predictor so that its models better match the evidence. Evidence is also used in final filtering. All models without evidence will have an AED of 1, which means no support.

Not using evidence will result in very poor models, especially if you don't have an HMM built exactly for the organism. The main problem will be overprediction. Note the behavior of SNAP alone in the MAKER2 paper. The result is tens of thousands of false positive gene models.

If you only run multiple gene predictors without evidence, the final model will be whichever model has the best consensus structure for the set. If the set consists of two models, then there is no consensus and the longest one is kept.

--Carson

> On Nov 14, 2016, at 1:50 AM, Loïc wrote:
>
> Hello,
>
> I am a Ph. D. student, and I am using MAKER to automate gene prediction for many genomes as part of a genome mining work, so I don't include evidence for its use.
> If I understood well, when exploiting multiple gene predictor softwares, AED is used to define the prediction which best matches the evidence.
>
> So, as I don't use evidence, is there a choice made by MAKER when working with multiple gene predictors? If yes, how does it work?
> Also, I have not understood well whether the selection of the gene predictor to use is made for every gene?
>
> Sorry for asking if the answer is obvious, but after reading your papers and looking through the archived posts, I have not found the answer.
>
> By the way, I also have a question about your paper on MAKER2 (Holt and Yandell, 2011). It is said many times that gene predictors used in the MAKER pipeline give better results than when used alone, but I have not understood why. Can you explain this fact?
>
> Best regards,
>
> Loïc Meunier

From carsonhh at gmail.com Thu Nov 17 14:05:53 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 17 Nov 2016 14:05:53 -0700
Subject: [maker-devel] About split genes problem in Maker annotations
In-Reply-To: <75508AB460A77C4798EC49425637E292194A0DA6@PETREL-MA.imcb.a-star.edu.sg>
References: <75508AB460A77C4798EC49425637E292194A0DA6@PETREL-MA.imcb.a-star.edu.sg>
Message-ID: <36BBB195-EEB4-4B3A-9463-3E4171731390@gmail.com>

est2genome and protein2genome should only be used for initial training. They are not predictors; rather, they take an EST/protein alignment, find the longest ORF, and then turn the ORF directly into a gene model. That is good enough to build a training dataset, but the models will almost always be partial and fragmented. Also, because the alignments both produce and support the models, the models always score well, so their AED values are meaningless.

Once you have a predictor trained, you should turn est2genome and protein2genome off. With a trained predictor, the alignments will then serve as hints to Augustus as to where likely introns/exons will be, and this will give the desired behavior. Note that Augustus will attempt to build the most probable model given the hints and the assembly sequence. If there are any assembly issues affecting the ORF, the predictor will often skip exons or split the model in the locus.

Also make sure you have built a species-specific repeat library to add to the default repeat libraries used by MAKER (you can use tools like RepeatModeler to do this). Otherwise you will get spurious alignments of much of your evidence, and Augustus will generate false positive results. You may also want to add a large dataset like UniProt/Swiss-Prot to the protein evidence.
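The "longest ORF" step mentioned above can be illustrated with a simplified sketch. This is not MAKER's code: the function names are hypothetical, an ORF is assumed to run from an ATG to the next in-frame stop codon (or to the end of the sequence), and the tie rule follows the earlier reply in this archive about single-exon ESTs (keep the strand given by the aligner).

```python
# Illustrative only: a simplified "longest ORF" strand chooser.
# Assumption: an ORF starts at ATG and ends before the next in-frame stop
# codon, or at the end of the sequence if no stop is found (partial ORF).

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N is left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_length(seq):
    """Length in nucleotides of the longest ORF in the three forward frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOPS:
                best = max(best, i - start)  # stop codon not counted
                start = None
        if start is not None:
            # ORF runs off the end of the sequence: count the complete codons
            last = frame + 3 * ((len(seq) - frame) // 3)
            best = max(best, last - start)
    return best


def pick_strand(seq, aligner_strand):
    """Choose '+' or '-' by longest ORF; on a tie keep the aligner's strand."""
    plus = longest_orf_length(seq)
    minus = longest_orf_length(reverse_complement(seq))
    if plus > minus:
        return "+"
    if minus > plus:
        return "-"
    return aligner_strand


print(pick_strand("TAAATGGCCGCCTGA", "-"))
```

A real pipeline would also have to deal with partial 5' ends and with sequences whose orientation differs from the genome; the sketch ignores both.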
The best way to evaluate annotations and performance is to visually review the annotation in tools like Apollo. It will allow you to see whether evidence, gene predictions, and final models achieve consensus, or whether alignments don't match (spurious alignment generally suggests a repeat-masking or evidence-quality issue), or whether raw ab initio predictions don't match (which indicates insufficient training or underlying assembly issues).

--Carson

> On Nov 16, 2016, at 8:01 PM, Prashant Narendra SHINGATE wrote:
>
> Hi Carson,
>
> We are annotating the genome of a fish with a relatively small genome (~450Mb) using Maker and encountering many genes that are split and predicted as multiple genes. We are using Augustus for de novo prediction. Fortunately we have full-length RNAseq for about 4000 genes (and total ~50k transcripts) from the same species, and whole-genome protein sequences from a very closely related species.
>
> First we trained Augustus using ~4000 full-length RNAseq transcripts from the same species. This trained Augustus model was used in the Maker annotation pipeline along with ~50k RNAseq transcripts (>1000bp) and whole-genome protein sequences from a closely related species.
>
> We first tried annotating using the options est2genome=1, protein2genome=1 and Augustus ON. We found several genes were split and the program seemed to give weight to the Augustus prediction in spite of having full-length RNAseq and protein sequences aligned to the predicted gene loci (visualized using JBrowse).
>
> In the next trial we used est2genome=1, protein2genome=1 and Augustus OFF in the first step. In the second step we did a reiteration with est2genome=0, protein2genome=0 and Augustus ON. Still the output contained split genes.
>
> In the third trial we used est2genome=1, protein2genome=1 and Augustus OFF and checked the output. In this output full-length genes were predicted whenever full-length RNAseq and/or protein sequences were available.
> This seems to suggest that when we use Augustus, more weight is given to the Augustus de novo prediction and the synthesis of evidence from RNAseq and protein sequences is not happening.
>
> Can you please let us know why we are getting split genes in spite of having full-length RNAseq and/or protein sequences? What changes would you suggest to the protocol to overcome this problem?
>
> We thank you very much for your help and time.
>
> Regards,
> Prashant Shingate, PhD :: Research Fellow :: Comparative and Medical Genomics Lab :: Institute of Molecular and Cell Biology (IMCB) :: Agency for Science, Technology and Research (A*STAR)
> 61 Biopolis Drive :: #05-04 Proteos :: Singapore 138673 :: DID (+65) 6586 9570 :: Fax (+65) 6779 1117 :: http://www.imcb.a-star.edu.sg/
> We advance science and develop innovative technology to further economic growth and improve lives.
>
> Note: This message may contain confidential information. If this Email/Fax has been sent to you by mistake, please notify the sender and delete it immediately. Thank you.

From carsonhh at gmail.com Thu Nov 17 21:04:31 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 17 Nov 2016 21:04:31 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To: 
References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com>
Message-ID: <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com>

To use less RAM, try lowering max_dna_len=, setting the blast_depth= parameters to 20 or 30 in maker_bopts.ctl (the default is limitless), or, when using MPI, starting fewer processes per node (this requires manipulating the hostfile or using the round-robin distribution flag for MPI flavors where it is available).

The memory issue could be causing the lock failure as well.
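As a minimal sketch of the RAM-related settings described above, the helper below rewrites `key=value` lines in control-file text while leaving comments alone. The function name `set_ctl_options`, the value 50000 for max_dna_len, and the option names depth_blastn, depth_blastx and depth_tblastx (used here as a guess at what the blast depth parameters are called in maker_bopts.ctl) are assumptions; check them against your own control files before relying on this.

```python
import re

# Matches "key=value #optional comment"; lines starting with '#' do not match.
CTL_LINE = re.compile(r"^(\w+)=([^#]*)(#.*)?$")


def set_ctl_options(text, updates):
    """Return control-file text with the given keys set to new values.

    Only lines of the form key=value are touched; trailing comments on
    those lines are kept, and every other line is passed through as-is.
    """
    out = []
    for line in text.splitlines():
        match = CTL_LINE.match(line)
        if match and match.group(1) in updates:
            key = match.group(1)
            comment = match.group(3) or ""
            line = f"{key}={updates[key]} {comment}".rstrip()
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical excerpts standing in for maker_opts.ctl and maker_bopts.ctl.
opts = "max_dna_len=100000 #length for dividing up contigs\n"
bopts = "depth_blastn=0 #depth cutoff\ndepth_blastx=0\ndepth_tblastx=0\n"

print(set_ctl_options(opts, {"max_dna_len": 50000}), end="")
print(set_ctl_options(bopts, {"depth_blastn": 20,
                              "depth_blastx": 20,
                              "depth_tblastx": 20}), end="")
```

Editing the files by hand works just as well; a helper like this is mainly useful when the same change has to be applied to many run directories.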
--Carson

> On Nov 17, 2016, at 7:53 PM, John Cornelius wrote:
>
> Ok, so I went and searched one of the output logs for all the lines that say ERROR and I got 44 lines with the following message:
>
> ERROR: Lock broken in runlog
>
> With these lines found at the end:
>
> ERROR: Failed while polishig ESTs
> ERROR: Chunk failed at level:2, tier_type:3
> ERROR: Could not query process table: Cannot allocate memory at /packages/maker/2.31.8/bin/../lib/Proc/ProcessTable_simple.pm line 62.
>
> From that last line it looks like the process is running out of RAM; would that be right? Thanks.
>
> On Fri, Nov 11, 2016 at 2:59 PM, Carson Holt wrote:
> The cause of the error is probably further back in the STDERR. With MPI, so many processes are producing status messages and notes that you can get several seconds of output after a failure. If you kept the whole STDERR, I can help you look through it. Searching for "ERROR" in all caps is usually where you will see it. Also, MAKER keeps a log of progress, so even on failure you can just restart it and it will pick up the analysis from the last successful step.
>
> --Carson
>
>
>> On Nov 10, 2016, at 3:43 PM, John Cornelius wrote:
>>
>> Hello, I'm using MAKER to annotate a tetraploid genome and while running it, I encountered the following error:
>>
>> #--------- command -------------#
>> Widget::exonerate::est2genome:
>> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/15/TRINITY_GG_19079_c1670_g1_i1.for.84770203-84771247.15.fasta -t /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.15.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.TRINITY_GG_19079_c1670_g1_i1.e.exonerate
>> #-------------------------------#
>> running est2genome search.
>> #--------- command -------------#
>> Widget::exonerate::est2genome:
>> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/10/TRINITY_GG_87963_c9694_g10_i12.for.49475083-49475985.10.fasta -t /tmp/maker_08Elxf/10/chr6L.49475083-49475985.10.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/10/chr6L.49475083-49475985.TRINITY_GG_87963_c9694_g10_i12.e.exonerate
>> #-------------------------------#
>>
>> ===================================================================================
>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> = PID 132376 RUNNING AT pnap-pe7-s03
>> = EXIT CODE: 135
>> = CLEANING UP REMAINING PROCESSES
>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7)
>> This typically refers to a problem with your application.
>> Please see the FAQ page for debugging suggestions
>>
>> The command I ran was the following:
>>
>> #PBS -l walltime=240:00:00
>> #PBS -N MAKER
>> #PBS -l nodes=1:ppn=16
>> ##PBS -q hmem
>> #PBS -j oe
>> #PBS -m abe
>> #PBS -M jcornelius at tgen.org
>> #PBS -A tgen-205000
>> #PBS -o /scratch/jcornelius/xenopus_laevis/maker_run
>>
>> # --- load required modules --- #
>>
>> module load maker
>>
>> # --- run maker --- #
>>
>> cd /scratch/jcornelius/xenopus_laevis/maker_run
>> mpiexec -n 16 maker -base XLNEURO.run1 -fix_nucleotides
>>
>> I'm not sure what could be causing this error and any help would be much appreciated. Thanks.
>> --
>> John Cornelius
>> MCB PhD Candidate
>> Arizona State University
>
> --
> John Cornelius
> MCB PhD Candidate
> Arizona State University

From jcornel3 at asu.edu Fri Nov 18 12:14:52 2016
From: jcornel3 at asu.edu (John Cornelius)
Date: Fri, 18 Nov 2016 12:14:52 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To: <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com>
References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com> <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com>
Message-ID: 

Would the lock failure cause problems with the annotation? It looks like MAKER is still progressing, just not as quickly as I thought it would.

On Thu, Nov 17, 2016 at 9:04 PM, Carson Holt wrote:

> To use less RAM, try lowering max_dna_len=, setting blast_depth= parameters to 20 or 30 in maker_bopts.ctl (default is limitless), or when using MPI, starting fewer processes per node (requires manipulation of hostfile or using round robin distribution flag for MPI flavors where it is available).
>
> The memory issue could be causing the lock failure as well.
>
> --Carson
>
> On Nov 17, 2016, at 7:53 PM, John Cornelius wrote:
>
> Ok, so I went and searched one of the output logs for all the lines that say ERROR and I got 44 lines with the following message:
>
> ERROR: Lock broken in runlog
>
> With these lines found at the end:
>
> ERROR: Failed while polishig ESTs
> ERROR: Chunk failed at level:2, tier_type:3
> ERROR: Could not query process table: Cannot allocate memory at /packages/maker/2.31.8/bin/../lib/Proc/ProcessTable_simple.pm line 62.
>
> From that last line it looks like the process is running out of RAM; would that be right? Thanks.
>> --
>> John Cornelius
>> MCB PhD Candidate
>> Arizona State University
>
> --
> John Cornelius
> MCB PhD Candidate
> Arizona State University

--
John Cornelius
MCB PhD Candidate
Arizona State University

From mohamed.amine.chebbi at univ-poitiers.fr Thu Nov 24 14:45:01 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (Mohamed Amine Chebbi)
Date: Thu, 24 Nov 2016 22:45:01 +0100 (CET)
Subject: [maker-devel] map_fasta_ids : No mapping available...
Message-ID: <773569486.15711466.1480023901276.JavaMail.zimbra@univ-poitiers.fr>

Hello!

I'm attempting to rename the genes in maker.proteins.fasta for GenBank submission using the map_fasta_ids script. It seems to work correctly for the majority of gene models, except for those that produce the warning messages below:

WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.3-mRNA-1
WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.0-mRNA-1
WARNING: No mapping available for maker-scaffold_1710-snap-gene-0.6-mRNA-1
WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.4-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.1-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.2-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.0-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.5-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.6-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.15-mRNA-1
WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.16-mRNA-1

Looking into the maker.gff file, these gene names are missing and may be replaced by
other ones which differ in the numbers following the gene predictor.

I wonder if you can explain the reason for these warning messages and how to resolve them.

Thank you,

Best,
Amine

From carsonhh at gmail.com Thu Nov 24 19:04:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 24 Nov 2016 19:04:59 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To: 
References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com> <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com>
Message-ID: <3C668404-EA3C-46B4-9676-8F95E2AFB64F@gmail.com>

A lock failure can become an issue if two separate jobs are running simultaneously. They may both try to process the same contig at the same time (modifying each other's files), which will cause one or both to fail. On failure, MAKER should always retry at some later point, so it can usually recover from this. If you see any partial lines in the resulting GFF3, then it did not recover and you need to rerun whatever contig this happened on.

--Carson

> On Nov 18, 2016, at 12:14 PM, John Cornelius wrote:
>
> Would the lock failure cause problems with the annotation? It looks like Maker is still progressing, just not as quickly as I thought it would be.
>
> On Thu, Nov 17, 2016 at 9:04 PM, Carson Holt wrote:
> To use less RAM, try lowering max_dna_len=, setting blast_depth= parameters to 20 or 30 in maker_bopts.ctl (default is limitless), or when using MPI, starting fewer processes per node (requires manipulation of hostfile or using round robin distribution flag for MPI flavors where it is available).
>
> The memory issue could be causing the lock failure as well.
>>> --
>>> John Cornelius
>>> MCB PhD Candidate
>>> Arizona State University
>>
>> --
>> John Cornelius
>> MCB PhD Candidate
>> Arizona State University
>
> --
> John Cornelius
> MCB PhD Candidate
> Arizona State University

From carsonhh at gmail.com Mon Nov 28 09:26:40 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 28 Nov 2016 09:26:40 -0700
Subject: [maker-devel] map_fasta_ids : No mapping available...
In-Reply-To: <773569486.15711466.1480023901276.JavaMail.zimbra@univ-poitiers.fr>
References: <773569486.15711466.1480023901276.JavaMail.zimbra@univ-poitiers.fr>
Message-ID: <401400E0-7581-4407-A30E-A787485B0E86@gmail.com>

The map file you run with has two columns (old_id and new_id). If the input file has IDs that do not match anything in the old_id column, the script throws this warning. It means there is a mismatch between the map file being used and the FASTA file. This can occur if you did downstream manipulation of the FASTA file, are using the wrong FASTA file, or used GFF3 as input to a MAKER step that has generated an ID mismatch.

--Carson

> On Nov 24, 2016, at 2:45 PM, Mohamed Amine Chebbi wrote:
>
> Hello !
>
> I'm attempting to rename genes of maker.proteins.fasta for GenBank submission using the map_fasta_ids script.
> It seems to work correctly for the majority of gene models, except for those having the warning message below:
>
> WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.3-mRNA-1
> WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.0-mRNA-1
> WARNING: No mapping available for maker-scaffold_1710-snap-gene-0.6-mRNA-1
> WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.4-mRNA-1
> WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.1-mRNA-1
> WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.2-mRNA-1
> WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.0-mRNA-1
> WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.5-mRNA-1
> WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.6-mRNA-1
> WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.15-mRNA-1
> WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.16-mRNA-1
>
> Looking into the maker.gff file, these gene names are missing and may be replaced by other ones which differ in the numbers following the gene predictor.
>
> I wonder if you can explain the reason for these warning messages and how to resolve them.
>
> Thank you,
>
> Best,
> Amine

From parulk at caltech.edu Tue Nov 29 10:13:06 2016
From: parulk at caltech.edu (Kudtarkar, Parul V.)
Date: Tue, 29 Nov 2016 17:13:06 +0000
Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file
Message-ID: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu>

Dear Maker developers,

1.
We use assembled RNA-seq (from the same species) and protein evidence (from an evolutionarily close species) to generate the training gene structures (1st iteration, est2genome=1, protein2genome=1).

2. This is then used to train the ab initio gene predictors SNAP and AUGUSTUS.

3. GeneMark-ES (version: GeneMark-ES / ET v.4.32) is used to produce a training data set with the command

gmes_petap.pl --sequence pmin_jelly.fa

4. We will be predicting genes using the results from SNAP, GeneMark and AUGUSTUS (2nd iteration, est2genome=0, protein2genome=0).

I have a couple of questions relating to GeneMark and AUGUSTUS.

1. AUGUSTUS

We do not have a species file for our species of interest or for an evolutionarily close species. The following command is used to generate the species file:

/autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting

AUGUSTUS is taking too long to compute the species file; is there a solution for this issue? Using a species file from another organism might generate false positives. Is it advisable in such situations not to use the AUGUSTUS model?

2. GeneMark

I used the gmhmm file generated in the GeneMark output directory; however, I encounter the following error:

-------------------------
STATUS: Parsing control files...
ERROR: You have failed to provide a value for 'gmhmme3' in the control files.

ERROR: You have failed to provide a value for 'probuild' in the control files.
---------------------

FYI
-----
maker_opts.ctl

#-----Gene Prediction
snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file
gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file
-----

Using SNAP for training the gene model yields over 6000-7000 additional genes. The model has a good cumulative AED value. I was hoping that, in addition to SNAP, I could use AUGUSTUS and GeneMark to train the gene model to fuse dispersed models so that the gene count is within the expected range.
Thanks and regards,
Parul

Sent from my iPhone

From dence at genetics.utah.edu Tue Nov 29 10:28:33 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 29 Nov 2016 17:28:33 +0000
Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file
In-Reply-To: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu>
References: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu>
Message-ID: <359BAE14-18C2-4B91-A628-9613F94C8468@genetics.utah.edu>

Hi Parul,

Training Augustus does take a long time, much longer than for the other two predictors that you mentioned. Have you tried using the webAugustus web portal? The team that made Augustus runs it and can probably help you with troubleshooting their page for creating training sets:

http://bioinf.uni-greifswald.de/webaugustus/training/create

The error that you got regarding GeneMark is saying that MAKER can't find the genemark and probuild executable files. These are specified in the maker_exe.ctl file, not the "opts" file. You need to put valid paths to those executable files in for the given parameters. This is something that is usually specified during installation of MAKER.

Hope that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Nov 29, 2016, at 10:13 AM, Kudtarkar, Parul V. wrote:

Dear Maker developers,

1. We use assembled RNAseq (from same species) and protein evidence (from evolutionary close species) to generate training gene structure (1st iteration, est2genome=1, protein2genome=1).

2. This is then used to train ab initio gene predictors, SNAP and AUGUSTUS.

3. GeneMarkES (version: GeneMark-ES / ET v.4.32) is used to produce training data-set with the command

gmes_petap.pl --sequence pmin_jelly.fa

4.
We would be predicting genes using results from SNAP, Genemark and AUGUSTUS(2nd iteration, est2genome=0, protein2genome=0) I have couple of questions relating to Genemark and AUGUSTUS 1. AUGUSTUS We do not have a species file for species file of our interest or evolutionary closer species following command is used to generate species file /autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting AUGUSTUS is taking too long to compute species file, is there a solution for this issue. Using species file from other organism might generate false positives. Is it advised in such situations to not used AUGUSTUS model? 2. Genemark I used the gmhmm file generated in the genemark output directory, however I encounter following error ------------------------- STATUS: Parsing control files... ERROR: You have failed to provide a value for 'gmhmme3' in the control files. ERROR: You have failed to provide a value for 'probuild' in the control files. --------------------- FYI ----- maker_opts.ctl #-----Gene Prediction snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file ----- Using SNAP for training gene model yields over 6000-7000 additional gene. The model has good cumulative AED value. I was hoping in addition to SNAP, if I could use AUGUSTUS and GeneMark to train the gene model to fuse dispersed models so that the gene count is within the expected range. Thanks and regards, Parul Sent from my iPhone _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
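The two ERROR lines in the question name keys that MAKER found empty. A quick way to spot such entries is to list every key in a control file that has no value. The sketch below is only an illustration: the function name empty_ctl_keys is made up for this example, and it assumes the "key=value #comment" layout seen in the maker_opts.ctl excerpt quoted in the thread.

```shell
# Hypothetical helper: print every key in a MAKER control file whose value is empty.
# Assumes the "key=value #comment" layout shown in the maker_opts.ctl excerpt.
empty_ctl_keys() {
    awk '
        /^[[:space:]]*#/ { next }          # skip comment-only lines
        /=/ {
            line = $0
            sub(/#.*/, "", line)           # drop the trailing comment
            key = line; sub(/=.*/, "", key)
            val = line; sub(/^[^=]*=/, "", val)
            gsub(/[[:space:]]/, "", key)
            gsub(/[[:space:]]/, "", val)
            if (key != "" && val == "") print key
        }
    ' "$1"
}

# Usage: empty_ctl_keys maker_exe.ctl
```

If gmhmme3 or probuild show up in the output, those are the entries that still need a path to the corresponding executable.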
From carsonhh at gmail.com Tue Nov 29 10:34:31 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 Nov 2016 10:34:31 -0700
Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file
In-Reply-To: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu>
References: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu>
Message-ID: <596EAC73-4DB5-4144-A8EA-0E955AA0E028@gmail.com>

How to train Augustus -> http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html

Step 2 shows how to create an empty species to start training with. Then Step 4 (optimize_augustus.pl) is the step that takes a while.

Then for GeneMark, you must set the location of the necessary GeneMark executables in the maker_exe.ctl file.

After getting all predictors trained, and running a few contigs, take a moment to review predictor performance by manually inspecting the results in something like Apollo. It is not uncommon for one or more predictors to perform poorly on an organism (they should each produce similar predictions). If one is significantly off relative to the other predictors and the evidence, it should be dropped. A badly behaving predictor will reduce the overall annotation performance.

-Carson

From parulk at caltech.edu Tue Nov 29 16:40:30 2016
From: parulk at caltech.edu (Kudtarkar, Parul V.)
Date: Tue, 29 Nov 2016 23:40:30 +0000
Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file
In-Reply-To: <596EAC73-4DB5-4144-A8EA-0E955AA0E028@gmail.com>
References: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu>, <596EAC73-4DB5-4144-A8EA-0E955AA0E028@gmail.com>
Message-ID:

Dear Carson and Daniel,

Thanks for getting back to me promptly. Adding the path to the GeneMark executable in maker_exe.ctl fixes the error. Hopefully optimize_augustus.pl runs quicker than autoAug.pl (which has been running for almost a week now).

It will be interesting, and we look forward to evaluating which model best fits our expected gene count and AED values and has recognizable domains.

PS. We think BUSCO has helped us to evaluate gene model completeness.

Thanks,
Parul

----
Parul Kudtarkar
Bioinformatician
Biology and Biological Engineering
Office: 278 Beckman Institute
California Institute of Technology
MC 139-74
Pasadena CA 91125
http://www.echinobase.org

From carsonhh at gmail.com Wed Nov 30 12:24:36 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Nov 2016 12:24:36 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To:
References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com> <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com> <3C668404-EA3C-46B4-9676-8F95E2AFB64F@gmail.com>
Message-ID:

Yes. You can either separate out the contig using fasta_tool or find the contig in the datastore directory (failed contigs will have a fasta file created there just for the failed contig). Then you can use 'maker -g contig.fasta -base original_base_name' (-g and -base options) to specify that you want it to use the new contig fasta but write results to the given base directory (i.e. the same as the previous output directory). Remember to set -t (or tries in the maker_opts.ctl file) to a higher count when doing this.

-Carson

> On Nov 30, 2016, at 12:11 PM, John Cornelius wrote:
>
> Awesome! Thanks for the help. MAKER finally finished its initial run today; however, I noticed that there was still one large sequence that failed. Would it be possible to run MAKER on just that sequence and then combine the result of that run with the output of my main MAKER run?
>
> On Thu, Nov 24, 2016 at 7:04 PM, Carson Holt wrote:
> A lock failure can become an issue if two separate jobs are running simultaneously. They may both try to process the same contig at the same time (modifying each other's files), which will cause one or both to fail. On failure, it should always retry at some later point. So it can usually recover from this.
> If you see any partial lines in the resulting GFF3, then it did not recover and you need to just rerun whatever contig this happened on.
>
> -Carson
>
>> On Nov 18, 2016, at 12:14 PM, John Cornelius wrote:
>>
>> Would the lock failure cause problems with the annotation? It looks like MAKER is still progressing, just not as quickly as I thought it would be.
>>
>> On Thu, Nov 17, 2016 at 9:04 PM, Carson Holt wrote:
>> To use less RAM, try lowering max_dna_len=, setting blast_depth= parameters to 20 or 30 in maker_bopts.ctl (default is limitless), or when using MPI, starting fewer processes per node (requires manipulation of the hostfile or using the round robin distribution flag for MPI flavors where it is available).
>>
>> The memory issue could be causing the lock failure as well.
>>
>> -Carson
>>
>>> On Nov 17, 2016, at 7:53 PM, John Cornelius wrote:
>>>
>>> Ok, so I went and searched one of the output logs for all the lines that say ERROR and I got 44 lines with the following message:
>>>
>>> ERROR: Lock broken in runlog
>>>
>>> With these lines found at the end:
>>>
>>> ERROR: Failed while polishig ESTs
>>> ERROR: Chunk failed at level:2, tier_type:3
>>> ERROR: Could not query process table: Cannot allocate memory at /packages/maker/2.31.8/bin/../lib/Proc/ProcessTable_simple.pm line 62.
>>>
>>> From that last line it looks like the process is running out of RAM. Would that be right? Thanks.
>>>
>>> On Fri, Nov 11, 2016 at 2:59 PM, Carson Holt wrote:
>>> The cause of the error is probably further back in the STDERR. With MPI, so many processes are producing status and notes that you can get several seconds of output after a failure. If you kept the whole STDERR, I can help you look through it. Searching for "ERROR" in all caps is usually where you will see it. Also, MAKER keeps a log of progress, so even on failure you can just restart it and it will pick up the analysis from the last successful step.
>>>
>>> -Carson
>>>
>>>> On Nov 10, 2016, at 3:43 PM, John Cornelius wrote:
>>>>
>>>> Hello, I'm using MAKER to annotate a tetraploid genome and while running it, I encountered the following error:
>>>>
>>>> #--------- command -------------#
>>>> Widget::exonerate::est2genome:
>>>> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/15/TRINITY_GG_19079_c1670_g1_i1.for.84770203-84771247.15.fasta -t /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.15.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.TRINITY_GG_19079_c1670_g1_i1.e.exonerate
>>>> #-------------------------------#
>>>> running est2genome search.
>>>> #--------- command -------------#
>>>> Widget::exonerate::est2genome:
>>>> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/10/TRINITY_GG_87963_c9694_g10_i12.for.49475083-49475985.10.fasta -t /tmp/maker_08Elxf/10/chr6L.49475083-49475985.10.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/10/chr6L.49475083-49475985.TRINITY_GG_87963_c9694_g10_i12.e.exonerate
>>>> #-------------------------------#
>>>>
>>>> ===================================================================================
>>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>> = PID 132376 RUNNING AT pnap-pe7-s03
>>>> = EXIT CODE: 135
>>>> = CLEANING UP REMAINING PROCESSES
>>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>> ===================================================================================
>>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7)
>>>> This typically refers to a problem with your application.
>>>> Please see the FAQ page for debugging suggestions
>>>>
>>>> The command I ran was the following:
>>>>
>>>> #PBS -l walltime=240:00:00
>>>> #PBS -N MAKER
>>>> #PBS -l nodes=1:ppn=16
>>>> ##PBS -q hmem
>>>> #PBS -j oe
>>>> #PBS -m abe
>>>> #PBS -M jcornelius at tgen.org
>>>> #PBS -A tgen-205000
>>>> #PBS -o /scratch/jcornelius/xenopus_laevis/maker_run
>>>>
>>>> # --- load required modules --- #
>>>>
>>>> module load maker
>>>>
>>>> # --- run maker --- #
>>>>
>>>> cd /scratch/jcornelius/xenopus_laevis/maker_run
>>>> mpiexec -n 16 maker -base XLNEURO.run1 -fix_nucleotides
>>>>
>>>> I'm not sure what could be causing this error and any help would be much appreciated. Thanks.
>>>> --
>>>> John Cornelius
>>>> MCB PhD Candidate
>>>> Arizona State University

From FeatherstonJ at arc.agric.za Tue Nov 1 09:12:46 2016
From: FeatherstonJ at arc.agric.za (Jonathan Featherston)
Date: Tue, 1 Nov 2016 15:12:46 +0000
Subject: [maker-devel] [Caution: Message contains Redirect URL content] InterProScan protein domain & AED physical evidence filtering
In-Reply-To:
References:
Message-ID: <0C2463EA-53FD-4C9B-853A-BE933973E1FA@arc.agric.za>

Dear Allison

I'm not sure about your extra gene models, but here is the script to perform quality filtering. It is a Perl script I got from the forum somewhere (changed to .txt in case it gets removed by the mail server).

Regards
Jonathan
-------------- next part --------------
#!/usr/bin/perl -w
use strict;
#use lib ('/home/mcampbell/lib');
#use PostData;
use Getopt::Std;
use vars qw($opt_s $opt_d $opt_a $opt_p $opt_c $opt_m $opt_u);
getopts('sda:pcmu');
use FileHandle;

#-----------------------------------------------------------------------------
#----------------------------------- MAIN ------------------------------------
#-----------------------------------------------------------------------------
my $usage = "\n\nquality_filter.pl: generates default and standard gene builds from a MAKER-generated gff3_file with iprscan data pushed onto column 9 using ipr_update_gff.\n
USAGE: quality_filter.pl -[options]\n
OPTIONS: -d Prints transcripts with an AED <1 (MAKER default)
         -s Prints transcripts with an AED <1 and/or Pfam domain if in gff3 (MAKER Standard)
         -a Prints transcripts with an AED < the given value\n\n";

my $FILE1 = $ARGV[0];
#my $FILE2 = $ARGV[1];
die($usage) unless $ARGV[0] && ($opt_a || $opt_s || $opt_d);

my %LU_G;   # gene IDs to keep
my %LU_T;   # transcript IDs to keep

build_lus($FILE1);
#build_lu_tid($FILE2);
filter($FILE1);
#PostData(\%LU_G);
#PostData(\%LU_T);

#-----------------------------------------------------------------------------
#---------------------------------- SUBS -------------------------------------
#-----------------------------------------------------------------------------
# First pass: record the mRNA (and parent gene) IDs that pass the chosen filter.
sub build_lus{
    my $file = shift;
    my %data;
    my $fh = FileHandle->new;
    $fh->open($file) or die "Cannot open $file: $!\n";

    while (defined(my $line = <$fh>)){
        chomp($line);
        last if $line =~ /^\#\#FASTA/;
        next if $line =~ /^\#/;
        my @array = split(/\t/, $line);
        next unless $array[2] =~ /mRNA/;
        my ($tid) = $array[8] =~ /ID\=(.+?);.*/;
        my ($gid) = $array[8] =~ /Parent\=(.+?);.*/;

        # parse column 9 into key/value pairs
        my @c9 = split(/\;/, $array[8]);
        foreach my $x (@c9){
            my ($k,$v) = $x =~ /(.+)\=(.+)/;
            next unless defined $k;
            $data{$k}=$v;
        }

        #load the LU
        if ($opt_s && (($data{'Dbxref'} && $data{'Dbxref'} =~ /Pfam/) || $data{'_AED'} < 1)){
            $LU_G{$gid}=1;
            $LU_T{$tid}=1;
        }
        elsif ($opt_d && $data{'_AED'} < 1){
            $LU_G{$gid}=1;
            $LU_T{$tid}=1;
        }
        elsif ($opt_a && $data{'_AED'} < $opt_a){
            $LU_G{$gid}=1;
            $LU_T{$tid}=1;
        }
        undef %data;
    }
    $fh->close();
}
#-----------------------------------------------------------------------------
# Second pass: print only the features that belong to retained genes/transcripts.
sub filter{
    my $file = shift;
    my $fh = FileHandle->new;
    $fh->open($file) or die "Cannot open $file: $!\n";

    print "##gff-version 3\n";
    while (defined(my $line = <$fh>)){
        chomp($line);
        last if $line =~ /^\#\#FASTA/;
        next if $line =~ /^\#/;
        my @array = split(/\t/, $line);

        if ($array[2] eq 'gene'){
            my ($id) = $array[8] =~ /ID=(\S+?);/;
            print $line."\n" if defined($LU_G{$id});
        }
        elsif ($array[2] eq 'mRNA'){
            my ($id) = $array[8] =~ /ID=(\S+?);/;
            print $line."\n" if defined($LU_T{$id});
        }
        elsif ($array[2] eq 'exon' || $array[2] eq 'CDS' ||
               $array[2] eq 'three_prime_UTR' || $array[2] eq 'five_prime_UTR'){
            my $bool = 0;
            my ($ids) = $array[8] =~ /Parent=(\S+);?/;
            # my ($ids) = $array[8] =~ /Parent=(\S+?);/;
            $ids =~ s/;//;
            my @ids_array = split(/,/, $ids);
            foreach my $x (@ids_array){
                if (defined($LU_T{$x})){
                    $bool++;
                }
                else{
                    # strip rejected parents from the Parent list
                    $line =~ s/\Q$x\E[^:]//;
                }
            }
            print $line."\n" if $bool;
        }
        else{
            print $line."\n";
        }
    }
    $fh->close();
}
#-----------------------------------------------------------------------------
sub build_lu_gid{
    my $file = shift;
    my $fh = FileHandle->new;
    $fh->open($file) or die "Cannot open $file: $!\n";

    while (defined(my $line = <$fh>)){
        chomp($line);
        $LU_G{$line}=1;
    }
    $fh->close();
}
#-----------------------------------------------------------------------------
sub build_lu_tid{
    my $file = shift;
    my $fh = FileHandle->new;
    $fh->open($file) or die "Cannot open $file: $!\n";

    while (defined(my $line = <$fh>)){
        chomp($line);
        last if $line =~ /^\#\#FASTA/;
        next if $line =~ /^\#/;
        my @array = split(/\t/, $line);
        if ($array[2] =~ 'mRNA'){
            my ($tid) = $line =~ /ID=(.+?);/;
            my ($gid) = $line =~ /Parent=(.+?);/;
            if (defined($LU_G{$gid})){
                $LU_T{$tid}=1;
            }
        }
    }
    $fh->close();
}
#-----------------------------------------------------------------------------
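Before choosing between the -d, -s and -a options of the script above, it can help to see how the mRNA features split around an AED cutoff. The following is a minimal sketch rather than part of MAKER: the function name aed_tally is made up for this example, and it assumes the _AED= attribute sits in column 9 of mRNA lines, which is what the script itself reads.

```shell
# Hypothetical helper: count mRNA features with _AED below a cutoff vs. at or above it.
aed_tally() {
    # $1 = GFF3 file, $2 = AED cutoff (defaults to 1)
    awk -F '\t' -v cutoff="${2:-1}" '
        /^##FASTA/ { exit }                # stop at the embedded sequence section
        /^#/ { next }
        $3 == "mRNA" {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^_AED=/) {
                    aed = substr(attrs[i], 6) + 0
                    if (aed < cutoff) below++; else rest++
                }
            }
        }
        END { printf "below=%d at_or_above=%d\n", below, rest }
    ' "$1"
}

# Usage: aed_tally maker_annotations.gff 1
```

The counts are per transcript, not per gene, so they will not match gene totals exactly when genes have several isoforms.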
From carsonhh at gmail.com Tue Nov 1 09:43:21 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 1 Nov 2016 09:43:21 -0600
Subject: [maker-devel] [Caution: Message contains Redirect URL content] InterProScan protein domain & AED physical evidence filtering
In-Reply-To: <0C2463EA-53FD-4C9B-853A-BE933973E1FA@arc.agric.za>
References: <0C2463EA-53FD-4C9B-853A-BE933973E1FA@arc.agric.za>
Message-ID:

One note I'd like to make is that doing a second round with keep_preds=1 is the wrong procedure (only do that if you really want to keep everything, e.g. in some fungi or oomycetes). Rather, you should use InterProScan to evaluate the rejected models in the non-overlapping.abinit.proteins.fasta file, then grep the ones that have an IPR domain out of the GFF3 (they will be match/match_part features) and then pass them to pred_gff in a separate run (this just updates the format to gene/mRNA/exon/CDS with proper reading frame). You can then merge the resulting GFF3 and fasta files.

The reason there are differences between the runs is that there are models with AED less than 1 that get rejected for other reasons and are brought back with keep_preds=1. For example, if the only evidence is a protein alignment that has deep overlapping HSPs (an extremely low complexity alignment), the model will be filtered out even though AED is not technically equal to 1. Also, if the overlapping protein evidence is in a different reading frame than the model it is supposed to support, then the AED will be less than 1 but eAED (extended AED) will be 1, and the model will be rejected.

-Carson

>> Hello MAKER google group,
>>
>> For the final round of a MAKER annotation for a de novo plant genome assembly, I ran MAKER twice: once with keep_preds=0, which annotated 20,284 genes, and once with keep_preds=1, which annotated 34,055 genes.
>>
>> I ran the 34,055 genes (the keep_preds=1 set) through InterProScan to search the MAKER predictions for protein domain content and added this IPRScan output into the MAKER gff file with the ipr_update_gff accessory script.
>>
>> The game plan is to go through the 34,055 genes and remove any gene model that doesn't have either protein domain content or physical evidence. I am counting genes that have an AED=1 as the genes that don't have physical evidence.
>>
>> I have two questions:
>>
>> 1. I count 11,762 genes that have AED=1.0 in the keep_preds=1 annotation set, which leaves me with 22,293 genes that I'm assuming have some physical evidence (34,055-11,762=22,293). But when I ran MAKER with keep_preds=0 originally, I only count 20,284 genes. What are the extra ~2,000 genes that are being annotated in the keep_preds=1 run that have an AED score of less than 1.0, but are not being annotated in the keep_preds=0 run?
>>
>> 2. My second question is whether there is an accessory script available that will remove genes that lack either the IPRScan protein domains or physical evidence (AED < 1). This type of gene removal was mentioned in a previous post from 2012 (https://groups.google.com/forum/#!searchin/maker-devel/sorry$20there$27s$20not$20a$20script$20prepackaged$20with$20MAKER$20for$20that$20yet.%7Csort:relevance/maker-devel/VaoXWlGHOjs/EElr_otrK8QJ) and I was just wondering if since then someone wrote a script that will do this for me.
>>
>> If anyone could offer me any feedback, that would be greatly appreciated!
>>
>> Thank you,
>>
>> Allison
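The "grep the ones that have an IPR domain out of the GFF3" step could be sketched roughly as below. This is an illustration under assumptions, not a MAKER tool: the function name extract_matches is made up, the name list is assumed to hold one model name per line (for example taken from the InterProScan results), and each name is assumed to equal the Name= attribute of a match feature in the GFF3. Please check that assumption against your own files. The sketch also assumes each match line precedes its match_part lines.

```shell
# Hypothetical helper: pull match/match_part features for a list of model names.
# Assumption: each listed name equals the Name= attribute of a "match" feature.
extract_matches() {
    # $1 = non-empty file with one model name per line, $2 = GFF3 file
    awk -F '\t' '
        function attr(col9, key,    n, i, parts) {
            n = split(col9, parts, ";")
            for (i = 1; i <= n; i++)
                if (index(parts[i], key "=") == 1)
                    return substr(parts[i], length(key) + 2)
            return ""
        }
        NR == FNR { wanted[$1] = 1; next }     # first file: the name list
        /^##FASTA/ { exit }
        /^#/ { next }
        $3 == "match" {
            name = attr($9, "Name")
            if (name in wanted) { kept[attr($9, "ID")] = 1; print }
            next
        }
        $3 == "match_part" {
            parent = attr($9, "Parent")
            if (parent in kept) print
        }
    ' "$1" "$2"
}

# Usage: extract_matches ipr_hit_names.txt genome.all.gff > ipr_models.gff
```

The resulting file is what would then be given to pred_gff in the separate run described above.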
From jacques.dainat at bils.se Tue Nov 1 10:08:45 2016
From: jacques.dainat at bils.se (Jacques Dainat)
Date: Tue, 1 Nov 2016 17:08:45 +0100
Subject: [maker-devel] est_gff input does not provide any gene model
In-Reply-To:
References:
Message-ID: <29E6299A-EA5F-4768-88CD-202ABB05AF89@bils.se>

Thank you for the quick confirmation!

Just for clarification, what I provided to MAKER was a correct GFF3 file that does contain gene, mRNA and exon types but does not contain any CDS.

I haven't seen any information about the particular GFF3 feature types expected for the est_gff files supplied. I think you should communicate more about it (within maker_opts.ctl?).
It would be nice to stop the pipeline if the file provided contains no usable information (and also when the file provided doesn't exist; the warning is not easy to catch when launching on a cluster).

A last question: are the scores from the score column of the est_gff file used by MAKER?

Jacques

> On 01 Nov 2016, at 04:24, Carson Holt wrote:
>
> Evidence such as est_gff has to follow the alignment format used by GFF3 (i.e. match/match_part) whereas you are providing gene models (i.e. gene/mRNA/exon/CDS). Note that match/match_part are two-level features whereas gene models are three levels. You need to reformat to match/match_part.
>
> -Carson
>
>> On Oct 31, 2016, at 4:51 AM, Jacques Dainat wrote:
>>
>> Hello,
>>
>> I usually use Cufflinks output to feed MAKER through the est_gff parameter; combined with the est2genome=1 parameter, I get the wanted output.
>> This time I used StringTie output to feed MAKER, but I don't get any gene model predicted using the est2genome parameter.
>>
>> Any explanation? Is it due to the GFF3 format differences between these two files?
>>
>> Cufflinks output example:
>> Pnalgiovense_4592 Cufflinks match 363 977 17.844829 - . ID=1:s3_c1_r1.4.2;Name=1:s3_c1_r1.4.2;
>> Pnalgiovense_4592 Cufflinks match_part 363 666 17.844829 - . ID=1:s3_c1_r1.4.2:exon-1;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 1 304 +;
>> Pnalgiovense_4592 Cufflinks match_part 743 977 17.844829 - . ID=1:s3_c1_r1.4.2:exon-2;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 305 539 +;
>>
>> StringTie output example:
>> Pnalgiovense_112 StringTie gene 20 1256 1000 + . ID=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1
>> Pnalgiovense_112 StringTie mRNA 20 1256 1000 + . ID=HtMm_All.12253.1;Parent=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1
>> Pnalgiovense_112 StringTie exon 20 1256 1000 + . ID=HtMm_All.12253.1-exon-1;Parent=HtMm_All.12253.1;cov=8.028295;exon_number=1;gene_id=HtMm_All.12253;transcript_id=HtMm_All.12253.1
>>
>> If it's the StringTie output that is problematic, how can I fix it? Is removing gene, changing mRNA to match and exon to match_part enough?
>>
>> Best regards,
>>
>> Jacques Dainat, PhD
>> NBIS (National Bioinformatics Infrastructure Sweden)
>> Genome Annotation Service
>>
>> Address: (room E10:4204 - last floor)
>> Uppsala University, BMC
>> Department of Medical Biochemistry Microbiology, Genomics
>> Husargatan 3, box 582
>> S-75123 Uppsala Sweden
>> Phone: 01 84 71 46 25
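The reformatting discussed in this thread (three-level gene/mRNA/exon to two-level match/match_part) could be sketched as follows. This is a guess at a minimal conversion and has not been tested against MAKER itself: the function name to_match_gff is made up for this example, feature types other than gene, mRNA and exon are silently dropped, and no Target attribute (present in the Cufflinks example) is generated.

```shell
# Sketch only: rewrite gene/mRNA/exon records as two-level match/match_part.
# Untested against MAKER; Target attributes are not generated here.
to_match_gff() {
    # $1 = tab-separated GFF3 file with gene/mRNA/exon features
    awk 'BEGIN { FS = OFS = "\t" }
        /^#/ { print; next }                    # keep header and comment lines
        $3 == "gene" { next }                   # drop the top level
        $3 == "mRNA" {
            $3 = "match"
            # remove the Parent attribute that pointed at the dropped gene
            gsub(/(^|;)Parent=[^;]*/, "", $9)
            sub(/^;/, "", $9)
            print; next
        }
        $3 == "exon" { $3 = "match_part"; print; next }
    ' "$1"
}

# Usage: to_match_gff stringtie.gff3 > stringtie.match.gff3
```

Exon lines keep their Parent attribute, which already points at the transcript ID that becomes the match feature.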
From carsonhh at gmail.com Tue Nov 1 10:25:36 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 1 Nov 2016 10:25:36 -0600
Subject: [maker-devel] est_gff input does not provide any gene model
In-Reply-To: <29E6299A-EA5F-4768-88CD-202ABB05AF89@bils.se>
References: <29E6299A-EA5F-4768-88CD-202ABB05AF89@bils.se>
Message-ID: <923C15DF-D705-416C-BCB8-CB87F1309797@gmail.com>

The score will be ignored. The format to be used for evidence alignments is specified in the GFF3 spec (https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md). An EST alignment example is also given as part of the GFF3 spec.

-Carson

From mohamed.amine.chebbi at univ-poitiers.fr Wed Nov 2 12:09:54 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (Mohamed Amine Chebbi)
Date: Wed, 2 Nov 2016 19:09:54 +0100 (CET)
Subject: [maker-devel] ProtExcluder1.2 Error
Message-ID: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>

Hi!

I am working on creating a custom repeat library and I want to use ProtExcluder1.2 to trim potential genes from my repeat sequences.
My BLAST version is BLAST 2.2.30+.

I get this error message:

Can not open the seqfile test.lib_blast_results.txt.fnolowm50seq
mergeunmatchedregion.pl seqfile
Illegal division by zero at ProtExcluder1.2/GCcontent.pl line 122.

I wonder if you can help me fix this.

Thank you.

Amine

From michael.s.campbell1 at gmail.com Thu Nov 3 11:57:35 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Thu, 3 Nov 2016 13:57:35 -0400
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr>
Message-ID:

Hi Amine,

That script is maintained by Ning Jiang and Kevin Childs. They know best what this script is expecting.
I?ve ccd them on this email in the hope that they can provide some direction. Thanks, Mike > On Nov 2, 2016, at 2:09 PM, Mohamed Amine Chebbi wrote: > > Hi! > > I am working on creating a custom repeat library and I want to use ProtExcluder1.2 to trim potential genes from my repeat sequences. > My blast version is BLAST 2.2.30+ > > I get this message error : > > Can not open the seqfile test.lib_blast_results.txt.fnolowm50seq > mergeunmatchedregion.pl seqfile > Illegal division by zero at ProtExcluder1.2/GCcontent.pl line 122. > > I wonder if you can help me to fix this. > > Thank you. > > Amine > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From psh65 at cornell.edu Thu Nov 3 12:14:17 2016 From: psh65 at cornell.edu (Prashant S Hosmani) Date: Thu, 3 Nov 2016 18:14:17 +0000 Subject: [maker-devel] ProtExcluder1.2 Error In-Reply-To: References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr> Message-ID: Hi Amine, I was getting similar error. You need to be careful with the blast versions. Try using the same blast version for makeblastdb. I was using BLAST 2.2.29+. After recreating new blast database with same version, it worked for me. Hope this helps. Prashant Prashant Hosmani Sol Genomics Network Boyce Thompson Institute, Ithaca, NY, USA On Nov 3, 2016, at 1:57 PM, Michael Campbell > wrote: Hi Amine, That script is maintained by Ning Jiang and Kevin Childs. They know best what this script is expecting. I?ve ccd them on this email in the hope that they can provide some direction. Thanks, Mike On Nov 2, 2016, at 2:09 PM, Mohamed Amine Chebbi > wrote: Hi! I am working on creating a custom repeat library and I want to use ProtExcluder1.2 to trim potential genes from my repeat sequences. 
My blast version is BLAST 2.2.30+ I get this message error : Can not open the seqfile test.lib_blast_results.txt.fnolowm50seq mergeunmatchedregion.pl seqfile Illegal division by zero at ProtExcluder1.2/GCcontent.pl line 122. I wonder if you can help me to fix this. Thank you. Amine _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott at scottcain.net Fri Nov 4 13:25:02 2016 From: scott at scottcain.net (Scott Cain) Date: Fri, 4 Nov 2016 15:25:02 -0400 Subject: [maker-devel] Last Call for GMOD talks at PAG Message-ID: Time is short! If you want to attend PAG and would like to present on a topic that would be of interest to the GMOD community, please send an abstract or at least a descriptive title to help at gmod.org. Types of talks typically include updates on GMOD software projects, usage stories for successful sites, proposals for new GMOD projects and descriptions of plugins for existing GMOD software projects like Tripal , JBrowse and Galaxy . Please consider giving a talk and sharing your experience and ideas! Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... 

From mohamed.amine.chebbi at univ-poitiers.fr Thu Nov 3 17:40:18 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (chebbi mohamed amine)
Date: Fri, 4 Nov 2016 00:40:18 +0100 (CET)
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <20161103185405.183337t1yq0no6x9@mail.msu.edu>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr> <20161103185405.183337t1yq0no6x9@mail.msu.edu>
Message-ID: <1641376945.6912938.1478216418712.JavaMail.zimbra@univ-poitiers.fr>

Hi!

Thank you Prashant for sharing your experience. Indeed, using the same blast version 2.2.29 for makeblastdb seems to resolve the problem. It now appears to work fine for all the sequences except one, for which I get the message below:

Fatal exception (source file ../../easel/esl_sqio_ascii.c, line 2001):
Failed to fetch subsequence residues -- corrupt coords?
sh: line 1: 46520 Aborted (core dumped) /hmmer-3.1b2-linux-intel-x86_64/binaries/esl-sfetch -c 1242..19031 all-te.lib rnd-4_family-1731#DNA >> blastx_results-all-te.txt.fnolowm50seq

Did you encounter this problem before?

Thank you for your help.

Amine

From: jiangn at msu.edu
To: "Prashant S Hosmani"
Cc: "Michael Campbell", "Mohamed Amine Chebbi"
Sent: Thursday 3 November 2016 23:54:05
Subject: Re: [maker-devel] ProtExcluder1.2 Error

Hi Prashant,

Thank you so much for sharing your experience. It is important to keep everything in the same version. I will remind users about this when we update it and I may need to bother you then.

Best regards,

Ning

From mohamed.amine.chebbi at univ-poitiers.fr Fri Nov 4 04:44:02 2016
From: mohamed.amine.chebbi at univ-poitiers.fr (chebbi mohamed amine)
Date: Fri, 4 Nov 2016 11:44:02 +0100 (CET)
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <1827195032.6913929.1478217820889.JavaMail.zimbra@univ-poitiers.fr>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr> <20161103185405.183337t1yq0no6x9@mail.msu.edu> <1641376945.6912938.1478216418712.JavaMail.zimbra@univ-poitiers.fr> <20161103195409.76212s1yy72mv95t@mail.msu.edu> <1827195032.6913929.1478217820889.JavaMail.zimbra@univ-poitiers.fr>
Message-ID: <838628537.7128959.1478256242111.JavaMail.zimbra@univ-poitiers.fr>

Hi Jiangn!

I made some modifications to the script ProtExcluder1.2/mspesl-sfetch.pl by replacing:

"esl-sfetch --index $ARGV[0] " by "samtools faidx $ARGV[0]"
and
"esl-sfetch -c $from..$to $ARGV[0] $line[7] >> $ARGV[3]" by "samtools faidx $ARGV[0] $line[7]:$from-$to >> $ARGV[3]"

It works fine now and the script can extract the subsequences correctly.

Best regards,
Amine

From: "chebbi mohamed amine"
To: "jiangn"
Sent: Friday 4 November 2016 01:03:40
Subject: Re: [maker-devel] ProtExcluder1.2 Error

Hi Jiangn

In fact, this sequence has a size of 19031 bases.
When I try the command /hmmer-3.1b2-linux-intel-x86_64/binaries/esl-sfetch -c 1242..19031 all-te.lib rnd-4_family-1731#DNA I get the error; however, testing with end coordinates below 19031 works fine. I think the problem is related to hmmer. I will try to add the subsequence manually to the file .fnolowm50seq.

Thank you
Amine

From: "jiangn"
To: "chebbi mohamed amine"
Cc: "Prashant S Hosmani", "Michael Campbell"
Sent: Friday 4 November 2016 00:54:09
Subject: Re: [maker-devel] ProtExcluder1.2 Error

Hi Amine,

I don't have this kind of experience. If only one sequence failed, I would suspect there might be some format issue for that specific sequence.

Regards,

Ning

From jiangn at msu.edu Fri Nov 4 14:35:43 2016
From: jiangn at msu.edu (jiangn at msu.edu)
Date: Fri, 04 Nov 2016 16:35:43 -0400
Subject: [maker-devel] ProtExcluder1.2 Error
In-Reply-To: <838628537.7128959.1478256242111.JavaMail.zimbra@univ-poitiers.fr>
References: <236415532.6267908.1478110194546.JavaMail.zimbra@univ-poitiers.fr> <20161103185405.183337t1yq0no6x9@mail.msu.edu> <1641376945.6912938.1478216418712.JavaMail.zimbra@univ-poitiers.fr> <20161103195409.76212s1yy72mv95t@mail.msu.edu> <1827195032.6913929.1478217820889.JavaMail.zimbra@univ-poitiers.fr> <838628537.7128959.1478256242111.JavaMail.zimbra@univ-poitiers.fr>
Message-ID: <20161104163543.98626jb6y81eis67@mail.msu.edu>

Hi Amine,

That's good to know. Thank you!

Ning
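
The esl-sfetch failure in this thread occurred when the requested range ran to the last base of a 19031-base sequence, and the samtools faidx replacement takes the same kind of 1-based, inclusive coordinates. As a rough illustration of that kind of fetch with an explicit bounds check, here is a minimal Python sketch. The names parse_fasta and fetch_subsequence are hypothetical; they are not part of ProtExcluder, HMMER, or samtools, and the sketch makes no claim about why esl-sfetch rejected the range.

```python
def parse_fasta(text):
    """Return {sequence_id: sequence}; the id is the first word of each header."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def fetch_subsequence(records, seq_id, start, end):
    """Return residues start..end, 1-based and inclusive at both ends."""
    seq = records[seq_id]
    # Fail with a readable message instead of returning a silently truncated slice.
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"{seq_id}:{start}-{end} is outside 1..{len(seq)}")
    return seq[start - 1:end]


if __name__ == "__main__":
    library = parse_fasta(">rnd-example#DNA toy record\nACGT\nTTGA\n")
    # A range that ends exactly on the last base is valid.
    print(fetch_subsequence(library, "rnd-example#DNA", 3, 8))
```

Checking the range against the sequence length first makes it easy to tell a genuinely bad coordinate from a tool-specific problem with a range that ends on the final base.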

From pierre-francois.bert at inra.fr Tue Nov 8 05:13:55 2016
From: pierre-francois.bert at inra.fr (Pierre-Francois Bert)
Date: Tue, 8 Nov 2016 12:13:55 +0000
Subject: [maker-devel] Maker-P
Message-ID: <1478607235425.40152@inra.fr>

Hello,
I'm interested in using MAKER-P, but I can't find it in the latest version 3, nor can I find v2.29 to download.
Can you please tell me how to proceed?
Best wishes.
Pierre-François Bert

From carsonhh at gmail.com Wed Nov 9 12:00:08 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 9 Nov 2016 12:00:08 -0700
Subject: [maker-devel] Maker-P
In-Reply-To: <1478607235425.40152@inra.fr>
References: <1478607235425.40152@inra.fr>
Message-ID:

MAKER-P's features and accessory scripts were integrated into MAKER with versions 2.29 and above, as stated on the MAKER-P page. There is no longer a separate MAKER-P download and it is not a separate executable. You just download MAKER 2.29 or above and run .../maker/bin/maker

--Carson

From jcornel3 at asu.edu Thu Nov 10 15:43:56 2016
From: jcornel3 at asu.edu (John Cornelius)
Date: Thu, 10 Nov 2016 15:43:56 -0700
Subject: [maker-devel] Error running MAKER
Message-ID:

Hello, I'm using MAKER to annotate a tetraploid genome and while running it, I encountered the following error:

#--------- command -------------#
Widget::exonerate::est2genome:
/packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/15/TRINITY_GG_19079_c1670_g1_i1.for.84770203-84771247.15.fasta -t /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.15.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.TRINITY_GG_19079_c1670_g1_i1.e.exonerate
#-------------------------------#
running est2genome search.
#--------- command -------------#
Widget::exonerate::est2genome:
/packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/10/TRINITY_GG_87963_c9694_g10_i12.for.49475083-49475985.10.fasta -t /tmp/maker_08Elxf/10/chr6L.49475083-49475985.10.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/10/chr6L.49475083-49475985.TRINITY_GG_87963_c9694_g10_i12.e.exonerate
#-------------------------------#

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 132376 RUNNING AT pnap-pe7-s03
= EXIT CODE: 135
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

The command I ran was the following:

#PBS -l walltime=240:00:00
#PBS -N MAKER
#PBS -l nodes=1:ppn=16
##PBS -q hmem
#PBS -j oe
#PBS -m abe
#PBS -M jcornelius at tgen.org
#PBS -A tgen-205000
#PBS -o /scratch/jcornelius/xenopus_laevis/maker_run

# --- load required modules --- #

module load maker

# --- run maker --- #

cd /scratch/jcornelius/xenopus_laevis/maker_run
mpiexec -n 16 maker -base XLNEURO.run1 -fix_nucleotides

I'm not sure what could be causing this error and any help would be much appreciated. Thanks.
--
John Cornelius
MCB PhD Candidate
Arizona State University

From carsonhh at gmail.com Fri Nov 11 14:59:54 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 11 Nov 2016 14:59:54 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To:
References:
Message-ID: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com>

The cause of the error is probably further back in the STDERR. With MPI, so many processes are producing status messages and notes that you can get several seconds of output after a failure. If you kept the whole STDERR, I can help you look through it. Searching for "ERROR" in all caps is usually where you will see it.

Also, MAKER keeps a log of progress, so even on failure you can just restart it and it will pick up the analysis from the last successful step.

--Carson
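
For anyone in the same situation, searching a saved STDERR file for the all-caps string "ERROR" and printing a few lines of context around each hit can be done along these lines. This is only a sketch: the function name find_errors and the file name maker.stderr are assumptions for illustration, not anything MAKER itself provides.

```python
def find_errors(lines, keyword="ERROR", before=3, after=3):
    """Return (line_number, context_lines) for every line containing keyword.

    The match is case-sensitive on purpose, so the lowercase "Bus error"
    line printed by the MPI launcher is not reported as a hit.
    """
    lines = [line.rstrip("\n") for line in lines]
    hits = []
    for index, line in enumerate(lines):
        if keyword in line:
            first = max(0, index - before)
            last = min(len(lines), index + after + 1)
            hits.append((index + 1, lines[first:last]))
    return hits


if __name__ == "__main__":
    # Hypothetical file name: wherever the job's STDERR was saved.
    with open("maker.stderr") as handle:
        for number, context in find_errors(handle):
            print(f"--- hit at line {number} ---")
            print("\n".join(context))
```

With an MPI run, the first hit in the file is usually the most useful one to look at, since later output may come from other processes that were still running.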

From lmeunier at ulg.ac.be Mon Nov 14 01:50:50 2016
From: lmeunier at ulg.ac.be (Loïc)
Date: Mon, 14 Nov 2016 09:50:50 +0100
Subject: [maker-devel] Predictions without evidence
Message-ID:

Hello,

I am a Ph.D. student, and I am using MAKER to automate gene prediction for many genomes as part of a genome mining project, so I don't include evidence when running it.
If I understood correctly, when several gene predictors are used, AED is used to select the prediction that best matches the evidence.

So, as I don't use evidence, does MAKER still make a choice when working with multiple gene predictors? If yes, how does it work?
Also, I have not fully understood whether the selection of which gene predictor to use is made for every gene.

Sorry for asking if the answer is obvious, but after reading your papers and looking through the archived posts, I have not found the answer.

By the way, I also have a question about your paper on MAKER2 (Holt and Yandell, 2011). It is said many times that gene predictors used in the MAKER pipeline give better results than when used alone, but I have not understood why. Can you explain this?

Best regards,

Loïc Meunier

From jacques.dainat at bils.se Mon Nov 14 01:55:06 2016
From: jacques.dainat at bils.se (Jacques Dainat)
Date: Mon, 14 Nov 2016 09:55:06 +0100
Subject: [maker-devel] strand of single exon EST from fasta
Message-ID: <2E91C252-D244-47A2-B896-99EE0F69EBBA@bils.se>

Hello,

I'm annotating several strains of the same fungus, and I have stranded RNAseq for all of them. I'm using MAKER3.
Let's say I'm annotating species1 using its species-specific assembled transcripts, which are in gff. I know that MAKER cannot do anything about the strand coming from the est_gff. In order to check that everything went fine during my transcriptome assembly and that the strands were correctly defined, I checked the annotation within a browser. I can see the strands from my transcripts in gff format were perfect (they match the protein strands and the ab initio prediction strands, and the ORFs are OK).

As I wanted to take advantage of the RNAseq from my other strains, I decided to use them within this annotation. As the transcriptome assemblies of these RNAseq were done based on their corresponding genomes, I cannot use the gff files. Indeed, the locations do not correspond to the genome of my species1. So I decided to extract the sequences in fasta format to feed to MAKER (alt_est parameter).
When I visualised those transcript alignments I was really surprised by the strands decided by MAKER. They seem completely random, even though all the est fasta sequences from the same locus are given in the same strand.

So, I have two questions:
1) How is the strand decided for a single exon EST provided in fasta format? (I thought it was based on the longest ORF)
2) Is it normal that the second annotation using these alt_est is worse (far fewer gene models) than the previous one? (I thought the strand of my single exon alt_ests would not play a role during the annotation process. Or maybe it's another bias from these alt_est => loci less well defined?)

Here are 3 examples: The top green track has the correct strand and is based on the gff file. The bottom green cluster tracks are fasta sequences from the other strains aligned through MAKER. (I don't know if it could play a role, but all sequences from the same locus have been sent to MAKER in the same strand.)

Thank you very much for your help,

Jacques Dainat

[Three PNG screenshots were attached and scrubbed from the archive: Screen Shot 2016-11-13 at 13.05.24.png (52019 bytes), Screen Shot 2016-11-13 at 13.05.44.png (26966 bytes), Screen Shot 2016-11-13 at 13.07.13.png (24338 bytes).]

From carsonhh at gmail.com Mon Nov 14 13:08:13 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 14 Nov 2016 13:08:13 -0700
Subject: [maker-devel] strand of single exon EST from fasta
In-Reply-To: <2E91C252-D244-47A2-B896-99EE0F69EBBA@bils.se>
References: <2E91C252-D244-47A2-B896-99EE0F69EBBA@bils.se>
Message-ID:

Single exon EST and alt-EST strand are based on the longest ORF. In the event of a tie, whatever strand was assigned by the aligner is maintained. alt-ESTs are less likely to align or produce a model than the ESTs. If you have competing models on opposite strands for the same CDS, then support from ab initio, spliced EST, or exonerate protein alignments will be needed for the model.

--Carson

From carsonhh at gmail.com Mon Nov 14 13:18:26 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 14 Nov 2016 13:18:26 -0700
Subject: [maker-devel] Predictions without evidence
In-Reply-To:
References:
Message-ID: <7BDEAAF4-230C-4315-B353-43381237BCB0@gmail.com>

Gene predictors have to be trained on each organism to generate a matched HMM. If they are not trained, they will not work well. MAKER also sends hints to the predictor based on the evidence alignments to further alter the probabilities used by the predictor so that it better matches the evidence. Evidence is also used in final filtering. All models without evidence will have an AED of 1, which means no support.

Not using evidence will result in very poor models, especially if you don't have an HMM built exactly for the organism. The main problem will be overprediction. Note the behavior of SNAP alone in the MAKER2 paper. The result is tens of thousands of false positive gene models.

If you only run multiple gene predictors without evidence, the final model will be whatever model has the best consensus structure for the set. If the set consists of two models, then there is no consensus and the longest one is kept.

--Carson
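
The statement that a model without evidence has an AED of 1 follows from the way AED is defined in the MAKER2 paper: AED = 1 - (SN + SP) / 2, where SN is the fraction of evidence bases covered by the model and SP is the fraction of model bases covered by evidence. The following is a minimal base-level sketch in Python. The function names are hypothetical, intervals are assumed to be 1-based and inclusive, and MAKER's real implementation applies further rules (for example when aggregating and filtering evidence) that are not reproduced here.

```python
def covered_bases(intervals):
    """Set of positions covered by inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model_exons, evidence_exons):
    """Annotation edit distance between a model and its overlapping evidence."""
    model = covered_bases(model_exons)
    evidence = covered_bases(evidence_exons)
    shared = len(model & evidence)
    if shared == 0:
        return 1.0               # no overlapping evidence: no support at all
    sensitivity = shared / len(evidence)
    specificity = shared / len(model)
    return 1.0 - (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    print(aed([(1, 100)], []))            # model with no evidence
    print(aed([(1, 100)], [(51, 150)]))   # half of each overlaps the other
```

Under this definition an AED of 0 means the model and the evidence cover exactly the same bases, and values move toward 1 as the two diverge.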
> > Best regards, > > Loïc Meunier > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu Nov 17 14:05:53 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 17 Nov 2016 14:05:53 -0700 Subject: [maker-devel] About split genes problem in Maker annotations In-Reply-To: <75508AB460A77C4798EC49425637E292194A0DA6@PETREL-MA.imcb.a-star.edu.sg> References: <75508AB460A77C4798EC49425637E292194A0DA6@PETREL-MA.imcb.a-star.edu.sg> Message-ID: <36BBB195-EEB4-4B3A-9463-3E4171731390@gmail.com> est2genome and protein2genome should only be used for initial training. They are not predictors; rather, they take an EST/protein alignment, find the longest ORF, and then turn the ORF directly into a gene model. That is good enough to build a training dataset, but the models will almost always be partial and fragmented. Also, because the alignments both produce and support the models, the models always score well, so their AED values are meaningless. Once you have a predictor trained, you should turn est2genome and protein2genome off. With a trained predictor, the alignments will then serve as hints to Augustus as to where introns/exons are likely to be, and this will give the desired behavior. Note that Augustus will attempt to build the most probable model given the hints and the assembly sequence. If there are any assembly issues affecting the ORF, the predictor will often skip exons or split the model in the locus. Also make sure you have built a species-specific repeat library to add to the default repeat libraries used by MAKER (you can use tools like RepeatModeler to do this). Otherwise you will get spurious alignments for much of your evidence and Augustus will generate false positive results. You may also want to add a large dataset like UniProt/Swiss-Prot to the protein evidence.
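The advice above (turn est2genome and protein2genome off once a predictor is trained, and supply a species-specific repeat library) can be checked mechanically before launching a run. What follows is only a minimal sketch under stated assumptions: the option names est2genome, protein2genome, augustus_species and rmlib are keys from maker_opts.ctl, but the helper functions, the sample text and the species name my_fish are hypothetical and not part of MAKER.

```python
# Sketch: sanity-check maker_opts.ctl settings for a run that uses a trained
# predictor. Helper names and the sample text are hypothetical illustrations.


def parse_ctl(text):
    """Parse MAKER-style 'key=value #comment' lines into a dict."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" in line:
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    return options


def check_second_round(options):
    """Return a list of warnings for a run that relies on a trained predictor."""
    warnings = []
    for key in ("est2genome", "protein2genome"):
        if options.get(key, "0") == "1":
            warnings.append(f"{key}=1: intended for initial training only")
    if not options.get("augustus_species"):
        warnings.append("augustus_species is empty: no trained Augustus model")
    if not options.get("rmlib"):
        warnings.append("rmlib is empty: no species-specific repeat library")
    return warnings


# Hypothetical excerpt of a control file, for illustration only.
sample = """\
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
augustus_species=my_fish #hypothetical species name
rmlib= #organism-specific repeat library (FASTA)
"""

for warning in check_second_round(parse_ctl(sample)):
    print("WARNING:", warning)
```

To use it on a real run, read the text of your own maker_opts.ctl and pass it to parse_ctl; the checks themselves only encode the two recommendations stated in the message.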
The best way to evaluate annotations and performance is to visually review the annotations in tools like Apollo. It will allow you to see if evidence, gene predictions, and final models achieve consensus, or if alignments don't match (spurious alignment generally suggests a repeat masking issue or an evidence quality issue), or if raw ab initio predictions don't match (which indicates insufficient training or underlying assembly issues). --Carson > On Nov 16, 2016, at 8:01 PM, Prashant Narendra SHINGATE wrote: > > Hi Carson, > > We are annotating the genome of a fish with a relatively small genome (~450Mb) using Maker and encountering many genes that are split and predicted as multiple genes. We are using Augustus for de novo prediction. Fortunately we have full-length RNAseq for about 4000 genes (and ~50k transcripts in total) from the same species, and whole-genome protein sequences from a very closely related species. > > First we trained Augustus using ~4000 full-length RNAseq transcripts from the same species. This trained Augustus model was used in the Maker annotation pipeline along with ~50k RNAseq transcripts (>1000bp) and whole-genome protein sequences from a closely related species. > > We first tried annotating using the options est2genome=1, protein2genome=1 and Augustus ON. We found several genes were split and the program seemed to give weight to the Augustus prediction in spite of having full-length RNAseq and protein sequences aligned to the predicted gene loci (visualized using JBrowse). > > In the next trial we used est2genome=1, protein2genome=1 and Augustus OFF in the first step. In the second step we reiterated with est2genome=0, protein2genome=0 and Augustus ON. The output still contained split genes. > > In the third trial we used est2genome=1, protein2genome=1 and Augustus OFF and checked the output. In this output full-length genes were predicted whenever full-length RNAseq and/or protein sequences were available.
This seems to suggest that when we use Augustus, more weight is given to Augustus de novo prediction and the synthesis of evidence from RNAseq and protein sequences is not happening. > > Can you please let us know why we are getting split genes in spite of having full-length RNAseq and/or protein sequences? What changes would you suggest to the protocol to overcome this problem? > > We thank you very much for your help and time. > > Regards, > Prashant Shingate, PhD :: Research Fellow :: Comparative and Medical Genomics Lab :: Institute of Molecular and Cell Biology (IMCB) :: Agency for Science, Technology and Research (A*STAR) > 61 Biopolis Drive :: #05-04 Proteos :: Singapore 138673 :: DID (+65) 6586 9570 :: Fax (+65) 6779 1117 :: http://www.imcb.a-star.edu.sg/ > We advance science and develop innovative technology to further economic growth and improve lives. > > > > > Note: This message may contain confidential information. If this Email/Fax has been sent to you by mistake, please notify the sender and delete it immediately. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Nov 17 21:04:31 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 17 Nov 2016 21:04:31 -0700 Subject: [maker-devel] Error running MAKER In-Reply-To: References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com> Message-ID: <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com> To use less RAM, try lowering max_dna_len=, setting the blast_depth= parameters to 20 or 30 in maker_bopts.ctl (default is limitless), or, when using MPI, starting fewer processes per node (this requires manipulating the hostfile or using the round-robin distribution flag for MPI flavors where it is available). The memory issue could be causing the lock failure as well.
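The control-file changes suggested above can be scripted so that comments in the files are preserved. This is a rough sketch, not a MAKER utility: set_ctl_values is a hypothetical helper, the value 50000 is purely illustrative (the message only says to lower max_dna_len), and I am assuming that "blast_depth= parameters" refers to keys named like depth_blastn and depth_blastx in maker_bopts.ctl, so check the key names in your own file first.

```python
import re


def set_ctl_values(text, updates):
    """Return control-file text with the given keys set, keeping comments."""
    out = []
    for line in text.splitlines():
        # Match 'key=value #optional comment'; pure comment lines do not match.
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if match and match.group(1) in updates:
            key, comment = match.group(1), match.group(3) or ""
            line = f"{key}={updates[key]} {comment}".rstrip()
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical excerpts standing in for maker_opts.ctl and maker_bopts.ctl.
opts_text = "max_dna_len=100000 #length for dividing up contigs into chunks\n"
bopts_text = (
    "depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff)\n"
    "depth_blastx=0 #Blastx depth cutoff (0 to disable cutoff)\n"
)

# Illustrative values only: a smaller chunk size and a depth cutoff of 20.
print(set_ctl_values(opts_text, {"max_dna_len": 50000}), end="")
print(set_ctl_values(bopts_text, {"depth_blastn": 20, "depth_blastx": 20}), end="")
```

Reducing the number of MPI processes per node is left out on purpose, since the hostfile syntax and the round-robin flag differ between MPI implementations.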
?Carson > On Nov 17, 2016, at 7:53 PM, John Cornelius wrote: > > Ok, so I went and searched one of the output logs for all the lines that say ERROR and I got 44 lines with the following message: > > ERROR: Lock broken in runlog > > With these lines found at the end: > > ERROR: Failed while polishig ESTs > ERROR: Chunk failed at level:2, tier_type:3 > ERROR: Could not query process table: Cannot allocate memory at /packages/maker/2.31.8/bin/../lib/Proc/ProcessTable_simple.pm line 62. > > From that last line it looks like the process is running out of RAM would that be right? Thanks. > > On Fri, Nov 11, 2016 at 2:59 PM, Carson Holt > wrote: > The cause of the error is probably further back in the STDERR. With MPI so many processes are producing status and notes, that you can get several seconds of output after ta failure. If you kept the whole STDERR, I can help you look through it. searching for ?ERROR? all caps is usually where you will see it. Also MAKER keeps a log of progress, so even on failure, you can just restart it and it will pick up the analysis from the last successful step. > > ?Carson > > >> On Nov 10, 2016, at 3:43 PM, John Cornelius > wrote: >> >> Hello, I'm using MAKER to annotate a tetraploid genome and while running it, I encountered the following error: >> >> #--------- command -------------# >> Widget::exonerate::est2genome: >> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/15/TRINITY_GG_19079_c1670_g1_i1.for.84770203-84771247.15.fasta -t /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.15.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.TRINITY_GG_19079_c1670_g1_i1.e.exonerate >> #-------------------------------# >> running est2genome search. 
>> #--------- command -------------# >> Widget::exonerate::est2genome: >> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/10/TRINITY_GG_87963_c9694_g10_i12.for.49475083-49475985.10.fasta -t /tmp/maker_08Elxf/10/chr6L.49475083-49475985.10.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/10/chr6L.49475083-49475985.TRINITY_GG_87963_c9694_g10_i12.e.exonerate >> #-------------------------------# >> >> =================================================================================== >> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >> = PID 132376 RUNNING AT pnap-pe7-s03 >> = EXIT CODE: 135 >> = CLEANING UP REMAINING PROCESSES >> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >> =================================================================================== >> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7) >> This typically refers to a problem with your application. >> Please see the FAQ page for debugging suggestions >> >> The the command I ran was the following: >> >> #PBS -l walltime=240:00:00 >> #PBS -N MAKER >> #PBS -l nodes=1:ppn=16 >> ##PBS -q hmem >> #PBS -j oe >> #PBS -m abe >> #PBS -M jcornelius at tgen.org >> #PBS -A tgen-205000 >> #PBS -o /scratch/jcornelius/xenopus_laevis/maker_run >> >> # --- load required modules --- # >> >> module load maker >> >> # --- run maker --- # >> >> cd /scratch/jcornelius/xenopus_laevis/maker_run >> mpiexec -n 16 maker -base XLNEURO.run1 -fix_nucleotides >> >> I'm not sure what could be causing this error and any help would be much appreciated. Thanks. 
>> -- >> John Cornelius >> MCB PhD Candidate >> Arizona State University >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > -- > John Cornelius > MCB PhD Candidate > Arizona State University -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcornel3 at asu.edu Fri Nov 18 12:14:52 2016 From: jcornel3 at asu.edu (John Cornelius) Date: Fri, 18 Nov 2016 12:14:52 -0700 Subject: [maker-devel] Error running MAKER In-Reply-To: <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com> References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com> <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com> Message-ID: Would the lock failure cause problems with the annotation? It looks like Maker is still progressing, just not as quickly as I thought it would be. On Thu, Nov 17, 2016 at 9:04 PM, Carson Holt wrote: > To use less RAM, try lowering max_dna_len=, setting blast_depth= > parameters to 20 pr 30 in maker_bopts.ctl (default is limitless), or when > using MPI, starting fewer processes per node (requires manipulation of > hostfile or using round robin distribution flag for MPI flavors where it is > available). > > The memory issue could be causing the lock failure as well. > > ?Carson > > > > On Nov 17, 2016, at 7:53 PM, John Cornelius wrote: > > Ok, so I went and searched one of the output logs for all the lines that > say ERROR and I got 44 lines with the following message: > > ERROR: Lock broken in runlog > > With these lines found at the end: > > ERROR: Failed while polishig ESTs > ERROR: Chunk failed at level:2, tier_type:3 > ERROR: Could not query process table: Cannot allocate memory at > /packages/maker/2.31.8/bin/../lib/Proc/ProcessTable_simple.pm line 62. > > From that last line it looks like the process is running out of RAM would > that be right? Thanks. 
>> -- >> John Cornelius >> MCB PhD Candidate >> Arizona State University >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > > -- > John Cornelius > MCB PhD Candidate > Arizona State University > > > -- John Cornelius MCB PhD Candidate Arizona State University -------------- next part -------------- An HTML attachment was scrubbed... URL: From mohamed.amine.chebbi at univ-poitiers.fr Thu Nov 24 14:45:01 2016 From: mohamed.amine.chebbi at univ-poitiers.fr (Mohamed Amine Chebbi) Date: Thu, 24 Nov 2016 22:45:01 +0100 (CET) Subject: [maker-devel] map_fasta_ids : No mapping available... Message-ID: <773569486.15711466.1480023901276.JavaMail.zimbra@univ-poitiers.fr> Hello ! I'am attempting to rename genes of maker.proteins.fasta for Genebank submission using the map_fasta_ids script. It seems to work correctly for the major of gene models, except to those ones having the below warning message : WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.3-mRNA-1 WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.0-mRNA-1 WARNING: No mapping available for maker-scaffold_1710-snap-gene-0.6-mRNA-1 WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.4-mRNA-1 WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.1-mRNA-1 WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.2-mRNA-1 WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.0-mRNA-1 WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.5-mRNA-1 WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.6-mRNA-1 WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.15-mRNA-1 WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.16-mRNA-1 Looking into the maker.gff file, these gene names are missing and may be replaced by 
other ones which differ by the numbers following the gene predictor. I wonder if you can explain the reason for these warning messages and how to resolve them. Thank you, Best, Amine -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Nov 24 19:04:59 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 24 Nov 2016 19:04:59 -0700 Subject: [maker-devel] Error running MAKER In-Reply-To: References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com> <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com> Message-ID: <3C668404-EA3C-46B4-9676-8F95E2AFB64F@gmail.com> A lock failure can become an issue if two separate jobs are running simultaneously. They may both try to process the same contig at the same time (modifying each other's files), which will cause one or both to fail. On failure, MAKER should always retry at some later point, so it can usually recover from this. If you see any partial lines in the resulting GFF3, then it did not recover and you need to rerun whatever contig this happened on. --Carson > On Nov 18, 2016, at 12:14 PM, John Cornelius wrote: > > Would the lock failure cause problems with the annotation? It looks like Maker is still progressing, just not as quickly as I thought it would be. > > On Thu, Nov 17, 2016 at 9:04 PM, Carson Holt wrote: > To use less RAM, try lowering max_dna_len=, setting the blast_depth= parameters to 20 or 30 in maker_bopts.ctl (default is limitless), or, when using MPI, starting fewer processes per node (this requires manipulating the hostfile or using the round-robin distribution flag for MPI flavors where it is available). > > The memory issue could be causing the lock failure as well.
>>> -- >>> John Cornelius >>> MCB PhD Candidate >>> Arizona State University >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> >> -- >> John Cornelius >> MCB PhD Candidate >> Arizona State University > > > > > -- > John Cornelius > MCB PhD Candidate > Arizona State University -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Nov 28 09:26:40 2016 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 28 Nov 2016 09:26:40 -0700 Subject: [maker-devel] map_fasta_ids : No mapping available... In-Reply-To: <773569486.15711466.1480023901276.JavaMail.zimbra@univ-poitiers.fr> References: <773569486.15711466.1480023901276.JavaMail.zimbra@univ-poitiers.fr> Message-ID: <401400E0-7581-4407-A30E-A787485B0E86@gmail.com> The map file you run with has two columns (old_id and new_id). If the input file has IDs that do not match anything in the old_id column, then the script throws the warning. It means there is a mismatch between the map file being used and the fasta file. This can occur if you did downstream manipulation of the fasta file, are using the wrong fasta file, or used GFF3 as input to a maker step that has generated an ID mismatch. --Carson > On Nov 24, 2016, at 2:45 PM, Mohamed Amine Chebbi wrote: > > Hello! > > I am attempting to rename genes of maker.proteins.fasta for GenBank submission using the map_fasta_ids script.
It seems to work correctly for the major of gene models, except to those ones having the below warning message : > > WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.3-mRNA-1 > WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.0-mRNA-1 > WARNING: No mapping available for maker-scaffold_1710-snap-gene-0.6-mRNA-1 > WARNING: No mapping available for maker-scaffold_1710-augustus-gene-0.4-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.1-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.2-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.0-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.5-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-augustus-gene-0.6-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.15-mRNA-1 > WARNING: No mapping available for maker-scaffold_1734-snap-gene-0.16-mRNA-1 > > Looking into the maker.gff file, these gene names are missing and may be replaced by other ones which differ by the numbers following the gene predictor. > > I wounder if you can explain me the reason of these warning message and how to resolve it. > > Thank you , > > Best, > Amine > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From parulk at caltech.edu Tue Nov 29 10:13:06 2016 From: parulk at caltech.edu (Kudtarkar, Parul V.) Date: Tue, 29 Nov 2016 17:13:06 +0000 Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file Message-ID: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu> Dear Maker developers, 1. 
We use assembled RNAseq (from the same species) and protein evidence (from evolutionarily close species) to generate training gene structures (1st iteration, est2genome=1, protein2genome=1). 2. These are then used to train the ab initio gene predictors SNAP and AUGUSTUS. 3. GeneMark-ES (version: GeneMark-ES / ET v.4.32) is used to produce a training dataset with the command gmes_petap.pl --sequence pmin_jelly.fa 4. We would be predicting genes using results from SNAP, GeneMark and AUGUSTUS (2nd iteration, est2genome=0, protein2genome=0). I have a couple of questions relating to GeneMark and AUGUSTUS. 1. AUGUSTUS We do not have a species file for our species of interest or for an evolutionarily close species. The following command is used to generate the species file: /autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting AUGUSTUS is taking too long to compute the species file; is there a solution for this issue? Using a species file from another organism might generate false positives. Is it advised in such situations not to use the AUGUSTUS model? 2. GeneMark I used the gmhmm file generated in the GeneMark output directory, however I encounter the following error ------------------------- STATUS: Parsing control files... ERROR: You have failed to provide a value for 'gmhmme3' in the control files. ERROR: You have failed to provide a value for 'probuild' in the control files. --------------------- FYI ----- maker_opts.ctl #-----Gene Prediction snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file ----- Using SNAP for training the gene model yields over 6000-7000 additional genes. The model has a good cumulative AED value. I was hoping that, in addition to SNAP, I could use AUGUSTUS and GeneMark to train the gene model to fuse dispersed models so that the gene count is within the expected range.
Thanks and regards, Parul Sent from my iPhone -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Tue Nov 29 10:28:33 2016 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 29 Nov 2016 17:28:33 +0000 Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file In-Reply-To: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu> References: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu> Message-ID: <359BAE14-18C2-4B91-A628-9613F94C8468@genetics.utah.edu> Hi Parul, Training Augustus does take a long time, much longer than for the other two predictors that you mentioned. Have you tried using the webAugustus web portal? The team that made Augustus runs it and can probably help you with troubleshooting their page for creating training sets: http://bioinf.uni-greifswald.de/webaugustus/training/create The error that you got regarding GeneMark is saying that maker can't find the genemark and probuild executable files. These are specified in the maker_exe.ctl file, not the "opts" file. You need to put valid paths to those executable files in for the given parameters. This is something that is usually specified during installation of MAKER. Hope that helps, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On Nov 29, 2016, at 10:13 AM, Kudtarkar, Parul V. wrote: Dear Maker developers, 1. We use assembled RNAseq (from the same species) and protein evidence (from evolutionarily close species) to generate training gene structures (1st iteration, est2genome=1, protein2genome=1). 2. These are then used to train the ab initio gene predictors SNAP and AUGUSTUS. 3. GeneMark-ES (version: GeneMark-ES / ET v.4.32) is used to produce a training dataset with the command gmes_petap.pl --sequence pmin_jelly.fa 4.
We would be predicting genes using results from SNAP, Genemark and AUGUSTUS(2nd iteration, est2genome=0, protein2genome=0) I have couple of questions relating to Genemark and AUGUSTUS 1. AUGUSTUS We do not have a species file for species file of our interest or evolutionary closer species following command is used to generate species file /autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting AUGUSTUS is taking too long to compute species file, is there a solution for this issue. Using species file from other organism might generate false positives. Is it advised in such situations to not used AUGUSTUS model? 2. Genemark I used the gmhmm file generated in the genemark output directory, however I encounter following error ------------------------- STATUS: Parsing control files... ERROR: You have failed to provide a value for 'gmhmme3' in the control files. ERROR: You have failed to provide a value for 'probuild' in the control files. --------------------- FYI ----- maker_opts.ctl #-----Gene Prediction snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file ----- Using SNAP for training gene model yields over 6000-7000 additional gene. The model has good cumulative AED value. I was hoping in addition to SNAP, if I could use AUGUSTUS and GeneMark to train the gene model to fuse dispersed models so that the gene count is within the expected range. Thanks and regards, Parul Sent from my iPhone _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Tue Nov 29 10:34:31 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 29 Nov 2016 10:34:31 -0700 Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file In-Reply-To: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu> References: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu> Message-ID: <596EAC73-4DB5-4144-A8EA-0E955AA0E028@gmail.com> How to train Augustus -> http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html Step 2 shows how to create an empty species to start training with. Then Step 4 (optimize_augustus.pl) is the step that takes a while. Then for GeneMark, you must set the location of the necessary GeneMark executables in the maker_exe.ctl file. After getting all predictors trained and running a few contigs, take a moment to review predictor performance by manually reviewing the predictions in something like Apollo. It is not uncommon for one or more predictors to perform poorly on an organism (they should each produce similar predictions). If one is significantly off relative to the other predictors and the evidence, it should be dropped. A badly behaving predictor will reduce the overall annotation performance. --Carson > On Nov 29, 2016, at 10:13 AM, Kudtarkar, Parul V. wrote: > > >> Dear Maker developers, >> >> 1. We use assembled RNAseq (from the same species) and protein evidence (from evolutionarily close species) to generate training gene structures (1st iteration, est2genome=1, protein2genome=1). >> >> 2. These are then used to train the ab initio gene predictors SNAP and AUGUSTUS. >> >> 3. GeneMark-ES (version: GeneMark-ES / ET v.4.32) is used to produce a training dataset with the command >> >> gmes_petap.pl --sequence pmin_jelly.fa >> >> 4. We would be predicting genes using results from SNAP, GeneMark and AUGUSTUS (2nd iteration, est2genome=0, protein2genome=0). >> >> I have a couple of questions relating to GeneMark and AUGUSTUS. >> >> 1.
AUGUSTUS >> >> We do not have a species file for species file of our interest or evolutionary closer species >> >> following command is used to generate species file >> >> >> /autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting >> AUGUSTUS is taking too long to compute species file, is there a solution for this issue. Using species file from other organism might generate false positives. Is it advised in such situations to not used AUGUSTUS model? >> >> 2. Genemark >> >> I used the gmhmm file generated in the genemark output directory, however I encounter following error >> >> >> ------------------------- >> >> STATUS: Parsing control files... >> ERROR: You have failed to provide a value for 'gmhmme3' in the control files. >> ERROR: You have failed to provide a value for 'probuild' in the control files. >> --------------------- >> FYI >> >> ----- >> >> maker_opts.ctl >> >> >> #-----Gene Prediction >> snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file >> gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file >> >> ----- >> >> Using SNAP for training gene model yields over 6000-7000 additional gene. The model has good cumulative AED value. >> >> I was hoping in addition to SNAP, if I could use AUGUSTUS and GeneMark to train the gene model to fuse dispersed models so that the gene count is within the expected range. >> >> >> Thanks and regards, >> >> Parul >> > > Sent from my iPhone -------------- next part -------------- An HTML attachment was scrubbed... URL: From parulk at caltech.edu Tue Nov 29 16:40:30 2016 From: parulk at caltech.edu (Kudtarkar, Parul V.) 
Date: Tue, 29 Nov 2016 23:40:30 +0000
Subject: [maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUTUS species file
In-Reply-To: <596EAC73-4DB5-4144-A8EA-0E955AA0E028@gmail.com>
References: <5F5AE8A3-967E-4876-8581-FE54FB676210@caltech.edu>, <596EAC73-4DB5-4144-A8EA-0E955AA0E028@gmail.com>
Message-ID:

Dear Carson and Daniel,

Thanks for getting back to me promptly. Adding the path to the GeneMark executable in maker_exe.ctl fixes the error. Hopefully optimize_augustus.pl runs quicker than autoAug.pl (which has been running for almost a week now).

It will be interesting, and we look forward to evaluating which model optimizes our expected gene count and AED values and has recognizable domains.

PS. We think BUSCO has helped us to evaluate gene model completeness.

Thanks,
Parul

----
Parul Kudtarkar
Bioinformatician
Biology and Biological Engineering
Office: 278 Beckman Institute
California Institute of Technology
MC 139-74
Pasadena CA 91125
http://www.echinobase.org

From carsonhh at gmail.com Wed Nov 30 12:24:36 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Nov 2016 12:24:36 -0700
Subject: [maker-devel] Error running MAKER
In-Reply-To:
References: <478D5289-91FD-4F3A-AED1-B2A81A742D43@gmail.com> <8E9C294A-B68C-42C1-999C-13165985AD93@gmail.com> <3C668404-EA3C-46B4-9676-8F95E2AFB64F@gmail.com>
Message-ID:

Yes. You can either separate out the contig using fasta_tool or find the contig in the datastore directory (for failed contigs, a fasta containing just the failed contig is created there). Then you can use 'maker -g contig.fasta -base original_base_name' (-g and -base options) to specify that you want it to use the new contig fasta but write results to the given base directory (i.e. same as the previous output directory). Remember to set -t (or tries in the maker_opts.ctl file) to a higher count when doing this.

-Carson

> On Nov 30, 2016, at 12:11 PM, John Cornelius wrote:
>
> Awesome! Thanks for the help. MAKER finally finished its initial run today; however, I noticed that there was still one large sequence that failed. Would it be possible to run MAKER on just that sequence and then combine the result of that run with the output of my main MAKER run?
>
> On Thu, Nov 24, 2016 at 7:04 PM, Carson Holt wrote:
> A lock failure can become an issue if two separate jobs are running simultaneously. They may both try to process the same contig at the same time (modifying each other's files), which will cause one or both to fail. On failure, it should always retry at some later point. So it can usually recover from this.
> If you see any partial lines in the resulting GFF3, then it did not recover and you need to just rerun whatever contig this happened on.
>
> -Carson
>
>> On Nov 18, 2016, at 12:14 PM, John Cornelius wrote:
>>
>> Would the lock failure cause problems with the annotation? It looks like MAKER is still progressing, just not as quickly as I thought it would be.
>>
>> On Thu, Nov 17, 2016 at 9:04 PM, Carson Holt wrote:
>> To use less RAM, try lowering max_dna_len=, setting the blast_depth= parameters to 20 or 30 in maker_bopts.ctl (default is limitless), or when using MPI, starting fewer processes per node (requires manipulation of the hostfile or using the round-robin distribution flag for MPI flavors where it is available).
>>
>> The memory issue could be causing the lock failure as well.
>>
>> -Carson
>>
>>> On Nov 17, 2016, at 7:53 PM, John Cornelius wrote:
>>>
>>> OK, so I went and searched one of the output logs for all the lines that say ERROR and I got 44 lines with the following message:
>>>
>>> ERROR: Lock broken in runlog
>>>
>>> With these lines found at the end:
>>>
>>> ERROR: Failed while polishig ESTs
>>> ERROR: Chunk failed at level:2, tier_type:3
>>> ERROR: Could not query process table: Cannot allocate memory at /packages/maker/2.31.8/bin/../lib/Proc/ProcessTable_simple.pm line 62.
>>>
>>> From that last line it looks like the process is running out of RAM. Would that be right? Thanks.
>>>
>>> On Fri, Nov 11, 2016 at 2:59 PM, Carson Holt wrote:
>>> The cause of the error is probably further back in the STDERR. With MPI, so many processes are producing status and notes that you can get several seconds of output after a failure. If you kept the whole STDERR, I can help you look through it. Searching for 'ERROR' (all caps) is usually how you will find it. Also, MAKER keeps a log of progress, so even on failure you can just restart it and it will pick up the analysis from the last successful step.
>>>
>>> -Carson
>>>
>>>> On Nov 10, 2016, at 3:43 PM, John Cornelius wrote:
>>>>
>>>> Hello, I'm using MAKER to annotate a tetraploid genome and while running it, I encountered the following error:
>>>>
>>>> #--------- command -------------#
>>>> Widget::exonerate::est2genome:
>>>> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/15/TRINITY_GG_19079_c1670_g1_i1.for.84770203-84771247.15.fasta -t /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.15.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/15/chr9_10L.84770203-84771247.TRINITY_GG_19079_c1670_g1_i1.e.exonerate
>>>> #-------------------------------#
>>>> running est2genome search.
>>>> #--------- command -------------#
>>>> Widget::exonerate::est2genome:
>>>> /packages/exonerate-2.2.0/bin/exonerate -q /tmp/maker_08Elxf/10/TRINITY_GG_87963_c9694_g10_i12.for.49475083-49475985.10.fasta -t /tmp/maker_08Elxf/10/chr6L.49475083-49475985.10.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_08Elxf/10/chr6L.49475083-49475985.TRINITY_GG_87963_c9694_g10_i12.e.exonerate
>>>> #-------------------------------#
>>>>
>>>> ===================================================================================
>>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>> = PID 132376 RUNNING AT pnap-pe7-s03
>>>> = EXIT CODE: 135
>>>> = CLEANING UP REMAINING PROCESSES
>>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>> ===================================================================================
>>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7)
>>>> This typically refers to a problem with your application.
>>>> Please see the FAQ page for debugging suggestions
>>>>
>>>> The command I ran was the following:
>>>>
>>>> #PBS -l walltime=240:00:00
>>>> #PBS -N MAKER
>>>> #PBS -l nodes=1:ppn=16
>>>> ##PBS -q hmem
>>>> #PBS -j oe
>>>> #PBS -m abe
>>>> #PBS -M jcornelius at tgen.org
>>>> #PBS -A tgen-205000
>>>> #PBS -o /scratch/jcornelius/xenopus_laevis/maker_run
>>>>
>>>> # --- load required modules --- #
>>>>
>>>> module load maker
>>>>
>>>> # --- run maker --- #
>>>>
>>>> cd /scratch/jcornelius/xenopus_laevis/maker_run
>>>> mpiexec -n 16 maker -base XLNEURO.run1 -fix_nucleotides
>>>>
>>>> I'm not sure what could be causing this error and any help would be much appreciated. Thanks.
>>>> --
>>>> John Cornelius
>>>> MCB PhD Candidate
>>>> Arizona State University
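Two checks suggested in the thread above (searching the saved STDERR for the all-caps string ERROR, and looking for partial lines in the resulting GFF3) could be sketched roughly as follows. This is only an illustration and not part of MAKER: the function names are hypothetical, and it assumes that a complete GFF3 feature line has exactly 9 tab-separated columns and that any embedded sequence section starts at a "##FASTA" directive.

```python
from collections import Counter


def count_error_lines(log_text):
    """Tally log lines that contain the all-caps marker 'ERROR'."""
    counts = Counter()
    for line in log_text.splitlines():
        if "ERROR" in line:
            counts[line.strip()] += 1
    return counts


def find_partial_gff3_lines(gff3_text):
    """Return (line_number, line) pairs for feature lines without 9 columns."""
    partial = []
    for number, line in enumerate(gff3_text.splitlines(), start=1):
        if line.startswith("##FASTA"):
            break  # sequence records follow; they are not feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        if len(line.split("\t")) != 9:
            partial.append((number, line))
    return partial


if __name__ == "__main__":
    # Small made-up inputs, only to show how the two helpers are called.
    log = (
        "ERROR: Lock broken in runlog\n"
        "running est2genome search.\n"
        "ERROR: Lock broken in runlog\n"
        "ERROR: Chunk failed at level:2, tier_type:3\n"
    )
    for message, n in count_error_lines(log).most_common():
        print(n, message)

    gff3 = (
        "##gff-version 3\n"
        "chr6L\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
        "chr6L\tmaker\tmRNA\t100\n"
    )
    for number, line in find_partial_gff3_lines(gff3):
        print("partial line", number, repr(line))
```

In practice the text would be read from the saved STDERR file and from the merged GFF3. Any contig named on a partial line is a candidate for the rerun with -g and -base described above.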