From jacques.dainat at nbis.se Wed May 2 06:55:13 2018
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Wed, 2 May 2018 13:55:13 +0200
Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1.
Message-ID:

Dear all,

This is not the first time I have seen this: I have an annotation launched with the option keep_preds=0 that contains gene models with an AED score equal to 1. As far as I understand, an AED score of 1 means there is no evidence support, so such a model should be a purely ab initio prediction without any line of evidence behind it.
But as written in "Michael S. Campbell et al., Genome Annotation and Curation Using MAKER and MAKER-P, Current Protocols in Bioinformatics, 2014", page 4.11.36, about the keep_preds option:
"MAKER rejects models that do not have at least some form of evidence support." Setting keep_preds to 1 "remove the evidence support requirement".

So, how should I understand my results? Can some predictions with low support (but support nonetheless) have an AED score equal to 1? Or can some purely ab initio predictions without any support still be selected even when keep_preds=0 is set?

Best regards,

Jacques Dainat

From kapeelc at gmail.com Tue May 1 11:16:27 2018
From: kapeelc at gmail.com (Kapeel Chougule)
Date: Tue, 1 May 2018 12:16:27 -0400
Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats
Message-ID:

Hi,

I am using MAKER v3 to update a community annotation with new RNA-seq evidence data. My problem is that MAKER shows the error below during repeat masking. I have attached the community annotation GFF, the MAKER log and maker_opts.ctl for your reference. I searched for this error in the maker-devel Google group and found some hints to update BLAST to 2.7 and rmblast to 2.6 and to reconfigure RepeatMasker, which I did, but it still fails with the error. Any help appreciated.

deleted:0 hits
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
Died at /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
--> rank=1, hostname=bnbcompute15.blacknblue.cshl.edu
ERROR: Failed while processing all repeats
ERROR: Chunk failed at level:3, tier_type:1
FAILED CONTIG:Chr06

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:Chr06

Thanks

Kapeel
--
Kapeel Chougule
Computational Scientist Developer II
One Bungtown Road
Cold Spring Harbor, NY 11724
http://www.warelab.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Chr06.gff
Type: application/octet-stream
Size: 8239003 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4779 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker.run.log
Type: application/octet-stream
Size: 5368010 bytes

From kapeelc at gmail.com Fri May 4 11:01:39 2018
From: kapeelc at gmail.com (Kapeel Chougule)
Date: Fri, 4 May 2018 12:01:39 -0400
Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats
Message-ID:

Hi,

I am using MAKER v3 to update a community annotation with new RNA-seq evidence data. My problem is that MAKER shows the error below during repeat masking. I have attached the community annotation GFF, the MAKER log and maker_opts.ctl for your reference. I searched for this error in the maker-devel Google group and found some hints to update BLAST to 2.7 and rmblast to 2.6 and to reconfigure RepeatMasker, which I did, but it still fails with the error. Any help appreciated.

deleted:0 hits
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
Died at /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
--> rank=1, hostname=bnbcompute15.blacknblue.cshl.edu
ERROR: Failed while processing all repeats
ERROR: Chunk failed at level:3, tier_type:1
FAILED CONTIG:Chr06

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:Chr06

Genome fasta: https://de.cyverse.org/dl/d/F612A6E2-A58E-44F0-895D-B766B41FE287/Chr06.gff
MAKER_run_log: https://de.cyverse.org/dl/d/0B898D91-1520-4D19-9835-DC7EDD52415F/maker.run.log
Maker_opts.ctl: https://de.cyverse.org/dl/d/15EB321B-4604-47C9-8E4F-0DC8D78517CE/maker_opts.ctl

Thanks
--
Kapeel Chougule
Computational Scientist Developer II
One Bungtown Road
Cold Spring Harbor, NY 11724
http://www.warelab.org/

From carsonhh at gmail.com Fri May 4 13:30:44 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 4 May 2018 12:30:44 -0600
Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1.
In-Reply-To:
References:
Message-ID: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com>

By default MAKER will not let models through without at least some degree of evidence overlap (AED < 1). This is because ab initio predictors overcall (sometimes by as much as a factor of 10, i.e. 10 false positives for every true positive). You can dial in the minimum AED support using the AED_threshold. But if you want to keep everything that does not overlap a better supported model, then setting keep_preds=1 will allow even unsupported models to be maintained (1 for yes and 0 for no). Situations where you may want to do this include passing in old annotation datasets or working on organisms with very high gene density and low ab initio false positive rates (many species of fungi meet these criteria).
--Carson

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri May 4 13:46:55 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 4 May 2018 12:46:55 -0600
Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats
In-Reply-To:
References:
Message-ID: <161EF76F-4CFF-457E-A762-5E0FE82EABA2@gmail.com>

Hi Kapeel,

The failure is caused by the absence of a start or end coordinate (usually caused by a BLAST report truncation; there is a BLAST bug, supposedly fixed now, where reports were being truncated by BLAST). If you've done all the updates to the installed tools, make sure you also set the location of the updated tools in maker_exe.ctl and reran the ./configure script for RepeatMasker (inside its install directory); otherwise the old tool is likely still being used.

If that doesn't fix it, try the following: use the attached file to replace .../maker/lib/Widget/RepeatMasker.pm

There is also an as yet unfixed RepeatMasker bug where it reports a 0 value for the start/end coordinate when configured with RMBLAST (RepeatMasker uses a 1-based coordinate system, so 0 is not supposed to be possible, and it only happens with RMBLAST). The change I made to the parser is a hack where I have MAKER change the RepeatMasker coordinate to 1 whenever it sees the invalid 0.

--Carson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RepeatMasker.pm
Type: text/x-perl-script
Size: 9317 bytes

From jacques.dainat at nbis.se Mon May 7 06:25:13 2018
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Mon, 7 May 2018 13:25:13 +0200
Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1.
In-Reply-To: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com>
References: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com>
Message-ID: <5404E0E7-82CD-420F-AAEF-CFD01793C075@nbis.se>

Hi,

Thank you for your reply; nevertheless, it doesn't answer my question. Either I didn't express myself well enough or I am missing something obvious in your answer.

I will try to rephrase the problem.
My problem is that I am setting keep_preds=0 and still obtain models with an AED of 1. Can unsupported models be selected with keep_preds=0? What I understood is that this is not the case, so do you have any idea why I get those AED values equal to 1?

Thank you again for your help.
Best regards,

Jacques
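
The question above hinges on what AED = 1 means, so here is a minimal Python sketch of the arithmetic. It assumes the usual nucleotide-level definition, AED = 1 - (SN + SP) / 2, where SN is the fraction of evidence bases covered by the model and SP is the fraction of model bases covered by evidence. The function names and coordinates are invented for illustration. MAKER pools many evidence alignments and has its own implementation, so treat this as an aid to intuition rather than a reproduction of its code.

```python
def covered_bases(intervals):
    """Return the set of 1-based positions covered by inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model_exons, evidence_exons):
    """Annotation edit distance: 1 - (sensitivity + specificity) / 2 at the base level."""
    model = covered_bases(model_exons)
    evidence = covered_bases(evidence_exons)
    if not model or not evidence:
        # No model or no evidence at all: treat as completely unsupported.
        return 1.0
    overlap = len(model & evidence)
    sensitivity = overlap / len(evidence)  # share of evidence bases the model recovers
    specificity = overlap / len(model)     # share of model bases backed by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


# Hypothetical two-exon model, 200 bases in total.
MODEL = [(100, 199), (300, 399)]

if __name__ == "__main__":
    print(aed(MODEL, [(100, 199), (300, 399)]))  # evidence identical to the model
    print(aed(MODEL, [(150, 199)]))              # evidence covers part of one exon
    print(aed(MODEL, [(1000, 1100)]))            # evidence elsewhere, no overlap
```

Under this definition even a single overlapping base pushes the score below 1, which is consistent with the statement that supported models have AED < 1 and only models with no overlap at all sit at exactly 1.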

From carsonhh at gmail.com Tue May 8 08:28:27 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 8 May 2018 07:28:27 -0600
Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1.
In-Reply-To: <5404E0E7-82CD-420F-AAEF-CFD01793C075@nbis.se>
References: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com> <5404E0E7-82CD-420F-AAEF-CFD01793C075@nbis.se>
Message-ID: <8ADB4928-A00E-41E1-97B6-413F8257276C@gmail.com>

You should not see AED=1 in the annotations (unless you supplied features to model_gff; those are always maintained). But you will see AED=1 in the evidence track. Make sure you are looking at features with a source tag of "maker" and a type of gene/mRNA/exon/CDS, and not of type match/match_part. The match/match_part features are reference features in the evidence track. The reference features will also have their own fasta files. The only fasta files you should use are maker.proteins.fasta and maker.transcripts.fasta, not snap_masked.protein.fasta, for example.

--Carson
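
The check described in the reply above (look only at features whose source is "maker", not at match/match_part evidence features) can be sketched in Python as follows. This is a rough illustration under stated assumptions: the GFF3 is tab-delimited with nine columns, the AED is stored in an attribute named `_AED`, and the demo records, IDs and function names are invented here rather than taken from a real MAKER run.

```python
import io


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def maker_mrna_aed(gff_lines):
    """Yield (ID, AED) for mRNA features whose source column is 'maker'."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        source, feature_type = cols[1], cols[2]
        # Skip the evidence track (e.g. snap_masked match/match_part features).
        if source != "maker" or feature_type != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        if "_AED" in attrs:
            yield attrs.get("ID"), float(attrs["_AED"])


# Invented demo records: one annotation and one ab initio evidence feature.
DEMO_GFF = (
    "##gff-version 3\n"
    "Chr06\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=gene1\n"
    "Chr06\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=gene1-mRNA-1;Parent=gene1;_AED=0.12\n"
    "Chr06\tsnap_masked\tmatch\t2000\t2600\t.\t+\t.\tID=snap-1;_AED=1.00\n"
    "Chr06\tsnap_masked\tmatch_part\t2000\t2600\t.\t+\t.\tID=snap-1-part;Parent=snap-1\n"
)

if __name__ == "__main__":
    for feature_id, score in maker_mrna_aed(io.StringIO(DEMO_GFF)):
        print(feature_id, score)
```

In the demo the snap_masked record carries `_AED=1.00` but is skipped because its source is not "maker", which mirrors the situation discussed in this thread: AED values of 1 found in the file may belong to the evidence track rather than to the final annotations.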

From kapeelc at gmail.com Mon May 14 09:35:30 2018
From: kapeelc at gmail.com (Kapeel Chougule)
Date: Mon, 14 May 2018 10:35:30 -0400
Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats
In-Reply-To: <161EF76F-4CFF-457E-A762-5E0FE82EABA2@gmail.com>
References: <161EF76F-4CFF-457E-A762-5E0FE82EABA2@gmail.com>
Message-ID:

Thanks Carson,

It works now. I only had to replace the RepeatMasker Perl module and it ran without any errors.

Best
Kapeel
--
Kapeel Chougule
Computational Scientist Developer II
One Bungtown Road
Cold Spring Harbor, NY 11724
http://www.warelab.org/

From vsoza at uw.edu Thu May 24 18:19:33 2018
From: vsoza at uw.edu (Valerie Soza)
Date: Thu, 24 May 2018 16:19:33 -0700
Subject: [maker-devel] databases supported with ipr_update_gff script
Message-ID: <814B3327-AA11-4BB8-B15B-A8A3C03FC950@uw.edu>

Hi Maker community

I am using the accessory scripts provided with Maker 2.3.19 to do some functional annotations of genes predicted with the Maker pipeline. For integrating information from InterProScan, I want to use the ipr_update_gff script.
When I looked at the script, I found the following lines:

my %db_map = (BlastProDom => 'ProDom',
              FPrintScan  => 'PRINTS',
              Gene3D      => 'Gene3D',
              HMMPanther  => 'PANTHER',
              HMMPfam     => 'Pfam',
              HMMPIR      => 'PIR',
              HMMSmart    => 'SMART',
              HMMTigr     => 'JCVI_TIGRFAMS',
              PatternScan => 'Prosite',
              ProfileScan => 'Prosite',
             );

Does this indicate that these are the only databases that the script will extract information for from an InterProScan report?

I wanted to use all databases currently available from InterProScan 5 (version 5.28-67.0), listed below, but am wondering whether the Maker script will recognize results from all of the following databases?

- CDD
- COILS
- Gene3D
- HAMAP
- MOBIDB
- PANTHER
- Pfam
- PIRSF
- PRINTS
- ProDom
- PROSITE (Profiles and Patterns)
- SFLD
- SMART (unlicensed components only by default - this analysis has simplified post-processing that includes an E-value filter, however you should not expect it to give the same match output as the fully licensed version of SMART)
- SUPERFAMILY
- TIGRFAMs

Thanks.

-Valerie

Valerie Soza, Ph.D.
c/o Hall Lab
Department of Biology
University of Washington
Johnson Hall 202A
Box 351800
Seattle, WA 98195-1800
206-543-6740
http://staff.washington.edu/vsoza/

From carsonhh at gmail.com Fri May 25 13:19:18 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 25 May 2018 12:19:18 -0600
Subject: [maker-devel] databases supported with ipr_update_gff script
In-Reply-To: <814B3327-AA11-4BB8-B15B-A8A3C03FC950@uw.edu>
References: <814B3327-AA11-4BB8-B15B-A8A3C03FC950@uw.edu>
Message-ID: <822CD0A7-4C7E-4AA8-9B37-85836CD02196@gmail.com>

Those are just for name conversion (the script takes what is in the report and renames it to a known db_xref term). If there is no conversion, the name will stay the same as in the report (unaltered). Different databases have their own db_xref values. I can't remember where the ones we are using came from (I think it was from GMOD's Chado database). NCBI also has their own -> https://www.ncbi.nlm.nih.gov/genbank/collab/db_xref/ , UniProt -> https://www.uniprot.org/docs/dbxref , and you can search around for others as well.

--Carson

From timo.metz at googlemail.com Tue May 8 07:11:05 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Tue, 08 May 2018 12:11:05 -0000
Subject: [maker-devel] large UTR overhang
Message-ID:

Hey guys,

Attached is a picture of a recent MAKER run where I used PacBio reads and Trinity-assembled short reads plus proteins from Swiss-Prot. I don't really get why MAKER assigns such a large fraction of the model as UTR. From the name I can tell that the final gene model stems from a SNAP ab initio prediction, but even that prediction is not 100% identical to the non-UTR part of the gene model.

Is there any setting in MAKER with which I can influence how MAKER assigns such UTRs? I already tried the "correct_est_fusion" option but it did not help.

best

Timo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: UTRoverhang.png
Type: image/png
Size: 81905 bytes

From timo.metz at googlemail.com Tue May 22 06:07:46 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Tue, 22 May 2018 13:07:46 +0200
Subject: [maker-devel] MAKER beta not inferring gene models from protein evidence
Message-ID:

Hey guys,

I have installed MAKER v3 beta in order to use the built-in EvidenceModeler, which is not part of v2.31. I then saw that, even when using the same evidence, the BUSCO completeness of the transcriptome drops with the v3 beta compared to v2.31.
I could identify the reason: MAKER v3 does not infer gene models from protein evidence if there is no additional support from RNA-seq/ESTs, whereas v2.31 did.
Is there any option in v3 beta to also get gene models from protein evidence alone, or is this something that v3 beta can no longer do?

best

Timo

From carsonhh at gmail.com Tue May 29 11:07:20 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 May 2018 10:07:20 -0600
Subject: [maker-devel] large UTR overhang
In-Reply-To:
References:
Message-ID: <3CE6DD4B-B176-4051-98B7-D47F33987E10@gmail.com>

MAKER 3 does not have any additional requirement for transcript support that MAKER 2 does not have. However, if you are using the correct_est_fusion=1 option, it will only use the polished protein evidence rather than the unpolished blastx alignments, which is probably what you are seeing.

The model you show also likely corresponds to either a paralogous duplication or a broken ORF due to assembly error. You can see clearly that both SNAP and Augustus want to break the region into two separate models (they can't find a single workable ORF). The raw BLASTX alignments and transcription data want to merge the region (I don't see any support for merging from polished protein2genome alignments though; maybe you just cut that off in the image?). So when the predictors are fed hints suggesting the longer model, they build the best model they can, but the ORF is broken, so the remaining exons will match the transcript evidence exactly but have to be UTR given the broken ORF.

This means you are either merging things that shouldn't be merged (based on bad evidence alignments) or the assembly has an error that keeps the ORF from functioning in that region as it should. The overall structure is still captured, but the translation is truncated.

Here is a secondary tool you can try, called DeFusion, that may help if you are getting false merges because of the evidence -> https://wjidea.github.io/defusion/

--Carson

From ksenia.lavrichenko at gmail.com Thu May 31 09:30:12 2018
From: ksenia.lavrichenko at gmail.com (Ksenia Lavrichenko)
Date: Thu, 31 May 2018 16:30:12 +0200
Subject: [maker-devel] Building MAKER with specific perl version
Message-ID:

Hi,

I have been banging my head against this for a while now, trying to install MAKER with my own specific perl. I found this old thread: https://groups.google.com/forum/#!msg/maker-devel/hScqdJW0FsU/3KT_UF7k9XMJ

However, this does not work for me. I make sure bin/* and Build are deleted before I run $myperl Build.PL. I see my perl in the shebang of Build; however, after ./Build install all scripts in bin have "#! /usr/bin/perl", which produces a version error when I try to run maker -h. Any tips on what I need to adjust in Build.PL?

Many thanks,
Ksenia
URL: From jacques.dainat at nbis.se Wed May 2 05:55:13 2018 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Wed, 2 May 2018 13:55:13 +0200 Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1. Message-ID: Dear all, It is not the first time I see that, but I have an annotation launched with the option keeps_pred=0 that contains gene models with AED score equal to 1. As far as I understand, AED score to 1 means there is no evidence support. So it should be purely abinitio without any line of evidence in front of this prediction. But as it is written in ?Michael S-Cambell et al., Genome Annotation and Curation Using MAKER and MAKER-P, Current Protocols in Bioinformatics, 2014? page 4.11.36 about the the keep_preds option: "MAKER rejects models that do not have at least some form of evidence support.? Setting keep_preds to 1 ?remove the evidence support requirement?. So, how should I understand my results? Some predictions with low support (but support anyway) can have AED score equal to 1? Or some purely abinitio prediction without support at all can anyway be selected even when keep_preds=0 is set up? Best regards, Jacques Dainat -------------- next part -------------- An HTML attachment was scrubbed... URL: From kapeelc at gmail.com Tue May 1 10:16:27 2018 From: kapeelc at gmail.com (Kapeel Chougule) Date: Tue, 1 May 2018 12:16:27 -0400 Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats Message-ID: Hi, I am using MAKER v3 to update community annotation with new RNA-seq evidence data. Part of my problem is that MAKER shows below error when repeat masking. I have attached the community annotation gff, maker log and maker_opts.ctl for your reference. I searched for this error in the maker-dev google group and found some hints to update BLAST to 2.7, rmblast to 2.6 and reconfigure RepeatMasker which I did but it still fails with the error. Any help appreciated. 
deleted:0 hits collecting blastx repeatmasking processing all repeats in cluster::shadow_cluster... Died at /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. --> rank=1, hostname=bnbcompute15.blacknblue.cshl.edu ERROR: Failed while processing all repeats ERROR: Chunk failed at level:3, tier_type:1 FAILED CONTIG:Chr06 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:Chr06 Thanks Kapeel -- *Kapeel ChouguleComputational Scientist Developer II* *One Bungtown Road Cold Spring Harbor, NY 11724http://www.warelab.org/ * -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Chr06.gff Type: application/octet-stream Size: 8239003 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4779 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker.run.log Type: application/octet-stream Size: 5368010 bytes Desc: not available URL: From kapeelc at gmail.com Fri May 4 10:01:39 2018 From: kapeelc at gmail.com (Kapeel Chougule) Date: Fri, 4 May 2018 12:01:39 -0400 Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats Message-ID: Hi, I am using MAKER v3 to update community annotation with new RNA-seq evidence data. Part of my problem is that MAKER shows below error when repeat masking. I have attached the community annotation gff, maker log and maker_opts.ctl for your reference. I searched for this error in the maker-dev google group and found some hints to update BLAST to 2.7, rmblast to 2.6 and reconfigure RepeatMasker which I did but it still fails with the error. Any help appreciated. deleted:0 hits collecting blastx repeatmasking processing all repeats in cluster::shadow_cluster... 
Died at /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
--> rank=1, hostname=bnbcompute15.blacknblue.cshl.edu
ERROR: Failed while processing all repeats
ERROR: Chunk failed at level:3, tier_type:1
FAILED CONTIG:Chr06

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:Chr06

Genome fasta: https://de.cyverse.org/dl/d/F612A6E2-A58E-44F0-895D-B766B41FE287/Chr06.gff
MAKER_run_log: https://de.cyverse.org/dl/d/0B898D91-1520-4D19-9835-DC7EDD52415F/maker.run.log
Maker_opts.ctl: https://de.cyverse.org/dl/d/15EB321B-4604-47C9-8E4F-0DC8D78517CE/maker_opts.ctl

Thanks
--
Kapeel Chougule
Computational Scientist Developer II
One Bungtown Road Cold Spring Harbor, NY 11724
http://www.warelab.org/

From carsonhh at gmail.com Fri May 4 12:30:44 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 4 May 2018 12:30:44 -0600
Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1.
In-Reply-To:
References:
Message-ID: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com>

By default MAKER will not let models through without at least some degree of evidence overlap (AED < 1). This is because ab initio predictors overcall (sometimes by as much as a factor of 10, i.e. 10 false positives for every true positive). You can dial in the minimum AED support using the AED_threshold option. But if you want to keep everything that does not overlap a better-supported model, then setting keep_preds=1 will allow even unsupported models to be maintained (1 for yes and 0 for no). Situations where you may want to do this include passing in old annotation datasets or working on organisms with very high gene density and low ab initio false positive rates (many species of fungi meet these criteria).

--Carson

> On May 2, 2018, at 5:55 AM, Jacques Dainat wrote:
>
> Dear all,
>
> This is not the first time I have seen this: I have an annotation launched with the option keep_preds=0 that contains gene models with an AED score equal to 1. As far as I understand, an AED score of 1 means there is no evidence support, so such a model should be purely ab initio, without any line of evidence behind the prediction.
> But as written in "Michael S. Campbell et al., Genome Annotation and Curation Using MAKER and MAKER-P, Current Protocols in Bioinformatics, 2014", page 4.11.36, about the keep_preds option:
> "MAKER rejects models that do not have at least some form of evidence support." Setting keep_preds to 1 "remove the evidence support requirement".
>
> So, how should I understand my results? Can some predictions with low support (but support nonetheless) have an AED score equal to 1? Or can some purely ab initio predictions without any support be selected even when keep_preds=0 is set?
>
> Best regards,
>
> Jacques Dainat
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri May 4 12:46:55 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 4 May 2018 12:46:55 -0600
Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats
In-Reply-To:
References:
Message-ID: <161EF76F-4CFF-457E-A762-5E0FE82EABA2@gmail.com>

Hi Kapeel,

The failure is caused by the absence of a start or end coordinate (usually caused by a BLAST report truncation; there is a BLAST bug, supposedly fixed now, where reports were being truncated by BLAST).
If you've done all the updates to installed tools, make sure you also set the location of the updated tools in maker_exe.ctl and reran the ./configure script for RepeatMasker (internal to its install directory), or the old tool is likely still being used.

Also, if that doesn't fix it, try the following. Use the attached file to replace .../maker/lib/Widget/RepeatMasker.pm

There is also an as yet unfixed RepeatMasker bug where it reports a 0 value for the start/end coordinate when configured with RMBLAST (RepeatMasker uses a 1-based coordinate system, so 0 is not supposed to be possible, and it only happens with RMBLAST). The change I made to the parser is a hack where I have MAKER change the RepeatMasker coordinate to 1 whenever it sees the invalid 0.

--Carson

> On May 1, 2018, at 10:16 AM, Kapeel Chougule wrote:
>
> Hi,
>
> I am using MAKER v3 to update a community annotation with new RNA-seq evidence data. Part of my problem is that MAKER shows the error below during repeat masking. I have attached the community annotation GFF, the MAKER log, and maker_opts.ctl for your reference. I searched for this error in the maker-devel Google group and found some hints to update BLAST to 2.7 and rmblast to 2.6 and to reconfigure RepeatMasker, which I did, but it still fails with the error. Any help appreciated.
>
> deleted:0 hits
> collecting blastx repeatmasking
> processing all repeats
> in cluster::shadow_cluster...
> Died at /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
> --> rank=1, hostname=bnbcompute15.blacknblue.cshl.edu
> ERROR: Failed while processing all repeats
> ERROR: Chunk failed at level:3, tier_type:1
> FAILED CONTIG:Chr06
>
> ERROR: Chunk failed at level:2, tier_type:0
> FAILED CONTIG:Chr06
>
>
> Thanks
>
> Kapeel
> --
>
> Kapeel Chougule
> Computational Scientist Developer II
> One Bungtown Road Cold Spring Harbor, NY 11724
> http://www.warelab.org/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: RepeatMasker.pm
Type: text/x-perl-script
Size: 9317 bytes
Desc: not available
URL:

From jacques.dainat at nbis.se Mon May 7 05:25:13 2018
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Mon, 7 May 2018 13:25:13 +0200
Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1.
In-Reply-To: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com>
References: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com>
Message-ID: <5404E0E7-82CD-420F-AAEF-CFD01793C075@nbis.se>

Hi,

Thank you for your reply; nevertheless, it doesn't answer my question. Either I didn't express myself well enough or I am missing something obvious in your answer.

I will try to rephrase the problem.
My problem is that I am setting keep_preds=0 and I still obtain AED values of 1. Can unsupported data be selected with keep_preds=0? What I understood is that this is not the case, so do you have any idea why I have those AED values equal to 1?

Thank you again for your help.
Best regards,

Jacques

> On 4 May 2018, at 20:30, Carson Holt wrote:
>
> By default MAKER will not let models through without at least some degree of evidence overlap (AED < 1).
> This is because ab initio predictors overcall (sometimes by as much as a factor of 10, i.e. 10 false positives for every true positive). You can dial in the minimum AED support using the AED_threshold option. But if you want to keep everything that does not overlap a better-supported model, then setting keep_preds=1 will allow even unsupported models to be maintained (1 for yes and 0 for no). Situations where you may want to do this include passing in old annotation datasets or working on organisms with very high gene density and low ab initio false positive rates (many species of fungi meet these criteria).
>
> --Carson
>
>
>> On May 2, 2018, at 5:55 AM, Jacques Dainat wrote:
>>
>> Dear all,
>>
>> This is not the first time I have seen this: I have an annotation launched with the option keep_preds=0 that contains gene models with an AED score equal to 1. As far as I understand, an AED score of 1 means there is no evidence support, so such a model should be purely ab initio, without any line of evidence behind the prediction.
>> But as written in "Michael S. Campbell et al., Genome Annotation and Curation Using MAKER and MAKER-P, Current Protocols in Bioinformatics, 2014", page 4.11.36, about the keep_preds option:
>> "MAKER rejects models that do not have at least some form of evidence support." Setting keep_preds to 1 "remove the evidence support requirement".
>>
>> So, how should I understand my results? Can some predictions with low support (but support nonetheless) have an AED score equal to 1? Or can some purely ab initio predictions without any support be selected even when keep_preds=0 is set?
>>
>> Best regards,
>>
>> Jacques Dainat

From carsonhh at gmail.com Tue May 8 07:28:27 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 8 May 2018 07:28:27 -0600
Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1.
In-Reply-To: <5404E0E7-82CD-420F-AAEF-CFD01793C075@nbis.se>
References: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com> <5404E0E7-82CD-420F-AAEF-CFD01793C075@nbis.se>
Message-ID: <8ADB4928-A00E-41E1-97B6-413F8257276C@gmail.com>

You should not see AED=1 in the annotations (unless you supplied features to model_gff; those are always maintained). But you will see AED=1 in the evidence track. Make sure you are looking at features with a source tag of "maker" and a type of gene/mRNA/exon/CDS, not a type of match/match_part. The match/match_part features are reference features in the evidence track. The reference features will also have their own FASTA files. The only FASTA files you should use are maker.proteins.fasta and maker.transcripts.fasta, not, for example, snap_masked.protein.fasta.

--Carson

> On May 7, 2018, at 5:25 AM, Jacques Dainat wrote:
>
> Hi,
>
> Thank you for your reply; nevertheless, it doesn't answer my question. Either I didn't express myself well enough or I am missing something obvious in your answer.
>
> I will try to rephrase the problem.
> My problem is that I am setting keep_preds=0 and I still obtain AED values of 1. Can unsupported data be selected with keep_preds=0? What I understood is that this is not the case, so do you have any idea why I have those AED values equal to 1?
>
> Thank you again for your help.
> Best regards,
>
> Jacques
>
>
>> On 4 May 2018, at 20:30, Carson Holt wrote:
>>
>> By default MAKER will not let models through without at least some degree of evidence overlap (AED < 1). This is because ab initio predictors overcall (sometimes by as much as a factor of 10, i.e. 10 false positives for every true positive). You can dial in the minimum AED support using the AED_threshold option.
>> But if you want to keep everything that does not overlap a better-supported model, then setting keep_preds=1 will allow even unsupported models to be maintained (1 for yes and 0 for no). Situations where you may want to do this include passing in old annotation datasets or working on organisms with very high gene density and low ab initio false positive rates (many species of fungi meet these criteria).
>>
>> --Carson
>>
>>
>>> On May 2, 2018, at 5:55 AM, Jacques Dainat wrote:
>>>
>>> Dear all,
>>>
>>> This is not the first time I have seen this: I have an annotation launched with the option keep_preds=0 that contains gene models with an AED score equal to 1. As far as I understand, an AED score of 1 means there is no evidence support, so such a model should be purely ab initio, without any line of evidence behind the prediction.
>>> But as written in "Michael S. Campbell et al., Genome Annotation and Curation Using MAKER and MAKER-P, Current Protocols in Bioinformatics, 2014", page 4.11.36, about the keep_preds option:
>>> "MAKER rejects models that do not have at least some form of evidence support." Setting keep_preds to 1 "remove the evidence support requirement".
>>>
>>> So, how should I understand my results? Can some predictions with low support (but support nonetheless) have an AED score equal to 1? Or can some purely ab initio predictions without any support be selected even when keep_preds=0 is set?
>>>
>>> Best regards,
>>>
>>> Jacques Dainat
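
(Sketch added for illustration; it is not code from MAKER or from the original thread.) The check described in the reply above, looking only at features with source "maker" and an annotation type rather than at match/match_part evidence features, could be done roughly as follows. The function name and the sample GFF3 lines are hypothetical, and the sketch assumes the AED is stored in the `_AED` attribute of mRNA rows.

```python
def maker_mrnas_with_aed(gff_lines, aed_value=1.0):
    """Yield (seqid, mRNA ID, AED) for MAKER annotations whose _AED equals aed_value."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        seqid, source, ftype, attrs = cols[0], cols[1], cols[2], cols[8]
        # Keep only final annotations: source "maker", type mRNA.
        # Evidence-track rows (match/match_part, other sources) are skipped.
        if source != "maker" or ftype != "mRNA":
            continue
        fields = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        if "_AED" not in fields:
            continue
        aed = float(fields["_AED"])
        if aed == aed_value:
            yield seqid, fields.get("ID", "?"), aed


# Hypothetical sample: one supported model, one evidence-track hit, one AED=1 model.
sample = [
    "##gff-version 3\n",
    "Chr06\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1;Name=g1\n",
    "Chr06\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1;_AED=0.25;_eAED=0.25\n",
    "Chr06\tsnap_masked\tmatch\t2000\t2900\t.\t+\t.\tID=Chr06:hit:1;Name=snap-abinit;_AED=1.00\n",
    "Chr06\tmaker\tmRNA\t5000\t5900\t.\t-\t.\tID=g2-mRNA-1;Parent=g2;_AED=1.00;_eAED=1.00\n",
]
hits = list(maker_mrnas_with_aed(sample))
print(hits)
```

On a real run one would pass an open file handle instead of `sample`. If the list is empty, the AED=1 entries seen earlier most likely came from the evidence track rather than from the final annotations.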

From kapeelc at gmail.com Mon May 14 08:35:30 2018
From: kapeelc at gmail.com (Kapeel Chougule)
Date: Mon, 14 May 2018 10:35:30 -0400
Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats
In-Reply-To: <161EF76F-4CFF-457E-A762-5E0FE82EABA2@gmail.com>
References: <161EF76F-4CFF-457E-A762-5E0FE82EABA2@gmail.com>
Message-ID:

Thanks Carson,

It works now. I only had to replace the RepeatMasker Perl module, and it ran without any errors.

Best
Kapeel

On Fri, May 4, 2018 at 2:46 PM, Carson Holt wrote:

> Hi Kapeel,
>
> The failure is caused by the absence of a start or end coordinate (usually
> caused by a BLAST report truncation; there is a BLAST bug, supposedly fixed
> now, where reports were being truncated by BLAST). If you've done all the
> updates to installed tools, make sure you also set the location of the
> updated tools in maker_exe.ctl and reran the ./configure script for
> RepeatMasker (internal to its install directory), or the old tool is likely
> still being used.
>
> Also, if that doesn't fix it, try the following. Use the attached file to
> replace .../maker/lib/Widget/RepeatMasker.pm
>
> There is also an as yet unfixed RepeatMasker bug where it reports a 0
> value for the start/end coordinate when configured with RMBLAST
> (RepeatMasker uses a 1-based coordinate system, so 0 is not supposed to be
> possible, and it only happens with RMBLAST). The change I made to the parser
> is a hack where I have MAKER change the RepeatMasker coordinate to 1
> whenever it sees the invalid 0.
>
> --Carson
>
>
> On May 1, 2018, at 10:16 AM, Kapeel Chougule wrote:
>
> Hi,
>
> I am using MAKER v3 to update a community annotation with new RNA-seq
> evidence data. Part of my problem is that MAKER shows the error below during
> repeat masking. I have attached the community annotation GFF, the MAKER log, and
> maker_opts.ctl for your reference.
> I searched for this error in the
> maker-devel Google group and found some hints to
> update BLAST to 2.7 and rmblast to 2.6 and to reconfigure RepeatMasker, which I
> did, but it still fails with the error. Any help appreciated.
>
> deleted:0 hits
> collecting blastx repeatmasking
> processing all repeats
> in cluster::shadow_cluster...
> Died at /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
> --> rank=1, hostname=bnbcompute15.blacknblue.cshl.edu
> ERROR: Failed while processing all repeats
> ERROR: Chunk failed at level:3, tier_type:1
> FAILED CONTIG:Chr06
>
> ERROR: Chunk failed at level:2, tier_type:0
> FAILED CONTIG:Chr06
>
>
> Thanks
>
> Kapeel
> --
>
> Kapeel Chougule
> Computational Scientist Developer II
> One Bungtown Road Cold Spring Harbor, NY 11724
> http://www.warelab.org/

--
Kapeel Chougule
Computational Scientist Developer II
One Bungtown Road Cold Spring Harbor, NY 11724
http://www.warelab.org/

From vsoza at uw.edu Thu May 24 17:19:33 2018
From: vsoza at uw.edu (Valerie Soza)
Date: Thu, 24 May 2018 16:19:33 -0700
Subject: [maker-devel] databases supported with ipr_update_gff script
Message-ID: <814B3327-AA11-4BB8-B15B-A8A3C03FC950@uw.edu>

Hi Maker community

I am using the accessory scripts provided with Maker 2.3.19 to do some functional annotation of genes predicted with the Maker pipeline. For integrating information from InterProScan, I want to use the ipr_update_gff script.
When I looked at the script, I found the following lines:

my %db_map = (BlastProDom => 'ProDom',
              FPrintScan  => 'PRINTS',
              Gene3D      => 'Gene3D',
              HMMPanther  => 'PANTHER',
              HMMPfam     => 'Pfam',
              HMMPIR      => 'PIR',
              HMMSmart    => 'SMART',
              HMMTigr     => 'JCVI_TIGRFAMS',
              PatternScan => 'Prosite',
              ProfileScan => 'Prosite',
             );

Does this indicate that these are the only databases for which the script will extract information from an InterProScan report?

I wanted to use all databases currently available from InterProScan 5 (version 5.28-67.0, see below), but am wondering whether the Maker script will recognize results from all of the following databases:

- CDD
- COILS
- Gene3D
- HAMAP
- MOBIDB
- PANTHER
- Pfam
- PIRSF
- PRINTS
- ProDom
- PROSITE (Profiles and Patterns)
- SFLD
- SMART (unlicensed components only by default - this analysis has simplified post-processing that includes an E-value filter, however you should not expect it to give the same match output as the fully licensed version of SMART)
- SUPERFAMILY
- TIGRFAMs

Thanks.

-Valerie

Valerie Soza, Ph.D.
c/o Hall Lab
Department of Biology
University of Washington
Johnson Hall 202A
Box 351800
Seattle, WA 98195-1800
206-543-6740
http://staff.washington.edu/vsoza/

From carsonhh at gmail.com Fri May 25 12:19:18 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 25 May 2018 12:19:18 -0600
Subject: [maker-devel] databases supported with ipr_update_gff script
In-Reply-To: <814B3327-AA11-4BB8-B15B-A8A3C03FC950@uw.edu>
References: <814B3327-AA11-4BB8-B15B-A8A3C03FC950@uw.edu>
Message-ID: <822CD0A7-4C7E-4AA8-9B37-85836CD02196@gmail.com>

Those are just for name conversion (the script takes what is in the report and renames it to a known db_xref term). If there is no conversion, the name will stay the same as in the report (unaltered). Different databases have their own db_xref values. I can't remember where the ones we are using came from (I think it was from GMOD's Chado database).
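
(Sketch added for illustration; it is not the actual ipr_update_gff code.) The behavior described above, renaming when a mapping exists and passing the report's name through unchanged otherwise, amounts to a dictionary lookup with a default. The map below is the `%db_map` from the question transcribed to Python; the function name `dbxref_name` is hypothetical.

```python
# The %db_map from the question, transcribed to a Python dict.
DB_MAP = {
    "BlastProDom": "ProDom",
    "FPrintScan": "PRINTS",
    "Gene3D": "Gene3D",
    "HMMPanther": "PANTHER",
    "HMMPfam": "Pfam",
    "HMMPIR": "PIR",
    "HMMSmart": "SMART",
    "HMMTigr": "JCVI_TIGRFAMS",
    "PatternScan": "Prosite",
    "ProfileScan": "Prosite",
}


def dbxref_name(analysis):
    """Return the mapped db_xref name, or the report's own name if there is no mapping."""
    return DB_MAP.get(analysis, analysis)


print(dbxref_name("HMMPfam"))  # has an entry in the map, so it is renamed
print(dbxref_name("CDD"))      # no entry in the map, so it is passed through as-is
```

Under this reading, databases such as CDD, SFLD or SUPERFAMILY would not be dropped; they would simply keep the name used in the InterProScan report.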

NCBI also has their own -> https://www.ncbi.nlm.nih.gov/genbank/collab/db_xref/, UniProt -> https://www.uniprot.org/docs/dbxref, and you can search around for others as well.

--Carson

> On May 24, 2018, at 5:19 PM, Valerie Soza wrote:
>
> Hi Maker community
>
> I am using the accessory scripts provided with Maker 2.3.19 to do some functional annotation of genes predicted with the Maker pipeline. For integrating information from InterProScan, I want to use the ipr_update_gff script. When I looked at the script, I found the following lines:
>
> my %db_map = (BlastProDom => 'ProDom',
>               FPrintScan  => 'PRINTS',
>               Gene3D      => 'Gene3D',
>               HMMPanther  => 'PANTHER',
>               HMMPfam     => 'Pfam',
>               HMMPIR      => 'PIR',
>               HMMSmart    => 'SMART',
>               HMMTigr     => 'JCVI_TIGRFAMS',
>               PatternScan => 'Prosite',
>               ProfileScan => 'Prosite',
>              );
>
> Does this indicate that these are the only databases for which the script will extract information from an InterProScan report?
>
> I wanted to use all databases currently available from InterProScan 5 (version 5.28-67.0, see below), but am wondering whether the Maker script will recognize results from all of the following databases:
>
> - CDD
> - COILS
> - Gene3D
> - HAMAP
> - MOBIDB
> - PANTHER
> - Pfam
> - PIRSF
> - PRINTS
> - ProDom
> - PROSITE (Profiles and Patterns)
> - SFLD
> - SMART (unlicensed components only by default - this analysis has simplified post-processing that includes an E-value filter, however you should not expect it to give the same match output as the fully licensed version of SMART)
> - SUPERFAMILY
> - TIGRFAMs
>
> Thanks.
>
> -Valerie
>
> Valerie Soza, Ph.D.
> c/o Hall Lab
> Department of Biology
> University of Washington
> Johnson Hall 202A
> Box 351800
> Seattle, WA 98195-1800
> 206-543-6740
> http://staff.washington.edu/vsoza/

From timo.metz at googlemail.com Tue May 8 06:11:05 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Tue, 08 May 2018 12:11:05 -0000
Subject: [maker-devel] large UTR overhang
Message-ID:

Hey guys,

Attached is a picture of a recent MAKER run where I used PacBio reads and Trinity-assembled short reads plus proteins from Swiss-Prot. I don't really understand why MAKER assigns such a large fraction to be UTR. From the name I can tell that the final gene model stems from a SNAP ab initio prediction, but even that gene prediction is not 100% identical to the non-UTR part of the gene model.

Is there any setting in MAKER with which I can influence how MAKER assigns such UTRs? I already tried the "correct_est_fusion" option but it did not help.

best

Timo

-------------- next part --------------
A non-text attachment was scrubbed...
Name: UTRoverhang.png
Type: image/png
Size: 81905 bytes
Desc: not available
URL:

From timo.metz at googlemail.com Tue May 22 05:07:46 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Tue, 22 May 2018 13:07:46 +0200
Subject: [maker-devel] MAKER beta not inferring gene models from protein evidence
Message-ID:

Hey guys,

I have installed MAKER v3 beta in order to use the built-in EVidenceModeler, which is not part of v2.31. Now I could see that, even when using the same evidence, the BUSCO completeness of the transcriptome drops when using the v3 beta compared to v2.31.
I could identify the reason for this: MAKER v3 now does not infer gene models from protein evidence if there is no additional support from RNA-seq/ESTs. In v2.31, on the contrary, it did. Is there any option in v3 beta to also get gene models from protein evidence alone, or is this something that v3 beta is no longer able to do?

best

Timo

From carsonhh at gmail.com Tue May 29 10:07:20 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 May 2018 10:07:20 -0600
Subject: [maker-devel] large UTR overhang
In-Reply-To:
References:
Message-ID: <3CE6DD4B-B176-4051-98B7-D47F33987E10@gmail.com>

MAKER 3 does not have any additional requirement for transcript support that MAKER 2 does not have. However, if you are using the correct_est_fusion=1 option, it will only use the polished protein evidence rather than the unpolished blastx alignments, which is probably what you are seeing.

The model you show also likely corresponds to either a paralogous duplication or a broken ORF due to assembly error. You can see clearly that both SNAP and Augustus want to break the region into two separate models (they can't find a single workable ORF). The raw BLASTX alignments and transcription data want to merge the region (I don't see any support for merging from polished protein2genome alignments though; maybe you just cut that off in the image?). So when the predictors are fed hints suggesting the longer model, they build the best model they can, but the ORF is broken, so the remaining exons will match the transcript evidence exactly but have to be UTR given the broken ORF. This means you are either merging things that shouldn't be merged (based on bad evidence alignments) or the assembly has an error that keeps the ORF from functioning in that region as it should. The overall structure is still captured, but the translation is truncated.
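
(Sketch added for illustration; it is not how MAKER or the ab initio predictors assign UTRs.) The effect described above, a broken ORF leaving the remaining exons as UTR, can be seen on a toy transcript. The function below is a deliberate simplification: it takes the first ATG, reads codons until the first in-frame stop, and counts everything after that as 3' UTR. The names and sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_cds_utr(transcript):
    """Return (5' UTR length, CDS length, 3' UTR length) for the first ATG-initiated ORF."""
    seq = transcript.upper()
    start = seq.find("ATG")
    if start == -1:
        return len(seq), 0, 0  # no start codon: nothing is translated
    # If no stop codon is found, the CDS runs to the last complete codon.
    end = len(seq) - (len(seq) - start) % 3
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            end = i + 3  # the CDS includes the stop codon
            break
    return start, end - start, len(seq) - end


# Same transcript length and evidence, but the second one has a premature in-frame stop.
intact = "GG" + "ATG" + "GCT" * 6 + "TAA" + "CC"
broken = "GG" + "ATG" + "GCT" + "TGA" + "GCT" * 4 + "TAA" + "CC"

print(split_cds_utr(intact))  # (2, 24, 2)
print(split_cds_utr(broken))  # (2, 9, 17)
```

The transcript structure is identical in both cases; only the point at which translation stops differs, which is why the model can match the transcript evidence exactly and still carry a long UTR.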

Here is a secondary tool you can try called DeFusion that may help if you are getting false merges because of the evidence -> https://wjidea.github.io/defusion/

--Carson

> On May 8, 2018, at 6:10 AM, Timo Metz wrote:
>
> Hey guys,
>
> Attached is a picture of a recent MAKER run where I used PacBio reads and Trinity-assembled short reads plus proteins from Swiss-Prot. I don't really understand why MAKER assigns such a large fraction to be UTR. From the name I can tell that the final gene model stems from a SNAP ab initio prediction, but even that gene prediction is not 100% identical to the non-UTR part of the gene model.
>
> Is there any setting in MAKER with which I can influence how MAKER assigns such UTRs? I already tried the "correct_est_fusion" option but it did not help.
>
> best
>
> Timo

From ksenia.lavrichenko at gmail.com Thu May 31 08:30:12 2018
From: ksenia.lavrichenko at gmail.com (Ksenia Lavrichenko)
Date: Thu, 31 May 2018 16:30:12 +0200
Subject: [maker-devel] Building MAKER with specific perl version
Message-ID:

Hi,

I have been banging my head for a while now, trying to install MAKER with my specific perl. I found this old thread:
https://groups.google.com/forum/#!msg/maker-devel/hScqdJW0FsU/3KT_UF7k9XMJ

However, this does not work for me. I make sure bin/* and Build are deleted before I run $myperl Build.PL. I see my perl in the shebang of Build; however, after ./Build install, all scripts in bin have "#! /usr/bin/perl", which produces a version error when I try to run maker -h.

Any tips on what I need to adjust in Build.PL?

Many thanks,
Ksenia
URL: From jacques.dainat at nbis.se Wed May 2 05:55:13 2018 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Wed, 2 May 2018 13:55:13 +0200 Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1. Message-ID: Dear all, It is not the first time I see that, but I have an annotation launched with the option keeps_pred=0 that contains gene models with AED score equal to 1. As far as I understand, AED score to 1 means there is no evidence support. So it should be purely abinitio without any line of evidence in front of this prediction. But as it is written in ?Michael S-Cambell et al., Genome Annotation and Curation Using MAKER and MAKER-P, Current Protocols in Bioinformatics, 2014? page 4.11.36 about the the keep_preds option: "MAKER rejects models that do not have at least some form of evidence support.? Setting keep_preds to 1 ?remove the evidence support requirement?. So, how should I understand my results? Some predictions with low support (but support anyway) can have AED score equal to 1? Or some purely abinitio prediction without support at all can anyway be selected even when keep_preds=0 is set up? Best regards, Jacques Dainat -------------- next part -------------- An HTML attachment was scrubbed... URL: From kapeelc at gmail.com Tue May 1 10:16:27 2018 From: kapeelc at gmail.com (Kapeel Chougule) Date: Tue, 1 May 2018 12:16:27 -0400 Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats Message-ID: Hi, I am using MAKER v3 to update community annotation with new RNA-seq evidence data. Part of my problem is that MAKER shows below error when repeat masking. I have attached the community annotation gff, maker log and maker_opts.ctl for your reference. I searched for this error in the maker-dev google group and found some hints to update BLAST to 2.7, rmblast to 2.6 and reconfigure RepeatMasker which I did but it still fails with the error. Any help appreciated. 
deleted:0 hits collecting blastx repeatmasking processing all repeats in cluster::shadow_cluster... Died at /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. --> rank=1, hostname=bnbcompute15.blacknblue.cshl.edu ERROR: Failed while processing all repeats ERROR: Chunk failed at level:3, tier_type:1 FAILED CONTIG:Chr06 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:Chr06 Thanks Kapeel -- *Kapeel ChouguleComputational Scientist Developer II* *One Bungtown Road Cold Spring Harbor, NY 11724http://www.warelab.org/ * -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Chr06.gff Type: application/octet-stream Size: 8239004 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4780 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker.run.log Type: application/octet-stream Size: 5368011 bytes Desc: not available URL: From kapeelc at gmail.com Fri May 4 10:01:39 2018 From: kapeelc at gmail.com (Kapeel Chougule) Date: Fri, 4 May 2018 12:01:39 -0400 Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats Message-ID: Hi, I am using MAKER v3 to update community annotation with new RNA-seq evidence data. Part of my problem is that MAKER shows below error when repeat masking. I have attached the community annotation gff, maker log and maker_opts.ctl for your reference. I searched for this error in the maker-dev google group and found some hints to update BLAST to 2.7, rmblast to 2.6 and reconfigure RepeatMasker which I did but it still fails with the error. Any help appreciated. deleted:0 hits collecting blastx repeatmasking processing all repeats in cluster::shadow_cluster... 
Died at /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/ bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. --> rank=1, hostname=bnbcompute15.blacknblue.cshl.edu ERROR: Failed while processing all repeats ERROR: Chunk failed at level:3, tier_type:1 FAILED CONTIG:Chr06 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:Chr06 Genome fasta: https://de.cyverse.org/dl/d/F612A6E2-A58E-44F0-895D-B766B41FE287/Chr06.gff MAKER_run_log: https://de.cyverse.org/dl/d/0B898D91-1520-4D19-9835-DC7EDD52415F/maker.run.log Maker_opts.ctl: https://de.cyverse.org/dl/d/15EB321B-4604-47C9-8E4F-0DC8D78517CE/maker_opts.ctl Thanks -- *Kapeel ChouguleComputational Scientist Developer II* *One Bungtown Road Cold Spring Harbor, NY 11724http://www.warelab.org/ * -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 4 12:30:44 2018 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 4 May 2018 12:30:44 -0600 Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1. In-Reply-To: References: Message-ID: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com> By default MAKER will not let models through without at least some degree of evidence overlap (AED < 1). This is because ab initio predictors overcall (sometimes by as much as a factor of 10, i.e. 10 false positives for every true positive). You can dial in the minimum AED support using the AED_threshold. But if yo8u want to keep everything that does not have overlap with a better supported model then setting keep_pred=1 will allow even unsupported models to be maintained (1 for yes and 0 for no). Situations where you may want to do this include passing in old annotation datasets or working on organisms with very high genedensity and low ab initio false positive rates (many species of fungi meet this criteria). 
?Carson > On May 2, 2018, at 5:55 AM, Jacques Dainat wrote: > > Dear all, > > It is not the first time I see that, but I have an annotation launched with the option keeps_pred=0 that contains gene models with AED score equal to 1. As far as I understand, AED score to 1 means there is no evidence support. So it should be purely abinitio without any line of evidence in front of this prediction. > But as it is written in ?Michael S-Cambell et al., Genome Annotation and Curation Using MAKER and MAKER-P, Current Protocols in Bioinformatics, 2014? page 4.11.36 about the the keep_preds option: > "MAKER rejects models that do not have at least some form of evidence support.? Setting keep_preds to 1 ?remove the evidence support requirement?. > > So, how should I understand my results? Some predictions with low support (but support anyway) can have AED score equal to 1? Or some purely abinitio prediction without support at all can anyway be selected even when keep_preds=0 is set up? > > Best regards, > > Jacques Dainat > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 4 12:46:55 2018 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 4 May 2018 12:46:55 -0600 Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats In-Reply-To: References: Message-ID: <161EF76F-4CFF-457E-A762-5E0FE82EABA2@gmail.com> Hi Kapeel, The failure is caused by the absence of a start or end coordinate (usually caused by a BLAST report truncation - there is a BLAST bug supposably fixed now where reports were being truncated by BLAST). 
If you've done all the updates to installed tools, make sure you also set the location of the updated tools in maker_exe.ctl and rerun the ./configure script for RepeatMasker (internal to its install directory), or the old tool is likely still being used.

Also, if that doesn't fix it, try the following. Use the attached file to replace .../maker/lib/Widget/RepeatMasker.pm

There is also an as-yet-unfixed RepeatMasker bug where it reports a 0 value for the start/end coordinate when configured with RMBLAST (RepeatMasker uses a 1-based coordinate system, so 0 is not supposed to be possible, and it only happens with RMBLAST). The change I made to the parser is a hack where I have MAKER change the RepeatMasker coordinate to 1 whenever it sees the invalid 0.

--Carson

> On May 1, 2018, at 10:16 AM, Kapeel Chougule wrote:
>
> Hi,
>
> I am using MAKER v3 to update a community annotation with new RNA-seq evidence data. Part of my problem is that MAKER shows the error below when repeat masking. I have attached the community annotation GFF, the MAKER log and maker_opts.ctl for your reference. I searched for this error in the maker-devel Google group and found some hints to update BLAST to 2.7 and rmblast to 2.6 and to reconfigure RepeatMasker, which I did, but it still fails with the error. Any help appreciated.
>
> deleted:0 hits
> collecting blastx repeatmasking
> processing all repeats
> in cluster::shadow_cluster...
> Died at /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
> --> rank=1, hostname=bnbcompute15.blacknblue.cshl.edu
> ERROR: Failed while processing all repeats
> ERROR: Chunk failed at level:3, tier_type:1
> FAILED CONTIG:Chr06
>
> ERROR: Chunk failed at level:2, tier_type:0
> FAILED CONTIG:Chr06
>
> Thanks
>
> Kapeel
> --
> Kapeel Chougule
> Computational Scientist Developer II
> One Bungtown Road Cold Spring Harbor, NY 11724
> http://www.warelab.org/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: RepeatMasker.pm
Type: text/x-perl-script
Size: 9317 bytes
Desc: not available
URL:

From jacques.dainat at nbis.se Mon May 7 05:25:13 2018
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Mon, 7 May 2018 13:25:13 +0200
Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1.
In-Reply-To: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com>
References: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com>
Message-ID: <5404E0E7-82CD-420F-AAEF-CFD01793C075@nbis.se>

Hi,

Thank you for your reply; nevertheless, it doesn't answer my question. Either I didn't express myself well enough or I don't get something obvious from your answer.

I will try to rephrase the problem.
My problem is that I am setting keep_preds=0 and I still obtain an AED of 1. Can unsupported data be selected with keep_preds=0? What I understood is that this is not the case, so do you have any idea why I have those AED values equal to 1?

Thank you again for your help.
Best regards,

Jacques

> On 4 May 2018, at 20:30, Carson Holt wrote:
>
> By default MAKER will not let models through without at least some degree of evidence overlap (AED < 1).
> This is because ab initio predictors overcall (sometimes by as much as a factor of 10, i.e. 10 false positives for every true positive). You can dial in the minimum AED support using the AED_threshold. But if you want to keep everything that does not have overlap with a better supported model, then setting keep_preds=1 will allow even unsupported models to be maintained (1 for yes and 0 for no). Situations where you may want to do this include passing in old annotation datasets or working on organisms with very high gene density and low ab initio false positive rates (many species of fungi meet these criteria).
>
> --Carson
>
>> On May 2, 2018, at 5:55 AM, Jacques Dainat wrote:
>>
>> Dear all,
>>
>> It is not the first time I see that, but I have an annotation launched with the option keep_preds=0 that contains gene models with an AED score equal to 1. As far as I understand, an AED score of 1 means there is no evidence support. So it should be purely ab initio, without any line of evidence behind this prediction.
>> But as it is written in "Michael S. Campbell et al., Genome Annotation and Curation Using MAKER and MAKER-P, Current Protocols in Bioinformatics, 2014", page 4.11.36, about the keep_preds option:
>> "MAKER rejects models that do not have at least some form of evidence support." Setting keep_preds to 1 "remove the evidence support requirement".
>>
>> So, how should I understand my results? Can some predictions with low support (but support anyway) have an AED score equal to 1? Or can some purely ab initio predictions without any support be selected anyway, even when keep_preds=0 is set?
>>
>> Best regards,
>>
>> Jacques Dainat

From carsonhh at gmail.com Tue May 8 07:28:27 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 8 May 2018 07:28:27 -0600
Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1.
In-Reply-To: <5404E0E7-82CD-420F-AAEF-CFD01793C075@nbis.se>
References: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com> <5404E0E7-82CD-420F-AAEF-CFD01793C075@nbis.se>
Message-ID: <8ADB4928-A00E-41E1-97B6-413F8257276C@gmail.com>

You should not see AED=1 in the annotations (unless you supplied features to model_gff; those are always maintained). But you will see AED=1 in the evidence track. Make sure you are looking at features with a source tag of "maker" and a type of gene/mRNA/exon/CDS, and not type match/match_part. The match/match_part features are reference features in the evidence track. The reference features will also have their own FASTA file. The only FASTA files you should use are maker.proteins.fasta and maker.transcripts.fasta, not, for example, snap_masked.protein.fasta.

--Carson

> On May 7, 2018, at 5:25 AM, Jacques Dainat wrote:
>
> Hi,
>
> Thank you for your reply; nevertheless, it doesn't answer my question. Either I didn't express myself well enough or I don't get something obvious from your answer.
>
> I will try to rephrase the problem.
> My problem is that I am setting keep_preds=0 and I still obtain an AED of 1. Can unsupported data be selected with keep_preds=0? What I understood is that this is not the case, so do you have any idea why I have those AED values equal to 1?
>
> Thank you again for your help.
> Best regards,
>
> Jacques
>
>> On 4 May 2018, at 20:30, Carson Holt wrote:
>>
>> By default MAKER will not let models through without at least some degree of evidence overlap (AED < 1). This is because ab initio predictors overcall (sometimes by as much as a factor of 10, i.e. 10 false positives for every true positive). You can dial in the minimum AED support using the AED_threshold.
>> But if you want to keep everything that does not have overlap with a better supported model, then setting keep_preds=1 will allow even unsupported models to be maintained (1 for yes and 0 for no). Situations where you may want to do this include passing in old annotation datasets or working on organisms with very high gene density and low ab initio false positive rates (many species of fungi meet these criteria).
>>
>> --Carson
>>
>>> On May 2, 2018, at 5:55 AM, Jacques Dainat wrote:
>>>
>>> Dear all,
>>>
>>> It is not the first time I see that, but I have an annotation launched with the option keep_preds=0 that contains gene models with an AED score equal to 1. As far as I understand, an AED score of 1 means there is no evidence support. So it should be purely ab initio, without any line of evidence behind this prediction.
>>> But as it is written in "Michael S. Campbell et al., Genome Annotation and Curation Using MAKER and MAKER-P, Current Protocols in Bioinformatics, 2014", page 4.11.36, about the keep_preds option:
>>> "MAKER rejects models that do not have at least some form of evidence support." Setting keep_preds to 1 "remove the evidence support requirement".
>>>
>>> So, how should I understand my results? Can some predictions with low support (but support anyway) have an AED score equal to 1? Or can some purely ab initio predictions without any support be selected anyway, even when keep_preds=0 is set?
>>>
>>> Best regards,
>>>
>>> Jacques Dainat
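
[Editor's sketch] The check suggested in the reply above (look only at features whose source is "maker" and whose type is gene/mRNA/exon/CDS, and ignore match/match_part evidence features) could be scripted roughly as follows. This is a minimal sketch, not part of MAKER: it assumes a tab-delimited GFF3 file in which mRNA lines carry an `_AED` attribute, the function names are hypothetical, and the example lines are made up for illustration.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (no URL-decoding)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def annotated_mrnas_with_aed(gff_lines, min_aed=1.0):
    """Yield (ID, AED) for mRNA features with source 'maker' and _AED >= min_aed."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section, no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        source, feature_type = cols[1], cols[2]
        # Evidence-track features (e.g. snap_masked match lines) are skipped here.
        if source != "maker" or feature_type != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        if "_AED" in attrs and float(attrs["_AED"]) >= min_aed:
            yield attrs.get("ID", "?"), float(attrs["_AED"])


# Made-up example lines (assumption: real files will differ in detail).
example = [
    "##gff-version 3\n",
    "Chr06\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1;Name=g1\n",
    "Chr06\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1;_AED=0.12;_eAED=0.12\n",
    "Chr06\tmaker\tmRNA\t2000\t2900\t.\t+\t.\tID=g2-mRNA-1;Parent=g2;_AED=1.00;_eAED=1.00\n",
    "Chr06\tsnap_masked\tmatch\t5000\t5900\t.\t+\t.\tID=snap-1;Name=snap-1;_AED=1.00\n",
]
hits = list(annotated_mrnas_with_aed(example))
print(hits)
```

If this reports nothing for a real run with keep_preds=0, the AED=1 entries seen earlier were probably evidence-track features rather than annotations; if it does report models, checking whether they came in through model_gff would be the next step.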

From kapeelc at gmail.com Mon May 14 08:35:30 2018
From: kapeelc at gmail.com (Kapeel Chougule)
Date: Mon, 14 May 2018 10:35:30 -0400
Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats
In-Reply-To: <161EF76F-4CFF-457E-A762-5E0FE82EABA2@gmail.com>
References: <161EF76F-4CFF-457E-A762-5E0FE82EABA2@gmail.com>
Message-ID:

Thanks Carson,

It works now. I only had to replace the RepeatMasker Perl module and it ran without any errors.

Best
Kapeel

On Fri, May 4, 2018 at 2:46 PM, Carson Holt wrote:

> Hi Kapeel,
>
> The failure is caused by the absence of a start or end coordinate (usually caused by a BLAST report truncation - there is a BLAST bug, supposedly fixed now, where reports were being truncated by BLAST). If you've done all the updates to installed tools, make sure you also set the location of the updated tools in maker_exe.ctl and rerun the ./configure script for RepeatMasker (internal to its install directory), or the old tool is likely still being used.
>
> Also, if that doesn't fix it, try the following. Use the attached file to replace .../maker/lib/Widget/RepeatMasker.pm
>
> There is also an as-yet-unfixed RepeatMasker bug where it reports a 0 value for the start/end coordinate when configured with RMBLAST (RepeatMasker uses a 1-based coordinate system, so 0 is not supposed to be possible, and it only happens with RMBLAST). The change I made to the parser is a hack where I have MAKER change the RepeatMasker coordinate to 1 whenever it sees the invalid 0.
>
> --Carson
>
> On May 1, 2018, at 10:16 AM, Kapeel Chougule wrote:
>
> Hi,
>
> I am using MAKER v3 to update a community annotation with new RNA-seq evidence data. Part of my problem is that MAKER shows the error below when repeat masking. I have attached the community annotation GFF, the MAKER log and maker_opts.ctl for your reference.
> I searched for this error in the maker-devel Google group and found some hints to update BLAST to 2.7 and rmblast to 2.6 and to reconfigure RepeatMasker, which I did, but it still fails with the error. Any help appreciated.
>
> deleted:0 hits
> collecting blastx repeatmasking
> processing all repeats
> in cluster::shadow_cluster...
> Died at /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
> --> rank=1, hostname=bnbcompute15.blacknblue.cshl.edu
> ERROR: Failed while processing all repeats
> ERROR: Chunk failed at level:3, tier_type:1
> FAILED CONTIG:Chr06
>
> ERROR: Chunk failed at level:2, tier_type:0
> FAILED CONTIG:Chr06
>
> Thanks
>
> Kapeel
> --
> *Kapeel Chougule*
> *Computational Scientist Developer II*
> *One Bungtown Road Cold Spring Harbor, NY 11724*
> http://www.warelab.org/

--
*Kapeel Chougule*
*Computational Scientist Developer II*
*One Bungtown Road Cold Spring Harbor, NY 11724*
http://www.warelab.org/

From vsoza at uw.edu Thu May 24 17:19:33 2018
From: vsoza at uw.edu (Valerie Soza)
Date: Thu, 24 May 2018 16:19:33 -0700
Subject: [maker-devel] databases supported with ipr_update_gff script
Message-ID: <814B3327-AA11-4BB8-B15B-A8A3C03FC950@uw.edu>

Hi Maker community

I am using the accessory scripts provided with Maker 2.3.19 to do some functional annotations of genes predicted with the Maker pipeline. For integrating information from InterProScan, I want to use the ipr_update_gff script.
When I looked at the script, I found the following lines:

    my %db_map = (BlastProDom => 'ProDom',
                  FPrintScan  => 'PRINTS',
                  Gene3D      => 'Gene3D',
                  HMMPanther  => 'PANTHER',
                  HMMPfam     => 'Pfam',
                  HMMPIR      => 'PIR',
                  HMMSmart    => 'SMART',
                  HMMTigr     => 'JCVI_TIGRFAMS',
                  PatternScan => 'Prosite',
                  ProfileScan => 'Prosite',
                 );

Does this indicate that these are the only databases that the script will extract information for from an InterProScan report?

I wanted to use all databases currently available from InterProScan 5 (version 5.28-67.0, see below), but am wondering whether the Maker script will recognize results from all of the following databases?

- CDD
- COILS
- Gene3D
- HAMAP
- MOBIDB
- PANTHER
- Pfam
- PIRSF
- PRINTS
- ProDom
- PROSITE (Profiles and Patterns)
- SFLD
- SMART (unlicensed components only by default - this analysis has simplified post-processing that includes an E-value filter, however you should not expect it to give the same match output as the fully licensed version of SMART)
- SUPERFAMILY
- TIGRFAMs

Thanks.

-Valerie

Valerie Soza, Ph.D.
c/o Hall Lab
Department of Biology
University of Washington
Johnson Hall 202A
Box 351800
Seattle, WA 98195-1800
206-543-6740
http://staff.washington.edu/vsoza/

From carsonhh at gmail.com Fri May 25 12:19:18 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 25 May 2018 12:19:18 -0600
Subject: [maker-devel] databases supported with ipr_update_gff script
In-Reply-To: <814B3327-AA11-4BB8-B15B-A8A3C03FC950@uw.edu>
References: <814B3327-AA11-4BB8-B15B-A8A3C03FC950@uw.edu>
Message-ID: <822CD0A7-4C7E-4AA8-9B37-85836CD02196@gmail.com>

Those are just for name conversion (it takes what is in the report and renames it to a known db_xref term). If there is no conversion, the name will stay the same as in the report (unaltered). Different databases have their own db_xref values. I can't remember where the ones we are using came from (I think it was from GMOD's Chado database).
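
[Editor's sketch] The rename-or-pass-through behaviour described above can be illustrated with a lookup that falls back to the original name. This is a Python illustration of the idea only, not the Perl code of ipr_update_gff; the function name `to_db_xref_name` is hypothetical, and the table is copied from the `%db_map` quoted in the question.

```python
# Name-conversion table copied from the %db_map shown in the question.
DB_MAP = {
    "BlastProDom": "ProDom",
    "FPrintScan": "PRINTS",
    "Gene3D": "Gene3D",
    "HMMPanther": "PANTHER",
    "HMMPfam": "Pfam",
    "HMMPIR": "PIR",
    "HMMSmart": "SMART",
    "HMMTigr": "JCVI_TIGRFAMS",
    "PatternScan": "Prosite",
    "ProfileScan": "Prosite",
}


def to_db_xref_name(report_name):
    """Return the mapped db_xref name, or the report name unchanged if unmapped."""
    return DB_MAP.get(report_name, report_name)


print(to_db_xref_name("HMMPfam"))  # listed in the table, so it is renamed
print(to_db_xref_name("CDD"))      # not listed, so it passes through as-is
```

Under this reading, an analysis such as CDD or SFLD that has no entry in the table would not be dropped; it would simply keep the name used in the InterProScan report.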
NCBI also has their own -> https://www.ncbi.nlm.nih.gov/genbank/collab/db_xref/ , UniProt -> https://www.uniprot.org/docs/dbxref , and you can search around for others as well.

--Carson

> On May 24, 2018, at 5:19 PM, Valerie Soza wrote:
>
> Hi Maker community
>
> I am using the accessory scripts provided with Maker 2.3.19 to do some functional annotations of genes predicted with the Maker pipeline. For integrating information from InterProScan, I want to use the ipr_update_gff script. When I looked at the script, I found the following lines:
>
> my %db_map = (BlastProDom => 'ProDom',
>               FPrintScan  => 'PRINTS',
>               Gene3D      => 'Gene3D',
>               HMMPanther  => 'PANTHER',
>               HMMPfam     => 'Pfam',
>               HMMPIR      => 'PIR',
>               HMMSmart    => 'SMART',
>               HMMTigr     => 'JCVI_TIGRFAMS',
>               PatternScan => 'Prosite',
>               ProfileScan => 'Prosite',
>              );
>
> Does this indicate that these are the only databases that the script will extract information for from an InterProScan report?
>
> I wanted to use all databases currently available from InterProScan 5 (version 5.28-67.0, see below), but am wondering whether the Maker script will recognize results from all of the following databases?
>
> - CDD
> - COILS
> - Gene3D
> - HAMAP
> - MOBIDB
> - PANTHER
> - Pfam
> - PIRSF
> - PRINTS
> - ProDom
> - PROSITE (Profiles and Patterns)
> - SFLD
> - SMART (unlicensed components only by default - this analysis has simplified post-processing that includes an E-value filter, however you should not expect it to give the same match output as the fully licensed version of SMART)
> - SUPERFAMILY
> - TIGRFAMs
>
> Thanks.
>
> -Valerie
>
> Valerie Soza, Ph.D.
> c/o Hall Lab
> Department of Biology
> University of Washington
> Johnson Hall 202A
> Box 351800
> Seattle, WA 98195-1800
> 206-543-6740
> http://staff.washington.edu/vsoza/

From timo.metz at googlemail.com Tue May 8 06:11:05 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Tue, 08 May 2018 12:11:05 -0000
Subject: [maker-devel] large UTR overhang
Message-ID:

Hey guys,

Attached is a picture of a recent MAKER run where I used PacBio reads and Trinity-assembled short reads plus proteins from Swiss-Prot. I don't really get why MAKER assigns such a large fraction to be a UTR. From the name I can tell that the final gene model stems from a SNAP ab initio prediction, but even that gene prediction is not 100% identical to the part of the gene model which is not UTR.

Is there any setting in MAKER with which I can influence how MAKER assigns such UTRs? I already tried out the "correct_est_fusion" option but it did not help.

best

Timo

-------------- next part --------------
A non-text attachment was scrubbed...
Name: UTRoverhang.png
Type: image/png
Size: 81905 bytes
Desc: not available
URL:

From timo.metz at googlemail.com Tue May 22 05:07:46 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Tue, 22 May 2018 13:07:46 +0200
Subject: [maker-devel] MAKER beta not inferring gene models from protein evidence
Message-ID:

Hey guys,

I have installed MAKER v3 beta in order to use the built-in EvidenceModeler, which is not part of v2.31. Now I could see that, even when using the same evidence, the BUSCO completeness of the transcriptome drops when using the v3 beta compared to v2.31.
I could identify the reason for this: MAKER v3 now does not infer gene models from protein evidence if there is no additional support from RNA-seq/ESTs. In v2.31, on the contrary, it did.

Is there any option in v3 beta to also get gene models from protein evidence only, or is this something that v3 beta is not able to do anymore?

best

Timo

From carsonhh at gmail.com Tue May 29 10:07:20 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 May 2018 10:07:20 -0600
Subject: [maker-devel] large UTR overhang
In-Reply-To:
References:
Message-ID: <3CE6DD4B-B176-4051-98B7-D47F33987E10@gmail.com>

MAKER 3 does not have any additional requirement for transcript support that MAKER 2 does not have. However, if you are using the correct_est_fusion=1 option, it will only use the polished protein evidence rather than the unpolished blastx alignments, which is probably what you are seeing.

The model you show also likely corresponds to either a paralogous duplication or a broken ORF due to assembly error. You can see clearly that both SNAP and Augustus want to break the region into two separate models (they can't find a single workable ORF). The raw BLASTX alignments and transcription data want to merge the region (I don't see any support for merging from polished protein2genome alignments though - maybe you just cut that off in the image?). So when the predictors are fed hints suggesting the longer model, they build the best model they can, but the ORF is broken, so the remaining exons will match the transcript evidence exactly but have to be UTR given the broken ORF. This means you are either merging things that shouldn't be merged (based on bad evidence alignments) or the assembly has an error that keeps the ORF from functioning in that region as it should. The overall structure is still captured, but the translation is truncated.
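
[Editor's sketch] The point about a broken ORF turning downstream exons into UTR can be shown with a toy example. This is an illustration only and not how MAKER, SNAP or Augustus build models: it assumes a spliced transcript on the forward strand whose start codon is at position 0, and the function name and sequences are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_cds_and_utr(transcript, start=0):
    """Read codons from `start`; return (CDS incl. stop codon, 3' remainder)."""
    for i in range(start, len(transcript) - 2, 3):
        if transcript[i:i + 3] in STOP_CODONS:
            end = i + 3
            return transcript[start:end], transcript[end:]
    return transcript[start:], ""  # no in-frame stop found


# Intact ORF: the stop codon sits near the end of the transcript.
intact = "ATG" + "GCT" * 4 + "TAA" + "GGGCCC"
# Same length, but an early in-frame stop (e.g. from an assembly error).
broken = "ATG" + "GCT" + "TAG" + "GCT" * 2 + "TAA" + "GGGCCC"

cds_ok, utr_ok = split_cds_and_utr(intact)
cds_broken, utr_broken = split_cds_and_utr(broken)
print(len(cds_ok), len(utr_ok))
print(len(cds_broken), len(utr_broken))
```

Both toy transcripts have the same exon structure and length, but in the second one translation stops early, so everything downstream of the premature stop can only be reported as 3' UTR.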

Here is a secondary tool you can try, called DeFusion, that may help if you are getting false merges because of the evidence -> https://wjidea.github.io/defusion/

--Carson

> On May 8, 2018, at 6:10 AM, Timo Metz wrote:
>
> Hey guys,
>
> Attached is a picture of a recent MAKER run where I used PacBio reads and Trinity-assembled short reads plus proteins from Swiss-Prot. I don't really get why MAKER assigns such a large fraction to be a UTR. From the name I can tell that the final gene model stems from a SNAP ab initio prediction, but even that gene prediction is not 100% identical to the part of the gene model which is not UTR.
>
> Is there any setting in MAKER with which I can influence how MAKER assigns such UTRs? I already tried out the "correct_est_fusion" option but it did not help.
>
> best
>
> Timo

From ksenia.lavrichenko at gmail.com Thu May 31 08:30:12 2018
From: ksenia.lavrichenko at gmail.com (Ksenia Lavrichenko)
Date: Thu, 31 May 2018 16:30:12 +0200
Subject: [maker-devel] Building MAKER with specific perl version
Message-ID:

Hi,

I have been banging my head for a while now, trying to install MAKER with my specific perl. I found this old thread:
https://groups.google.com/forum/#!msg/maker-devel/hScqdJW0FsU/3KT_UF7k9XMJ

However, this does not work for me. I make sure bin/* and Build are deleted before I run $myperl Build.PL. I see my perl in the shebang of Build; however, after ./Build install, all scripts in bin have "#! /usr/bin/perl", which produces a version error when I try to run maker -h.

Any tips on what I need to adjust in Build.PL?

Many thanks,
Ksenia
URL: From jacques.dainat at nbis.se Wed May 2 05:55:13 2018 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Wed, 2 May 2018 13:55:13 +0200 Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1. Message-ID: Dear all, It is not the first time I see that, but I have an annotation launched with the option keeps_pred=0 that contains gene models with AED score equal to 1. As far as I understand, AED score to 1 means there is no evidence support. So it should be purely abinitio without any line of evidence in front of this prediction. But as it is written in ?Michael S-Cambell et al., Genome Annotation and Curation Using MAKER and MAKER-P, Current Protocols in Bioinformatics, 2014? page 4.11.36 about the the keep_preds option: "MAKER rejects models that do not have at least some form of evidence support.? Setting keep_preds to 1 ?remove the evidence support requirement?. So, how should I understand my results? Some predictions with low support (but support anyway) can have AED score equal to 1? Or some purely abinitio prediction without support at all can anyway be selected even when keep_preds=0 is set up? Best regards, Jacques Dainat -------------- next part -------------- An HTML attachment was scrubbed... URL: From kapeelc at gmail.com Tue May 1 10:16:27 2018 From: kapeelc at gmail.com (Kapeel Chougule) Date: Tue, 1 May 2018 12:16:27 -0400 Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats Message-ID: Hi, I am using MAKER v3 to update community annotation with new RNA-seq evidence data. Part of my problem is that MAKER shows below error when repeat masking. I have attached the community annotation gff, maker log and maker_opts.ctl for your reference. I searched for this error in the maker-dev google group and found some hints to update BLAST to 2.7, rmblast to 2.6 and reconfigure RepeatMasker which I did but it still fails with the error. Any help appreciated. 
deleted:0 hits collecting blastx repeatmasking processing all repeats in cluster::shadow_cluster... Died at /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. --> rank=1, hostname=bnbcompute15.blacknblue.cshl.edu ERROR: Failed while processing all repeats ERROR: Chunk failed at level:3, tier_type:1 FAILED CONTIG:Chr06 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:Chr06 Thanks Kapeel -- *Kapeel ChouguleComputational Scientist Developer II* *One Bungtown Road Cold Spring Harbor, NY 11724http://www.warelab.org/ * -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Chr06.gff Type: application/octet-stream Size: 8239004 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4780 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker.run.log Type: application/octet-stream Size: 5368011 bytes Desc: not available URL: From kapeelc at gmail.com Fri May 4 10:01:39 2018 From: kapeelc at gmail.com (Kapeel Chougule) Date: Fri, 4 May 2018 12:01:39 -0400 Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats Message-ID: Hi, I am using MAKER v3 to update community annotation with new RNA-seq evidence data. Part of my problem is that MAKER shows below error when repeat masking. I have attached the community annotation gff, maker log and maker_opts.ctl for your reference. I searched for this error in the maker-dev google group and found some hints to update BLAST to 2.7, rmblast to 2.6 and reconfigure RepeatMasker which I did but it still fails with the error. Any help appreciated. deleted:0 hits collecting blastx repeatmasking processing all repeats in cluster::shadow_cluster... 
Died at /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/ bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. --> rank=1, hostname=bnbcompute15.blacknblue.cshl.edu ERROR: Failed while processing all repeats ERROR: Chunk failed at level:3, tier_type:1 FAILED CONTIG:Chr06 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:Chr06 Genome fasta: https://de.cyverse.org/dl/d/F612A6E2-A58E-44F0-895D-B766B41FE287/Chr06.gff MAKER_run_log: https://de.cyverse.org/dl/d/0B898D91-1520-4D19-9835-DC7EDD52415F/maker.run.log Maker_opts.ctl: https://de.cyverse.org/dl/d/15EB321B-4604-47C9-8E4F-0DC8D78517CE/maker_opts.ctl Thanks -- *Kapeel ChouguleComputational Scientist Developer II* *One Bungtown Road Cold Spring Harbor, NY 11724http://www.warelab.org/ * -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 4 12:30:44 2018 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 4 May 2018 12:30:44 -0600 Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1. In-Reply-To: References: Message-ID: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com> By default MAKER will not let models through without at least some degree of evidence overlap (AED < 1). This is because ab initio predictors overcall (sometimes by as much as a factor of 10, i.e. 10 false positives for every true positive). You can dial in the minimum AED support using the AED_threshold. But if yo8u want to keep everything that does not have overlap with a better supported model then setting keep_pred=1 will allow even unsupported models to be maintained (1 for yes and 0 for no). Situations where you may want to do this include passing in old annotation datasets or working on organisms with very high genedensity and low ab initio false positive rates (many species of fungi meet this criteria). 
?Carson > On May 2, 2018, at 5:55 AM, Jacques Dainat wrote: > > Dear all, > > It is not the first time I see that, but I have an annotation launched with the option keeps_pred=0 that contains gene models with AED score equal to 1. As far as I understand, AED score to 1 means there is no evidence support. So it should be purely abinitio without any line of evidence in front of this prediction. > But as it is written in ?Michael S-Cambell et al., Genome Annotation and Curation Using MAKER and MAKER-P, Current Protocols in Bioinformatics, 2014? page 4.11.36 about the the keep_preds option: > "MAKER rejects models that do not have at least some form of evidence support.? Setting keep_preds to 1 ?remove the evidence support requirement?. > > So, how should I understand my results? Some predictions with low support (but support anyway) can have AED score equal to 1? Or some purely abinitio prediction without support at all can anyway be selected even when keep_preds=0 is set up? > > Best regards, > > Jacques Dainat > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 4 12:46:55 2018 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 4 May 2018 12:46:55 -0600 Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats In-Reply-To: References: Message-ID: <161EF76F-4CFF-457E-A762-5E0FE82EABA2@gmail.com> Hi Kapeel, The failure is caused by the absence of a start or end coordinate (usually caused by a BLAST report truncation - there is a BLAST bug supposably fixed now where reports were being truncated by BLAST). 
If you?ve done all the updates to installed tools, make sure you aslo set the location of the updated tools in maker_exe.ctl and reran the ./configure script for RepeatMasker (internal to it?s install directory) or the old tool is likely still being used. Also if that doesn?t fix it, try the following. Use the attached file to replace ?/maker/lib/Widget/RepeatMasker.pm There is also an as of yet unfixed RepeatMasker bug where it reports a 0 value for the start/end coordinate when configured with RMBLAST (RepeatMasker uses a 1 based coordinate system, so 0 is not supposed to be possible and it only happens with RMBLAST). The change I made to the parser is a hack where I have MAKER change the RepeatMasker coordinate to 1 whenever it sees the invalid 0. ?Carson > On May 1, 2018, at 10:16 AM, Kapeel Chougule wrote: > > Hi, > > I am using MAKER v3 to update community annotation with new RNA-seq evidence data. Part of my problem is that MAKER shows below error when repeat masking. I have attached the community annotation gff, maker log and maker_opts.ctl for your reference. I searched for this error in the maker-dev google group and found some hints to update BLAST to 2.7, rmblast to 2.6 and reconfigure RepeatMasker which I did but it still fails with the error. Any help appreciated. > > deleted:0 hits > collecting blastx repeatmasking > processing all repeats > in cluster::shadow_cluster... > Died at /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. 
> --> rank=1, hostname=bnbcompute15.blacknblue.cshl.edu > ERROR: Failed while processing all repeats > ERROR: Chunk failed at level:3, tier_type:1 > FAILED CONTIG:Chr06 > > ERROR: Chunk failed at level:2, tier_type:0 > FAILED CONTIG:Chr06 > > > Thanks > > Kapeel > -- > > Kapeel Chougule > Computational Scientist Developer II > One Bungtown Road Cold Spring Harbor, NY 11724 > http://www.warelab.org/ > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: RepeatMasker.pm Type: text/x-perl-script Size: 9317 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacques.dainat at nbis.se Mon May 7 05:25:13 2018 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Mon, 7 May 2018 13:25:13 +0200 Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1. In-Reply-To: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com> References: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com> Message-ID: <5404E0E7-82CD-420F-AAEF-CFD01793C075@nbis.se> Hi, Thank you for your reply, nevertheless it doesn't it answer my question. Either I didn't express myself well enough or I don't get something obvious from your answer. I will try to rephrase the problem. My problem is that I am setting keep_phred=0 and that I obtain AED of 1. Can unsupported data can be selected with keep_phred=0? what I understood is that it is not the case so do you have any idea of why I have those AED equal to 1? Thank you again for your help. Best regards, Jacques > On 4 May 2018, at 20:30, Carson Holt wrote: > > By default MAKER will not let models through without at least some degree of evidence overlap (AED < 1). 
> This is because ab initio predictors overcall (sometimes by as much as a factor of 10, i.e. 10 false positives for every true positive). You can dial in the minimum AED support using the AED_threshold. But if you want to keep everything that does not have overlap with a better supported model then setting keep_preds=1 will allow even unsupported models to be maintained (1 for yes and 0 for no). Situations where you may want to do this include passing in old annotation datasets or working on organisms with very high gene density and low ab initio false positive rates (many species of fungi meet these criteria).
>
> --Carson
>
>
>> On May 2, 2018, at 5:55 AM, Jacques Dainat wrote:
>>
>> Dear all,
>>
>> It is not the first time I see that, but I have an annotation launched with the option keep_preds=0 that contains gene models with AED score equal to 1. As far as I understand, AED score to 1 means there is no evidence support. So it should be purely abinitio without any line of evidence in front of this prediction.
>> But as it is written in "Michael S. Campbell et al., Genome Annotation and Curation Using MAKER and MAKER-P, Current Protocols in Bioinformatics, 2014" page 4.11.36 about the keep_preds option:
>> "MAKER rejects models that do not have at least some form of evidence support." Setting keep_preds to 1 "remove the evidence support requirement".
>>
>> So, how should I understand my results? Some predictions with low support (but support anyway) can have AED score equal to 1? Or some purely abinitio prediction without support at all can anyway be selected even when keep_preds=0 is set up?
>>
>> Best regards,
>>
>> Jacques Dainat
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From carsonhh at gmail.com Tue May 8 07:28:27 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 8 May 2018 07:28:27 -0600
Subject: [maker-devel] keep_preds=0 but some gene models have anyway an AED equal to 1.
In-Reply-To: <5404E0E7-82CD-420F-AAEF-CFD01793C075@nbis.se>
References: <41640591-9D18-4B08-9FD7-00C550EB7B2A@gmail.com> <5404E0E7-82CD-420F-AAEF-CFD01793C075@nbis.se>
Message-ID: <8ADB4928-A00E-41E1-97B6-413F8257276C@gmail.com>

You should not see AED=1 in the annotations (unless you supplied features to model_gff; those are always maintained). But you will see AED=1 in the evidence track. Make sure you are looking at features with a source tag of "maker" and type of gene/mRNA/exon/CDS, and not type match/match_part. The match/match_part features are reference features in the evidence track. The reference features will also have their own fasta file. The only fasta files you should use are maker.proteins.fasta and maker.transcripts.fasta, not snap_masked.protein.fasta, for example.

--Carson

> On May 7, 2018, at 5:25 AM, Jacques Dainat wrote:
>
> Hi,
>
> Thank you for your reply; nevertheless, it doesn't answer my question. Either I didn't express myself well enough or I don't get something obvious from your answer.
>
> I will try to rephrase the problem.
> My problem is that I am setting keep_preds=0 and that I obtain AED of 1. Can unsupported data be selected with keep_preds=0? What I understood is that it is not the case, so do you have any idea why I have those AED values equal to 1?
>
> Thank you again for your help.
> Best regards,
>
> Jacques
>
>
>> On 4 May 2018, at 20:30, Carson Holt wrote:
>>
>> By default MAKER will not let models through without at least some degree of evidence overlap (AED < 1). This is because ab initio predictors overcall (sometimes by as much as a factor of 10, i.e. 10 false positives for every true positive). You can dial in the minimum AED support using the AED_threshold.
>> But if you want to keep everything that does not have overlap with a better supported model then setting keep_preds=1 will allow even unsupported models to be maintained (1 for yes and 0 for no). Situations where you may want to do this include passing in old annotation datasets or working on organisms with very high gene density and low ab initio false positive rates (many species of fungi meet these criteria).
>>
>> --Carson
>>
>>
>>> On May 2, 2018, at 5:55 AM, Jacques Dainat wrote:
>>>
>>> Dear all,
>>>
>>> It is not the first time I see that, but I have an annotation launched with the option keep_preds=0 that contains gene models with AED score equal to 1. As far as I understand, AED score to 1 means there is no evidence support. So it should be purely abinitio without any line of evidence in front of this prediction.
>>> But as it is written in "Michael S. Campbell et al., Genome Annotation and Curation Using MAKER and MAKER-P, Current Protocols in Bioinformatics, 2014" page 4.11.36 about the keep_preds option:
>>> "MAKER rejects models that do not have at least some form of evidence support." Setting keep_preds to 1 "remove the evidence support requirement".
>>>
>>> So, how should I understand my results? Some predictions with low support (but support anyway) can have AED score equal to 1? Or some purely abinitio prediction without support at all can anyway be selected even when keep_preds=0 is set up?
>>>
>>> Best regards,
>>>
>>> Jacques Dainat
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
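
For anyone wanting to check this on their own output, a minimal Python sketch along these lines could list only the MAKER annotations (source "maker", type mRNA) with a given AED, skipping the match/match_part evidence features. It assumes the AED is stored in a GFF3 attribute named _AED on the mRNA features; the function name and the example records are hypothetical.

```python
def maker_models_with_aed(gff_lines, aed_value=1.0):
    """Yield (seqid, ID) for mRNA features with source 'maker' whose _AED equals aed_value.

    Assumption: the AED is written as an attribute called _AED in column 9.
    """
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section: no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # comments, directives, blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, source, ftype, attrs = cols[0], cols[1], cols[2], cols[8]
        if source != "maker" or ftype != "mRNA":
            continue  # skips evidence features such as match/match_part
        tags = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        if "_AED" in tags and float(tags["_AED"]) == aed_value:
            yield seqid, tags.get("ID", "")


# Made-up records, only to show which lines are reported.
example = [
    "##gff-version 3\n",
    "Chr06\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=gene1\n",
    "Chr06\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=gene1-mRNA-1;Parent=gene1;_AED=0.25\n",
    "Chr06\tsnap_masked\tmatch\t2000\t2600\t.\t+\t.\tID=hit1;_AED=1.00\n",
    "Chr06\tmaker\tmRNA\t5000\t5900\t.\t-\t.\tID=gene2-mRNA-1;Parent=gene2;_AED=1.00\n",
]
hits = list(maker_models_with_aed(example))
print(hits)
```

In this toy input only gene2-mRNA-1 is reported; the snap_masked match line also carries an AED of 1 but belongs to the evidence track, so it is ignored.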

From kapeelc at gmail.com Mon May 14 08:35:30 2018
From: kapeelc at gmail.com (Kapeel Chougule)
Date: Mon, 14 May 2018 10:35:30 -0400
Subject: [maker-devel] MAKER v3 ERROR: Failed while processing all repeats
In-Reply-To: <161EF76F-4CFF-457E-A762-5E0FE82EABA2@gmail.com>
References: <161EF76F-4CFF-457E-A762-5E0FE82EABA2@gmail.com>
Message-ID:

Thanks Carson,

It works now. I only had to replace the RepeatMasker Perl module and it ran without any errors.

Best
Kapeel

On Fri, May 4, 2018 at 2:46 PM, Carson Holt wrote:

> Hi Kapeel,
>
> The failure is caused by the absence of a start or end coordinate (usually
> caused by a BLAST report truncation - there is a BLAST bug supposedly fixed
> now where reports were being truncated by BLAST). If you've done all the
> updates to installed tools, make sure you also set the location of the
> updated tools in maker_exe.ctl and rerun the ./configure script for
> RepeatMasker (internal to its install directory), or the old tool is likely
> still being used.
>
> Also, if that doesn't fix it, try the following. Use the attached file to
> replace .../maker/lib/Widget/RepeatMasker.pm
>
> There is also an as yet unfixed RepeatMasker bug where it reports a 0
> value for the start/end coordinate when configured with RMBLAST
> (RepeatMasker uses a 1-based coordinate system, so 0 is not supposed to be
> possible, and it only happens with RMBLAST). The change I made to the parser
> is a hack where I have MAKER change the RepeatMasker coordinate to 1
> whenever it sees the invalid 0.
>
> --Carson
>
>
> On May 1, 2018, at 10:16 AM, Kapeel Chougule wrote:
>
> Hi,
>
> I am using MAKER v3 to update community annotation with new RNA-seq
> evidence data. Part of my problem is that MAKER shows below error when
> repeat masking. I have attached the community annotation gff, maker log and
> maker_opts.ctl for your reference.
> I searched for this error in the maker-dev google group and found some hints to
> update BLAST to 2.7, rmblast to 2.6 and reconfigure RepeatMasker which I
> did but it still fails with the error. Any help appreciated.
>
> deleted:0 hits
> collecting blastx repeatmasking
> processing all repeats
> in cluster::shadow_cluster...
> Died at /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
> --> rank=1, hostname=bnbcompute15.blacknblue.cshl.edu
> ERROR: Failed while processing all repeats
> ERROR: Chunk failed at level:3, tier_type:1
> FAILED CONTIG:Chr06
>
> ERROR: Chunk failed at level:2, tier_type:0
> FAILED CONTIG:Chr06
>
>
> Thanks
>
> Kapeel
> --
>
> Kapeel Chougule
> Computational Scientist Developer II
> One Bungtown Road Cold Spring Harbor, NY 11724
> http://www.warelab.org/
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

--
Kapeel Chougule
Computational Scientist Developer II
One Bungtown Road Cold Spring Harbor, NY 11724
http://www.warelab.org/

From vsoza at uw.edu Thu May 24 17:19:33 2018
From: vsoza at uw.edu (Valerie Soza)
Date: Thu, 24 May 2018 16:19:33 -0700
Subject: [maker-devel] databases supported with ipr_update_gff script
Message-ID: <814B3327-AA11-4BB8-B15B-A8A3C03FC950@uw.edu>

Hi Maker community

I am using the accessory scripts provided with Maker 2.3.19 to do some functional annotations of genes predicted with the Maker pipeline. For integrating information from InterProScan, I want to use the ipr_update_gff script.
When I looked at the script, I found the following lines:

my %db_map = (BlastProDom => 'ProDom',
              FPrintScan  => 'PRINTS',
              Gene3D      => 'Gene3D',
              HMMPanther  => 'PANTHER',
              HMMPfam     => 'Pfam',
              HMMPIR      => 'PIR',
              HMMSmart    => 'SMART',
              HMMTigr     => 'JCVI_TIGRFAMS',
              PatternScan => 'Prosite',
              ProfileScan => 'Prosite',
             );

Does this indicate that these are the only databases that the script will extract information for from an InterProScan report?

I wanted to use all databases currently available from InterProScan 5, InterProScan version 5.28-67.0, see below, but am wondering whether the Maker script will recognize results from all of the following databases?

- CDD
- COILS
- Gene3D
- HAMAP
- MOBIDB
- PANTHER
- Pfam
- PIRSF
- PRINTS
- ProDom
- PROSITE (Profiles and Patterns)
- SFLD
- SMART (unlicensed components only by default - this analysis has simplified post-processing that includes an E-value filter, however you should not expect it to give the same match output as the fully licensed version of SMART)
- SUPERFAMILY
- TIGRFAMs

Thanks.

-Valerie

Valerie Soza, Ph.D.
c/o Hall Lab
Department of Biology
University of Washington
Johnson Hall 202A
Box 351800
Seattle, WA 98195-1800
206-543-6740
http://staff.washington.edu/vsoza/

From carsonhh at gmail.com Fri May 25 12:19:18 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 25 May 2018 12:19:18 -0600
Subject: [maker-devel] databases supported with ipr_update_gff script
In-Reply-To: <814B3327-AA11-4BB8-B15B-A8A3C03FC950@uw.edu>
References: <814B3327-AA11-4BB8-B15B-A8A3C03FC950@uw.edu>
Message-ID: <822CD0A7-4C7E-4AA8-9B37-85836CD02196@gmail.com>

Those are just for name conversion (the script takes what is in the report and renames it to a known db_xref term). If there is no conversion, the name will stay the same as in the report (unaltered). Different databases have their own db_xref values. I can't remember where the ones we are using came from (I think it was from GMOD's Chado database).
NCBI also has its own --> https://www.ncbi.nlm.nih.gov/genbank/collab/db_xref/ , UniProt --> https://www.uniprot.org/docs/dbxref , and you can search around for others as well.

--Carson

> On May 24, 2018, at 5:19 PM, Valerie Soza wrote:
>
> Hi Maker community
>
> I am using the accessory scripts provided with Maker 2.3.19 to do some functional annotations of genes predicted with the Maker pipeline. For integrating information from InterProScan, I want to use the ipr_update_gff script. When I looked at the script, I found the following lines:
>
> my %db_map = (BlastProDom => 'ProDom',
>               FPrintScan  => 'PRINTS',
>               Gene3D      => 'Gene3D',
>               HMMPanther  => 'PANTHER',
>               HMMPfam     => 'Pfam',
>               HMMPIR      => 'PIR',
>               HMMSmart    => 'SMART',
>               HMMTigr     => 'JCVI_TIGRFAMS',
>               PatternScan => 'Prosite',
>               ProfileScan => 'Prosite',
>              );
>
> Does this indicate that these are the only databases that the script will extract information for from an InterProScan report?
>
> I wanted to use all databases currently available from InterProScan 5, InterProScan version 5.28-67.0, see below, but am wondering whether the Maker script will recognize results from all of the following databases?
>
> - CDD
> - COILS
> - Gene3D
> - HAMAP
> - MOBIDB
> - PANTHER
> - Pfam
> - PIRSF
> - PRINTS
> - ProDom
> - PROSITE (Profiles and Patterns)
> - SFLD
> - SMART (unlicensed components only by default - this analysis has simplified post-processing that includes an E-value filter, however you should not expect it to give the same match output as the fully licensed version of SMART)
> - SUPERFAMILY
> - TIGRFAMs
>
> Thanks.
>
> -Valerie
>
> Valerie Soza, Ph.D.
> c/o Hall Lab
> Department of Biology
> University of Washington
> Johnson Hall 202A
> Box 351800
> Seattle, WA 98195-1800
> 206-543-6740
> http://staff.washington.edu/vsoza/
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From timo.metz at googlemail.com Tue May 8 06:11:05 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Tue, 08 May 2018 12:11:05 -0000
Subject: [maker-devel] large UTR overhang
Message-ID:

Hey guys,

Attached there is a picture of a recent MAKER run where I used pacbio reads and trinity assembled short reads plus proteins from Swissprot. I don't really get why MAKER assigns such a large fraction to be a UTR. From the name I can tell that the final gene model stems from a snap ab-initio prediction but even that gene prediction is not 100% identical to the part of the gene model which is not UTR.

Is there any setting in MAKER in which I can somehow have an influence on how MAKER assigns such UTRs? I already tried out the "correct_est_fusion" option but it did not help.

best

Timo

-------------- next part --------------
A non-text attachment was scrubbed...
Name: UTRoverhang.png
Type: image/png
Size: 81905 bytes
Desc: not available
URL:

From timo.metz at googlemail.com Tue May 22 05:07:46 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Tue, 22 May 2018 13:07:46 +0200
Subject: [maker-devel] MAKER beta not inferring gene models from protein evidence
Message-ID:

Hey guys,

I have installed Maker v3 beta in order to use the built-in Evidence-Modeler which is not part of v2.31. Now I could see that, even if using the same evidence, the BUSCO completeness of the transcriptome drops when using the v3 beta compared to the v2.31.
I could identify that the reason for this is that MAKER v3 now does not infer gene models from protein evidence if there is no additional support from RNA-seq/ESTs. In v2.31, on the contrary, it did.

Is there any option in v3 beta to also get gene models only from protein evidence or is this something that v3 beta is not able to do anymore?

best

Timo

From carsonhh at gmail.com Tue May 29 10:07:20 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 May 2018 10:07:20 -0600
Subject: [maker-devel] large UTR overhang
In-Reply-To:
References:
Message-ID: <3CE6DD4B-B176-4051-98B7-D47F33987E10@gmail.com>

MAKER 3 does not have any additional requirement for transcript support that MAKER 2 does not have. However, if you are using the correct_est_fusion=1 option, it will only use the polished protein evidence rather than the unpolished blastx alignments, which is probably what you are seeing.

The model you show also likely corresponds to either a paralogous duplication or a broken ORF due to assembly error. You can see clearly that both SNAP and Augustus want to break the region into two separate models (they can't find a single workable ORF). The raw BLASTX alignments and transcription data want to merge the region (I don't see any support for merging from polished protein2genome alignments though - maybe you just cut that off in the image?). So when the predictors are fed hints suggesting the longer model, they build the best model they can, but the ORF is broken, so the remaining exons will match the transcript evidence exactly but have to be UTR given the broken ORF.

This means you are either merging things that shouldn't be merged (based on bad evidence alignments) or the assembly has an error that keeps the ORF from functioning in that region as it should. The overall structure is still captured, but the translation is truncated.
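
As a rough illustration of the broken-ORF case described above (not MAKER's actual code), the following Python sketch reads a toy transcript in frame and treats everything after the first in-frame stop codon as 3' UTR. The sequences and the function name split_cds_and_utr are made up for the example.

```python
# Illustrative only: toy sequences and a hypothetical helper, not MAKER's logic.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_cds_and_utr(transcript, cds_start=0):
    """Read codons in frame from cds_start until the first stop codon.

    Returns (cds, three_prime_utr). Sequence after the first in-frame
    stop codon cannot be translated, so it is reported as 3' UTR.
    """
    for i in range(cds_start, len(transcript) - 2, 3):
        codon = transcript[i:i + 3]
        if codon in STOP_CODONS:
            end = i + 3  # keep the stop codon as part of the CDS
            return transcript[cds_start:end], transcript[end:]
    # No in-frame stop codon found: the ORF runs to the end of the transcript.
    return transcript[cds_start:], ""


# Intact ORF: ATG GCT TGT GGT GCT TAA, stop codon at the very end.
intact = "ATGGCTTGTGGTGCTTAA"
# Same transcript with one base changed (TGT -> TGA), e.g. an assembly error.
broken = "ATGGCTTGAGGTGCTTAA"

print(split_cds_and_utr(intact))
print(split_cds_and_utr(broken))
```

With the intact sequence the whole transcript is coding; with the single-base change the coding part stops after the third codon and the rest of the transcript, although still supported by the same evidence, can only be annotated as UTR.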

Here is a secondary tool you can try called DeFusion that may help if you are getting false merges because of the evidence --> https://wjidea.github.io/defusion/

--Carson

> On May 8, 2018, at 6:10 AM, Timo Metz wrote:
>
> Hey guys,
>
> Attached there is a picture of a recent MAKER run where I used pacbio reads and trinity assembled short reads plus proteins from Swissprot. I don't really get why MAKER assigns such a large fraction to be a UTR. From the name I can tell that the final gene model stems from a snap ab-initio prediction but even that gene prediction is not 100% identical to the part of the gene model which is not UTR.
>
> Is there any setting in MAKER in which I can somehow have an influence on how MAKER assigns such UTRs? I already tried out the "correct_est_fusion" option but it did not help.
>
> best
>
> Timo
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From ksenia.lavrichenko at gmail.com Thu May 31 08:30:12 2018
From: ksenia.lavrichenko at gmail.com (Ksenia Lavrichenko)
Date: Thu, 31 May 2018 16:30:12 +0200
Subject: [maker-devel] Building MAKER with specific perl version
Message-ID:

Hi,

I have been banging my head for a while now, trying to install MAKER with my specific perl. I found this old thread: https://groups.google.com/forum/#!msg/maker-devel/hScqdJW0FsU/3KT_UF7k9XMJ

However, this does not work for me. I make sure bin/* and Build are deleted before I run $myperl Build.PL. I see my perl in the shebang of Build; however, after ./Build install all scripts in bin have "#! /usr/bin/perl", which produces a version error when I try to run maker -h.

Any tips on what I need to adjust in Build.PL?

Many thanks,
Ksenia
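
One way to see which installed scripts still point at the system interpreter is to list the first line of every file under bin/. The Python sketch below is only a diagnostic along those lines; report_shebangs is a hypothetical helper, not part of MAKER or its Build system, and the directory you pass in is whatever bin directory your install produced.

```python
import os


def report_shebangs(bin_dir):
    """Return {filename: first line} for regular files in bin_dir that start with '#!'."""
    shebangs = {}
    for name in sorted(os.listdir(bin_dir)):
        path = os.path.join(bin_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-regular entries
        with open(path, "rb") as handle:
            first = handle.readline().decode("utf-8", errors="replace").rstrip()
        if first.startswith("#!"):
            shebangs[name] = first
    return shebangs


def print_shebangs(bin_dir):
    """Print one 'script<TAB>shebang' line per script found in bin_dir."""
    for name, first in report_shebangs(bin_dir).items():
        print(f"{name}\t{first}")
```

Calling print_shebangs on the installed bin directory gives a quick overview of which scripts kept "#! /usr/bin/perl" and which picked up the interpreter used to run Build.PL.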