From carsonhh at gmail.com Tue Jun 8 14:04:29 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 8 Jun 2021 14:04:29 -0600
Subject: [maker-devel] maker-devel post from brian.mack@usda.gov requires approval
In-Reply-To:
References:
Message-ID: <712258C3-EFFD-4216-B65F-A50564E7B3EE@gmail.com>

No requirement to train on a masked genome. If I'm not mistaken, GeneMark splits up the genome on long stretches of N, so it may make training faster, but it may also split contigs on some introns.

--Carson

> to approve or deny the request.
>
> From: "Mack, Brian - ARS"
> Subject: genemark-es training on masked genome
> Date: May 17, 2021 at 9:47:57 AM MDT
> To: "maker-devel at yandell-lab.org"
>
> Hi,
>
> I'm using MAKER for a fungal genome and I was wondering if it is recommended to train genemark-es on a masked genome? If so, would running just the repeat masking step with MAKER and using the masked files from the "Void" directories be appropriate (cat *output/*datastore/*/*/*/*Void*/query.masked.fasta > masked_genome.fasta)?
>
> Thanks,
> Brian
>
> This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
>
> From: maker-devel-request at yandell-lab.org
> Subject: confirm 9305bcd5964821ff5fff547bdb1b0466c9c50b58
> Date: May 17, 2021 at 9:48:09 AM MDT
>
> If you reply to this message, keeping the Subject: header intact,
> Mailman will discard the held message. Do this if the message is
> spam. If you reply to this message and include an Approved: header
> with the list password in it, the message will be approved for posting
> to the list. The Approved: header can also appear in the first line
> of the body of the reply.

From carsonhh at gmail.com Tue Jun 8 14:09:43 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 8 Jun 2021 14:09:43 -0600
Subject: [maker-devel] Improving BUSCO stats
In-Reply-To:
References:
Message-ID:

It may be insufficient evidence. You can scan the rejected Augustus/Snap models for known protein domains using InterProScan and add them back to the final set (model rescue). Info in this paper: https://www.yandell-lab.org/publications/pdf/maker_current_protocols.pdf (see Basic Protocol 5).

If model rescue does not improve it, then you may have genes split across short contigs. In that case there is not enough sequence for the gene predictors to call a model, but there is enough to generate a BUSCO match. If that's the case, you would have to improve the assembly to recover the models.

--Carson

> On May 16, 2021, at 2:53 PM, Kyungyong Seong wrote:
>
> Hi
>
> The BUSCO statistics obtained from my genome seem to be decent, with 97.3% completeness (-m geno). I am having problems generating genome annotation sets that show comparable BUSCO completeness (-m prot). Currently, completeness is around 88%, and iterative MAKER annotation is not significantly increasing this value.
>
> I started with prot2genome and cdn2genome alignments. I then trained AUGUSTUS with BUSCO gene sets and SNAP with the predicted gene models with good quality, and ran MAKER without prot/cdn2genome. The third run was with newly trained AUGUSTUS and SNAP, which only increased the BUSCO completeness by 2%. I imagined that single copy orthologs would be well supported by evidence and may be relatively easy to predict as well. I wasn't quite sure what is happening. Would you have any advice?
>
> Thank you!
> Kyungyong
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

From aylward.megan at gmail.com Tue Jun 15 03:32:01 2021
From: aylward.megan at gmail.com (Megan Aylward)
Date: Tue, 15 Jun 2021 09:32:01 -0000
Subject: [maker-devel] Error using ab intio gene predictor with gffs
Message-ID:

Hi,

After running one round of Maker, I am trying to run a second round using the outputs from the first as inputs for the different evidence for EST, proteins, and repeats in gff format. I am receiving the following error:

Can't locate object method "add_entry" via package "1" (perhaps you forgot to load "1"?) at maker-3.01.03/bin/../lib/Widget/snap.pm line 540.
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4

There are not any other errors prior to this one. As I understand it, it may be an issue with one of the feature files. Could this be the issue, and if so, do you have any suggestions of how to detect what is causing the error?

Many thanks,
Megan

From vireofeathers at gmail.com Sun Jun 27 19:55:00 2021
From: vireofeathers at gmail.com (Sarah Baker)
Date: Sun, 27 Jun 2021 18:55:00 -0700
Subject: [maker-devel] Extract CDS sequences without UTRs
Message-ID:

Hello,

I have annotated several genomes using the Maker2 pipeline with the goal of estimating dN/dS ratios for many genes.
I have been using the fasta_merge script to extract the coding sequences, but I just noticed that the nucleotide sequences that it outputs (in *.all.maker.transcripts.fasta) sometimes include the 5' and 3' UTRs and they are not always in the correct reading frame. Is there a way to output CDS sequences in-frame and without UTRs (e.g. so that the contents of *.all.maker.transcripts.fasta could be directly translated to the *.all.maker.proteins.fasta output by fasta_merge)?

Thank you for this great program!
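No reply to the CDS question appears in this part of the archive. One possible approach, offered as a sketch and not as a MAKER feature, is to rebuild each coding sequence from the CDS rows of the annotation GFF3 and the genome sequence. The function names extract_cds and reverse_complement, and the plain dict used for the genome, are hypothetical choices for illustration. The sketch assumes standard GFF3 columns, with CDS rows pointing at their transcript through the Parent attribute and column 8 holding the phase.

```python
from collections import defaultdict

# Translation table for complementing DNA; other symbols pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_cds(gff_lines, genome):
    """Build {transcript_id: CDS sequence} from GFF3 lines.

    gff_lines: iterable of GFF3 text lines (assumed tab-separated, 9 columns).
    genome:    dict mapping sequence ID -> sequence string (hypothetical input
               format; load it with any FASTA reader you trust).
    """
    segments = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Rows that are not 9-column CDS features (including any embedded
        # FASTA section) are ignored, so UTR and exon rows never contribute.
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        strand, phase = cols[6], cols[7]
        attrs = dict(
            kv.split("=", 1) for kv in cols[8].strip(";").split(";") if "=" in kv
        )
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                segments[parent].append((seqid, start, end, strand, phase))

    cds = {}
    for transcript_id, segs in segments.items():
        strand = segs[0][3]
        # Order segments 5'->3' along the transcript.
        segs.sort(key=lambda s: s[1], reverse=(strand == "-"))
        parts = []
        for seqid, start, end, _, _ in segs:
            piece = genome[seqid][start - 1:end]  # GFF3 is 1-based, inclusive
            parts.append(reverse_complement(piece) if strand == "-" else piece)
        seq = "".join(parts)
        # GFF3 phase = bases to skip at the start of the segment to reach the
        # next codon; trimming it from the first segment puts the CDS in frame.
        first_phase = segs[0][4]
        offset = int(first_phase) if first_phase.isdigit() else 0
        cds[transcript_id] = seq[offset:]
    return cds
```

Called as extract_cds(open("annotations.gff"), genome), where the file name is a placeholder, this returns one sequence per transcript ID. Partial models whose first CDS segment has a non-zero phase come back trimmed, so it is worth checking that each translation matches the corresponding entry in *.all.maker.proteins.fasta before computing dN/dS.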