From eennadi at gmail.com Thu Nov 2 14:51:00 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Thu, 2 Nov 2017 20:51:00 +0100
Subject: [maker-devel] Error trying to submit genome to ncbi
Message-ID:

Hi,

I am trying to submit a genome that I annotated using MAKER, and NCBI sent back these errors:

1. Please remove any N nucleotides from the beginning or end of the sequence.

2. No feature should begin or end inside a gap. Instead the feature should be made partial at the gap boundary.

[3] Coding regions should not be 5' partial if they begin with the start methionine. If this is an internal methionine in the translation then it is fine if they are partial. Conversely, all coding regions must have a stop codon or be 3' partial.

You have a large number of gene features that are not associated with other features. Please include on these features in the gene description field some description of what the gene would have encoded.

A feature table example of this is:

<41156  >40652  gene
                gene_desc  transposon
                locus_tag  CR513_45338
                note       nonfunctional due to frameshift

Please, how can I use MAKER to solve this problem?

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From dandence at gmail.com Thu Nov 2 15:08:54 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 16:08:54 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To:
References:
Message-ID:

Hi, I think you've posted before about issues 1 and 2 from the NCBI. The note for issue 3 from NCBI sounds like there are gene features that don't have associated transcript, CDS or exon features. I'm not certain how that could be a result from MAKER.
It might be something that someone else created (manually or with another tool), and then passed to MAKER from a GFF file. In the example included in your email, it looks like these offending genes are transposons that have been annotated as genes. If that is the case for the rest of the offending genes, then I would suggest changing the "type" field (column 3) from "gene" to something else, like "transposable_element" perhaps.

~Daniel
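Daniel's suggestion can be sketched as a short script. This is only a sketch under stated assumptions, not part of MAKER: the function names (`childless_genes`, `retype`) are hypothetical, the GFF3 is assumed to use standard `ID=` and `Parent=` attributes in column 9, and whether NCBI accepts `transposable_element` as a feature type for a given submission is not something this thread confirms.

```python
def parse_attrs(col9):
    """Turn a GFF3 attribute column ("ID=x;Parent=y") into a dict."""
    return dict(f.split("=", 1) for f in col9.split(";") if "=" in f)


def childless_genes(gff3_lines):
    """Return IDs of 'gene' features that no other feature lists as Parent."""
    gene_ids, parents = [], set()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attrs(cols[8])
        if cols[2] == "gene" and "ID" in attrs:
            gene_ids.append(attrs["ID"])
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                parents.add(parent)
    return [g for g in gene_ids if g not in parents]


def retype(gff3_lines, ids, new_type="transposable_element"):
    """Yield lines, changing column 3 of the selected gene features only."""
    for line in gff3_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and not line.startswith("#") and cols[2] == "gene":
            if parse_attrs(cols[8]).get("ID") in ids:
                cols[2] = new_type
                line = "\t".join(cols) + "\n"
        yield line
```

A possible use would be to read the GFF3 into a list of lines, call `childless_genes` on it, inspect the returned IDs by hand, and only then write out the result of `retype` for the IDs that really are transposons.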
From dandence at gmail.com Thu Nov 2 15:24:31 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 16:24:31 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To:
References:
Message-ID: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>

Hi, Thank you for sending me your data, but which ones are the offending genes that NCBI is complaining about? Can you identify the problem that NCBI is giving in some subset of the gene features?

~Daniel

> On Nov 2, 2017, at 4:20 PM, Emmanuel Nnadi wrote:
>
> Hi Daniel, thanks for your reply.
>
> I have attached my .tbl file.
>
> You will see that
>
> <77753  >77549  gene
>                 locus_tag  CR513_00193
>                 gene       AtMg00820
>                 note       nonfunctional due to frameshift
>
> is another example.
>
> It's becoming frustrating.
>
> I have not posted the two errors before:
>
> [1] Please remove any N nucleotides from the beginning or end of the sequence.
>
> [2] No feature should begin or end inside a gap. Instead the feature should be made partial at the gap boundary.
>
> [3] Coding regions should not be 5' partial if they begin with the start methionine. If this is an internal methionine in the translation then it is fine if they are partial. Conversely, all coding regions must have a stop codon or be 3' partial.
From dandence at gmail.com Thu Nov 2 15:46:03 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 16:46:03 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To:
References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>
Message-ID:

These gene features with the "nonfunctional due to frameshift" note indeed do not have other features associated with them in the tbl files. Is this reflected in the gff3 files for these annotations that MAKER produced? I'm not certain how MAKER would make a gene without a CDS or mRNA, but identifying those discrepancies would be a place to start understanding what has happened.

> On Nov 2, 2017, at 4:30 PM, Emmanuel Nnadi wrote:
>
> Hi Daniel,
>
> This is the mail they sent to me:
>
> [1] Please remove any N nucleotides from the beginning or end of the sequence.
>
> [2] No feature should begin or end inside a gap. Instead the feature should be made partial at the gap boundary.
>
> [3] Coding regions should not be 5' partial if they begin with the start methionine. If this is an internal methionine in the translation then it is fine if they are partial. Conversely, all coding regions must have a stop codon or be 3' partial.
>
> [4] You have a large number of gene features that are not associated with other features. Please include on these features in the gene description field some description of what the gene would have encoded.
>
> A feature table example of this is:
>
> <41156  >40652  gene
>                 gene_desc  transposon
>                 locus_tag  CR513_45338
>                 note       nonfunctional due to frameshift
>
> [5] Every coding region must have a corresponding mRNA and in every case the mRNA product name must match exactly that of the CDS feature.
>
> 2 coding regions do not have an mRNA
> ORIG/combined_1-5000.sqn:CDS cytochrome c oxidase subunit 2 (contig_100:<38458-39198, 40429->40623) CR513_00692
> ORIG/combined_1-5000.sqn:CDS cytochrome c oxidase subunit 1 (contig_100:c>113064-111485, c111245-111221) CR513_00691
>
> So I just went to the .tbl file and searched for "nonfunctional due to frameshift". There are quite a lot of them, and I have two more .tbl files.
>
> I used GAG annotation to remove the N's and to add start and stop codons, but NCBI still complained.
>
> I have run out of ideas.
>
> Please help me.
>
> Nnadi Nnaemeka Emmanuel
From carsonhh at gmail.com Thu Nov 2 15:48:40 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 2 Nov 2017 14:48:40 -0600
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To:
References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>
Message-ID: <56DF0ADA-40DA-4C88-AD37-BF63D8BCFD22@gmail.com>

If you modified the fasta files to remove N's etc. after they were annotated, then that would generate a mismatch between the GFF3 coordinates and the fasta sequence. Have you modified or split contigs in the assembly in any way? I seem to remember you posting an issue about the fasta submission to NCBI previously.

--Carson
From dandence at gmail.com Thu Nov 2 16:07:01 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 17:07:01 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To:
References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>
Message-ID:

Hi Emmanuel, I recommend looking into what Carson suggested. If you edited the fasta files to remove the "NNN" characters from the transcripts or reference genome and then resubmitted without changing the gff3 coordinates, then that would result in these kinds of errors.

~Daniel

> On Nov 2, 2017, at 5:02 PM, Emmanuel Nnadi wrote:
>
> muc_functional.blast.gff

From dandence at gmail.com Thu Nov 2 16:56:24 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 17:56:24 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To:
References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>
Message-ID: <20FE86D2-2431-4CD8-B4E1-E700F723760C@gmail.com>

Hi Emmanuel, Please "reply all" in these exchanges so that they'll stay stored on the maker-devel list for others to find in the future.
It also helps keep the conversation open so that others can chime in and help out too. :)

I looked at several of the "nonfunctional due to frameshift" genes and they have associated features in the gff3 file. So there might be a frameshift issue in the original annotations, but I'd doubt that, or a frameshift error might be getting introduced when you convert to the tbl format.

> On Nov 2, 2017, at 5:12 PM, Emmanuel Nnadi wrote:
>
> Hi Daniel,
>
> NCBI first complained of this even when I hadn't used GAG annotation to remove the N's.
>
> They complained about this on my raw file.
>
> Nnadi Nnaemeka Emmanuel

From o.k.torresen at ibv.uio.no Thu Nov 9 03:44:06 2017
From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen)
Date: Thu, 9 Nov 2017 09:44:06 +0000
Subject: [maker-devel] substr outside of string in PhatHits_utils.pm
Message-ID:

Dear all,

I'm having an issue with MAKER which I'm unable to wrap my head around. Hopefully the issue is easily identifiable and resolvable for someone with more insight than me. Please find the log output attached below.
I cannot find any more information than this in any logs. Many scaffolds do complete fine, but some of the longest ones have issues.

Thank you.

Sincerely,
Ole K. Tørresen

Error message:

#--------- command -------------#
Widget::augustus:
/projects/cees/bin/augustus/augustus-3.2.3/bin/augustus --strand=backward --species=gadMor2_code_braker2 --UTR=off --hintsfile=/tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.xdef.augustus --extrinsicCfgFile=/projects/cees/bin/augustus/augustus-3.2.3/config/extrinsic/extrinsic.MPE.cfg --AUGUSTUS_CONFIG_PATH=/projects/cees/bin/augustus/augustus-3.2.3/config /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus.fasta > /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus
#-------------------------------#
deleted:0 genes
begin called get_best_alt_splices1
...processing 0 of 2
...processing 1 of 2
end called get_best_alt_splices1
...processing 0 of 20
...processing 1 of 20
...processing 2 of 20
...processing 3 of 20
...processing 4 of 20
...processing 5 of 20
...processing 6 of 20
...processing 7 of 20
...processing 8 of 20
...processing 9 of 20
...processing 10 of 20
...processing 11 of 20
...processing 12 of 20
...processing 13 of 20
...processing 14 of 20
...processing 15 of 20
...processing 16 of 20
...processing 17 of 20
...processing 18 of 20
...processing 19 of 20
substr outside of string at /projects/cees/bin/maker/maker-3.1.1/bin/../lib/PhatHit_utils.pm line 850.
--> rank=NA, hostname=compute-31-18.local
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:GmG20150304_scaffold_8692

ERROR: Chunk failed at level:6, tier_type:0
FAILED CONTIG:GmG20150304_scaffold_8692

examining contents of the fasta file and run log

From lcampbell at ebi.ac.uk Thu Nov 9 05:13:35 2017
From: lcampbell at ebi.ac.uk (Lahcen Campbell)
Date: Thu, 9 Nov 2017 11:13:35 +0000
Subject: [maker-devel] Model training with AED=0.7 made all contigs FAILED
Message-ID:

Hi folks,

I would like some insight into a recent round of MAKER annotation that I performed, which returned 0 finished contigs.

The genome is a whitefly, which I successfully ran MAKER on initially with the first round of "Evidence in", i.e. passing in EST evidence as aligned transcript gffs, protein homology evidence etc. The run was successful and produced a lot of good quality gene models.

Statistics: 24,613 genes with 49,547 transcripts containing 141,130 CDS.

Now, I know this count is very high for our species, so in the 2nd round (which finished overnight because all contigs failed) I attempted to increase the threshold for support by reducing AED to 0.7 from an initial 1. Prior to starting the second round I had trained SNAP on the first round results and also ran Augustus separately, and passed these in via the snaphmm and pred_gff options. Finally I set min protein to be no less than 100 aa and set est2genome and prot2genome off to allow for gene model refinement.

I checked the run today and all ~8,000 contigs/scaffolds returned as FAILED, each having been retried once.

My initial fear was that I had just lost my initial set of 24,613 gene models. I now believe that this won't be the case, but I'm not sure...

Can anyone explain what might have happened here and what consequences will follow given they all returned as failed? Have they been deleted from the MAKER data store?
I had captured all 1st round MAKER output files (GFF, Fasta files etc) before attempting this 2nd round of MAKER (i.e. the 1st round of model training). If I have irrevocably changed the datastore for MAKER and lost those genes, might I be able to restore to an earlier point (say back to the first round of evidence-in gene models) by passing the first MAKER gff in as "maker_gff=" / "pred_pass=1" / "model_pass=1"?

Any advice on this would be much appreciated.

Lahcen

From lahcencampbell at gmail.com Thu Nov 9 08:53:19 2017
From: lahcencampbell at gmail.com (lahcen campbell)
Date: Thu, 9 Nov 2017 14:53:19 +0000
Subject: [maker-devel] Model training with AED=0.7 made all contigs FAILED
Message-ID:

Apologies, this message was sent earlier today from an incorrect email address so it was flagged for verification.

Hi folks,

I would like some insight into a recent round of MAKER annotation that I performed, which returned 0 finished contigs.

The genome is a whitefly, which I successfully ran MAKER on initially with the first round of "Evidence in", i.e. passing in EST evidence as aligned transcript gffs, protein homology evidence etc. The run was successful and produced a lot of good quality gene models.

Statistics: 24,613 genes with 49,547 transcripts containing 141,130 CDS.

Now, I know this count is very high for our species, so in the 2nd round (which finished overnight because all contigs failed) I attempted to increase the threshold for support by reducing AED to 0.7 from an initial 1. Prior to starting the second round I had trained SNAP on the first round results and also ran Augustus separately, and passed these in via the snaphmm and pred_gff options. Finally I set min protein to be no less than 100 aa and set est2genome and prot2genome off to allow for gene model refinement.

I checked the run today and all ~8,000 contigs/scaffolds returned as FAILED, each having been retried once.
(Note: I reran, this time reverting the AED to 1, yet the same outcome happened again.)

The following error appears throughout the log file:

MAKER WARNING: The file MAKER.contigs_datastore/BF/41/tig00000234//theVoid.tig00000234/0/tig00000234.0.all.rb.out
did not finish on the last run and must be erased

My initial fear was that I had just lost my initial set of 24,613 gene models. I now believe that this won't be the case, but I'm not sure...

Can anyone explain what might have happened here and what consequences will follow given they all returned as failed? Have they been deleted from the MAKER data store? Are they retrievable?

I had captured all 1st round MAKER output files (GFF, Fasta files etc) before attempting this 2nd round of MAKER (i.e. the 1st round of model training). If I have irrevocably changed the datastore for MAKER and lost those genes, might I be able to restore to an earlier point (say back to the first round of evidence-in gene models) by passing the first MAKER gff in as "maker_gff=" / "pred_pass=1" / "model_pass=1"?

As it stands, maker2zff and fasta_merge / gff3_merge all return nothing or empty output files. So clearly my gene models have been altered somehow.

Any advice on this would be much appreciated.

Lahcen

--
Dr. Lahcen Campbell
Contact: lahcencampbell at gmail.com
https://www.ebi.ac.uk/about/people/lahcen-campbell

From carsonhh at gmail.com Thu Nov 9 10:28:19 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Nov 2017 09:28:19 -0700
Subject: [maker-devel] substr outside of string in PhatHits_utils.pm
In-Reply-To:
References:
Message-ID: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com>

My first guess is that if you are using gff3 files as input to anything, then there may be an issue with your GFF3 file.
My second suggestion is to try MAKER 3.02.02 to see if it has the same issue.

--Carson

> On Nov 9, 2017, at 2:44 AM, Ole Kristian Tørresen wrote:
>
> Dear all,
> I'm having an issue with MAKER which I'm unable to wrap my head around. Hopefully the issue is easily identifiable and resolvable for someone with more insight than me. Please find the log output attached below. I cannot find any more information than this in any logs. Many scaffolds do complete fine, but some of the longest ones have issues.
>
> Thank you.
>
> Sincerely,
> Ole K. Tørresen
>
> Error message:
>
> #--------- command -------------#
> Widget::augustus:
> /projects/cees/bin/augustus/augustus-3.2.3/bin/augustus --strand=backward --species=gadMor2_code_braker2 --UTR=off --hintsfile=/tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.xdef.augustus --extrinsicCfgFile=/projects/cees/bin/augustus/augustus-3.2.3/config/extrinsic/extrinsic.MPE.cfg --AUGUSTUS_CONFIG_PATH=/projects/cees/bin/augustus/augustus-3.2.3/config /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus.fasta > /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus
> #-------------------------------#
> deleted:0 genes
> begin called get_best_alt_splices1
> ...processing 0 of 2
> ...processing 1 of 2
> end called get_best_alt_splices1
> ...processing 0 of 20
> ...processing 1 of 20
> ...processing 2 of 20
> ...processing 3 of 20
> ...processing 4 of 20
> ...processing 5 of 20
> ...processing 6 of 20
> ...processing 7 of 20
> ...processing 8 of 20
> ...processing 9 of 20
> ...processing 10 of 20
> ...processing 11 of 20
> ...processing 12 of 20
> ...processing 13 of 20
> ...processing 14 of 20
> ...processing 15 of 20
> ...processing 16 of 20
> ...processing 17 of 20
> ...processing 18 of 20
> ...processing 19 of 20
> substr outside of string at /projects/cees/bin/maker/maker-3.1.1/bin/../lib/PhatHit_utils.pm line 850.
> --> rank=NA, hostname=compute-31-18.local
> ERROR: Failed while annotating transcripts
> ERROR: Chunk failed at level:1, tier_type:4
> FAILED CONTIG:GmG20150304_scaffold_8692
>
> ERROR: Chunk failed at level:6, tier_type:0
> FAILED CONTIG:GmG20150304_scaffold_8692
>
> examining contents of the fasta file and run log
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com  Thu Nov  9 17:30:50 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Nov 2017 16:30:50 -0700
Subject: [maker-devel] Model training with AED=0.7 made all contigs FAILED
In-Reply-To: 
References: 
Message-ID: 

There is probably an issue with the GFF3 file being passed in (I'm guessing the Augustus one). I would avoid passing in Augustus results as GFF3, because it removes the ability of MAKER to dynamically provide Augustus with hints as it runs. You are essentially handicapping the pipeline.

If your first genes were est2genome or protein2genome based, I would not pass them back in. Those models are suitable for training but will really reduce the accuracy of downstream final annotations (that is why we tell people in the MAKER documentation to turn off est2genome/protein2genome after training a gene predictor).

Also, if your inputs to the first round were GFF3 files, they will have to be reread regardless. Any protein or transcript data that was aligned by MAKER will still have the BLAST results archived, so you don't need to worry about that unless you alter repeat masking options (which would cause it to rerun).

Also, if you are changing GFF3 file input between runs but using the same directory, you might want to delete any ".db" files in the output folder. Those hold an SQLite database of the GFF3 input that may be corrupted if the run failed while attempting to update the database content with the Augustus GFF3 file.
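The cleanup step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not part of MAKER: the helper names and the folder name in the usage comment are hypothetical, and it assumes the ".db" files sit directly inside the MAKER output folder. By default it only reports what it would delete.

```python
from pathlib import Path


def find_db_files(output_dir):
    """Return the .db files located directly inside output_dir, sorted by name."""
    return sorted(
        p for p in Path(output_dir).iterdir()
        if p.is_file() and p.suffix == ".db"
    )


def remove_db_files(output_dir, dry_run=True):
    """List (and, if dry_run is False, delete) the .db files in output_dir.

    Returns the names of the files that were, or would be, removed.
    """
    names = []
    for path in find_db_files(output_dir):
        if not dry_run:
            path.unlink()  # delete the possibly corrupted SQLite file
        names.append(path.name)
    return names


# Hypothetical usage; replace the folder name with your own output folder:
#   print(remove_db_files("my_genome.maker.output"))            # report only
#   remove_db_files("my_genome.maker.output", dry_run=False)    # really delete
```

Running it once with the default dry_run=True and checking the reported names before deleting anything seems the safer order, especially if the output folder has not been backed up.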
--Carson

> On Nov 9, 2017, at 4:13 AM, Lahcen Campbell wrote:
>
> Hi folks,
>
> I would just like some insight into a recent round of MAKER annotation I performed and returned back 0 Finished contigs.
> The genome is a white fly, which I successfully ran MAKER on initally with the first round of "Evidence in", so passing in EST evidence as aligned transcript gffs, protein homology evidence etc. The run was successful and produced a lot of good quality gene models
>
> Statistics:
> 24,613 genes with 49,547 transcripts containing 141130 cds.
>
> Now, I know this count is very high for our species, so in the 2nd round (completed running over 1 night due to all contigs failing) I attempted to increase the threshold for support, by reducing AED to 0.7 from an initial 1. Prior to starting the second round I had trained SNAP on the first round results and also ran Augustus separately and passed this via the snaphmm, pred_gff option. Finally I set min protein to be no less than 100Aa and set est2genome and prot2genome off to allow for gene model refinement.
>
> I checked the run today and all ~8,000 contigs/scaffolds returned as FAILED with all having tried to be retried once each.
>
> My initial feeling was, I feared I have just lost my initial set of 24,613 gene models. I know believe that this won't be the case but Im not sure... Can anyone explain what might have happened here and what consequences will follow given they all returned as failed ? Have they been deleted from the MAKER data store ?
>
> I had capturdD all 1st round MAKER output files (GFF, Fasta files etc) before attempting this 2nd round (i.e. 1st round of model training) of MAKER .
>
> If I have irrevocably changed the datastore for MAKER and lost those genes, might I be able to restore to an earlier point (say back to the first round of evidence in gene models) by passing the first MAKER gff in as "maker_gff=" / "pred_pass=1" / "model_pass=1" ?
>
> Any advice on this would be much appreciated
> Lahcen
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From lahcencampbell at gmail.com  Tue Nov 14 06:15:10 2017
From: lahcencampbell at gmail.com (lahcen campbell)
Date: Tue, 14 Nov 2017 12:15:10 +0000
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
Message-ID: 

Hi MAKER community,

I was hoping someone could help me. I have a very unusual error with the two different versions of MAKER I have tested so far. This error shouldn't be happening, but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3.

Note that version 2.31.6_mpich3_icc is one I have used countless times to produce final MAKER annotations without issue, so it's not that this version has had problems before.

Basically, this is a brand new MAKER analysis, and I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP training stage.

I have a single genome file with 10 long scaffolds making up just under 11 MB of sequence data (subsampled from my original full-length assembly) on which to train SNAP. The FASTA file is not corrupted, and has been generated in various ways in order to test for formatting issues etc.

I have only edited the maker_opts file and changed:

*genome=*
*protein=*
*protein2genome=1*

But see attached my MAKER CTL files.

The error consistently returned to me:

*Skipping the contig because it is too short!!*
*SeqID: contig_WHATEVER*
*Length: 0*

*The sequences are nowhere near too short. This was verified independently outside MAKER to be sure.
*

*The headers are as follows:*

>tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no

I have just about given up. I have no idea why it's happening; it makes zero sense.

Any help or information as to why this might be happening would be amazing.

Thank you in advance.
Lahcen

-- 
==========================================
> Dr. Lahcen Campbell <
> Contact: lahcencampbell at gmail.com <
> https://www.ebi.ac.uk/about/people/lahcen-campbell <
==========================================

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1412 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl Type: application/octet-stream Size: 1511 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 5559 bytes Desc: not available URL: From michael.s.campbell1 at gmail.com Tue Nov 14 09:08:43 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Tue, 14 Nov 2017 10:08:43 -0500 Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short In-Reply-To: References: Message-ID: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com> Hi Lahcen, Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try and recreate the error and try and debug it. Thanks, Mike > On Nov 14, 2017, at 7:15 AM, lahcen campbell wrote: > > Hi MAKER community, > > I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3 > > Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So its not that this version has issues to date. > > Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage. > > I have a single genome file with, 10 Long scaffolds making up just under 11MB (subsampled from my original full length assembly) of sequence data in which to train SNAP. The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc. > > I have only edited the maker_opts file and changed: > > genome= > protein= > protein2genome=1 > > But see attached my maker CTL files. 
> > The error consistently returned to me: > > Skipping the contig because it is too short!! > SeqID: contig_WHATEVER > Length: 0 > > The sequences are no where near too short. This was verified independently outside maker to be sure. > > The headers are as follows: > > >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > > I have just about given up, I have no idea why its happening it makes zero sense. > > Any help or information as to why this might be happening would be amazing. > > Thank you in advance. > Lahcen > > -- > ========================================== > > Dr. 
Lahcen Campbell <
> > Contact: lahcencampbell at gmail.com <
> > https://www.ebi.ac.uk/about/people/lahcen-campbell <
> ==========================================
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From michael.s.campbell1 at gmail.com  Tue Nov 14 11:04:04 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 14 Nov 2017 12:04:04 -0500
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To: 
References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com>
Message-ID: 

Hi Lahcen,

Thanks, the name has served me well for a number of years now :)

So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of RepBase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, but it looks like that's not the case for this one.

To further rule out a data issue, I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster, I would try running it with and without submitting it to the nodes, and with and without MPI.

One thing that I have done in the past is to make a new directory and run MAKER there (this doesn't make a lot of sense, but when the error doesn't make sense either it seems reasonable).

As far as rerunning MAKER goes, there are a couple of approaches. If you want it to stop complaining about trying too many times on failed contigs, you can increase the number of tries in the opts file.
The line looks like this:

tries=2 #number of times to try a contig if there is a failure for some reason

If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and BLASTing, you can use the GFF3 output from an earlier run. If you used gff3_merge after the first run finished, you got a big GFF3 file with all of the gene models and evidence. If you break up that file by the source column (column 2), you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:

rm_gff= #pre-identified repeat elements from an external GFF3 file

you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome GFF3 data on this line:

protein_gff= #aligned protein homology evidence from an external GFF3 file

Don't pass the BLAST GFF3 data in. If you pass GFF3 data in to MAKER, it assumes that the data is polished and will not make any effort to fix alignments. The protein2genome data is polished; est2genome is the equivalent for EST input.

clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory. clean_try deletes everything first and starts again; it is the one that pretends that the first run never happened.

I cc'd the list on this response just in case anyone else has any ideas or is facing the same error.

Let me know if any of this helps,
Mike

> On Nov 14, 2017, at 10:48 AM, lahcen campbell wrote:
>
> Hi Michael
>
> Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help.
>
> I have attached the file in question below. Its so strange, I had to just leave it alone cause it was making me quite frustrated.
Those bugs which there are now common sense solutions are the worst cause very easily you reach a wall. > > Might it have anything at all to do with the Protein homology file I passed in ? Though, note.... the same protein files here have been used in another maker run without issue so I kind of ruled that out already.....but just spitballing at this stage. > > > Might I be so cheeky to ask you one more MAKER related question Michael... ? Feel free to ignore it I hate to push but im desperate to figure it out with little time to do so... > > I have an issue with a different MAKER analysis. Currently any new run I attempt on this datastore, which has one round successful with 25000 odd genes and double the transcripts. I attempted to run the second round with a SNAP trained hmm (first time passing in SNAP hmm following first round EST/Protein evidence). In this attempt, because we obtained so many genes I thought I would be more stringent by changing the AED to 0.7 from 1.0. Something I see now I didn't approach in the right way... too late now sadly. > > MAKER finishes fine, but now it views all previous scaffolds as FAILED. Nothing seems to change this and now the datastore is for all intents and purposes locked in failed state. It keeps mentioning changes to the opts file which there were, and that the previous runs didn't finish so it must delete them. The results obtained from round 1 are still there though Im pretty sure of that, all blast files etc are still there and populated. > > Can you tell me the main differences either clean_up or clean_try have and which will completely and irreversibly wipe the first run? Something I don't want to repeat, just allow me to progress to the next round. Im hesitant to run them, but I've backed up the datastore incase. My next attempt will be to pass the exact same maker_opts file from the round1 run, with the only change made to clean_try/clean_up....Is this approach misguided ? 
> > Your help is very much appreciated Michael so thank you, > Best > L > > ? > ?Combined_Protein_homology.fa.zip ?? > ?SubsampledGenomeFile_n10_11MB.fasta ? > > > > On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell > wrote: > Hi Lahcen, > > Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try and recreate the error and try and debug it. > > Thanks, > Mike >> On Nov 14, 2017, at 7:15 AM, lahcen campbell > wrote: >> >> Hi MAKER community, >> >> I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3 >> >> Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So its not that this version has issues to date. >> >> Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage. >> >> I have a single genome file with, 10 Long scaffolds making up just under 11MB (subsampled from my original full length assembly) of sequence data in which to train SNAP. The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc. >> >> I have only edited the maker_opts file and changed: >> >> genome= >> protein= >> protein2genome=1 >> >> But see attached my maker CTL files. >> >> The error consistently returned to me: >> >> Skipping the contig because it is too short!! >> SeqID: contig_WHATEVER >> Length: 0 >> >> The sequences are no where near too short. This was verified independently outside maker to be sure. 
>> >> The headers are as follows: >> >> >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >> >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >> >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >> >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >> >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >> >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >> >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >> >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >> >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >> >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >> >> I have just about given up, I have no idea why its happening it makes zero sense. >> >> Any help or information as to why this might be happening would be amazing. >> >> Thank you in advance. >> Lahcen >> >> -- >> ========================================== >> > Dr. Lahcen Campbell < >> > Contact: lahcencampbell at gmail.com < >> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >> ========================================== >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > -- > ========================================== > > Dr. 
Lahcen Campbell <
> > Contact: lahcencampbell at gmail.com <
> > https://www.ebi.ac.uk/about/people/lahcen-campbell <
> ==========================================

From carsonhh at gmail.com  Tue Nov 14 11:17:03 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Nov 2017 10:17:03 -0700
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To: 
References: 
Message-ID: 

My first thought is that one of your entries has a header and no sequence. Try this command with the FASTA you are using:

    fasta_tool file.fasta --length | sort -nrk2

fasta_tool comes with MAKER. That command will report empty FASTA entries at the bottom of the list with length 0.

Alternatively, MAKER accesses the input assembly using BioPerl. Update your BioPerl to the latest CPAN version (do not use BioPerl-live, as it will be less stable). Also, BioPerl uses Berkeley DB for indexing, so if you are using a Perl that is not the system Perl (i.e. /usr/bin/perl), then it was likely compiled on the machine you are using and could have been compiled without Berkeley DB support.

--Carson

> On Nov 14, 2017, at 5:15 AM, lahcen campbell wrote:
>
> Hi MAKER community,
>
> I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3
>
> Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So its not that this version has issues to date.
>
> Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage.
> > I have a single genome file with, 10 Long scaffolds making up just under 11MB (subsampled from my original full length assembly) of sequence data in which to train SNAP. The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc. > > I have only edited the maker_opts file and changed: > > genome= > protein= > protein2genome=1 > > But see attached my maker CTL files. > > The error consistently returned to me: > > Skipping the contig because it is too short!! > SeqID: contig_WHATEVER > Length: 0 > > The sequences are no where near too short. This was verified independently outside maker to be sure. > > The headers are as follows: > > >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > > I have just about given up, I have no idea why its happening it makes zero sense. 
>
> Any help or information as to why this might be happening would be amazing.
>
> Thank you in advance.
> Lahcen
>
> --
> ==========================================
> > Dr. Lahcen Campbell <
> > Contact: lahcencampbell at gmail.com <
> > https://www.ebi.ac.uk/about/people/lahcen-campbell <
> ==========================================
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From lahcencampbell at gmail.com  Wed Nov 15 10:32:02 2017
From: lahcencampbell at gmail.com (lahcen campbell)
Date: Wed, 15 Nov 2017 16:32:02 +0000
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To: 
References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com>
Message-ID: 

Hi Michael and Carson

Thank you both for your helpful input, I really appreciate it. See below for my comments...

Best
Lahcen

On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell <michael.s.campbell1 at gmail.com> wrote:

> Hi Lancen,
>
> Thanks, the name has served me well for a number of years now :)

It's a good name, I wouldn't change it haha :)

> So I started a run with your 11 scaffolds. I gave it the protein file that
> you sent and used all of repbase for masking. All of the scaffolds finished
> without error. I was hoping it would be something simple that just needed
> another set of eyes to see, looks like it's not the case for this one.
>
> To further rule out a data issue I would try running it with the dpp test
> data that is bundled with MAKER to see if you can get the same error. This
> data set will run in about a minute. If you are on a cluster I would try
> running it with and without submitting it you the nodes and with and
> without mpi.
>
> One thing that I have done in the past is to make a new directory and run
> maker there (this doesn't make a lot of sense but when the error doesn't
> make sense either it seems reasonable).

First off, I can report good news regarding the 0-length contigs I was getting back. Carson, your thoughts on BioPerl conflicts seem to have pointed at the main issue. Our cluster software environment had gone through some changes of late, so working on that basis I was able to load the right bash config, which resulted in no more 0-length contig errors. Huzzah!!

> As far as rerunning MAKER there are a couple of approaches. If you want it
> to stop complaining about trying to many times on failed contigs you can
> increase the number of tries in the opts file. The line looks like this:
>
> tries=2 #number of times to try a contig if there is a failure for some
> reason
>
> If you want to run it elsewhere, but you don't want to have to redo all of
> the repeat masking and blasting you can use the gff3 output from an earlier
> run. If you used gff3_merge after the first run finished you got a big gff3
> file with all of the gene models and evidence. If you break up that file by
> the source column you can selectively pass the evidence back to MAKER. If
> you put all of the repeatmasker and repeatrunner entries into one file and
> pass it in on this line:

Can I ask, because I can't seem to find any concrete info on best practices for parsing MAKER GFFs to partition the various source column fields as you described, Michael: is there a commonly used way to partition MAKER GFFs based on the source column? Or will I need to code it up? I ask because I feel this must have been needed many times before by other users.

> rm_gff= #pre-identified repeat elements from an external GFF3 file

I will remove the links to FASTA files for both 'rmlib=' and 'repeat_protein='.

> you can turn off model_org= and repeat_protein=. This will speed up the
> next run a lot.
Then you can pass in the protein2genome gff3 data on this
> line:
>
> protein_gff= #aligned protein homology evidence from an external GFF3 file
>
> Don't pass the blast gff3 data in. If you pass in gff3 data to maker is
> assumes that it is polished and will not make any effort to fix alignments.
> the protein2genome data is polished. est2genome is the equivalent for EST
> input.

You say don't pass the BLAST results in as GFF. If I pass in all other info via GFF3 and remove any evidence supplied as FASTA inputs, BLAST won't be called again, right? That would ensure the shortest possible rerun of MAKER to roll back to an uncorrupted state.

I noticed that the only unique source field types in my MAKER GFF are as follows:

*augustus_masked*
*blastx*
*maker*
*protein2genome*
*repeatmasker*
*repeatrunner*

I read on the dev group that passing EST evidence as GFF won't actually call Exonerate; the est2genome option just tells MAKER to try to turn polished EST alignments directly into genes. So if I pass this info in again as GFF, will it simply use the same info as it did originally and not have to recompute anything?

Based on the above fields contained in my MAKER GFF, which of the following options should I select to re-annotate based on this older run? I suspect all the options below in green should be set to 1, and the others in red set to 0.

*#-----Re-annotation Using MAKER Derived GFF3*
.....
*est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no*
*altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no*
*protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no*
*rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no*
*model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no*
*pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no*
*other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no*

I don't think I will pass back anything under augustus_masked, as I didn't set that up correctly initially, instead passing in a precomputed Augustus GFF, which I'm told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside MAKER itself on the 2nd pass. Note though, I am aware of the normal order of things, but in this instance I will continue with what I have done with success previously.

Lastly, as this next run will be updating based on previously generated MAKER GFF data, what states should est2genome and protein2genome be in? 1 or 0?

Apologies for the lengthy email reply, Michael.

Much appreciated again, thank you!!
L

> Clean_up is useful if you are running on a file system that limits the
> number of files that you can write. It removes all of the intermediate
> files used in the annotation. This takes away the advantage of rerunning in
> the same directory. clean_try deletes everything first, and starts again.
> clean_try is the one that deletes everything and pretends that the first
> run never happened.
>
> I ccd the list on this response just Incas anyone else has any ideas or is
> facing the same error.
>
> Let me know if any of this helps,
> Mike
>
> On Nov 14, 2017, at 10:48 AM, lahcen campbell
> wrote:
>
> Hi Michael
>
> Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell
> to be exact haha...anyway... thanks for the reply and offer to help.
>
> I have attached the file in question below.
Its so strange, I had to just > leave it alone cause it was making me quite frustrated. Those bugs which > there are now common sense solutions are the worst cause very easily you > reach a wall. > > Might it have anything at all to do with the Protein homology file I > passed in ? Though, note.... the same protein files here have been used in > another maker run without issue so I kind of ruled that out already.....but > just spitballing at this stage. > > > Might I be so cheeky to ask you one more MAKER related question Michael... > ? Feel free to ignore it I hate to push but im desperate to figure it out > with little time to do so... > > I have an issue with a different MAKER analysis. Currently any new run I > attempt on this datastore, which has one round successful with 25000 odd > genes and double the transcripts. I attempted to run the second round with > a SNAP trained hmm (first time passing in SNAP hmm following first round > EST/Protein evidence). In this attempt, because we obtained so many genes I > thought I would be more stringent by changing the AED to 0.7 from 1.0. > Something I see now I didn't approach in the right way... too late now > sadly. > > MAKER finishes fine, but now it views all previous scaffolds as FAILED. > Nothing seems to change this and now the datastore is for all intents and > purposes locked in failed state. It keeps mentioning changes to the opts > file which there were, and that the previous runs didn't finish so it must > delete them. The results obtained from round 1 are still there though Im > pretty sure of that, all blast files etc are still there and populated. > > Can you tell me the main differences either clean_up or clean_try have and > which will completely and irreversibly wipe the first run? Something I > don't want to repeat, just allow me to progress to the next round. Im > hesitant to run them, but I've backed up the datastore incase. 
My next > attempt will be to pass the exact same maker_opts file from the round1 run, > with the only change made to clean_try/clean_up....Is this approach > misguided ? > > Your help is very much appreciated Michael so thank you, > Best > L > > ? > Combined_Protein_homology.fa.zip > > ?? > SubsampledGenomeFile_n10_11MB.fasta > > ? > > > > On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell < > michael.s.campbell1 at gmail.com> wrote: > >> Hi Lahcen, >> >> Nothing comes right to mind for what could be causing this error. If you >> want to compress your FASTA and send it to me I can try and recreate the >> error and try and debug it. >> >> Thanks, >> Mike >> >> On Nov 14, 2017, at 7:15 AM, lahcen campbell >> wrote: >> >> Hi MAKER community, >> >> I was hoping someone could help me. I have a very unusual error with two >> different versions of maker I have tested so far. This error shouldn't be >> happening but it occurs time and again no matter what I try. I have tried >> using 2.31.6_mpich3_icc and 2.31_mpich3 >> >> Note that version 2.31.6_mpich3_icc is one I have used countless times >> and produced final MAKER annotations without issue. So its not that this >> version has issues to date. >> >> Basically, this is a brand new MAKER analysis, I am only trying to train >> SNAP in this first round. I am following the MakerTutorial as documented >> this time around and I can't get past the initial SNAP train stage. >> >> I have a single genome file with, 10 Long scaffolds making up just under >> 11MB (subsampled from my original full length assembly) of sequence data in >> which to train SNAP. The fasta file is not corrupted, and has been >> generated in various ways in order to test formatting issues etc. >> >> I have only edited the maker_opts file and changed: >> >> *genome=* >> *protein=* >> *protein2genome=1* >> >> But see attached my maker CTL files. 
>> >> The error consistently returned to me: >> >> *Skipping the contig because it is too short!!* >> *SeqID: contig_WHATEVER* >> *Length: 0* >> >> *The sequences are no where near too short. This was verified >> independently outside maker to be sure. * >> >> *The headers are as follows:* >> >> >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig >> suggestRepeat=no suggestCircular=no >> >> I have just about given up, I have no idea why its happening it makes >> zero sense. >> >> Any help or information as to why this might be happening would be >> amazing. >> >> Thank you in advance. >> Lahcen >> >> -- >> ========================================== >> > Dr. 
Lahcen Campbell < >> > Contact: lahcencampbell at gmail.com < >> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >> ========================================== >> ____________ >> ___________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > > -- > ========================================== > > Dr. Lahcen Campbell < > > Contact: lahcencampbell at gmail.com < > > https://www.ebi.ac.uk/about/people/lahcen-campbell < > ========================================== > > > -- ========================================== > Dr. Lahcen Campbell < > Contact: lahcencampbell at gmail.com < > https://www.ebi.ac.uk/about/people/lahcen-campbell < ========================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From lahcencampbell at gmail.com Wed Nov 15 10:56:20 2017 From: lahcencampbell at gmail.com (lahcen campbell) Date: Wed, 15 Nov 2017 16:56:20 +0000 Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short In-Reply-To: References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com> Message-ID: Just an add on to this topic.... I have found a suite of gff utilities here which I hope can help me quickly parse the MAKER gff. https://github.com/mamarjan/gff3-pltools I'll report back how it goes ! Best L On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell < michael.s.campbell1 at gmail.com> wrote: > Hi Lancen, > > Thanks, the name has served me well for a number of years now :) > > So I started a run with your 11 scaffolds. I gave it the protein file that > you sent and used all of repbase for masking. All of the scaffolds finished > without error. I was hoping it would be something simple that just needed > another set of eyes to see, looks like it's not the case for this one. 
> > To further rule out a data issue I would try running it with the dpp test > data that is bundled with MAKER to see if you can get the same error. This > data set will run in about a minute. If you are on a cluster I would try > running it with and without submitting it you the nodes and with and > without mpi. > > One thing that I have done in the past is to make a new directory and run > maker there (this doesn't make a lot of sense but when the error doesn't > make sense either it seems reasonable). > > As far as rerunning MAKER there are a couple of approaches. If you want it > to stop complaining about trying to many times on failed contigs you can > increase the number of tries in the opts file. The line looks like this: > > tries=2 #number of times to try a contig if there is a failure for some > reason > > If you want to run it elsewhere, but you don't want to have to redo all of > the repeat masking and blasting you can use the gff3 output from an earlier > run. If you used gff3_merge after the first run finished you got a big gff3 > file with all of the gene models and evidence. If you break up that file by > the source column you can selectively pass the evidence back to MAKER. If > you put all of the repeatmasker and repeatrunner entries into one file and > pass it in on this line: > > rm_gff= #pre-identified repeat elements from an external GFF3 file > > you can turn off model_org= and repeat_protein=. This will speed up the > next run a lot. Then you can pass in the protein2genome gff3 data on this > line: > > protein_gff= #aligned protein homology evidence from an external GFF3 file > > Don't pass the blast gff3 data in. If you pass in gff3 data to maker is > assumes that it is polished and will not make any effort to fix alignments. > the protein2genome data is polished. est2genome is the equivalent for EST > input. > > Clean_up is useful if you are running on a file system that limits the > number of files that you can write. 
It removes all of the intermediate > files used in the annotation. This takes away the advantage of rerunning in > the same directory. clean_try deletes everything first, and starts again. > clean_try is the one that deletes everything and pretends that the first > run never happened. > > I ccd the list on this response just Incas anyone else has any ideas or is > facing the same error. > > Let me know if any of this helps, > Mike > > On Nov 14, 2017, at 10:48 AM, lahcen campbell > wrote: > > Hi Michael > > Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell > to be exact haha...anyway... thanks for the reply and offer to help. > > I have attached the file in question below. Its so strange, I had to just > leave it alone cause it was making me quite frustrated. Those bugs which > there are now common sense solutions are the worst cause very easily you > reach a wall. > > Might it have anything at all to do with the Protein homology file I > passed in ? Though, note.... the same protein files here have been used in > another maker run without issue so I kind of ruled that out already.....but > just spitballing at this stage. > > > Might I be so cheeky to ask you one more MAKER related question Michael... > ? Feel free to ignore it I hate to push but im desperate to figure it out > with little time to do so... > > I have an issue with a different MAKER analysis. Currently any new run I > attempt on this datastore, which has one round successful with 25000 odd > genes and double the transcripts. I attempted to run the second round with > a SNAP trained hmm (first time passing in SNAP hmm following first round > EST/Protein evidence). In this attempt, because we obtained so many genes I > thought I would be more stringent by changing the AED to 0.7 from 1.0. > Something I see now I didn't approach in the right way... too late now > sadly. > > MAKER finishes fine, but now it views all previous scaffolds as FAILED. 
> Nothing seems to change this and now the datastore is for all intents and > purposes locked in failed state. It keeps mentioning changes to the opts > file which there were, and that the previous runs didn't finish so it must > delete them. The results obtained from round 1 are still there though Im > pretty sure of that, all blast files etc are still there and populated. > > Can you tell me the main differences either clean_up or clean_try have and > which will completely and irreversibly wipe the first run? Something I > don't want to repeat, just allow me to progress to the next round. Im > hesitant to run them, but I've backed up the datastore incase. My next > attempt will be to pass the exact same maker_opts file from the round1 run, > with the only change made to clean_try/clean_up....Is this approach > misguided ? > > Your help is very much appreciated Michael so thank you, > Best > L > > ? > Combined_Protein_homology.fa.zip > > ?? > SubsampledGenomeFile_n10_11MB.fasta > > ? > > > > On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell < > michael.s.campbell1 at gmail.com> wrote: > >> Hi Lahcen, >> >> Nothing comes right to mind for what could be causing this error. If you >> want to compress your FASTA and send it to me I can try and recreate the >> error and try and debug it. >> >> Thanks, >> Mike >> >> On Nov 14, 2017, at 7:15 AM, lahcen campbell >> wrote: >> >> Hi MAKER community, >> >> I was hoping someone could help me. I have a very unusual error with two >> different versions of maker I have tested so far. This error shouldn't be >> happening but it occurs time and again no matter what I try. I have tried >> using 2.31.6_mpich3_icc and 2.31_mpich3 >> >> Note that version 2.31.6_mpich3_icc is one I have used countless times >> and produced final MAKER annotations without issue. So its not that this >> version has issues to date. >> >> Basically, this is a brand new MAKER analysis, I am only trying to train >> SNAP in this first round. 
I am following the MakerTutorial as documented >> this time around and I can't get past the initial SNAP train stage. >> >> I have a single genome file with, 10 Long scaffolds making up just under >> 11MB (subsampled from my original full length assembly) of sequence data in >> which to train SNAP. The fasta file is not corrupted, and has been >> generated in various ways in order to test formatting issues etc. >> >> I have only edited the maker_opts file and changed: >> >> *genome=* >> *protein=* >> *protein2genome=1* >> >> But see attached my maker CTL files. >> >> The error consistently returned to me: >> >> *Skipping the contig because it is too short!!* >> *SeqID: contig_WHATEVER* >> *Length: 0* >> >> *The sequences are no where near too short. This was verified >> independently outside maker to be sure. * >> >> *The headers are as follows:* >> >> >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00316994 len=19742 reads=1 covStat=0.00 
gappedBases=no class=contig >> suggestRepeat=no suggestCircular=no >> >> I have just about given up, I have no idea why its happening it makes >> zero sense. >> >> Any help or information as to why this might be happening would be >> amazing. >> >> Thank you in advance. >> Lahcen >> >> -- >> ========================================== >> > Dr. Lahcen Campbell < >> > Contact: lahcencampbell at gmail.com < >> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >> ========================================== >> ____________ >> ___________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > > -- > ========================================== > > Dr. Lahcen Campbell < > > Contact: lahcencampbell at gmail.com < > > https://www.ebi.ac.uk/about/people/lahcen-campbell < > ========================================== > > > -- ========================================== > Dr. Lahcen Campbell < > Contact: lahcencampbell at gmail.com < > https://www.ebi.ac.uk/about/people/lahcen-campbell < ========================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Thu Nov 16 13:46:39 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Thu, 16 Nov 2017 14:46:39 -0500 Subject: [maker-devel] About loss of Histone H2A, H2B, H4 Message-ID: Hello: We have annotated a new rodent genome using Maker2. Based on the annotated maker2 gene sets, we did gene family expansion/contraction analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under contraction. I wonder whether there are known bias to predict those gene families using Maker2? For example, can this due to repeat masking of the genome? I used repeatmaker and generated species specific repeat libraries follows http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic . 
Thanks

Best
Quanwei
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mcsimenc at gmail.com Fri Nov 17 19:39:25 2017
From: mcsimenc at gmail.com (Matt Simenc)
Date: Fri, 17 Nov 2017 17:39:25 -0800
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
Message-ID:

Hi everybody,

I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?

MAKER 2.31.8

I looked at a standalone RepeatMasker run I did and the features are more evenly distributed between the +/- strands.

Matt
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Nov 17 20:09:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Nov 2017 19:09:20 -0700
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
In-Reply-To:
References:
Message-ID: <0DC818BC-EA36-43EA-9237-003BE07C4434@gmail.com>

While transposons that encode proteins will technically have a strand, simple repeats and many others do not, so the algorithms used to find them will not necessarily assign a strand. For this reason the repeats are treated as strandless, since both strands are masked, and they are arbitrarily assigned to the plus strand to avoid issues with genome browsers that cannot handle strandless features.

--Carson

> On Nov 17, 2017, at 6:39 PM, Matt Simenc wrote:
>
> Hi everybody,
>
> I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?
>
> MAKER 2.31.8
>
> I looked at a standalone RepeatMasker run I did and the features are more evenly distributed between the +/- strands.
>
>
> Matt
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri Nov 17 20:23:34 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Nov 2017 19:23:34 -0700
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
In-Reply-To:
References:
Message-ID:

Also, MAKER clusters overlapping repeats to generate the best masking of the assembly. For the GFF3 it then assigns the name of the repeat encompassing the greatest portion of the cluster to the feature (i.e. the best representative). But the cluster is technically built from overlapping repeats on both strands (repeats tend to jump on top of other repeats, so they stack with bits and pieces of other repeats at the edges). Yet another reason why everything is just assigned to the plus strand.
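The clustering described above can be illustrated with a small Python sketch. This is not MAKER's code: the function name `cluster_repeats`, the tuple layout and the example coordinates are all assumptions made for illustration, and "greatest portion of the cluster" is approximated here by the longest member hit. The sketch merges overlapping repeat hits regardless of strand, names each cluster after its longest hit, and reports every cluster on the plus strand.

```python
from typing import List, Tuple

# (start, end, strand, name); 1-based inclusive coordinates as in GFF3.
Hit = Tuple[int, int, str, str]


def cluster_repeats(hits: List[Hit]) -> List[Hit]:
    """Merge overlapping hits and label each cluster with its longest member.

    Illustrative only: this mimics the behaviour described in the thread,
    it is not taken from MAKER's source.
    """
    clusters: List[Hit] = []
    best_len = 0  # length of the hit currently naming the open cluster
    for start, end, _strand, name in sorted(hits):
        length = end - start + 1
        if clusters and start <= clusters[-1][1]:
            c_start, c_end, _, c_name = clusters[-1]
            if length > best_len:
                c_name, best_len = name, length
            # Extend the open cluster; the member's strand is ignored.
            clusters[-1] = (c_start, max(c_end, end), "+", c_name)
        else:
            # No overlap with the previous cluster: open a new one.
            clusters.append((start, end, "+", name))
            best_len = length
    return clusters


# Hypothetical hits: two overlapping repeats on opposite strands, one isolated.
hits = [
    (100, 500, "-", "LTR/Gypsy"),
    (450, 520, "+", "Simple_repeat"),
    (900, 950, "-", "DNA/hAT"),
]
for cluster in cluster_repeats(hits):
    print(cluster)
```

In this toy input the first two hits collapse into one plus-strand feature named after the longer Gypsy hit, which is why the strand of the individual hits cannot be read back from the merged feature.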

--Carson

> On Nov 17, 2017, at 6:39 PM, Matt Simenc wrote:
>
> Hi everybody,
>
> I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?
>
> MAKER 2.31.8
>
> I looked at a standalone RepeatMasker run I did and the features are more evenly distributed between the +/- strands.
>
>
> Matt
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From mcsimenc at gmail.com Sat Nov 18 10:27:25 2017
From: mcsimenc at gmail.com (Matt Simenc)
Date: Sat, 18 Nov 2017 08:27:25 -0800
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
In-Reply-To:
References:
Message-ID:

Ah ok. A messy problem! I need to approximate strandedness for TE loci if possible, so I will do some post-processing using blast/hmmer against Repbase and Dfam.

Thanks for the speedy response, Carson!

On Fri, Nov 17, 2017 at 6:23 PM, Carson Holt wrote:

> Also, MAKER clusters overlapping repeats to generate the best masking of
> the assembly. For the GFF3 it then assigns the name of the repeat
> encompassing the greatest portion of the cluster to the feature (i.e. the
> best representative). But the cluster is technically built from overlapping
> repeats on both strands (repeats tend to jump on top of other repeats, so
> they stack with bits and pieces of other repeats at the edges). Yet another
> reason why everything is just assigned to the plus strand.
>
> --Carson
>
>
> > On Nov 17, 2017, at 6:39 PM, Matt Simenc wrote:
> >
> > Hi everybody,
> >
> > I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?
> >
> > MAKER 2.31.8
> >
> > I looked at a standalone RepeatMasker run I did and the features are more evenly distributed between the +/- strands.
> >
> >
> > Matt
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at box290.bluehost.com
> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michael.s.campbell1 at gmail.com Wed Nov 15 15:50:45 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 15 Nov 2017 16:50:45 -0500
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com>
Message-ID: <4157C9FE-1F5D-4320-A03F-2344C1DBD81C@gmail.com>

Hi Lahcen,

I put some answers below.

> On Nov 15, 2017, at 11:32 AM, lahcen campbell wrote:
>
> Hi Michael and Carson
>
> Thank you both for your helpful input, I really appreciate it.
>
> See below for my comments...
>
> Best
> Lahcen
>
>
> On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell wrote:
> Hi Lahcen,
>
> Thanks, the name has served me well for a number of years now :)
>
> It's a good name, I wouldn't change it haha :)
>
>
> So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see; looks like it's not the case for this one.
>
> To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it to the nodes and with and without mpi.
>
> One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable).
>
> First off, I can report good news regarding the 0-length contigs I was getting back. Carson, your thoughts on BioPerl conflicts seemed to be the main issue. Our cluster software environment had gone through some changes of late, so working on that basis I was able to load the right bash config, which resulted in no more 0-length contig errors. Huzzah!!
>
Great
>
> As far as rerunning MAKER there are a couple of approaches. If you want it to stop complaining about trying too many times on failed contigs you can increase the number of tries in the opts file. The line looks like this:
>
> tries=2 #number of times to try a contig if there is a failure for some reason
>
> If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:
>
> Can I ask, because I can't seem to find any concrete info on best practices for parsing MAKER gffs to partition the various source column fields as you described, Michael.
>
> Is there a commonly used way to partition MAKER gffs based on source column? Or will I need to code it up? I ask because I feel this must have been needed many times before by other users.
>
I've got a script that will do it if you want it. Since you don't need all of the entries, grep is probably as easy as anything.
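For anyone who would rather split the merged GFF3 in a single pass than run one grep per source, partitioning on the source column (column 2) can be sketched in Python. This is a rough sketch, not a MAKER utility and not the script mentioned above: the function name `split_gff3_by_source` and the file names in the usage note are made up for illustration. It assumes a tab-delimited GFF3 such as the one written by gff3_merge and stops reading at an embedded `##FASTA` section.

```python
import os
from collections import defaultdict


def split_gff3_by_source(gff_path, out_dir, sources=None):
    """Write one GFF3 file per source (column 2) and return {source: path}.

    Hypothetical helper, not part of MAKER. `sources` optionally restricts
    the output, e.g. {"repeatmasker", "repeatrunner", "protein2genome"}.
    """
    by_source = defaultdict(list)
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # sequence section follows; no more features
            if line.startswith("#") or not line.strip():
                continue  # skip comments, directives and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9:
                continue  # not a nine-column feature line
            if sources is None or fields[1] in sources:
                by_source[fields[1]].append(line)

    os.makedirs(out_dir, exist_ok=True)
    written = {}
    for source, lines in by_source.items():
        path = os.path.join(out_dir, source + ".gff")
        with open(path, "w") as out:
            out.write("##gff-version 3\n")
            out.writelines(lines)
        written[source] = path
    return written
```

A call might look like `split_gff3_by_source("merged.gff", "by_source", {"repeatmasker", "repeatrunner", "protein2genome"})`, where `merged.gff` stands in for whatever gff3_merge produced. The repeatmasker and repeatrunner files would then be concatenated into one file for rm_gff=, as described in the thread, and the blastx entries simply left out.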
grep -P '\tsource\t'
>
> rm_gff= #pre-identified repeat elements from an external GFF3 file
>
> I will remove links to fasta files for both 'rmlib=' and 'repeat_protein='
>
Yep
>
> you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line:
>
> protein_gff= #aligned protein homology evidence from an external GFF3 file
>
> Don't pass the blast gff3 data in. If you pass in gff3 data to maker it assumes that it is polished and will not make any effort to fix alignments. The protein2genome data is polished. est2genome is the equivalent for EST input.
>
> You say don't pass the blast as gff. As I pass in all other info via GFF3 and remove any evidence as fasta inputs, BLAST won't be called again, right? That should ensure the shortest possible rerun of MAKER to roll back to an uncorrupted state.
>
Right. blast will not be called as long as you remove or comment out the paths to the fastas in the est= and protein= lines.
> I noticed that the only unique source field types in my MAKER GFF are as follows:
> augustus_masked
> blastx
> maker
> protein2genome
> repeatmasker
> repeatrunner
>
That looks right for the run you described.
> I read on the dev group that passing EST evidence as GFF won't actually call Exonerate; the est2genome option just tells MAKER to try to turn polished EST alignments directly into genes. So if I pass this info again as GFF, it will simply use the same info as it did originally and not have to recompute anything?
>
> Based on the above fields contained in my MAKER gff, which of the following options should I select to re-annotate based on this older run? I suspect all the options below in green should be set to 1, and the others in red set to 0.
>
> #-----Re-annotation Using MAKER Derived GFF3
> .....
> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no
>
You don't need model_pass or pred_pass if you plan on running gene finders.
> I don't think I will pass back anything under augustus_masked, as I didn't set that up correctly initially, instead passing in a precomputed Augustus gff, which I'm told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside MAKER itself on the 2nd pass. Note though, I am aware of the order of things normally, but for this instance I will continue with what I have done with success previously.

Yeah, when I have issues with failing contigs I'll pull stuff out until it starts running without error, then I add things back until something breaks.

> Lastly, as this next run will be updating based on previously generated MAKER gff data, what states should est2genome and protein2genome be in? 1 or 0?

0. Those options are just for generating gene models directly from evidence when you don't have any gene finders trained. When you say updating, do you mean reusing evidence from previous runs and generating new gene annotations, or are you taking existing gene models and adding new evidence to see if they can be improved?
>
> Apologies for the lengthy email reply, Michael. Much appreciated again, thank you!!

No worries, hope it helps.
>
> L
>
>
> Clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory.
clean_try deletes everything first, and starts again. clean_try is the one that deletes everything and pretends that the first run never happened. > > I ccd the list on this response just Incas anyone else has any ideas or is facing the same error. > > Let me know if any of this helps, > Mike > >> On Nov 14, 2017, at 10:48 AM, lahcen campbell > wrote: >> >> Hi Michael >> >> Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help. >> >> I have attached the file in question below. Its so strange, I had to just leave it alone cause it was making me quite frustrated. Those bugs which there are now common sense solutions are the worst cause very easily you reach a wall. >> >> Might it have anything at all to do with the Protein homology file I passed in ? Though, note.... the same protein files here have been used in another maker run without issue so I kind of ruled that out already.....but just spitballing at this stage. >> >> >> Might I be so cheeky to ask you one more MAKER related question Michael... ? Feel free to ignore it I hate to push but im desperate to figure it out with little time to do so... >> >> I have an issue with a different MAKER analysis. Currently any new run I attempt on this datastore, which has one round successful with 25000 odd genes and double the transcripts. I attempted to run the second round with a SNAP trained hmm (first time passing in SNAP hmm following first round EST/Protein evidence). In this attempt, because we obtained so many genes I thought I would be more stringent by changing the AED to 0.7 from 1.0. Something I see now I didn't approach in the right way... too late now sadly. >> >> MAKER finishes fine, but now it views all previous scaffolds as FAILED. Nothing seems to change this and now the datastore is for all intents and purposes locked in failed state. 
It keeps mentioning changes to the opts file which there were, and that the previous runs didn't finish so it must delete them. The results obtained from round 1 are still there though Im pretty sure of that, all blast files etc are still there and populated. >> >> Can you tell me the main differences either clean_up or clean_try have and which will completely and irreversibly wipe the first run? Something I don't want to repeat, just allow me to progress to the next round. Im hesitant to run them, but I've backed up the datastore incase. My next attempt will be to pass the exact same maker_opts file from the round1 run, with the only change made to clean_try/clean_up....Is this approach misguided ? >> >> Your help is very much appreciated Michael so thank you, >> Best >> L >> >> ? >> ?Combined_Protein_homology.fa.zip ?? >> ?SubsampledGenomeFile_n10_11MB.fasta ? >> >> >> >> On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell > wrote: >> Hi Lahcen, >> >> Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try and recreate the error and try and debug it. >> >> Thanks, >> Mike >>> On Nov 14, 2017, at 7:15 AM, lahcen campbell > wrote: >>> >>> Hi MAKER community, >>> >>> I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3 >>> >>> Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So its not that this version has issues to date. >>> >>> Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage. 
>>> >>> I have a single genome file with, 10 Long scaffolds making up just under 11MB (subsampled from my original full length assembly) of sequence data in which to train SNAP. The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc. >>> >>> I have only edited the maker_opts file and changed: >>> >>> genome= >>> protein= >>> protein2genome=1 >>> >>> But see attached my maker CTL files. >>> >>> The error consistently returned to me: >>> >>> Skipping the contig because it is too short!! >>> SeqID: contig_WHATEVER >>> Length: 0 >>> >>> The sequences are no where near too short. This was verified independently outside maker to be sure. >>> >>> The headers are as follows: >>> >>> >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >>> I have just about given up, I have no idea why its happening it makes zero sense. 
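A quick way to repeat the independent length check outside MAKER is sketched below. It is only an illustration: the function name fasta_lengths is made up here, and the len= field is assumed to be present in the headers, as it is in the assembler headers listed above. It does not use BioPerl, so it only confirms what is in the file, not how MAKER reads it.

```python
import re


def fasta_lengths(lines):
    """Yield (seq_id, declared_len, actual_len) for each FASTA record.

    declared_len is taken from a 'len=' field in the header line,
    or is None when the header has no such field.
    """
    seq_id, declared, actual = None, None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, declared, actual
            header = line[1:]
            fields = header.split()
            seq_id = fields[0] if fields else ""
            match = re.search(r"\blen=(\d+)", header)
            declared = int(match.group(1)) if match else None
            actual = 0
        elif line and seq_id is not None:
            # Sequence line: count its bases toward the current record.
            actual += len(line)
    if seq_id is not None:
        yield seq_id, declared, actual


# Hypothetical usage (the file name is a placeholder):
# with open("genome.fasta") as handle:
#     for seq_id, declared, actual in fasta_lengths(handle):
#         print(seq_id, declared, actual)
```

If every record reports a sensible actual length here while MAKER still reports Length: 0, the file itself is probably fine and the problem lies in the software environment.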
>>> >>> Any help or information as to why this might be happening would be amazing. >>> >>> Thank you in advance. >>> Lahcen >>> >>> -- >>> ========================================== >>> > Dr. Lahcen Campbell < >>> > Contact: lahcencampbell at gmail.com < >>> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >>> ========================================== >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> >> -- >> ========================================== >> > Dr. Lahcen Campbell < >> > Contact: lahcencampbell at gmail.com < >> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >> ========================================== > > > > > -- > ========================================== > > Dr. Lahcen Campbell < > > Contact: lahcencampbell at gmail.com < > > https://www.ebi.ac.uk/about/people/lahcen-campbell < > ========================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott at scottcain.net Mon Nov 20 19:57:09 2017 From: scott at scottcain.net (Scott Cain) Date: Mon, 20 Nov 2017 20:57:09 -0500 Subject: [maker-devel] GMOD hackathon before PAG San Diego in January In-Reply-To: References: Message-ID: Hello, This is an update on the hackathon. It is a go; the hackathon page is up on GMOD.org: http://gmod.org/wiki/2018_PAG_Hackathon And the EventBrite page is up at https://www.eventbrite.com/e/gmod-2018-pag-hackathon-tickets-39700164260 Tickets are $50 which covers the costs associated with the room and lunch on the first day. Please feel free to add suggested topics to the wiki page, or send the suggestions to me to add. Thanks, Scott On Thursday, October 12, 2017, Scott Cain wrote: > Hi all, > > This January before PAG on the Wednesday and Thursday before PAG (January > 10-11) in San Diego we are planning a GMOD hackathon. 
We expect that > participants will be interested in solving problems/creating solutions > related to Tripal, JBrowse, Apollo, and Galaxy but if you're interested in > another GMOD project, by all means, let us know! We expect this hackathon > to overlap with the Tripal hackathon that is on January 11 (I'm pretty > sure; right Stephen?) > > If you are interested in attending this hackathon, please let me know so I > can be sure we have an appropriately sized space. And if you're coming for > the pre-PAG hackathon, consider staying for PAG, since there is always a > lot of GMOD-related content at the meeting! > > Thanks, > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... URL: From o.k.torresen at ibv.uio.no Tue Nov 21 07:57:46 2017 From: o.k.torresen at ibv.uio.no (=?utf-8?B?T2xlIEtyaXN0aWFuIFTDuHJyZXNlbg==?=) Date: Tue, 21 Nov 2017 13:57:46 +0000 Subject: [maker-devel] substr outside of string in PhatHits_utils.pm In-Reply-To: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com> References: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com> Message-ID: <182CDDD3-A108-4095-9AC4-A2C198D34107@ibv.uio.no> Thank you Carson. After a bit of struggling, I can confirm that the same error occurs in MAKER 3.01.2 (I guess you meant that version, couldn?t find 3.02.02). I am providing a GFF to est_gff, with match and match_part entries. For at least one of the scaffolds, the last coordinate (column 5) is the same number as the length of the scaffold. That should be allowed by the GFF3 standard, right? 
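On the coordinate question above: GFF3 coordinates are 1-based and inclusive, so a feature whose end equals the scaffold length is within bounds. A rough way to rule out real coordinate problems in a GFF3 evidence file is sketched below. The function name out_of_range_features and the seq_lengths dictionary are made up for illustration, and the check says nothing about what PhatHit_utils.pm does internally.

```python
def out_of_range_features(gff_lines, seq_lengths):
    """Return (line_number, reason) pairs for GFF3 features with suspicious coordinates.

    gff_lines: iterable of GFF3 text lines.
    seq_lengths: dict mapping sequence ID -> sequence length in bases.
    """
    problems = []
    for number, line in enumerate(gff_lines, start=1):
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append((number, "expected 9 tab-separated columns"))
            continue
        seqid = cols[0]
        try:
            start, end = int(cols[3]), int(cols[4])
        except ValueError:
            problems.append((number, "non-integer start or end"))
            continue
        if seqid not in seq_lengths:
            problems.append((number, "unknown sequence ID"))
        elif start < 1 or start > end:
            problems.append((number, "start < 1 or start > end"))
        elif end > seq_lengths[seqid]:
            problems.append((number, "end beyond sequence length"))
    return problems
```

An empty result only means the coordinates are plausible; the failure could still come from something else in the input.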
How can I troubleshoot this? The error message is not very informative. It seems that PhatHit_utils.pm tries to find a stop codon. Snippet from that file, lines 849-850:

#fix stop codon by walking downstream
my $has_stop = $tM->is_ter_codon(substr($transcript_seq, $end-1-3, 3));

The GFF I am using was the output of Mikado (https://www.biorxiv.org/content/early/2017/11/09/216994), which is GFF3, and was then processed a bit to make it suitable for MAKER.

First I converted it to GTF:

mikado util convert mikado.loci.gff3 mikado.loci.gtf

Then I selected only mRNA and exon entries, and changed mRNA to transcript to make it look like cufflinks output (and set a dummy score):

grep -P "\tmRNA\t|\texon\t" mikado.loci.gtf |sed "s/mRNA/transcript/g" |awk -F "\t" '{$9=$9"cov \"10.0\";"; OFS="\t"; print $1, $2, $3, $4, $5, $6, $7, $8, $9}' > mikado.loci.score.gtf

Finally I converted it with cufflinks2gff3:

cufflinks2gff3 mikado.loci.score.gtf > ests.score.gff3

Thank you.

Ole

> On 09 Nov 2017, at 17:28, Carson Holt wrote:
>
> My first guess is that if you are using gff3 files as input to anything, then there may be an issue with your GFF3 file. My second suggestion is to try MAKER 3.02.02 to see if it has the same issue.
>
> --Carson
>
>
>> On Nov 9, 2017, at 2:44 AM, Ole Kristian Tørresen wrote:
>>
>> Dear all,
>> I'm having an issue with MAKER which I'm unable to wrap my head around. Hopefully the issue is easily identifiable and resolvable for someone with more insight than me. Please find the log output attached below. I cannot find any more information than this in any logs. Many scaffolds do complete fine, but some of the longest ones have issues.
>>
>> Thank you.
>>
>> Sincerely,
>> Ole K.
Tørresen
>>
>> Error message:
>>
>> #--------- command -------------#
>> Widget::augustus:
>> /projects/cees/bin/augustus/augustus-3.2.3/bin/augustus --strand=backward --species=gadMor2_code_braker2 --UTR=off --hintsfile=/tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_brak
>> er2.auto_annotator.xdef.augustus --extrinsicCfgFile=/projects/cees/bin/augustus/augustus-3.2.3/config/extrinsic/extrinsic.MPE.cfg --AUGUSTUS_CONFIG_PATH=/projects/cees/bin/augustus/augustus-3.2
>> .3/config /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus.fasta > /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotato
>> r.augustus
>> #-------------------------------#
>> deleted:0 genes
>> begin called get_best_alt_splices1
>> ...processing 0 of 2
>> ...processing 1 of 2
>> end called get_best_alt_splices1
>> ...processing 0 of 20
>> ...processing 1 of 20
>> ...processing 2 of 20
>> ...processing 3 of 20
>> ...processing 4 of 20
>> ...processing 5 of 20
>> ...processing 6 of 20
>> ...processing 7 of 20
>> ...processing 8 of 20
>> ...processing 9 of 20
>> ...processing 10 of 20
>> ...processing 11 of 20
>> ...processing 12 of 20
>> ...processing 13 of 20
>> ...processing 14 of 20
>> ...processing 15 of 20
>> ...processing 16 of 20
>> ...processing 17 of 20
>> ...processing 18 of 20
>> ...processing 19 of 20
>> substr outside of string at /projects/cees/bin/maker/maker-3.1.1/bin/../lib/PhatHit_utils.pm line 850.
>> --> rank=NA, hostname=compute-31-18.local >> ERROR: Failed while annotating transcripts >> ERROR: Chunk failed at level:1, tier_type:4 >> FAILED CONTIG:GmG20150304_scaffold_8692 >> >> ERROR: Chunk failed at level:6, tier_type:0 >> FAILED CONTIG:GmG20150304_scaffold_8692 >> >> examining contents of the fasta file and run log >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From carsonhh at gmail.com Tue Nov 21 10:19:36 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 Nov 2017 09:19:36 -0700 Subject: [maker-devel] About loss of Histone H2A, H2B, H4 In-Reply-To: References: Message-ID: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com> No known biases, but if you are concerned, you can collect known Histone H2A, H2B, H4 proteins and transcripts from other species (protein= and altest= options), them run MAKER with no masking to see if you gain any models that may have been overlooked because of over-masking of repeats. Make sure to evaluate any models you find as being a pseudogene. Run InterProScan on results to make sure they contain known InterPro domains for that gene family as well. Running without repeat masking will increase sensitivity but also false positives derived from low homology alignments to simple repeats which is why you need to evaluate results using something like InterProScan. Also run BUSCO to evaluate the completeness of the genome. Make sure that the observed contraction is not just a result of an incomplete assembly. ?Carson > On Nov 16, 2017, at 12:46 PM, Quanwei Zhang wrote: > > Hello: > > We have annotated a new rodent genome using Maker2. Based on the annotated maker2 gene sets, we did gene family expansion/contraction analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under contraction. 
I wonder whether there are known bias to predict those gene families using Maker2? For example, can this due to repeat masking of the genome? I used repeatmaker and generated species specific repeat libraries follows http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic . > > Thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Nov 21 10:22:58 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 Nov 2017 09:22:58 -0700 Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short In-Reply-To: <4157C9FE-1F5D-4320-A03F-2344C1DBD81C@gmail.com> References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com> <4157C9FE-1F5D-4320-A03F-2344C1DBD81C@gmail.com> Message-ID: <172954D4-7D27-4929-8BC1-B0292F8D9BDB@gmail.com> Just one note I want to add here. When you use GFF3 to pass in results as opposed to letting MAKER use the raw alignments, you lose the ability of MAKER to base some decisions on reading frame match since you lose both the alignment sequence and cigar string of the alignment. So MAKER just assumes correct ORF and sequence match rather than evaluating it (this will make AED scores artificially better for some models). ?Carson > On Nov 15, 2017, at 2:50 PM, Michael Campbell wrote: > > Hi Lahcen, > > I put some answers below. >> On Nov 15, 2017, at 11:32 AM, lahcen campbell > wrote: >> >> Hi Michael and Carson >> >> Thank you both for your helpful input, I really appreciate it. >> >> See below for my comments... 
>> >> Best >> Lahcen >> >> >> On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell > wrote: >> Hi Lancen, >> >> Thanks, the name has served me well for a number of years now :) >> >> Its a good name, I wouldn't change it haha :) >> >> >> So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, looks like it's not the case for this one. >> >> To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it you the nodes and with and without mpi. >> >> One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable). >> >> First off, I can report good news regards the 0 lengths contigs I was getting back. Carson, your thoughts on Bioperl conflict issues seemed to be the main issue. Out cluster software environment had gone through some changes of late, so working off the basis of that I was able to load the right bash config which resulted in no more 0 length contig errors. Huzzah !! >> Great >> >> As far as rerunning MAKER there are a couple of approaches. If you want it to stop complaining about trying to many times on failed contigs you can increase the number of tries in the opts file. The line looks like this: >> >> tries=2 #number of times to try a contig if there is a failure for some reason >> >> If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting you can use the gff3 output from an earlier run. 
If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line: >> >> Can I ask, because I can't seem to find any concrete info on best practices for parsing MAKER gffs to partition the various source column fields as you described Michael. >> >> Is there a commonly used way to partition MAKER gffs based on source column? Or will I need to code it up, I ask because I feel this must have been needed before many times by other users. >> I've got a script that will do it if you want it. Since you don't need all of the entries grep is probably as easy as anyting. grep -P '\tsource\t' >> >> rm_gff= #pre-identified repeat elements from an external GFF3 file >> >> I will remove links to fasta files for both 'rmlib=' and 'repeat_protein=' >> Yep >> >> you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line: >> >> protein_gff= #aligned protein homology evidence from an external GFF3 file >> >> Don't pass the blast gff3 data in. If you pass in gff3 data to maker is assumes that it is polished and will not make any effort to fix alignments. the protein2genome data is polished. est2genome is the equivalent for EST input. >> >> You say don't pass the blast as gff. As I pass in all other info via GFF3 and remove any evidence as fasta inputs... BLAST won't be called again right ? Ensuring the shortest possible rerun of MAKER to roll back to a uncorrupted state. >> Right. blast will not be called as long as you remove or comment out the paths to the fastas in the est= and protein= lines. 
> >> I noticed that the only unique source field types in my MAKER GFF are as follows: >> augustus_masked >> blastx >> maker >> protein2genome >> repeatmasker >> repeatrunner >> That look right for the run you described >> I read on the dev group that passing est evidence as GFF won't actually call Exonerate, est2genome option just tells MAKER to try and turn polished EST alignments directly into genes.... so If I pass this info again as GFF it will simply use the same info as it did originally and not have to recompute anything ? >> >> Based on the above fields contained in my MAKER gff, which of the following options should I select to re-annotate based on this older run ? I suspect all the options below in green should be set to 1, and the others in red set to 0. >> >> #-----Re-annotation Using MAKER Derived GFF3 >> ..... >> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no >> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no >> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no >> rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no >> model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no >> pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no >> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no >> > You don't need model_pass or pred_pass if you plan on running gene finders >> I don't think I will pass back anything under augustus_masked as I didn't set that up correctly initially, instead passing in a precomputed augustus gff which Im told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside maker itself on the 2nd pass. Note though, I am aware of the order of things normally, but for this instance I will continue with what I have done with success previously. > Yeah, when I have issues with failing contigs I'll pull stuff out until it starts running without error, then I add things back until something breaks. 
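When pulling options out and adding them back one at a time, it can help to change the control file programmatically so each attempt is reproducible. The sketch below rewrites key=value lines in a maker_opts file while keeping the trailing comments. The function name set_ctl_options and the values in the usage comment are illustrative only; which pass-through options to enable is a judgment call for the run at hand.

```python
import re


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the value of each option in `updates` replaced.

    Option lines look like 'key=value #comment'; the comment is preserved.
    Raises KeyError if an option named in `updates` is not in the file.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        match = re.match(r"^(\w+)=([^#]*?)(\s*#.*)?$", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            comment = match.group(3) or ""
            line = f"{key}={remaining.pop(key)}{comment}"
        out.append(line)
    if remaining:
        raise KeyError(f"options not found: {sorted(remaining)}")
    return "\n".join(out) + "\n"


# Hypothetical usage (the values shown are one possible choice, not a recommendation):
# with open("maker_opts.ctl") as handle:
#     text = handle.read()
# text = set_ctl_options(text, {"rm_pass": 1, "protein_pass": 1, "model_pass": 0, "pred_pass": 0})
# with open("maker_opts.ctl", "w") as handle:
#     handle.write(text)
```

Raising an error for unknown option names is deliberate, so that a misspelled key does not silently leave the file unchanged.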
> >> Lastly, as this next run will be updating based on previous generated MAKER gff data.... what states should est2genome and protein2genome be ? 1 or 0 ? > 0 those options are just for generating gene models directly from evidence when you don't have any gene finders trained. When you say updating do you mean reusing evidence from previous runs and generating new gene annotations or are you taking existing gene models and adding new evidence to see if they can be improved? >> >> Apologies for the lengthy email reply Michael. Much appreciated again, thank you !! > No Worries, hope it helps. >> >> L >> >> >> Clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory. clean_try deletes everything first, and starts again. clean_try is the one that deletes everything and pretends that the first run never happened. >> >> I ccd the list on this response just Incas anyone else has any ideas or is facing the same error. >> >> Let me know if any of this helps, >> Mike >> >>> On Nov 14, 2017, at 10:48 AM, lahcen campbell > wrote: >>> >>> Hi Michael >>> >>> Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help. >>> >>> I have attached the file in question below. Its so strange, I had to just leave it alone cause it was making me quite frustrated. Those bugs which there are now common sense solutions are the worst cause very easily you reach a wall. >>> >>> Might it have anything at all to do with the Protein homology file I passed in ? Though, note.... the same protein files here have been used in another maker run without issue so I kind of ruled that out already.....but just spitballing at this stage. >>> >>> >>> Might I be so cheeky to ask you one more MAKER related question Michael... ? 
Feel free to ignore it I hate to push but im desperate to figure it out with little time to do so... >>> >>> I have an issue with a different MAKER analysis. Currently any new run I attempt on this datastore, which has one round successful with 25000 odd genes and double the transcripts. I attempted to run the second round with a SNAP trained hmm (first time passing in SNAP hmm following first round EST/Protein evidence). In this attempt, because we obtained so many genes I thought I would be more stringent by changing the AED to 0.7 from 1.0. Something I see now I didn't approach in the right way... too late now sadly. >>> >>> MAKER finishes fine, but now it views all previous scaffolds as FAILED. Nothing seems to change this and now the datastore is for all intents and purposes locked in failed state. It keeps mentioning changes to the opts file which there were, and that the previous runs didn't finish so it must delete them. The results obtained from round 1 are still there though Im pretty sure of that, all blast files etc are still there and populated. >>> >>> Can you tell me the main differences either clean_up or clean_try have and which will completely and irreversibly wipe the first run? Something I don't want to repeat, just allow me to progress to the next round. Im hesitant to run them, but I've backed up the datastore incase. My next attempt will be to pass the exact same maker_opts file from the round1 run, with the only change made to clean_try/clean_up....Is this approach misguided ? >>> >>> Your help is very much appreciated Michael so thank you, >>> Best >>> L >>> >>> ? >>> ?Combined_Protein_homology.fa.zip ?? >>> ?SubsampledGenomeFile_n10_11MB.fasta ? >>> >>> >>> >>> On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell > wrote: >>> Hi Lahcen, >>> >>> Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try and recreate the error and try and debug it. 
>>> >>> Thanks, >>> Mike >>>> On Nov 14, 2017, at 7:15 AM, lahcen campbell > wrote: >>>> >>>> Hi MAKER community, >>>> >>>> I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3 >>>> >>>> Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So its not that this version has issues to date. >>>> >>>> Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage. >>>> >>>> I have a single genome file with, 10 Long scaffolds making up just under 11MB (subsampled from my original full length assembly) of sequence data in which to train SNAP. The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc. >>>> >>>> I have only edited the maker_opts file and changed: >>>> >>>> genome= >>>> protein= >>>> protein2genome=1 >>>> >>>> But see attached my maker CTL files. >>>> >>>> The error consistently returned to me: >>>> >>>> Skipping the contig because it is too short!! >>>> SeqID: contig_WHATEVER >>>> Length: 0 >>>> >>>> The sequences are no where near too short. This was verified independently outside maker to be sure. 
>>>> >>>> The headers are as follows: >>>> >>>> >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >>>> I have just about given up, I have no idea why its happening it makes zero sense. >>>> >>>> Any help or information as to why this might be happening would be amazing. >>>> >>>> Thank you in advance. >>>> Lahcen >>>> >>>> -- >>>> ========================================== >>>> > Dr. Lahcen Campbell < >>>> > Contact: lahcencampbell at gmail.com < >>>> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >>>> ========================================== >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >>> >>> -- >>> ========================================== >>> > Dr. 
Lahcen Campbell < >>> > Contact: lahcencampbell at gmail.com < >>> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >>> ========================================== >> >> >> >> >> -- >> ========================================== >> > Dr. Lahcen Campbell < >> > Contact: lahcencampbell at gmail.com < >> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >> ========================================== > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Tue Nov 21 11:42:38 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 21 Nov 2017 12:42:38 -0500 Subject: [maker-devel] About loss of Histone H2A, H2B, H4 In-Reply-To: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com> References: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com> Message-ID: Dear Carson: Thank you for your comments and suggestions. Now the SNAP was trained with repeat masked, is it necessary to retrain the predictor without repeat masking? By BUSCO analysis on the genome, the completeness is shown as below. Now I am doing the analysis using the default reports of Maker2 (i.e., gene models with evidence support, the default build). For the gene loss, besides you suggestions I am also considering to do the analysis using the gene models with evidence support plus those with scanned domains (i.e., standard build). How do you think? C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104 3902 Complete BUSCOs (C) 3806 Complete and single-copy BUSCOs (S) 96 Complete and duplicated BUSCOs (D) 92 Fragmented BUSCOs (F) 110 Missing BUSCOs (M) Thanks Best Quanwei 2017-11-21 11:19 GMT-05:00 Carson Holt : > No known biases, but if you are concerned, you can collect known Histone > H2A, H2B, H4 proteins and transcripts from other species (protein= and > altest= options), them run MAKER with no masking to see if you gain any > models that may have been overlooked because of over-masking of repeats. 
> Make sure to evaluate any models you find as being a pseudogene. Run > InterProScan on results to make sure they contain known InterPro domains > for that gene family as well. Running without repeat masking will increase > sensitivity but also false positives derived from low homology alignments > to simple repeats which is why you need to evaluate results using something > like InterProScan. > > Also run BUSCO to evaluate the completeness of the genome. Make sure that > the observed contraction is not just a result of an incomplete assembly. > > ?Carson > > > On Nov 16, 2017, at 12:46 PM, Quanwei Zhang wrote: > > Hello: > > We have annotated a new rodent genome using Maker2. Based on the annotated > maker2 gene sets, we did gene family expansion/contraction analysis using > CAFE. We found Histone H2A, H2B, H4 gene families are under contraction. I > wonder whether there are known bias to predict those gene families using > Maker2? For example, can this due to repeat masking of the genome? I used > repeatmaker and generated species specific repeat libraries follows > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/ > Repeat_Library_Construction--Basic. > > Thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wanghai01 at caas.cn Mon Nov 27 07:18:36 2017 From: wanghai01 at caas.cn (HAI WANG) Date: Mon, 27 Nov 2017 08:18:36 -0500 Subject: [maker-devel] Need your help on maker pipeline Message-ID: <000601d36782$3e24e0d0$ba6ea270$@cn> Dear Professor Yandell, I am Hai Wang, a visiting scholar in Cornell University. I am sorry to bother you, but I really need your help. I am now using the maker pipeline to annotate a maize genome. 
The installation of maker, openmpi and other software should be OK since I've successfully run maker on your example data.

But when I ran maker on my own maize genome, I always got the following error:

A process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host: [[21269,1],0] (PID 12537)

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 32 with PID 0 on node fat1 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Could you please help me with this issue? Or is there a way that I can resume this job when it stops? Thank you very much!

Best,
Hai Wang

From carsonhh at gmail.com Mon Nov 27 13:45:57 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 27 Nov 2017 12:45:57 -0700
Subject: [maker-devel] Need your help on maker pipeline
In-Reply-To: <000601d36782$3e24e0d0$ba6ea270$@cn>
References: <000601d36782$3e24e0d0$ba6ea270$@cn>
Message-ID:

The parameters needed to get OpenMPI to work with MAKER are described in the .../maker/INSTALL file (specifically look at LD_PRELOAD and -mca btl ^openib) -->

!!IMPORTANT!!
MAKER is not compatible with MVAPICH2. Use OpenMPI or MPICH.
If using MPICH, make sure to enable shared libraries during installation (this is not the default). If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (e.g. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so).

1. Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.
2. Give the path to 'mpicc'. Make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.
3. Give the path to the folder containing 'mpi.h'. Make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.
4. Finish installation according to steps 2-4 of the EASY INSTALL.

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.

Note: If jobs hang or freeze when using mpiexec under OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER.

Example: mpiexec -mca btl ^openib -n 20 maker

Then, to disable the fork warning, just add the parameter --mca mpi_warn_on_fork 0 to the mpiexec options as described in the warning.

How to run with OpenMPI has also been covered extensively in the MAKER list archives, and more detail can be found there:
https://groups.google.com/forum/#!searchin/maker-devel/openmpi%7Csort:date

Thanks,
Carson

> On Nov 27, 2017, at 6:18 AM, HAI WANG wrote:
>
> Dear Professor Yandell,
>
> I am Hai Wang, a visiting scholar in Cornell University. I am sorry to bother you, but I really need your help. I am now using the maker pipeline to annotate a maize genome.
> The installation of maker, openmpi and other software should be OK since I've successfully run maker on your example data.
>
> But when I ran maker on my own maize genome, I always got the following error:
>
>
> A process has executed an operation involving a call to the
> "fork()" system call to create a child process. Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your job may hang, crash, or produce silent
> data corruption. The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
>
> The process that invoked fork was:
>
> Local host: [[21269,1],0] (PID 12537)
>
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 32 with PID 0 on node fat1 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
> Could you please help me with this issue? Or is there a way that I can resume this job when it stops? Thank you very much!
>
> Best,
> Hai Wang
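
[Editor's sketch] The OpenMPI settings mentioned in the reply above (LD_PRELOAD, '-mca btl ^openib', and '--mca mpi_warn_on_fork 0') can be collected in one place before launching. The following is only a minimal illustration, not part of MAKER: the helper name build_maker_mpi_command is hypothetical, and the libmpi.so path is the example path from the INSTALL excerpt and will differ per system. It does not by itself explain the segmentation fault reported in the original message.

```python
import os


def build_maker_mpi_command(n_procs, libmpi_path, extra_maker_args=()):
    """Return (argv, env) for launching MAKER under OpenMPI's mpiexec.

    Hypothetical helper for illustration; only the flags themselves come
    from the INSTALL notes quoted in this thread.
    """
    env = dict(os.environ)
    # INSTALL asks for LD_PRELOAD to point at libmpi.so before MAKER runs.
    env["LD_PRELOAD"] = libmpi_path
    argv = [
        "mpiexec",
        "-mca", "btl", "^openib",          # suggested when jobs hang or freeze
        "--mca", "mpi_warn_on_fork", "0",  # turns off the fork() warning only
        "-n", str(n_procs),
        "maker",
    ]
    argv.extend(extra_maker_args)
    return argv, env
```

A call such as build_maker_mpi_command(20, "/usr/local/openmpi/lib/libmpi.so") returns the argument list and environment, which could then be passed to subprocess.run(argv, env=env). Building the list explicitly avoids shell quoting problems with the '^' character.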
From carsonhh at gmail.com Mon Nov 27 13:56:04 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 27 Nov 2017 12:56:04 -0700
Subject: [maker-devel] About loss of Histone H2A, H2B, H4
In-Reply-To:
References: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com>
Message-ID:

You should not have to train SNAP separately on unmasked sequence, and I do believe that adding back genes that were rejected for lack of support but that contain an identifiable domain may help. These will be in the FASTA files labeled non-overlapping in the datastore.

--Carson

> On Nov 21, 2017, at 10:42 AM, Quanwei Zhang wrote:
>
> Dear Carson:
>
> Thank you for your comments and suggestions. Now the SNAP was trained with repeat masked, is it necessary to retrain the predictor without repeat masking?
> By BUSCO analysis on the genome, the completeness is shown as below. Now I am doing the analysis using the default reports of Maker2 (i.e., gene models with evidence support, the default build). For the gene loss, besides you suggestions I am also considering to do the analysis using the gene models with evidence support plus those with scanned domains (i.e., standard build). How do you think?
>
>
> C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104
> 3902 Complete BUSCOs (C)
> 3806 Complete and single-copy BUSCOs (S)
> 96 Complete and duplicated BUSCOs (D)
> 92 Fragmented BUSCOs (F)
> 110 Missing BUSCOs (M)
>
> Thanks
> Best
> Quanwei
>
>
> 2017-11-21 11:19 GMT-05:00 Carson Holt:
> No known biases, but if you are concerned, you can collect known Histone H2A, H2B, H4 proteins and transcripts from other species (protein= and altest= options), them run MAKER with no masking to see if you gain any models that may have been overlooked because of over-masking of repeats. Make sure to evaluate any models you find as being a pseudogene. Run InterProScan on results to make sure they contain known InterPro domains for that gene family as well.
Running without repeat masking will increase sensitivity but also false positives derived from low homology alignments to simple repeats which is why you need to evaluate results using something like InterProScan. > > Also run BUSCO to evaluate the completeness of the genome. Make sure that the observed contraction is not just a result of an incomplete assembly. > > ?Carson > > >> On Nov 16, 2017, at 12:46 PM, Quanwei Zhang > wrote: >> >> Hello: >> >> We have annotated a new rodent genome using Maker2. Based on the annotated maker2 gene sets, we did gene family expansion/contraction analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under contraction. I wonder whether there are known bias to predict those gene families using Maker2? For example, can this due to repeat masking of the genome? I used repeatmaker and generated species specific repeat libraries follows http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic . >> >> Thanks >> >> Best >> Quanwei >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Tue Nov 28 07:39:52 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 28 Nov 2017 08:39:52 -0500 Subject: [maker-devel] About loss of Histone H2A, H2B, H4 In-Reply-To: References: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com> Message-ID: Dear Carson: Thank you! Best Quanwei 2017-11-27 14:56 GMT-05:00 Carson Holt : > You should not have to train separately for SNAP on unmasked sequence, and > I do believe adding back genes that were rejected because of lack of > support but contain an identifiable domain may help. These will be in the > fasta files labeled non-overlapping file in the datastore. 
> > ?Carson > > On Nov 21, 2017, at 10:42 AM, Quanwei Zhang wrote: > > Dear Carson: > > Thank you for your comments and suggestions. Now the SNAP was trained with > repeat masked, is it necessary to retrain the predictor without repeat > masking? > By BUSCO analysis on the genome, the completeness is shown as below. Now I > am doing the analysis using the default reports of Maker2 (i.e., gene > models with evidence support, the default build). For the gene loss, > besides you suggestions I am also considering to do the analysis using the > gene models with evidence support plus those with scanned domains (i.e., > standard build). How do you think? > > > C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104 > 3902 Complete BUSCOs (C) > 3806 Complete and single-copy BUSCOs (S) > 96 Complete and duplicated BUSCOs (D) > 92 Fragmented BUSCOs (F) > 110 Missing BUSCOs (M) > > Thanks > Best > Quanwei > > > 2017-11-21 11:19 GMT-05:00 Carson Holt : > >> No known biases, but if you are concerned, you can collect known Histone >> H2A, H2B, H4 proteins and transcripts from other species (protein= and >> altest= options), them run MAKER with no masking to see if you gain any >> models that may have been overlooked because of over-masking of repeats. >> Make sure to evaluate any models you find as being a pseudogene. Run >> InterProScan on results to make sure they contain known InterPro domains >> for that gene family as well. Running without repeat masking will increase >> sensitivity but also false positives derived from low homology alignments >> to simple repeats which is why you need to evaluate results using something >> like InterProScan. >> >> Also run BUSCO to evaluate the completeness of the genome. Make sure that >> the observed contraction is not just a result of an incomplete assembly. >> >> ?Carson >> >> >> On Nov 16, 2017, at 12:46 PM, Quanwei Zhang >> wrote: >> >> Hello: >> >> We have annotated a new rodent genome using Maker2. 
Based on the
>> annotated maker2 gene sets, we did gene family expansion/contraction
>> analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under
>> contraction. I wonder whether there are known bias to predict those gene
>> families using Maker2? For example, can this due to repeat masking of the
>> genome? I used repeatmaker and generated species specific repeat libraries
>> follows http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic.
>>
>> Thanks
>>
>> Best
>> Quanwei

From carsonhh at gmail.com Tue Nov 28 17:39:47 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 28 Nov 2017 16:39:47 -0700
Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions
In-Reply-To: <81D27009-2422-4116-848A-E2C862A74075@univie.ac.at>
References: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com> <81D27009-2422-4116-848A-E2C862A74075@univie.ac.at>
Message-ID: <768084A0-A5DA-4745-8151-D53AD0E495E3@gmail.com>

Your patch will essentially just turn off all MAKER hint-based gene prediction when no_abinit is turned on. We do not currently have a way to pass in external hints, but if you just want your hint-based predictions to compete against MAKER's hint-based predictions, you can provide them as pred_gff while still letting MAKER run Augustus by setting augustus_species.

--Carson

> On Nov 28, 2017, at 7:37 AM, Bob Zimmermann wrote:
>
> Dear Carson,
>
> Thanks for the response! Sorry for the slow reply.
>
> Actually what I meant was that I wanted to generate other types of hints that maker could not automatically use to prevent lower quality ab initio predictions from influencing the final output.
Therefore I wanted to make my own ab intio predicitions prior to running maker, and then have maker to generate the transcript hints and then run augustus, finally synthesizing my own ab initio predicions with the maker hint-based ones. (In other words, just run the second round of augustus, not the first one.) > > I?ve attached a patch which seemed to allow me to tell maker to do what I wanted it to do. Am I missing something? > > Best, > Bob > > ? > > Department of Molecular Evolution and Development > Universit?t Wien > Althanstra?e 14 (UZA I), Zimmer 2.019 > 1090 Vienna > Austria > > +43 1 427757002 > > > >> On 13 Oct 2017, at 17:42, Carson Holt wrote: >> >> Hi Bob, >> >> pred_gff is a way to get models MAKER cannot run into the analysis. Input to pred_gff will not get hints since MAKER is not running the program. Setting augustus_species allows MAKER to run Augustus with and without hints and then those models compete against each other. You cannot just run with hints as the raw model is also used as a filter to help reduce false positive gene models that result from bad hints. If the gff3 you are providing is the same as the MAKER run of Augustus, I would recommend not providing it. If it is different in some way, then you can leave it in. If you run under MPI (it?s ok to run MPI on a single machine), then MAKER will parallelize the Augustus run by running multiple configs and contig chunks at the same time. >> >> Thanks, >> Carson >> >> >> >> >> >>> On Oct 11, 2017, at 1:42 PM, Bob Zimmermann wrote: >>> >>> Hello, >>> >>> I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments. >>> >>> My gene prediction section of the maker_opts.ctl file looks like this: >>> ... >>> augustus_species=all_combined #Augustus gene prediction species model >>> ... 
>>> pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file
>>> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
>>> est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
>>> protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
>>> ...
>>>
>>> It seems as though even if pred_gff is set, augustus will still be run for ab initio predictions with no hints if an augustus_species setting is present. I was curious if there was any way around this, partly because custom ab initios could improve my annotation and also because the ab initio step can take long.
>>>
>>> Thanks for your help!
>>>
>>> Bob

From robert.zimmermann at univie.ac.at Tue Nov 28 08:37:40 2017
From: robert.zimmermann at univie.ac.at (Bob Zimmermann)
Date: Tue, 28 Nov 2017 15:37:40 +0100
Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions
In-Reply-To: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com>
References: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com>
Message-ID: <81D27009-2422-4116-848A-E2C862A74075@univie.ac.at>

Dear Carson,

Thanks for the response! Sorry for the slow reply.

Actually, what I meant was that I wanted to generate other types of hints that maker could not automatically use, to prevent lower-quality ab initio predictions from influencing the final output. Therefore I wanted to make my own ab initio predictions prior to running maker, and then have maker generate the transcript hints and run augustus, finally synthesizing my own ab initio predictions with the maker hint-based ones. (In other words, just run the second round of augustus, not the first one.)
I've attached a patch which seemed to allow me to tell maker to do what I wanted it to do. Am I missing something?

Best,
Bob

--
Department of Molecular Evolution and Development
Universität Wien
Althanstraße 14 (UZA I), Zimmer 2.019
1090 Vienna
Austria

+43 1 427757002

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_noabinit.patch
Type: application/octet-stream
Size: 950 bytes
Desc: not available

> On 13 Oct 2017, at 17:42, Carson Holt wrote:
>
> Hi Bob,
>
> pred_gff is a way to get models MAKER cannot run into the analysis. Input to pred_gff will not get hints since MAKER is not running the program. Setting augustus_species allows MAKER to run Augustus with and without hints and then those models compete against each other. You cannot just run with hints as the raw model is also used as a filter to help reduce false positive gene models that result from bad hints. If the gff3 you are providing is the same as the MAKER run of Augustus, I would recommend not providing it. If it is different in some way, then you can leave it in. If you run under MPI (it's ok to run MPI on a single machine), then MAKER will parallelize the Augustus run by running multiple configs and contig chunks at the same time.
>
> Thanks,
> Carson
>
>
>> On Oct 11, 2017, at 1:42 PM, Bob Zimmermann wrote:
>>
>> Hello,
>>
>> I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments.
>>
>> My gene prediction section of the maker_opts.ctl file looks like this:
>> ...
>> augustus_species=all_combined #Augustus gene prediction species model
>> ...
>> pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file
>> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
>> est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
>> protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
>> ...
>>
>> It seems as though even if pred_gff is set, augustus will still be run for ab initio predictions with no hints if an augustus_species setting is present. I was curious if there was any way around this, partly because custom ab initios could improve my annotation and also because the ab initio step can take long.
>>
>> Thanks for your help!
>>
>> Bob

From dandence at gmail.com Thu Nov 2 14:24:31 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 16:24:31 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To:
References:
Message-ID: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>

Hi, Thank you for sending me your data, but which ones are the offending genes that NCBI is complaining about? Can you identify the problem that NCBI is giving in some subset of the gene features?

~Daniel

> On Nov 2, 2017, at 4:20 PM, Emmanuel Nnadi wrote:
>
> Hi Daniel thanks for your reply.
>
> I have attached my .tbl file
>
> you would see
> <77753 >77549 gene
>     locus_tag CR513_00193
>     gene AtMg00820
>     note nonfunctional due to frameshift
>
>
> Is another example.
>
> Its becoming frustrating.
> > I have not posted the two errors before > [1] Please remove any N nucleotides from the beginning or end of the sequence. > > [2] No feature should begin or end inside a gap. Instead the feature should > be made partial at the gap boundary. > > [3] Coding regions should not be 5' partial if they begin with the start > methionine. If this is an internal methionine int he translation than > it is fine if they are partial. Conversely, all coding regions > must have a stop codon or be 3' partial. > > Nnadi Nnaemeka Emmanuel > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. > Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications > On Thu, Nov 2, 2017 at 9:08 PM, Daniel Ence > wrote: > Hi, I think you?ve posted before about issues 1 and 2 from the NCBI. The note for issue 3 from NCBI sounds like there are gene features that don?t have associated transcript, CDS or exon features. I?m not certain how that could be a result from MAKER. It might be something that someone else created (manually or with another tool), and then passed to maker from a GFF file. In the example included in your email, it looks like these offending genes are transposons that have been annotated as genes. If that is the case for the rest of the offending genes, then I would suggest changing the ?type? field (column 3) from ?gene? to something else, like ?transposable_element? perhaps. > > ~Daniel > > >> On Nov 2, 2017, at 3:51 PM, Emmanuel Nnadi > wrote: >> >> Hi, >> >> I am trying to submit my genome i annotated using maker and they sent back this error, >> 1. Please remove any N nucleotides from the beginning or end of the sequence >> 2.No feature should begin or end inside a gap. Instead the feature should >> be made partial at the gap boundary. >> >> [3] Coding regions should not be 5' partial if they begin with the start >> methionine. 
If this is an internal methionine int he translation than
>> it is fine if they are partial. Conversely, all coding regions
>> must have a stop codon or be 3' partial.
>> You have a large number of gene features that are not associated
>> with other features. Please include on these features in the
>> gene description field some description of what the gene would
>> have encoded.
>>
>> A feature table example of this is:
>>
>> <41156 >40652 gene
>>     gene_desc transposon
>>     locus_tag CR513_45338
>>     note nonfunctional due to frameshift
>> Please how can i use maker to solve this problem?
>>
>>
>> Nnadi Nnaemeka Emmanuel
>> Department of Microbiology,
>> Faculty of Natural and Applied Science,
>> Plateau State University, Bokkos, Plateau State, Nigeria.
>> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From dandence at gmail.com Thu Nov 2 14:46:03 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 16:46:03 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To:
References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>
Message-ID:

These gene features with the "nonfunctional due to frameshift" note indeed do not have other features associated with them in the tbl files. Is this reflected in the GFF3 files that MAKER produced for these annotations? I'm not certain how MAKER would make a gene without a CDS or mRNA, but identifying those discrepancies would be a place to start understanding what has happened.
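
[Editor's sketch] One way to check whether the childless gene features are already present in the GFF3 is to list every 'gene' record that no other feature names as its Parent. This is a minimal illustration that assumes a standard tab-separated, nine-column GFF3 with ID= and Parent= attributes; the function names are hypothetical and are not part of MAKER or any NCBI tool.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (no URL-unescaping)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def genes_without_children(gff3_lines):
    """Return IDs of 'gene' features that no other feature lists as Parent."""
    gene_ids = []
    parents_seen = set()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section; no features follow
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows rather than guess
        attrs = parse_attributes(cols[8])
        if cols[2] == "gene" and "ID" in attrs:
            gene_ids.append(attrs["ID"])
        # A feature may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                parents_seen.add(parent)
    return [gene_id for gene_id in gene_ids if gene_id not in parents_seen]
```

Passing an open file handle for the MAKER GFF3 to genes_without_children gives the gene IDs to compare against the locus tags flagged in the .tbl file. If the list is empty, the childless genes were most likely introduced during conversion to the feature table rather than by MAKER.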
> On Nov 2, 2017, at 4:30 PM, Emmanuel Nnadi wrote: > > Hi Daniel, > > This is the mail they sent to me > > [1] Please remove any N nucleotides from the beginning or end of the sequence. > > [2] No feature should begin or end inside a gap. Instead the feature should > be made partial at the gap boundary. > > [3] Coding regions should not be 5' partial if they begin with the start > methionine. If this is an internal methionine int he translation than > it is fine if they are partial. Conversely, all coding regions > must have a stop codon or be 3' partial. > > [4] You have a large number of gene features that are not associated > with other features. Please include on these features in the > gene description field some description of what the gene would > have encoded. > > A feature table example of this is: > > <41156 >40652 gene > gene_desc transposon > locus_tag CR513_45338 > note nonfunctional due to frameshift > > [5] Every coding region must have a corresponding mRNA and in > every case the mRNA product name must match exactly that of the > CDS feature. > > 2 coding regions do not have an mRNA > ORIG/combined_1-5000.sqn:CDS cytochrome c oxidase subunit 2 (contig_100:<38458- > 39198, 40429->40623) CR513_00692 > ORIG/combined_1-5000.sqn:CDS cytochrome c oxidase subunit 1 > (contig_100:c>113064-111485, c111245-111221) CR513_00691 > > So I just went to the .tbl file and searched for nonfunctional due to frameshift They are quite much, I have two more .tbl files > > I used GAG annotation to remove NNN and to add start and stop codon but ncbi still complained. > > > I have ran out of idea > > Please help me > > > > > > > Nnadi Nnaemeka Emmanuel > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. 
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications > On Thu, Nov 2, 2017 at 9:24 PM, Daniel Ence > wrote: > Hi, Thank you for sending me your data, but which ones are the offending genes that NCBI is complaining about? Can you identify the problem that NCBI is giving in some subset of the gene features? > > ~Daniel > > > > >> On Nov 2, 2017, at 4:20 PM, Emmanuel Nnadi > wrote: >> >> Hi Daniel thanks for your reply. >> >> I have attached my .tbl file >> >> you would see >> <77753 >77549 gene >> locus_tag CR513_00193 >> gene AtMg00820 >> note nonfunctional due to frameshift >> >> >> Is another example. >> >> Its becoming frustrating. >> >> I have not posted the two errors before >> [1] Please remove any N nucleotides from the beginning or end of the sequence. >> >> [2] No feature should begin or end inside a gap. Instead the feature should >> be made partial at the gap boundary. >> >> [3] Coding regions should not be 5' partial if they begin with the start >> methionine. If this is an internal methionine int he translation than >> it is fine if they are partial. Conversely, all coding regions >> must have a stop codon or be 3' partial. >> >> Nnadi Nnaemeka Emmanuel >> Department of Microbiology, >> Faculty of Natural and Applied Science, >> Plateau State University, Bokkos, Plateau State, Nigeria. >> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications >> On Thu, Nov 2, 2017 at 9:08 PM, Daniel Ence > wrote: >> Hi, I think you?ve posted before about issues 1 and 2 from the NCBI. The note for issue 3 from NCBI sounds like there are gene features that don?t have associated transcript, CDS or exon features. I?m not certain how that could be a result from MAKER. It might be something that someone else created (manually or with another tool), and then passed to maker from a GFF file. In the example included in your email, it looks like these offending genes are transposons that have been annotated as genes. 
If that is the case for the rest of the offending genes, then I would suggest changing the ?type? field (column 3) from ?gene? to something else, like ?transposable_element? perhaps. >> >> ~Daniel >> >> >>> On Nov 2, 2017, at 3:51 PM, Emmanuel Nnadi > wrote: >>> >>> Hi, >>> >>> I am trying to submit my genome i annotated using maker and they sent back this error, >>> 1. Please remove any N nucleotides from the beginning or end of the sequence >>> 2.No feature should begin or end inside a gap. Instead the feature should >>> be made partial at the gap boundary. >>> >>> [3] Coding regions should not be 5' partial if they begin with the start >>> methionine. If this is an internal methionine int he translation than >>> it is fine if they are partial. Conversely, all coding regions >>> must have a stop codon or be 3' partial. >>> You have a large number of gene features that are not associated >>> with other features. Please include on these features in the >>> gene description field some description of what the gene would >>> have encoded. >>> >>> A feature table example of this is: >>> >>> <41156 >40652 gene >>> gene_desc transposon >>> locus_tag CR513_45338 >>> note nonfunctional due to frameshift >>> Please how can i use maker to solve this problem? >>> >>> >>> Nnadi Nnaemeka Emmanuel >>> Department of Microbiology, >>> Faculty of Natural and Applied Science, >>> Plateau State University, Bokkos, Plateau State, Nigeria. >>> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 1356 bytes Desc: not available URL: From carsonhh at gmail.com Thu Nov 2 14:48:40 2017 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 2 Nov 2017 14:48:40 -0600 Subject: [maker-devel] Error trying to submit genome to ncbi In-Reply-To: References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com> Message-ID: <56DF0ADA-40DA-4C88-AD37-BF63D8BCFD22@gmail.com> If you modified the fasta files to remove N's etc. after they were annotated, then that would generate a mismatch between the GFF3 coordinates and the fasta sequence. Have you modified or split contigs in the assembly in any way? I seem to remember you posting an issue about the fasta submission to NCBI previously. --Carson > On Nov 2, 2017, at 2:46 PM, Daniel Ence wrote: > > These gene features with the "nonfunctional due to frameshift" note indeed do not have other features associated with them in the tbl files. Is this reflected in the gff3 files for these annotations that maker produced? I'm not certain how maker would make a gene without a CDS or mRNA, but identifying those discrepancies would be a place to start understanding what has happened. > > > >> On Nov 2, 2017, at 4:30 PM, Emmanuel Nnadi > wrote: >> >> Hi Daniel, >> >> This is the mail they sent to me >> >> [1] Please remove any N nucleotides from the beginning or end of the sequence. >> >> [2] No feature should begin or end inside a gap. Instead the feature should >> be made partial at the gap boundary. >> >> [3] Coding regions should not be 5' partial if they begin with the start >> methionine. If this is an internal methionine int he translation than >> it is fine if they are partial. Conversely, all coding regions >> must have a stop codon or be 3' partial. >> >> [4] You have a large number of gene features that are not associated >> with other features. Please include on these features in the >> gene description field some description of what the gene would >> have encoded.
>> >> A feature table example of this is: >> >> <41156 >40652 gene >> gene_desc transposon >> locus_tag CR513_45338 >> note nonfunctional due to frameshift >> >> [5] Every coding region must have a corresponding mRNA and in >> every case the mRNA product name must match exactly that of the >> CDS feature. >> >> 2 coding regions do not have an mRNA >> ORIG/combined_1-5000.sqn:CDS cytochrome c oxidase subunit 2 (contig_100:<38458- >> 39198, 40429->40623) CR513_00692 >> ORIG/combined_1-5000.sqn:CDS cytochrome c oxidase subunit 1 >> (contig_100:c>113064-111485, c111245-111221) CR513_00691 >> >> So I just went to the .tbl file and searched for nonfunctional due to frameshift They are quite much, I have two more .tbl files >> >> I used GAG annotation to remove NNN and to add start and stop codon but ncbi still complained. >> >> >> I have ran out of idea >> >> Please help me >> >> >> >> >> >> >> Nnadi Nnaemeka Emmanuel >> Department of Microbiology, >> Faculty of Natural and Applied Science, >> Plateau State University, Bokkos, Plateau State, Nigeria. >> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications >> On Thu, Nov 2, 2017 at 9:24 PM, Daniel Ence > wrote: >> Hi, Thank you for sending me your data, but which ones are the offending genes that NCBI is complaining about? Can you identify the problem that NCBI is giving in some subset of the gene features? >> >> ~Daniel >> >> >> >> >>> On Nov 2, 2017, at 4:20 PM, Emmanuel Nnadi > wrote: >>> >>> Hi Daniel thanks for your reply. >>> >>> I have attached my .tbl file >>> >>> you would see >>> <77753 >77549 gene >>> locus_tag CR513_00193 >>> gene AtMg00820 >>> note nonfunctional due to frameshift >>> >>> >>> Is another example. >>> >>> Its becoming frustrating. >>> >>> I have not posted the two errors before >>> [1] Please remove any N nucleotides from the beginning or end of the sequence. >>> >>> [2] No feature should begin or end inside a gap. 
Instead the feature should >>> be made partial at the gap boundary. >>> >>> [3] Coding regions should not be 5' partial if they begin with the start >>> methionine. If this is an internal methionine int he translation than >>> it is fine if they are partial. Conversely, all coding regions >>> must have a stop codon or be 3' partial. >>> >>> Nnadi Nnaemeka Emmanuel >>> Department of Microbiology, >>> Faculty of Natural and Applied Science, >>> Plateau State University, Bokkos, Plateau State, Nigeria. >>> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications >>> On Thu, Nov 2, 2017 at 9:08 PM, Daniel Ence > wrote: >>> Hi, I think you?ve posted before about issues 1 and 2 from the NCBI. The note for issue 3 from NCBI sounds like there are gene features that don?t have associated transcript, CDS or exon features. I?m not certain how that could be a result from MAKER. It might be something that someone else created (manually or with another tool), and then passed to maker from a GFF file. In the example included in your email, it looks like these offending genes are transposons that have been annotated as genes. If that is the case for the rest of the offending genes, then I would suggest changing the ?type? field (column 3) from ?gene? to something else, like ?transposable_element? perhaps. >>> >>> ~Daniel >>> >>> >>>> On Nov 2, 2017, at 3:51 PM, Emmanuel Nnadi > wrote: >>>> >>>> Hi, >>>> >>>> I am trying to submit my genome i annotated using maker and they sent back this error, >>>> 1. Please remove any N nucleotides from the beginning or end of the sequence >>>> 2.No feature should begin or end inside a gap. Instead the feature should >>>> be made partial at the gap boundary. >>>> >>>> [3] Coding regions should not be 5' partial if they begin with the start >>>> methionine. If this is an internal methionine int he translation than >>>> it is fine if they are partial. Conversely, all coding regions >>>> must have a stop codon or be 3' partial. 
>>>> You have a large number of gene features that are not associated >>>> with other features. Please include on these features in the >>>> gene description field some description of what the gene would >>>> have encoded. >>>> >>>> A feature table example of this is: >>>> >>>> <41156 >40652 gene >>>> gene_desc transposon >>>> locus_tag CR513_45338 >>>> note nonfunctional due to frameshift >>>> Please how can i use maker to solve this problem? >>>> >>>> >>>> Nnadi Nnaemeka Emmanuel >>>> Department of Microbiology, >>>> Faculty of Natural and Applied Science, >>>> Plateau State University, Bokkos, Plateau State, Nigeria. >>>> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >> >> > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From dandence at gmail.com Thu Nov 2 15:07:01 2017 From: dandence at gmail.com (Daniel Ence) Date: Thu, 2 Nov 2017 17:07:01 -0400 Subject: [maker-devel] Error trying to submit genome to ncbi In-Reply-To: References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com> Message-ID: Hi Emmanuel, I recommend looking into what Carson suggested. If you edited the fasta files for the "NNN" characters for the transcripts or reference genome and then resubmitted without changing the gff3 coordinates, then that would result in these kinds of errors. ~Daniel > On Nov 2, 2017, at 5:02 PM, Emmanuel Nnadi wrote: > > ?muc_functional.blast.gff -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: smime.p7s Type: application/pkcs7-signature Size: 1356 bytes Desc: not available URL: From dandence at gmail.com Thu Nov 2 15:56:24 2017 From: dandence at gmail.com (Daniel Ence) Date: Thu, 2 Nov 2017 17:56:24 -0400 Subject: [maker-devel] Error trying to submit genome to ncbi In-Reply-To: References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com> Message-ID: <20FE86D2-2431-4CD8-B4E1-E700F723760C@gmail.com> Hi Emmanuel, Please "reply all" in these exchanges so that they'll stay stored on the maker-devel list for others to find in the future. It also helps keep the conversation open so that others can chime in and help out too. :) I looked at several of the "nonfunctional due to frameshift" genes and they have associated features in the gff3 file. So there might be a frameshift issue in the original annotations, but I'd doubt that, or a frameshift error might be getting introduced when you convert to the tbl format. > On Nov 2, 2017, at 5:12 PM, Emmanuel Nnadi wrote: > > Hi Daniel > > NCBI first complained of this even when I hadn't used GAG annotation to remove N's, > > On my raw file they complained about this > > Nnadi Nnaemeka Emmanuel > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. > Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications > On Thu, Nov 2, 2017 at 10:07 PM, Daniel Ence > wrote: > Hi Emmanuel, I recommend looking into what Carson suggested. If you edited the fasta files for the "NNN" characters for the transcripts or reference genome and then resubmitted without changing the gff3 coordinates, then that would result in these kinds of errors. > > ~Daniel > > > > > > > > > >> On Nov 2, 2017, at 5:02 PM, Emmanuel Nnadi > wrote: >> >> ?muc_functional.blast.gff > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: smime.p7s Type: application/pkcs7-signature Size: 1356 bytes Desc: not available URL: From o.k.torresen at ibv.uio.no Thu Nov 9 02:44:06 2017 From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen) Date: Thu, 9 Nov 2017 09:44:06 +0000 Subject: [maker-devel] substr outside of string in PhatHits_utils.pm Message-ID: Dear all, I'm having an issue with MAKER which I'm unable to wrap my head around. Hopefully the issue is easily identifiable and resolvable for someone with more insight than me. Please find the log output attached below. I cannot find any more information than this in any logs. Many scaffolds do complete fine, but some of the longest ones have issues. Thank you. Sincerely, Ole K. Tørresen Error message: #--------- command -------------# Widget::augustus: /projects/cees/bin/augustus/augustus-3.2.3/bin/augustus --strand=backward --species=gadMor2_code_braker2 --UTR=off --hintsfile=/tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.xdef.augustus --extrinsicCfgFile=/projects/cees/bin/augustus/augustus-3.2.3/config/extrinsic/extrinsic.MPE.cfg --AUGUSTUS_CONFIG_PATH=/projects/cees/bin/augustus/augustus-3.2.3/config /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus.fasta > /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus #-------------------------------# deleted:0 genes begin called get_best_alt_splices1 ...processing 0 of 2 ...processing 1 of 2 end called get_best_alt_splices1 ...processing 0 of 20 ...processing 1 of 20 ...processing 2 of 20 ...processing 3 of 20 ...processing 4 of 20 ...processing 5 of 20 ...processing 6 of 20 ...processing 7 of 20 ...processing 8 of 20 ...processing 9 of 20 ...processing 10 of 20 ...processing 11 of 20 ...processing 12 of 20 ...processing 13 of 20 ...processing 14 of 20 ...processing 15 of 20 ...processing 16 of 20 ...processing 17 of 20 ...processing 18 of 20 ...processing
19 of 20 substr outside of string at /projects/cees/bin/maker/maker-3.1.1/bin/../lib/PhatHit_utils.pm line 850. --> rank=NA, hostname=compute-31-18.local ERROR: Failed while annotating transcripts ERROR: Chunk failed at level:1, tier_type:4 FAILED CONTIG:GmG20150304_scaffold_8692 ERROR: Chunk failed at level:6, tier_type:0 FAILED CONTIG:GmG20150304_scaffold_8692 examining contents of the fasta file and run log From lcampbell at ebi.ac.uk Thu Nov 9 04:13:35 2017 From: lcampbell at ebi.ac.uk (Lahcen Campbell) Date: Thu, 9 Nov 2017 11:13:35 +0000 Subject: [maker-devel] Model training with AED=0.7 made all contigs FAILED Message-ID: Hi folks, I would just like some insight into a recent round of MAKER annotation I performed that returned 0 finished contigs. The genome is a whitefly, which I successfully ran MAKER on initially with the first round of "Evidence in", so passing in EST evidence as aligned transcript gffs, protein homology evidence etc. The run was successful and produced a lot of good quality gene models. Statistics: 24,613 genes with 49,547 transcripts containing 141,130 CDS. Now, I know this count is very high for our species, so in the 2nd round (completed running over 1 night due to all contigs failing) I attempted to increase the threshold for support, by reducing AED to 0.7 from an initial 1. Prior to starting the second round I had trained SNAP on the first round results and also ran Augustus separately and passed these via the snaphmm and pred_gff options. Finally I set min protein to be no less than 100 aa and set est2genome and protein2genome off to allow for gene model refinement. I checked the run today and all ~8,000 contigs/scaffolds returned as FAILED, each having been retried once. My initial feeling was that I had just lost my initial set of 24,613 gene models. I now believe that this won't be the case but I'm not sure...
Can anyone explain what might have happened here and what consequences will follow given they all returned as failed? Have they been deleted from the MAKER datastore? I had captured all 1st round MAKER output files (GFF, Fasta files etc) before attempting this 2nd round (i.e. 1st round of model training) of MAKER. If I have irrevocably changed the datastore for MAKER and lost those genes, might I be able to restore to an earlier point (say back to the first round of evidence in gene models) by passing the first MAKER gff in as "maker_gff=" / "pred_pass=1" / "model_pass=1"? Any advice on this would be much appreciated Lahcen -------------- next part -------------- An HTML attachment was scrubbed... URL: From lahcencampbell at gmail.com Thu Nov 9 07:53:19 2017 From: lahcencampbell at gmail.com (lahcen campbell) Date: Thu, 9 Nov 2017 14:53:19 +0000 Subject: [maker-devel] Model training with AED=0.7 made all contigs FAILED Message-ID: Apologies, this message was sent earlier today from an incorrect email address so it was flagged for verification. Hi folks, I would just like some insight into a recent round of MAKER annotation I performed that returned 0 finished contigs. The genome is a whitefly, which I successfully ran MAKER on initially with the first round of "Evidence in", so passing in EST evidence as aligned transcript gffs, protein homology evidence etc. The run was successful and produced a lot of good quality gene models. Statistics: 24,613 genes with 49,547 transcripts containing 141,130 CDS. Now, I know this count is very high for our species, so in the 2nd round (completed running over 1 night due to all contigs failing) I attempted to increase the threshold for support, by reducing AED to 0.7 from an initial 1. Prior to starting the second round I had trained SNAP on the first round results and also ran Augustus separately and passed these via the snaphmm and pred_gff options.
Finally I set min protein to be no less than 100 aa and set est2genome and protein2genome off to allow for gene model refinement. I checked the run today and all ~8,000 contigs/scaffolds returned as FAILED, each having been retried once. (Note I tried to run again, this time reverting the AED to 1, yet the same outcome happened again). The following error appears throughout the log file: MAKER WARNING: The file MAKER.contigs_datastore/BF/41/tig00000234//theVoid.tig00000234/0/tig00000234.0.all.rb.out did not finish on the last run and must be erased My initial feeling was that I had just lost my initial set of 24,613 gene models. I now believe that this won't be the case but I'm not sure... Can anyone explain what might have happened here and what consequences will follow given they all returned as failed? Have they been deleted from the MAKER datastore? Are they retrievable? I had captured all 1st round MAKER output files (GFF, Fasta files etc) before attempting this 2nd round (i.e. 1st round of model training) of MAKER. If I have irrevocably changed the datastore for MAKER and lost those genes, might I be able to restore to an earlier point (say back to the first round of evidence in gene models) by passing the first MAKER gff in as "maker_gff=" / "pred_pass=1" / "model_pass=1"? As it stands, maker2zff and fasta_merge / gff3_merge all return nothing or empty output files. So clearly my gene models have been altered somehow. Any advice on this would be much appreciated. Lahcen -- ========================================== > Dr. Lahcen Campbell < > Contact: lahcencampbell at gmail.com < > https://www.ebi.ac.uk/about/people/lahcen-campbell < ========================================== -------------- next part -------------- An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Thu Nov 9 09:28:19 2017 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 9 Nov 2017 09:28:19 -0700 Subject: [maker-devel] substr outside of string in PhatHits_utils.pm In-Reply-To: References: Message-ID: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com> My first guess is that if you are using gff3 files as input to anything, then there may be an issue with your GFF3 file. My second suggestion is to try MAKER 3.02.02 to see if it has the same issue. --Carson > On Nov 9, 2017, at 2:44 AM, Ole Kristian Tørresen wrote: > > Dear all, > I'm having an issue with MAKER which I'm unable to wrap my head around. Hopefully the issue is easily identifiable and resolvable for someone with more insight than me. Please find the log output attached below. I cannot find any more information than this in any logs. Many scaffolds do complete fine, but some of the longest ones have issues. > > Thank you. > > Sincerely, > Ole K. Tørresen > > Error message: > > #--------- command -------------# > Widget::augustus: > /projects/cees/bin/augustus/augustus-3.2.3/bin/augustus --strand=backward --species=gadMor2_code_braker2 --UTR=off --hintsfile=/tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.xdef.augustus --extrinsicCfgFile=/projects/cees/bin/augustus/augustus-3.2.3/config/extrinsic/extrinsic.MPE.cfg --AUGUSTUS_CONFIG_PATH=/projects/cees/bin/augustus/augustus-3.2.3/config /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus.fasta > /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus > #-------------------------------# > deleted:0 genes > begin called get_best_alt_splices1 > ...processing 0 of 2 > ...processing 1 of 2 > end called get_best_alt_splices1 > ...processing 0 of 20 > ...processing 1 of 20 > ...processing 2 of 20 > ...processing 3 of 20 > ...processing 4 of 20 > ...processing 5 of 20 > ...processing 6 of 20 > ...processing 7
of 20 > ...processing 8 of 20 > ...processing 9 of 20 > ...processing 10 of 20 > ...processing 11 of 20 > ...processing 12 of 20 > ...processing 13 of 20 > ...processing 14 of 20 > ...processing 15 of 20 > ...processing 16 of 20 > ...processing 17 of 20 > ...processing 18 of 20 > ...processing 19 of 20 > substr outside of string at /projects/cees/bin/maker/maker-3.1.1/bin/../lib/PhatHit_utils.pm line 850. > --> rank=NA, hostname=compute-31-18.local > ERROR: Failed while annotating transcripts > ERROR: Chunk failed at level:1, tier_type:4 > FAILED CONTIG:GmG20150304_scaffold_8692 > > ERROR: Chunk failed at level:6, tier_type:0 > FAILED CONTIG:GmG20150304_scaffold_8692 > > examining contents of the fasta file and run log > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu Nov 9 16:30:50 2017 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 9 Nov 2017 16:30:50 -0700 Subject: [maker-devel] Model training with AED=0.7 made all contigs FAILED In-Reply-To: References: Message-ID: There is probably an issue with the GFF3 file being passed in (I'm guessing the Augustus one). I would avoid passing in Augustus results as GFF3; it removes the ability of MAKER to dynamically provide Augustus with hints as it runs. You are essentially handicapping the pipeline. If your first genes were est2genome or protein2genome based, I would not pass them back in. Those models are suitable for training but will really reduce the accuracy of downstream final annotations (that is why we tell people to turn off est2genome/protein2genome after training a gene predictor in the MAKER documentation). Also, if your inputs to the first round were GFF3 files, they will have to be reread regardless.
Any protein or transcript data that was aligned by MAKER will still have the BLAST results archived, so you don't need to worry about that unless you alter repeat masking options (which would cause it to rerun). Also, if you are changing GFF3 file input between runs but using the same directory, you might want to delete any ".db" files in the output folder. Those hold an SQLite database of the GFF3 input that may be corrupted if it failed while attempting to update the database content with the Augustus gff3 file. --Carson > On Nov 9, 2017, at 4:13 AM, Lahcen Campbell wrote: > > Hi folks, > > I would just like some insight into a recent round of MAKER annotation I performed and returned back 0 Finished contigs. > The genome is a white fly, which I successfully ran MAKER on initially with the first round of "Evidence in", so passing in EST evidence as aligned transcript gffs, protein homology evidence etc. The run was successful and produced a lot of good quality gene models > > > Statistics: > 24,613 genes with 49,547 transcripts containing 141130 cds. > > Now, I know this count is very high for our species, so in the 2nd round (completed running over 1 night due to all contigs failing) I attempted to increase the threshold for support, by reducing AED to 0.7 from an initial 1. Prior to starting the second round I had trained SNAP on the first round results and also ran Augustus separately and passed this via the snaphmm, pred_gff option. Finally I set min protein to be no less than 100Aa and set est2genome and prot2genome off to allow for gene model refinement. > > I checked the run today and all ~8,000 contigs/scaffolds returned as FAILED with all having tried to be retried once each. > > My initial feeling was, I feared I have just lost my initial set of 24,613 gene models. I now believe that this won't be the case but I'm not sure... Can anyone explain what might have happened here and what consequences will follow given they all returned as failed ?
Have they been deleted from the MAKER data store ? > > I had capturdD all 1st round MAKER output files (GFF, Fasta files etc) before attempting this 2nd round (i.e. 1st round of model training) of MAKER . > > If I have irrevocably changed the datastore for MAKER and lost those genes, might I be able to restore to an earlier point (say back to the first round of evidence in gene models) by passing the first MAKER gff in as "maker_gff=" / "pred_pass=1" / "model_pass=1" ? > > Any advice on this would be much appreciated > Lahcen > > > > > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From lahcencampbell at gmail.com Tue Nov 14 05:15:10 2017 From: lahcencampbell at gmail.com (lahcen campbell) Date: Tue, 14 Nov 2017 12:15:10 +0000 Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short Message-ID: Hi MAKER community, I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3 Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So its not that this version has issues to date. Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage. I have a single genome file with, 10 Long scaffolds making up just under 11MB (subsampled from my original full length assembly) of sequence data in which to train SNAP. 
The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc. I have only edited the maker_opts file and changed: *genome=* *protein=* *protein2genome=1* But see attached my maker CTL files. The error consistently returned to me: *Skipping the contig because it is too short!!* *SeqID: contig_WHATEVER* *Length: 0* *The sequences are no where near too short. This was verified independently outside maker to be sure. * *The headers are as follows:* >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no I have just about given up, I have no idea why its happening it makes zero sense. Any help or information as to why this might be happening would be amazing. Thank you in advance. Lahcen -- ========================================== > Dr. 
Lahcen Campbell < > Contact: lahcencampbell at gmail.com < > https://www.ebi.ac.uk/about/people/lahcen-campbell < ========================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_bopts.ctl Type: application/octet-stream Size: 1412 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_exe.ctl Type: application/octet-stream Size: 1511 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 5559 bytes Desc: not available URL: From michael.s.campbell1 at gmail.com Tue Nov 14 08:08:43 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Tue, 14 Nov 2017 10:08:43 -0500 Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short In-Reply-To: References: Message-ID: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com> Hi Lahcen, Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try and recreate the error and try and debug it. Thanks, Mike > On Nov 14, 2017, at 7:15 AM, lahcen campbell wrote: > > Hi MAKER community, > > I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3 > > Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So its not that this version has issues to date. > > Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. 
I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage. > > I have a single genome file with, 10 Long scaffolds making up just under 11MB (subsampled from my original full length assembly) of sequence data in which to train SNAP. The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc. > > I have only edited the maker_opts file and changed: > > genome= > protein= > protein2genome=1 > > But see attached my maker CTL files. > > The error consistently returned to me: > > Skipping the contig because it is too short!! > SeqID: contig_WHATEVER > Length: 0 > > The sequences are no where near too short. This was verified independently outside maker to be sure. > > The headers are as follows: > > >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no > > I have just about given up, I 
have no idea why it's happening; it makes zero sense. > > Any help or information as to why this might be happening would be amazing. > > Thank you in advance. > Lahcen > > -- > ========================================== > > Dr. Lahcen Campbell < > > Contact: lahcencampbell at gmail.com < > > https://www.ebi.ac.uk/about/people/lahcen-campbell < > ========================================== > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.campbell1 at gmail.com Tue Nov 14 10:04:04 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Tue, 14 Nov 2017 12:04:04 -0500 Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short In-Reply-To: References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com> Message-ID: Hi Lahcen, Thanks, the name has served me well for a number of years now :) So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see; it looks like that's not the case for this one. To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it to the nodes, and with and without MPI. One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable). As far as rerunning MAKER there are a couple of approaches.
If you want it to stop complaining about trying too many times on failed contigs, you can increase the number of tries in the opts file. The line looks like this:

tries=2 #number of times to try a contig if there is a failure for some reason

If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting, you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished, you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column, you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:

rm_gff= #pre-identified repeat elements from an external GFF3 file

you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line:

protein_gff= #aligned protein homology evidence from an external GFF3 file

Don't pass the blast gff3 data in. If you pass gff3 data in to MAKER, it assumes that the data is polished and will not make any effort to fix alignments. The protein2genome data is polished. est2genome is the equivalent for EST input.

clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory. clean_try is the one that deletes everything, pretends that the first run never happened, and starts again.

I cc'd the list on this response just in case anyone else has any ideas or is facing the same error.

Let me know if any of this helps,
Mike

> On Nov 14, 2017, at 10:48 AM, lahcen campbell wrote:
>
> Hi Michael
>
> Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help.
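
Mike's suggestion above of breaking the merged gff3 up by its source column could be scripted in a few lines. What follows is only a sketch, not a MAKER utility: the function names and the tiny example input are made up for illustration, and it assumes a standard nine-column, tab-separated GFF3 with an optional ##FASTA section at the end.

```python
import collections
import io


def split_gff3_by_source(lines):
    """Group GFF3 feature lines by their source column (column 2)."""
    groups = collections.defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # anything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        groups[columns[1]].append(line)
    return groups


def write_sources(groups, sources, handle):
    """Write the features of the chosen sources as one GFF3 stream."""
    handle.write("##gff-version 3\n")
    for source in sources:
        for line in groups.get(source, []):
            handle.write(line + "\n")


# Tiny made-up input standing in for a gff3_merge output file.
example = io.StringIO(
    "##gff-version 3\n"
    "tig1\trepeatmasker\tmatch\t10\t50\t.\t+\t.\tID=r1\n"
    "tig1\tblastx\tprotein_match\t100\t400\t.\t+\t.\tID=b1\n"
    "tig1\tprotein2genome\tprotein_match\t100\t400\t.\t+\t.\tID=p1\n"
    "tig1\trepeatrunner\tprotein_match\t600\t700\t.\t-\t.\tID=r2\n"
    "##FASTA\n"
    ">tig1\nACGT\n"
)
groups = split_gff3_by_source(example)

# Collect the two repeat sources into one stream, as suggested for rm_gff=.
repeats = io.StringIO()
write_sources(groups, ["repeatmasker", "repeatrunner"], repeats)
print(repeats.getvalue(), end="")
```

With real data the same two functions would read the merged file from disk and write one file for rm_gff= and another (the protein2genome group) for protein_gff=, leaving the blastx group out as advised above.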
>
> I have attached the file in question below. It's so strange, I had to just leave it alone because it was making me quite frustrated. Those bugs for which there are no common-sense solutions are the worst, because you very easily hit a wall.
>
> Might it have anything at all to do with the protein homology file I passed in? Though, note, the same protein files have been used in another MAKER run without issue, so I kind of ruled that out already... but I'm just spitballing at this stage.
>
>
> Might I be so cheeky as to ask you one more MAKER-related question, Michael? Feel free to ignore it, I hate to push, but I'm desperate to figure it out with little time to do so...
>
> I have an issue with a different MAKER analysis, which affects any new run I attempt on this datastore. The first round was successful, with 25000-odd genes and double the transcripts. I attempted to run the second round with a SNAP-trained HMM (the first time passing in a SNAP HMM following the first round of EST/protein evidence). In this attempt, because we obtained so many genes, I thought I would be more stringent by changing the AED to 0.7 from 1.0. I see now that I didn't approach this in the right way... too late now, sadly.
>
> MAKER finishes fine, but now it views all previous scaffolds as FAILED. Nothing seems to change this, and now the datastore is for all intents and purposes locked in a failed state. It keeps mentioning changes to the opts file (which there were), and that the previous runs didn't finish so it must delete them. The results obtained from round 1 are still there though, I'm pretty sure of that; all BLAST files etc. are still there and populated.
>
> Can you tell me the main differences between clean_up and clean_try, and which will completely and irreversibly wipe the first run? That is something I don't want to repeat; I just want to progress to the next round. I'm hesitant to run them, but I've backed up the datastore in case.
My next attempt will be to pass the exact same maker_opts file from the round 1 run, with the only change made to clean_try/clean_up. Is this approach misguided?
>
> Your help is very much appreciated Michael, so thank you,
> Best
> L
>
> Combined_Protein_homology.fa.zip
> SubsampledGenomeFile_n10_11MB.fasta
>
> On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell wrote:
> Hi Lahcen,
>
> Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try and recreate the error and try and debug it.
>
> Thanks,
> Mike
>
> --
> ==========================================
> > Dr.
Lahcen Campbell <
> > Contact: lahcencampbell at gmail.com <
> > https://www.ebi.ac.uk/about/people/lahcen-campbell <
> ==========================================

From carsonhh at gmail.com Tue Nov 14 10:17:03 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Nov 2017 10:17:03 -0700
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References:
Message-ID:

My first thought is that one of your entries has a header and no sequence. Try this command with the fasta you are using:

fasta_tool file.fasta --length | sort -nrk2

fasta_tool comes with MAKER. That command will report empty fasta entries at the bottom of the list with length 0.

Alternatively, MAKER accesses the input assembly using BioPerl. Update your BioPerl to the latest CPAN version (do not use BioPerl-live, as it will be less stable). Also, BioPerl uses BerkeleyDB for indexing, so if you are using a Perl that is not the system Perl (i.e. /usr/bin/perl), then it was likely compiled on the machine you are using and could have been compiled without BerkeleyDB support.

--Carson

> On Nov 14, 2017, at 5:15 AM, lahcen campbell wrote:
>
> Hi MAKER community,
>
> I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3
>
> Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So its not that this version has issues to date.
>
> Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage.
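
Carson's first check (a header with no sequence under it) can also be done without fasta_tool. The sketch below is an independent stand-in and makes no claim about how fasta_tool or BioPerl work internally; the function names and the in-memory example are invented for illustration. It only looks at the file itself, so it cannot detect an indexing problem on the Perl side.

```python
import io


def fasta_lengths(handle):
    """Return a list of (sequence_id, length) pairs from a FASTA stream."""
    lengths = []
    seq_id = None
    length = 0
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                lengths.append((seq_id, length))
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""  # ID is the first word
            length = 0
        elif seq_id is not None:
            length += len(line)
    if seq_id is not None:
        lengths.append((seq_id, length))
    return lengths


def empty_records(lengths):
    """IDs of records that have a header but no sequence."""
    return [seq_id for seq_id, length in lengths if length == 0]


# Made-up example: the second record has a header and nothing else.
demo = io.StringIO(">tig1 len=8\nACGTACGT\n>tig2 len=0\n>tig3 len=4\nAC\nGT\n")
lengths = fasta_lengths(demo)

# Longest first, so zero-length entries end up at the bottom of the list.
for seq_id, length in sorted(lengths, key=lambda item: item[1], reverse=True):
    print(seq_id, length)
print("empty:", empty_records(lengths))
```

If a check like this finds no empty records, as Lahcen's independent verification suggested, the environment (the Perl and BioPerl installation) becomes the more likely suspect.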
>
> Any help or information as to why this might be happening would be amazing.
>
> Thank you in advance.
> Lahcen

From lahcencampbell at gmail.com Wed Nov 15 09:32:02 2017
From: lahcencampbell at gmail.com (lahcen campbell)
Date: Wed, 15 Nov 2017 16:32:02 +0000
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com>
Message-ID:

Hi Michael and Carson,

Thank you both for your helpful input, I really appreciate it. See below for my comments...

Best
Lahcen

On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell <michael.s.campbell1 at gmail.com> wrote:

> Hi Lancen,
>
> Thanks, the name has served me well for a number of years now :)

It's a good name, I wouldn't change it haha :)

> So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, looks like it's not the case for this one.
>
> To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it you the nodes and with and without mpi.
>
> One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable).

First off, I can report good news regarding the 0-length contigs I was getting back. Carson, your thoughts on BioPerl pointed to the main issue. Our cluster software environment had gone through some changes of late, so working on that basis I was able to load the right bash config, which resulted in no more 0-length contig errors. Huzzah!!

> As far as rerunning MAKER there are a couple of approaches. If you want it to stop complaining about trying to many times on failed contigs you can increase the number of tries in the opts file. The line looks like this:
>
> tries=2 #number of times to try a contig if there is a failure for some reason
>
> If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:

Can I ask, because I can't seem to find any concrete info on best practices for parsing MAKER GFFs to partition the various source column fields as you described, Michael: is there a commonly used way to partition MAKER GFFs based on the source column? Or will I need to code it up? I ask because I feel this must have been needed many times before by other users.

> rm_gff= #pre-identified repeat elements from an external GFF3 file

I will remove links to fasta files for both 'rmlib=' and 'repeat_protein='

> you can turn off model_org= and repeat_protein=. This will speed up the next run a lot.
> Then you can pass in the protein2genome gff3 data on this line:
>
> protein_gff= #aligned protein homology evidence from an external GFF3 file
>
> Don't pass the blast gff3 data in. If you pass in gff3 data to maker is assumes that it is polished and will not make any effort to fix alignments. the protein2genome data is polished. est2genome is the equivalent for EST input.

You say don't pass the blast data as GFF. As I pass in all other info via GFF3 and remove any evidence given as fasta inputs, BLAST won't be called again, right? That would ensure the shortest possible rerun of MAKER to roll back to an uncorrupted state.

I noticed that the only unique source field types in my MAKER GFF are as follows:

augustus_masked
blastx
maker
protein2genome
repeatmasker
repeatrunner

I read on the dev group that passing EST evidence as GFF won't actually call Exonerate; the est2genome option just tells MAKER to try and turn polished EST alignments directly into genes. So if I pass this info again as GFF, will it simply use the same info as it did originally and not have to recompute anything?

Based on the above fields contained in my MAKER GFF, which of the following options should I select to re-annotate based on this older run? I suspect all the options below in green should be set to 1, and the others in red set to 0.

#-----Re-annotation Using MAKER Derived GFF3
.....
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

I don't think I will pass back anything under augustus_masked, as I didn't set that up correctly initially, instead passing in a precomputed Augustus GFF, which I'm told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside MAKER itself on the 2nd pass. Note that I am aware of the usual order of things, but in this instance I will continue with what I have done with success previously.

Lastly, as this next run will be updating based on previously generated MAKER GFF data, what states should est2genome and protein2genome be in? 1 or 0?

Apologies for the lengthy email reply, Michael. Much appreciated again, thank you!!

L

> Clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory. clean_try deletes everything first, and starts again. clean_try is the one that deletes everything and pretends that the first run never happened.
>
> I ccd the list on this response just Incas anyone else has any ideas or is facing the same error.
>
> Let me know if any of this helps,
> Mike
>
> On Nov 14, 2017, at 10:48 AM, lahcen campbell wrote:
>
> Hi Michael
>
> Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help.
>
> I have attached the file in question below.
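
The control-file lines quoted in this thread all share the `key=value #comment` shape, so flipping options between rounds could be scripted rather than edited by hand. This is a minimal sketch assuming only that layout; the helper name `set_ctl_options` and the file name repeats.gff are hypothetical, and the values used are just the ones proposed above, not a recommendation.

```python
import re

# key, value (everything up to an optional '#'), optional trailing comment
CTL_LINE = re.compile(r"^(\w+)=([^#]*)(#.*)?$")


def set_ctl_options(lines, updates):
    """Return the lines with the given keys set, keeping trailing comments."""
    remaining = dict(updates)
    result = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = CTL_LINE.match(line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            comment = match.group(3) or ""
            new_line = f"{key}={remaining.pop(key)}"
            if comment:
                new_line += " " + comment
            result.append(new_line)
        else:
            result.append(line)  # section headers and other keys pass through
    if remaining:
        raise KeyError(f"options not found: {sorted(remaining)}")
    return result


original = [
    "#-----Re-annotation Using MAKER Derived GFF3",
    "rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no",
    "rm_gff= #pre-identified repeat elements from an external GFF3 file",
]
for line in set_ctl_options(original, {"rm_pass": 1, "rm_gff": "repeats.gff"}):
    print(line)
```

Raising an error for a key that is not present is a deliberate choice here: a misspelled option name would otherwise be silently ignored and the run would proceed with the old setting.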
--
==========================================
> Dr. Lahcen Campbell <
> Contact: lahcencampbell at gmail.com <
> https://www.ebi.ac.uk/about/people/lahcen-campbell <
==========================================

From lahcencampbell at gmail.com Wed Nov 15 09:56:20 2017
From: lahcencampbell at gmail.com (lahcen campbell)
Date: Wed, 15 Nov 2017 16:56:20 +0000
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com>
Message-ID:

Just an add-on to this topic...

I have found a suite of GFF utilities here, which I hope can help me quickly parse the MAKER GFF.

https://github.com/mamarjan/gff3-pltools

I'll report back how it goes!

Best
L
gappedBases=no class=contig >> suggestRepeat=no suggestCircular=no >> >> I have just about given up, I have no idea why its happening it makes >> zero sense. >> >> Any help or information as to why this might be happening would be >> amazing. >> >> Thank you in advance. >> Lahcen >> >> -- >> ========================================== >> > Dr. Lahcen Campbell < >> > Contact: lahcencampbell at gmail.com < >> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >> ========================================== >> ____________ >> ___________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > > -- > ========================================== > > Dr. Lahcen Campbell < > > Contact: lahcencampbell at gmail.com < > > https://www.ebi.ac.uk/about/people/lahcen-campbell < > ========================================== > > > -- ========================================== > Dr. Lahcen Campbell < > Contact: lahcencampbell at gmail.com < > https://www.ebi.ac.uk/about/people/lahcen-campbell < ========================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Thu Nov 16 12:46:39 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Thu, 16 Nov 2017 14:46:39 -0500 Subject: [maker-devel] About loss of Histone H2A, H2B, H4 Message-ID: Hello: We have annotated a new rodent genome using Maker2. Based on the annotated maker2 gene sets, we did gene family expansion/contraction analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under contraction. I wonder whether there are known bias to predict those gene families using Maker2? For example, can this due to repeat masking of the genome? I used repeatmaker and generated species specific repeat libraries follows http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic . 
Thanks

Best
Quanwei

From mcsimenc at gmail.com Fri Nov 17 18:39:25 2017
From: mcsimenc at gmail.com (Matt Simenc)
Date: Fri, 17 Nov 2017 17:39:25 -0800
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
Message-ID:

Hi everybody,

I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?

MAKER 2.31.8

I looked at a standalone RepeatMasker run I did and the features are more evenly distributed between the +/- strands.

Matt

From carsonhh at gmail.com Fri Nov 17 19:09:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Nov 2017 19:09:20 -0700
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
In-Reply-To:
References:
Message-ID: <0DC818BC-EA36-43EA-9237-003BE07C4434@gmail.com>

While transposons that encode proteins will technically have a strand, simple repeats and many others do not, so the algorithms used to find them will not necessarily assign a strand. For this reason the repeats are treated as strandless, since both strands are masked, and they are arbitrarily assigned to the plus strand to avoid issues with genome browsers that cannot handle strandless features.

--Carson

> On Nov 17, 2017, at 6:39 PM, Matt Simenc wrote:
>
> Hi everybody,
>
> I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?
>
> MAKER 2.31.8
>
> I looked at a standalone RepeatMasker run I did and the features are more evenly distributed between the +/- strands.
>
>
> Matt

From carsonhh at gmail.com Fri Nov 17 19:23:34 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Nov 2017 19:23:34 -0700
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
In-Reply-To:
References:
Message-ID:

Also MAKER clusters overlapping repeats to generate the best masking of the assembly. For the GFF3 it then assigns the name of the repeat encompassing the greatest portion of the cluster to the feature (i.e. the best representative). But the cluster is technically built from overlapping repeats on both strands (repeats tend to jump on top of other repeats, so they stack with bits and pieces of other repeats at the edges). Yet another reason why everything is just assigned to the plus strand.
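
For anyone who wants to check the strand split in their own output, here is a minimal sketch in Python. It is not part of MAKER. It assumes a tab-delimited GFF3 in which the repeat features carry `repeatmasker` in the source column (column 2), and the function name `strand_counts` is made up for illustration.

```python
from collections import Counter


def strand_counts(gff_lines, source="repeatmasker"):
    """Count features per strand (column 7) for one GFF3 source (column 2).

    gff_lines is any iterable of text lines, e.g. an open file handle.
    The function name and the default source value are illustrative
    assumptions, not MAKER API.
    """
    counts = Counter()
    for line in gff_lines:
        # A merged GFF3 may end with an embedded FASTA section; stop there.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature line
        if cols[1] == source:
            counts[cols[6]] += 1
    return counts


# Tiny made-up example; real input would be the lines of a GFF3 file.
example = [
    "##gff-version 3\n",
    "c1\trepeatmasker\tmatch\t1\t10\t.\t+\t.\tID=a\n",
    "c1\trepeatmasker\tmatch\t5\t20\t.\t-\t.\tID=b\n",
    "c1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g\n",
]
for strand, n in sorted(strand_counts(example).items()):
    print(strand, n, sep="\t")
```

Passing an open file handle for a merged MAKER GFF3 in place of `example` gives the same tally for real data; running it again with `source` set to whatever a standalone RepeatMasker conversion writes in column 2 allows the comparison Matt describes.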
--Carson

> On Nov 17, 2017, at 6:39 PM, Matt Simenc wrote:
>
> Hi everybody,
>
> I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?
>
> MAKER 2.31.8
>
> I looked at a standalone RepeatMasker run I did and the features are more evenly distributed between the +/- strands.
>
>
> Matt

From mcsimenc at gmail.com Sat Nov 18 09:27:25 2017
From: mcsimenc at gmail.com (Matt Simenc)
Date: Sat, 18 Nov 2017 08:27:25 -0800
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
In-Reply-To:
References:
Message-ID:

Ah ok. A messy problem! I need to approximate strandedness for TE loci if possible, so I will do some post-processing using blast/hmmer against Repbase and Dfam. Thanks for the speedy response Carson!

On Fri, Nov 17, 2017 at 6:23 PM, Carson Holt wrote:

> Also MAKER clusters overlapping repeats to generate the best masking of
> the assembly. For the GFF3 it then assigns the name of the repeat
> encompassing the greatest portion of the cluster to the feature (i.e. the
> best representative). But the cluster is technically built from overlapping
> repeats on both strands (repeats tend to jump on top of other repeats, so
> they stack with bits and pieces of other repeats at the edges). Yet another
> reason why everything is just assigned to the plus strand.
>
> --Carson
>
>
> > On Nov 17, 2017, at 6:39 PM, Matt Simenc wrote:
> >
> > Hi everybody,
> >
> > I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?
> > > > MAKER 2.31.8 > > > > I looked at a standalone RepeatMasker run I did and the features are > more evenly distributed between the +/- strands. > > > > > > Matt > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.campbell1 at gmail.com Wed Nov 15 14:50:45 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Wed, 15 Nov 2017 16:50:45 -0500 Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short In-Reply-To: References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com> Message-ID: <4157C9FE-1F5D-4320-A03F-2344C1DBD81C@gmail.com> Hi Lahcen, I put some answers below. > On Nov 15, 2017, at 11:32 AM, lahcen campbell wrote: > > Hi Michael and Carson > > Thank you both for your helpful input, I really appreciate it. > > See below for my comments... > > Best > Lahcen > > > On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell > wrote: > Hi Lancen, > > Thanks, the name has served me well for a number of years now :) > > Its a good name, I wouldn't change it haha :) > > > So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, looks like it's not the case for this one. > > To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it you the nodes and with and without mpi. 
>
> One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable).
>
> First off, I can report good news regarding the 0-length contigs I was getting back. Carson, your thoughts on Bioperl conflict issues seemed to be the main issue. Our cluster software environment had gone through some changes of late, so working off the basis of that I was able to load the right bash config, which resulted in no more 0-length contig errors. Huzzah !!
> Great
>
> As far as rerunning MAKER there are a couple of approaches. If you want it to stop complaining about trying too many times on failed contigs you can increase the number of tries in the opts file. The line looks like this:
>
> tries=2 #number of times to try a contig if there is a failure for some reason
>
> If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:
>
> Can I ask, because I can't seem to find any concrete info on best practices for parsing MAKER gffs to partition the various source column fields as you described Michael.
>
> Is there a commonly used way to partition MAKER gffs based on source column? Or will I need to code it up? I ask because I feel this must have been needed before many times by other users.
> I've got a script that will do it if you want it. Since you don't need all of the entries grep is probably as easy as anything.
grep -P '\tsource\t'
>
> rm_gff= #pre-identified repeat elements from an external GFF3 file
>
> I will remove links to fasta files for both 'rmlib=' and 'repeat_protein='
> Yep
>
> you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line:
>
> protein_gff= #aligned protein homology evidence from an external GFF3 file
>
> Don't pass the blast gff3 data in. If you pass in gff3 data to maker it assumes that it is polished and will not make any effort to fix alignments. The protein2genome data is polished. est2genome is the equivalent for EST input.
>
> You say don't pass the blast as gff. As I pass in all other info via GFF3 and remove any evidence as fasta inputs... BLAST won't be called again right ? Ensuring the shortest possible rerun of MAKER to roll back to an uncorrupted state.
> Right. blast will not be called as long as you remove or comment out the paths to the fastas in the est= and protein= lines.
> I noticed that the only unique source field types in my MAKER GFF are as follows:
> augustus_masked
> blastx
> maker
> protein2genome
> repeatmasker
> repeatrunner
> That looks right for the run you described
> I read on the dev group that passing est evidence as GFF won't actually call Exonerate, est2genome option just tells MAKER to try and turn polished EST alignments directly into genes.... so if I pass this info again as GFF it will simply use the same info as it did originally and not have to recompute anything ?
>
> Based on the above fields contained in my MAKER gff, which of the following options should I select to re-annotate based on this older run ? I suspect all the options below in green should be set to 1, and the others in red set to 0.
>
> #-----Re-annotation Using MAKER Derived GFF3
> .....
> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no
> You don't need model_pass or pred_pass if you plan on running gene finders
> I don't think I will pass back anything under augustus_masked as I didn't set that up correctly initially, instead passing in a precomputed augustus gff which I'm told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside maker itself on the 2nd pass. Note though, I am aware of the order of things normally, but for this instance I will continue with what I have done with success previously.
Yeah, when I have issues with failing contigs I'll pull stuff out until it starts running without error, then I add things back until something breaks.
> Lastly, as this next run will be updating based on previously generated MAKER gff data.... what states should est2genome and protein2genome be ? 1 or 0 ?
0. Those options are just for generating gene models directly from evidence when you don't have any gene finders trained. When you say updating, do you mean reusing evidence from previous runs and generating new gene annotations, or are you taking existing gene models and adding new evidence to see if they can be improved?
>
> Apologies for the lengthy email reply Michael. Much appreciated again, thank you !!
No worries, hope it helps.
>
> L
>
>
> Clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory.
clean_try deletes everything first, and starts again. clean_try is the one that deletes everything and pretends that the first run never happened. > > I ccd the list on this response just Incas anyone else has any ideas or is facing the same error. > > Let me know if any of this helps, > Mike > >> On Nov 14, 2017, at 10:48 AM, lahcen campbell > wrote: >> >> Hi Michael >> >> Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help. >> >> I have attached the file in question below. Its so strange, I had to just leave it alone cause it was making me quite frustrated. Those bugs which there are now common sense solutions are the worst cause very easily you reach a wall. >> >> Might it have anything at all to do with the Protein homology file I passed in ? Though, note.... the same protein files here have been used in another maker run without issue so I kind of ruled that out already.....but just spitballing at this stage. >> >> >> Might I be so cheeky to ask you one more MAKER related question Michael... ? Feel free to ignore it I hate to push but im desperate to figure it out with little time to do so... >> >> I have an issue with a different MAKER analysis. Currently any new run I attempt on this datastore, which has one round successful with 25000 odd genes and double the transcripts. I attempted to run the second round with a SNAP trained hmm (first time passing in SNAP hmm following first round EST/Protein evidence). In this attempt, because we obtained so many genes I thought I would be more stringent by changing the AED to 0.7 from 1.0. Something I see now I didn't approach in the right way... too late now sadly. >> >> MAKER finishes fine, but now it views all previous scaffolds as FAILED. Nothing seems to change this and now the datastore is for all intents and purposes locked in failed state. 
It keeps mentioning changes to the opts file which there were, and that the previous runs didn't finish so it must delete them. The results obtained from round 1 are still there though Im pretty sure of that, all blast files etc are still there and populated. >> >> Can you tell me the main differences either clean_up or clean_try have and which will completely and irreversibly wipe the first run? Something I don't want to repeat, just allow me to progress to the next round. Im hesitant to run them, but I've backed up the datastore incase. My next attempt will be to pass the exact same maker_opts file from the round1 run, with the only change made to clean_try/clean_up....Is this approach misguided ? >> >> Your help is very much appreciated Michael so thank you, >> Best >> L >> >> ? >> ?Combined_Protein_homology.fa.zip ?? >> ?SubsampledGenomeFile_n10_11MB.fasta ? >> >> >> >> On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell > wrote: >> Hi Lahcen, >> >> Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try and recreate the error and try and debug it. >> >> Thanks, >> Mike >>> On Nov 14, 2017, at 7:15 AM, lahcen campbell > wrote: >>> >>> Hi MAKER community, >>> >>> I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3 >>> >>> Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So its not that this version has issues to date. >>> >>> Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage. 
>>> >>> I have a single genome file with, 10 Long scaffolds making up just under 11MB (subsampled from my original full length assembly) of sequence data in which to train SNAP. The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc. >>> >>> I have only edited the maker_opts file and changed: >>> >>> genome= >>> protein= >>> protein2genome=1 >>> >>> But see attached my maker CTL files. >>> >>> The error consistently returned to me: >>> >>> Skipping the contig because it is too short!! >>> SeqID: contig_WHATEVER >>> Length: 0 >>> >>> The sequences are no where near too short. This was verified independently outside maker to be sure. >>> >>> The headers are as follows: >>> >>> >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >>> I have just about given up, I have no idea why its happening it makes zero sense. 
>>> >>> Any help or information as to why this might be happening would be amazing. >>> >>> Thank you in advance. >>> Lahcen >>> >>> -- >>> ========================================== >>> > Dr. Lahcen Campbell < >>> > Contact: lahcencampbell at gmail.com < >>> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >>> ========================================== >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> >> -- >> ========================================== >> > Dr. Lahcen Campbell < >> > Contact: lahcencampbell at gmail.com < >> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >> ========================================== > > > > > -- > ========================================== > > Dr. Lahcen Campbell < > > Contact: lahcencampbell at gmail.com < > > https://www.ebi.ac.uk/about/people/lahcen-campbell < > ========================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott at scottcain.net Mon Nov 20 18:57:09 2017 From: scott at scottcain.net (Scott Cain) Date: Mon, 20 Nov 2017 20:57:09 -0500 Subject: [maker-devel] GMOD hackathon before PAG San Diego in January In-Reply-To: References: Message-ID: Hello, This is an update on the hackathon. It is a go; the hackathon page is up on GMOD.org: http://gmod.org/wiki/2018_PAG_Hackathon And the EventBrite page is up at https://www.eventbrite.com/e/gmod-2018-pag-hackathon-tickets-39700164260 Tickets are $50 which covers the costs associated with the room and lunch on the first day. Please feel free to add suggested topics to the wiki page, or send the suggestions to me to add. Thanks, Scott On Thursday, October 12, 2017, Scott Cain wrote: > Hi all, > > This January before PAG on the Wednesday and Thursday before PAG (January > 10-11) in San Diego we are planning a GMOD hackathon. 
We expect that
> participants will be interested in solving problems/creating solutions
> related to Tripal, JBrowse, Apollo, and Galaxy but if you're interested in
> another GMOD project, by all means, let us know! We expect this hackathon
> to overlap with the Tripal hackathon that is on January 11 (I'm pretty
> sure; right Stephen?)
>
> If you are interested in attending this hackathon, please let me know so I
> can be sure we have an appropriately sized space. And if you're coming for
> the pre-PAG hackathon, consider staying for PAG, since there is always a
> lot of GMOD-related content at the meeting!
>
> Thanks,
> Scott
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain
> dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From o.k.torresen at ibv.uio.no Tue Nov 21 06:57:46 2017
From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen)
Date: Tue, 21 Nov 2017 13:57:46 +0000
Subject: [maker-devel] substr outside of string in PhatHits_utils.pm
In-Reply-To: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com>
References: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com>
Message-ID: <182CDDD3-A108-4095-9AC4-A2C198D34107@ibv.uio.no>

Thank you Carson.

After a bit of struggling, I can confirm that the same error occurs in MAKER 3.01.2 (I guess you meant that version, couldn't find 3.02.02).

I am providing a GFF to est_gff, with match and match_part entries. For at least one of the scaffolds, the last coordinate (column 5) is the same number as the length of the scaffold. That should be allowed by the GFF3 standard, right?
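
GFF3 coordinates are 1-based and inclusive, so an end coordinate equal to the scaffold length is valid on its own. To rule out features that actually run past the end of a scaffold, a rough check like the following may help. It is only a sketch: the function names are invented here, it is not MAKER code, and it does not diagnose the substr error itself.

```python
def fasta_lengths(fasta_lines):
    """Return {sequence id: length} from an iterable of FASTA lines.

    The id is the header text up to the first whitespace, which is
    assumed to match column 1 of the GFF3.
    """
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_range_features(gff_lines, lengths):
    """List GFF3 features whose coordinates fall outside their sequence.

    Returns tuples of (line number, seqid, start, end, reason).
    Coordinates are treated as 1-based and inclusive, so end == length
    is accepted.
    """
    problems = []
    for number, line in enumerate(gff_lines, start=1):
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature line
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        if seqid not in lengths:
            problems.append((number, seqid, start, end, "unknown seqid"))
        elif start < 1 or start > end or end > lengths[seqid]:
            reason = "outside 1..%d" % lengths[seqid]
            problems.append((number, seqid, start, end, reason))
    return problems
```

With the genome FASTA and the est_gff file opened as ordinary text files, `out_of_range_features(gff_handle, fasta_lengths(fasta_handle))` returns an empty list when every feature fits inside its scaffold. A feature ending exactly at the scaffold length is not reported.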
How can I troubleshoot this? The error message is not so informative. It seems that PhatHit_utils.pm tries to find a stop codon. Snipped from that file, lines 849-850:

    #fix stop codon by walking downstream
    my $has_stop = $tM->is_ter_codon(substr($transcript_seq, $end-1-3, 3));

The GFF I am using was the output of Mikado (https://www.biorxiv.org/content/early/2017/11/09/216994), which is GFF3, and then processed a bit to make it suitable for MAKER. First converted to GTF by

    mikado util convert mikado.loci.gff3 mikado.loci.gtf

Then I selected only mRNA and exon entries, and changed mRNA to transcript to make it look like cufflinks output (and set a dummy score):

    grep -P "\tmRNA\t|\texon\t" mikado.loci.gtf |sed "s/mRNA/transcript/g" |awk -F "\t" '{$9=$9"cov \"10.0\";"; OFS="\t"; print $1, $2, $3, $4, $5, $6, $7, $8, $9}' > mikado.loci.score.gtf

Before converting with cufflinks2gff3:

    cufflinks2gff3 mikado.loci.score.gtf > ests.score.gff3

Thank you.

Ole

> On 09 Nov 2017, at 17:28, Carson Holt wrote:
>
> My first guess is that if you are using gff3 files as input to anything, then there may be an issue with your GFF3 file. My second suggestion is to try MAKER 3.02.02 to see if it has the same issue.
>
> --Carson
>
>
>> On Nov 9, 2017, at 2:44 AM, Ole Kristian Tørresen wrote:
>>
>> Dear all,
>> I'm having an issue with MAKER which I'm unable to wrap my head around. Hopefully the issue is easily identifiable and resolvable for someone with more insight than me. Please find the log output attached below. I cannot find any more information than this in any logs. Many scaffolds do complete fine, but some of the longest ones have issues.
>>
>> Thank you.
>>
>> Sincerely,
>> Ole K.
T?rresen >> >> Error message: >> >> #--------- command -------------# >> Widget::augustus: >> /projects/cees/bin/augustus/augustus-3.2.3/bin/augustus --strand=backward --species=gadMor2_code_braker2 --UTR=off --hintsfile=/tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_brak >> er2.auto_annotator.xdef.augustus --extrinsicCfgFile=/projects/cees/bin/augustus/augustus-3.2.3/config/extrinsic/extrinsic.MPE.cfg --AUGUSTUS_CONFIG_PATH=/projects/cees/bin/augustus/augustus-3.2 >> .3/config /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus.fasta > /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotato >> r.augustus >> #-------------------------------# >> deleted:0 genes >> begin called get_best_alt_splices1 >> ...processing 0 of 2 >> ...processing 1 of 2 >> end called get_best_alt_splices1 >> ...processing 0 of 20 >> ...processing 1 of 20 >> ...processing 2 of 20 >> ...processing 3 of 20 >> ...processing 4 of 20 >> ...processing 5 of 20 >> ...processing 6 of 20 >> ...processing 7 of 20 >> ...processing 8 of 20 >> ...processing 9 of 20 >> ...processing 10 of 20 >> ...processing 11 of 20 >> ...processing 12 of 20 >> ...processing 13 of 20 >> ...processing 14 of 20 >> ...processing 15 of 20 >> ...processing 16 of 20 >> ...processing 17 of 20 >> ...processing 18 of 20 >> ...processing 19 of 20 >> substr outside of string at /projects/cees/bin/maker/maker-3.1.1/bin/../lib/PhatHit_utils.pm line 850. 
>> --> rank=NA, hostname=compute-31-18.local
>> ERROR: Failed while annotating transcripts
>> ERROR: Chunk failed at level:1, tier_type:4
>> FAILED CONTIG:GmG20150304_scaffold_8692
>>
>> ERROR: Chunk failed at level:6, tier_type:0
>> FAILED CONTIG:GmG20150304_scaffold_8692
>>
>> examining contents of the fasta file and run log
>

From carsonhh at gmail.com Tue Nov 21 09:19:36 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 Nov 2017 09:19:36 -0700
Subject: [maker-devel] About loss of Histone H2A, H2B, H4
In-Reply-To:
References:
Message-ID: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com>

No known biases, but if you are concerned, you can collect known Histone H2A, H2B, H4 proteins and transcripts from other species (protein= and altest= options), then run MAKER with no masking to see if you gain any models that may have been overlooked because of over-masking of repeats. Make sure to evaluate whether any models you find are pseudogenes. Run InterProScan on results to make sure they contain known InterPro domains for that gene family as well. Running without repeat masking will increase sensitivity but also false positives derived from low homology alignments to simple repeats, which is why you need to evaluate results using something like InterProScan.

Also run BUSCO to evaluate the completeness of the genome. Make sure that the observed contraction is not just a result of an incomplete assembly.

--Carson

> On Nov 16, 2017, at 12:46 PM, Quanwei Zhang wrote:
>
> Hello:
>
> We have annotated a new rodent genome using Maker2. Based on the annotated maker2 gene sets, we did gene family expansion/contraction analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under contraction.
I wonder whether there are known bias to predict those gene families using Maker2? For example, can this due to repeat masking of the genome? I used repeatmaker and generated species specific repeat libraries follows http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic . > > Thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Nov 21 09:22:58 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 Nov 2017 09:22:58 -0700 Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short In-Reply-To: <4157C9FE-1F5D-4320-A03F-2344C1DBD81C@gmail.com> References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com> <4157C9FE-1F5D-4320-A03F-2344C1DBD81C@gmail.com> Message-ID: <172954D4-7D27-4929-8BC1-B0292F8D9BDB@gmail.com> Just one note I want to add here. When you use GFF3 to pass in results as opposed to letting MAKER use the raw alignments, you lose the ability of MAKER to base some decisions on reading frame match since you lose both the alignment sequence and cigar string of the alignment. So MAKER just assumes correct ORF and sequence match rather than evaluating it (this will make AED scores artificially better for some models). ?Carson > On Nov 15, 2017, at 2:50 PM, Michael Campbell wrote: > > Hi Lahcen, > > I put some answers below. >> On Nov 15, 2017, at 11:32 AM, lahcen campbell > wrote: >> >> Hi Michael and Carson >> >> Thank you both for your helpful input, I really appreciate it. >> >> See below for my comments... 
>> >> Best >> Lahcen >> >> >> On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell > wrote: >> Hi Lancen, >> >> Thanks, the name has served me well for a number of years now :) >> >> Its a good name, I wouldn't change it haha :) >> >> >> So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, looks like it's not the case for this one. >> >> To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it you the nodes and with and without mpi. >> >> One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable). >> >> First off, I can report good news regards the 0 lengths contigs I was getting back. Carson, your thoughts on Bioperl conflict issues seemed to be the main issue. Out cluster software environment had gone through some changes of late, so working off the basis of that I was able to load the right bash config which resulted in no more 0 length contig errors. Huzzah !! >> Great >> >> As far as rerunning MAKER there are a couple of approaches. If you want it to stop complaining about trying to many times on failed contigs you can increase the number of tries in the opts file. The line looks like this: >> >> tries=2 #number of times to try a contig if there is a failure for some reason >> >> If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting you can use the gff3 output from an earlier run. 
If you used gff3_merge after the first run finished, you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line: >> >> Can I ask, because I can't seem to find any concrete info on best practices for parsing MAKER gffs to partition the various source column fields as you described, Michael. >> >> Is there a commonly used way to partition MAKER gffs based on source column? Or will I need to code it up? I ask because I feel this must have been needed many times before by other users. >> I've got a script that will do it if you want it. Since you don't need all of the entries, grep is probably as easy as anything. grep -P '\tsource\t' (where source is the value you want from the source column, e.g. repeatmasker) >> >> rm_gff= #pre-identified repeat elements from an external GFF3 file >> >> I will remove links to fasta files for both 'rmlib=' and 'repeat_protein=' >> Yep >> >> you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line: >> >> protein_gff= #aligned protein homology evidence from an external GFF3 file >> >> Don't pass the blast gff3 data in. If you pass gff3 data in to MAKER, it assumes that the data is polished and will not make any effort to fix alignments. The protein2genome data is polished. est2genome is the equivalent for EST input. >> >> You say don't pass the blast as gff. As I pass in all other info via GFF3 and remove any evidence as fasta inputs... BLAST won't be called again, right? Ensuring the shortest possible rerun of MAKER to roll back to an uncorrupted state. >> Right. BLAST will not be called as long as you remove or comment out the paths to the fastas in the est= and protein= lines.
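For anyone who would rather script the split than grep for each source, here is a minimal Python sketch of partitioning a merged MAKER GFF3 by its source column (column 2), along the lines discussed above. The function name split_gff3_by_source and the example records are made up for illustration and are not part of MAKER; the source names are the ones reported in this thread.

```python
from collections import defaultdict


def split_gff3_by_source(lines, wanted=None):
    """Group GFF3 feature lines by their source column (column 2).

    `wanted` is an optional set of source names to keep; None keeps all.
    """
    groups = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed nine-column feature line
        source = cols[1]
        if wanted is None or source in wanted:
            groups[source].append(line)
    return dict(groups)


# Invented records, only to exercise the function.
EXAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "scaf1\trepeatmasker\tmatch\t10\t50\t.\t+\t.\tID=r1",
    "scaf1\trepeatrunner\tprotein_match\t60\t90\t.\t-\t.\tID=rr1",
    "scaf1\tblastx\tprotein_match\t100\t400\t.\t+\t.\tID=b1",
    "scaf1\tprotein2genome\tprotein_match\t100\t400\t.\t+\t.\tID=p1",
    "scaf1\tmaker\tgene\t100\t400\t.\t+\t.\tID=g1",
])

groups = split_gff3_by_source(
    EXAMPLE_GFF3.splitlines(),
    wanted={"repeatmasker", "repeatrunner", "protein2genome"},
)
for source in sorted(groups):
    print(source, len(groups[source]))
```

The repeatmasker and repeatrunner groups could then be written to one file for rm_gff=, and the protein2genome group to another for protein_gff=, leaving the blastx entries out.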
> >> I noticed that the only unique source field types in my MAKER GFF are as follows: >> augustus_masked >> blastx >> maker >> protein2genome >> repeatmasker >> repeatrunner >> That look right for the run you described >> I read on the dev group that passing est evidence as GFF won't actually call Exonerate, est2genome option just tells MAKER to try and turn polished EST alignments directly into genes.... so If I pass this info again as GFF it will simply use the same info as it did originally and not have to recompute anything ? >> >> Based on the above fields contained in my MAKER gff, which of the following options should I select to re-annotate based on this older run ? I suspect all the options below in green should be set to 1, and the others in red set to 0. >> >> #-----Re-annotation Using MAKER Derived GFF3 >> ..... >> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no >> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no >> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no >> rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no >> model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no >> pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no >> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no >> > You don't need model_pass or pred_pass if you plan on running gene finders >> I don't think I will pass back anything under augustus_masked as I didn't set that up correctly initially, instead passing in a precomputed augustus gff which Im told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside maker itself on the 2nd pass. Note though, I am aware of the order of things normally, but for this instance I will continue with what I have done with success previously. > Yeah, when I have issues with failing contigs I'll pull stuff out until it starts running without error, then I add things back until something breaks. 
> >> Lastly, as this next run will be updating based on previous generated MAKER gff data.... what states should est2genome and protein2genome be ? 1 or 0 ? > 0 those options are just for generating gene models directly from evidence when you don't have any gene finders trained. When you say updating do you mean reusing evidence from previous runs and generating new gene annotations or are you taking existing gene models and adding new evidence to see if they can be improved? >> >> Apologies for the lengthy email reply Michael. Much appreciated again, thank you !! > No Worries, hope it helps. >> >> L >> >> >> Clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory. clean_try deletes everything first, and starts again. clean_try is the one that deletes everything and pretends that the first run never happened. >> >> I ccd the list on this response just Incas anyone else has any ideas or is facing the same error. >> >> Let me know if any of this helps, >> Mike >> >>> On Nov 14, 2017, at 10:48 AM, lahcen campbell > wrote: >>> >>> Hi Michael >>> >>> Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help. >>> >>> I have attached the file in question below. Its so strange, I had to just leave it alone cause it was making me quite frustrated. Those bugs which there are now common sense solutions are the worst cause very easily you reach a wall. >>> >>> Might it have anything at all to do with the Protein homology file I passed in ? Though, note.... the same protein files here have been used in another maker run without issue so I kind of ruled that out already.....but just spitballing at this stage. >>> >>> >>> Might I be so cheeky to ask you one more MAKER related question Michael... ? 
Feel free to ignore it I hate to push but im desperate to figure it out with little time to do so... >>> >>> I have an issue with a different MAKER analysis. Currently any new run I attempt on this datastore, which has one round successful with 25000 odd genes and double the transcripts. I attempted to run the second round with a SNAP trained hmm (first time passing in SNAP hmm following first round EST/Protein evidence). In this attempt, because we obtained so many genes I thought I would be more stringent by changing the AED to 0.7 from 1.0. Something I see now I didn't approach in the right way... too late now sadly. >>> >>> MAKER finishes fine, but now it views all previous scaffolds as FAILED. Nothing seems to change this and now the datastore is for all intents and purposes locked in failed state. It keeps mentioning changes to the opts file which there were, and that the previous runs didn't finish so it must delete them. The results obtained from round 1 are still there though Im pretty sure of that, all blast files etc are still there and populated. >>> >>> Can you tell me the main differences either clean_up or clean_try have and which will completely and irreversibly wipe the first run? Something I don't want to repeat, just allow me to progress to the next round. Im hesitant to run them, but I've backed up the datastore incase. My next attempt will be to pass the exact same maker_opts file from the round1 run, with the only change made to clean_try/clean_up....Is this approach misguided ? >>> >>> Your help is very much appreciated Michael so thank you, >>> Best >>> L >>> >>> ? >>> ?Combined_Protein_homology.fa.zip ?? >>> ?SubsampledGenomeFile_n10_11MB.fasta ? >>> >>> >>> >>> On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell > wrote: >>> Hi Lahcen, >>> >>> Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try and recreate the error and try and debug it. 
>>> >>> Thanks, >>> Mike >>>> On Nov 14, 2017, at 7:15 AM, lahcen campbell > wrote: >>>> >>>> Hi MAKER community, >>>> >>>> I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3 >>>> >>>> Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So its not that this version has issues to date. >>>> >>>> Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage. >>>> >>>> I have a single genome file with, 10 Long scaffolds making up just under 11MB (subsampled from my original full length assembly) of sequence data in which to train SNAP. The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc. >>>> >>>> I have only edited the maker_opts file and changed: >>>> >>>> genome= >>>> protein= >>>> protein2genome=1 >>>> >>>> But see attached my maker CTL files. >>>> >>>> The error consistently returned to me: >>>> >>>> Skipping the contig because it is too short!! >>>> SeqID: contig_WHATEVER >>>> Length: 0 >>>> >>>> The sequences are no where near too short. This was verified independently outside maker to be sure. 
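A length check like the one mentioned above can be done outside MAKER with a few lines of Python. This is only a sketch: fasta_lengths is a hypothetical helper and the two records are invented, with the second left empty on purpose to show what a genuine zero-length entry would look like.

```python
def fasta_lengths(lines):
    """Return {record_id: sequence_length} for FASTA-formatted lines."""
    lengths = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the header text up to the first whitespace.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


# Invented example; contig_B has a header but no sequence lines.
EXAMPLE_FASTA = """\
>contig_A len=12 class=contig
ACGTAC
GTACGT
>contig_B len=0 class=contig
"""

lengths = fasta_lengths(EXAMPLE_FASTA.splitlines())
for name, length in lengths.items():
    print(name, length)
```

If every record reports a sensible length here while MAKER still reports Length: 0, the FASTA file itself is probably not the cause.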
>>>> >>>> The headers are as follows: >>>> >>>> >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >>>> I have just about given up, I have no idea why its happening it makes zero sense. >>>> >>>> Any help or information as to why this might be happening would be amazing. >>>> >>>> Thank you in advance. >>>> Lahcen >>>> >>>> -- >>>> ========================================== >>>> > Dr. Lahcen Campbell < >>>> > Contact: lahcencampbell at gmail.com < >>>> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >>>> ========================================== >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >>> >>> -- >>> ========================================== >>> > Dr. 
Lahcen Campbell < >>> > Contact: lahcencampbell at gmail.com < >>> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >>> ========================================== >> >> >> >> >> -- >> ========================================== >> > Dr. Lahcen Campbell < >> > Contact: lahcencampbell at gmail.com < >> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >> ========================================== > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Tue Nov 21 10:42:38 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 21 Nov 2017 12:42:38 -0500 Subject: [maker-devel] About loss of Histone H2A, H2B, H4 In-Reply-To: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com> References: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com> Message-ID: Dear Carson: Thank you for your comments and suggestions. Now the SNAP was trained with repeat masked, is it necessary to retrain the predictor without repeat masking? By BUSCO analysis on the genome, the completeness is shown as below. Now I am doing the analysis using the default reports of Maker2 (i.e., gene models with evidence support, the default build). For the gene loss, besides you suggestions I am also considering to do the analysis using the gene models with evidence support plus those with scanned domains (i.e., standard build). How do you think? C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104 3902 Complete BUSCOs (C) 3806 Complete and single-copy BUSCOs (S) 96 Complete and duplicated BUSCOs (D) 92 Fragmented BUSCOs (F) 110 Missing BUSCOs (M) Thanks Best Quanwei 2017-11-21 11:19 GMT-05:00 Carson Holt : > No known biases, but if you are concerned, you can collect known Histone > H2A, H2B, H4 proteins and transcripts from other species (protein= and > altest= options), them run MAKER with no masking to see if you gain any > models that may have been overlooked because of over-masking of repeats. 
> Make sure to evaluate any models you find as being a pseudogene. Run > InterProScan on results to make sure they contain known InterPro domains > for that gene family as well. Running without repeat masking will increase > sensitivity but also false positives derived from low homology alignments > to simple repeats which is why you need to evaluate results using something > like InterProScan. > > Also run BUSCO to evaluate the completeness of the genome. Make sure that > the observed contraction is not just a result of an incomplete assembly. > > ?Carson > > > On Nov 16, 2017, at 12:46 PM, Quanwei Zhang wrote: > > Hello: > > We have annotated a new rodent genome using Maker2. Based on the annotated > maker2 gene sets, we did gene family expansion/contraction analysis using > CAFE. We found Histone H2A, H2B, H4 gene families are under contraction. I > wonder whether there are known bias to predict those gene families using > Maker2? For example, can this due to repeat masking of the genome? I used > repeatmaker and generated species specific repeat libraries follows > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/ > Repeat_Library_Construction--Basic. > > Thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wanghai01 at caas.cn Mon Nov 27 06:18:36 2017 From: wanghai01 at caas.cn (HAI WANG) Date: Mon, 27 Nov 2017 08:18:36 -0500 Subject: [maker-devel] Need your help on maker pipeline Message-ID: <000601d36782$3e24e0d0$ba6ea270$@cn> Dear Professor Yandell, I am Hai Wang, a visiting scholar in Cornell University. I am sorry to bother you, but I really need your help. I am now using the maker pipeline to annotate a maize genome. 
The installation of maker, openmpi and other software should be OK since I've successfully run maker on your example data. But when I ran maker on my own maize genome, I always got the following error: A process has executed an operation involving a call to the "fork()" system call to create a child process. Open MPI is currently operating in a condition that could result in memory corruption or other system errors; your job may hang, crash, or produce silent data corruption. The use of fork() (or system() or other calls that create child processes) is strongly discouraged. The process that invoked fork was: Local host: [[21269,1],0] (PID 12537) If you are *absolutely sure* that your application will successfully and correctly survive a call to fork(), you may disable this warning by setting the mpi_warn_on_fork MCA parameter to 0. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpiexec noticed that process rank 32 with PID 0 on node fat1 exited on signal 11 (Segmentation fault). -------------------------------------------------------------------------- Could you please help me with this issue? Or is there a way that I can resume this job when it stops? Thank you very much! Best, Hai Wang From carsonhh at gmail.com Mon Nov 27 12:45:57 2017 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 27 Nov 2017 12:45:57 -0700 Subject: [maker-devel] Need your help on maker pipeline In-Reply-To: <000601d36782$3e24e0d0$ba6ea270$@cn> References: <000601d36782$3e24e0d0$ba6ea270$@cn> Message-ID: The parameters needed to get OpenMPI to work with MAKER are described in the .../maker/INSTALL file (specifically look at LD_PRELOAD and -mca btl ^openib) --> !!IMPORTANT!! MAKER is not compatible with MVAPICH2. Use OpenMPI or MPICH.
If using MPICH, make sure to enable shared libraries during installation (this is not the default). If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (e.g. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so). 1. Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL. 2. Give path to 'mpicc'. Note to make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system. 3. Give path to the folder containing 'mpi.h'. Note to make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths. 4. Finish installation according to steps 2-4 of the EASY INSTALL. Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings. Note: If jobs hang or freeze when using mpiexec under OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER. Example: mpiexec -mca btl ^openib -n 20 maker Then to disable the fork warning, just add the parameter --mca mpi_warn_on_fork 0 to the mpiexec options as described in the warning. How to run with OpenMPI has also been covered extensively in the MAKER list archives, and more detail can be found there --> https://groups.google.com/forum/#!searchin/maker-devel/openmpi%7Csort:date Thanks, Carson > On Nov 27, 2017, at 6:18 AM, HAI WANG wrote: > > Dear Professor Yandell, > > I am Hai Wang, a visiting scholar in Cornell University. I am sorry to bother you, but I really need your help. I am now using the maker pipeline to annotate a maize genome.
The installation of maker, openmpi and other software should be OK since I?ve successfully run maker on your example data. > > But when I ran maker on my own maize genome, I always got the following error: > > > A process has executed an operation involving a call to the > "fork()" system call to create a child process. Open MPI is currently > operating in a condition that could result in memory corruption or > other system errors; your job may hang, crash, or produce silent > data corruption. The use of fork() (or system() or other calls that > create child processes) is strongly discouraged. > > The process that invoked fork was: > > Local host: [[21269,1],0] (PID 12537) > > If you are *absolutely sure* that your application will successfully > and correctly survive a call to fork(), you may disable this warning > by setting the mpi_warn_on_fork MCA parameter to 0. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpiexec noticed that process rank 32 with PID 0 on node fat1 exited on signal 11 (Segmentation fault). > -------------------------------------------------------------------------- > > Could you please help me with this issue? Or is there a way that I can resume this job when it stops? Thank you very much! > > Best, > Hai Wang > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Mon Nov 27 12:56:04 2017 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 27 Nov 2017 12:56:04 -0700 Subject: [maker-devel] About loss of Histone H2A, H2B, H4 In-Reply-To: References: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com> Message-ID: You should not have to train SNAP separately on unmasked sequence, and I do believe that adding back genes that were rejected for lack of support but contain an identifiable domain may help. These will be in the FASTA files labeled non-overlapping in the datastore. --Carson > On Nov 21, 2017, at 10:42 AM, Quanwei Zhang wrote: > > Dear Carson: > > Thank you for your comments and suggestions. SNAP was trained on the repeat-masked genome; is it necessary to retrain the predictor without repeat masking? > The completeness of the genome according to BUSCO is shown below. Now I am doing the analysis using the default reports of Maker2 (i.e., gene models with evidence support, the default build). For the gene loss, besides your suggestions I am also considering doing the analysis using the gene models with evidence support plus those with scanned domains (i.e., standard build). What do you think? > > > C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104 > 3902 Complete BUSCOs (C) > 3806 Complete and single-copy BUSCOs (S) > 96 Complete and duplicated BUSCOs (D) > 92 Fragmented BUSCOs (F) > 110 Missing BUSCOs (M) > > Thanks > Best > Quanwei > > > 2017-11-21 11:19 GMT-05:00 Carson Holt: > No known biases, but if you are concerned, you can collect known Histone H2A, H2B, and H4 proteins and transcripts from other species (protein= and altest= options), then run MAKER with no masking to see if you gain any models that may have been overlooked because of over-masking of repeats. Make sure to check whether any models you find are pseudogenes. Run InterProScan on the results to make sure they contain known InterPro domains for that gene family as well.
Running without repeat masking will increase sensitivity but also false positives derived from low homology alignments to simple repeats which is why you need to evaluate results using something like InterProScan. > > Also run BUSCO to evaluate the completeness of the genome. Make sure that the observed contraction is not just a result of an incomplete assembly. > > ?Carson > > >> On Nov 16, 2017, at 12:46 PM, Quanwei Zhang > wrote: >> >> Hello: >> >> We have annotated a new rodent genome using Maker2. Based on the annotated maker2 gene sets, we did gene family expansion/contraction analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under contraction. I wonder whether there are known bias to predict those gene families using Maker2? For example, can this due to repeat masking of the genome? I used repeatmaker and generated species specific repeat libraries follows http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic . >> >> Thanks >> >> Best >> Quanwei >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Tue Nov 28 06:39:52 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 28 Nov 2017 08:39:52 -0500 Subject: [maker-devel] About loss of Histone H2A, H2B, H4 In-Reply-To: References: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com> Message-ID: Dear Carson: Thank you! Best Quanwei 2017-11-27 14:56 GMT-05:00 Carson Holt : > You should not have to train separately for SNAP on unmasked sequence, and > I do believe adding back genes that were rejected because of lack of > support but contain an identifiable domain may help. These will be in the > fasta files labeled non-overlapping file in the datastore. 
> --Carson
>
> On Nov 21, 2017, at 10:42 AM, Quanwei Zhang wrote:
>
> Dear Carson:
>
> Thank you for your comments and suggestions. SNAP was trained on the
> repeat-masked genome; is it necessary to retrain the predictor without
> repeat masking?
> The BUSCO completeness of the genome is shown below. I am currently
> doing the analysis using the default output of Maker2 (i.e., gene
> models with evidence support, the default build). For the gene loss,
> besides your suggestions I am also considering doing the analysis using the
> gene models with evidence support plus those with scanned domains (i.e.,
> standard build). What do you think?
>
>
> C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104
> 3902 Complete BUSCOs (C)
> 3806 Complete and single-copy BUSCOs (S)
> 96 Complete and duplicated BUSCOs (D)
> 92 Fragmented BUSCOs (F)
> 110 Missing BUSCOs (M)
>
> Thanks
> Best
> Quanwei
>
>
> 2017-11-21 11:19 GMT-05:00 Carson Holt :
>
>> No known biases, but if you are concerned, you can collect known Histone
>> H2A, H2B, H4 proteins and transcripts from other species (protein= and
>> altest= options), then run MAKER with no masking to see if you gain any
>> models that may have been overlooked because of over-masking of repeats.
>> Make sure to evaluate whether any model you find is a pseudogene. Run
>> InterProScan on results to make sure they contain known InterPro domains
>> for that gene family as well. Running without repeat masking will increase
>> sensitivity but also false positives derived from low homology alignments
>> to simple repeats, which is why you need to evaluate results using something
>> like InterProScan.
>>
>> Also run BUSCO to evaluate the completeness of the genome. Make sure that
>> the observed contraction is not just a result of an incomplete assembly.
>>
>> --Carson
>>
>>
>> On Nov 16, 2017, at 12:46 PM, Quanwei Zhang wrote:
>>
>> Hello:
>>
>> We have annotated a new rodent genome using Maker2. Based on the
>> annotated maker2 gene sets, we did gene family expansion/contraction
>> analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under
>> contraction. I wonder whether there are known biases in predicting those gene
>> families using Maker2? For example, can this be due to repeat masking of the
>> genome? I used repeatmaker and generated species specific repeat libraries
>> following http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic.
>>
>> Thanks
>>
>> Best
>> Quanwei

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue Nov 28 16:39:47 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 28 Nov 2017 16:39:47 -0700
Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions
In-Reply-To: <81D27009-2422-4116-848A-E2C862A74075@univie.ac.at>
References: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com> <81D27009-2422-4116-848A-E2C862A74075@univie.ac.at>
Message-ID: <768084A0-A5DA-4745-8151-D53AD0E495E3@gmail.com>

Your patch will essentially just turn off all maker hint based gene prediction when no_abinit is turned on. We do not currently have a way to pass in external hints, but if you just want your hint based predictions to compete against MAKER hint based prediction, you can provide it as pred_gff while still letting MAKER run by giving the augustus_species file.

--Carson

From robert.zimmermann at univie.ac.at Tue Nov 28 07:37:40 2017
From: robert.zimmermann at univie.ac.at (Bob Zimmermann)
Date: Tue, 28 Nov 2017 15:37:40 +0100
Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions
In-Reply-To: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com>
References: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com>
Message-ID: <81D27009-2422-4116-848A-E2C862A74075@univie.ac.at>

Dear Carson,

Thanks for the response! Sorry for the slow reply.

Actually what I meant was that I wanted to generate other types of hints that maker could not automatically use to prevent lower quality ab initio predictions from influencing the final output. Therefore I wanted to make my own ab initio predictions prior to running maker, and then have maker generate the transcript hints and then run augustus, finally synthesizing my own ab initio predictions with the maker hint-based ones. (In other words, just run the second round of augustus, not the first one.)
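
A minimal sketch of how the settings discussed in this thread could be checked before a run, assuming the control file follows the `key=value #comment` layout of the maker_opts.ctl excerpt quoted in this thread. The helper names `parse_ctl` and `prediction_sources` are hypothetical and not part of MAKER; the script only reads text and reports whether `pred_gff` and `augustus_species` are both set, which is the combination suggested in this thread for letting external predictions compete with MAKER's own Augustus runs.

```python
def parse_ctl(text):
    """Parse MAKER-style control text into a dict of option -> value.

    Assumes one 'key=value #comment' entry per line (hypothetical helper).
    """
    options = {}
    for line in text.splitlines():
        # Everything after the first '#' is treated as a comment.
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def prediction_sources(options):
    """Report which of the two prediction inputs have a non-empty value."""
    return {key: bool(options.get(key)) for key in ("pred_gff", "augustus_species")}


# Excerpt copied from the control file quoted in this thread.
EXAMPLE = """\
augustus_species=all_combined #Augustus gene prediction species model
pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
"""

if __name__ == "__main__":
    print(prediction_sources(parse_ctl(EXAMPLE)))
```

With both entries set, MAKER is described in this thread as running Augustus itself while the models from `pred_gff` compete against its output; the sketch does not run MAKER or verify that behaviour.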
I've attached a patch which seemed to allow me to tell maker to do what I wanted it to do. Am I missing something?

Best,
Bob

--

Department of Molecular Evolution and Development
Universität Wien
Althanstraße 14 (UZA I), Zimmer 2.019
1090 Vienna
Austria

+43 1 427757002

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_noabinit.patch
Type: application/octet-stream
Size: 950 bytes
Desc: not available
URL:
-------------- next part --------------

> On 13 Oct 2017, at 17:42, Carson Holt wrote:
>
> Hi Bob,
>
> pred_gff is a way to get models MAKER cannot run into the analysis. Input to pred_gff will not get hints since MAKER is not running the program. Setting augustus_species allows MAKER to run Augustus with and without hints and then those models compete against each other. You cannot just run with hints as the raw model is also used as a filter to help reduce false positive gene models that result from bad hints. If the gff3 you are providing is the same as the MAKER run of Augustus, I would recommend not providing it. If it is different in some way, then you can leave it in. If you run under MPI (it's ok to run MPI on a single machine), then MAKER will parallelize the Augustus run by running multiple configs and contig chunks at the same time.
>
> Thanks,
> Carson
>
>
>> On Oct 11, 2017, at 1:42 PM, Bob Zimmermann wrote:
>>
>> Hello,
>>
>> I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments.
>>
>> My gene prediction section of the maker_opts.ctl file looks like this:
>> ...
>> augustus_species=all_combined #Augustus gene prediction species model
>> ...
>> pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file
>> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
>> est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
>> protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
>> ...
>>
>> It seems as though even if pred_gff is set, augustus will still be run for ab initio predictions with no hints if an augustus_species setting is present. I was curious if there was any way around this, partly because custom ab initios could improve my annotation and also because the ab initio step can take a long time.
>>
>> Thanks for your help!
>>
>> Bob

From aircoolsky at gmail.com Thu Nov 30 23:20:20 2017
From: aircoolsky at gmail.com (Yu-Hsuan Cheng)
Date: Fri, 1 Dec 2017 14:20:20 +0800
Subject: [maker-devel] Changing the genetic code table in MAKER
Message-ID:

Hi,

This is YuHsuan Cheng, a PhD student from Taiwan. I want to use MAKER combined with SNAP to annotate a ciliate genome. The genetic code for ciliates is different from that of other species, so I am wondering if there is any option in MAKER to change the genetic code table. I also asked Dr. Korf about this issue; he said SNAP has no way to change the genetic code table. I will use Augustus combined with MAKER later on.

The pipeline I used previously is as follows.

1. MAKER (Hints from proteome and RNAseq)
2. MAKER to Zff
3. ~/bin/maker/exe/snap/hmm-assembler.pl snapFirst . > ../../snapFirst.hmm
   and then used snapFirst.hmm as hints in MAKER

Look forward to your reply. Thank you.

Best wishes,
YuHsuan

Yu-Hsuan Cheng
Institute of Molecular Biology
Academia Sinica
128 Academia road, Section 2
Nankang, Taipei 115
Taiwan
Phone: +886-2-2789-9216 (Lab), +886-958-216-538 (Mobile phone)
d02b48008 at ntu.edu.tw
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eennadi at gmail.com Thu Nov 2 13:51:00 2017
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Thu, 2 Nov 2017 20:51:00 +0100
Subject: [maker-devel] Error trying to submit genome to ncbi
Message-ID:

Hi,

I am trying to submit the genome I annotated using maker and they sent back this error,
1. Please remove any N nucleotides from the beginning or end of the sequence
2. No feature should begin or end inside a gap. Instead the feature should
be made partial at the gap boundary.

[3] Coding regions should not be 5' partial if they begin with the start
methionine. If this is an internal methionine in the translation then
it is fine if they are partial. Conversely, all coding regions
must have a stop codon or be 3' partial.
You have a large number of gene features that are not associated
with other features. Please include on these features in the
gene description field some description of what the gene would
have encoded.

A feature table example of this is:

<41156 >40652 gene
    gene_desc transposon
    locus_tag CR513_45338
    note nonfunctional due to frameshift

Please, how can I use maker to solve this problem?

Nnadi Nnaemeka Emmanuel
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dandence at gmail.com Thu Nov 2 14:08:54 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 16:08:54 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To:
References:
Message-ID:

Hi, I think you've posted before about issues 1 and 2 from the NCBI. The note for issue 3 from NCBI sounds like there are gene features that don't have associated transcript, CDS or exon features. I'm not certain how that could be a result from MAKER. It might be something that someone else created (manually or with another tool), and then passed to maker from a GFF file. In the example included in your email, it looks like these offending genes are transposons that have been annotated as genes. If that is the case for the rest of the offending genes, then I would suggest changing the "type" field (column 3) from "gene" to something else, like "transposable_element" perhaps.

~Daniel

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1356 bytes
Desc: not available
URL:

From dandence at gmail.com Thu Nov 2 14:24:31 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 16:24:31 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To:
References:
Message-ID: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>

Hi, Thank you for sending me your data, but which ones are the offending genes that NCBI is complaining about? Can you identify the problem that NCBI is giving in some subset of the gene features?

~Daniel

> On Nov 2, 2017, at 4:20 PM, Emmanuel Nnadi wrote:
>
> Hi Daniel thanks for your reply.
>
> I have attached my .tbl file
>
> you would see
> <77753 >77549 gene
>     locus_tag CR513_00193
>     gene AtMg00820
>     note nonfunctional due to frameshift
>
> Is another example.
>
> It's becoming frustrating.
>
> I have not posted the two errors before
> [1] Please remove any N nucleotides from the beginning or end of the sequence.
>
> [2] No feature should begin or end inside a gap. Instead the feature should
> be made partial at the gap boundary.
>
> [3] Coding regions should not be 5' partial if they begin with the start
> methionine. If this is an internal methionine in the translation then
> it is fine if they are partial. Conversely, all coding regions
> must have a stop codon or be 3' partial.
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1356 bytes
Desc: not available
URL:

From dandence at gmail.com Thu Nov 2 14:46:03 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 16:46:03 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To:
References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>
Message-ID:

These gene features with the "nonfunctional due to frameshift" note indeed do not have other features associated with them in the tbl files. Is this reflected in the gff3 files for these annotations that maker produced? I'm not certain how maker would make a gene without a CDS or mRNA, but identifying those discrepancies would be a place to understand what has happened.

> On Nov 2, 2017, at 4:30 PM, Emmanuel Nnadi wrote:
>
> Hi Daniel,
>
> This is the mail they sent to me
>
> [1] Please remove any N nucleotides from the beginning or end of the sequence.
>
> [2] No feature should begin or end inside a gap. Instead the feature should
> be made partial at the gap boundary.
>
> [3] Coding regions should not be 5' partial if they begin with the start
> methionine.
> If this is an internal methionine in the translation then
> it is fine if they are partial. Conversely, all coding regions
> must have a stop codon or be 3' partial.
>
> [4] You have a large number of gene features that are not associated
> with other features. Please include on these features in the
> gene description field some description of what the gene would
> have encoded.
>
> A feature table example of this is:
>
> <41156 >40652 gene
>     gene_desc transposon
>     locus_tag CR513_45338
>     note nonfunctional due to frameshift
>
> [5] Every coding region must have a corresponding mRNA and in
> every case the mRNA product name must match exactly that of the
> CDS feature.
>
> 2 coding regions do not have an mRNA
> ORIG/combined_1-5000.sqn:CDS cytochrome c oxidase subunit 2 (contig_100:<38458-39198, 40429->40623) CR513_00692
> ORIG/combined_1-5000.sqn:CDS cytochrome c oxidase subunit 1 (contig_100:c>113064-111485, c111245-111221) CR513_00691
>
> So I just went to the .tbl file and searched for "nonfunctional due to frameshift". There are quite a lot of them, and I have two more .tbl files.
>
> I used GAG annotation to remove NNN and to add start and stop codons but ncbi still complained.
>
> I have run out of ideas
>
> Please help me
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1356 bytes
Desc: not available
URL:

From carsonhh at gmail.com Thu Nov 2 14:48:40 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 2 Nov 2017 14:48:40 -0600
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To:
References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>
Message-ID: <56DF0ADA-40DA-4C88-AD37-BF63D8BCFD22@gmail.com>

If you modified the fasta files to remove N's etc. after they were annotated, then that would generate a mismatch between the GFF3 coordinates and the fasta sequence. Have you modified or split contigs in the assembly in any way? I seem to remember you posting an issue about the fasta submission to NCBI previously.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dandence at gmail.com Thu Nov 2 15:07:01 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 17:07:01 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To:
References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>
Message-ID:

Hi Emmanuel, I recommend looking into what Carson suggested. If you edited the fasta files for the "NNN" characters for the transcripts or reference genome and then resubmitted without changing the gff3 coordinates, then that would result in these kinds of errors.

~Daniel

> On Nov 2, 2017, at 5:02 PM, Emmanuel Nnadi wrote:
>
> muc_functional.blast.gff

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1356 bytes
Desc: not available
URL:

From dandence at gmail.com Thu Nov 2 15:56:24 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 17:56:24 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To:
References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>
Message-ID: <20FE86D2-2431-4CD8-B4E1-E700F723760C@gmail.com>

Hi Emmanuel, Please "reply all" in these exchanges so that they'll stay stored on the maker-devel list for others to find in the future. It also helps keep the conversation open so that others can chime in and help out too. :)

I looked at several of the "nonfunctional due to frameshift" genes and they have associated features in the gff3 file. So there might be a frameshift issue in the original annotations, but I'd doubt that, or a frameshift error might be getting introduced when you convert to the tbl format.

> On Nov 2, 2017, at 5:12 PM, Emmanuel Nnadi wrote:
>
> Hi Daniel
>
> NCBI first complained of this even when I hadn't used GAG annotation to remove N's,
>
> On my raw file they complained about this
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1356 bytes
Desc: not available
URL:

From o.k.torresen at ibv.uio.no Thu Nov 9 02:44:06 2017
From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen)
Date: Thu, 9 Nov 2017 09:44:06 +0000
Subject: [maker-devel] substr outside of string in PhatHits_utils.pm
Message-ID:

Dear all,

I'm having an issue with MAKER which I'm unable to wrap my head around. Hopefully the issue is easily identifiable and resolvable for someone with more insight than me. Please find the log output below. I cannot find any more information than this in any logs. Many scaffolds do complete fine, but some of the longest ones have issues.

Thank you.

Sincerely,
Ole K. Tørresen

Error message:

#--------- command -------------#
Widget::augustus:
/projects/cees/bin/augustus/augustus-3.2.3/bin/augustus --strand=backward --species=gadMor2_code_braker2 --UTR=off --hintsfile=/tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.xdef.augustus --extrinsicCfgFile=/projects/cees/bin/augustus/augustus-3.2.3/config/extrinsic/extrinsic.MPE.cfg --AUGUSTUS_CONFIG_PATH=/projects/cees/bin/augustus/augustus-3.2.3/config /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus.fasta > /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus
#-------------------------------#
deleted:0 genes
begin called get_best_alt_splices1
...processing 0 of 2
...processing 1 of 2
end called get_best_alt_splices1
...processing 0 of 20
...processing 1 of 20
...processing 2 of 20
...processing 3 of 20
...processing 4 of 20
...processing 5 of 20
...processing 6 of 20
...processing 7 of 20
...processing 8 of 20
...processing 9 of 20
...processing 10 of 20
...processing 11 of 20
...processing 12 of 20
...processing 13 of 20
...processing 14 of 20
...processing 15 of 20
...processing 16 of 20
...processing 17 of 20
...processing 18 of 20
...processing 19 of 20
substr outside of string at /projects/cees/bin/maker/maker-3.1.1/bin/../lib/PhatHit_utils.pm line 850.
--> rank=NA, hostname=compute-31-18.local
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:GmG20150304_scaffold_8692

ERROR: Chunk failed at level:6, tier_type:0
FAILED CONTIG:GmG20150304_scaffold_8692

examining contents of the fasta file and run log

From lcampbell at ebi.ac.uk Thu Nov 9 04:13:35 2017
From: lcampbell at ebi.ac.uk (Lahcen Campbell)
Date: Thu, 9 Nov 2017 11:13:35 +0000
Subject: [maker-devel] Model training with AED=0.7 made all contigs FAILED
Message-ID:

Hi folks,

I would like some insight into a recent round of MAKER annotation that returned 0 finished contigs.

The genome is a whitefly, on which I successfully ran MAKER initially with a first round of "evidence in": EST evidence passed in as aligned transcript GFFs, protein homology evidence, etc. The run was successful and produced a lot of good-quality gene models.

Statistics:
24,613 genes with 49,547 transcripts containing 141130 cds.

Now, I know this count is very high for our species, so in the 2nd round (which completed overnight because all contigs failed) I attempted to raise the threshold for support by reducing AED to 0.7 from an initial 1. Prior to starting the second round I had trained SNAP on the first-round results and also ran Augustus separately, and passed these in via the snaphmm and pred_gff options. Finally, I set the minimum protein length to no less than 100 aa and turned est2genome and prot2genome off to allow for gene model refinement.

I checked the run today and all ~8,000 contigs/scaffolds returned as FAILED, each having been retried once.

My initial fear was that I had just lost my initial set of 24,613 gene models. I now believe that this won't be the case, but I'm not sure...
Can anyone explain what might have happened here and what consequences will follow given that they all returned as failed? Have they been deleted from the MAKER datastore?

I had captured all 1st-round MAKER output files (GFF, FASTA files, etc.) before attempting this 2nd round (i.e. the 1st round of model training) of MAKER.

If I have irrevocably changed the MAKER datastore and lost those genes, might I be able to restore to an earlier point (say, back to the first round of evidence-in gene models) by passing the first MAKER GFF in as "maker_gff=" / "pred_pass=1" / "model_pass=1"?

Any advice on this would be much appreciated.
Lahcen

From lahcencampbell at gmail.com Thu Nov 9 07:53:19 2017
From: lahcencampbell at gmail.com (lahcen campbell)
Date: Thu, 9 Nov 2017 14:53:19 +0000
Subject: [maker-devel] Model training with AED=0.7 made all contigs FAILED
Message-ID:

Apologies, this message was sent earlier today from an incorrect email address, so it was flagged for verification.

Hi folks,

I would like some insight into a recent round of MAKER annotation that returned 0 finished contigs.

The genome is a whitefly, on which I successfully ran MAKER initially with a first round of "evidence in": EST evidence passed in as aligned transcript GFFs, protein homology evidence, etc. The run was successful and produced a lot of good-quality gene models.

Statistics:
24,613 genes with 49,547 transcripts containing 141130 cds.

Now, I know this count is very high for our species, so in the 2nd round (which completed overnight because all contigs failed) I attempted to raise the threshold for support by reducing AED to 0.7 from an initial 1. Prior to starting the second round I had trained SNAP on the first-round results and also ran Augustus separately, and passed these in via the snaphmm and pred_gff options. Finally, I set the minimum protein length to no less than 100 aa and turned est2genome and prot2genome off to allow for gene model refinement.

I checked the run today and all ~8,000 contigs/scaffolds returned as FAILED, each having been retried once. (Note: I reran, this time reverting the AED to 1, yet the same outcome happened again.)

The following error appears throughout the log file:

MAKER WARNING: The file MAKER.contigs_datastore/BF/41/tig00000234//theVoid.tig00000234/0/tig00000234.0.all.rb.out
did not finish on the last run and must be erased

My initial fear was that I had just lost my initial set of 24,613 gene models. I now believe that this won't be the case, but I'm not sure... Can anyone explain what might have happened here and what consequences will follow given that they all returned as failed? Have they been deleted from the MAKER datastore? Are they retrievable?

I had captured all 1st-round MAKER output files (GFF, FASTA files, etc.) before attempting this 2nd round (i.e. the 1st round of model training) of MAKER.

If I have irrevocably changed the MAKER datastore and lost those genes, might I be able to restore to an earlier point (say, back to the first round of evidence-in gene models) by passing the first MAKER GFF in as "maker_gff=" / "pred_pass=1" / "model_pass=1"?

As it stands, maker2zff and fasta_merge / gff3_merge all return nothing or empty output files. So clearly my gene models have been altered somehow.

Any advice on this would be much appreciated.
Lahcen

--
==========================================
> Dr. Lahcen Campbell <
> Contact: lahcencampbell at gmail.com <
> https://www.ebi.ac.uk/about/people/lahcen-campbell <
==========================================

From carsonhh at gmail.com Thu Nov 9 09:28:19 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Nov 2017 09:28:19 -0700
Subject: [maker-devel] substr outside of string in PhatHits_utils.pm
In-Reply-To:
References:
Message-ID: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com>

My first guess is that if you are using gff3 files as input to anything, then there may be an issue with your GFF3 file. My second suggestion is to try MAKER 3.02.02 to see if it has the same issue.

--Carson

From carsonhh at gmail.com Thu Nov 9 16:30:50 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Nov 2017 16:30:50 -0700
Subject: [maker-devel] Model training with AED=0.7 made all contigs FAILED
In-Reply-To:
References:
Message-ID:

There is probably an issue with the GFF3 file being passed in (I'm guessing the Augustus one). I would avoid passing in Augustus results as GFF3; it removes the ability of MAKER to dynamically provide Augustus with hints as it runs. You are essentially handicapping the pipeline.

If your first genes were est2genome or protein2genome based, I would not pass them back in. Those models are suitable for training but will really reduce the accuracy of downstream final annotations (that is why we tell people in the MAKER documentation to turn off est2genome/protein2genome after training a gene predictor). Also, if your inputs to the first round were GFF3 files, they will have to be reread regardless.
Any protein or transcript data that was aligned by MAKER will still have the BLAST results archived, so you don't need to worry about that unless you alter repeat masking options (which would cause it to rerun).

Also, if you are changing GFF3 file input between runs but using the same directory, you might want to delete any ".db" files in the output folder. Those hold an SQLite database of the GFF3 input that may be corrupted if the run failed while attempting to update the database content with the Augustus gff3 file.

--Carson

From lahcencampbell at gmail.com Tue Nov 14 05:15:10 2017
From: lahcencampbell at gmail.com (lahcen campbell)
Date: Tue, 14 Nov 2017 12:15:10 +0000
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
Message-ID:

Hi MAKER community,

I was hoping someone could help me. I have a very unusual error with two different versions of MAKER I have tested so far. This error shouldn't be happening, but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3.

Note that version 2.31.6_mpich3_icc is one I have used countless times to produce final MAKER annotations without issue, so it's not that this version has had problems to date.

Basically, this is a brand-new MAKER analysis; I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP training stage.

I have a single genome file with 10 long scaffolds making up just under 11 MB (subsampled from my original full-length assembly) of sequence data with which to train SNAP.
The FASTA file is not corrupted, and it has been generated in various ways in order to test for formatting issues etc.

I have only edited the maker_opts file and changed:

genome=
protein=
protein2genome=1

But see my attached MAKER CTL files.

The error consistently returned to me:

Skipping the contig because it is too short!!
SeqID: contig_WHATEVER
Length: 0

The sequences are nowhere near too short. This was verified independently outside MAKER to be sure.

The headers are as follows:

>tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no

I have just about given up; I have no idea why it's happening, and it makes zero sense.

Any help or information as to why this might be happening would be amazing.

Thank you in advance.
Lahcen

--
==========================================
> Dr. Lahcen Campbell <
> Contact: lahcencampbell at gmail.com <
> https://www.ebi.ac.uk/about/people/lahcen-campbell <
==========================================

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1413 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1512 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 5560 bytes
Desc: not available
URL:

From michael.s.campbell1 at gmail.com Tue Nov 14 08:08:43 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 14 Nov 2017 10:08:43 -0500
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References:
Message-ID: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com>

Hi Lahcen,

Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me, I can try to recreate the error and debug it.

Thanks,
Mike

From michael.s.campbell1 at gmail.com Tue Nov 14 10:04:04 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 14 Nov 2017 12:04:04 -0500
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com>
Message-ID:

Hi Lahcen,

Thanks, the name has served me well for a number of years now :)

So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see; it looks like that's not the case for this one.

To further rule out a data issue, I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster, I would try running it with and without submitting it to the nodes, and with and without MPI. One thing that I have done in the past is to make a new directory and run MAKER there (this doesn't make a lot of sense, but when the error doesn't make sense either it seems reasonable).

As far as rerunning MAKER, there are a couple of approaches.
If you want it to stop complaining about trying too many times on failed contigs, you can increase the number of tries in the opts file. The line looks like this:

tries=2 #number of times to try a contig if there is a failure for some reason

If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and BLASTing, you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished, you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column, you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:

rm_gff= #pre-identified repeat elements from an external GFF3 file

you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line:

protein_gff= #aligned protein homology evidence from an external GFF3 file

Don't pass the blast gff3 data in. If you pass gff3 data to MAKER, it assumes that the data is polished and will not make any effort to fix alignments. The protein2genome data is polished. est2genome is the equivalent for EST input.

clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation, which takes away the advantage of rerunning in the same directory. clean_try deletes everything first and starts again; it is the one that pretends that the first run never happened.

I cc'd the list on this response just in case anyone else has any ideas or is facing the same error.

Let me know if any of this helps,
Mike

> On Nov 14, 2017, at 10:48 AM, lahcen campbell wrote:
>
> Hi Michael
>
> Nice name btw, I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha... anyway... thanks for the reply and offer to help.
>
> I have attached the file in question below. It's so strange, I had to just leave it alone because it was making me quite frustrated. Those bugs for which there are no common-sense solutions are the worst, because you very easily reach a wall.
>
> Might it have anything at all to do with the protein homology file I passed in? Though, note, the same protein files have been used in another MAKER run without issue, so I kind of ruled that out already... but I'm just spitballing at this stage.
>
> Might I be so cheeky as to ask you one more MAKER-related question, Michael? Feel free to ignore it; I hate to push, but I'm desperate to figure it out with little time to do so...
>
> I have an issue with a different MAKER analysis. It concerns any new run I attempt on this datastore, which has one successful round with 25,000-odd genes and double the transcripts. I attempted to run the second round with a SNAP-trained HMM (the first time passing in a SNAP HMM following the first round of EST/protein evidence). In this attempt, because we obtained so many genes, I thought I would be more stringent by changing the AED to 0.7 from 1.0. I see now that I didn't approach this in the right way... too late now, sadly.
>
> MAKER finishes fine, but now it views all previous scaffolds as FAILED. Nothing seems to change this, and now the datastore is for all intents and purposes locked in a failed state. It keeps mentioning changes to the opts file, which there were, and that the previous runs didn't finish so it must delete them. The results obtained from round 1 are still there though, I'm pretty sure of that; all BLAST files etc. are still there and populated.
>
> Can you tell me the main differences between clean_up and clean_try, and which will completely and irreversibly wipe the first run? That is something I don't want to repeat; I just want to be able to progress to the next round. I'm hesitant to run them, but I've backed up the datastore in case.
> My next attempt will be to pass the exact same maker_opts file from the round 1 run, with the only change made to clean_try/clean_up... Is this approach misguided?
>
> Your help is very much appreciated Michael, so thank you,
> Best
> L
>
> Combined_Protein_homology.fa.zip
> SubsampledGenomeFile_n10_11MB.fasta

From carsonhh at gmail.com Tue Nov 14 10:17:03 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Nov 2017 10:17:03 -0700
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References:
Message-ID:

My first thought is that one of your entries has a header and no sequence. Try this command with the FASTA you are using:

fasta_tool file.fasta --length | sort -nrk2

fasta_tool comes with MAKER. That command will report empty FASTA entries at the bottom of the list with length 0.

Alternatively, MAKER accesses the input assembly using BioPerl. Update your BioPerl to the latest CPAN version (do not use BioPerl-live, as it will be less stable). Also, BioPerl uses BerkeleyDB for indexing, so if you are using a Perl that is not the system Perl (i.e. /usr/bin/perl), then it was likely compiled on the machine you are using and could have been compiled without BerkeleyDB support.

--Carson

From lahcencampbell at gmail.com Wed Nov 15 09:32:02 2017
From: lahcencampbell at gmail.com (lahcen campbell)
Date: Wed, 15 Nov 2017 16:32:02 +0000
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com>
Message-ID:

Hi Michael and Carson

Thank you both for your helpful input, I really appreciate it. See below for my comments...

Best
Lahcen

On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell <michael.s.campbell1 at gmail.com> wrote:

> Hi Lahcen,
>
> Thanks, the name has served me well for a number of years now :)
>

It's a good name, I wouldn't change it haha :)

>
> So I started a run with your 11 scaffolds. I gave it the protein file that
> you sent and used all of repbase for masking. All of the scaffolds finished
> without error. I was hoping it would be something simple that just needed
> another set of eyes to see, looks like it's not the case for this one.
>
> To further rule out a data issue I would try running it with the dpp test
> data that is bundled with MAKER to see if you can get the same error. This
> data set will run in about a minute. If you are on a cluster I would try
> running it with and without submitting it to the nodes and with and
> without mpi.
> > One thing that I have done in the past is to make a new directory and run > maker there (this doesn't make a lot of sense but when the error doesn't > make sense either it seems reasonable). > First off, I can report good news regards the 0 lengths contigs I was getting back. Carson, your thoughts on Bioperl conflict issues seemed to be the main issue. Out cluster software environment had gone through some changes of late, so working off the basis of that I was able to load the right bash config which resulted in no more 0 length contig errors. Huzzah !! > As far as rerunning MAKER there are a couple of approaches. If you want it > to stop complaining about trying to many times on failed contigs you can > increase the number of tries in the opts file. The line looks like this: > > tries=2 #number of times to try a contig if there is a failure for some > reason > > If you want to run it elsewhere, but you don't want to have to redo all of > the repeat masking and blasting you can use the gff3 output from an earlier > run. If you used gff3_merge after the first run finished you got a big gff3 > file with all of the gene models and evidence. If you break up that file by > the source column you can selectively pass the evidence back to MAKER. If > you put all of the repeatmasker and repeatrunner entries into one file and > pass it in on this line: > Can I ask, because I can't seem to find any concrete info on best practices for parsing MAKER gffs to partition the various source column fields as you described Michael. Is there a commonly used way to partition MAKER gffs based on source column? Or will I need to code it up, I ask because I feel this must have been needed before many times by other users. > > rm_gff= #pre-identified repeat elements from an external GFF3 file > I will remove links to fasta files for both 'rmlib=' and 'repeat_protein=' > > you can turn off model_org= and repeat_protein=. This will speed up the > next run a lot. 
Then you can pass in the protein2genome gff3 data on this > line: > > protein_gff= #aligned protein homology evidence from an external GFF3 file > > Don't pass the blast gff3 data in. If you pass in gff3 data to maker is > assumes that it is polished and will not make any effort to fix alignments. > the protein2genome data is polished. est2genome is the equivalent for EST > input. > You say don't pass the blast as gff. As I pass in all other info via GFF3 and remove any evidence as fasta inputs... BLAST won't be called again right ? Ensuring the shortest possible rerun of MAKER to roll back to a uncorrupted state. I noticed that the only unique source field types in my MAKER GFF are as follows: *augustus_masked * *blastx* *maker* *protein2genome* *repeatmasker* *repeatrunner* I read on the dev group that passing est evidence as GFF won't actually call Exonerate, est2genome option just tells MAKER to try and turn polished EST alignments directly into genes.... so If I pass this info again as GFF it will simply use the same info as it did originally and not have to recompute anything ? Based on the above fields contained in my MAKER gff, which of the following options should I select to re-annotate based on this older run ? I suspect all the options below in green should be set to 1, and the others in red set to 0. *#-----Re-annotation Using MAKER Derived GFF3* ..... 
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

I don't think I will pass back anything under augustus_masked, as I didn't set that up correctly initially; I passed in a precomputed Augustus GFF instead, which I'm told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside MAKER itself on the second pass. Note though, I am aware of the order of things normally, but for this instance I will continue with what I have done with success previously.

Lastly, as this next run will be updating based on previously generated MAKER GFF data, what should est2genome and protein2genome be set to: 1 or 0?

Apologies for the lengthy email reply, Michael. Much appreciated again, thank you!!

L

> Clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory. clean_try deletes everything first, and starts again. clean_try is the one that deletes everything and pretends that the first run never happened.
>
> I cc'd the list on this response just in case anyone else has any ideas or is facing the same error.
>
> Let me know if any of this helps,
> Mike
>
> On Nov 14, 2017, at 10:48 AM, lahcen campbell wrote:
>
> Hi Michael
>
> Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help.
>
> I have attached the file in question below.
Its so strange, I had to just > leave it alone cause it was making me quite frustrated. Those bugs which > there are now common sense solutions are the worst cause very easily you > reach a wall. > > Might it have anything at all to do with the Protein homology file I > passed in ? Though, note.... the same protein files here have been used in > another maker run without issue so I kind of ruled that out already.....but > just spitballing at this stage. > > > Might I be so cheeky to ask you one more MAKER related question Michael... > ? Feel free to ignore it I hate to push but im desperate to figure it out > with little time to do so... > > I have an issue with a different MAKER analysis. Currently any new run I > attempt on this datastore, which has one round successful with 25000 odd > genes and double the transcripts. I attempted to run the second round with > a SNAP trained hmm (first time passing in SNAP hmm following first round > EST/Protein evidence). In this attempt, because we obtained so many genes I > thought I would be more stringent by changing the AED to 0.7 from 1.0. > Something I see now I didn't approach in the right way... too late now > sadly. > > MAKER finishes fine, but now it views all previous scaffolds as FAILED. > Nothing seems to change this and now the datastore is for all intents and > purposes locked in failed state. It keeps mentioning changes to the opts > file which there were, and that the previous runs didn't finish so it must > delete them. The results obtained from round 1 are still there though Im > pretty sure of that, all blast files etc are still there and populated. > > Can you tell me the main differences either clean_up or clean_try have and > which will completely and irreversibly wipe the first run? Something I > don't want to repeat, just allow me to progress to the next round. Im > hesitant to run them, but I've backed up the datastore incase. 
My next > attempt will be to pass the exact same maker_opts file from the round1 run, > with the only change made to clean_try/clean_up....Is this approach > misguided ? > > Your help is very much appreciated Michael so thank you, > Best > L > > ? > Combined_Protein_homology.fa.zip > > ?? > SubsampledGenomeFile_n10_11MB.fasta > > ? > > > > On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell < > michael.s.campbell1 at gmail.com> wrote: > >> Hi Lahcen, >> >> Nothing comes right to mind for what could be causing this error. If you >> want to compress your FASTA and send it to me I can try and recreate the >> error and try and debug it. >> >> Thanks, >> Mike >> >> On Nov 14, 2017, at 7:15 AM, lahcen campbell >> wrote: >> >> Hi MAKER community, >> >> I was hoping someone could help me. I have a very unusual error with two >> different versions of maker I have tested so far. This error shouldn't be >> happening but it occurs time and again no matter what I try. I have tried >> using 2.31.6_mpich3_icc and 2.31_mpich3 >> >> Note that version 2.31.6_mpich3_icc is one I have used countless times >> and produced final MAKER annotations without issue. So its not that this >> version has issues to date. >> >> Basically, this is a brand new MAKER analysis, I am only trying to train >> SNAP in this first round. I am following the MakerTutorial as documented >> this time around and I can't get past the initial SNAP train stage. >> >> I have a single genome file with, 10 Long scaffolds making up just under >> 11MB (subsampled from my original full length assembly) of sequence data in >> which to train SNAP. The fasta file is not corrupted, and has been >> generated in various ways in order to test formatting issues etc. >> >> I have only edited the maker_opts file and changed: >> >> *genome=* >> *protein=* >> *protein2genome=1* >> >> But see attached my maker CTL files. 
>> >> The error consistently returned to me: >> >> *Skipping the contig because it is too short!!* >> *SeqID: contig_WHATEVER* >> *Length: 0* >> >> *The sequences are no where near too short. This was verified >> independently outside maker to be sure. * >> >> *The headers are as follows:* >> >> >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no >> class=contig suggestRepeat=no suggestCircular=no >> >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig >> suggestRepeat=no suggestCircular=no >> >> I have just about given up, I have no idea why its happening it makes >> zero sense. >> >> Any help or information as to why this might be happening would be >> amazing. >> >> Thank you in advance. >> Lahcen >> >> -- >> ========================================== >> > Dr. 
Lahcen Campbell < >> > Contact: lahcencampbell at gmail.com < >> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >> ========================================== >> ____________ >> ___________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > > -- > ========================================== > > Dr. Lahcen Campbell < > > Contact: lahcencampbell at gmail.com < > > https://www.ebi.ac.uk/about/people/lahcen-campbell < > ========================================== > > > -- ========================================== > Dr. Lahcen Campbell < > Contact: lahcencampbell at gmail.com < > https://www.ebi.ac.uk/about/people/lahcen-campbell < ==========================================

From lahcencampbell at gmail.com Wed Nov 15 09:56:20 2017
From: lahcencampbell at gmail.com (lahcen campbell)
Date: Wed, 15 Nov 2017 16:56:20 +0000
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com>
Message-ID:

Just an add on to this topic.... I have found a suite of gff utilities here which I hope can help me quickly parse the MAKER gff. https://github.com/mamarjan/gff3-pltools I'll report back how it goes ! Best L

-- ========================================== > Dr. Lahcen Campbell < > Contact: lahcencampbell at gmail.com < > https://www.ebi.ac.uk/about/people/lahcen-campbell < ==========================================

From qwzhang0601 at gmail.com Thu Nov 16 12:46:39 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Thu, 16 Nov 2017 14:46:39 -0500
Subject: [maker-devel] About loss of Histone H2A, H2B, H4
Message-ID:

Hello: We have annotated a new rodent genome using Maker2. Based on the annotated maker2 gene sets, we did gene family expansion/contraction analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under contraction. I wonder whether there are known bias to predict those gene families using Maker2? For example, can this due to repeat masking of the genome? I used repeatmaker and generated species specific repeat libraries follows http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic .
Thanks Best Quanwei

From mcsimenc at gmail.com Fri Nov 17 18:39:25 2017
From: mcsimenc at gmail.com (Matt Simenc)
Date: Fri, 17 Nov 2017 17:39:25 -0800
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
Message-ID:

Hi everybody,

I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?

MAKER 2.31.8

I looked at a standalone RepeatMasker run I did and the features are more evenly distributed between the +/- strands.

Matt

From carsonhh at gmail.com Fri Nov 17 19:09:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Nov 2017 19:09:20 -0700
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
In-Reply-To:
References:
Message-ID: <0DC818BC-EA36-43EA-9237-003BE07C4434@gmail.com>

While transposons that encode proteins will technically have a strand, simple repeats and many others do not, so the algorithms used to find them will not necessarily assign a strand. For this reason the repeats are treated as strandless, since both strands are masked, and they are arbitrarily assigned to the plus strand to avoid issues with genome browsers that cannot handle strandless features.

--Carson

> On Nov 17, 2017, at 6:39 PM, Matt Simenc wrote:
>
> Hi everybody,
>
> I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?
>
> MAKER 2.31.8
>
> I looked at a standalone RepeatMasker run I did and the features are more evenly distributed between the +/- strands.
>
> Matt

From carsonhh at gmail.com Fri Nov 17 19:23:34 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Nov 2017 19:23:34 -0700
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
In-Reply-To:
References:
Message-ID:

Also, MAKER clusters overlapping repeats to generate the best masking of the assembly. For the GFF3 it then assigns the name of the repeat encompassing the greatest portion of the cluster to the feature (i.e. the best representative). But the cluster is technically built from overlapping repeats on both strands (repeats tend to jump on top of other repeats, so they stack with bits and pieces of other repeats at the edges). Yet another reason why everything is just assigned to the plus strand.
--Carson

> On Nov 17, 2017, at 6:39 PM, Matt Simenc wrote:
>
> Hi everybody,
>
> I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?
>
> MAKER 2.31.8
>
> I looked at a standalone RepeatMasker run I did and the features are more evenly distributed between the +/- strands.
>
> Matt

From mcsimenc at gmail.com Sat Nov 18 09:27:25 2017
From: mcsimenc at gmail.com (Matt Simenc)
Date: Sat, 18 Nov 2017 08:27:25 -0800
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
In-Reply-To:
References:
Message-ID:

Ah ok. A messy problem! I need to approximate strandedness for TE loci if possible, so I will do some post-processing using BLAST/HMMER against Repbase and Dfam. Thanks for the speedy response, Carson!

On Fri, Nov 17, 2017 at 6:23 PM, Carson Holt wrote:

> Also, MAKER clusters overlapping repeats to generate the best masking of the assembly. For the GFF3 it then assigns the name of the repeat encompassing the greatest portion of the cluster to the feature (i.e. the best representative). But the cluster is technically built from overlapping repeats on both strands (repeats tend to jump on top of other repeats, so they stack with bits and pieces of other repeats at the edges). Yet another reason why everything is just assigned to the plus strand.
>
> --Carson
>
> > On Nov 17, 2017, at 6:39 PM, Matt Simenc wrote:
> >
> > Hi everybody,
> >
> > I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?
> > > > MAKER 2.31.8 > > > > I looked at a standalone RepeatMasker run I did and the features are > more evenly distributed between the +/- strands. > > > > > > Matt > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.campbell1 at gmail.com Wed Nov 15 14:50:45 2017 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Wed, 15 Nov 2017 16:50:45 -0500 Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short In-Reply-To: References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com> Message-ID: <4157C9FE-1F5D-4320-A03F-2344C1DBD81C@gmail.com> Hi Lahcen, I put some answers below. > On Nov 15, 2017, at 11:32 AM, lahcen campbell wrote: > > Hi Michael and Carson > > Thank you both for your helpful input, I really appreciate it. > > See below for my comments... > > Best > Lahcen > > > On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell > wrote: > Hi Lancen, > > Thanks, the name has served me well for a number of years now :) > > Its a good name, I wouldn't change it haha :) > > > So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, looks like it's not the case for this one. > > To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it you the nodes and with and without mpi. 
> > One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable). > > First off, I can report good news regards the 0 lengths contigs I was getting back. Carson, your thoughts on Bioperl conflict issues seemed to be the main issue. Out cluster software environment had gone through some changes of late, so working off the basis of that I was able to load the right bash config which resulted in no more 0 length contig errors. Huzzah !! > Great > > As far as rerunning MAKER there are a couple of approaches. If you want it to stop complaining about trying to many times on failed contigs you can increase the number of tries in the opts file. The line looks like this: > > tries=2 #number of times to try a contig if there is a failure for some reason > > If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line: > > Can I ask, because I can't seem to find any concrete info on best practices for parsing MAKER gffs to partition the various source column fields as you described Michael. > > Is there a commonly used way to partition MAKER gffs based on source column? Or will I need to code it up, I ask because I feel this must have been needed before many times by other users. > I've got a script that will do it if you want it. Since you don't need all of the entries grep is probably as easy as anyting. 
grep -P '\tsource\t' > > rm_gff= #pre-identified repeat elements from an external GFF3 file > > I will remove links to fasta files for both 'rmlib=' and 'repeat_protein=' > Yep > > you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line: > > protein_gff= #aligned protein homology evidence from an external GFF3 file > > Don't pass the blast gff3 data in. If you pass in gff3 data to maker is assumes that it is polished and will not make any effort to fix alignments. the protein2genome data is polished. est2genome is the equivalent for EST input. > > You say don't pass the blast as gff. As I pass in all other info via GFF3 and remove any evidence as fasta inputs... BLAST won't be called again right ? Ensuring the shortest possible rerun of MAKER to roll back to a uncorrupted state. > Right. blast will not be called as long as you remove or comment out the paths to the fastas in the est= and protein= lines. > I noticed that the only unique source field types in my MAKER GFF are as follows: > augustus_masked > blastx > maker > protein2genome > repeatmasker > repeatrunner > That look right for the run you described > I read on the dev group that passing est evidence as GFF won't actually call Exonerate, est2genome option just tells MAKER to try and turn polished EST alignments directly into genes.... so If I pass this info again as GFF it will simply use the same info as it did originally and not have to recompute anything ? > > Based on the above fields contained in my MAKER gff, which of the following options should I select to re-annotate based on this older run ? I suspect all the options below in green should be set to 1, and the others in red set to 0. > > #-----Re-annotation Using MAKER Derived GFF3 > ..... 
> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no > altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no > protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no > rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no > model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no > pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no > other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no > You don't need model_pass or pred_pass if you plan on running gene finders > I don't think I will pass back anything under augustus_masked as I didn't set that up correctly initially, instead passing in a precomputed augustus gff which Im told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside maker itself on the 2nd pass. Note though, I am aware of the order of things normally, but for this instance I will continue with what I have done with success previously. Yeah, when I have issues with failing contigs I'll pull stuff out until it starts running without error, then I add things back until something breaks. > Lastly, as this next run will be updating based on previous generated MAKER gff data.... what states should est2genome and protein2genome be ? 1 or 0 ? 0 those options are just for generating gene models directly from evidence when you don't have any gene finders trained. When you say updating do you mean reusing evidence from previous runs and generating new gene annotations or are you taking existing gene models and adding new evidence to see if they can be improved? > > Apologies for the lengthy email reply Michael. Much appreciated again, thank you !! No Worries, hope it helps. > > L > > > Clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory. 
clean_try deletes everything first, and starts again. clean_try is the one that deletes everything and pretends that the first run never happened. > > I ccd the list on this response just Incas anyone else has any ideas or is facing the same error. > > Let me know if any of this helps, > Mike > >> On Nov 14, 2017, at 10:48 AM, lahcen campbell > wrote: >> >> Hi Michael >> >> Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help. >> >> I have attached the file in question below. Its so strange, I had to just leave it alone cause it was making me quite frustrated. Those bugs which there are now common sense solutions are the worst cause very easily you reach a wall. >> >> Might it have anything at all to do with the Protein homology file I passed in ? Though, note.... the same protein files here have been used in another maker run without issue so I kind of ruled that out already.....but just spitballing at this stage. >> >> >> Might I be so cheeky to ask you one more MAKER related question Michael... ? Feel free to ignore it I hate to push but im desperate to figure it out with little time to do so... >> >> I have an issue with a different MAKER analysis. Currently any new run I attempt on this datastore, which has one round successful with 25000 odd genes and double the transcripts. I attempted to run the second round with a SNAP trained hmm (first time passing in SNAP hmm following first round EST/Protein evidence). In this attempt, because we obtained so many genes I thought I would be more stringent by changing the AED to 0.7 from 1.0. Something I see now I didn't approach in the right way... too late now sadly. >> >> MAKER finishes fine, but now it views all previous scaffolds as FAILED. Nothing seems to change this and now the datastore is for all intents and purposes locked in failed state. 
It keeps mentioning changes to the opts file which there were, and that the previous runs didn't finish so it must delete them. The results obtained from round 1 are still there though Im pretty sure of that, all blast files etc are still there and populated. >> >> Can you tell me the main differences either clean_up or clean_try have and which will completely and irreversibly wipe the first run? Something I don't want to repeat, just allow me to progress to the next round. Im hesitant to run them, but I've backed up the datastore incase. My next attempt will be to pass the exact same maker_opts file from the round1 run, with the only change made to clean_try/clean_up....Is this approach misguided ? >> >> Your help is very much appreciated Michael so thank you, >> Best >> L >> >> ? >> ?Combined_Protein_homology.fa.zip ?? >> ?SubsampledGenomeFile_n10_11MB.fasta ? >> >> >> >> On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell > wrote: >> Hi Lahcen, >> >> Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try and recreate the error and try and debug it. >> >> Thanks, >> Mike >>> On Nov 14, 2017, at 7:15 AM, lahcen campbell > wrote: >>> >>> Hi MAKER community, >>> >>> I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3 >>> >>> Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So its not that this version has issues to date. >>> >>> Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage. 
>>> >>> I have a single genome file with, 10 Long scaffolds making up just under 11MB (subsampled from my original full length assembly) of sequence data in which to train SNAP. The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc. >>> >>> I have only edited the maker_opts file and changed: >>> >>> genome= >>> protein= >>> protein2genome=1 >>> >>> But see attached my maker CTL files. >>> >>> The error consistently returned to me: >>> >>> Skipping the contig because it is too short!! >>> SeqID: contig_WHATEVER >>> Length: 0 >>> >>> The sequences are no where near too short. This was verified independently outside maker to be sure. >>> >>> The headers are as follows: >>> >>> >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>> >>> I have just about given up, I have no idea why its happening it makes zero sense. 
>>> >>> Any help or information as to why this might be happening would be amazing. >>> >>> Thank you in advance. >>> Lahcen >>> >>> -- >>> ========================================== >>> > Dr. Lahcen Campbell < >>> > Contact: lahcencampbell at gmail.com < >>> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >>> ========================================== >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> >> -- >> ========================================== >> > Dr. Lahcen Campbell < >> > Contact: lahcencampbell at gmail.com < >> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >> ========================================== > > > > > -- > ========================================== > > Dr. Lahcen Campbell < > > Contact: lahcencampbell at gmail.com < > > https://www.ebi.ac.uk/about/people/lahcen-campbell < > ========================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott at scottcain.net Mon Nov 20 18:57:09 2017 From: scott at scottcain.net (Scott Cain) Date: Mon, 20 Nov 2017 20:57:09 -0500 Subject: [maker-devel] GMOD hackathon before PAG San Diego in January In-Reply-To: References: Message-ID: Hello, This is an update on the hackathon. It is a go; the hackathon page is up on GMOD.org: http://gmod.org/wiki/2018_PAG_Hackathon And the EventBrite page is up at https://www.eventbrite.com/e/gmod-2018-pag-hackathon-tickets-39700164260 Tickets are $50 which covers the costs associated with the room and lunch on the first day. Please feel free to add suggested topics to the wiki page, or send the suggestions to me to add. Thanks, Scott On Thursday, October 12, 2017, Scott Cain wrote: > Hi all, > > This January before PAG on the Wednesday and Thursday before PAG (January > 10-11) in San Diego we are planning a GMOD hackathon. 
> We expect that participants will be interested in solving problems/creating
> solutions related to Tripal, JBrowse, Apollo, and Galaxy, but if you're
> interested in another GMOD project, by all means, let us know! We expect this
> hackathon to overlap with the Tripal hackathon that is on January 11 (I'm
> pretty sure; right Stephen?)
>
> If you are interested in attending this hackathon, please let me know so I
> can be sure we have an appropriately sized space. And if you're coming for
> the pre-PAG hackathon, consider staying for PAG, since there is always a
> lot of GMOD-related content at the meeting!
>
> Thanks,
> Scott
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                          scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)         216-392-3087
> Ontario Institute for Cancer Research

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From o.k.torresen at ibv.uio.no Tue Nov 21 06:57:46 2017
From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen)
Date: Tue, 21 Nov 2017 13:57:46 +0000
Subject: [maker-devel] substr outside of string in PhatHits_utils.pm
In-Reply-To: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com>
References: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com>
Message-ID: <182CDDD3-A108-4095-9AC4-A2C198D34107@ibv.uio.no>

Thank you Carson.

After a bit of struggling, I can confirm that the same error occurs in MAKER 3.01.2 (I guess you meant that version; I couldn't find 3.02.02). I am providing a GFF to est_gff, with match and match_part entries. For at least one of the scaffolds, the last coordinate (column 5) is the same number as the length of the scaffold. That should be allowed by the GFF3 standard, right?
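
The boundary condition described above can be checked outside MAKER. What follows is a minimal sketch in Python, assuming a plain nine-column GFF3 and a FASTA whose sequence names (the first word of each header) match column 1 of the GFF3. The function names are hypothetical and nothing here is MAKER code. GFF3 coordinates are 1-based and inclusive, so an end equal to the sequence length is valid and only an end beyond it is out of range. The script simply lists both cases for inspection; it does not show that either one is what triggers the substr error.

```python
import sys


def fasta_lengths(fasta_lines):
    """Return {sequence name: length} from FASTA lines (name = first word of header)."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def features_touching_end(gff_lines, lengths):
    """List features whose end (column 5) reaches or exceeds the sequence length.

    Returns tuples of (seqid, type, start, end, status), where status is
    "at_end" (end == length, valid GFF3) or "past_end" (end > length).
    """
    hits = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded FASTA section, no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        seqid, ftype = cols[0], cols[2]
        start, end = int(cols[3]), int(cols[4])
        if seqid in lengths and end >= lengths[seqid]:
            status = "at_end" if end == lengths[seqid] else "past_end"
            hits.append((seqid, ftype, start, end, status))
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file names are assumptions): python check_ends.py genome.fasta ests.score.gff3
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as gff:
        for hit in features_touching_end(gff, fasta_lengths(fasta)):
            print("\t".join(str(field) for field in hit))
```

Any "past_end" rows would point to a problem in the GFF3 itself, while "at_end" rows are the scaffolds worth rerunning in isolation.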
How can I troubleshoot this? The error message is not very informative. It seems that PhatHit_utils.pm tries to find a stop codon. Snipped from that file, lines 849-850:

#fix stop codon by walking downstream
my $has_stop = $tM->is_ter_codon(substr($transcript_seq, $end-1-3, 3));

The GFF I am using was the output of Mikado (https://www.biorxiv.org/content/early/2017/11/09/216994), which is GFF3, and was then processed a bit to make it suitable for MAKER. It was first converted to GTF with:

mikado util convert mikado.loci.gff3 mikado.loci.gtf

Then I selected only mRNA and exon entries, and changed mRNA to transcript to make it look like cufflinks output (and set a dummy score):

grep -P "\tmRNA\t|\texon\t" mikado.loci.gtf |sed "s/mRNA/transcript/g" |awk -F "\t" '{$9=$9"cov \"10.0\";"; OFS="\t"; print $1, $2, $3, $4, $5, $6, $7, $8, $9}' > mikado.loci.score.gtf

before converting with cufflinks2gff3:

cufflinks2gff3 mikado.loci.score.gtf > ests.score.gff3

Thank you.

Ole

> On 09 Nov 2017, at 17:28, Carson Holt wrote:
>
> My first guess is that if you are using gff3 files as input to anything, then there may be an issue with your GFF3 file. My second suggestion is to try MAKER 3.02.02 to see if it has the same issue.
>
> --Carson
>
>> On Nov 9, 2017, at 2:44 AM, Ole Kristian Tørresen wrote:
>>
>> Dear all,
>> I'm having an issue with MAKER which I'm unable to wrap my head around. Hopefully the issue is easily identifiable and resolvable for someone with more insight than me. Please find the log output attached below. I cannot find any more information than this in any logs. Many scaffolds do complete fine, but some of the longest ones have issues.
>>
>> Thank you.
>>
>> Sincerely,
>> Ole K. Tørresen
>>
>> Error message:
>>
>> #--------- command -------------#
>> Widget::augustus:
>> /projects/cees/bin/augustus/augustus-3.2.3/bin/augustus --strand=backward --species=gadMor2_code_braker2 --UTR=off --hintsfile=/tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_brak
>> er2.auto_annotator.xdef.augustus --extrinsicCfgFile=/projects/cees/bin/augustus/augustus-3.2.3/config/extrinsic/extrinsic.MPE.cfg --AUGUSTUS_CONFIG_PATH=/projects/cees/bin/augustus/augustus-3.2
>> .3/config /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus.fasta > /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotato
>> r.augustus
>> #-------------------------------#
>> deleted:0 genes
>> begin called get_best_alt_splices1
>> ...processing 0 of 2
>> ...processing 1 of 2
>> end called get_best_alt_splices1
>> ...processing 0 of 20
>> ...processing 1 of 20
>> ...processing 2 of 20
>> ...processing 3 of 20
>> ...processing 4 of 20
>> ...processing 5 of 20
>> ...processing 6 of 20
>> ...processing 7 of 20
>> ...processing 8 of 20
>> ...processing 9 of 20
>> ...processing 10 of 20
>> ...processing 11 of 20
>> ...processing 12 of 20
>> ...processing 13 of 20
>> ...processing 14 of 20
>> ...processing 15 of 20
>> ...processing 16 of 20
>> ...processing 17 of 20
>> ...processing 18 of 20
>> ...processing 19 of 20
>> substr outside of string at /projects/cees/bin/maker/maker-3.1.1/bin/../lib/PhatHit_utils.pm line 850.
>> --> rank=NA, hostname=compute-31-18.local
>> ERROR: Failed while annotating transcripts
>> ERROR: Chunk failed at level:1, tier_type:4
>> FAILED CONTIG:GmG20150304_scaffold_8692
>>
>> ERROR: Chunk failed at level:6, tier_type:0
>> FAILED CONTIG:GmG20150304_scaffold_8692
>>
>> examining contents of the fasta file and run log

From carsonhh at gmail.com Tue Nov 21 09:19:36 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 Nov 2017 09:19:36 -0700
Subject: [maker-devel] About loss of Histone H2A, H2B, H4
In-Reply-To: References: Message-ID: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com>

No known biases, but if you are concerned, you can collect known Histone H2A, H2B, and H4 proteins and transcripts from other species (protein= and altest= options), then run MAKER with no masking to see if you gain any models that may have been overlooked because of over-masking of repeats. Make sure to evaluate whether any models you find are pseudogenes. Also run InterProScan on the results to make sure they contain known InterPro domains for that gene family. Running without repeat masking will increase sensitivity, but it will also increase false positives derived from low-homology alignments to simple repeats, which is why you need to evaluate the results with something like InterProScan.

Also run BUSCO to evaluate the completeness of the genome. Make sure that the observed contraction is not just a result of an incomplete assembly.

--Carson

> On Nov 16, 2017, at 12:46 PM, Quanwei Zhang wrote:
>
> Hello:
>
> We have annotated a new rodent genome using Maker2. Based on the annotated maker2 gene sets, we did gene family expansion/contraction analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under contraction.
> I wonder whether there are known biases in predicting those gene families with Maker2? For example, can this be due to repeat masking of the genome? I used repeatmaker and generated species specific repeat libraries following http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic .
>
> Thanks
>
> Best
> Quanwei

From carsonhh at gmail.com Tue Nov 21 09:22:58 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 Nov 2017 09:22:58 -0700
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To: <4157C9FE-1F5D-4320-A03F-2344C1DBD81C@gmail.com>
References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com> <4157C9FE-1F5D-4320-A03F-2344C1DBD81C@gmail.com>
Message-ID: <172954D4-7D27-4929-8BC1-B0292F8D9BDB@gmail.com>

Just one note I want to add here. When you use GFF3 to pass in results, as opposed to letting MAKER use the raw alignments, you lose MAKER's ability to base some decisions on reading frame match, since you lose both the alignment sequence and the CIGAR string of the alignment. So MAKER just assumes a correct ORF and sequence match rather than evaluating them (this will make AED scores artificially better for some models).

--Carson

> On Nov 15, 2017, at 2:50 PM, Michael Campbell wrote:
>
> Hi Lahcen,
>
> I put some answers below.
>> On Nov 15, 2017, at 11:32 AM, lahcen campbell wrote:
>>
>> Hi Michael and Carson
>>
>> Thank you both for your helpful input, I really appreciate it.
>>
>> See below for my comments...
>> >> Best >> Lahcen >> >> >> On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell > wrote: >> Hi Lancen, >> >> Thanks, the name has served me well for a number of years now :) >> >> Its a good name, I wouldn't change it haha :) >> >> >> So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, looks like it's not the case for this one. >> >> To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it you the nodes and with and without mpi. >> >> One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable). >> >> First off, I can report good news regards the 0 lengths contigs I was getting back. Carson, your thoughts on Bioperl conflict issues seemed to be the main issue. Out cluster software environment had gone through some changes of late, so working off the basis of that I was able to load the right bash config which resulted in no more 0 length contig errors. Huzzah !! >> Great >> >> As far as rerunning MAKER there are a couple of approaches. If you want it to stop complaining about trying to many times on failed contigs you can increase the number of tries in the opts file. The line looks like this: >> >> tries=2 #number of times to try a contig if there is a failure for some reason >> >> If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting you can use the gff3 output from an earlier run. 
If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line: >> >> Can I ask, because I can't seem to find any concrete info on best practices for parsing MAKER gffs to partition the various source column fields as you described Michael. >> >> Is there a commonly used way to partition MAKER gffs based on source column? Or will I need to code it up, I ask because I feel this must have been needed before many times by other users. >> I've got a script that will do it if you want it. Since you don't need all of the entries grep is probably as easy as anyting. grep -P '\tsource\t' >> >> rm_gff= #pre-identified repeat elements from an external GFF3 file >> >> I will remove links to fasta files for both 'rmlib=' and 'repeat_protein=' >> Yep >> >> you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line: >> >> protein_gff= #aligned protein homology evidence from an external GFF3 file >> >> Don't pass the blast gff3 data in. If you pass in gff3 data to maker is assumes that it is polished and will not make any effort to fix alignments. the protein2genome data is polished. est2genome is the equivalent for EST input. >> >> You say don't pass the blast as gff. As I pass in all other info via GFF3 and remove any evidence as fasta inputs... BLAST won't be called again right ? Ensuring the shortest possible rerun of MAKER to roll back to a uncorrupted state. >> Right. blast will not be called as long as you remove or comment out the paths to the fastas in the est= and protein= lines. 
> >> I noticed that the only unique source field types in my MAKER GFF are as follows: >> augustus_masked >> blastx >> maker >> protein2genome >> repeatmasker >> repeatrunner >> That look right for the run you described >> I read on the dev group that passing est evidence as GFF won't actually call Exonerate, est2genome option just tells MAKER to try and turn polished EST alignments directly into genes.... so If I pass this info again as GFF it will simply use the same info as it did originally and not have to recompute anything ? >> >> Based on the above fields contained in my MAKER gff, which of the following options should I select to re-annotate based on this older run ? I suspect all the options below in green should be set to 1, and the others in red set to 0. >> >> #-----Re-annotation Using MAKER Derived GFF3 >> ..... >> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no >> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no >> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no >> rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no >> model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no >> pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no >> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no >> > You don't need model_pass or pred_pass if you plan on running gene finders >> I don't think I will pass back anything under augustus_masked as I didn't set that up correctly initially, instead passing in a precomputed augustus gff which Im told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside maker itself on the 2nd pass. Note though, I am aware of the order of things normally, but for this instance I will continue with what I have done with success previously. > Yeah, when I have issues with failing contigs I'll pull stuff out until it starts running without error, then I add things back until something breaks. 
> >> Lastly, as this next run will be updating based on previous generated MAKER gff data.... what states should est2genome and protein2genome be ? 1 or 0 ? > 0 those options are just for generating gene models directly from evidence when you don't have any gene finders trained. When you say updating do you mean reusing evidence from previous runs and generating new gene annotations or are you taking existing gene models and adding new evidence to see if they can be improved? >> >> Apologies for the lengthy email reply Michael. Much appreciated again, thank you !! > No Worries, hope it helps. >> >> L >> >> >> Clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory. clean_try deletes everything first, and starts again. clean_try is the one that deletes everything and pretends that the first run never happened. >> >> I ccd the list on this response just Incas anyone else has any ideas or is facing the same error. >> >> Let me know if any of this helps, >> Mike >> >>> On Nov 14, 2017, at 10:48 AM, lahcen campbell > wrote: >>> >>> Hi Michael >>> >>> Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help. >>> >>> I have attached the file in question below. Its so strange, I had to just leave it alone cause it was making me quite frustrated. Those bugs which there are now common sense solutions are the worst cause very easily you reach a wall. >>> >>> Might it have anything at all to do with the Protein homology file I passed in ? Though, note.... the same protein files here have been used in another maker run without issue so I kind of ruled that out already.....but just spitballing at this stage. >>> >>> >>> Might I be so cheeky to ask you one more MAKER related question Michael... ? 
Feel free to ignore it I hate to push but im desperate to figure it out with little time to do so... >>> >>> I have an issue with a different MAKER analysis. Currently any new run I attempt on this datastore, which has one round successful with 25000 odd genes and double the transcripts. I attempted to run the second round with a SNAP trained hmm (first time passing in SNAP hmm following first round EST/Protein evidence). In this attempt, because we obtained so many genes I thought I would be more stringent by changing the AED to 0.7 from 1.0. Something I see now I didn't approach in the right way... too late now sadly. >>> >>> MAKER finishes fine, but now it views all previous scaffolds as FAILED. Nothing seems to change this and now the datastore is for all intents and purposes locked in failed state. It keeps mentioning changes to the opts file which there were, and that the previous runs didn't finish so it must delete them. The results obtained from round 1 are still there though Im pretty sure of that, all blast files etc are still there and populated. >>> >>> Can you tell me the main differences either clean_up or clean_try have and which will completely and irreversibly wipe the first run? Something I don't want to repeat, just allow me to progress to the next round. Im hesitant to run them, but I've backed up the datastore incase. My next attempt will be to pass the exact same maker_opts file from the round1 run, with the only change made to clean_try/clean_up....Is this approach misguided ? >>> >>> Your help is very much appreciated Michael so thank you, >>> Best >>> L >>> >>> ? >>> ?Combined_Protein_homology.fa.zip ?? >>> ?SubsampledGenomeFile_n10_11MB.fasta ? >>> >>> >>> >>> On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell > wrote: >>> Hi Lahcen, >>> >>> Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try and recreate the error and try and debug it. 
>>> >>> Thanks, >>> Mike >>>> On Nov 14, 2017, at 7:15 AM, lahcen campbell > wrote: >>>> >>>> Hi MAKER community, >>>> >>>> I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3 >>>> >>>> Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So its not that this version has issues to date. >>>> >>>> Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage. >>>> >>>> I have a single genome file with, 10 Long scaffolds making up just under 11MB (subsampled from my original full length assembly) of sequence data in which to train SNAP. The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc. >>>> >>>> I have only edited the maker_opts file and changed: >>>> >>>> genome= >>>> protein= >>>> protein2genome=1 >>>> >>>> But see attached my maker CTL files. >>>> >>>> The error consistently returned to me: >>>> >>>> Skipping the contig because it is too short!! >>>> SeqID: contig_WHATEVER >>>> Length: 0 >>>> >>>> The sequences are no where near too short. This was verified independently outside maker to be sure. 
>>>> >>>> The headers are as follows: >>>> >>>> >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >>>> I have just about given up, I have no idea why its happening it makes zero sense. >>>> >>>> Any help or information as to why this might be happening would be amazing. >>>> >>>> Thank you in advance. >>>> Lahcen >>>> >>>> -- >>>> ========================================== >>>> > Dr. Lahcen Campbell < >>>> > Contact: lahcencampbell at gmail.com < >>>> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >>>> ========================================== >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >>> >>> -- >>> ========================================== >>> > Dr. 
Lahcen Campbell < >>> > Contact: lahcencampbell at gmail.com < >>> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >>> ========================================== >> >> >> >> >> -- >> ========================================== >> > Dr. Lahcen Campbell < >> > Contact: lahcencampbell at gmail.com < >> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >> ========================================== > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Tue Nov 21 10:42:38 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 21 Nov 2017 12:42:38 -0500 Subject: [maker-devel] About loss of Histone H2A, H2B, H4 In-Reply-To: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com> References: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com> Message-ID: Dear Carson: Thank you for your comments and suggestions. Now the SNAP was trained with repeat masked, is it necessary to retrain the predictor without repeat masking? By BUSCO analysis on the genome, the completeness is shown as below. Now I am doing the analysis using the default reports of Maker2 (i.e., gene models with evidence support, the default build). For the gene loss, besides you suggestions I am also considering to do the analysis using the gene models with evidence support plus those with scanned domains (i.e., standard build). How do you think? C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104 3902 Complete BUSCOs (C) 3806 Complete and single-copy BUSCOs (S) 96 Complete and duplicated BUSCOs (D) 92 Fragmented BUSCOs (F) 110 Missing BUSCOs (M) Thanks Best Quanwei 2017-11-21 11:19 GMT-05:00 Carson Holt : > No known biases, but if you are concerned, you can collect known Histone > H2A, H2B, H4 proteins and transcripts from other species (protein= and > altest= options), them run MAKER with no masking to see if you gain any > models that may have been overlooked because of over-masking of repeats. 
> Make sure to evaluate any models you find as being a pseudogene. Run > InterProScan on results to make sure they contain known InterPro domains > for that gene family as well. Running without repeat masking will increase > sensitivity but also false positives derived from low homology alignments > to simple repeats which is why you need to evaluate results using something > like InterProScan. > > Also run BUSCO to evaluate the completeness of the genome. Make sure that > the observed contraction is not just a result of an incomplete assembly. > > ?Carson > > > On Nov 16, 2017, at 12:46 PM, Quanwei Zhang wrote: > > Hello: > > We have annotated a new rodent genome using Maker2. Based on the annotated > maker2 gene sets, we did gene family expansion/contraction analysis using > CAFE. We found Histone H2A, H2B, H4 gene families are under contraction. I > wonder whether there are known bias to predict those gene families using > Maker2? For example, can this due to repeat masking of the genome? I used > repeatmaker and generated species specific repeat libraries follows > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/ > Repeat_Library_Construction--Basic. > > Thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wanghai01 at caas.cn Mon Nov 27 06:18:36 2017 From: wanghai01 at caas.cn (HAI WANG) Date: Mon, 27 Nov 2017 08:18:36 -0500 Subject: [maker-devel] Need your help on maker pipeline Message-ID: <000601d36782$3e24e0d0$ba6ea270$@cn> Dear Professor Yandell, I am Hai Wang, a visiting scholar in Cornell University. I am sorry to bother you, but I really need your help. I am now using the maker pipeline to annotate a maize genome. 
The installation of maker, openmpi and other software should be OK since I've successfully run maker on your example data. But when I ran maker on my own maize genome, I always got the following error: A process has executed an operation involving a call to the "fork()" system call to create a child process. Open MPI is currently operating in a condition that could result in memory corruption or other system errors; your job may hang, crash, or produce silent data corruption. The use of fork() (or system() or other calls that create child processes) is strongly discouraged. The process that invoked fork was: Local host: [[21269,1],0] (PID 12537) If you are *absolutely sure* that your application will successfully and correctly survive a call to fork(), you may disable this warning by setting the mpi_warn_on_fork MCA parameter to 0. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpiexec noticed that process rank 32 with PID 0 on node fat1 exited on signal 11 (Segmentation fault). -------------------------------------------------------------------------- Could you please help me with this issue? Or is there a way that I can resume this job when it stops? Thank you very much! Best, Hai Wang -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Nov 27 12:45:57 2017 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 27 Nov 2017 12:45:57 -0700 Subject: [maker-devel] Need your help on maker pipeline In-Reply-To: <000601d36782$3e24e0d0$ba6ea270$@cn> References: <000601d36782$3e24e0d0$ba6ea270$@cn> Message-ID: The parameters needed to get OpenMPI to work with MAKER are described in the ?/maker/INSTALL file (specifically look at LD_PRELOAD and -mca btl ^openib) ?> !!IMPORTANT!! MAKER is not compatible with MVAPICH2. Use OpenMPI or MPICH. 
If using MPICH, make sure to enable shared libraries during installation (this is not the default). If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (i.e. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so).

1. Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.
2. Give path to 'mpicc'. Make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.
3. Give path to the folder containing 'mpi.h'. Make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.
4. Finish installation according to steps 2-4 of the EASY INSTALL.

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.

Note: If jobs hang or freeze when using mpiexec under OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER.

Example: mpiexec -mca btl ^openib -n 20 maker

Then, to disable the fork warning, just add the parameter --mca mpi_warn_on_fork 0 to the mpiexec options as described in the warning.

How to run with OpenMPI has also been covered extensively in the MAKER list archives, and more detail can be found there: https://groups.google.com/forum/#!searchin/maker-devel/openmpi%7Csort:date

Thanks,
Carson

> On Nov 27, 2017, at 6:18 AM, HAI WANG wrote:
>
> Dear Professor Yandell,
>
> I am Hai Wang, a visiting scholar in Cornell University. I am sorry to bother you, but I really need your help. I am now using the maker pipeline to annotate a maize genome.
The installation of maker, openmpi and other software should be OK since I?ve successfully run maker on your example data. > > But when I ran maker on my own maize genome, I always got the following error: > > > A process has executed an operation involving a call to the > "fork()" system call to create a child process. Open MPI is currently > operating in a condition that could result in memory corruption or > other system errors; your job may hang, crash, or produce silent > data corruption. The use of fork() (or system() or other calls that > create child processes) is strongly discouraged. > > The process that invoked fork was: > > Local host: [[21269,1],0] (PID 12537) > > If you are *absolutely sure* that your application will successfully > and correctly survive a call to fork(), you may disable this warning > by setting the mpi_warn_on_fork MCA parameter to 0. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpiexec noticed that process rank 32 with PID 0 on node fat1 exited on signal 11 (Segmentation fault). > -------------------------------------------------------------------------- > > Could you please help me with this issue? Or is there a way that I can resume this job when it stops? Thank you very much! > > Best, > Hai Wang > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
From carsonhh at gmail.com Mon Nov 27 12:56:04 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 27 Nov 2017 12:56:04 -0700
Subject: [maker-devel] About loss of Histone H2A, H2B, H4
In-Reply-To:
References: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com>
Message-ID:

You should not have to train SNAP separately on unmasked sequence, and I do believe adding back genes that were rejected because of lack of support but contain an identifiable domain may help. These will be in the FASTA files labeled as non-overlapping in the datastore.

--Carson

> On Nov 21, 2017, at 10:42 AM, Quanwei Zhang wrote:
>
> Dear Carson:
>
> Thank you for your comments and suggestions. SNAP was trained with repeats masked; is it necessary to retrain the predictor without repeat masking?
> The completeness of the genome by BUSCO analysis is shown below. Now I am doing the analysis using the default reports of Maker2 (i.e., gene models with evidence support, the default build). For the gene loss, besides your suggestions I am also considering doing the analysis using the gene models with evidence support plus those with scanned domains (i.e., standard build). What do you think?
>
>
> C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104
> 3902 Complete BUSCOs (C)
> 3806 Complete and single-copy BUSCOs (S)
> 96 Complete and duplicated BUSCOs (D)
> 92 Fragmented BUSCOs (F)
> 110 Missing BUSCOs (M)
>
> Thanks
> Best
> Quanwei
>
>
> 2017-11-21 11:19 GMT-05:00 Carson Holt:
> No known biases, but if you are concerned, you can collect known Histone H2A, H2B, H4 proteins and transcripts from other species (protein= and altest= options), then run MAKER with no masking to see if you gain any models that may have been overlooked because of over-masking of repeats. Make sure to evaluate whether any models you find are pseudogenes. Run InterProScan on results to make sure they contain known InterPro domains for that gene family as well.
Running without repeat masking will increase sensitivity but also false positives derived from low homology alignments to simple repeats which is why you need to evaluate results using something like InterProScan. > > Also run BUSCO to evaluate the completeness of the genome. Make sure that the observed contraction is not just a result of an incomplete assembly. > > ?Carson > > >> On Nov 16, 2017, at 12:46 PM, Quanwei Zhang > wrote: >> >> Hello: >> >> We have annotated a new rodent genome using Maker2. Based on the annotated maker2 gene sets, we did gene family expansion/contraction analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under contraction. I wonder whether there are known bias to predict those gene families using Maker2? For example, can this due to repeat masking of the genome? I used repeatmaker and generated species specific repeat libraries follows http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic . >> >> Thanks >> >> Best >> Quanwei >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Tue Nov 28 06:39:52 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 28 Nov 2017 08:39:52 -0500 Subject: [maker-devel] About loss of Histone H2A, H2B, H4 In-Reply-To: References: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com> Message-ID: Dear Carson: Thank you! Best Quanwei 2017-11-27 14:56 GMT-05:00 Carson Holt : > You should not have to train separately for SNAP on unmasked sequence, and > I do believe adding back genes that were rejected because of lack of > support but contain an identifiable domain may help. These will be in the > fasta files labeled non-overlapping file in the datastore. 
> > ?Carson > > On Nov 21, 2017, at 10:42 AM, Quanwei Zhang wrote: > > Dear Carson: > > Thank you for your comments and suggestions. Now the SNAP was trained with > repeat masked, is it necessary to retrain the predictor without repeat > masking? > By BUSCO analysis on the genome, the completeness is shown as below. Now I > am doing the analysis using the default reports of Maker2 (i.e., gene > models with evidence support, the default build). For the gene loss, > besides you suggestions I am also considering to do the analysis using the > gene models with evidence support plus those with scanned domains (i.e., > standard build). How do you think? > > > C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104 > 3902 Complete BUSCOs (C) > 3806 Complete and single-copy BUSCOs (S) > 96 Complete and duplicated BUSCOs (D) > 92 Fragmented BUSCOs (F) > 110 Missing BUSCOs (M) > > Thanks > Best > Quanwei > > > 2017-11-21 11:19 GMT-05:00 Carson Holt : > >> No known biases, but if you are concerned, you can collect known Histone >> H2A, H2B, H4 proteins and transcripts from other species (protein= and >> altest= options), them run MAKER with no masking to see if you gain any >> models that may have been overlooked because of over-masking of repeats. >> Make sure to evaluate any models you find as being a pseudogene. Run >> InterProScan on results to make sure they contain known InterPro domains >> for that gene family as well. Running without repeat masking will increase >> sensitivity but also false positives derived from low homology alignments >> to simple repeats which is why you need to evaluate results using something >> like InterProScan. >> >> Also run BUSCO to evaluate the completeness of the genome. Make sure that >> the observed contraction is not just a result of an incomplete assembly. >> >> ?Carson >> >> >> On Nov 16, 2017, at 12:46 PM, Quanwei Zhang >> wrote: >> >> Hello: >> >> We have annotated a new rodent genome using Maker2. 
Based on the >> annotated maker2 gene sets, we did gene family expansion/contraction >> analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under >> contraction. I wonder whether there are known bias to predict those gene >> families using Maker2? For example, can this due to repeat masking of the >> genome? I used repeatmaker and generated species specific repeat libraries >> follows http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repe >> at_Library_Construction--Basic. >> >> Thanks >> >> Best >> Quanwei >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Nov 28 16:39:47 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 28 Nov 2017 16:39:47 -0700 Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions In-Reply-To: <81D27009-2422-4116-848A-E2C862A74075@univie.ac.at> References: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com> <81D27009-2422-4116-848A-E2C862A74075@univie.ac.at> Message-ID: <768084A0-A5DA-4745-8151-D53AD0E495E3@gmail.com> Your patch will essentially just turn off all maker hint based gene prediction when no_abinit is turned on. We do not currently have a way to pass in external hints, but if you just want your hint based predictions to compete against MAKER hint based prediction, you can provide it as pred_gff while still letting MAKER run by giving the augustus_species file. ?Carson > On Nov 28, 2017, at 7:37 AM, Bob Zimmermann wrote: > > Dear Carson, > > Thanks for the response! Sorry for the slow reply. > > Actually what I meant was that I wanted to generate other types of hints that maker could not automatically use to prevent lower quality ab initio predictions from influencing the final output. 
Therefore I wanted to make my own ab intio predicitions prior to running maker, and then have maker to generate the transcript hints and then run augustus, finally synthesizing my own ab initio predicions with the maker hint-based ones. (In other words, just run the second round of augustus, not the first one.) > > I?ve attached a patch which seemed to allow me to tell maker to do what I wanted it to do. Am I missing something? > > Best, > Bob > > ? > > Department of Molecular Evolution and Development > Universit?t Wien > Althanstra?e 14 (UZA I), Zimmer 2.019 > 1090 Vienna > Austria > > +43 1 427757002 > > > >> On 13 Oct 2017, at 17:42, Carson Holt wrote: >> >> Hi Bob, >> >> pred_gff is a way to get models MAKER cannot run into the analysis. Input to pred_gff will not get hints since MAKER is not running the program. Setting augustus_species allows MAKER to run Augustus with and without hints and then those models compete against each other. You cannot just run with hints as the raw model is also used as a filter to help reduce false positive gene models that result from bad hints. If the gff3 you are providing is the same as the MAKER run of Augustus, I would recommend not providing it. If it is different in some way, then you can leave it in. If you run under MPI (it?s ok to run MPI on a single machine), then MAKER will parallelize the Augustus run by running multiple configs and contig chunks at the same time. >> >> Thanks, >> Carson >> >> >> >> >> >>> On Oct 11, 2017, at 1:42 PM, Bob Zimmermann wrote: >>> >>> Hello, >>> >>> I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments. >>> >>> My gene prediction section of the maker_opts.ctl file looks like this: >>> ... >>> augustus_species=all_combined #Augustus gene prediction species model >>> ... 
>>> pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file
>>> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
>>> est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
>>> protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
>>> ...
>>>
>>> It seems as though even if pred_gff is set, augustus will still be run for ab initio predictions with no hints if an augustus_species setting is present. I was curious if there was any way around this, partly because custom ab initios could improve my annotation and also because the ab initio step can take long.
>>>
>>> Thanks for your help!
>>>
>>> Bob

From robert.zimmermann at univie.ac.at Tue Nov 28 07:37:40 2017
From: robert.zimmermann at univie.ac.at (Bob Zimmermann)
Date: Tue, 28 Nov 2017 15:37:40 +0100
Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions
In-Reply-To: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com>
References: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com>
Message-ID: <81D27009-2422-4116-848A-E2C862A74075@univie.ac.at>

Dear Carson,

Thanks for the response! Sorry for the slow reply.

Actually, what I meant was that I wanted to generate other types of hints that maker could not automatically use, to prevent lower quality ab initio predictions from influencing the final output. Therefore I wanted to make my own ab initio predictions prior to running maker, then have maker generate the transcript hints and run augustus, and finally synthesize my own ab initio predictions with the maker hint-based ones. (In other words, just run the second round of augustus, not the first one.)
I've attached a patch which seemed to allow me to tell maker to do what I wanted it to do. Am I missing something?

Best,
Bob

--
Department of Molecular Evolution and Development
Universität Wien
Althanstraße 14 (UZA I), Zimmer 2.019
1090 Vienna
Austria

+43 1 427757002

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_noabinit.patch
Type: application/octet-stream
Size: 950 bytes
Desc: not available

> On 13 Oct 2017, at 17:42, Carson Holt wrote:
>
> Hi Bob,
>
> pred_gff is a way to get models from programs MAKER cannot run itself into the analysis. Input to pred_gff will not get hints since MAKER is not running the program. Setting augustus_species allows MAKER to run Augustus with and without hints, and then those models compete against each other. You cannot just run with hints, as the raw model is also used as a filter to help reduce false positive gene models that result from bad hints. If the gff3 you are providing is the same as the MAKER run of Augustus, I would recommend not providing it. If it is different in some way, then you can leave it in. If you run under MPI (it's ok to run MPI on a single machine), then MAKER will parallelize the Augustus run by running multiple configs and contig chunks at the same time.
>
> Thanks,
> Carson
>
>
>> On Oct 11, 2017, at 1:42 PM, Bob Zimmermann wrote:
>>
>> Hello,
>>
>> I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments.
>>
>> My gene prediction section of the maker_opts.ctl file looks like this:
>> ...
>> augustus_species=all_combined #Augustus gene prediction species model
>> ...
>> pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file
>> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
>> est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
>> protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
>> ...
>>
>> It seems as though even if pred_gff is set, augustus will still be run for ab initio predictions with no hints if an augustus_species setting is present. I was curious if there was any way around this, partly because custom ab initios could improve my annotation and also because the ab initio step can take long.
>>
>> Thanks for your help!
>>
>> Bob

From aircoolsky at gmail.com Thu Nov 30 23:20:20 2017
From: aircoolsky at gmail.com (Yu-Hsuan Cheng)
Date: Fri, 1 Dec 2017 14:20:20 +0800
Subject: [maker-devel] Changing the genetic code table in MAKER
Message-ID:

Hi,

This is YuHsuan Cheng, a PhD student from Taiwan. I want to use MAKER combined with SNAP to annotate a ciliate genome. The genetic code for ciliates is different from that of other species, so I am wondering whether there is any option in MAKER with which I can change the genetic code table. I also asked Dr. Korf about this issue; he said SNAP has no way to change the genetic code table. I will use Augustus combined with MAKER later on.

The pipeline I used previously is as follows.
1. MAKER (hints from proteome and RNAseq)
2. MAKER to ZFF
3. ~/bin/maker/exe/snap/hmm-assembler.pl snapFirst . > ../../snapFirst.hmm
and then used snapFirst.hmm as hints in MAKER

I look forward to your reply. Thank you.

Best wishes,
YuHsuan

Yu-Hsuan Cheng
Institute of Molecular Biology Academia Sinica 128 Academia road, Section 2 Nankang, Taipei 115 Taiwan Phone:+886-2-2789-9216 <+886%202%202789%209216> (Lab), +886-958-216-538 <+886%20958%20216%20538> (Mobile phone) d02b48008 at ntu.edu.tw -------------- next part -------------- An HTML attachment was scrubbed... URL: From eennadi at gmail.com Thu Nov 2 13:51:00 2017 From: eennadi at gmail.com (Emmanuel Nnadi) Date: Thu, 2 Nov 2017 20:51:00 +0100 Subject: [maker-devel] Error trying to submit genome to ncbi Message-ID: Hi, I am trying to submit my genome i annotated using maker and they sent back this error, 1. Please remove any N nucleotides from the beginning or end of the sequence 2.No feature should begin or end inside a gap. Instead the feature should be made partial at the gap boundary. [3] Coding regions should not be 5' partial if they begin with the start methionine. If this is an internal methionine int he translation than it is fine if they are partial. Conversely, all coding regions must have a stop codon or be 3' partial. You have a large number of gene features that are not associated with other features. Please include on these features in the gene description field some description of what the gene would have encoded. A feature table example of this is: <41156 >40652 gene gene_desc transposon locus_tag CR513_45338 note nonfunctional due to frameshift Please how can i use maker to solve this problem? Nnadi Nnaemeka Emmanuel Department of Microbiology, Faculty of Natural and Applied Science, Plateau State University, Bokkos, Plateau State, Nigeria. Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dandence at gmail.com Thu Nov 2 14:08:54 2017 From: dandence at gmail.com (Daniel Ence) Date: Thu, 2 Nov 2017 16:08:54 -0400 Subject: [maker-devel] Error trying to submit genome to ncbi In-Reply-To: References: Message-ID: Hi, I think you?ve posted before about issues 1 and 2 from the NCBI. The note for issue 3 from NCBI sounds like there are gene features that don?t have associated transcript, CDS or exon features. I?m not certain how that could be a result from MAKER. It might be something that someone else created (manually or with another tool), and then passed to maker from a GFF file. In the example included in your email, it looks like these offending genes are transposons that have been annotated as genes. If that is the case for the rest of the offending genes, then I would suggest changing the ?type? field (column 3) from ?gene? to something else, like ?transposable_element? perhaps. ~Daniel > On Nov 2, 2017, at 3:51 PM, Emmanuel Nnadi wrote: > > Hi, > > I am trying to submit my genome i annotated using maker and they sent back this error, > 1. Please remove any N nucleotides from the beginning or end of the sequence > 2.No feature should begin or end inside a gap. Instead the feature should > be made partial at the gap boundary. > > [3] Coding regions should not be 5' partial if they begin with the start > methionine. If this is an internal methionine int he translation than > it is fine if they are partial. Conversely, all coding regions > must have a stop codon or be 3' partial. > You have a large number of gene features that are not associated > with other features. Please include on these features in the > gene description field some description of what the gene would > have encoded. > > A feature table example of this is: > > <41156 >40652 gene > gene_desc transposon > locus_tag CR513_45338 > note nonfunctional due to frameshift > Please how can i use maker to solve this problem? 
> > > Nnadi Nnaemeka Emmanuel > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. > Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1356 bytes Desc: not available URL: From dandence at gmail.com Thu Nov 2 14:24:31 2017 From: dandence at gmail.com (Daniel Ence) Date: Thu, 2 Nov 2017 16:24:31 -0400 Subject: [maker-devel] Error trying to submit genome to ncbi In-Reply-To: References: Message-ID: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com> Hi, Thank you for sending me your data, but which ones are the offending genes that NCBI is complaining about? Can you identify the problem that NCBI is giving in some subset of the gene features? ~Daniel > On Nov 2, 2017, at 4:20 PM, Emmanuel Nnadi wrote: > > Hi Daniel thanks for your reply. > > I have attached my .tbl file > > you would see > <77753 >77549 gene > locus_tag CR513_00193 > gene AtMg00820 > note nonfunctional due to frameshift > > > Is another example. > > Its becoming frustrating. > > I have not posted the two errors before > [1] Please remove any N nucleotides from the beginning or end of the sequence. > > [2] No feature should begin or end inside a gap. Instead the feature should > be made partial at the gap boundary. > > [3] Coding regions should not be 5' partial if they begin with the start > methionine. If this is an internal methionine int he translation than > it is fine if they are partial. Conversely, all coding regions > must have a stop codon or be 3' partial. 
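One way to pin down which gene features have no associated transcript, CDS or exon features, as asked above, is to scan the MAKER GFF3 for gene IDs that never appear as a Parent. The following Python sketch assumes standard GFF3 with ID= and Parent= attributes in column 9; the function name childless_genes is hypothetical and the sketch is not part of MAKER.

```python
def childless_genes(gff3_lines):
    """Return IDs of 'gene' features that no other feature names as Parent."""
    gene_ids = []
    parents = set()
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9:
            # Skips FASTA sequence lines and anything else that is not a feature row.
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if cols[2] == "gene" and "ID" in attrs:
            gene_ids.append(attrs["ID"])
        # A feature may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                parents.add(parent)
    return [gene_id for gene_id in gene_ids if gene_id not in parents]
```

Running this over the lines of the GFF3 file and comparing the reported IDs with the locus tags NCBI flagged would show whether the orphan genes already exist in the MAKER output or were introduced in a later conversion step.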
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1356 bytes
Desc: not available
URL: 

From dandence at gmail.com Thu Nov 2 14:46:03 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 16:46:03 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To: 
References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>
Message-ID: 

These gene features with the "nonfunctional due to frameshift" note indeed do not have other features associated with them in the tbl files. Is this reflected in the gff3 files that maker produced for these annotations? I'm not certain how maker would make a gene without a CDS or mRNA, but identifying those discrepancies would be a place to start understanding what has happened.

> On Nov 2, 2017, at 4:30 PM, Emmanuel Nnadi wrote:
>
> Hi Daniel,
>
> This is the mail they sent to me
>
> [1] Please remove any N nucleotides from the beginning or end of the sequence.
>
> [2] No feature should begin or end inside a gap. Instead the feature should
> be made partial at the gap boundary.
>
> [3] Coding regions should not be 5' partial if they begin with the start
> methionine.
> If this is an internal methionine in the translation then
> it is fine if they are partial. Conversely, all coding regions
> must have a stop codon or be 3' partial.
>
> [4] You have a large number of gene features that are not associated
> with other features. Please include on these features in the
> gene description field some description of what the gene would
> have encoded.
>
> A feature table example of this is:
>
> <41156 >40652 gene
> gene_desc transposon
> locus_tag CR513_45338
> note nonfunctional due to frameshift
>
> [5] Every coding region must have a corresponding mRNA and in
> every case the mRNA product name must match exactly that of the
> CDS feature.
>
> 2 coding regions do not have an mRNA
> ORIG/combined_1-5000.sqn:CDS cytochrome c oxidase subunit 2 (contig_100:<38458-39198, 40429->40623) CR513_00692
> ORIG/combined_1-5000.sqn:CDS cytochrome c oxidase subunit 1 (contig_100:c>113064-111485, c111245-111221) CR513_00691
>
> So I just went to the .tbl file and searched for "nonfunctional due to frameshift". There are quite a lot of them, and I have two more .tbl files.
>
> I used GAG annotation to remove NNN and to add start and stop codons, but NCBI still complained.
>
> I have run out of ideas.
>
> Please help me
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1356 bytes
Desc: not available
URL: 

From carsonhh at gmail.com Thu Nov 2 14:48:40 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 2 Nov 2017 14:48:40 -0600
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To: 
References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>
Message-ID: <56DF0ADA-40DA-4C88-AD37-BF63D8BCFD22@gmail.com>

If you modified the fasta files to remove N's etc. after they were annotated, then that would generate a mismatch between the GFF3 coordinates and the fasta sequence. Have you modified or split contigs in the assembly in any way? I seem to remember you posting an issue about the fasta submission to NCBI previously.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dandence at gmail.com Thu Nov 2 15:07:01 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 17:07:01 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To: 
References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>
Message-ID: 

Hi Emmanuel, I recommend looking into what Carson suggested. If you edited the fasta files of the transcripts or the reference genome to remove the "NNN" characters and then resubmitted without changing the gff3 coordinates, that would result in these kinds of errors.

~Daniel

> On Nov 2, 2017, at 5:02 PM, Emmanuel Nnadi wrote:
>
> muc_functional.blast.gff

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
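
A minimal sketch of the coordinate mismatch described above, assuming Ns were trimmed only from the 5' end of a contig after annotation. The helper names (leading_n_count, shift_gff3_line) are hypothetical and are not part of MAKER or GAG; the only thing taken as given is the standard GFF3 layout (seqid in column 1, start and end in columns 4 and 5). Trimming at the 3' end does not move coordinates, but a feature that reached into the removed bases would still have to be made partial by hand, and pragma lines such as ##sequence-region are left untouched here.

```python
def leading_n_count(seq: str) -> int:
    """Count the N/n characters at the very start of a sequence."""
    return len(seq) - len(seq.lstrip("Nn"))


def shift_gff3_line(line: str, offsets: dict) -> str:
    """Shift GFF3 start/end (columns 4 and 5) by the number of bases
    trimmed from the 5' end of the contig named in column 1.

    `offsets` maps contig name -> number of leading bases removed
    (an assumption of this sketch, built by the caller).
    """
    # Comments, pragmas, blank lines and malformed rows pass through unchanged.
    if line.startswith("#") or not line.strip():
        return line
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line

    shift = offsets.get(cols[0], 0)
    start = int(cols[3]) - shift
    end = int(cols[4]) - shift
    if start < 1:
        # The feature began inside the trimmed run of Ns; it needs manual review.
        raise ValueError(f"feature {cols[8]} starts inside the trimmed region")

    cols[3], cols[4] = str(start), str(end)
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(cols) + newline


if __name__ == "__main__":
    # Toy contig with four leading Ns and one gene feature.
    offsets = {"contig_1": leading_n_count("NNNNACGTACGT")}
    row = "contig_1\tmaker\tgene\t10\t50\t.\t+\t.\tID=gene1\n"
    print(shift_gff3_line(row, offsets), end="")
```

If the shifted coordinates and the trimmed fasta agree, a tbl file regenerated from them should no longer pick up frameshifts that come purely from the offset.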
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1356 bytes
Desc: not available
URL: 

From dandence at gmail.com Thu Nov 2 15:56:24 2017
From: dandence at gmail.com (Daniel Ence)
Date: Thu, 2 Nov 2017 17:56:24 -0400
Subject: [maker-devel] Error trying to submit genome to ncbi
In-Reply-To: 
References: <5EB1FECF-535B-447D-AFCF-E13174DB4232@gmail.com>
Message-ID: <20FE86D2-2431-4CD8-B4E1-E700F723760C@gmail.com>

Hi Emmanuel, Please "reply all" in these exchanges so that they'll stay stored on the maker-devel list for others to find in the future. It also helps keep the conversation open so that others can chime in and help out too. :)

I looked at several of the "nonfunctional due to frameshift" genes and they have associated features in the gff3 file. So there might be a frameshift issue in the original annotations, but I'd doubt that, or a frameshift error might be getting introduced when you convert to the tbl format.

> On Nov 2, 2017, at 5:12 PM, Emmanuel Nnadi wrote:
>
> Hi Daniel
>
> NCBI first complained of this even when I hadn't used GAG annotation to remove N's,
>
> On my raw file they complained about this
>
> Nnadi Nnaemeka Emmanuel
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1356 bytes
Desc: not available
URL: 

From o.k.torresen at ibv.uio.no Thu Nov 9 02:44:06 2017
From: o.k.torresen at ibv.uio.no (Ole Kristian Tørresen)
Date: Thu, 9 Nov 2017 09:44:06 +0000
Subject: [maker-devel] substr outside of string in PhatHits_utils.pm
Message-ID: 

Dear all,
I'm having an issue with MAKER which I'm unable to wrap my head around. Hopefully the issue is easily identifiable and resolvable for someone with more insight than me. Please find the log output below. I cannot find any more information than this in any logs. Many scaffolds do complete fine, but some of the longest ones have issues.

Thank you.

Sincerely,
Ole K. Tørresen

Error message:

#--------- command -------------#
Widget::augustus:
/projects/cees/bin/augustus/augustus-3.2.3/bin/augustus --strand=backward --species=gadMor2_code_braker2 --UTR=off --hintsfile=/tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.xdef.augustus --extrinsicCfgFile=/projects/cees/bin/augustus/augustus-3.2.3/config/extrinsic/extrinsic.MPE.cfg --AUGUSTUS_CONFIG_PATH=/projects/cees/bin/augustus/augustus-3.2.3/config /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus.fasta > /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus
#-------------------------------#
deleted:0 genes
begin called get_best_alt_splices1
...processing 0 of 2
...processing 1 of 2
end called get_best_alt_splices1
...processing 0 of 20
...processing 1 of 20
...processing 2 of 20
...processing 3 of 20
...processing 4 of 20
...processing 5 of 20
...processing 6 of 20
...processing 7 of 20
...processing 8 of 20
...processing 9 of 20
...processing 10 of 20
...processing 11 of 20
...processing 12 of 20
...processing 13 of 20
...processing 14 of 20
...processing 15 of 20
...processing 16 of 20
...processing 17 of 20
...processing 18 of 20
...processing 19 of 20
substr outside of string at /projects/cees/bin/maker/maker-3.1.1/bin/../lib/PhatHit_utils.pm line 850.
--> rank=NA, hostname=compute-31-18.local
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:GmG20150304_scaffold_8692

ERROR: Chunk failed at level:6, tier_type:0
FAILED CONTIG:GmG20150304_scaffold_8692

examining contents of the fasta file and run log

From lcampbell at ebi.ac.uk Thu Nov 9 04:13:35 2017
From: lcampbell at ebi.ac.uk (Lahcen Campbell)
Date: Thu, 9 Nov 2017 11:13:35 +0000
Subject: [maker-devel] Model training with AED=0.7 made all contigs FAILED
Message-ID: 

Hi folks,

I would just like some insight into a recent round of MAKER annotation that I performed, which returned 0 finished contigs.
The genome is a white fly, on which I successfully ran MAKER initially with the first round of "Evidence in", i.e. passing in EST evidence as aligned transcript gffs, protein homology evidence etc. The run was successful and produced a lot of good quality gene models.

Statistics:
24,613 genes with 49,547 transcripts containing 141130 cds.

Now, I know this count is very high for our species, so in the 2nd round (which completed overnight because all contigs failed) I attempted to increase the threshold for support by reducing AED to 0.7 from an initial 1. Prior to starting the second round I had trained SNAP on the first round results and also ran Augustus separately, and passed these in via the snaphmm and pred_gff options. Finally I set min protein to be no less than 100 aa and set est2genome and prot2genome off to allow for gene model refinement.

I checked the run today and all ~8,000 contigs/scaffolds returned as FAILED, each having been retried once.

My initial fear was that I had just lost my initial set of 24,613 gene models. I now believe that this won't be the case but I'm not sure...
Can anyone explain what might have happened here and what consequences will follow given they all returned as failed? Have they been deleted from the MAKER data store?

I had captured all 1st round MAKER output files (GFF, Fasta files etc) before attempting this 2nd round (i.e. 1st round of model training) of MAKER.

If I have irrevocably changed the datastore for MAKER and lost those genes, might I be able to restore to an earlier point (say back to the first round of evidence in gene models) by passing the first MAKER gff in as "maker_gff=" / "pred_pass=1" / "model_pass=1"?

Any advice on this would be much appreciated
Lahcen

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lahcencampbell at gmail.com Thu Nov 9 07:53:19 2017
From: lahcencampbell at gmail.com (lahcen campbell)
Date: Thu, 9 Nov 2017 14:53:19 +0000
Subject: [maker-devel] Model training with AED=0.7 made all contigs FAILED
Message-ID: 

Apologies, this message was sent earlier today from an incorrect email address so it was flagged for verification.

Hi folks,

I would just like some insight into a recent round of MAKER annotation that I performed, which returned 0 finished contigs.
The genome is a white fly, on which I successfully ran MAKER initially with the first round of "Evidence in", i.e. passing in EST evidence as aligned transcript gffs, protein homology evidence etc. The run was successful and produced a lot of good quality gene models.

Statistics:
24,613 genes with 49,547 transcripts containing 141130 cds.

Now, I know this count is very high for our species, so in the 2nd round (which completed overnight because all contigs failed) I attempted to increase the threshold for support by reducing AED to 0.7 from an initial 1. Prior to starting the second round I had trained SNAP on the first round results and also ran Augustus separately, and passed these in via the snaphmm and pred_gff options.
Finally I set min protein to be no less than 100 aa and set est2genome and prot2genome off to allow for gene model refinement.

I checked the run today and all ~8,000 contigs/scaffolds returned as FAILED, each having been retried once. (Note: I reran it with the AED reverted to 1, yet the same outcome happened again.)

The following error appears throughout the log file:

MAKER WARNING: The file MAKER.contigs_datastore/BF/41/tig00000234//theVoid.tig00000234/0/tig00000234.0.all.rb.out
did not finish on the last run and must be erased

My initial fear was that I had just lost my initial set of 24,613 gene models. I now believe that this won't be the case but I'm not sure... Can anyone explain what might have happened here and what consequences will follow given they all returned as failed? Have they been deleted from the MAKER data store? Are they retrievable?

I had captured all 1st round MAKER output files (GFF, Fasta files etc) before attempting this 2nd round (i.e. 1st round of model training) of MAKER.

If I have irrevocably changed the datastore for MAKER and lost those genes, might I be able to restore to an earlier point (say back to the first round of evidence in gene models) by passing the first MAKER gff in as "maker_gff=" / "pred_pass=1" / "model_pass=1"?

As it stands, maker2zff and fasta_merge / gff3_merge all return nothing or empty output files. So clearly my gene models have been altered somehow.

Any advice on this would be much appreciated.
Lahcen

--
==========================================
> Dr. Lahcen Campbell <
> Contact: lahcencampbell at gmail.com <
> https://www.ebi.ac.uk/about/people/lahcen-campbell <
==========================================

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carsonhh at gmail.com Thu Nov 9 09:28:19 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Nov 2017 09:28:19 -0700
Subject: [maker-devel] substr outside of string in PhatHits_utils.pm
In-Reply-To: 
References: 
Message-ID: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com>

My first guess is that if you are using gff3 files as input to anything, then there may be an issue with your GFF3 file. My second suggestion is to try MAKER 3.02.02 to see if it has the same issue.

--Carson

From carsonhh at gmail.com Thu Nov 9 16:30:50 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 9 Nov 2017 16:30:50 -0700
Subject: [maker-devel] Model training with AED=0.7 made all contigs FAILED
In-Reply-To: 
References: 
Message-ID: 

There is probably an issue with the GFF3 file being passed in (I'm guessing the Augustus one). I would avoid passing in Augustus results as GFF3; it removes the ability of MAKER to dynamically provide Augustus with hints as it runs. You are essentially handicapping the pipeline.

If your first genes were est2genome or protein2genome based, I would not pass them back in. Those models are suitable for training but will really reduce the accuracy of downstream final annotations (that is why we tell people in the MAKER documentation to turn off est2genome/protein2genome after training a gene predictor). Also, if your inputs to the first round were GFF3 files, they will have to be reread regardless.
Any protein or transcript data that was aligned by MAEKR will still have the BLAST results archived, so you don?t need to worry about that unless you alter repeat masking options (which would cause it to rerun). Also if you are changing GFF3 file input between runs but using the same directory, you might want to delete any ?.db? files in the output folder. those hold an SQLite database of the GFF3 input that may be corrupted if it failed while attempting to update the database content with the Augustus gff3 file. ?Carson > On Nov 9, 2017, at 4:13 AM, Lahcen Campbell wrote: > > Hi folks, > > I would just like some insight into a recent round of MAKER annotation I performed and returned back 0 Finished contigs. > The genome is a white fly, which I successfully ran MAKER on initally with the first round of "Evidence in", so passing in EST evidence as aligned transcript gffs, protein homology evidence etc. The run was successful and produced a lot of good quality gene models > > > Statistics: > 24,613 genes with 49,547 transcripts containing 141130 cds. > > Now, I know this count is very high for our species, so in the 2nd round (completed running over 1 night due to all contigs failing) I attempted to increase the threshold for support, by reducing AED to 0.7 from an initial 1. Prior to starting the second round I had trained SNAP on the first round results and also ran Augustus separately and passed this via the snaphmm, pred_gff option. Finally I set min protein to be no less than 100Aa and set est2genome and prot2genome off to allow for gene model refinement. > > I checked the run today and all ~8,000 contigs/scaffolds returned as FAILED with all having tried to be retried once each. > > My initial feeling was, I feared I have just lost my initial set of 24,613 gene models. I know believe that this won't be the case but Im not sure... Can anyone explain what might have happened here and what consequences will follow given they all returned as failed ? 
> Have they been deleted from the MAKER data store?
>
> I had captured all 1st round MAKER output files (GFF, Fasta files etc) before attempting this 2nd round (i.e. 1st round of model training) of MAKER.
>
> If I have irrevocably changed the datastore for MAKER and lost those genes, might I be able to restore to an earlier point (say back to the first round of evidence in gene models) by passing the first MAKER gff in as "maker_gff=" / "pred_pass=1" / "model_pass=1"?
>
> Any advice on this would be much appreciated
> Lahcen

From lahcencampbell at gmail.com Tue Nov 14 05:15:10 2017
From: lahcencampbell at gmail.com (lahcen campbell)
Date: Tue, 14 Nov 2017 12:15:10 +0000
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
Message-ID:

Hi MAKER community,

I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3

Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So it's not that this version has had issues to date.

Basically, this is a brand new MAKER analysis; I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage.

I have a single genome file with 10 long scaffolds making up just under 11MB (subsampled from my original full length assembly) of sequence data on which to train SNAP.
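A genome file like the one described here can be checked for empty records independently of MAKER by listing the length of every entry. The following is only a minimal Python sketch, assuming a plain uncompressed FASTA file; the toy records are invented for illustration, and treating the first whitespace-delimited header token as the sequence ID is an assumption about how IDs are read.

```python
import io


def fasta_lengths(handle):
    """Return (seq_id, length) for every FASTA record, in file order."""
    records = []
    seq_id, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, even if it had no sequence.
            if seq_id is not None:
                records.append((seq_id, length))
            fields = line[1:].split()
            # Assumption: the ID is the first whitespace-delimited header token.
            seq_id = fields[0] if fields else ""
            length = 0
        elif seq_id is not None:
            length += len(line)
    if seq_id is not None:
        records.append((seq_id, length))
    return records


def empty_records(records):
    """Return the IDs of records that have a header but no sequence."""
    return [seq_id for seq_id, length in records if length == 0]


# Toy input (hypothetical records): tig_b has a header and no sequence.
toy_fasta = io.StringIO(
    ">tig_a len=8 class=contig\nACGTACGT\n"
    ">tig_b len=0 class=contig\n"
    ">tig_c len=6 class=contig\nACG\nTTT\n"
)
records = fasta_lengths(toy_fasta)
for seq_id, length in sorted(records, key=lambda r: r[1], reverse=True):
    print(seq_id, length)
print("empty:", empty_records(records))
```

For a real assembly, the same function can be called on an open file handle, e.g. `with open("genome.fasta") as fh: records = fasta_lengths(fh)` (the file name is a placeholder).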
The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc.

I have only edited the maker_opts file and changed:

genome=
protein=
protein2genome=1

But see attached my maker CTL files.

The error consistently returned to me:

Skipping the contig because it is too short!!
SeqID: contig_WHATEVER
Length: 0

The sequences are nowhere near too short. This was verified independently outside maker to be sure.

The headers are as follows:

>tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no

I have just about given up. I have no idea why it's happening; it makes zero sense.

Any help or information as to why this might be happening would be amazing.

Thank you in advance.
Lahcen

--
==========================================
> Dr. Lahcen Campbell <
> Contact: lahcencampbell at gmail.com <
> https://www.ebi.ac.uk/about/people/lahcen-campbell <
==========================================
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1413 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1512 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 5560 bytes
Desc: not available
URL:

From michael.s.campbell1 at gmail.com Tue Nov 14 08:08:43 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 14 Nov 2017 10:08:43 -0500
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References:
Message-ID: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com>

Hi Lahcen,

Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try to recreate the error and debug it.

Thanks,
Mike

From michael.s.campbell1 at gmail.com Tue Nov 14 10:04:04 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Tue, 14 Nov 2017 12:04:04 -0500
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com>
Message-ID:

Hi Lahcen,

Thanks, the name has served me well for a number of years now :)

So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see; looks like it's not the case for this one.

To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it to the nodes and with and without mpi.

One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable).

As far as rerunning MAKER there are a couple of approaches.
If you want it to stop complaining about trying too many times on failed contigs you can increase the number of tries in the opts file. The line looks like this:

tries=2 #number of times to try a contig if there is a failure for some reason

If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting, you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:

rm_gff= #pre-identified repeat elements from an external GFF3 file

you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line:

protein_gff= #aligned protein homology evidence from an external GFF3 file

Don't pass the blast gff3 data in. If you pass in gff3 data to maker it assumes that it is polished and will not make any effort to fix alignments. The protein2genome data is polished. est2genome is the equivalent for EST input.

Clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory. clean_try deletes everything first and starts again; it is the one that deletes everything and pretends that the first run never happened.

I cc'd the list on this response just in case anyone else has any ideas or is facing the same error.

Let me know if any of this helps,
Mike

> On Nov 14, 2017, at 10:48 AM, lahcen campbell wrote:
>
> Hi Michael
>
> Nice name btw, I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help.
>
> I have attached the file in question below. It's so strange, I had to just leave it alone because it was making me quite frustrated. Those bugs for which there are no common sense solutions are the worst, because very easily you reach a wall.
>
> Might it have anything at all to do with the protein homology file I passed in? Though, note.... the same protein files here have been used in another maker run without issue so I kind of ruled that out already.....but just spitballing at this stage.
>
> Might I be so cheeky as to ask you one more MAKER related question Michael...? Feel free to ignore it, I hate to push but I'm desperate to figure it out with little time to do so...
>
> I have an issue with a different MAKER analysis. Currently any new run I attempt on this datastore fails. It has one successful round with 25000 odd genes and double the transcripts. I attempted to run the second round with a SNAP trained hmm (first time passing in a SNAP hmm following first round EST/protein evidence). In this attempt, because we obtained so many genes, I thought I would be more stringent by changing the AED to 0.7 from 1.0. Something I see now I didn't approach in the right way... too late now sadly.
>
> MAKER finishes fine, but now it views all previous scaffolds as FAILED. Nothing seems to change this and now the datastore is for all intents and purposes locked in a failed state. It keeps mentioning changes to the opts file, which there were, and that the previous runs didn't finish so it must delete them. The results obtained from round 1 are still there though, I'm pretty sure of that; all blast files etc are still there and populated.
>
> Can you tell me the main differences between clean_up and clean_try, and which will completely and irreversibly wipe the first run? Something I don't want to repeat, just allow me to progress to the next round. I'm hesitant to run them, but I've backed up the datastore in case.
> My next attempt will be to pass the exact same maker_opts file from the round1 run, with the only change made to clean_try/clean_up.... Is this approach misguided?
>
> Your help is very much appreciated Michael so thank you,
> Best
> L
>
> Combined_Protein_homology.fa.zip
> SubsampledGenomeFile_n10_11MB.fasta

From carsonhh at gmail.com Tue Nov 14 10:17:03 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Nov 2017 10:17:03 -0700
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References:
Message-ID:

My first thought is that one of your entries has a header and no sequence. Try this command with the fasta you are using:

    fasta_tool file.fasta --length | sort -nrk2

fasta_tool comes with maker. That command will report empty fasta entries at the bottom of the list with length 0.

Alternatively, MAKER accesses the input assembly using BioPerl. Update your BioPerl to the latest CPAN version (do not use BioPerl-live, as it will be less stable). Also BioPerl is using BerkeleyDB for indexing, so if you are using a Perl that is not the system Perl (i.e. /usr/bin/perl), then it was likely compiled on the machine you are using and could have been compiled without BerkeleyDB support.

--Carson

From lahcencampbell at gmail.com Wed Nov 15 09:32:02 2017
From: lahcencampbell at gmail.com (lahcen campbell)
Date: Wed, 15 Nov 2017 16:32:02 +0000
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com>
Message-ID:

Hi Michael and Carson

Thank you both for your helpful input, I really appreciate it. See below for my comments...

Best
Lahcen

On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell <michael.s.campbell1 at gmail.com> wrote:

> Hi Lahcen,
>
> Thanks, the name has served me well for a number of years now :)

It's a good name, I wouldn't change it haha :)

> So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see; looks like it's not the case for this one.
>
> To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it to the nodes and with and without mpi.
>
> One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable).

First off, I can report good news regarding the 0-length contigs I was getting back. Carson, your thoughts on BioPerl conflict issues seemed to be the main issue. Our cluster software environment had gone through some changes of late, so working off the basis of that I was able to load the right bash config, which resulted in no more 0-length contig errors. Huzzah !!

> As far as rerunning MAKER there are a couple of approaches. If you want it to stop complaining about trying too many times on failed contigs you can increase the number of tries in the opts file. The line looks like this:
>
> tries=2 #number of times to try a contig if there is a failure for some reason
>
> If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting, you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:

Can I ask, because I can't seem to find any concrete info on best practices for parsing MAKER gffs to partition the various source column fields as you described, Michael: is there a commonly used way to partition MAKER gffs based on source column? Or will I need to code it up? I ask because I feel this must have been needed many times before by other users.

> rm_gff= #pre-identified repeat elements from an external GFF3 file

I will remove links to fasta files for both 'rmlib=' and 'repeat_protein='

> you can turn off model_org= and repeat_protein=. This will speed up the next run a lot.
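The "break up that file by the source column" step discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool that ships with MAKER: it assumes a tab-delimited GFF3 with the source in column 2, it stops at an embedded `##FASTA` section if one is present, and the toy feature lines and output file names are hypothetical.

```python
from collections import defaultdict


def split_gff3_by_source(lines):
    """Group GFF3 feature lines by their source (column 2)."""
    by_source = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        by_source[cols[1]].append(line)
    return dict(by_source)


def write_subset(by_source, sources, path):
    """Write the feature lines of the chosen sources to one GFF3 file."""
    with open(path, "w") as out:
        out.write("##gff-version 3\n")
        for source in sources:
            for line in by_source.get(source, []):
                out.write(line + "\n")


# Toy input (hypothetical features) using source names seen in a MAKER GFF3.
toy = [
    "##gff-version 3",
    "scaf1\trepeatmasker\tmatch\t1\t50\t.\t+\t.\tID=r1",
    "scaf1\tprotein2genome\tprotein_match\t100\t400\t.\t+\t.\tID=p1",
    "scaf1\trepeatrunner\tprotein_match\t500\t600\t.\t-\t.\tID=r2",
    "scaf1\tblastx\tprotein_match\t100\t390\t.\t+\t.\tID=b1",
    "##FASTA",
    ">scaf1",
    "ACGT",
]
groups = split_gff3_by_source(toy)
print({source: len(feats) for source, feats in groups.items()})

# Possible use on a real merged file (file names are placeholders):
#   with open("merged.gff") as fh:
#       groups = split_gff3_by_source(fh)
#   write_subset(groups, ["repeatmasker", "repeatrunner"], "repeats.gff")
#   write_subset(groups, ["protein2genome"], "protein2genome.gff")
```

Following the advice quoted above, the repeat file would go on the `rm_gff=` line and the protein2genome file on the `protein_gff=` line, while the `blastx` entries are simply left out.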
> Then you can pass in the protein2genome gff3 data on this line:
>
> protein_gff= #aligned protein homology evidence from an external GFF3 file
>
> Don't pass the blast gff3 data in. If you pass in gff3 data to maker it assumes that it is polished and will not make any effort to fix alignments. The protein2genome data is polished. est2genome is the equivalent for EST input.

You say don't pass the blast as gff. As I pass in all other info via GFF3 and remove any evidence as fasta inputs... BLAST won't be called again, right? Ensuring the shortest possible rerun of MAKER to roll back to an uncorrupted state.

I noticed that the only unique source field types in my MAKER GFF are as follows:

augustus_masked
blastx
maker
protein2genome
repeatmasker
repeatrunner

I read on the dev group that passing est evidence as GFF won't actually call Exonerate; the est2genome option just tells MAKER to try and turn polished EST alignments directly into genes.... so if I pass this info again as GFF it will simply use the same info as it did originally and not have to recompute anything?

Based on the above fields contained in my MAKER gff, which of the following options should I select to re-annotate based on this older run? I suspect all the options below in green should be set to 1, and the others in red set to 0.

#-----Re-annotation Using MAKER Derived GFF3
.....
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

I don't think I will pass back anything under augustus_masked as I didn't set that up correctly initially, instead passing in a precomputed augustus gff which I'm told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside maker itself on the 2nd pass. Note though, I am aware of the order of things normally, but for this instance I will continue with what I have done with success previously.

Lastly, as this next run will be updating based on previously generated MAKER gff data.... what states should est2genome and protein2genome be in? 1 or 0?

Apologies for the lengthy email reply Michael. Much appreciated again, thank you !!
L

From lahcencampbell at gmail.com Wed Nov 15 09:56:20 2017
From: lahcencampbell at gmail.com (lahcen campbell)
Date: Wed, 15 Nov 2017 16:56:20 +0000
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com>
Message-ID:

Just an add on to this topic.... I have found a suite of gff utilities here which I hope can help me quickly parse the MAKER gff.

https://github.com/mamarjan/gff3-pltools

I'll report back how it goes !

Best
L

On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell <michael.s.campbell1 at gmail.com> wrote:

> Hi Lahcen,
>
> Thanks, the name has served me well for a number of years now :)
>
> So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see; looks like it's not the case for this one.
>
> To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it to the nodes and with and without MPI.
>
> One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable).
>
> As far as rerunning MAKER there are a couple of approaches. If you want it to stop complaining about trying too many times on failed contigs you can increase the number of tries in the opts file. The line looks like this:
>
> tries=2 #number of times to try a contig if there is a failure for some reason
>
> If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting, you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:
>
> rm_gff= #pre-identified repeat elements from an external GFF3 file
>
> you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line:
>
> protein_gff= #aligned protein homology evidence from an external GFF3 file
>
> Don't pass the blast gff3 data in. If you pass in gff3 data, MAKER assumes that it is polished and will not make any effort to fix alignments. The protein2genome data is polished. est2genome is the equivalent for EST input.
>
> Clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory. clean_try deletes everything first, and starts again. clean_try is the one that deletes everything and pretends that the first run never happened.
>
> I cc'd the list on this response just in case anyone else has any ideas or is facing the same error.
>
> Let me know if any of this helps,
> Mike

--
==========================================
> Dr. Lahcen Campbell <
> Contact: lahcencampbell at gmail.com <
> https://www.ebi.ac.uk/about/people/lahcen-campbell <
==========================================
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From qwzhang0601 at gmail.com Thu Nov 16 12:46:39 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Thu, 16 Nov 2017 14:46:39 -0500
Subject: [maker-devel] About loss of Histone H2A, H2B, H4
Message-ID:

Hello:

We have annotated a new rodent genome using Maker2. Based on the annotated Maker2 gene sets, we did gene family expansion/contraction analysis using CAFE. We found that the Histone H2A, H2B and H4 gene families are under contraction. I wonder whether there is a known bias in predicting those gene families using Maker2? For example, could this be due to repeat masking of the genome? I used repeatmaker and generated species-specific repeat libraries following http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic .

Thanks

Best
Quanwei
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mcsimenc at gmail.com Fri Nov 17 18:39:25 2017
From: mcsimenc at gmail.com (Matt Simenc)
Date: Fri, 17 Nov 2017 17:39:25 -0800
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
Message-ID:

Hi everybody,

I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?

MAKER 2.31.8

I looked at a standalone RepeatMasker run I did and the features are more evenly distributed between the +/- strands.

Matt
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Nov 17 19:09:20 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Nov 2017 19:09:20 -0700
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
In-Reply-To:
References:
Message-ID: <0DC818BC-EA36-43EA-9237-003BE07C4434@gmail.com>

While transposons that encode proteins will technically have a strand, simple repeats and many others do not, so the algorithms used to find them will not necessarily assign a strand. For this reason the repeats are treated as strandless: both strands are masked, and the features are arbitrarily assigned to the plus strand to avoid issues with genome browsers that cannot handle strandless features.

--Carson

> On Nov 17, 2017, at 6:39 PM, Matt Simenc wrote:
>
> Hi everybody,
>
> I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?
>
> MAKER 2.31.8
>
> I looked at a standalone RepeatMasker run I did and the features are more evenly distributed between the +/- strands.
>
> Matt

From carsonhh at gmail.com Fri Nov 17 19:23:34 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Nov 2017 19:23:34 -0700
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
In-Reply-To:
References:
Message-ID:

Also MAKER clusters overlapping repeats to generate the best masking of the assembly. For the GFF3 it then assigns the name of the repeat encompassing the greatest portion of the cluster to the feature (i.e. the best representative). But the cluster is technically built from overlapping repeats on both strands (repeats tend to jump on top of other repeats, so they stack with bits and pieces of other repeats at the edges). Yet another reason why everything is just assigned to the plus strand.
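
For anyone who wants to check the strand distribution in their own output, here is a minimal Python sketch that tallies strands (column 7) for one source (column 2). It assumes the repeat features carry `repeatmasker` in the source column and `match` in the type column, which should be verified against your own file; the records in `example` are made up for illustration and are not real MAKER output.

```python
from collections import Counter


def strand_counts(gff_lines, source="repeatmasker", feature_type="match"):
    """Count features per strand for one source (column 2) and type (column 3)."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        if cols[1] == source and cols[2] == feature_type:
            counts[cols[6]] += 1  # column 7 holds the strand
    return counts


# Made-up records, for illustration only.
example = [
    "##gff-version 3",
    "contig1\trepeatmasker\tmatch\t100\t250\t.\t+\t.\tID=r1",
    "contig1\trepeatmasker\tmatch_part\t100\t250\t.\t+\t.\tID=r1p;Parent=r1",
    "contig1\trepeatmasker\tmatch\t400\t900\t.\t+\t.\tID=r2",
    "contig1\trepeatmasker\tmatch\t1200\t1300\t.\t-\t.\tID=r3",
    "contig1\tmaker\tgene\t2000\t3000\t.\t-\t.\tID=g1",
]
print(dict(strand_counts(example)))  # {'+': 2, '-': 1}
```

On a real file the same function can be fed an open file handle instead of a list. Counting only `match` rows avoids counting each repeat twice through its `match_part` children.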

--Carson

> On Nov 17, 2017, at 6:39 PM, Matt Simenc wrote:
>
> Hi everybody,
>
> I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?
>
> MAKER 2.31.8
>
> I looked at a standalone RepeatMasker run I did and the features are more evenly distributed between the +/- strands.
>
> Matt

From mcsimenc at gmail.com Sat Nov 18 09:27:25 2017
From: mcsimenc at gmail.com (Matt Simenc)
Date: Sat, 18 Nov 2017 08:27:25 -0800
Subject: [maker-devel] 99.98% of repeatmasker features on plus strand, anyone else seen this?
In-Reply-To:
References:
Message-ID:

Ah ok. A messy problem! I need to approximate strandedness for TE loci if possible so will do some post processing using blast/hmmer to Repbase and Dfam. Thanks for the speedy response Carson!

On Fri, Nov 17, 2017 at 6:23 PM, Carson Holt wrote:

> Also MAKER clusters overlapping repeats to generate the best masking of the assembly. For the GFF3 it then assigns the name of the repeat encompassing the greatest portion of the cluster to the feature (i.e. the best representative). But the cluster is technically built from overlapping repeats on both strands (repeats tend to jump on top of other repeats, so they stack with bits and pieces of other repeats at the edges). Yet another reason why everything is just assigned to the plus strand.
>
> --Carson
>
> > On Nov 17, 2017, at 6:39 PM, Matt Simenc wrote:
> >
> > Hi everybody,
> >
> > I just noticed that the vast majority of features with type repeatmasker are on the plus strand in my MAKER GFFs. There are a handful on the minus strand. Has anyone else seen that in their MAKER GFFs?
> >
> > MAKER 2.31.8
> >
> > I looked at a standalone RepeatMasker run I did and the features are more evenly distributed between the +/- strands.
> >
> > Matt
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michael.s.campbell1 at gmail.com Wed Nov 15 14:50:45 2017
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 15 Nov 2017 16:50:45 -0500
Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short
In-Reply-To:
References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com>
Message-ID: <4157C9FE-1F5D-4320-A03F-2344C1DBD81C@gmail.com>

Hi Lahcen,

I put some answers below.

> On Nov 15, 2017, at 11:32 AM, lahcen campbell wrote:
>
> Hi Michael and Carson
>
> Thank you both for your helpful input, I really appreciate it.
>
> See below for my comments...
>
> Best
> Lahcen
>
> On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell wrote:
> Hi Lancen,
>
> Thanks, the name has served me well for a number of years now :)
>
> It's a good name, I wouldn't change it haha :)
>
> So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, looks like it's not the case for this one.
>
> To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it to the nodes and with and without MPI.
>
> One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable).
>
> First off, I can report good news regarding the 0-length contigs I was getting back. Carson, your thoughts on BioPerl conflict issues seemed to be the main issue. Our cluster software environment had gone through some changes of late, so working off the basis of that I was able to load the right bash config, which resulted in no more 0-length contig errors. Huzzah !!
>
Great
>
> As far as rerunning MAKER there are a couple of approaches. If you want it to stop complaining about trying too many times on failed contigs you can increase the number of tries in the opts file. The line looks like this:
>
> tries=2 #number of times to try a contig if there is a failure for some reason
>
> If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting, you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:
>
> Can I ask, because I can't seem to find any concrete info on best practices for parsing MAKER gffs to partition the various source column fields as you described Michael.
>
> Is there a commonly used way to partition MAKER gffs based on source column? Or will I need to code it up? I ask because I feel this must have been needed before many times by other users.
>
I've got a script that will do it if you want it. Since you don't need all of the entries, grep is probably as easy as anything.
grep -P '\tsource\t'
>
> rm_gff= #pre-identified repeat elements from an external GFF3 file
>
> I will remove links to fasta files for both 'rmlib=' and 'repeat_protein='
>
Yep
>
> you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line:
>
> protein_gff= #aligned protein homology evidence from an external GFF3 file
>
> Don't pass the blast gff3 data in. If you pass in gff3 data, MAKER assumes that it is polished and will not make any effort to fix alignments. The protein2genome data is polished. est2genome is the equivalent for EST input.
>
> You say don't pass the blast as gff. As I pass in all other info via GFF3 and remove any evidence as fasta inputs... BLAST won't be called again right ? Ensuring the shortest possible rerun of MAKER to roll back to an uncorrupted state.
>
Right. blast will not be called as long as you remove or comment out the paths to the fastas in the est= and protein= lines.
>
> I noticed that the only unique source field types in my MAKER GFF are as follows:
> augustus_masked
> blastx
> maker
> protein2genome
> repeatmasker
> repeatrunner
>
That looks right for the run you described.
>
> I read on the dev group that passing est evidence as GFF won't actually call Exonerate, est2genome option just tells MAKER to try and turn polished EST alignments directly into genes.... so if I pass this info again as GFF it will simply use the same info as it did originally and not have to recompute anything ?
>
> Based on the above fields contained in my MAKER gff, which of the following options should I select to re-annotate based on this older run ? I suspect all the options below in green should be set to 1, and the others in red set to 0.
>
> #-----Re-annotation Using MAKER Derived GFF3
> .....
> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no
>
You don't need model_pass or pred_pass if you plan on running gene finders.
>
> I don't think I will pass back anything under augustus_masked as I didn't set that up correctly initially, instead passing in a precomputed augustus gff which I'm told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside maker itself on the 2nd pass. Note though, I am aware of the order of things normally, but for this instance I will continue with what I have done with success previously.
>
Yeah, when I have issues with failing contigs I'll pull stuff out until it starts running without error, then I add things back until something breaks.
>
> Lastly, as this next run will be updating based on previously generated MAKER gff data.... what states should est2genome and protein2genome be ? 1 or 0 ?
>
0. Those options are just for generating gene models directly from evidence when you don't have any gene finders trained. When you say updating, do you mean reusing evidence from previous runs and generating new gene annotations, or are you taking existing gene models and adding new evidence to see if they can be improved?
>
> Apologies for the lengthy email reply Michael. Much appreciated again, thank you !!
>
No worries, hope it helps.
>
> L
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From scott at scottcain.net Mon Nov 20 18:57:09 2017
From: scott at scottcain.net (Scott Cain)
Date: Mon, 20 Nov 2017 20:57:09 -0500
Subject: [maker-devel] GMOD hackathon before PAG San Diego in January
In-Reply-To:
References:
Message-ID:

Hello,

This is an update on the hackathon. It is a go; the hackathon page is up on GMOD.org:

http://gmod.org/wiki/2018_PAG_Hackathon

And the EventBrite page is up at

https://www.eventbrite.com/e/gmod-2018-pag-hackathon-tickets-39700164260

Tickets are $50 which covers the costs associated with the room and lunch on the first day. Please feel free to add suggested topics to the wiki page, or send the suggestions to me to add.

Thanks,
Scott

On Thursday, October 12, 2017, Scott Cain wrote:

> Hi all,
>
> This January before PAG on the Wednesday and Thursday before PAG (January 10-11) in San Diego we are planning a GMOD hackathon.
We expect that > participants will be interested in solving problems/creating solutions > related to Tripal, JBrowse, Apollo, and Galaxy but if you're interested in > another GMOD project, by all means, let us know! We expect this hackathon > to overlap with the Tripal hackathon that is on January 11 (I'm pretty > sure; right Stephen?) > > If you are interested in attending this hackathon, please let me know so I > can be sure we have an appropriately sized space. And if you're coming for > the pre-PAG hackathon, consider staying for PAG, since there is always a > lot of GMOD-related content at the meeting! > > Thanks, > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... URL: From o.k.torresen at ibv.uio.no Tue Nov 21 06:57:46 2017 From: o.k.torresen at ibv.uio.no (=?utf-8?B?T2xlIEtyaXN0aWFuIFTDuHJyZXNlbg==?=) Date: Tue, 21 Nov 2017 13:57:46 +0000 Subject: [maker-devel] substr outside of string in PhatHits_utils.pm In-Reply-To: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com> References: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com> Message-ID: <182CDDD3-A108-4095-9AC4-A2C198D34107@ibv.uio.no> Thank you Carson. After a bit of struggling, I can confirm that the same error occurs in MAKER 3.01.2 (I guess you meant that version, couldn?t find 3.02.02). I am providing a GFF to est_gff, with match and match_part entries. For at least one of the scaffolds, the last coordinate (column 5) is the same number as the length of the scaffold. That should be allowed by the GFF3 standard, right? 
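On the coordinate question: GFF3 coordinates are 1-based and inclusive, so an end coordinate equal to the scaffold length is valid. The following is a minimal sketch, not part of MAKER, of one way to rule out out-of-range features in a file passed to est_gff before looking further. The function names parse_fasta_lengths and find_bad_coordinates are made up for illustration, and the check assumes the sequence IDs in column 1 match the first token of each FASTA header.

```python
def parse_fasta_lengths(lines):
    """Return {sequence_id: length} from an iterable of FASTA lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def find_bad_coordinates(gff_lines, lengths):
    """Yield (line_number, reason) for GFF3 features with suspicious coordinates."""
    for number, line in enumerate(gff_lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # anything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            yield number, "expected 9 tab-separated columns"
            continue
        try:
            start, end = int(cols[3]), int(cols[4])
        except ValueError:
            yield number, "start or end is not an integer"
            continue
        if cols[0] not in lengths:
            yield number, "sequence ID not found in FASTA"
        elif start < 1:
            yield number, "start is below 1"
        elif start > end:
            yield number, "start is greater than end"
        elif end > lengths[cols[0]]:
            # end == length is fine because coordinates are inclusive.
            yield number, "end is past the end of the sequence"
```

A typical use would be to open the genome FASTA and the GFF3 file, pass the two file handles to these functions, and print whatever find_bad_coordinates yields. An empty result only rules out coordinate problems; it does not explain the substr error by itself.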
How can I troubleshoot this? The error message is not so informative. It seems that PhatHit_utils.pm tries to find a stop codon. Snipped from that file, lines 849-850:

    #fix stop codon by walking downstream
    my $has_stop = $tM->is_ter_codon(substr($transcript_seq, $end-1-3, 3));

The GFF I am using was the output of Mikado (https://www.biorxiv.org/content/early/2017/11/09/216994), which is GFF3, and then processed a bit to make it suitable for MAKER.

First converted to GTF by

    mikado util convert mikado.loci.gff3 mikado.loci.gtf

Then I selected only mRNA and exon entries, and changed mRNA to transcript to make it look like cufflinks output (and set a dummy score):

    grep -P "\tmRNA\t|\texon\t" mikado.loci.gtf |sed "s/mRNA/transcript/g" |awk -F "\t" '{$9=$9"cov \"10.0\";"; OFS="\t"; print $1, $2, $3, $4, $5, $6, $7, $8, $9}' > mikado.loci.score.gtf

Before converting with cufflinks2gff3:

    cufflinks2gff3 mikado.loci.score.gtf > ests.score.gff3

Thank you.

Ole

> On 09 Nov 2017, at 17:28, Carson Holt wrote:
>
> My first guess is that if you are using gff3 files as input to anything, then there may be an issue with your GFF3 file. My second suggestion is to try MAKER 3.02.02 to see if it has the same issue.
>
> --Carson
>
>
>> On Nov 9, 2017, at 2:44 AM, Ole Kristian Tørresen wrote:
>>
>> Dear all,
>> I'm having an issue with MAKER which I'm unable to wrap my head around. Hopefully the issue is easily identifiable and resolvable for someone with more insight than me. Please find the log output attached below. I cannot find any more information than this in any logs. Many scaffolds do complete fine, but some of the longest ones have issues.
>>
>> Thank you.
>>
>> Sincerely,
>> Ole K. Tørresen
>>
>> Error message:
>>
>> #--------- command -------------#
>> Widget::augustus:
>> /projects/cees/bin/augustus/augustus-3.2.3/bin/augustus --strand=backward --species=gadMor2_code_braker2 --UTR=off --hintsfile=/tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_brak
>> er2.auto_annotator.xdef.augustus --extrinsicCfgFile=/projects/cees/bin/augustus/augustus-3.2.3/config/extrinsic/extrinsic.MPE.cfg --AUGUSTUS_CONFIG_PATH=/projects/cees/bin/augustus/augustus-3.2
>> .3/config /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotator.augustus.fasta > /tmp/18899594.d/maker_m8xwVp/0/34_2.22239-28465.gadMor2_code_braker2.auto_annotato
>> r.augustus
>> #-------------------------------#
>> deleted:0 genes
>> begin called get_best_alt_splices1
>> ...processing 0 of 2
>> ...processing 1 of 2
>> end called get_best_alt_splices1
>> ...processing 0 of 20
>> ...processing 1 of 20
>> ...processing 2 of 20
>> ...processing 3 of 20
>> ...processing 4 of 20
>> ...processing 5 of 20
>> ...processing 6 of 20
>> ...processing 7 of 20
>> ...processing 8 of 20
>> ...processing 9 of 20
>> ...processing 10 of 20
>> ...processing 11 of 20
>> ...processing 12 of 20
>> ...processing 13 of 20
>> ...processing 14 of 20
>> ...processing 15 of 20
>> ...processing 16 of 20
>> ...processing 17 of 20
>> ...processing 18 of 20
>> ...processing 19 of 20
>> substr outside of string at /projects/cees/bin/maker/maker-3.1.1/bin/../lib/PhatHit_utils.pm line 850.
>> --> rank=NA, hostname=compute-31-18.local >> ERROR: Failed while annotating transcripts >> ERROR: Chunk failed at level:1, tier_type:4 >> FAILED CONTIG:GmG20150304_scaffold_8692 >> >> ERROR: Chunk failed at level:6, tier_type:0 >> FAILED CONTIG:GmG20150304_scaffold_8692 >> >> examining contents of the fasta file and run log >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From carsonhh at gmail.com Tue Nov 21 09:19:36 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 Nov 2017 09:19:36 -0700 Subject: [maker-devel] About loss of Histone H2A, H2B, H4 In-Reply-To: References: Message-ID: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com> No known biases, but if you are concerned, you can collect known Histone H2A, H2B, H4 proteins and transcripts from other species (protein= and altest= options), them run MAKER with no masking to see if you gain any models that may have been overlooked because of over-masking of repeats. Make sure to evaluate any models you find as being a pseudogene. Run InterProScan on results to make sure they contain known InterPro domains for that gene family as well. Running without repeat masking will increase sensitivity but also false positives derived from low homology alignments to simple repeats which is why you need to evaluate results using something like InterProScan. Also run BUSCO to evaluate the completeness of the genome. Make sure that the observed contraction is not just a result of an incomplete assembly. ?Carson > On Nov 16, 2017, at 12:46 PM, Quanwei Zhang wrote: > > Hello: > > We have annotated a new rodent genome using Maker2. Based on the annotated maker2 gene sets, we did gene family expansion/contraction analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under contraction. 
I wonder whether there are known bias to predict those gene families using Maker2? For example, can this due to repeat masking of the genome? I used repeatmaker and generated species specific repeat libraries follows http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic . > > Thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Nov 21 09:22:58 2017 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 Nov 2017 09:22:58 -0700 Subject: [maker-devel] Unwarranted error: Skipping the contig because it is too short In-Reply-To: <4157C9FE-1F5D-4320-A03F-2344C1DBD81C@gmail.com> References: <3780BDEB-AF0E-4E27-9CD4-29CF0242FF9B@gmail.com> <4157C9FE-1F5D-4320-A03F-2344C1DBD81C@gmail.com> Message-ID: <172954D4-7D27-4929-8BC1-B0292F8D9BDB@gmail.com> Just one note I want to add here. When you use GFF3 to pass in results as opposed to letting MAKER use the raw alignments, you lose the ability of MAKER to base some decisions on reading frame match since you lose both the alignment sequence and cigar string of the alignment. So MAKER just assumes correct ORF and sequence match rather than evaluating it (this will make AED scores artificially better for some models). ?Carson > On Nov 15, 2017, at 2:50 PM, Michael Campbell wrote: > > Hi Lahcen, > > I put some answers below. >> On Nov 15, 2017, at 11:32 AM, lahcen campbell > wrote: >> >> Hi Michael and Carson >> >> Thank you both for your helpful input, I really appreciate it. >> >> See below for my comments... 
>> >> Best >> Lahcen >> >> >> On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell > wrote: >> Hi Lancen, >> >> Thanks, the name has served me well for a number of years now :) >> >> Its a good name, I wouldn't change it haha :) >> >> >> So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, looks like it's not the case for this one. >> >> To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it you the nodes and with and without mpi. >> >> One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable). >> >> First off, I can report good news regards the 0 lengths contigs I was getting back. Carson, your thoughts on Bioperl conflict issues seemed to be the main issue. Out cluster software environment had gone through some changes of late, so working off the basis of that I was able to load the right bash config which resulted in no more 0 length contig errors. Huzzah !! >> Great >> >> As far as rerunning MAKER there are a couple of approaches. If you want it to stop complaining about trying to many times on failed contigs you can increase the number of tries in the opts file. The line looks like this: >> >> tries=2 #number of times to try a contig if there is a failure for some reason >> >> If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting you can use the gff3 output from an earlier run. 
If you used gff3_merge after the first run finished, you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column, you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:
>>
>> Can I ask, because I can't seem to find any concrete info on best practices for parsing MAKER gffs to partition the various source column fields as you described, Michael.
>>
>> Is there a commonly used way to partition MAKER gffs based on source column? Or will I need to code it up? I ask because I feel this must have been needed many times before by other users.
>> I've got a script that will do it if you want it. Since you don't need all of the entries, grep is probably as easy as anything. grep -P '\tsource\t'
>>
>> rm_gff= #pre-identified repeat elements from an external GFF3 file
>>
>> I will remove links to fasta files for both 'rmlib=' and 'repeat_protein='
>> Yep
>>
>> you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line:
>>
>> protein_gff= #aligned protein homology evidence from an external GFF3 file
>>
>> Don't pass the blast gff3 data in. If you pass in gff3 data to MAKER, it assumes that it is polished and will not make any effort to fix alignments. The protein2genome data is polished. est2genome is the equivalent for EST input.
>>
>> You say don't pass the blast as gff. As I pass in all other info via GFF3 and remove any evidence as fasta inputs... BLAST won't be called again, right? Ensuring the shortest possible rerun of MAKER to roll back to an uncorrupted state.
>> Right. blast will not be called as long as you remove or comment out the paths to the fastas in the est= and protein= lines.
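The partitioning by source column discussed above can also be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script Michael mentions: the function names split_by_source and select_sources are made up, and the code assumes a merged MAKER GFF3 whose second column holds source values such as repeatmasker, repeatrunner and protein2genome. In the grep command above, `source` is read here as a placeholder for one of those values.

```python
from collections import defaultdict


def split_by_source(gff_lines):
    """Group GFF3 feature lines by their source (column 2)."""
    by_source = defaultdict(list)
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the embedded genome sequence follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a well-formed feature line
        by_source[cols[1]].append(line)
    return dict(by_source)


def select_sources(by_source, wanted):
    """Build the lines of a new GFF3 file holding only the wanted sources."""
    out = ["##gff-version 3"]
    for source in wanted:
        # Lines keep their original order within each source.
        out.extend(by_source.get(source, []))
    return out
```

For the rerun described above, one would write select_sources(groups, ["repeatmasker", "repeatrunner"]) to a file for rm_gff= and select_sources(groups, ["protein2genome"]) to a file for protein_gff=, leaving the blastx entries out. The output file names are up to the user.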
> >> I noticed that the only unique source field types in my MAKER GFF are as follows: >> augustus_masked >> blastx >> maker >> protein2genome >> repeatmasker >> repeatrunner >> That look right for the run you described >> I read on the dev group that passing est evidence as GFF won't actually call Exonerate, est2genome option just tells MAKER to try and turn polished EST alignments directly into genes.... so If I pass this info again as GFF it will simply use the same info as it did originally and not have to recompute anything ? >> >> Based on the above fields contained in my MAKER gff, which of the following options should I select to re-annotate based on this older run ? I suspect all the options below in green should be set to 1, and the others in red set to 0. >> >> #-----Re-annotation Using MAKER Derived GFF3 >> ..... >> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no >> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no >> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no >> rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no >> model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no >> pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no >> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no >> > You don't need model_pass or pred_pass if you plan on running gene finders >> I don't think I will pass back anything under augustus_masked as I didn't set that up correctly initially, instead passing in a precomputed augustus gff which Im told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside maker itself on the 2nd pass. Note though, I am aware of the order of things normally, but for this instance I will continue with what I have done with success previously. > Yeah, when I have issues with failing contigs I'll pull stuff out until it starts running without error, then I add things back until something breaks. 
> >> Lastly, as this next run will be updating based on previous generated MAKER gff data.... what states should est2genome and protein2genome be ? 1 or 0 ? > 0 those options are just for generating gene models directly from evidence when you don't have any gene finders trained. When you say updating do you mean reusing evidence from previous runs and generating new gene annotations or are you taking existing gene models and adding new evidence to see if they can be improved? >> >> Apologies for the lengthy email reply Michael. Much appreciated again, thank you !! > No Worries, hope it helps. >> >> L >> >> >> Clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation. This takes away the advantage of rerunning in the same directory. clean_try deletes everything first, and starts again. clean_try is the one that deletes everything and pretends that the first run never happened. >> >> I ccd the list on this response just Incas anyone else has any ideas or is facing the same error. >> >> Let me know if any of this helps, >> Mike >> >>> On Nov 14, 2017, at 10:48 AM, lahcen campbell > wrote: >>> >>> Hi Michael >>> >>> Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help. >>> >>> I have attached the file in question below. Its so strange, I had to just leave it alone cause it was making me quite frustrated. Those bugs which there are now common sense solutions are the worst cause very easily you reach a wall. >>> >>> Might it have anything at all to do with the Protein homology file I passed in ? Though, note.... the same protein files here have been used in another maker run without issue so I kind of ruled that out already.....but just spitballing at this stage. >>> >>> >>> Might I be so cheeky to ask you one more MAKER related question Michael... ? 
Feel free to ignore it I hate to push but im desperate to figure it out with little time to do so... >>> >>> I have an issue with a different MAKER analysis. Currently any new run I attempt on this datastore, which has one round successful with 25000 odd genes and double the transcripts. I attempted to run the second round with a SNAP trained hmm (first time passing in SNAP hmm following first round EST/Protein evidence). In this attempt, because we obtained so many genes I thought I would be more stringent by changing the AED to 0.7 from 1.0. Something I see now I didn't approach in the right way... too late now sadly. >>> >>> MAKER finishes fine, but now it views all previous scaffolds as FAILED. Nothing seems to change this and now the datastore is for all intents and purposes locked in failed state. It keeps mentioning changes to the opts file which there were, and that the previous runs didn't finish so it must delete them. The results obtained from round 1 are still there though Im pretty sure of that, all blast files etc are still there and populated. >>> >>> Can you tell me the main differences either clean_up or clean_try have and which will completely and irreversibly wipe the first run? Something I don't want to repeat, just allow me to progress to the next round. Im hesitant to run them, but I've backed up the datastore incase. My next attempt will be to pass the exact same maker_opts file from the round1 run, with the only change made to clean_try/clean_up....Is this approach misguided ? >>> >>> Your help is very much appreciated Michael so thank you, >>> Best >>> L >>> >>> ? >>> ?Combined_Protein_homology.fa.zip ?? >>> ?SubsampledGenomeFile_n10_11MB.fasta ? >>> >>> >>> >>> On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell > wrote: >>> Hi Lahcen, >>> >>> Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try and recreate the error and try and debug it. 
>>> >>> Thanks, >>> Mike >>>> On Nov 14, 2017, at 7:15 AM, lahcen campbell > wrote: >>>> >>>> Hi MAKER community, >>>> >>>> I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3 >>>> >>>> Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So its not that this version has issues to date. >>>> >>>> Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage. >>>> >>>> I have a single genome file with, 10 Long scaffolds making up just under 11MB (subsampled from my original full length assembly) of sequence data in which to train SNAP. The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc. >>>> >>>> I have only edited the maker_opts file and changed: >>>> >>>> genome= >>>> protein= >>>> protein2genome=1 >>>> >>>> But see attached my maker CTL files. >>>> >>>> The error consistently returned to me: >>>> >>>> Skipping the contig because it is too short!! >>>> SeqID: contig_WHATEVER >>>> Length: 0 >>>> >>>> The sequences are no where near too short. This was verified independently outside maker to be sure. 
>>>> >>>> The headers are as follows: >>>> >>>> >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no >>>> >>>> I have just about given up, I have no idea why its happening it makes zero sense. >>>> >>>> Any help or information as to why this might be happening would be amazing. >>>> >>>> Thank you in advance. >>>> Lahcen >>>> >>>> -- >>>> ========================================== >>>> > Dr. Lahcen Campbell < >>>> > Contact: lahcencampbell at gmail.com < >>>> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >>>> ========================================== >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >>> >>> -- >>> ========================================== >>> > Dr. 
Lahcen Campbell < >>> > Contact: lahcencampbell at gmail.com < >>> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >>> ========================================== >> >> >> >> >> -- >> ========================================== >> > Dr. Lahcen Campbell < >> > Contact: lahcencampbell at gmail.com < >> > https://www.ebi.ac.uk/about/people/lahcen-campbell < >> ========================================== > -------------- next part -------------- An HTML attachment was scrubbed... URL: From qwzhang0601 at gmail.com Tue Nov 21 10:42:38 2017 From: qwzhang0601 at gmail.com (Quanwei Zhang) Date: Tue, 21 Nov 2017 12:42:38 -0500 Subject: [maker-devel] About loss of Histone H2A, H2B, H4 In-Reply-To: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com> References: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com> Message-ID: Dear Carson: Thank you for your comments and suggestions. Now the SNAP was trained with repeat masked, is it necessary to retrain the predictor without repeat masking? By BUSCO analysis on the genome, the completeness is shown as below. Now I am doing the analysis using the default reports of Maker2 (i.e., gene models with evidence support, the default build). For the gene loss, besides you suggestions I am also considering to do the analysis using the gene models with evidence support plus those with scanned domains (i.e., standard build). How do you think? C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104 3902 Complete BUSCOs (C) 3806 Complete and single-copy BUSCOs (S) 96 Complete and duplicated BUSCOs (D) 92 Fragmented BUSCOs (F) 110 Missing BUSCOs (M) Thanks Best Quanwei 2017-11-21 11:19 GMT-05:00 Carson Holt : > No known biases, but if you are concerned, you can collect known Histone > H2A, H2B, H4 proteins and transcripts from other species (protein= and > altest= options), them run MAKER with no masking to see if you gain any > models that may have been overlooked because of over-masking of repeats. 
> Make sure to evaluate any models you find as being a pseudogene. Run > InterProScan on results to make sure they contain known InterPro domains > for that gene family as well. Running without repeat masking will increase > sensitivity but also false positives derived from low homology alignments > to simple repeats which is why you need to evaluate results using something > like InterProScan. > > Also run BUSCO to evaluate the completeness of the genome. Make sure that > the observed contraction is not just a result of an incomplete assembly. > > ?Carson > > > On Nov 16, 2017, at 12:46 PM, Quanwei Zhang wrote: > > Hello: > > We have annotated a new rodent genome using Maker2. Based on the annotated > maker2 gene sets, we did gene family expansion/contraction analysis using > CAFE. We found Histone H2A, H2B, H4 gene families are under contraction. I > wonder whether there are known bias to predict those gene families using > Maker2? For example, can this due to repeat masking of the genome? I used > repeatmaker and generated species specific repeat libraries follows > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/ > Repeat_Library_Construction--Basic. > > Thanks > > Best > Quanwei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wanghai01 at caas.cn Mon Nov 27 06:18:36 2017 From: wanghai01 at caas.cn (HAI WANG) Date: Mon, 27 Nov 2017 08:18:36 -0500 Subject: [maker-devel] Need your help on maker pipeline Message-ID: <000601d36782$3e24e0d0$ba6ea270$@cn> Dear Professor Yandell, I am Hai Wang, a visiting scholar in Cornell University. I am sorry to bother you, but I really need your help. I am now using the maker pipeline to annotate a maize genome. 
The installation of maker, openmpi and other software should be OK since I've successfully run maker on your example data. But when I ran maker on my own maize genome, I always got the following error: A process has executed an operation involving a call to the "fork()" system call to create a child process. Open MPI is currently operating in a condition that could result in memory corruption or other system errors; your job may hang, crash, or produce silent data corruption. The use of fork() (or system() or other calls that create child processes) is strongly discouraged. The process that invoked fork was: Local host: [[21269,1],0] (PID 12537) If you are *absolutely sure* that your application will successfully and correctly survive a call to fork(), you may disable this warning by setting the mpi_warn_on_fork MCA parameter to 0. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpiexec noticed that process rank 32 with PID 0 on node fat1 exited on signal 11 (Segmentation fault). -------------------------------------------------------------------------- Could you please help me with this issue? Or is there a way that I can resume this job when it stops? Thank you very much! Best, Hai Wang -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Nov 27 12:45:57 2017 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 27 Nov 2017 12:45:57 -0700 Subject: [maker-devel] Need your help on maker pipeline In-Reply-To: <000601d36782$3e24e0d0$ba6ea270$@cn> References: <000601d36782$3e24e0d0$ba6ea270$@cn> Message-ID: The parameters needed to get OpenMPI to work with MAKER are described in the ?/maker/INSTALL file (specifically look at LD_PRELOAD and -mca btl ^openib) ?> !!IMPORTANT!! MAKER is not compatible with MVAPICH2. Use OpenMPI or MPICH. 
If using MPICH, make sure to enable shared libraries during installation (this is not the default). If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (e.g. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so).

1. Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.
2. Give the path to 'mpicc'. Make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.
3. Give the path to the folder containing 'mpi.h'. Make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.
4. Finish installation according to steps 2-4 of the EASY INSTALL.

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.

Note: If jobs hang or freeze when using mpiexec under OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER.
Example: mpiexec -mca btl ^openib -n 20 maker

Then to disable the fork warning, just add the parameter --mca mpi_warn_on_fork 0 to the mpiexec options as described in the warning.

How to run with OpenMPI has also been covered extensively in the MAKER list archives, and more detail can be found there --> https://groups.google.com/forum/#!searchin/maker-devel/openmpi%7Csort:date

Thanks,
Carson

> On Nov 27, 2017, at 6:18 AM, HAI WANG wrote:
>
> Dear Professor Yandell,
>
> I am Hai Wang, a visiting scholar in Cornell University. I am sorry to bother you, but I really need your help. I am now using the maker pipeline to annotate a maize genome.
The installation of maker, openmpi and other software should be OK since I?ve successfully run maker on your example data. > > But when I ran maker on my own maize genome, I always got the following error: > > > A process has executed an operation involving a call to the > "fork()" system call to create a child process. Open MPI is currently > operating in a condition that could result in memory corruption or > other system errors; your job may hang, crash, or produce silent > data corruption. The use of fork() (or system() or other calls that > create child processes) is strongly discouraged. > > The process that invoked fork was: > > Local host: [[21269,1],0] (PID 12537) > > If you are *absolutely sure* that your application will successfully > and correctly survive a call to fork(), you may disable this warning > by setting the mpi_warn_on_fork MCA parameter to 0. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpiexec noticed that process rank 32 with PID 0 on node fat1 exited on signal 11 (Segmentation fault). > -------------------------------------------------------------------------- > > Could you please help me with this issue? Or is there a way that I can resume this job when it stops? Thank you very much! > > Best, > Hai Wang > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
From carsonhh at gmail.com Mon Nov 27 12:56:04 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 27 Nov 2017 12:56:04 -0700
Subject: [maker-devel] About loss of Histone H2A, H2B, H4
In-Reply-To:
References: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com>
Message-ID:

You should not have to train SNAP separately on unmasked sequence, and I do believe adding back genes that were rejected because of lack of support but contain an identifiable domain may help. These will be in the fasta files labeled non-overlapping in the datastore.

--Carson

> On Nov 21, 2017, at 10:42 AM, Quanwei Zhang wrote:
>
> Dear Carson:
>
> Thank you for your comments and suggestions. SNAP was trained on the repeat-masked genome; is it necessary to retrain the predictor without repeat masking?
> The completeness from BUSCO analysis of the genome is shown below. Now I am doing the analysis using the default reports of Maker2 (i.e., gene models with evidence support, the default build). For the gene loss, besides your suggestions I am also considering doing the analysis using the gene models with evidence support plus those with scanned domains (i.e., standard build). What do you think?
>
> C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104
> 3902 Complete BUSCOs (C)
> 3806 Complete and single-copy BUSCOs (S)
> 96 Complete and duplicated BUSCOs (D)
> 92 Fragmented BUSCOs (F)
> 110 Missing BUSCOs (M)
>
> Thanks
> Best
> Quanwei
>
> 2017-11-21 11:19 GMT-05:00 Carson Holt:
> No known biases, but if you are concerned, you can collect known Histone H2A, H2B, H4 proteins and transcripts from other species (protein= and altest= options), then run MAKER with no masking to see if you gain any models that may have been overlooked because of over-masking of repeats. Make sure to evaluate whether any models you find are pseudogenes. Run InterProScan on results to make sure they contain known InterPro domains for that gene family as well.
> Running without repeat masking will increase sensitivity but also false positives derived from low homology alignments to simple repeats, which is why you need to evaluate results using something like InterProScan.
>
> Also run BUSCO to evaluate the completeness of the genome. Make sure that the observed contraction is not just a result of an incomplete assembly.
>
> --Carson
>
>> On Nov 16, 2017, at 12:46 PM, Quanwei Zhang wrote:
>>
>> Hello:
>>
>> We have annotated a new rodent genome using Maker2. Based on the annotated maker2 gene sets, we did gene family expansion/contraction analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under contraction. I wonder whether there are known biases in predicting those gene families using Maker2? For example, can this be due to repeat masking of the genome? I used repeatmaker and generated species specific repeat libraries following http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic .
>>
>> Thanks
>>
>> Best
>> Quanwei

From qwzhang0601 at gmail.com Tue Nov 28 06:39:52 2017
From: qwzhang0601 at gmail.com (Quanwei Zhang)
Date: Tue, 28 Nov 2017 08:39:52 -0500
Subject: [maker-devel] About loss of Histone H2A, H2B, H4
In-Reply-To:
References: <3A172BAF-DD5C-4CA8-8D1C-6EFF30A1FFA4@gmail.com>
Message-ID:

Dear Carson:

Thank you!

Best
Quanwei

2017-11-27 14:56 GMT-05:00 Carson Holt:

> You should not have to train SNAP separately on unmasked sequence, and
> I do believe adding back genes that were rejected because of lack of
> support but contain an identifiable domain may help. These will be in the
> fasta files labeled non-overlapping in the datastore.
>
> --Carson
>
> On Nov 21, 2017, at 10:42 AM, Quanwei Zhang wrote:
>
> Dear Carson:
>
> Thank you for your comments and suggestions. SNAP was trained on the
> repeat-masked genome; is it necessary to retrain the predictor without
> repeat masking?
> The completeness from BUSCO analysis of the genome is shown below. Now I
> am doing the analysis using the default reports of Maker2 (i.e., gene
> models with evidence support, the default build). For the gene loss,
> besides your suggestions I am also considering doing the analysis using the
> gene models with evidence support plus those with scanned domains (i.e.,
> standard build). What do you think?
>
> C:95.0%[S:92.7%,D:2.3%],F:2.2%,M:2.8%,n:4104
> 3902 Complete BUSCOs (C)
> 3806 Complete and single-copy BUSCOs (S)
> 96 Complete and duplicated BUSCOs (D)
> 92 Fragmented BUSCOs (F)
> 110 Missing BUSCOs (M)
>
> Thanks
> Best
> Quanwei
>
> 2017-11-21 11:19 GMT-05:00 Carson Holt:
>
>> No known biases, but if you are concerned, you can collect known Histone
>> H2A, H2B, H4 proteins and transcripts from other species (protein= and
>> altest= options), then run MAKER with no masking to see if you gain any
>> models that may have been overlooked because of over-masking of repeats.
>> Make sure to evaluate whether any models you find are pseudogenes. Run
>> InterProScan on results to make sure they contain known InterPro domains
>> for that gene family as well. Running without repeat masking will increase
>> sensitivity but also false positives derived from low homology alignments
>> to simple repeats, which is why you need to evaluate results using something
>> like InterProScan.
>>
>> Also run BUSCO to evaluate the completeness of the genome. Make sure that
>> the observed contraction is not just a result of an incomplete assembly.
>>
>> --Carson
>>
>> On Nov 16, 2017, at 12:46 PM, Quanwei Zhang wrote:
>>
>> Hello:
>>
>> We have annotated a new rodent genome using Maker2.
>> Based on the annotated maker2 gene sets, we did gene family expansion/contraction
>> analysis using CAFE. We found Histone H2A, H2B, H4 gene families are under
>> contraction. I wonder whether there are known biases in predicting those gene
>> families using Maker2? For example, can this be due to repeat masking of the
>> genome? I used repeatmaker and generated species specific repeat libraries
>> following http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic.
>>
>> Thanks
>>
>> Best
>> Quanwei

From carsonhh at gmail.com Tue Nov 28 16:39:47 2017
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 28 Nov 2017 16:39:47 -0700
Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions
In-Reply-To: <81D27009-2422-4116-848A-E2C862A74075@univie.ac.at>
References: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com> <81D27009-2422-4116-848A-E2C862A74075@univie.ac.at>
Message-ID: <768084A0-A5DA-4745-8151-D53AD0E495E3@gmail.com>

Your patch will essentially just turn off all MAKER hint-based gene prediction when no_abinit is turned on. We do not currently have a way to pass in external hints, but if you just want your hint-based predictions to compete against the MAKER hint-based predictions, you can provide them as pred_gff while still letting MAKER run Augustus by setting augustus_species.

--Carson

> On Nov 28, 2017, at 7:37 AM, Bob Zimmermann wrote:
>
> Dear Carson,
>
> Thanks for the response! Sorry for the slow reply.
>
> Actually what I meant was that I wanted to generate other types of hints that maker could not automatically use to prevent lower quality ab initio predictions from influencing the final output.
> Therefore I wanted to make my own ab initio predictions prior to running maker, and then have maker generate the transcript hints and then run augustus, finally synthesizing my own ab initio predictions with the maker hint-based ones. (In other words, just run the second round of augustus, not the first one.)
>
> I've attached a patch which seemed to allow me to tell maker to do what I wanted it to do. Am I missing something?
>
> Best,
> Bob
>
> --
>
> Department of Molecular Evolution and Development
> Universität Wien
> Althanstraße 14 (UZA I), Zimmer 2.019
> 1090 Vienna
> Austria
>
> +43 1 427757002
>
>> On 13 Oct 2017, at 17:42, Carson Holt wrote:
>>
>> Hi Bob,
>>
>> pred_gff is a way to get models from programs MAKER cannot run itself into the analysis. Input to pred_gff will not get hints since MAKER is not running the program. Setting augustus_species allows MAKER to run Augustus with and without hints, and then those models compete against each other. You cannot just run with hints, as the raw model is also used as a filter to help reduce false positive gene models that result from bad hints. If the gff3 you are providing is the same as the MAKER run of Augustus, I would recommend not providing it. If it is different in some way, then you can leave it in. If you run under MPI (it's ok to run MPI on a single machine), then MAKER will parallelize the Augustus run by running multiple contigs and contig chunks at the same time.
>>
>> Thanks,
>> Carson
>>
>>> On Oct 11, 2017, at 1:42 PM, Bob Zimmermann wrote:
>>>
>>> Hello,
>>>
>>> I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments.
>>>
>>> My gene prediction section of the maker_opts.ctl file looks like this:
>>> ...
>>> augustus_species=all_combined #Augustus gene prediction species model
>>> ...
>>> pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file
>>> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
>>> est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
>>> protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
>>> ...
>>>
>>> It seems as though even if pred_gff is set, augustus will still be run for ab initio predictions with no hints if an augustus_species setting is present. I was curious if there was any way around this, partly because custom ab initio predictions could improve my annotation and also because the ab initio step can take a long time.
>>>
>>> Thanks for your help!
>>>
>>> Bob

From robert.zimmermann at univie.ac.at Tue Nov 28 07:37:40 2017
From: robert.zimmermann at univie.ac.at (Bob Zimmermann)
Date: Tue, 28 Nov 2017 15:37:40 +0100
Subject: [maker-devel] custom "ab initio" predictions with automatic hint-based predictions
In-Reply-To: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com>
References: <947BFB2F-A893-417B-A043-07CE71F6F97E@gmail.com>
Message-ID: <81D27009-2422-4116-848A-E2C862A74075@univie.ac.at>

Dear Carson,

Thanks for the response! Sorry for the slow reply.

Actually what I meant was that I wanted to generate other types of hints that maker could not automatically use to prevent lower quality ab initio predictions from influencing the final output. Therefore I wanted to make my own ab initio predictions prior to running maker, and then have maker generate the transcript hints and then run augustus, finally synthesizing my own ab initio predictions with the maker hint-based ones. (In other words, just run the second round of augustus, not the first one.)
I've attached a patch which seemed to allow me to tell maker to do what I wanted it to do. Am I missing something?

Best,
Bob

--

Department of Molecular Evolution and Development
Universität Wien
Althanstraße 14 (UZA I), Zimmer 2.019
1090 Vienna
Austria

+43 1 427757002

[A non-text attachment was scrubbed. Name: maker_noabinit.patch, Type: application/octet-stream, Size: 950 bytes]

> On 13 Oct 2017, at 17:42, Carson Holt wrote:
>
> Hi Bob,
>
> pred_gff is a way to get models from programs MAKER cannot run itself into the analysis. Input to pred_gff will not get hints since MAKER is not running the program. Setting augustus_species allows MAKER to run Augustus with and without hints, and then those models compete against each other. You cannot just run with hints, as the raw model is also used as a filter to help reduce false positive gene models that result from bad hints. If the gff3 you are providing is the same as the MAKER run of Augustus, I would recommend not providing it. If it is different in some way, then you can leave it in. If you run under MPI (it's ok to run MPI on a single machine), then MAKER will parallelize the Augustus run by running multiple contigs and contig chunks at the same time.
>
> Thanks,
> Carson
>
>> On Oct 11, 2017, at 1:42 PM, Bob Zimmermann wrote:
>>
>> Hello,
>>
>> I would like to run maker with a custom set of ab initio predictions (based on hints given to augustus from RNAseq data), but allowing it to incorporate EST and protein data to make an additional run of augustus using hints derived from those alignments.
>>
>> My gene prediction section of the maker_opts.ctl file looks like this:
>> ...
>> augustus_species=all_combined #Augustus gene prediction species model
>> ...
>> pred_gff=../ab_initio_predictions/all_combined.augustus_masked.gff3 #ab-initio predictions from an external GFF3 file
>> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
>> est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
>> protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
>> ...
>>
>> It seems as though even if pred_gff is set, augustus will still be run for ab initio predictions with no hints if an augustus_species setting is present. I was curious if there was any way around this, partly because custom ab initio predictions could improve my annotation and also because the ab initio step can take a long time.
>>
>> Thanks for your help!
>>
>> Bob

From aircoolsky at gmail.com Thu Nov 30 23:20:20 2017
From: aircoolsky at gmail.com (Yu-Hsuan Cheng)
Date: Fri, 1 Dec 2017 14:20:20 +0800
Subject: [maker-devel] Changing the genetic code table in MAKER
Message-ID:

Hi,

This is YuHsuan Cheng, a PhD student from Taiwan. I want to use MAKER combined with SNAP to annotate a ciliate genome. The genetic code for ciliates is different from that of other species, so I am wondering whether there is any option in MAKER that lets me change the genetic code table. I also asked Dr. Korf about this issue, and he said SNAP has no way to change the genetic code table. I will use Augustus combined with MAKER later on.

The pipeline I used previously is as follows:

1. MAKER (hints from proteome and RNAseq)
2. MAKER to ZFF
3. ~/bin/maker/exe/snap/hmm-assembler.pl snapFirst . > ../../snapFirst.hmm
   and then used snapFirst.hmm as hints in MAKER

Look forward to your reply. Thank you.

Best wishes,
YuHsuan

Yu-Hsuan Cheng
Institute of Molecular Biology
Academia Sinica
128 Academia road, Section 2
Nankang, Taipei 115
Taiwan
Phone: +886-2-2789-9216 (Lab), +886-958-216-538 (Mobile phone)
d02b48008 at ntu.edu.tw
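
The three-step pipeline in the message above leaves out the intermediate commands between "MAKER to ZFF" and hmm-assembler.pl. A rough sketch of one SNAP training round is given here, assuming the usual maker2zff, fathom, forge sequence and a merged MAKER GFF3 that includes the genome sequence. The helper `snap_training_cmds`, the file name genome.all.gff and the 1000 bp flank value are assumptions for illustration, not something stated in the thread. The helper only prints the commands so they can be reviewed first; it does nothing about the genetic code question, which is unanswered here.

```shell
#!/bin/sh
# Hypothetical helper: print the commands for one SNAP training round.
#   $1 = label for the new HMM (e.g. snapFirst, as in the message above)
#   $2 = merged MAKER GFF3 with embedded FASTA (assumed file name)
snap_training_cmds() {
    label=$1
    gff=$2
    cat <<EOF
maker2zff $gff
fathom -categorize 1000 genome.ann genome.dna
fathom -export 1000 -plus uni.ann uni.dna
forge export.ann export.dna
hmm-assembler.pl $label . > $label.hmm
EOF
}

# Print the command list for review; run it in an empty working directory
# by piping the output to sh once the paths look right.
snap_training_cmds snapFirst ../genome.all.gff
```

The resulting snapFirst.hmm would then be given to the next MAKER run through the snaphmm option in maker_opts.ctl.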