From morgan_starr_s at live.com Sun Feb 3 13:13:47 2019
From: morgan_starr_s at live.com (morgan sobol)
Date: Sun, 3 Feb 2019 19:13:47 +0000
Subject: [maker-devel] Re-annotation, fewer gene predictions
Message-ID: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>

Hello,

I previously used Maker to annotate two different fungal genomes that were assembled from Illumina sequences only. For these genomes, I had over 11,000 genes predicted.
I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very similar to the Illumina-only assemblies in length and in the number of complete orthologs, but had far fewer, longer contigs.

I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I am consistently getting ~4,400 predicted genes.

How can I determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran both the same way?

Thanks,
Morgan S.

maker_opts.log
#-----Genome (these are always required)
genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est= #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
augustus_species=1368D_uni #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=1 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

From xvazquezc at gmail.com Sun Feb 3 16:43:42 2019
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Mon, 4 Feb 2019 09:43:42 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID:

Hi Morgan,

We had a similar issue with AUGUSTUS underpredicting when using a BUSCO-derived gene model: https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ

Also, check the number of proteins predicted by each individual predictor. If the numbers from one of them are off, that may point you to the source of the problem. We didn't have a very good experience with GM, as it used to overpredict an absurd number of proteins.

Xabi

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
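
Following Xabier's suggestion to check the number of proteins from each predictor, one way to get per-predictor counts from a merged MAKER GFF3 (such as the `.all.gff` file referenced in `maker_gff` above) is to tally top-level features by source (column 2) and type (column 3). This is a minimal sketch, assuming that each predictor's output is labelled in the source column (for example `maker`, `augustus_masked`, `genemark_masked`; the exact labels depend on MAKER version and settings) and that child features such as exons, CDS and `match_part` carry a `Parent=` attribute.

```python
"""Tally top-level GFF3 features per (source, type) pair.

Sketch only: assumes a MAKER-style GFF3 in which each predictor's output is
labelled in column 2 and child features carry a Parent= attribute.
"""
import sys
from collections import Counter
from typing import Iterable


def count_top_level(lines: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # a merged GFF3 may end with the genome sequence
        if not line or line.startswith("#"):
            continue  # skip directives, comments and blank lines
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        source, ftype, attrs = cols[1], cols[2], cols[8]
        # Count only records without a parent (genes, matches, ...)
        if any(a.startswith("Parent=") for a in attrs.split(";")):
            continue
        counts[(source, ftype)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for (source, ftype), n in sorted(count_top_level(fh).items()):
            print(f"{source}\t{ftype}\t{n}")
```

Run it once on the GFF3 of each genome. A predictor whose count collapses in only one of the two runs is a good place to start looking.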

From keith.decker at bayer.com Mon Feb 4 12:09:35 2019
From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005])
Date: Mon, 4 Feb 2019 18:09:35 +0000
Subject: [maker-devel] MAKER on AWS
Message-ID: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>

I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be.
I found this old post on StarCluster, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html, but my understanding is that StarCluster and its successors (CfnCluster and ParallelCluster) can be challenging to set up and use.

So my questions are:

1. Has anyone had recent success running MAKER on CfnCluster or ParallelCluster in AWS?
2. Would it be reasonable to just split N chromosomes across N ECS instances and collect the results at the end? If so, does it make sense to run each chromosome-level annotation on, for example, an m4.16xlarge instance with 64 cores and 256 GB of RAM? Or is there a maximum number of cores beyond which the benefits of parallelization saturate?

Thanks, and sorry for the long question.
Keith

This system contains confidential and copyrighted information. Access to the system is limited to users only and only for approved business purposes.
Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company.
Anyone using or viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company's sole discretion there is a business reason to do so.
If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose.
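
Splitting one chromosome per instance works best when the pieces are of similar size. The sketch below is a hypothetical helper, not MAKER's `fasta_tool` (whose options are not covered here): it reads sequence lengths from a FASTA file and assigns whole sequences to k bins using a greedy longest-first heuristic, so each instance receives a similar number of bases. Treating base count as a proxy for MAKER run time is an assumption; gene density and repeat content also matter.

```python
import heapq
from typing import Dict, Iterable, List, Optional, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first word of the header) to its length."""
    lengths: Dict[str, int] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def partition_by_length(lengths: Dict[str, int], k: int) -> List[List[str]]:
    """Greedy longest-first split of sequence IDs into k bins of similar size."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Min-heap of (total bases assigned, bin index); all bins start empty.
    heap: List[Tuple[int, int]] = [(0, i) for i in range(k)]
    bins: List[List[str]] = [[] for _ in range(k)]
    # Longest sequences first; ties broken by name for reproducible output.
    ordered = sorted(lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, length in ordered:
        total, idx = heapq.heappop(heap)  # currently lightest bin
        bins[idx].append(name)
        heapq.heappush(heap, (total + length, idx))
    return bins


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:  # usage: script.py genome.fasta N_INSTANCES
        with open(sys.argv[1]) as fh:
            groups = partition_by_length(read_fasta_lengths(fh), int(sys.argv[2]))
        for i, ids in enumerate(groups):
            print(i, " ".join(ids))
```

Longest-first greedy assignment is a simple heuristic. It does not guarantee the best possible balance, but for a handful of chromosomes it is usually close enough.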

From carsonhh at gmail.com Mon Feb 4 12:31:29 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Feb 2019 11:31:29 -0700
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
Message-ID: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>

You can try to stand up a cluster inside AWS or, as you said, just start independent instances, each with its own piece of the total dataset. There is a tool called fasta_tool inside MAKER that makes it easy to split up the dataset into equal-sized chunks.

Alternatively, CyVerse has set up an interesting MAKER wrapper (WQ-MAKER) that launches multiple cloud instances for MAKER and handles data chunking for you (they've been using XSEDE cloud resources through the NSF) -> http://ccl.cse.nd.edu/research/papers/maker-service-ic2e2018.pdf

Here is an example of an external project using their setup -> http://onsnetwork.org/kubu4/2018/08/07/genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/

-Carson

From liorglck at gmail.com Mon Feb 4 03:00:29 2019
From: liorglck at gmail.com (Lior Glick)
Date: Mon, 4 Feb 2019 11:00:29 +0200
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in maker_exe.ctl
Message-ID:

Dear MAKER users,

I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the software starts. Typing 'which RepeatMasker' shows the correct local path. I also use this path as the value for the maker_exe.ctl parameter 'RepeatMasker'.

Trying to generalize my working environment, I am trying to use a conda env that is capable of running MAKER. This env comes with RepeatMasker as well.
Once I activate this env, I can still run RepeatMasker, but it points to a different path. When I run MAKER within this env, it fails right away with the error message:

ERROR: Could not determine if RepBase is installed

Running the same configuration files locally (i.e. outside the conda env) results in a successful run. This leads me to think that MAKER is not actually using the path indicated in the maker_exe.ctl file, and instead looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions on how to overcome this issue?

Thanks and best regards,
Lior

From keith.decker at bayer.com Mon Feb 4 12:39:48 2019
From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005])
Date: Mon, 4 Feb 2019 18:39:48 +0000
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com> <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
Message-ID: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>

Thanks,
Do you have metrics on how MAKER performs when annotating a single chromosome on a single machine? For example, will I see anything close to a 16x speed-up on a 16-core machine, and does the performance improvement saturate at a certain number of cores?

-Keith
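
For Lior's question above, a first check is whether the RepeatMasker named in maker_exe.ctl is the same one the shell (and therefore the active conda env) resolves on $PATH. The sketch below is a hypothetical helper: it reads `key=value` lines from a .ctl file, drops `#` comments, and reports both paths side by side. It does not reproduce how MAKER itself locates executables or checks for RepBase; it only makes a mismatch visible.

```python
import shutil
from typing import Dict, Iterable, Optional


def parse_ctl(lines: Iterable[str]) -> Dict[str, str]:
    """Read key=value pairs from a MAKER-style .ctl file, ignoring '#' comments."""
    opts: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comment text
        if "=" not in line:
            continue  # section headers and blank lines
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def compare_exe(opts: Dict[str, str], name: str) -> Dict[str, Optional[str]]:
    """Return the configured path next to the one found on $PATH."""
    return {
        "ctl": opts.get(name) or None,  # None if the key is missing or blank
        "path": shutil.which(name),     # None if not found on $PATH
    }


if __name__ == "__main__":
    import os
    if os.path.exists("maker_exe.ctl"):
        with open("maker_exe.ctl") as fh:
            print(compare_exe(parse_ctl(fh), "RepeatMasker"))
```

Running it once inside and once outside the conda env shows whether the two RepeatMasker installs differ, which would be consistent with one of them lacking a configured RepBase library.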

From carsonhh at gmail.com Mon Feb 4 13:00:00 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Feb 2019 12:00:00 -0700
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com> <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com> <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>
Message-ID:

I don't have cloud performance stats, but I do have cluster performance stats that you may be able to roughly correlate (attached). On a cluster we see nearly linear performance gains until ~100 CPU cores, and the curve doesn't fully level out until well after 600 cores (at that point we are hitting I/O and networking limits for inter-node communication). So if you are only using a single instance, you can essentially treat it as the equivalent of a single physical machine, which would fall well under 100 CPU cores, and performance growth would be expected to be linear on that instance.

-Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-2.pdf
Type: application/pdf
Size: 41424 bytes
Desc: not available
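
Carson's measured curve is the real answer to Keith's "16x on 16 cores" question. As a back-of-envelope complement, Amdahl's law is one standard model, though nobody in the thread uses it: speed-up = 1 / ((1 - p) + p / n), where p is the fraction of run time that parallelizes and n is the number of cores. The p values below are placeholders rather than MAKER measurements. You would estimate p from your own single-core and multi-core timings, and the model ignores the I/O and network limits Carson mentions.

```python
def amdahl_speedup(n_cores: int, parallel_fraction: float) -> float:
    """Ideal speed-up on n_cores when only parallel_fraction of the work scales."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)


if __name__ == "__main__":
    # Placeholder p values for illustration, not measurements of MAKER.
    for p in (0.95, 0.99, 0.999):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(n, p):.1f}x" for n in (16, 64, 100)
        )
        print(f"p={p}: {row}")
```

For example, with p = 0.99, 16 cores give roughly 13.9x. That is the kind of near-linear single-machine scaling Carson describes.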

From xvazquezc at gmail.com Tue Feb 5 16:42:40 2019
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Wed, 6 Feb 2019 09:42:40 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To:
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID:

Don't you use SNAP? It usually produces quite decent results, and it is easier to train than any of the other predictors.

In any case, the Augustus gene model is way off in both cases. GM doesn't seem bad in the first one, if your fungus has a rather usual genome, but in the second it looks bad.

I'm not too familiar with re-annotation, but I'd rather create the gene models from scratch than reuse the ones from the Illumina-only genomes. Note that assemblies with long reads have a higher proportion of repetitive elements that need masking, and RepeatMasker alone may not be enough. In theory, this shouldn't affect the Augustus model if it was trained through BUSCO, as BUSCO uses defined conserved markers to create the gene model, but I'm not so sure about GM.

If you trained Augustus with BUSCO and this is the result, I'd discard the gene model and train it again the "traditional way", i.e. as it used to be done when we only had CEGMA. I had good results just by changing the training method.

Hope it helps,
Xabi

On Wed, 6 Feb 2019 at 02:19, morgan sobol wrote:

> Thank you, Xabi for the response.
> The number of proteins from each source is greatly lower than before.
> Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and
> maker respectively.
> The more recent numbers are 25, 857, 4418 respectively.
>
> So do you think maybe this hints that something is wrong from genemark?
> > Morgan > > > ------------------------------ > *From:* Xabier Vázquez-Campos > *Sent:* Sunday, February 3, 2019 4:43 PM > *To:* morgan sobol > *Cc:* maker-devel at yandell-lab.org > *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions > > Hi Morgan, > > We had a similar issue with AUGUSTUS underpredicting when using a > BUSCO-derived gene model > https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ > > Also, check the number of proteins by each individual predictor. If the > numbers from one of them are off, you may find a possible source of issues. > We didn't have a very good experience with GM, as it used to overpredict > an absurd number of proteins. > > Xabi > > On Mon, 4 Feb 2019 at 06:15, morgan sobol wrote: > > Hello, > > I previously used Maker to annotate two different fungal genomes that were > created using Illumina sequences only. For these genomes, I had over 11,000 > genes predicted. > I recently obtained PacBio sequences for the same genomes, so I created > two hybrid assemblies. Both assemblies were very familiar in length and > completed number of orthologs to the Illumina only assembly, but had much > fewer, but longer contigs. > > I re-ran Maker using the settings below. For one of my genomes, I got > around 11,000 genes predicted again, as expected. However, for the other > genome, I am continuously getting ~4,400 predicted genes. > > I am asking for help as to how I can determine why I keep getting fewer > predicted genes for only one of my genomes, even though I ran them the same? > > Thanks, > Morgan S. > > maker_opts.log > #-----Genome (these are always required) > genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked > #genome sequence (fasta file or$ > organism_type=eukaryotic #eukaryotic or prokaryotic.
Default is eukaryotic > > #-----Re-annotation Using MAKER Derived GFF3 > maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff > #MAKER derive$ > est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no > altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no > protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no > rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no > model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no > pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no > other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no > > #-----EST Evidence (for best results provide a file for at least one) > est= #set of ESTs or assembled mRNA-seq in fasta format > altest= #EST/cDNA sequence file in fasta format from an alternate organism > est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file > altest_gff= #aligned ESTs from a closly relate species in GFF3 format > > #-----Protein Homology Evidence (for best results provide a file for at > least one) > protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta > #protein sequence file in fasta format (i.e. from mutiple oransisms) > protein_gff= #aligned protein homology evidence from an external GFF3 file > > #-----Repeat Masking (leave values blank to skip repeat masking) > model_org= #select a model organism for RepBase masking in RepeatMasker > rmlib= #provide an organism specific repeat library in fasta format for > RepeatMasker > repeat_protein= #provide a fasta file of transposable element proteins for > RepeatRunner > rm_gff= #pre-identified repeat elements from an external GFF3 file > prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change > this), 1 = yes, 0 = no > softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. 
seg > and dust filtering) > > #-----Gene Prediction > snaphmm= #SNAP HMM file > gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file > augustus_species=1368D_uni #Augustus gene prediction species model > fgenesh_par_file= #FGENESH parameter file > pred_gff= #ab-initio predictions from an external GFF3 file > model_gff= #annotated gene models from an external GFF3 file (annotation > pass-through) > est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no > protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no > trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no > snoscan_rrna= #rRNA file to have Snoscan find snoRNAs > unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = > yes, 0 = no > > #-----Other Annotation Feature Types (features MAKER doesn't recognize) > other_gff= #extra features to pass-through to final MAKER generated GFF3 > file > > #-----External Application Behavior Options > alt_peptide=C #amino acid used to replace non-standard amino acids in > BLAST databases > cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, > leave 1 when using MPI) > > #-----MAKER Behavior Options > max_dna_len=100000 #length for dividing up contigs into chunks > (increases/decreases memory usage) > min_contig=1 #skip genome contigs below this length (under 10kb are often > useless) > > pred_flank=200 #flank for extending evidence clusters sent to gene > predictors > pred_stats=1 #report AED and QI statistics for all predictions as well as > models > AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and > 1) > min_protein=0 #require at least this many amino acids in predicted proteins > alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = > yes, 0 = no > always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 > = no > map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = > yes, 0 = no > keep_preds=1 
#Concordance threshold to add unsupported gene prediction > (bound by 0 and 1) > > split_hit=10000 #length for the splitting of hits (expected max intron > size for evidence alignments) > single_exon=1 #consider single exon EST evidence when generating > annotations, 1 = yes, 0 = no > single_length=250 #min length required for single exon ESTs if > 'single_exon is enabled' > correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion > genes > > tries=2 #number of times to try a contig if there is a failure for some > reason > clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 > = no > clean_up=0 #removes theVoid directory with individual analysis files, 1 = > yes, 0 = no > TMP= #specify a directory other than the system default temporary > directory for temporary files > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > -- > Xabier V?zquez-Campos, *PhD* > *Research Associate* > NSW Systems Biology Initiative > School of Biotechnology and Biomolecular Sciences > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > -- Xabier V?zquez-Campos, *PhD* *Research Associate* NSW Systems Biology Initiative School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed... 
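Xabi's suggestion to check the number of proteins from each individual predictor can also be done on a run's GFF3. The following is a minimal Python sketch, not a MAKER tool. `count_top_level` and `fold_drop` are hypothetical helpers. The sketch assumes that each predictor's output appears as top-level features (rows with no `Parent` attribute) whose source column names the predictor. Check what your own GFF3 actually contains before relying on the keys.

```python
# Sketch: tally top-level features per (source, type) in a MAKER GFF3 and
# compare per-predictor counts between two runs.
# Assumption: predictor output shows up as top-level features whose
# column 2 names the predictor (e.g. "augustus_masked", "genemark").
from collections import Counter
from typing import Dict, Hashable, Iterable, Mapping


def count_top_level(lines: Iterable[str]) -> Counter:
    """Count GFF3 features with no Parent attribute, keyed by (source, type)."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature rows
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows instead of failing
        keys = {field.split("=", 1)[0] for field in cols[8].split(";") if field}
        if "Parent" not in keys:
            counts[(cols[1], cols[2])] += 1
    return counts


def fold_drop(before: Mapping[Hashable, int],
              after: Mapping[Hashable, int]) -> Dict[Hashable, float]:
    """Ratio before/after for every key counted (non-zero) in both runs."""
    return {key: before[key] / after[key] for key in before if after.get(key)}


if __name__ == "__main__":
    # Per-predictor protein counts reported in this thread.
    before = {"augustus": 325, "genemark": 10899, "maker": 11243}
    after = {"augustus": 25, "genemark": 857, "maker": 4418}
    for name, ratio in fold_drop(before, after).items():
        print(f"{name}: {ratio:.1f}x fewer")
    # augustus: 13.0x fewer
    # genemark: 12.7x fewer
    # maker: 2.5x fewer
```

Applied to the counts Morgan reported, both ab initio predictors dropped by a similar factor (about 13x), while the final MAKER set dropped about 2.5x. To compare your own runs, pass an open GFF3 file handle from each run to `count_top_level`. The resulting `(source, type)` keys can then go straight into `fold_drop`.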
From xvazquezc at gmail.com Wed Feb 6 16:33:47 2019 From: xvazquezc at gmail.com (Xabier Vázquez-Campos) Date: Thu, 7 Feb 2019 09:33:47 +1100 Subject: [maker-devel] Re-annotation, fewer gene predictions In-Reply-To: References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> Message-ID: SNAP is easy to train, works well on fungal genomes, and its training is explained in MAKER's wiki: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors Oh, sorry, I didn't explain myself well. What I was trying to say is that before BUSCO, when we only had CEGMA, we trained Augustus in a different way, because CEGMA didn't produce Augustus gene models automatically. I don't mean that you should use CEGMA. This is what I have in my own documentation about how to train Augustus "the old way": > AUGUSTUS - the old way > > Alternatively, you can train AUGUSTUS in a more 'manual' way, like when we > were using CEGMA. The training starts with the output from the second > instance of fathom in the SNAP training section. > > cd ${MYGENOME_DIR}/maker/snap1 > perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb > > zff2augustus_gbk.pl generates a GenBank file from export.dna. > > The actual training of AUGUSTUS will be done through the *webAUGUSTUS server*. > > Before proceeding, it is recommended to rename the FASTA headers, especially > if they contain special characters and/or are very long. This is the > main cause of failure for jobs submitted to webAUGUSTUS.
You can use > the simplifyFastaHeaders.pl > > script for that: > > perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map > > perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map > > nameStem is the base name for naming each of the sequences in the > multifasta files. Choose something appropriate: use *contig* > and *rna* for the assembly and RNA-seq files, respectively, or something > based on that, for example 'pgcontig' and 'pgrna' for contigs and RNA from *Puccinia > graminis*. > *DO NOT* give the same nameStem to both FASTA files, and don't use any > special characters. > > We need the following files (minimum): > > - ${MYGENOME}_assembly.fasta as *Genome file* > - ${MYGENOME}.train1.gb as *Training gene structure file* > > If we also have RNA-seq data: > > - ${MYGENOME}_assembled_transcripts.fasta as *cDNA file* > > Use ${MYGENOME}_v1 as *Species name*. We will need a different > species name in the retraining step. Otherwise, when MAKER2 is rerun, it > will see the same name and will not rerun AUGUSTUS, even though the species > profile is different. So ${MYGENOME}_v1 does the job and also tracks the > version. > > Once the job is finished, the *Species parameter archive* > (parameters.tar.gz) will contain a folder with the model files for your > species. Copy it to the species folder of your AUGUSTUS installation. > Hope this helps. PS: hit reply all so this is logged in MAKER's mailing list in case anybody else experiences similar issues On Thu, 7 Feb 2019 at 06:36, morgan sobol wrote: > I have not used SNAP or CEGMA, however, I see that CEGMA was discontinued > in 2015. > Do you think that will be a problem, or is it still worth using the old > version?
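The header-renaming step can be illustrated with a short Python sketch. This is not simplifyFastaHeaders.pl itself, and the `<nameStem><n>` naming and the (new, original) map layout are assumptions. `simplify_headers` and `check_stems` are hypothetical helpers. The sketch only demonstrates the idea from the note above: every header is replaced by a short safe name, the original headers are kept in a map, and the nameStem rules are enforced (letters and digits only, a different stem for each file).

```python
# Sketch of the header-renaming idea (NOT simplifyFastaHeaders.pl itself):
# each header becomes <stem><n>, and (new, old) pairs are kept so results
# can be mapped back to the original contig names later.
import re
from typing import Iterable, List, Tuple

_SAFE_STEM = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def check_stems(*stems: str) -> None:
    """Enforce the nameStem rules: letters/digits only, one stem per file."""
    for stem in stems:
        if not _SAFE_STEM.fullmatch(stem):
            raise ValueError(f"nameStem {stem!r} must be letters/digits only")
    if len(set(stems)) != len(stems):
        raise ValueError("use a different nameStem for each FASTA file")


def simplify_headers(lines: Iterable[str],
                     stem: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (renamed FASTA lines, [(new_name, original_header), ...])."""
    check_stems(stem)
    renamed: List[str] = []
    mapping: List[Tuple[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            new_name = f"{stem}{len(mapping) + 1}"
            mapping.append((new_name, line[1:]))
            renamed.append(">" + new_name)
        else:
            renamed.append(line)  # sequence lines pass through untouched
    return renamed, mapping


if __name__ == "__main__":
    check_stems("pgcontig", "pgrna")  # the Puccinia graminis example above
    contigs = [">scaffold_1|arrow|pilon len=48211", "ACGTACGT", ">scaffold_2", "GGCC"]
    fasta, name_map = simplify_headers(contigs, "pgcontig")
    print("\n".join(fasta))
    for new, old in name_map:
        print(f"{new}\t{old}")
```

Keeping the map is what allows predictions made on the renamed contigs to be translated back to the original assembly names afterwards.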
-- Xabier Vázquez-Campos, *PhD* *Research Associate* NSW Systems Biology Initiative School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA
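The last step of the "old way" is to copy the model folder from parameters.tar.gz into AUGUSTUS's species directory. It could look like the Python sketch below. `install_species` is a hypothetical helper, and the sketch relies on two assumptions that the thread does not confirm. First, the archive contains a folder named exactly after the species name given to webAUGUSTUS (e.g. ${MYGENOME}_v1). Second, the AUGUSTUS installation keeps species profiles under `$AUGUSTUS_CONFIG_PATH/species/<name>`.

```python
# Sketch: unpack a webAUGUSTUS parameters.tar.gz into AUGUSTUS's species dir.
# Assumptions: the archive holds a folder named after the species, and
# species profiles live under <config_path>/species/<species>.
import shutil
import tarfile
import tempfile
from pathlib import Path


def install_species(archive: str, species: str, config_path: str) -> Path:
    """Copy the <species> folder found in archive to <config_path>/species/."""
    dest = Path(config_path) / "species" / species
    if dest.exists():
        # A fresh name per training round (_v1, _v2, ...) is what makes
        # MAKER rerun AUGUSTUS, so never silently overwrite a profile.
        raise FileExistsError(f"{dest} already exists; pick a new species name")
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                tar.extractall(tmp, filter="data")  # rejects unsafe paths
            else:
                tar.extractall(tmp)  # older Pythons without extraction filters
        matches = [p for p in Path(tmp).rglob(species) if p.is_dir()]
        if not matches:
            raise FileNotFoundError(f"no '{species}' folder inside {archive}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(matches[0], dest)  # copy before tmp is cleaned up
    return dest
```

A call would look like `install_species("parameters.tar.gz", "mygenome_v1", os.environ["AUGUSTUS_CONFIG_PATH"])`, with `import os` added. The helper deliberately refuses to overwrite an existing profile. Per the note above, a retrained model should get a new species name so that MAKER actually reruns AUGUSTUS.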
From liorglic at mail.tau.ac.il Mon Feb 11 08:04:16 2019 From: liorglic at mail.tau.ac.il (Lior Glick) Date: Mon, 11 Feb 2019 16:04:16 +0200 Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in maker_exe.ctl Message-ID: Dear MAKER users, I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the software starts, and typing 'which RepeatMasker' shows the correct local path. I also use this path as the value of the maker_exe.ctl parameter 'RepeatMasker'. To generalize my working environment, I am trying to use a conda env that is capable of running MAKER. This env comes with its own RepeatMasker. Once I activate this env, I can still run RepeatMasker, but it points to a different path. When I run MAKER within this env, it fails right away with the error message: ERROR: Could not determine if RepBase is installed Running the same configuration files locally (i.e. outside the conda env) results in a successful run. This leads me to think that MAKER is not actually using the path indicated in the maker_exe.ctl file, but rather looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions on how to overcome this issue? Thanks and best regards, Lior From liorglic at mail.tau.ac.il Mon Feb 11 08:12:25 2019 From: liorglic at mail.tau.ac.il (Lior Glick) Date: Mon, 11 Feb 2019 16:12:25 +0200 Subject: [maker-devel] Unknown (X) amino acids in predicted proteins Message-ID: Dear MAKER users, After completing a MAKER run, I looked at the protein FASTA files that MAKER outputs and noticed that a small fraction of the sequences include X characters, indicating unknown amino acids. I was wondering how such sequences arise, i.e. why are there unknown amino acids in a prediction? Is this an indication of low-quality predictions?
Is there any documentation regarding the procedure that generates the protein sequences? Thanks a lot, Lior From kapeelc at gmail.com Thu Feb 7 13:43:47 2019 From: kapeelc at gmail.com (Kapeel Chougule) Date: Thu, 7 Feb 2019 14:43:47 -0500 Subject: [maker-devel] MAKER v3 Fgenesh ERROR Message-ID: Hi Carson, I have been getting this error with the fgenesh tool within MAKER. It runs OK on most of the assembly contigs but seems to fail on one contig (or part of a contig) with the error below: Widget::fgenesh: /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Widget/fgenesh/fgenesh_wrap /mnt/grid/ware/hpc_norepl/data/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/fgenesh /sonas-hs/ware/hpc_norepl/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/Zeamays.mpar.dat.new /tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh.fasta -exon_table:/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.xdef.fgenesh > /tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh #-------------------------------# ...processing 9 of 24 ...processing 8 of 28 ...processing 10 of 24 ...processing 9 of 28 ...processing 11 of 24 ...processing 10 of 28 ...processing 12 of 24 ...processing 11 of 28 deleted:0 genes ERROR: FgenesH failed --> rank=14, hostname=bnbcompute50 ERROR: Failed while annotating transcripts ERROR: Chunk failed at level:1, tier_type:4 FAILED CONTIG:Super-Scaffold_14.2_contig2 I updated the Perl module fgenesh.pm as suggested in previous threads. Attached are the maker_opts.ctl and STDERR log files. Thanks Kapeel -- *Kapeel Chougule Computational Scientist Developer II* *One Bungtown Road Cold Spring Harbor, NY 11724 http://www.warelab.org/ *
-------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 5420 bytes Desc: not available -------------- next part -------------- A non-text attachment was scrubbed... Name: stderr.log Type: application/octet-stream Size: 10012917 bytes Desc: not available From fatih.sarigoel at durham.ac.uk Wed Feb 13 06:20:40 2019 From: fatih.sarigoel at durham.ac.uk (SARIGOEL, FATIH) Date: Wed, 13 Feb 2019 12:20:40 +0000 Subject: [maker-devel] Does Conda Maker actually work? Message-ID: Greetings, I notice that you never mention conda installation on your website, so I am curious whether the conda version is actually supposed to work; for me it didn't. I created a new conda environment and installed Maker (I tried both installation options). When I run the example files, I get this error: "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 A problem was encountered while attempting to compile and install your Inline C code. The command that failed was: "make > out.make 2>&1" with error code 2" My conda environment is here: /fast_new/work/users/fsarigo_m/miniconda3 I don't understand why the program is trying to look here: /home/conda which does not exist. The output also begins with a "possible precedence issue" warning. Thanks for your help in advance! Fatih +++++ Here is the full log until the end of the contig: (MakerX) [fsarigo_m at med0223 MAKER]$ maker Possible precedence issue with control flow operator at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845. STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at: /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore To access files for individual sequences use the datastore index: /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log STATUS: Now running MAKER... examining contents of the fasta file and run log --Next Contig-- Processing run.log file... #--------------------------------------------------------------------- Now starting the contig!! SeqID: contig-dpp-500-500 Length: 32156 #--------------------------------------------------------------------- Running Mkbootstrap for IndexedBase_14e0 () chmod 644 "IndexedBase_14e0.bs" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644 "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" IndexedBase_14e0.xs > IndexedBase_14e0.xsc mv IndexedBase_14e0.xsc IndexedBase_14e0.c /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot 
"-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" IndexedBase_14e0.c /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 A problem was encountered while attempting to compile and install your Inline C code. The command that failed was: "make > out.make 2>&1" with error code 2 The build directory was: /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0 To debug the problem, cd to the build directory, and inspect the output files. Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin' at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275. --> rank=NA, hostname=med0223 ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869. --> rank=NA, hostname=med0223 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 38. 
Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 306 Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, ARRAY(0x564b40673ad0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 426 Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm line 95 FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 478 Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", HASH(0x564b40673c80), 0, 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 341 Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 357 Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm line 287 Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683 Running Mkbootstrap for IndexedBase_14e0 () chmod 644 "IndexedBase_14e0.bs" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644 "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" IndexedBase_14e0.xs > IndexedBase_14e0.xsc mv IndexedBase_14e0.xsc IndexedBase_14e0.c 
/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" IndexedBase_14e0.c /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 A problem was encountered while attempting to compile and install your Inline C code. The command that failed was: "make > out.make 2>&1" with error code 2 The build directory was: /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0 To debug the problem, cd to the build directory, and inspect the output files. Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin' at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275. --> rank=NA, hostname=med0223 ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869. 
--> rank=NA, hostname=med0223 --> rank=NA, hostname=med0223 --> rank=NA, hostname=med0223 ERROR: Failed while examining contents of the fasta file and run log ERROR: Chunk failed at level:0, tier_type:0 FAILED CONTIG:contig-dpp-500-500 examining contents of the fasta file and run log From carsonhh at gmail.com Wed Feb 13 08:51:44 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Feb 2019 07:51:44 -0700 Subject: [maker-devel] Does Conda Maker actually work? In-Reply-To: References: Message-ID: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com> The conda recipe was produced by another group. I do not currently recommend using it because I have seen a number of issues pop up on the list based on people attempting to install MAKER via conda. I know there is at least an issue with the conda RepeatMasker install, and there may be others. The specific failure you show is from Bio::DB::IndexedBase trying to compile an Inline::C function. It may be that conda is installing an older BioPerl where this issue still exists --> https://github.com/bioperl/bioperl-live/issues/215 Or it may be that there is a new related issue (I've seen a handful of other examples that seem to relate back to Bio::DB::IndexedBase) --> https://github.com/bioperl/bioperl-live/issues/305 Try installing MAKER without conda (make sure to remove any components that are in conda first to avoid conflicts). --Carson > On Feb 13, 2019, at 5:20 AM, SARIGOEL, FATIH wrote: > > Greetings, > I notice that you never mention conda installation on your website, so I am curious if the conda version is actually supposed to be working fine or not; as for me it didn't.
> I created a new conda environment and installed Maker (tried this with both installation options) > When I run the example files, I get this error: > > "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 > A problem was encountered while attempting to compile and install your Inline > C code. The command that failed was: > "make > out.make 2>&1" with error code 2" > > My conda environment is here > /fast_new/work/users/fsarigo_m/miniconda3 > I don't understand why the program is trying to look here: > /home/conda > which does not exist > > Also begins with a "possible precedence issue" > > Thanks for your help in advance! > Fatih > > +++++ > > Here is the full log until the end of the contig: > > (MakerX) [fsarigo_m at med0223 MAKER]$ maker > Possible precedence issue with control flow operator at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845. > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input... > A data structure will be created for you at: > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore > > To access files for individual sequences use the datastore index: > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log > > STATUS: Now running MAKER... > examining contents of the fasta file and run log > > > > --Next Contig-- > > Processing run.log file... > #--------------------------------------------------------------------- > Now starting the contig!! 
> SeqID: contig-dpp-500-500 > Length: 32156 > #--------------------------------------------------------------------- > > > Running Mkbootstrap for IndexedBase_14e0 () > chmod 644 "IndexedBase_14e0.bs" > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644 > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" IndexedBase_14e0.xs > IndexedBase_14e0.xsc > mv IndexedBase_14e0.xsc IndexedBase_14e0.c > /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" IndexedBase_14e0.c > /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory > make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 > > A problem was encountered while attempting to compile and install your Inline > C code. 
The command that failed was: > "make > out.make 2>&1" with error code 2 > > The build directory was: > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0 > > To debug the problem, cd to the build directory, and inspect the output files. > > Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin' > at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275. > --> rank=NA, hostname=med0223 > ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869. > --> rank=NA, hostname=med0223 > at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 38. > Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 306 > Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, ARRAY(0x564b40673ad0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 426 > Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm line 95 > FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 478 > Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", HASH(0x564b40673c80), 0, 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 341 > Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm 
line 357 > Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm line 287 > Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683 > Running Mkbootstrap for IndexedBase_14e0 () > chmod 644 "IndexedBase_14e0.bs" > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644 > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" IndexedBase_14e0.xs > IndexedBase_14e0.xsc > mv IndexedBase_14e0.xsc IndexedBase_14e0.c > /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" IndexedBase_14e0.c > /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory > make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 > > A problem was encountered while attempting to compile and install your Inline > C code. 
The command that failed was: > "make > out.make 2>&1" with error code 2 > > The build directory was: > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0 > > To debug the problem, cd to the build directory, and inspect the output files. > > Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin' > at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275. > --> rank=NA, hostname=med0223 > ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869. > --> rank=NA, hostname=med0223 > --> rank=NA, hostname=med0223 > --> rank=NA, hostname=med0223 > ERROR: Failed while examining contents of the fasta file and run log > ERROR: Chunk failed at level:0, tier_type:0 > FAILED CONTIG:contig-dpp-500-500 > > examining contents of the fasta file and run log > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Feb 13 11:14:13 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Feb 2019 10:14:13 -0700 Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in maker_exe.ctl In-Reply-To: References: Message-ID: <6AFF11A9-9860-4047-A337-4B974C6C0F30@gmail.com> The conda installation of RepeatMasker runs oddly. It does not appear to run the ./configure script during setup, and is missing files inside the repeat library as a result. 
--Carson

> On Feb 4, 2019, at 2:00 AM, Lior Glick wrote:
>
> Dear MAKER users,
>
> I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the software starts. Typing 'which RepeatMasker' shows the correct local path.
> I also use this path as the value for the maker_exe.ctl parameter 'RepeatMasker'.
> Trying to generalize my working environment, I am trying to use a conda env which is capable of running MAKER. This env comes with RepeatMasker as well. Once I activate this env, I can still run RepeatMasker, but it points to a different path. When I run MAKER within this env, it fails right away with the error message:
> ERROR: Could not determine if RepBase is installed
> Running the same configuration files locally (i.e. outside the conda env) results in a successful run.
> This leads me to think that MAKER is not actually using the path indicated in the maker_exe.ctl file, and instead looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions on how to overcome this issue?
>
> Thanks and best regards,
> Lior
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Feb 13 11:18:44 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:18:44 -0700
Subject: [maker-devel] Unknown (X) amino acids in predicted proteins
In-Reply-To:
References:
Message-ID: <1472E55C-62CB-4A73-B45D-C4BEF3E014B7@gmail.com>

If you use GFF3 as input, or use est2genome or protein2genome in your final run, you may have 'N' characters from the assembly as part of your CDS ('N' is the DNA ambiguity code, and it translates to 'X', the amino-acid ambiguity code).
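To see why an 'N' in the CDS surfaces as an 'X' in the protein, here is a minimal Python sketch of codon translation with the standard genetic code. It is an illustration only, not MAKER's or BioPerl's translation code: it maps every codon containing a non-ACGT character to 'X', whereas some translators resolve ambiguous codons that can encode only one amino acid (for example GCN, which is always alanine).

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base varying slowest, third base fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a CDS; any codon containing N (or another ambiguity code) becomes X."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


# A short run of N's from an assembly gap that ends up inside a CDS:
print(translate("ATGGCCNNNAAATAA"))  # MAXK*
```

Because the unknown bases come from the assembly itself, the 'X' marks a gap in the genome sequence rather than a translation error.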
Augustus will do internal gymnastics and completely splice out exons containing N's to try to avoid this issue, but it may not always be able to. It's an indication of genome assembly issues.

--Carson

> On Feb 11, 2019, at 7:12 AM, Lior Glick wrote:
>
> Dear MAKER users,
>
> After completing a MAKER run, I looked at the protein fasta files that MAKER outputs and noticed that a small fraction of the sequences include X characters, indicating unknown amino acids. I was wondering how such sequences are obtained: how come there are unknown amino acids in the prediction? Is this an indication of low-quality predictions?
> Is there any documentation regarding the procedure that generates the protein sequences?
>
> Thanks a lot,
> Lior

From carsonhh at gmail.com Wed Feb 13 11:24:01 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:24:01 -0700
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID:

One thing you can also do is use the old models as protein= input and run the protein2genome option just to see where things align. You may find that not all old models are recoverable in the new assembly. Fewer genes in the new assembly may mean that redundant/duplicate contigs were collapsed and split contigs were joined, so that multiple gene fragments became a single unified model. Make sure to always review contigs in a browser to see how models and evidence correlate.

--Carson
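Once the protein2genome run with the old models as protein= input has finished, a short script can list which old models never aligned to the new assembly. This is a minimal Python sketch under stated assumptions: that protein2genome alignments appear in the MAKER GFF3 with `protein2genome` in the source column and `protein_match` in the type column, and that the aligned protein's ID is stored in the `Name` attribute. Check these against your own GFF3 before trusting the result; the file names in the usage comment are placeholders.

```python
from urllib.parse import unquote


def fasta_ids(lines):
    """Return the first word of each FASTA header (the sequence ID)."""
    return [line[1:].split()[0]
            for line in lines
            if line.startswith(">") and line[1:].strip()]


def aligned_query_ids(gff_lines, source="protein2genome", feature="protein_match"):
    """Collect Name= values of matching alignment features in GFF3 lines.

    Assumption: the aligned protein's ID is stored in the Name attribute.
    """
    ids = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more features follow
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[1] != source or cols[2] != feature:
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if "Name" in attrs:
            ids.add(unquote(attrs["Name"]))  # GFF3 percent-encodes values
    return ids


def unrecovered_models(old_ids, gff_lines, **kwargs):
    """Old model IDs with no matching alignment in the new GFF3, sorted."""
    return sorted(set(old_ids) - aligned_query_ids(gff_lines, **kwargs))


# Example (placeholder file names):
# with open("old_models.proteins.fasta") as fa, open("new_assembly.all.gff") as gff:
#     missing = unrecovered_models(fasta_ids(fa), gff)
#     print(len(missing), "old models did not align")
```

The unrecovered IDs are a starting point for the browser review Carson recommends, not a verdict: a model may fail to align because it was a fragment that was merged into a longer gene in the new assembly.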
From liorglck at gmail.com Sun Feb 17 12:50:10 2019
From: liorglck at gmail.com (Lior Glick)
Date: Sun, 17 Feb 2019 20:50:10 +0200
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
References: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
Message-ID:

That's good to know. Any plans on creating a stable conda package in the future? It'd be a very nice feature, especially since MAKER is not always straightforward to install.
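In Fatih's log, `/bin/sh` cannot find `x86_64-conda_cos6-linux-gnu-gcc` under `/home/conda/feedstock_root/`, a directory that does not exist on his machine. Assuming Inline::C falls back to the compiler recorded in Perl's `%Config` (its `cc` entry), a quick Python check like the one below can show whether that compiler exists before MAKER is started. It is a diagnostic sketch only: it does not repair the install, and Carson's advice above (install MAKER outside conda) still applies.

```python
import os
import shutil
import subprocess


def perl_config(key: str, perl: str = "perl") -> str:
    """Return one entry of Perl's %Config hash, e.g. 'cc'.

    Runs: perl -MConfig -e 'print $Config{<key>}'
    The command is passed as a list, so no shell expands the '$'.
    """
    result = subprocess.run(
        [perl, "-MConfig", "-e", f"print $Config{{{key}}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def compiler_available(cc: str) -> bool:
    """True if the compiler command (first word of cc) can be executed."""
    words = cc.split()
    if not words:
        return False
    exe = words[0]
    if os.path.isabs(exe):
        # An absolute path baked in at build time must exist locally.
        return os.path.isfile(exe) and os.access(exe, os.X_OK)
    return shutil.which(exe) is not None


# Usage (inside the same environment MAKER runs in):
# cc = perl_config("cc")
# print(cc, "found" if compiler_available(cc) else "MISSING")
```

If the configured compiler is reported as missing, Inline::C builds such as the Bio::DB::IndexedBase one in this thread would be expected to fail in the same way.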
From morgan_starr_s at live.com Mon Feb 18 03:08:56 2019
From: morgan_starr_s at live.com (morgan sobol)
Date: Mon, 18 Feb 2019 09:08:56 +0000
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To:
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID:

Thank you, Xabi and Carson. With your help, I was able to improve the annotation with a more appropriate number of predictions.

Best,
Morgan

________________________________
From: Xabier Vázquez-Campos
Sent: Wednesday, February 6, 2019 11:33 PM
To: morgan sobol; Maker Mailing List
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

SNAP is easy to train, works well in fungal genomes, and is explained in Maker's wiki: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors

Oh, sorry, I didn't explain myself well. What I was trying to say is that before BUSCO, when we only had CEGMA, we would train Augustus in a different way, because CEGMA didn't produce Augustus gene models automatically. I don't mean for you to use CEGMA. This is what I have in my own documentation about how to train Augustus "the old way":

AUGUSTUS: the old way

Alternatively, you can train AUGUSTUS in a more 'manual' way, as we did when we were using CEGMA. The training starts with the output from the second instance of fathom in the SNAP training section.

cd ${MYGENOME_DIR}/maker/snap1
perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb

zff2augustus_gbk.pl generates a GenBank file from export.dna. The actual training of AUGUSTUS will be done through the webAUGUSTUS server. Before proceeding, it is recommended to rename the FASTA headers, especially if they contain special characters and/or are very long. This is the main cause of failure for jobs submitted to webAUGUSTUS.
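Conceptually, renaming replaces every header with a short, safe identifier and keeps a lookup table so that results can be mapped back to the original names. A minimal Python sketch of that idea follows. It is not the AUGUSTUS script itself, and the `<stem><counter>` naming and the tab-separated map format are assumptions made for illustration; the alphanumeric check on the stem follows Xabi's advice not to use special characters.

```python
def simplify_headers(fasta_lines, stem):
    """Replace each FASTA header with <stem><n> and record the original.

    Returns (renamed_lines, mapping), where mapping is a list of
    (new_name, original_header) pairs in file order.
    """
    if not stem.isalnum():
        raise ValueError("use letters and digits only in the name stem")
    renamed, mapping = [], []
    count = 0
    for line in fasta_lines:
        if line.startswith(">"):
            count += 1
            new_name = f"{stem}{count}"
            mapping.append((new_name, line[1:].rstrip("\n")))
            renamed.append(f">{new_name}\n")
        else:
            renamed.append(line)  # sequence lines are copied unchanged
    return renamed, mapping


def write_map(mapping, path):
    """Write the lookup table as 'new_name<TAB>original_header' lines."""
    with open(path, "w") as out:
        for new_name, original in mapping:
            out.write(f"{new_name}\t{original}\n")
```

Using a different stem for the assembly and the transcripts (for example 'pgcontig' and 'pgrna') keeps the two sets of new names from colliding.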
You can use the simplifyFastaHeaders.pl script for that:

perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map
perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map

nameStem is the base name used to name each of the sequences in the multi-FASTA files. Choose something appropriate: for example, contig and rna for the assembly and RNA-seq files, respectively, or something based on those, such as 'pgcontig' and 'pgrna' for contigs and RNA from Puccinia graminis. DO NOT give the same nameStem to both FASTA files, and don't use any special characters.

We need the following files (minimum):
* ${MYGENOME}_assembly.fasta as Genome file
* ${MYGENOME}.train1.gb as Training gene structure file

If we also have RNA-seq data:
* ${MYGENOME}_assembled_transcripts.fasta as cDNA file

Use ${MYGENOME}_v1 as Species name. We will need a different species name in the retraining step; otherwise, when Maker2 is rerun, it will see the same name and will not rerun AUGUSTUS, even though the species profile is different. ${MYGENOME}_v1 does the job and also tracks the version.

Once the job is finished, the Species parameter archive (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of your AUGUSTUS installation.

Hope this helps

PS: hit reply all so this is logged in Maker's mailing list in case anybody else experiences similar issues

On Thu, 7 Feb 2019 at 06:36, morgan sobol wrote:
I have not used SNAP or CEGMA; however, I see that CEGMA was discontinued in 2015. Do you think that will be a problem, or is it still worth using the old version?
________________________________
From: Xabier Vázquez-Campos
Sent: Tuesday, February 5, 2019 4:42 PM
To: morgan sobol; Maker Mailing List
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

Don't you use SNAP?
It usually produces quite decent results, and it is easier to train than any of the other predictors.

In any case, the Augustus gene model is way off in both cases. GM (GeneMark) doesn't seem bad in the first case, if your fungus has a rather usual genome... For the second, it looks bad.

I'm not too familiar with re-annotation, but I'd rather create the gene models from scratch than reuse the ones from the Illumina-only genomes. Note that long-read assemblies have a higher proportion of repetitive elements that need masking, and RepeatMasker alone may not be enough. In theory, this shouldn't affect the Augustus model if it was trained through BUSCO, as BUSCO uses defined conserved markers to create the gene model, but I'm not so sure about GM.

If you trained Augustus with BUSCO and this is the result, I'd discard the gene model and train it again the "traditional way", i.e. as it used to be done when we only had CEGMA. I had good results just by changing the training method.

Hope it helps,
Xabi

On Wed, 6 Feb 2019 at 02:19, morgan sobol wrote:

Thank you, Xabi, for the response. The number of proteins from each source is much lower than before. The previous numbers were 325, 10,899, and 11,243 for Augustus, GeneMark, and MAKER, respectively. The more recent numbers are 25, 857, and 4,418, respectively. So do you think this hints that something is wrong with GeneMark?

Morgan

________________________________
From: Xabier Vázquez-Campos
Sent: Sunday, February 3, 2019 4:43 PM
To: morgan sobol
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

Hi Morgan,

We had a similar issue with AUGUSTUS underpredicting when using a BUSCO-derived gene model: https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ

Also, check the number of proteins by each individual predictor. If the numbers from one of them are off, you may find a possible source of issues.
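One way to get per-predictor numbers like the ones Morgan reports is to tally GFF3 feature lines by their source column. This is a minimal sketch, assuming that MAKER labels ab initio output by source (for example `augustus_masked` or `genemark`) on `match` lines and final models as `gene` lines with source `maker`. Verify those labels against your own `*.all.gff` first. Other evidence may also use `match`, but it is reported under its own source, so it stays separate.

```python
# Sketch: count GFF3 features per source (column 2) for one feature type (column 3).
# Assumption: predictor output in the MAKER GFF3 is labelled by source, e.g.
# "augustus_masked" or "genemark" on "match" lines; verify against your own file.
from collections import Counter
from typing import Iterable


def count_by_source(lines: Iterable[str], feature_type: str) -> Counter:
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # GFF3 spec: everything after ##FASTA is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed 9-column feature line
        if cols[2] == feature_type:
            counts[cols[1]] += 1
    return counts
```

Read the file once and call the function twice: `count_by_source(lines, "match")` for the per-predictor tallies and `count_by_source(lines, "gene")` for the final MAKER models. Comparing those numbers between the two genomes shows which predictor collapsed.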
We didn't have a very good experience with GM, as it used to overpredict an absurd number of proteins.

Xabi

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From anthony.bretaudeau at inria.fr Mon Feb 18 03:53:39 2019
From: anthony.bretaudeau at inria.fr (Anthony Bretaudeau)
Date: Mon, 18 Feb 2019 10:53:39 +0100
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To:
References: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
Message-ID: <3aa1eb97-f8bf-dd61-febf-464ad4b1626c@inria.fr>

An HTML attachment was scrubbed...
URL:
From liorglic at mail.tau.ac.il Sun Feb 24 06:50:49 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Sun, 24 Feb 2019 14:50:49 +0200
Subject: [maker-devel] Profiling MAKER runs
Message-ID:

Dear MAKER users,

I was wondering if any of you has an idea of a way by which I can profile my runs. What I mean is I'd like to know how much time was spent on each step of the analysis - am I spending most of the time masking repeats, blasting transcripts/proteins, running ab-initio predictors etc. Based on this information, I might want to adjust my configuration, e.g. maybe I'm spending a lot of time blasting transcripts, and reducing the number of input transcripts would reduce run time significantly without having a major effect on results quality.
As far as I can see, the main run log does not provide such information, and I'm not sure where else to look.
Any ideas or directions could be of help.

Thanks!
Lior
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From xvazquezc at gmail.com Sun Feb 3 15:43:42 2019
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Mon, 4 Feb 2019 09:43:42 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID:

Hi Morgan,

We had a similar issue with AUGUSTUS underpredicting when using a BUSCO-derived gene model: https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ

Also, check the number of proteins by each individual predictor.
If the numbers from one of them are off, you may find a possible source of issues. We didn't have a very good experience with GM, as it used to overpredict an absurd number of proteins.

Xabi
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From keith.decker at bayer.com Mon Feb 4 11:09:35 2019
From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005])
Date: Mon, 4 Feb 2019 18:09:35 +0000
Subject: [maker-devel] MAKER on AWS
Message-ID: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>

I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be.
I found this old post on STARCLUSTER, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html
but my understanding is that STARCLUSTER and its successors (cfncluster and parallel cluster) can be challenging to set up and use.

So my questions are

1. Has anyone had recent success running MAKER on cfncluster or parallel cluster in AWS?
2. Would it be reasonable to just split up N chromosomes across N ECS instances and collect the results at the end? If so, does it make sense to run each chromosome level annotation on for example an m4.16xlarge instance with 64 cores and 256 GB of RAM? Or is there a maximum number of cores at which the benefits from parallelization saturate?

Thanks and sorry for the long question
Keith

This system contains confidential and copyrighted information. Access to the system is limited to users only and only for approved business purposes.
Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company.
Anyone using of viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company's sole discretion there is a business reason to do so.
If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Mon Feb 4 11:31:29 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Feb 2019 11:31:29 -0700
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
Message-ID: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>

You can try and stand up a cluster inside AWS, or like you said just start independent instances each with their own piece of the total dataset. There is a tool called fasta_tool inside of maker that makes it easy to split up the dataset into equal sized chunks.
Alternatively, CyVerse has set up an interesting MAKER wrapper (WQ-MAKER) that launches multiple cloud instances for MAKER and handles data chunking for you (they've been using XSEDE cloud resources through the NSF) -> http://ccl.cse.nd.edu/research/papers/maker-service-ic2e2018.pdf

Here is an example of an external project using their setup -> http://onsnetwork.org/kubu4/2018/08/07/genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/

-Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From liorglck at gmail.com Mon Feb 4 02:00:29 2019
From: liorglck at gmail.com (Lior Glick)
Date: Mon, 4 Feb 2019 11:00:29 +0200
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in maker_exe.ctl
Message-ID:

Dear MAKER users,

I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the software is initiated. Typing 'which RepeatMasker' shows the correct local path. I also use this path as the value for the maker_exe.ctl parameter 'RepeatMasker'.
Trying to generalize my working environment, I am trying to use a conda env which is capable of running MAKER. This env comes with RepeatMasker as well. Once I activate this env, I can still run RepeatMasker, but it points to a different path. When I run MAKER within this env, it fails right away with the error message:
ERROR: Could not determine if RepBase is installed
Running the same configuration files locally (i.e. outside the conda env) results in a successful run. This leads me to think that MAKER is not actually using the path indicated in the maker_exe.ctl file, and rather looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions of how to overcome this issue?
Thanks and best regards,
Lior
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From keith.decker at bayer.com Mon Feb 4 11:39:48 2019
From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005])
Date: Mon, 4 Feb 2019 18:39:48 +0000
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com> <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
Message-ID: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>

Thanks,
Do you have metrics on how MAKER performs on annotating a single chromosome on a single machine? For example, will I see anything close to 16X speed-up using a 16 core machine, and does performance improvement saturate at a certain number of cores?

-Keith
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Mon Feb 4 12:00:00 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Feb 2019 12:00:00 -0700
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com> <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com> <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>
Message-ID:

I don't have cloud performance stats, but I do have cluster performance stats you may be able to somewhat correlate (attached). On a cluster we see nearly linear performance gains until ~100 CPU cores, and the plateau doesn't fully level out until well after 600 cores (we are hitting IO and networking limits for inter-node communication). So if you are only using a single instance, you can essentially consider it the equivalent of a single real machine, which would fall well under 100 CPU cores, and performance growth would be expected to be linear on that instance.

-Carson

> On Feb 4, 2019, at 11:39 AM, DECKER, KEITH F [AG/1005] wrote:
>
> Thanks,
> Do you have metrics on how MAKER performs on annotating a single chromosome on a single machine?
For example, will I see anything close to 16X speed-up using a 16 core machine, and does performance improvement saturate at a certain number of cores? > > -Keith > > From: Carson Holt > > Date: Monday, February 4, 2019 at 12:33 PM > To: "DECKER, KEITH F [AG/1005]" > > Cc: "maker-devel at yandell-lab.org " > > Subject: Re: [maker-devel] MAKER on AWS > > You can try and stand up a cluster inside AWS, or like you said just start independent instances each with their own piece of the total dataset. There is a tools called fasta_tool inside of maker that makes it easy to split up the dataset into equal sized chunks. > > Alternatively, CyVerse has set up an interesting MAKER wrapper (WQ-MAKER) that launches multiple cloud instances for MAKER and handles data chunking for you (they?ve been using XSEDE cloud resources through the NSF) ?> > http://ccl.cse.nd.edu/research/papers/maker-service-ic2e2018.pdf > > Here is an example of an external project using their setup ?> http://onsnetwork.org/kubu4/2018/08/07/genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/ > > ?Carson > > > > > > On Feb 4, 2019, at 11:09 AM, DECKER, KEITH F [AG/1005] > wrote: > > I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be. > I found this old post on STARCLUSTER, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html > but my understanding is that STARCLUSTER and its successors (cfncluster and parallel cluster) can be challenging to set up and use. > > So my questions are > > 1. Has anyone had recent success running MAKER on cfncluster or parallel cluster in AWS? > 2. Would it be reasonable to just split up N chromosomes across N ECS instances and collect the results at the end? If so, does it make sense to run each chromosome level annotation on for example an m4.16xlarge instance with 64 cores and 256 GB of RAM? 
Or is there a maximum number of cores at which the benefits from parallelization saturate? > > Thanks and sorry for the long question > Keith > > > > This system contains confidential and copyrighted information. Access to the system is limited to users only and only for approved business purposes. > Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company. > Anyone using of viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company?s sole discretion there is a business reason to do so. > If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose. > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > This system contains confidential and copyrighted information. Access to the system is limited to users only and only for approved business purposes. > Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company. > Anyone using of viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company?s sole discretion there is a business reason to do so. 
> If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-2.pdf Type: application/pdf Size: 41424 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Tue Feb 5 15:42:40 2019 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=) Date: Wed, 6 Feb 2019 09:42:40 +1100 Subject: [maker-devel] Re-annotation, fewer gene predictions In-Reply-To: References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> Message-ID: Don't you use SNAP? It usually produces quite decent results. And easier to train than any of the other predictors In any case, the Augustus gene model is way off in both cases GM doesn't seem bad if your fungus has a rather usual genome... in the first. For the second, it looks bad I'm not too familiar with the reannotation but I'd rather create the gene models from scratch rather than reuse the ones from the Illumina-only genomes. Note that assemblies with long-reads, have a higher proportion of repetitive elements that need masking and RepeatMasker only may not be enough. In theory, this shouldn't affect Augustus model if trained through BUSCO as it uses defined conserved markers to create the gene model, but I'm not so sure about GM. If you trained Augustus with BUSCO, and this is the result, I'd discard the gene model and train it again by the "traditional way", i.e. as it used to be when we only had CEGMA. I had good results just by changing the training method. Hope it helps, Xabi On Wed, 6 Feb 2019 at 02:19, morgan sobol wrote: > Thank you, Xabi for the response. > The number of proteins from each source is greatly lower than before. 
> Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and
> maker respectively.
> The more recent numbers are 25, 857, 4418 respectively.
>
> So do you think maybe this hints that something is wrong from genemark?
>
> Morgan
>
> ------------------------------
> *From:* Xabier Vázquez-Campos
> *Sent:* Sunday, February 3, 2019 4:43 PM
> *To:* morgan sobol
> *Cc:* maker-devel at yandell-lab.org
> *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions
>
> Hi Morgan,
>
> We had a similar issue with AUGUSTUS underpredicting when using a
> BUSCO-derived gene model
> https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ
>
> Also, check the number of proteins by each individual predictor. If the
> numbers from one of them are off, you may find a possible source of issues.
> We didn't have a very good experience with GM, as it used to overpredict
> an absurd number of proteins.
>
> Xabi

-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com Wed Feb 6 15:33:47 2019
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Thu, 7 Feb 2019 09:33:47 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To:
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID:

SNAP is easy to train, works well on fungal genomes, and is explained in MAKER's wiki: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors

Oh, sorry, I didn't explain myself well. What I was trying to say is that before BUSCO, when we only had CEGMA, we trained Augustus differently, because CEGMA didn't produce Augustus gene models automatically. I don't mean that you should use CEGMA. This is what I have in my own documentation about how to train Augustus "the old way":

> AUGUSTUS - the old way
>
> Alternatively, you can train AUGUSTUS in a more "manual" way, like when we
> were using CEGMA. The training starts with the output from the second
> instance of fathom in the SNAP training section.
>
> cd ${MYGENOME_DIR}/maker/snap1
> perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb
>
> zff2augustus_gbk.pl generates a GenBank file from export.dna.
>
> The actual training of AUGUSTUS will be done through the *webAUGUSTUS server*.
>
> Before proceeding, it is recommended to rename the fasta headers, especially
> if they contain special characters and/or are very long. This is the
> main cause of failure for jobs submitted to webAUGUSTUS. You can use
> the simplifyFastaHeaders.pl script for that:
>
> perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map
>
> perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map
>
> nameStem is the base name used for naming each of the sequences in the
> multifasta files. Use something appropriate, such as *contig*
> and *rna* for the assembly and RNA-seq files, respectively, or something
> based on that; for example, "pgcontig" and "pgrna" for contigs and RNA from
> *Puccinia graminis*.
> *DO NOT* give the same nameStem to both fasta files, and don't use any
> special characters.
>
> We need the following files (minimum):
>
> - ${MYGENOME}_assembly.fasta as *Genome file*
> - ${MYGENOME}.train1.gb as *Training gene structure file*
>
> If we also have RNA-seq data:
>
> - ${MYGENOME}_assembled_transcripts.fasta as *cDNA file*
>
> Use ${MYGENOME}_v1 as *Species name*. We will need a different
> species name in the retraining step. Otherwise, when MAKER2 is rerun, MAKER2
> will see the same name and will not rerun AUGUSTUS, even though the species
> profile is different. ${MYGENOME}_v1 does the job and also tracks the
> version.
>
> Once the job is finished, the *Species parameter archive*
> (parameters.tar.gz) will contain a folder with the model files for your
> species. Copy it to the species folder of your AUGUSTUS installation.

Hope this helps.

PS: hit "reply all" so this is logged in MAKER's mailing list, in case anybody else experiences similar issues.

On Thu, 7 Feb 2019 at 06:36, morgan sobol wrote:
> I have not used SNAP or CEGMA, however, I see that CEGMA was discontinued
> in 2015.
> Do you think that will be a problem, or is it still worth using the old
> version?

-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
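Xabier's suggestion to check the number of proteins from each individual predictor can be scripted against the GFF3 that MAKER produced (for example the .all.gff file named in the maker_gff= setting of the original post). The following is a minimal Python sketch, not a MAKER tool: it counts features of one type grouped by the GFF3 source column (column 2). It assumes that each predictor's name appears in that column (MAKER output typically uses sources such as augustus_masked for ab initio predictions and maker for final models) and that you pick the feature type that fits your file, e.g. mRNA for final models or match for raw predictions. Check a few lines of your own output before trusting the numbers.

```python
"""Count GFF3 features of one type, grouped by the source column.

Sketch only: assumes predictor names appear in column 2 (source),
which should be verified against your own MAKER output.
"""
import sys
from collections import Counter
from typing import Iterable


def count_by_source(lines: Iterable[str], feature_type: str) -> Counter:
    """Return a Counter mapping source -> number of features of feature_type."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # GFF3: everything after this directive is sequence data
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a 9-column feature line
        if cols[2] == feature_type:
            counts[cols[1]] += 1
    return counts


def main() -> None:
    if len(sys.argv) < 2:
        print("usage: python count_gff_sources.py <file.gff> [feature_type]",
              file=sys.stderr)
        return
    feature_type = sys.argv[2] if len(sys.argv) > 2 else "mRNA"
    with open(sys.argv[1]) as handle:
        for source, n in count_by_source(handle, feature_type).most_common():
            print(f"{source}\t{n}")


if __name__ == "__main__":
    main()
```

Running it (under the hypothetical name count_gff_sources.py) on the output for each assembly, once with match and once with mRNA, would show whether a single predictor collapses on the hybrid assembly while the others hold steady.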
From liorglic at mail.tau.ac.il Mon Feb 11 07:04:16 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Mon, 11 Feb 2019 16:04:16 +0200
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in maker_exe.ctl
Message-ID:

Dear MAKER users,

I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the software starts; typing 'which RepeatMasker' shows the correct local path. I also use this path as the value of the maker_exe.ctl parameter 'RepeatMasker'.

To make my working environment more portable, I am trying to use a conda env that is capable of running MAKER. This env comes with RepeatMasker as well. Once I activate this env, I can still run RepeatMasker, but the command resolves to a different path. When I run MAKER within this env, it fails right away with the error message:

ERROR: Could not determine if RepBase is installed

Running the same configuration files locally (i.e. outside the conda env) results in a successful run. This leads me to think that MAKER is not actually using the path indicated in the maker_exe.ctl file, and instead looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions on how to overcome this issue?

Thanks and best regards,
Lior

From liorglic at mail.tau.ac.il Mon Feb 11 07:12:25 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Mon, 11 Feb 2019 16:12:25 +0200
Subject: [maker-devel] Unknown (X) amino acids in predicted proteins
Message-ID:

Dear MAKER users,

After completing a MAKER run, I looked at the protein FASTA files that MAKER outputs and noticed that a small fraction of the sequences include X characters, indicating unknown amino acids. I was wondering how such sequences arise: why would a prediction contain unknown amino acids? Is this an indication of low-quality predictions?
Is there any documentation on the procedure that generates the protein sequences?

Thanks a lot,
Lior

From kapeelc at gmail.com Thu Feb 7 12:43:47 2019
From: kapeelc at gmail.com (Kapeel Chougule)
Date: Thu, 7 Feb 2019 14:43:47 -0500
Subject: [maker-devel] MAKER v3 Fgenesh ERROR
Message-ID:

Hi Carson,

I have been getting this error with the fgenesh tool within MAKER. It runs OK on most of the assembly contigs but seems to fail on one contig (or part of a contig) with the error below:

    Widget::fgenesh: /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Widget/fgenesh/fgenesh_wrap /mnt/grid/ware/hpc_norepl/data/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/fgenesh /sonas-hs/ware/hpc_norepl/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/Zeamays.mpar.dat.new /tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh.fasta -exon_table:/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.xdef.fgenesh > /tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh
    #-------------------------------#
    ...processing 9 of 24
    ...processing 8 of 28
    ...processing 10 of 24
    ...processing 9 of 28
    ...processing 11 of 24
    ...processing 10 of 28
    ...processing 12 of 24
    ...processing 11 of 28
    deleted:0 genes
    ERROR: FgenesH failed
    --> rank=14, hostname=bnbcompute50
    ERROR: Failed while annotating transcripts
    ERROR: Chunk failed at level:1, tier_type:4
    FAILED CONTIG:Super-Scaffold_14.2_contig2

I updated the Perl module fgenesh.pm as suggested in previous threads. Attached are the maker_opts.ctl and STDERR log file.

Thanks,
Kapeel

-- 
Kapeel Chougule
Computational Scientist Developer II
One Bungtown Road
Cold Spring Harbor, NY 11724
http://www.warelab.org/
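Kapeel's log ends with FAILED CONTIG:Super-Scaffold_14.2_contig2. To see whether other contigs failed as well, one can read the master datastore index that MAKER writes into its .maker.output directory. The Python sketch below is an illustration, not part of MAKER. It assumes each line of that log is tab-separated as sequence ID, output directory and status (statuses such as STARTED, FINISHED or FAILED), with later lines superseding earlier ones for the same sequence, and it reports every sequence whose last status is not FINISHED. Run it only after MAKER has exited; otherwise contigs that are still being processed will be reported as STARTED.

```python
"""List sequences whose last status in a MAKER datastore index is not FINISHED.

Sketch only: assumes one tab-separated record per line
(sequence ID, output directory, status), appended as the run progresses.
"""
import sys
from typing import Dict, Iterable


def unfinished_contigs(lines: Iterable[str]) -> Dict[str, str]:
    """Map sequence ID -> last status, for IDs whose last status is not FINISHED."""
    last_status: Dict[str, str] = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # ignore blank or malformed lines
        seq_id, status = cols[0], cols[2]
        last_status[seq_id] = status  # later entries overwrite earlier ones
    return {sid: st for sid, st in last_status.items() if st != "FINISHED"}


def main() -> None:
    if len(sys.argv) < 2:
        print("usage: python unfinished_contigs.py <prefix>_master_datastore_index.log",
              file=sys.stderr)
        return
    with open(sys.argv[1]) as handle:
        for seq_id, status in sorted(unfinished_contigs(handle).items()):
            print(f"{seq_id}\t{status}")


if __name__ == "__main__":
    main()
```

The resulting list of IDs could then be used to pull the failing sequences into a small FASTA file and rerun MAKER (or fgenesh alone) on just those, which makes the failure quicker to reproduce.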
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 5420 bytes

A non-text attachment was scrubbed...
Name: stderr.log
Type: application/octet-stream
Size: 10012917 bytes

From fatih.sarigoel at durham.ac.uk Wed Feb 13 05:20:40 2019
From: fatih.sarigoel at durham.ac.uk (SARIGOEL, FATIH)
Date: Wed, 13 Feb 2019 12:20:40 +0000
Subject: [maker-devel] Does Conda Maker actually work?
Message-ID:

Greetings,

I notice that you never mention conda installation on your website, so I am curious whether the conda version is actually supposed to work, because for me it didn't. I created a new conda environment and installed MAKER (I tried both installation options). When I run the example files, I get this error:

"make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
A problem was encountered while attempting to compile and install your Inline C code. The command that failed was:
"make > out.make 2>&1" with error code 2"

My conda environment is here: /fast_new/work/users/fsarigo_m/miniconda3

I don't understand why the program is trying to look in /home/conda, which does not exist. The output also begins with a "Possible precedence issue" warning.

Thanks for your help in advance!
Fatih

+++++
Here is the full log until the end of the contig:

    (MakerX) [fsarigo_m at med0223 MAKER]$ maker
    Possible precedence issue with control flow operator at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845.
    STATUS: Parsing control files...
    STATUS: Processing and indexing input FASTA files...
    STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log

--Next Contig--

Processing run.log file...
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------

Running Mkbootstrap for IndexedBase_14e0 ()
chmod 644 "IndexedBase_14e0.bs"
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" IndexedBase_14e0.xs > IndexedBase_14e0.xsc
mv IndexedBase_14e0.xsc IndexedBase_14e0.c
/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" IndexedBase_14e0.c
/bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
make: *** [Makefile:330: IndexedBase_14e0.o] Error 127

A problem was encountered while attempting to compile and install your Inline
C code. The command that failed was:
"make > out.make 2>&1" with error code 2

The build directory was:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0

To debug the problem, cd to the build directory, and inspect the output files.

Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
--> rank=NA, hostname=med0223
...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
--> rank=NA, hostname=med0223
at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 38.
Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 306
Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, ARRAY(0x564b40673ad0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 426
Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm line 95
FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 478
Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", HASH(0x564b40673c80), 0, 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 341
Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 357
Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm line 287
Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683
Running Mkbootstrap for IndexedBase_14e0 ()
chmod 644 "IndexedBase_14e0.bs"
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644
"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" IndexedBase_14e0.xs > IndexedBase_14e0.xsc
mv IndexedBase_14e0.xsc IndexedBase_14e0.c
/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" IndexedBase_14e0.c
/bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory
make: *** [Makefile:330: IndexedBase_14e0.o] Error 127

A problem was encountered while attempting to compile and install your Inline
C code. The command that failed was:
"make > out.make 2>&1" with error code 2

The build directory was:
/fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0

To debug the problem, cd to the build directory, and inspect the output files.

Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin'
at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275.
--> rank=NA, hostname=med0223
...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869.
--> rank=NA, hostname=med0223
--> rank=NA, hostname=med0223
--> rank=NA, hostname=med0223
ERROR: Failed while examining contents of the fasta file and run log
ERROR: Chunk failed at level:0, tier_type:0
FAILED CONTIG:contig-dpp-500-500

examining contents of the fasta file and run log

From carsonhh at gmail.com  Wed Feb 13 07:51:44 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 07:51:44 -0700
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: 
References: 
Message-ID: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>

The conda recipe was produced by another group. I do not currently recommend using it because I have seen a number of issues pop up on the list from people attempting to install MAKER via conda. I know there is at least an issue with the conda RepeatMasker install, and there may be others. The specific failure you show is from Bio::DB::IndexedBase trying to compile an Inline::C function. It may be that conda is installing an older BioPerl where this issue still exists -> https://github.com/bioperl/bioperl-live/issues/215

Or it may be that there is a new related issue (I've seen a handful of other examples that seem to relate back to Bio::DB::IndexedBase) -> https://github.com/bioperl/bioperl-live/issues/305

Try installing MAKER without conda (make sure to remove any components that are in conda first to avoid conflicts).

--Carson

> On Feb 13, 2019, at 5:20 AM, SARIGOEL, FATIH wrote:
> 
> Greetings,
> I notice that you never mention conda installation on your website, so I am curious if the conda version is actually supposed to be working fine or not; as for me it didn't.

From carsonhh at gmail.com  Wed Feb 13 10:14:13 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:14:13 -0700
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in maker_exe.ctl
In-Reply-To: 
References: 
Message-ID: <6AFF11A9-9860-4047-A337-4B974C6C0F30@gmail.com>

The conda installation of RepeatMasker runs oddly. It does not appear to run the ./configure script during setup, and is missing files inside the repeat library as a result.

--Carson

> On Feb 4, 2019, at 2:00 AM, Lior Glick wrote:
> 
> Dear MAKER users,
> 
> I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the software is initiated. Typing 'which RepeatMasker' shows the correct local path.
> I also use this path as value for the maker_exe.ctl parameter 'RepeatMasker'.
> Trying to generalize my working environment, I am trying to use a conda env which is capable of running MAKER. This env comes with RepeatMasker as well. Once I activate this env, I can still run RepeatMasker, but it points to a different path. When I run MAKER within this env, it fails right away with the error message:
> ERROR: Could not determine if RepBase is installed
> Running the same configuration files locally (i.e. outside the conda env) results in a successful run.
> This leads me to think that MAKER is not actually using the path indicated in the maker_exe.ctl file, and rather looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions of how to overcome this issue?
> 
> Thanks and best regards,
> Lior

From carsonhh at gmail.com  Wed Feb 13 10:18:44 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:18:44 -0700
Subject: [maker-devel] Unknown (X) amino acids in predicted proteins
In-Reply-To: 
References: 
Message-ID: <1472E55C-62CB-4A73-B45D-C4BEF3E014B7@gmail.com>

If you use GFF3 as input, or use est2genome or protein2genome in your final run, you may have "N" characters from the assembly as part of your CDS. "N" is the ambiguity code for DNA, and it results in an "X" when translated, "X" being the ambiguity code for amino acids.
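To illustrate the point above about "N" bases turning into "X" residues, here is a minimal Python sketch. It is not MAKER's own translation code; it simply assumes the standard genetic code (NCBI translation table 1) and one reasonable convention: each N in a codon is expanded to all four bases, and the codon becomes X only when those possibilities disagree, so a fourfold-degenerate codon such as GCN still gives A. The helper names `translate_codon`, `translate` and `proteins_with_x` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). The 64 codons are listed
# with the bases in T, C, A, G order at each position: TTT, TTC, TTA, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate_codon(codon):
    """Translate one codon, expanding each N to T/C/A/G.

    If every expansion gives the same residue (e.g. GCN -> A) that residue is
    returned; otherwise the codon is ambiguous and becomes X. Characters other
    than A/C/G/T/N also give X.
    """
    options = [BASES if base == "N" else base for base in codon.upper()]
    residues = {CODON_TABLE.get("".join(combo), "X") for combo in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(cds):
    """Translate a CDS in frame 1, ignoring any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(translate_codon(cds[i:i + 3]) for i in range(0, usable, 3))


def proteins_with_x(fasta_lines):
    """Return the IDs of FASTA records whose sequence contains an X."""
    hits, current, has_x = [], None, False
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None and has_x:
                hits.append(current)
            current = (line[1:].split() or [""])[0]
            has_x = False
        elif current is not None and "X" in line.upper():
            has_x = True
    if current is not None and has_x:
        hits.append(current)
    return hits


print(translate("ATGNNNTGGGCNTAA"))  # MXWA*
```

Passing an open protein FASTA file (for example `proteins_with_x(open("proteins.fasta"))`, with a hypothetical file name) lists which predicted proteins contain X, which can then be checked against runs of N in the assembly.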

Augustus will do internal gymnastics and completely splice out exons containing N's to try to avoid this issue entirely, but it may not always be able to. It's an indication of genome assembly issues.

--Carson

> On Feb 11, 2019, at 7:12 AM, Lior Glick wrote:
> 
> Dear MAKER users,
> 
> After completing a MAKER run, I looked at the protein fasta files that MAKER outputs and noticed that a small fraction of the sequences include X characters, indicating unknown amino acids. I was wondering how such sequences are obtained, I mean how come there are unknown amino acids in the prediction? Is this an indication of low-quality predictions?
> Is there any documentation regarding the procedure that generates the protein sequences?
> 
> Thanks a lot,
> Lior

From carsonhh at gmail.com  Wed Feb 13 10:24:01 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Feb 2019 10:24:01 -0700
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID: 

One thing you can also do is use old models as protein= input and run the protein2genome option just to see where things align. You may find that not all old models are recoverable in the new assembly. Fewer genes in the new assembly may mean that redundant/duplicate contigs were collapsed and split contigs were joined, resulting in multiple gene fragments becoming a single unified model. Make sure to always review contigs in a browser to see how models and evidence correlate.

--Carson

> On Feb 3, 2019, at 12:13 PM, morgan sobol wrote:
> 
> Hello,
> 
> I previously used Maker to annotate two different fungal genomes that were created using Illumina sequences only. For these genomes, I had over 11,000 genes predicted.
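Carson's explanation in the re-annotation thread (collapsed duplicate contigs and joined split contigs turning several gene fragments into one model) can be checked roughly by comparing gene counts and gene lengths between the old and new MAKER GFF3 files. The minimal Python sketch below assumes standard 9-column, tab-separated GFF3 in which the final MAKER models carry `maker` in column 2 and `gene` in column 3; the script name and file names in the usage comment are hypothetical. Fewer but longer genes would be consistent with merged fragments, but only reviewing the contigs in a browser, as Carson suggests, can confirm it.

```python
import sys
from collections import Counter


def genes_per_contig(lines, source="maker"):
    """Count 'gene' features per seqid (column 1) in GFF3 lines.

    Only rows with 9 tab-separated columns are considered, and parsing stops
    at a ##FASTA directive because any embedded sequence is not feature data.
    Pass source=None to count genes from every source.
    """
    counts = Counter()
    lengths = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        seqid, src, ftype, start, end = cols[:5]
        if ftype == "gene" and (source is None or src == source):
            counts[seqid] += 1
            lengths.append(int(end) - int(start) + 1)  # GFF3 is 1-based, inclusive
    return counts, lengths


def summarize(path):
    """Print total genes, number of contigs carrying genes and mean gene length."""
    with open(path) as handle:
        counts, lengths = genes_per_contig(handle)
    total = sum(counts.values())
    mean_len = sum(lengths) / len(lengths) if lengths else 0.0
    print(f"{path}: {total} genes on {len(counts)} contigs, "
          f"mean gene length {mean_len:.0f} bp")
    return total


if __name__ == "__main__":
    # Hypothetical usage: python count_genes.py old.all.gff new.all.gff
    for gff_path in sys.argv[1:]:
        summarize(gff_path)
```

Running it on both assemblies' GFF3 files side by side gives a quick first comparison before opening the contigs in a browser.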

From liorglck at gmail.com  Sun Feb 17 11:50:10 2019
From: liorglck at gmail.com (Lior Glick)
Date: Sun, 17 Feb 2019 20:50:10 +0200
Subject: [maker-devel] Does Conda Maker actually work?
In-Reply-To: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
References: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com>
Message-ID: 

That's good to know. Are there any plans to create a stable conda package in the future? It would be a very nice feature, especially since MAKER is not always straightforward to install.

On Wed, Feb 13, 2019 at 5:22 PM Carson Holt wrote:

> The conda recipe was produced by another group. I do not currently
> recommend using it because I have seen a number of issues pop up on the
> list based on people attempting to install MAKER via conda. I know there
> is at least an issue with the conda RepeatMasker install, and there may be
> others. The specific failure you show is from Bio::DB::IndexedBase trying
> to compile an Inline::C function. It may be that conda is installing an
> older BioPerl where this issue still exists ->
> https://github.com/bioperl/bioperl-live/issues/215
>
> Or it may be that there is a new related issue (I've seen a handful of
> other examples that seem to relate back to Bio::DB::IndexedBase) ->
> https://github.com/bioperl/bioperl-live/issues/305
>
> Try installing MAKER without conda (make sure to remove any components
> that are in conda first to avoid conflicts).
>
> --Carson
>
>
> On Feb 13, 2019, at 5:20 AM, SARIGOEL, FATIH
> wrote:
>
> Greetings,
> I notice that you never mention conda installation on your website, so I
> am curious if the conda version is actually supposed to be working fine or
> not; as for me it didn't.
> I created a new conda environment and installed Maker (tried this with
> both installation options)
> When I run the example files, I get this error:
>
> "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> A problem was encountered while attempting to compile and install your
> Inline
> C code.
The command that failed was: > "make > out.make 2>&1" with error code 2" > > My conda environment is here > /fast_new/work/users/fsarigo_m/miniconda3 > I don't understand why the program is trying to look here: > /home/conda > which does not exist > > Also begins with a "possible precedence issue" > > Thanks for your help in advance! > Fatih > > +++++ > > Here is the full log until the end of the contig: > > (MakerX) [fsarigo_m at med0223 MAKER]$ maker > Possible precedence issue with control flow operator at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm > line 845. > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input... > A data structure will be created for you at: > > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore > > To access files for individual sequences use the datastore index: > > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log > > STATUS: Now running MAKER... > examining contents of the fasta file and run log > > > > --Next Contig-- > > Processing run.log file... > #--------------------------------------------------------------------- > Now starting the contig!! 
> SeqID: contig-dpp-500-500 > Length: 32156 > #--------------------------------------------------------------------- > > > Running Mkbootstrap for IndexedBase_14e0 () > chmod 644 "IndexedBase_14e0.bs" > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" > -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs > blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644 > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" > -typemap > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" > IndexedBase_14e0.xs > IndexedBase_14e0.xsc > mv IndexedBase_14e0.xsc IndexedBase_14e0.c > /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc > -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" > -D_REENTRANT -D_GNU_SOURCE > --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot > -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong > -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 > -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC > --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot > "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" > IndexedBase_14e0.c > /bin/sh: > /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: > No such file or directory > make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 > > A problem was encountered while attempting to compile and install your > Inline > C code. 
The command that failed was: > "make > out.make 2>&1" with error code 2 > > The build directory was: > > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0 > > To debug the problem, cd to the build directory, and inspect the output > files. > > Environment PATH = > '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin' > at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm > line 275. > --> rank=NA, hostname=med0223 > ...propagated at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm > line 869. > --> rank=NA, hostname=med0223 > at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm > line 38. > Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm > line 306 > Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for > IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, > ARRAY(0x564b40673ad0)) called at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm > line 426 > Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm > line 95 > FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm > line 478 > Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", > HASH(0x564b40673c80), 0, 0) called at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm > line 341 > Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called > at > 
/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm > line 357 > Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) > called at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm > line 287 > Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) > called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683 > Running Mkbootstrap for IndexedBase_14e0 () > chmod 644 "IndexedBase_14e0.bs" > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" > -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs > blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644 > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" > -typemap > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" > IndexedBase_14e0.xs > IndexedBase_14e0.xsc > mv IndexedBase_14e0.xsc IndexedBase_14e0.c > /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc > -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" > -D_REENTRANT -D_GNU_SOURCE > --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot > -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong > -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 > -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC > --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot > "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" > IndexedBase_14e0.c > /bin/sh: > /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: > No such file or directory > make: *** [Makefile:330: 
IndexedBase_14e0.o] Error 127 > > A problem was encountered while attempting to compile and install your > Inline > C code. The command that failed was: > "make > out.make 2>&1" with error code 2 > > The build directory was: > > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0 > > To debug the problem, cd to the build directory, and inspect the output > files. > > Environment PATH = > '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin' > at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm > line 275. > --> rank=NA, hostname=med0223 > ...propagated at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm > line 869. > --> rank=NA, hostname=med0223 > --> rank=NA, hostname=med0223 > --> rank=NA, hostname=med0223 > ERROR: Failed while examining contents of the fasta file and run log > ERROR: Chunk failed at level:0, tier_type:0 > FAILED CONTIG:contig-dpp-500-500 > > examining contents of the fasta file and run log > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed...
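In the log quoted above, the failing compile step calls a gcc under /home/conda/feedstock_root/build_artifacts/..., which looks like the environment in which the conda Perl package itself was built, not anything inside the MakerX env. Inline::C (loaded here by Bio::DB::IndexedBase) normally compiles with the compiler recorded in Perl's Config module, $Config{cc}, so one plausible reading is that this build-time path was recorded there. A minimal Python sketch to check that, assuming `perl` on PATH is the one MAKER runs (or passing the MakerX env's perl as the first argument):

```python
#!/usr/bin/env python3
"""Check whether the C compiler recorded in Perl's Config still exists.

Assumption: Inline::C compiles with $Config{cc} unless a CC option
overrides it, so a missing $Config{cc} would break Bio::DB::IndexedBase.
"""
import os
import shutil
import subprocess
import sys


def perl_configured_cc(perl: str = "perl") -> str:
    """Return $Config{cc} as reported by the given perl binary."""
    result = subprocess.run(
        [perl, "-MConfig", "-e", "print $Config{cc}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def compiler_available(cc: str) -> bool:
    """True if the first word of `cc` is an existing executable or is on PATH."""
    words = cc.split()
    if not words:
        return False
    program = words[0]
    if os.path.isabs(program):
        return os.path.isfile(program) and os.access(program, os.X_OK)
    return shutil.which(program) is not None


def main() -> None:
    # Optionally pass the perl from the MakerX conda env as the first argument.
    perl = sys.argv[1] if len(sys.argv) > 1 else "perl"
    try:
        cc = perl_configured_cc(perl)
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not query {perl}: {exc}")
        return
    status = "found" if compiler_available(cc) else "MISSING"
    print(f"{perl} was configured with cc = {cc!r} ({status})")


if __name__ == "__main__":
    main()
```

If it reports MISSING for the same /home/conda/... path, that would point at Perl's recorded configuration rather than at MAKER itself.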
URL: From morgan_starr_s at live.com Mon Feb 18 02:08:56 2019 From: morgan_starr_s at live.com (morgan sobol) Date: Mon, 18 Feb 2019 09:08:56 +0000 Subject: [maker-devel] Re-annotation, fewer gene predictions In-Reply-To: References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> , Message-ID: Thank you, Xabi and Carson. With your help, I was able to improve the annotation with a more appropriate number of predictions. Best, Morgan ________________________________ From: Xabier Vázquez-Campos Sent: Wednesday, February 6, 2019 11:33 PM To: morgan sobol; Maker Mailing List Subject: Re: [maker-devel] Re-annotation, fewer gene predictions SNAP is easy to train, works well on fungal genomes, and is explained in Maker's wiki: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors Oh, sorry, I didn't explain myself well. What I was trying to say is that before BUSCO, when we only had CEGMA, we trained Augustus in a different way, because CEGMA didn't produce Augustus gene models automatically. I don't mean that you should use CEGMA. This is what I have in my own documentation about how to train Augustus "the old way":

AUGUSTUS: the old way

Alternatively, you can train AUGUSTUS in a more "manual" way, as we did when we were using CEGMA. The training starts with the output from the second instance of fathom in the SNAP training section.

    cd ${MYGENOME_DIR}/maker/snap1
    perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb

zff2augustus_gbk.pl generates a GenBank file from export.dna. The actual training of AUGUSTUS will be done through the webAUGUSTUS server. Before proceeding, it is recommended to rename the FASTA headers, especially if they contain special characters and/or are very long. This is the main cause of failure for jobs submitted to webAUGUSTUS.
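The header-renaming step described here can be sketched in Python. This is a hypothetical stand-in that illustrates the idea (replace each header with a short alphanumeric stem plus a counter, and keep a map back to the original names); it is not simplifyFastaHeaders.pl, and the tab-separated map format is an assumption:

```python
#!/usr/bin/env python3
"""Rename FASTA headers to <stem><n> and keep a map back to the originals.

Hypothetical stand-in for the idea behind simplifyFastaHeaders.pl; the
map format (new<TAB>old) is an assumption, not that script's format.
"""
import sys
from typing import Iterable, List, Tuple


def simplify_headers(
    lines: Iterable[str], stem: str
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (renamed FASTA lines, [(new_name, original_header), ...])."""
    if not stem.isalnum():
        # Special characters in names are a common cause of failed jobs.
        raise ValueError(f"name stem must be alphanumeric: {stem!r}")
    renamed: List[str] = []
    mapping: List[Tuple[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            new_name = f"{stem}{len(mapping) + 1}"
            mapping.append((new_name, line[1:]))
            renamed.append(f">{new_name}")
        else:
            renamed.append(line)  # sequence lines are copied unchanged
    return renamed, mapping


def main(src: str, stem: str, out_fasta: str, out_map: str) -> None:
    with open(src) as fh:
        renamed, mapping = simplify_headers(fh, stem)
    with open(out_fasta, "w") as out:
        out.write("\n".join(renamed) + "\n")
    with open(out_map, "w") as out:
        for new_name, old_header in mapping:
            out.write(f"{new_name}\t{old_header}\n")


if __name__ == "__main__" and len(sys.argv) == 5:
    main(*sys.argv[1:5])
```

It takes the same argument order as the Perl commands in this message (input FASTA, name stem, output FASTA, map file), e.g. `python simplify_headers.py assembly.fasta pgcontig contigs_rename.fasta contigs.map`, where the script name is hypothetical.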
You can use the simplifyFastaHeaders.pl script for that:

    perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map
    perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map

nameStem is the base name used for naming each of the sequences in the multi-FASTA files. Choose an appropriate value, such as contig and rna for the assembly and RNA-seq files, respectively, or something based on those; for example, "pgcontig" and "pgrna" for contigs and RNA from Puccinia graminis. DO NOT give the same nameStem to both FASTA files, and don't use any special characters.

We need the following files (minimum):
* ${MYGENOME}_assembly.fasta as Genome file
* ${MYGENOME}.train1.gb as Training gene structure file
If we also have RNA-seq data:
* ${MYGENOME}_assembled_transcripts.fasta as cDNA file

Use ${MYGENOME}_v1 as Species name. We will need a different species name in the retraining step; otherwise, when Maker2 is rerun, it will see the same name and will not rerun AUGUSTUS, even though the species profile is different. So ${MYGENOME}_v1 does the job and also tracks the version. Once the job is finished, the Species parameter archive (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of your AUGUSTUS installation. Hope this helps. PS: hit reply all so this is logged in Maker's mailing list in case anybody else experiences similar issues. On Thu, 7 Feb 2019 at 06:36, morgan sobol > wrote: I have not used SNAP or CEGMA; however, I see that CEGMA was discontinued in 2015. Do you think that will be a problem, or is it still worth using the old version? ________________________________ From: Xabier Vázquez-Campos > Sent: Tuesday, February 5, 2019 4:42 PM To: morgan sobol; Maker Mailing List Subject: Re: [maker-devel] Re-annotation, fewer gene predictions Don't you use SNAP?
It usually produces quite decent results, and it is easier to train than any of the other predictors. In any case, the Augustus gene model is way off in both cases. GM doesn't seem bad in the first, if your fungus has a rather usual genome; for the second, it looks bad. I'm not too familiar with re-annotation, but I'd rather create the gene models from scratch than reuse the ones from the Illumina-only genomes. Note that long-read assemblies have a higher proportion of repetitive elements that need masking, and RepeatMasker alone may not be enough. In theory, this shouldn't affect the Augustus model if it was trained through BUSCO, as BUSCO uses defined conserved markers to create the gene model, but I'm not so sure about GM. If you trained Augustus with BUSCO and this is the result, I'd discard the gene model and train it again the "traditional way", i.e. as it used to be done when we only had CEGMA. I had good results just by changing the training method. Hope it helps, Xabi On Wed, 6 Feb 2019 at 02:19, morgan sobol > wrote: Thank you, Xabi, for the response. The number of proteins from each source is much lower than before. Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and maker, respectively. The more recent numbers are 25, 857, and 4,418, respectively. So do you think this hints that something is wrong with genemark? Morgan ________________________________ From: Xabier Vázquez-Campos > Sent: Sunday, February 3, 2019 4:43 PM To: morgan sobol Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Re-annotation, fewer gene predictions Hi Morgan, We had a similar issue with AUGUSTUS underpredicting when using a BUSCO-derived gene model https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ Also, check the number of proteins by each individual predictor. If the numbers from one of them are off, you may find a possible source of issues.
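Checking how many predictions each predictor contributed can be scripted. A minimal Python sketch, assuming a MAKER GFF3 such as the *.all.gff file named in maker_gff, where column 2 (source) separates final models (e.g. "maker") from ab initio predictions (e.g. "augustus_masked", "genemark"). Those labels are assumptions, so check column 2 of your own file. The sketch counts features per (source, type) and stops at the ##FASTA section that GFF3 allows at the end of a file:

```python
#!/usr/bin/env python3
"""Count GFF3 features per (source, type) to compare gene predictors.

Assumption: in a MAKER GFF3, column 2 (source) distinguishes final models
(e.g. "maker") from ab initio predictions (e.g. "augustus_masked").
"""
import sys
from collections import Counter
from typing import Iterable, Optional, Set


def count_by_source(lines: Iterable[str], types: Optional[Set[str]] = None) -> Counter:
    """Count features per (source, type); restrict to `types` if given."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # GFF3: everything after this directive is sequence data
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a valid 9-column feature line
        source, ftype = cols[1], cols[2]
        if types is None or ftype in types:
            counts[(source, ftype)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        # "gene" for final models, "match" for top-level prediction records
        for (source, ftype), n in sorted(count_by_source(fh, {"gene", "match"}).items()):
            print(f"{source}\t{ftype}\t{n}")
```

Running it on the GFF3 from each assembly (e.g. `python count_sources.py genome.all.gff`, a hypothetical script name) gives the same kind of per-predictor comparison as the protein counts in this thread.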
We didn't have a very good experience with GM, as it used to overpredict an absurd number of proteins. Xabi On Mon, 4 Feb 2019 at 06:15, morgan sobol > wrote: Hello, I previously used Maker to annotate two different fungal genomes that were created using Illumina sequences only. For these genomes, I had over 11,000 genes predicted. I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very similar in length and number of complete orthologs to the Illumina-only assemblies, but had far fewer, but longer, contigs. I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I am consistently getting ~4,400 predicted genes. I am asking for help as to how I can determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran them the same? Thanks, Morgan S. maker_opts.log #-----Genome (these are always required) genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$ organism_type=eukaryotic #eukaryotic or prokaryotic.
Default is eukaryotic #-----Re-annotation Using MAKER Derived GFF3 maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$ est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no #-----EST Evidence (for best results provide a file for at least one) est= #set of ESTs or assembled mRNA-seq in fasta format altest= #EST/cDNA sequence file in fasta format from an alternate organism est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file altest_gff= #aligned ESTs from a closly relate species in GFF3 format #-----Protein Homology Evidence (for best results provide a file for at least one) protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta #protein sequence file in fasta format (i.e. from mutiple oransisms) protein_gff= #aligned protein homology evidence from an external GFF3 file #-----Repeat Masking (leave values blank to skip repeat masking) model_org= #select a model organism for RepBase masking in RepeatMasker rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner rm_gff= #pre-identified repeat elements from an external GFF3 file prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. 
seg and dust filtering) #-----Gene Prediction snaphmm= #SNAP HMM file gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file augustus_species=1368D_uni #Augustus gene prediction species model fgenesh_par_file= #FGENESH parameter file pred_gff= #ab-initio predictions from an external GFF3 file model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no snoscan_rrna= #rRNA file to have Snoscan find snoRNAs unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no #-----Other Annotation Feature Types (features MAKER doesn't recognize) other_gff= #extra features to pass-through to final MAKER generated GFF3 file #-----External Application Behavior Options alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) #-----MAKER Behavior Options max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) min_contig=1 #skip genome contigs below this length (under 10kb are often useless) pred_flank=200 #flank for extending evidence clusters sent to gene predictors pred_stats=1 #report AED and QI statistics for all predictions as well as models AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) min_protein=0 #require at least this many amino acids in predicted proteins alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) split_hit=10000 #length 
for the splitting of hits (expected max intron size for evidence alignments) single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes tries=2 #number of times to try a contig if there is a failure for some reason clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no TMP= #specify a directory other than the system default temporary directory for temporary files -- Xabier Vázquez-Campos, PhD Research Associate NSW Systems Biology Initiative School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed... URL: From anthony.bretaudeau at inria.fr Mon Feb 18 02:53:39 2019 From: anthony.bretaudeau at inria.fr (Anthony Bretaudeau) Date: Mon, 18 Feb 2019 10:53:39 +0100 Subject: [maker-devel] Does Conda Maker actually work? In-Reply-To: References: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com> Message-ID: <3aa1eb97-f8bf-dd61-febf-464ad4b1626c@inria.fr> An HTML attachment was scrubbed...
URL: From liorglic at mail.tau.ac.il Sun Feb 24 05:50:49 2019 From: liorglic at mail.tau.ac.il (Lior Glick) Date: Sun, 24 Feb 2019 14:50:49 +0200 Subject: [maker-devel] Profiling MAKER runs Message-ID: Dear MAKER users, I was wondering if any of you knows of a way to profile my runs. What I mean is that I'd like to know how much time was spent on each step of the analysis: am I spending most of the time masking repeats, blasting transcripts/proteins, running ab-initio predictors, etc.? Based on this information, I might want to adjust my configuration; e.g. if I'm spending a lot of time blasting transcripts, reducing the number of input transcripts might reduce run time significantly without having a major effect on result quality. As far as I can see, the main run log does not provide such information, and I'm not sure where else to look. Any ideas or directions would be helpful. Thanks! Lior -------------- next part -------------- An HTML attachment was scrubbed... URL: From morgan_starr_s at live.com Sun Feb 3 12:13:47 2019 From: morgan_starr_s at live.com (morgan sobol) Date: Sun, 3 Feb 2019 19:13:47 +0000 Subject: [maker-devel] Re-annotation, fewer gene predictions Message-ID: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> Hello, I previously used Maker to annotate two different fungal genomes that were created using Illumina sequences only. For these genomes, I had over 11,000 genes predicted. I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very similar in length and number of complete orthologs to the Illumina-only assemblies, but had far fewer, but longer, contigs. I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I am consistently getting ~4,400 predicted genes.
I am asking for help as to how I can determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran them the same? Thanks, Morgan S. maker_opts.log #-----Genome (these are always required) genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$ organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic #-----Re-annotation Using MAKER Derived GFF3 maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$ est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no #-----EST Evidence (for best results provide a file for at least one) est= #set of ESTs or assembled mRNA-seq in fasta format altest= #EST/cDNA sequence file in fasta format from an alternate organism est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file altest_gff= #aligned ESTs from a closly relate species in GFF3 format #-----Protein Homology Evidence (for best results provide a file for at least one) protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta #protein sequence file in fasta format (i.e. 
from mutiple oransisms) protein_gff= #aligned protein homology evidence from an external GFF3 file #-----Repeat Masking (leave values blank to skip repeat masking) model_org= #select a model organism for RepBase masking in RepeatMasker rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner rm_gff= #pre-identified repeat elements from an external GFF3 file prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering) #-----Gene Prediction snaphmm= #SNAP HMM file gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file augustus_species=1368D_uni #Augustus gene prediction species model fgenesh_par_file= #FGENESH parameter file pred_gff= #ab-initio predictions from an external GFF3 file model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no snoscan_rrna= #rRNA file to have Snoscan find snoRNAs unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no #-----Other Annotation Feature Types (features MAKER doesn't recognize) other_gff= #extra features to pass-through to final MAKER generated GFF3 file #-----External Application Behavior Options alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) #-----MAKER Behavior Options max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) min_contig=1 #skip genome contigs below this length (under 10kb are often useless) pred_flank=200 #flank for extending 
evidence clusters sent to gene predictors pred_stats=1 #report AED and QI statistics for all predictions as well as models AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) min_protein=0 #require at least this many amino acids in predicted proteins alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments) single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes tries=2 #number of times to try a contig if there is a failure for some reason clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no TMP= #specify a directory other than the system default temporary directory for temporary files -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Sun Feb 3 15:43:42 2019 From: xvazquezc at gmail.com (Xabier Vázquez-Campos) Date: Mon, 4 Feb 2019 09:43:42 +1100 Subject: [maker-devel] Re-annotation, fewer gene predictions In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> Message-ID: Hi Morgan, We had a similar issue with AUGUSTUS underpredicting when using a BUSCO-derived gene model https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ Also, check the number of proteins by each individual predictor.
If the numbers from one of them are off, you may find a possible source of issues. We didn't have a very good experience with GM, as it used to overpredict an absurd number of proteins. Xabi On Mon, 4 Feb 2019 at 06:15, morgan sobol wrote: > Hello, > > I previously used Maker to annotate two different fungal genomes that were > created using Illumina sequences only. For these genomes, I had over 11,000 > genes predicted. > I recently obtained PacBio sequences for the same genomes, so I created > two hybrid assemblies. Both assemblies were very similar in length and > number of complete orthologs to the Illumina-only assemblies, but had far > fewer, but longer, contigs. > > I re-ran Maker using the settings below. For one of my genomes, I got > around 11,000 genes predicted again, as expected. However, for the other > genome, I am consistently getting ~4,400 predicted genes. > > I am asking for help as to how I can determine why I keep getting fewer > predicted genes for only one of my genomes, even though I ran them the same? > > Thanks, > Morgan S. > > maker_opts.log > #-----Genome (these are always required) > genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked > #genome sequence (fasta file or$ > organism_type=eukaryotic #eukaryotic or prokaryotic.
Default is eukaryotic > > #-----Re-annotation Using MAKER Derived GFF3 > maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff > #MAKER derive$ > est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no > altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no > protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no > rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no > model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no > pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no > other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no > > #-----EST Evidence (for best results provide a file for at least one) > est= #set of ESTs or assembled mRNA-seq in fasta format > altest= #EST/cDNA sequence file in fasta format from an alternate organism > est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file > altest_gff= #aligned ESTs from a closly relate species in GFF3 format > > #-----Protein Homology Evidence (for best results provide a file for at > least one) > protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta > #protein sequence file in fasta format (i.e. from mutiple oransisms) > protein_gff= #aligned protein homology evidence from an external GFF3 file > > #-----Repeat Masking (leave values blank to skip repeat masking) > model_org= #select a model organism for RepBase masking in RepeatMasker > rmlib= #provide an organism specific repeat library in fasta format for > RepeatMasker > repeat_protein= #provide a fasta file of transposable element proteins for > RepeatRunner > rm_gff= #pre-identified repeat elements from an external GFF3 file > prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change > this), 1 = yes, 0 = no > softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. 
seg > and dust filtering) > > #-----Gene Prediction > snaphmm= #SNAP HMM file > gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file > augustus_species=1368D_uni #Augustus gene prediction species model > fgenesh_par_file= #FGENESH parameter file > pred_gff= #ab-initio predictions from an external GFF3 file > model_gff= #annotated gene models from an external GFF3 file (annotation > pass-through) > est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no > protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no > trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no > snoscan_rrna= #rRNA file to have Snoscan find snoRNAs > unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = > yes, 0 = no > > #-----Other Annotation Feature Types (features MAKER doesn't recognize) > other_gff= #extra features to pass-through to final MAKER generated GFF3 > file > > #-----External Application Behavior Options > alt_peptide=C #amino acid used to replace non-standard amino acids in > BLAST databases > cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, > leave 1 when using MPI) > > #-----MAKER Behavior Options > max_dna_len=100000 #length for dividing up contigs into chunks > (increases/decreases memory usage) > min_contig=1 #skip genome contigs below this length (under 10kb are often > useless) > > pred_flank=200 #flank for extending evidence clusters sent to gene > predictors > pred_stats=1 #report AED and QI statistics for all predictions as well as > models > AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and > 1) > min_protein=0 #require at least this many amino acids in predicted proteins > alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = > yes, 0 = no > always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 > = no > map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = > yes, 0 = no > keep_preds=1 
#Concordance threshold to add unsupported gene prediction > (bound by 0 and 1) > > split_hit=10000 #length for the splitting of hits (expected max intron > size for evidence alignments) > single_exon=1 #consider single exon EST evidence when generating > annotations, 1 = yes, 0 = no > single_length=250 #min length required for single exon ESTs if > 'single_exon is enabled' > correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion > genes > > tries=2 #number of times to try a contig if there is a failure for some > reason > clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 > = no > clean_up=0 #removes theVoid directory with individual analysis files, 1 = > yes, 0 = no > TMP= #specify a directory other than the system default temporary > directory for temporary files > -- Xabier Vázquez-Campos, *PhD* *Research Associate* NSW Systems Biology Initiative School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed... URL: From keith.decker at bayer.com Mon Feb 4 11:09:35 2019 From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005]) Date: Mon, 4 Feb 2019 18:09:35 +0000 Subject: [maker-devel] MAKER on AWS Message-ID: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com> I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be. I found this old post on STARCLUSTER, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html but my understanding is that STARCLUSTER and its successors (cfncluster and parallel cluster) can be challenging to set up and use. So my questions are: 1. Has anyone had recent success running MAKER on cfncluster or parallel cluster in AWS? 2.
Would it be reasonable to just split up N chromosomes across N ECS instances and collect the results at the end? If so, does it make sense to run each chromosome level annotation on for example an m4.16xlarge instance with 64 cores and 256 GB of RAM? Or is there a maximum number of cores at which the benefits from parallelization saturate? Thanks and sorry for the long question Keith This system contains confidential and copyrighted information. Access to the system is limited to users only and only for approved business purposes. Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company. Anyone using of viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company?s sole discretion there is a business reason to do so. If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Feb 4 11:31:29 2019 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 4 Feb 2019 11:31:29 -0700 Subject: [maker-devel] MAKER on AWS In-Reply-To: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com> References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com> Message-ID: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com> You can try and stand up a cluster inside AWS, or like you said just start independent instances each with their own piece of the total dataset. There is a tools called fasta_tool inside of maker that makes it easy to split up the dataset into equal sized chunks. 
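The chunking idea behind "one piece of the dataset per instance" can be sketched in a few lines. This is a hypothetical helper written only to illustrate the idea; it is not MAKER's fasta_tool, and fasta_tool's actual splitting strategy may differ. It balances chunks by total base pairs rather than by contig count, on the assumption that per-instance runtime roughly tracks the amount of sequence.

```python
"""Split a multi-FASTA into N chunks of roughly equal total sequence length.

Hypothetical illustration only; this is NOT MAKER's fasta_tool.
"""
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def balance_chunks(records: Iterable[Record], n_chunks: int) -> List[List[Record]]:
    """Greedy longest-first packing: each record goes to the currently lightest chunk."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (bases assigned so far, chunk index)
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, idx = heapq.heappop(heap)
        chunks[idx].append((header, seq))
        heapq.heappush(heap, (total + len(seq), idx))
    return chunks


def write_chunks(chunks: List[List[Record]], prefix: str, width: int = 60) -> None:
    """Write chunk i to '<prefix>_<i>.fasta', wrapping sequence lines at `width`."""
    for i, chunk in enumerate(chunks):
        with open(f"{prefix}_{i}.fasta", "w") as out:
            for header, seq in chunk:
                out.write(f">{header}\n")
                for start in range(0, len(seq), width):
                    out.write(seq[start:start + width] + "\n")
```

For example, `with open("genome.fasta") as fh: write_chunks(balance_chunks(read_fasta(fh), 4), "chunk")` would produce four files, one per instance. The file names are placeholders. Longest-first greedy packing is a common simple heuristic. With chromosome-scale contigs, the largest chromosome will still bound the slowest chunk.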
Alternatively, CyVerse has set up an interesting MAKER wrapper (WQ-MAKER) that launches multiple cloud instances for MAKER and handles data chunking for you (they've been using XSEDE cloud resources through the NSF) -> http://ccl.cse.nd.edu/research/papers/maker-service-ic2e2018.pdf

Here is an example of an external project using their setup -> http://onsnetwork.org/kubu4/2018/08/07/genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/

-Carson

From liorglck at gmail.com Mon Feb 4 02:00:29 2019
From: liorglck at gmail.com (Lior Glick)
Date: Mon, 4 Feb 2019 11:00:29 +0200
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in maker_exe.ctl
Message-ID:

Dear MAKER users,

I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the software starts. Typing 'which RepeatMasker' shows the correct local path. I also use this path as the value for the maker_exe.ctl parameter 'RepeatMasker'.

To generalize my working environment, I am trying to use a conda env that is capable of running MAKER. This env comes with RepeatMasker as well. Once I activate this env, I can still run RepeatMasker, but it points to a different path. When I run MAKER within this env, it fails right away with the error message:

ERROR: Could not determine if RepBase is installed

Running the same configuration files locally (i.e. outside the conda env) results in a successful run. This leads me to think that MAKER is not actually using the path indicated in the maker_exe.ctl file, and instead looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions on how to overcome this issue?
Thanks and best regards,
Lior

From keith.decker at bayer.com Mon Feb 4 11:39:48 2019
From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005])
Date: Mon, 4 Feb 2019 18:39:48 +0000
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com> <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>
Message-ID: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>

Thanks,
Do you have metrics on how MAKER performs when annotating a single chromosome on a single machine? For example, will I see anything close to a 16X speed-up using a 16-core machine, and does the performance improvement saturate at a certain number of cores?

-Keith

From carsonhh at gmail.com Mon Feb 4 12:00:00 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Feb 2019 12:00:00 -0700
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com> <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com> <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com>
Message-ID:

I don't have cloud performance stats, but I do have cluster performance stats that you may be able to roughly correlate (attached). On a cluster we see nearly linear performance gains up to ~100 CPU cores, and the plateau doesn't fully level out until well after 600 cores (at that point we are hitting IO and networking limits for inter-node communication). So if you are only using a single instance, you can essentially consider it the equivalent of a single real machine, which would fall well under 100 CPU cores, and performance growth would be expected to be linear on that instance.

-Carson
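"Nearly linear" scaling means the speed-up on p cores is close to p (relative to one core), so parallel efficiency stays near 1.0. To check this on a specific instance, you can time the same chromosome at a few core counts and compute both numbers. The sketch below does that arithmetic only. The timings in it are made-up placeholders, not MAKER measurements.

```python
from typing import Dict, Tuple


def scaling_report(wall_times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map each core count to (speedup, parallel efficiency).

    Speedup is measured against the smallest core count in `wall_times`.
    Efficiency is speedup divided by the ideal (linear) speedup, so 1.0 means
    perfectly linear scaling and values well below 1.0 mean gains are saturating.
    """
    if not wall_times:
        raise ValueError("need at least one timing")
    base_cores = min(wall_times)
    base_time = wall_times[base_cores]
    report = {}
    for cores in sorted(wall_times):
        speedup = base_time / wall_times[cores]
        ideal = cores / base_cores
        report[cores] = (speedup, speedup / ideal)
    return report


if __name__ == "__main__":
    # Hypothetical wall-clock hours for the same chromosome at 1, 16 and 32 cores.
    for cores, (s, e) in scaling_report({1: 160.0, 16: 10.0, 32: 8.0}).items():
        print(f"{cores:>3} cores: speedup {s:5.1f}x, efficiency {e:.0%}")
```

With these invented numbers, 16 cores would be perfectly linear (16x, efficiency 1.0), while 32 cores would give only 20x (efficiency 0.625). That drop is the kind of saturation Keith asks about.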
A non-text attachment was scrubbed...
Name: PastedGraphic-2.pdf
Type: application/pdf
Size: 41425 bytes
Desc: not available

From xvazquezc at gmail.com Tue Feb 5 15:42:40 2019
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Wed, 6 Feb 2019 09:42:40 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To:
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID:

Don't you use SNAP? It usually produces quite decent results, and it is easier to train than any of the other predictors.

In any case, the Augustus gene model is way off in both cases.
GM doesn't seem bad in the first run, if your fungus has a rather usual genome. In the second, it looks bad.

I'm not too familiar with re-annotation, but I'd rather create the gene models from scratch than reuse the ones from the Illumina-only genomes.
Note that long-read assemblies have a higher proportion of repetitive elements that need masking, and RepeatMasker alone may not be enough. In theory, this shouldn't affect the Augustus model if it was trained through BUSCO, since BUSCO uses defined conserved markers to create the gene model, but I'm not so sure about GM.

If you trained Augustus with BUSCO and this is the result, I'd discard the gene model and train it again the "traditional way", i.e. as it used to be done when we only had CEGMA. I had good results just by changing the training method.

Hope it helps,
Xabi

On Wed, 6 Feb 2019 at 02:19, morgan sobol wrote:

> Thank you, Xabi, for the response.
> The number of proteins from each source is much lower than before.
> Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and maker, respectively.
> The more recent numbers are 25, 857, and 4,418, respectively.
>
> So do you think maybe this hints that something is wrong with genemark?
>
> Morgan
>
> ------------------------------
> *From:* Xabier Vázquez-Campos
> *Sent:* Sunday, February 3, 2019 4:43 PM
> *To:* morgan sobol
> *Cc:* maker-devel at yandell-lab.org
> *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions
>
> Hi Morgan,
>
> We had a similar issue with AUGUSTUS underpredicting when using a BUSCO-derived gene model
> https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ
>
> Also, check the number of proteins from each individual predictor. If the numbers from one of them are off, you may find a possible source of issues.
> We didn't have a very good experience with GM, as it used to overpredict an absurd number of proteins.
>
> Xabi

-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com Wed Feb 6 15:33:47 2019
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Thu, 7 Feb 2019 09:33:47 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To:
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID:

SNAP is easy to train, works well on fungal genomes, and is explained in MAKER's wiki:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors

Oh, sorry, I didn't explain myself well. What I was trying to say is that before BUSCO, when we only had CEGMA, we trained Augustus in a different way, because CEGMA didn't produce Augustus gene models automatically. I don't mean that you should use CEGMA. This is what I have in my own documentation about how to train Augustus "the old way":

> AUGUSTUS: the old way
>
> Alternatively, you can train AUGUSTUS in a more 'manual' way, like when we were using CEGMA. The training starts with the output from the second instance of fathom in the SNAP training section.
>
> cd ${MYGENOME_DIR}/maker/snap1
> perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb
>
> zff2augustus_gbk.pl generates a GenBank file from export.dna.
>
> The actual training of AUGUSTUS will be done through the *webAUGUSTUS server*.
>
> Before proceeding, it is recommended to rename the fasta headers, especially if they contain special characters and/or are very long. This is the main reason for failure of jobs submitted to webAUGUSTUS. You can use the simplifyFastaHeaders.pl script for that:
>
> perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map
>
> perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map
>
> nameStem is the base name for naming each of the sequences in the multifasta files. Use something appropriate: *contig* and *rna* for the assembly and RNA-seq files, respectively, or something based on that. For example, 'pgcontig' and 'pgrna' for contigs and RNA from *Puccinia graminis*.
> *DO NOT* give the same nameStem to both fasta files, and don't use any special characters.
>
> We need the following files (minimum):
>
> - ${MYGENOME}_assembly.fasta as *Genome file*
> - ${MYGENOME}.train1.gb as *Training gene structure file*
>
> If we also have RNA-seq data:
>
> - ${MYGENOME}_assembled_transcripts.fasta as *cDNA file*
>
> Use ${MYGENOME}_v1 as *Species name*. We will need a different species name in the retraining step. Otherwise, when Maker2 is rerun, it will see the same name and will not rerun AUGUSTUS, even though the species profile is different. So ${MYGENOME}_v1 does the job and tracks the version.
>
> Once the job is finished, the *Species parameter archive* (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of your AUGUSTUS installation.

Hope this helps

PS: hit reply all so this is logged in Maker's mailing list in case anybody else experiences similar issues

On Thu, 7 Feb 2019 at 06:36, morgan sobol wrote:

> I have not used SNAP or CEGMA; however, I see that CEGMA was discontinued in 2015.
> Do you think that will be a problem, or is it still worth using the old version?
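The header-renaming step in the documentation excerpt above does the following: every FASTA header is replaced by a short, safe name built from a nameStem plus a counter, and a map back to the original header is kept. The sketch below reproduces that idea in Python for illustration only. It is not the simplifyFastaHeaders.pl script that ships with AUGUSTUS, and that script's exact naming and map-file format may differ.

```python
import re
from typing import Iterable, List, Tuple

# Only plain letters/digits, per the advice not to use special characters in nameStem.
_SAFE_STEM = re.compile(r"^[A-Za-z0-9]+$")


def simplify_headers(lines: Iterable[str], name_stem: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Rename every FASTA header to <name_stem><n> (n = 1, 2, ...).

    Returns (renamed_lines, mapping), where mapping holds (new_name, original_header)
    pairs so annotations can be translated back to the original contig names.
    """
    if not _SAFE_STEM.match(name_stem):
        raise ValueError("nameStem should contain only letters and digits")
    out: List[str] = []
    mapping: List[Tuple[str, str]] = []
    n = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            n += 1
            new_name = f"{name_stem}{n}"
            mapping.append((new_name, line[1:]))
            out.append(">" + new_name)
        else:
            out.append(line)  # sequence lines pass through unchanged
    return out, mapping
```

The mapping can be written out as a two-column, tab-separated file (new name, original header). The assembly and the transcripts should each get their own distinct stem (e.g. `pgcontig` and `pgrna`, as in the excerpt) so that names never collide.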
> > > ------------------------------ > *From:* Xabier V?zquez-Campos > *Sent:* Tuesday, February 5, 2019 4:42 PM > *To:* morgan sobol; Maker Mailing List > *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions > > Don't you use SNAP? It usually produces quite decent results. And easier > to train than any of the other predictors > > In any case, the Augustus gene model is way off in both cases > GM doesn't seem bad if your fungus has a rather usual genome... in the > first. For the second, it looks bad > > I'm not too familiar with the reannotation but I'd rather create the gene > models from scratch rather than reuse the ones from the Illumina-only > genomes. > Note that assemblies with long-reads, have a higher proportion of > repetitive elements that need masking and RepeatMasker only may not be > enough. In theory, this shouldn't affect Augustus model if trained through > BUSCO as it uses defined conserved markers to create the gene model, but > I'm not so sure about GM. > > If you trained Augustus with BUSCO, and this is the result, I'd discard > the gene model and train it again by the "traditional way", i.e. as it used > to be when we only had CEGMA. I had good results just by changing the > training method. > > Hope it helps, > Xabi > > > > > On Wed, 6 Feb 2019 at 02:19, morgan sobol wrote: > > Thank you, Xabi for the response. > The number of proteins from each source is greatly lower than before. > Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and > maker respectively. > The more recent numbers are 25, 857, 4418 respectively. > > So do you think maybe this hints that something is wrong from genemark? 
> > Morgan > > > ------------------------------ > *From:* Xabier Vázquez-Campos > *Sent:* Sunday, February 3, 2019 4:43 PM > *To:* morgan sobol > *Cc:* maker-devel at yandell-lab.org > *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions > > Hi Morgan, > > We had a similar issue with AUGUSTUS underpredicting when using a > BUSCO-derived gene model > https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ > > Also, check the number of proteins from each individual predictor. If the > numbers from one of them are off, that may point to the source of the problem. > We didn't have a very good experience with GeneMark (GM), as it used to overpredict > an absurd number of proteins. > > Xabi > > -- > > Xabier Vázquez-Campos, *PhD* > > *Research Associate* > > NSW Systems Biology Initiative > > School of Biotechnology and Biomolecular Sciences > > The University of New South Wales > > Sydney NSW 2052 AUSTRALIA
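Xabi's suggestion to compare counts per predictor can be scripted. The following is a minimal Python sketch, not part of MAKER, assuming that in MAKER's GFF3 output column 2 (the source) names the program behind each feature, and that top-level features (final `gene` models, and `match` features from individual predictors or evidence alignments) carry no `Parent=` attribute. The exact source labels depend on the run, so check them against your own `*.all.gff`; the function name `count_top_level` is hypothetical.

```python
import sys
from collections import Counter

def count_top_level(gff_lines):
    """Count parent-less GFF3 features by (source, type).

    Parent-less features are final gene models or match features produced by
    individual predictors / evidence aligners; their children (mRNA, exon,
    match_part, ...) are skipped so each gene or match is counted once.
    """
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequences may be embedded after this directive
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and other directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows
        source, ftype, attrs = cols[1], cols[2], cols[8]
        if not any(a.strip().startswith("Parent=") for a in attrs.split(";")):
            counts[(source, ftype)] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as gff:
        for (source, ftype), n in sorted(count_top_level(gff).items()):
            print(f"{source}\t{ftype}\t{n}")
```

Running it on the `.all.gff` of each genome and lining the rows up side by side should make it easier to see whether one predictor's output shrinks disproportionately in the run that yields ~4,400 genes.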
From liorglic at mail.tau.ac.il Mon Feb 11 07:04:16 2019 From: liorglic at mail.tau.ac.il (Lior Glick) Date: Mon, 11 Feb 2019 16:04:16 +0200 Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in maker_exe.ctl Message-ID: Dear MAKER users, I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the software starts. Typing 'which RepeatMasker' shows the correct local path. I also use this path as the value of the maker_exe.ctl parameter 'RepeatMasker'. In an effort to generalize my working environment, I am trying to use a conda env that is capable of running MAKER. This env comes with RepeatMasker as well. Once I activate this env, I can still run RepeatMasker, but it points to a different path. When I run MAKER within this env, it fails right away with the error message "ERROR: Could not determine if RepBase is installed". Running the same configuration files locally (i.e. outside the conda env) results in a successful run. This leads me to think that MAKER is not actually using the path indicated in the maker_exe.ctl file, and instead looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions on how to overcome this issue? Thanks and best regards, Lior From liorglic at mail.tau.ac.il Mon Feb 11 07:12:25 2019 From: liorglic at mail.tau.ac.il (Lior Glick) Date: Mon, 11 Feb 2019 16:12:25 +0200 Subject: [maker-devel] Unknown (X) amino acids in predicted proteins Message-ID: Dear MAKER users, After completing a MAKER run, I looked at the protein fasta files that MAKER outputs and noticed that a small fraction of the sequences include X characters, indicating unknown amino acids. I was wondering how such sequences arise: why would a prediction contain unknown amino acids? Is this an indication of low-quality predictions?
Is there any documentation regarding the procedure that generates the protein sequences? Thanks a lot, Lior From kapeelc at gmail.com Thu Feb 7 12:43:47 2019 From: kapeelc at gmail.com (Kapeel Chougule) Date: Thu, 7 Feb 2019 14:43:47 -0500 Subject: [maker-devel] MAKER v3 Fgenesh ERROR Message-ID: Hi Carson, I have been getting this error with the fgenesh tool within MAKER. It runs OK on most of the assembly contigs but seems to fail on one contig (or part of a contig) with the error below: Widget::fgenesh: /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Widget/fgenesh/fgenesh_wrap /mnt/grid/ware/hpc_norepl/data/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/fgenesh /sonas-hs/ware/hpc_norepl/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/Zeamays.mpar.dat.new /tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh.fasta -exon_table:/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.xdef.fgenesh > /tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh #-------------------------------# ...processing 9 of 24 ...processing 8 of 28 ...processing 10 of 24 ...processing 9 of 28 ...processing 11 of 24 ...processing 10 of 28 ...processing 12 of 24 ...processing 11 of 28 deleted:0 genes ERROR: FgenesH failed --> rank=14, hostname=bnbcompute50 ERROR: Failed while annotating transcripts ERROR: Chunk failed at level:1, tier_type:4 FAILED CONTIG:Super-Scaffold_14.2_contig2 I updated the Perl module fgenesh.pm as suggested in previous threads. Attached are the maker_opts.ctl and STDERR log files. Thanks, Kapeel -- *Kapeel Chougule Computational Scientist Developer II* *One Bungtown Road Cold Spring Harbor, NY 11724 http://www.warelab.org/ *
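When a single contig fails, as here, it can help to pull every `FAILED CONTIG:` and `ERROR:` line out of a large STDERR log (Kapeel's is about 10 MB) before digging in. This is a minimal Python sketch, assuming MAKER writes these markers on their own lines as in the excerpt in this message; `scan_log` is a hypothetical helper, not a MAKER tool.

```python
import re
import sys

# Captures the contig ID MAKER prints after a failure, e.g. "FAILED CONTIG:ctg1".
FAILED_RE = re.compile(r"FAILED CONTIG:(\S+)")

def scan_log(lines):
    """Return (failed_contigs, error_lines) from an iterable of log lines.

    Failed contig IDs are de-duplicated but kept in order of first appearance.
    """
    failed, seen, errors = [], set(), []
    for line in lines:
        text = line.strip()
        match = FAILED_RE.search(text)
        if match:
            contig = match.group(1)
            if contig not in seen:
                seen.add(contig)
                failed.append(contig)
        elif text.startswith("ERROR:"):
            errors.append(text)
    return failed, errors

if __name__ == "__main__" and len(sys.argv) > 1:
    # Iterate over the file line by line so large logs are not loaded at once.
    with open(sys.argv[1], errors="replace") as log:
        failed, errors = scan_log(log)
    print("Failed contigs:", ", ".join(failed) or "none")
    for err in errors:
        print(err)
```

The reported IDs could then be extracted from the assembly to try reproducing the failure on that sequence alone.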
-------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 5421 bytes Desc: not available -------------- next part -------------- A non-text attachment was scrubbed... Name: stderr.log Type: application/octet-stream Size: 10012918 bytes Desc: not available From fatih.sarigoel at durham.ac.uk Wed Feb 13 05:20:40 2019 From: fatih.sarigoel at durham.ac.uk (SARIGOEL, FATIH) Date: Wed, 13 Feb 2019 12:20:40 +0000 Subject: [maker-devel] Does Conda Maker actually work? Message-ID: Greetings, I noticed that your website never mentions a conda installation, so I am curious whether the conda version is actually supposed to work; for me it didn't. I created a new conda environment and installed MAKER (I tried both installation options). When I run the example files, I get this error: "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 A problem was encountered while attempting to compile and install your Inline C code. The command that failed was: "make > out.make 2>&1" with error code 2" My conda environment is here: /fast_new/work/users/fsarigo_m/miniconda3 I don't understand why the program is looking in /home/conda, which does not exist. The output also begins with a "Possible precedence issue" warning. Thanks for your help in advance! Fatih +++++ Here is the full log up to the end of the contig: (MakerX) [fsarigo_m at med0223 MAKER]$ maker Possible precedence issue with control flow operator at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845. STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at: /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore To access files for individual sequences use the datastore index: /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log STATUS: Now running MAKER... examining contents of the fasta file and run log --Next Contig-- Processing run.log file... #--------------------------------------------------------------------- Now starting the contig!! SeqID: contig-dpp-500-500 Length: 32156 #--------------------------------------------------------------------- Running Mkbootstrap for IndexedBase_14e0 () chmod 644 "IndexedBase_14e0.bs" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644 "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" IndexedBase_14e0.xs > IndexedBase_14e0.xsc mv IndexedBase_14e0.xsc IndexedBase_14e0.c /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot 
"-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" IndexedBase_14e0.c /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 A problem was encountered while attempting to compile and install your Inline C code. The command that failed was: "make > out.make 2>&1" with error code 2 The build directory was: /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0 To debug the problem, cd to the build directory, and inspect the output files. Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin' at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275. --> rank=NA, hostname=med0223 ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869. --> rank=NA, hostname=med0223 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 38. 
Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 306 Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, ARRAY(0x564b40673ad0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 426 Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm line 95 FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 478 Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", HASH(0x564b40673c80), 0, 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 341 Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 357 Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm line 287 Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683 Running Mkbootstrap for IndexedBase_14e0 () chmod 644 "IndexedBase_14e0.bs" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644 "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" IndexedBase_14e0.xs > IndexedBase_14e0.xsc mv IndexedBase_14e0.xsc IndexedBase_14e0.c 
/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" IndexedBase_14e0.c /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 A problem was encountered while attempting to compile and install your Inline C code. The command that failed was: "make > out.make 2>&1" with error code 2 The build directory was: /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0 To debug the problem, cd to the build directory, and inspect the output files. Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin' at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275. --> rank=NA, hostname=med0223 ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869. 
--> rank=NA, hostname=med0223 --> rank=NA, hostname=med0223 --> rank=NA, hostname=med0223 ERROR: Failed while examining contents of the fasta file and run log ERROR: Chunk failed at level:0, tier_type:0 FAILED CONTIG:contig-dpp-500-500 examining contents of the fasta file and run log From carsonhh at gmail.com Wed Feb 13 07:51:44 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Feb 2019 07:51:44 -0700 Subject: [maker-devel] Does Conda Maker actually work? In-Reply-To: References: Message-ID: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com> The conda recipe was produced by another group. I do not currently recommend using it because I have seen a number of issues pop up on the list from people attempting to install MAKER via conda. I know there is at least one issue with the conda RepeatMasker install, and there may be others. The specific failure you show is from Bio::DB::IndexedBase trying to compile an Inline::C function. It may be that conda is installing an older BioPerl where this issue still exists --> https://github.com/bioperl/bioperl-live/issues/215 Or it may be a new related issue (I've seen a handful of other examples that seem to relate back to Bio::DB::IndexedBase) --> https://github.com/bioperl/bioperl-live/issues/305 Try installing MAKER without conda (make sure to remove any components installed through conda first to avoid conflicts). --Carson From carsonhh at gmail.com Wed Feb 13 10:14:13 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Feb 2019 10:14:13 -0700 Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in maker_exe.ctl In-Reply-To: References: Message-ID: <6AFF11A9-9860-4047-A337-4B974C6C0F30@gmail.com> The conda installation of RepeatMasker runs oddly. It does not appear to run the ./configure script during setup, and as a result it is missing files inside the repeat library.
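Lior's hypothesis can be checked directly by comparing the `RepeatMasker=` entry in `maker_exe.ctl` with whatever `RepeatMasker` resolves to on the active `PATH`. The Python sketch below does only that comparison. It assumes the control file uses the `key=value #comment` layout seen in the `maker_opts` listings in this thread, and it cannot tell which copy MAKER, or RepeatMasker's own helper scripts, actually end up calling. `parse_ctl` and `compare_exe` are hypothetical helpers.

```python
import os
import shutil
import sys

def parse_ctl(lines):
    """Parse MAKER-style 'key=value #comment' control-file lines into a dict.

    Assumes values never contain '#', since everything after it is dropped.
    """
    opts = {}
    for line in lines:
        body = line.split("#", 1)[0].strip()
        if "=" not in body:
            continue  # blank lines and section headers such as '#-----...'
        key, value = body.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts

def compare_exe(opts, name="RepeatMasker"):
    """Return (configured_path, path_on_PATH, same_file) for one executable."""
    configured = opts.get(name, "")
    on_path = shutil.which(name)  # None if nothing by that name is on PATH
    same = bool(configured and on_path) and (
        os.path.realpath(configured) == os.path.realpath(on_path)
    )
    return configured, on_path, same

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as ctl:  # e.g. maker_exe.ctl
        configured, on_path, same = compare_exe(parse_ctl(ctl))
    print(f"maker_exe.ctl : {configured or '(empty)'}")
    print(f"on PATH       : {on_path or '(not found)'}")
    print("same file" if same else "different (or missing)")
```

If the two paths differ inside the conda env, pointing `maker_exe.ctl` at a RepeatMasker installation whose `./configure` step has actually been run is one thing worth trying.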
--Carson From carsonhh at gmail.com Wed Feb 13 10:18:44 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Feb 2019 10:18:44 -0700 Subject: [maker-devel] Unknown (X) amino acids in predicted proteins In-Reply-To: References: Message-ID: <1472E55C-62CB-4A73-B45D-C4BEF3E014B7@gmail.com> If you use GFF3 as input, or use est2genome or protein2genome in your final run, you may have 'N' characters from the assembly as part of your CDS ('N' is the DNA ambiguity code, and it results in an 'X' when translated, 'X' being the ambiguity code for amino acids).
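The N-to-X relationship Carson describes can be reproduced with a toy translator. This is a minimal Python sketch using the standard genetic code, not MAKER's own translation code: any codon containing a non-ACGT character, such as an assembly-gap `N`, is mapped to the amino-acid ambiguity code `X`. It deliberately simplifies by mapping every ambiguous codon to `X`, even fourfold-degenerate ones like `CTN` that some translators resolve to `L`.

```python
# Standard genetic code, with codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

def translate(cds):
    """Translate a CDS; any codon with a non-ACGT base (e.g. N) becomes X."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))

print(translate("ATGGCCNNNAAATAA"))  # prints MAXK*
```

A single stretch of `N`s spanning a codon is enough to put an `X` in the predicted protein, which is why these characters point back to gaps in the assembly rather than to the predictor itself.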
Augustus goes through some internal gymnastics to splice out exons containing N's entirely so that this never happens, but it may not always succeed. Either way, X's are an indication of genome assembly issues. --Carson From carsonhh at gmail.com Wed Feb 13 10:24:01 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Feb 2019 10:24:01 -0700 Subject: [maker-devel] Re-annotation, fewer gene predictions In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> Message-ID: One thing you can also do is use the old models as protein= input and run the protein2genome option, just to see where things align. You may find that not all old models are recoverable in the new assembly. Fewer genes in the new assembly may mean that redundant/duplicate contigs were collapsed and split contigs were joined, resulting in multiple gene fragments becoming a single unified model. Always review contigs in a browser to see how models and evidence correlate. --Carson > On Feb 3, 2019, at 12:13 PM, morgan sobol wrote: > > Hello, > > I previously used Maker to annotate two different fungal genomes that were created using Illumina sequences only. For these genomes, I had over 11,000 genes predicted.
> I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very familiar in length and completed number of orthologs to the Illumina only assembly, but had much fewer, but longer contigs. > > I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I am continuously getting ~4,400 predicted genes. > > I am asking for help as to how I can determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran them the same? > > Thanks, > Morgan S. > > maker_opts.log > #-----Genome (these are always required) > genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$ > organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic > > #-----Re-annotation Using MAKER Derived GFF3 > maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$ > est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no > altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no > protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no > rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no > model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no > pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no > other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no > > #-----EST Evidence (for best results provide a file for at least one) > est= #set of ESTs or assembled mRNA-seq in fasta format > altest= #EST/cDNA sequence file in fasta format from an alternate organism > est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file > altest_gff= #aligned ESTs from a closly relate species in GFF3 format > > #-----Protein Homology Evidence (for best results provide a file 
for at least one) > protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta #protein sequence file in fasta format (i.e. from mutiple oransisms) > protein_gff= #aligned protein homology evidence from an external GFF3 file > > #-----Repeat Masking (leave values blank to skip repeat masking) > model_org= #select a model organism for RepBase masking in RepeatMasker > rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker > repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner > rm_gff= #pre-identified repeat elements from an external GFF3 file > prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no > softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering) > > #-----Gene Prediction > snaphmm= #SNAP HMM file > gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file > augustus_species=1368D_uni #Augustus gene prediction species model > fgenesh_par_file= #FGENESH parameter file > pred_gff= #ab-initio predictions from an external GFF3 file > model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) > est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no > protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no > trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no > snoscan_rrna= #rRNA file to have Snoscan find snoRNAs > unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no > > #-----Other Annotation Feature Types (features MAKER doesn't recognize) > other_gff= #extra features to pass-through to final MAKER generated GFF3 file > > #-----External Application Behavior Options > alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases > cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) > > #-----MAKER Behavior Options > max_dna_len=100000 
#length for dividing up contigs into chunks (increases/decreases memory usage) > min_contig=1 #skip genome contigs below this length (under 10kb are often useless) > > pred_flank=200 #flank for extending evidence clusters sent to gene predictors > pred_stats=1 #report AED and QI statistics for all predictions as well as models > AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) > min_protein=0 #require at least this many amino acids in predicted proteins > alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no > always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no > map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no > keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) > > split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments) > single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no > single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' > correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes > > tries=2 #number of times to try a contig if there is a failure for some reason > clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no > clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no > TMP= #specify a directory other than the system default temporary directory for temporary files > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From liorglck at gmail.com Sun Feb 17 11:50:10 2019 From: liorglck at gmail.com (Lior Glick) Date: Sun, 17 Feb 2019 20:50:10 +0200 Subject: [maker-devel] Does Conda Maker actually work? In-Reply-To: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com> References: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com> Message-ID: That's good to know. Any plans on creating a stable conda package in the future? It'd be a very nice feature, especially since MAKER is not always straightforward to install. On Wed, Feb 13, 2019 at 5:22 PM Carson Holt wrote: > The conda recipe was produced by another group. I do not currently > recommend using it because I have seen a number of issues pop up on the > list based on people attempting to install MAKER via conda. I know there > is at least an issue with the conda RepeatMasker install, and there may be > others. The specific failure you show is from Bio::DB::IndexedBase trying > to compile an Inline::C function. It may be that conda is installing an > older BioPerl where this issue still exists -> > https://github.com/bioperl/bioperl-live/issues/215 > > Or it may be that there is a new related issue (I've seen a handful of > other examples that seem to relate back to Bio::DB::IndexedBase) -> > https://github.com/bioperl/bioperl-live/issues/305 > > Try installing MAKER without conda (make sure to remove any components > that are in conda first to avoid conflicts). > > --Carson > > > On Feb 13, 2019, at 5:20 AM, SARIGOEL, FATIH > wrote: > > Greetings, > I notice that you never mention conda installation on your website, so I > am curious if the conda version is actually supposed to be working fine or > not; as for me it didn't. > I created a new conda environment and installed Maker (tried this with > both installation options) > When I run the example files, I get this error: > > "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 > A problem was encountered while attempting to compile and install your > Inline > C code. 
The command that failed was: > "make > out.make 2>&1" with error code 2" > > My conda environment is here > /fast_new/work/users/fsarigo_m/miniconda3 > I don't understand why the program is trying to look here: > /home/conda > which does not exist > > Also begins with a "possible precedence issue" > > Thanks for your help in advance! > Fatih > > +++++ > > Here is the full log until the end of the contig: > > (MakerX) [fsarigo_m at med0223 MAKER]$ maker > Possible precedence issue with control flow operator at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm > line 845. > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input... > A data structure will be created for you at: > > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore > > To access files for individual sequences use the datastore index: > > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log > > STATUS: Now running MAKER... > examining contents of the fasta file and run log > > > > --Next Contig-- > > Processing run.log file... > #--------------------------------------------------------------------- > Now starting the contig!! 
> SeqID: contig-dpp-500-500 > Length: 32156 > #--------------------------------------------------------------------- > > > Running Mkbootstrap for IndexedBase_14e0 () > chmod 644 "IndexedBase_14e0.bs" > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" > -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs > blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644 > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" > -typemap > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" > IndexedBase_14e0.xs > IndexedBase_14e0.xsc > mv IndexedBase_14e0.xsc IndexedBase_14e0.c > /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc > -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" > -D_REENTRANT -D_GNU_SOURCE > --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot > -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong > -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 > -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC > --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot > "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" > IndexedBase_14e0.c > /bin/sh: > /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: > No such file or directory > make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 > > A problem was encountered while attempting to compile and install your > Inline > C code. 
The command that failed was: > "make > out.make 2>&1" with error code 2 > > The build directory was: > > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0 > > To debug the problem, cd to the build directory, and inspect the output > files. > > Environment PATH = > '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin' > at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm > line 275. > --> rank=NA, hostname=med0223 > ...propagated at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm > line 869. > --> rank=NA, hostname=med0223 > at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm > line 38. > Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm > line 306 > Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for > IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, > ARRAY(0x564b40673ad0)) called at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm > line 426 > Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm > line 95 > FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm > line 478 > Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", > HASH(0x564b40673c80), 0, 0) called at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm > line 341 > Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called > at > 
/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm > line 357 > Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) > called at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm > line 287 > Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) > called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683 > Running Mkbootstrap for IndexedBase_14e0 () > chmod 644 "IndexedBase_14e0.bs" > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" > -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs > blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644 > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" > -typemap > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" > IndexedBase_14e0.xs > IndexedBase_14e0.xsc > mv IndexedBase_14e0.xsc IndexedBase_14e0.c > /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc > -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" > -D_REENTRANT -D_GNU_SOURCE > --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot > -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong > -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 > -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC > --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot > "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" > IndexedBase_14e0.c > /bin/sh: > /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: > No such file or directory > make: *** [Makefile:330: 
IndexedBase_14e0.o] Error 127 > > A problem was encountered while attempting to compile and install your > Inline > C code. The command that failed was: > "make > out.make 2>&1" with error code 2 > > The build directory was: > > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0 > > To debug the problem, cd to the build directory, and inspect the output > files. > > Environment PATH = > '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin' > at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm > line 275. > --> rank=NA, hostname=med0223 > ...propagated at > /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm > line 869. > --> rank=NA, hostname=med0223 > --> rank=NA, hostname=med0223 > --> rank=NA, hostname=med0223 > ERROR: Failed while examining contents of the fasta file and run log > ERROR: Chunk failed at level:0, tier_type:0 > FAILED CONTIG:contig-dpp-500-500 > > examining contents of the fasta file and run log > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From morgan_starr_s at live.com Mon Feb 18 02:08:56 2019 From: morgan_starr_s at live.com (morgan sobol) Date: Mon, 18 Feb 2019 09:08:56 +0000 Subject: [maker-devel] Re-annotation, fewer gene predictions In-Reply-To: References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> , Message-ID: Thank you, Xabi and Carson. With your help, I was able to improve the annotation with a more appropriate number of predictions. Best, Morgan ________________________________ From: Xabier Vázquez-Campos Sent: Wednesday, February 6, 2019 11:33 PM To: morgan sobol; Maker Mailing List Subject: Re: [maker-devel] Re-annotation, fewer gene predictions SNAP is easy to train, works well in fungal genomes and is explained in Maker's wiki: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors Oh, sorry, I didn't explain myself well. What I was trying to say is that before BUSCO, when we only had CEGMA, we would proceed in a different way to train Augustus, as CEGMA wouldn't produce Augustus gene models automatically. I don't mean that you should use CEGMA. This is what I have in my own documentation about how to train Augustus "the old way":

AUGUSTUS: the old way

Alternatively, you can train AUGUSTUS in a more "manual" way, like when we were using CEGMA. The training starts with the output from the second instance of fathom in the SNAP training section:

    cd ${MYGENOME_DIR}/maker/snap1
    perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb

zff2augustus_gbk.pl generates a GenBank file from export.dna. The actual training of AUGUSTUS will be through the webAUGUSTUS server. Before proceeding, it is recommended to rename the FASTA headers, especially if they contain special characters and/or are very long. This is the main reason for failure of jobs submitted to webAUGUSTUS.
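To illustrate what the header renaming amounts to, here is a minimal Python sketch. It is not the simplifyFastaHeaders.pl code: the function name simplify_fasta_headers and the "stem plus running number" naming scheme are assumptions for illustration. Every header is replaced by a short, special-character-free name, and the original headers are kept in a mapping so results can be translated back after training.

```python
def simplify_fasta_headers(fasta_text, name_stem):
    """Rename every FASTA record to <name_stem><n>.

    Returns (renamed_fasta, mapping), where mapping is a list of
    (new_name, original_header) pairs, similar in spirit to a .map file.
    """
    out_lines = []
    mapping = []
    count = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            count += 1
            new_name = f"{name_stem}{count}"
            # Keep the full original header (without '>') for back-translation.
            mapping.append((new_name, line[1:].strip()))
            out_lines.append(f">{new_name}")
        else:
            # Sequence lines are copied unchanged.
            out_lines.append(line)
    return "\n".join(out_lines) + "\n", mapping


if __name__ == "__main__":
    example = ">scaffold_1 len=5000 |cov=31.2\nACGTNNACGT\n>tig00000002|arrow|pilon\nTTGACA\n"
    renamed, header_map = simplify_fasta_headers(example, "pgcontig")
    print(renamed, end="")
    for new_name, old in header_map:
        print(f"{new_name}\t{old}")
```

Keeping the mapping is the important design choice: without it, the trained model and any predictions on the renamed contigs cannot easily be related back to the original assembly.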
You can use the simplifyFastaHeaders.pl script for that:

    perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map
    perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map

nameStem is the base name used to name each of the sequences in the multi-FASTA files; pick something appropriate. Use contig and rna for the assembly and RNA-seq files, respectively, or something based on that, for example "pgcontig" and "pgrna" for contigs and RNA from Puccinia graminis. DO NOT give the same nameStem to both FASTA files, and don't use any special characters.

We need the following files (minimum):
* ${MYGENOME}_assembly.fasta as Genome file
* ${MYGENOME}.train1.gb as Training gene structure file
If we also have RNA-seq data:
* ${MYGENOME}_assembled_transcripts.fasta as cDNA file

Use ${MYGENOME}_v1 as Species name. We will need a different species name in the retraining step; otherwise, when Maker2 is rerun, it will see the same name and will not rerun AUGUSTUS, even though the species profile is different. So ${MYGENOME}_v1 does the job and tracks the version. Once the job is finished, the Species parameter archive (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of your AUGUSTUS installation.

Hope this helps

PS: hit reply all so this is logged in Maker's mail list in case anybody else experiences similar issues On Thu, 7 Feb 2019 at 06:36, morgan sobol > wrote: I have not used SNAP or CEGMA; however, I see that CEGMA was discontinued in 2015. Do you think that will be a problem, or is it still worth using the old version? ________________________________ From: Xabier Vázquez-Campos > Sent: Tuesday, February 5, 2019 4:42 PM To: morgan sobol; Maker Mailing List Subject: Re: [maker-devel] Re-annotation, fewer gene predictions Don't you use SNAP?
It usually produces quite decent results, and it is easier to train than any of the other predictors. In any case, the Augustus gene model is way off in both cases. In the first, GM doesn't seem bad if your fungus has a rather usual genome... For the second, it looks bad. I'm not too familiar with re-annotation, but I'd rather create the gene models from scratch than reuse the ones from the Illumina-only genomes. Note that long-read assemblies have a higher proportion of repetitive elements that need masking, and RepeatMasker alone may not be enough. In theory, this shouldn't affect the Augustus model if it was trained through BUSCO, as BUSCO uses defined conserved markers to create the gene model, but I'm not so sure about GM. If you trained Augustus with BUSCO and this is the result, I'd discard the gene model and train it again the "traditional way", i.e. as it used to be done when we only had CEGMA. I had good results just by changing the training method. Hope it helps, Xabi On Wed, 6 Feb 2019 at 02:19, morgan sobol > wrote: Thank you, Xabi, for the response. The number of proteins from each source is much lower than before. Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and maker, respectively. The more recent numbers are 25, 857, and 4,418, respectively. So do you think this hints that something is wrong with genemark? Morgan ________________________________ From: Xabier Vázquez-Campos > Sent: Sunday, February 3, 2019 4:43 PM To: morgan sobol Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Re-annotation, fewer gene predictions Hi Morgan, We had a similar issue with AUGUSTUS underpredicting when using a BUSCO-derived gene model https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ Also, check the number of proteins by each individual predictor. If the numbers from one of them are off, you may find a possible source of issues.
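The per-predictor comparison suggested here can also be done on the merged GFF3 rather than on the protein FASTA files. A minimal Python sketch, assuming a GFF3 such as one produced by gff3_merge (possibly ending in a ##FASTA section). It simply tallies features by source and type (columns 2 and 3). In MAKER output the final models typically carry the source maker, while ab initio predictions usually appear as match features under sources such as augustus_masked; the exact names in your file may differ, so inspect the tally rather than hard-coding them. The function name tally_gff3_features is hypothetical.

```python
import sys
from collections import Counter


def tally_gff3_features(lines):
    """Count GFF3 feature lines by (source, type), i.e. columns 2 and 3."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; there are no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a valid 9-column feature line
        counts[(fields[1], fields[2])] += 1
    return counts


if __name__ == "__main__":
    # Usage: python tally_gff3.py genome.all.gff   (file name is an example)
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            counts = tally_gff3_features(handle)
        for (source, ftype), n in sorted(counts.items()):
            print(f"{source}\t{ftype}\t{n}")
```

Running it on the all.gff of each assembly and comparing the two tallies side by side should show whether the drop in gene count comes from one predictor in particular.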
We didn't have a very good experience with GM, as it used to overpredict an absurd number of proteins. Xabi On Mon, 4 Feb 2019 at 06:15, morgan sobol > wrote: Hello, I previously used Maker to annotate two different fungal genomes that were created using Illumina sequences only. For these genomes, I had over 11,000 genes predicted. I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very familiar in length and completed number of orthologs to the Illumina only assembly, but had much fewer, but longer contigs. I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I am continuously getting ~4,400 predicted genes. I am asking for help as to how I can determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran them the same? Thanks, Morgan S. maker_opts.log #-----Genome (these are always required) genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$ organism_type=eukaryotic #eukaryotic or prokaryotic. 
Default is eukaryotic #-----Re-annotation Using MAKER Derived GFF3 maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$ est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no #-----EST Evidence (for best results provide a file for at least one) est= #set of ESTs or assembled mRNA-seq in fasta format altest= #EST/cDNA sequence file in fasta format from an alternate organism est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file altest_gff= #aligned ESTs from a closly relate species in GFF3 format #-----Protein Homology Evidence (for best results provide a file for at least one) protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta #protein sequence file in fasta format (i.e. from mutiple oransisms) protein_gff= #aligned protein homology evidence from an external GFF3 file #-----Repeat Masking (leave values blank to skip repeat masking) model_org= #select a model organism for RepBase masking in RepeatMasker rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner rm_gff= #pre-identified repeat elements from an external GFF3 file prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. 
seg and dust filtering) #-----Gene Prediction snaphmm= #SNAP HMM file gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file augustus_species=1368D_uni #Augustus gene prediction species model fgenesh_par_file= #FGENESH parameter file pred_gff= #ab-initio predictions from an external GFF3 file model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no snoscan_rrna= #rRNA file to have Snoscan find snoRNAs unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no #-----Other Annotation Feature Types (features MAKER doesn't recognize) other_gff= #extra features to pass-through to final MAKER generated GFF3 file #-----External Application Behavior Options alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) #-----MAKER Behavior Options max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) min_contig=1 #skip genome contigs below this length (under 10kb are often useless) pred_flank=200 #flank for extending evidence clusters sent to gene predictors pred_stats=1 #report AED and QI statistics for all predictions as well as models AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) min_protein=0 #require at least this many amino acids in predicted proteins alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) split_hit=10000 #length 
for the splitting of hits (expected max intron size for evidence alignments) single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes tries=2 #number of times to try a contig if there is a failure for some reason clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no TMP= #specify a directory other than the system default temporary directory for temporary files _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- Xabier Vázquez-Campos, PhD Research Associate NSW Systems Biology Initiative School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA -- Xabier Vázquez-Campos, PhD Research Associate NSW Systems Biology Initiative School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA -- Xabier Vázquez-Campos, PhD Research Associate NSW Systems Biology Initiative School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA -------------- next part -------------- An HTML attachment was scrubbed... URL: From anthony.bretaudeau at inria.fr Mon Feb 18 02:53:39 2019 From: anthony.bretaudeau at inria.fr (Anthony Bretaudeau) Date: Mon, 18 Feb 2019 10:53:39 +0100 Subject: [maker-devel] Does Conda Maker actually work? In-Reply-To: References: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com> Message-ID: <3aa1eb97-f8bf-dd61-febf-464ad4b1626c@inria.fr> An HTML attachment was scrubbed...
URL: From liorglic at mail.tau.ac.il Sun Feb 24 05:50:49 2019 From: liorglic at mail.tau.ac.il (Lior Glick) Date: Sun, 24 Feb 2019 14:50:49 +0200 Subject: [maker-devel] Profiling MAKER runs Message-ID: Dear MAKER users, I was wondering if any of you has an idea of a way by which I can profile my runs. What I mean is I'd like to know how much time was spent on each step of the analysis - am I spending most of the time masking repeats, blasting transcripts/proteins, running ab-initio predictors etc. Based on this information, I might want to adjust my configuration, e.g. maybe I'm spending a lot of time blasting transcripts, and reducing the number of input transcripts would reduce run time significantly without having a major effect on results quality. As far as I can see, the main run log does not provide such information, and I'm not sure where else to look. Any ideas or directions could be of help. Thanks! Lior -------------- next part -------------- An HTML attachment was scrubbed... URL: From morgan_starr_s at live.com Sun Feb 3 12:13:47 2019 From: morgan_starr_s at live.com (morgan sobol) Date: Sun, 3 Feb 2019 19:13:47 +0000 Subject: [maker-devel] Re-annotation, fewer gene predictions Message-ID: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> Hello, I previously used Maker to annotate two different fungal genomes that were created using Illumina sequences only. For these genomes, I had over 11,000 genes predicted. I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very familiar in length and completed number of orthologs to the Illumina only assembly, but had much fewer, but longer contigs. I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I am continuously getting ~4,400 predicted genes. 
I am asking for help determining why I keep getting fewer predicted genes for only one of my genomes, even though I ran both the same way.

Thanks,
Morgan S.

maker_opts.log
#-----Genome (these are always required)
genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked #genome sequence (fasta file or$
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff #MAKER derive$
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est= #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file
augustus_species=1368D_uni #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=1 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

From xvazquezc at gmail.com Sun Feb 3 15:43:42 2019
From: xvazquezc at gmail.com (Xabier Vázquez-Campos)
Date: Mon, 4 Feb 2019 09:43:42 +1100
Subject: [maker-devel] Re-annotation, fewer gene predictions
In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com>
Message-ID:

Hi Morgan,

We had a similar issue with AUGUSTUS underpredicting when using a BUSCO-derived gene model: https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ

Also, check the number of proteins by each individual predictor.
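The per-predictor check can be scripted. The following Python sketch is an illustration, not a MAKER tool. It assumes the usual MAKER GFF3 layout, where ab initio predictions are stored as "match" rows whose source column names the predictor (for example "augustus_masked" or "genemark") and final models are "mRNA" rows with source "maker". It also skips any trailing ##FASTA section. Source names in your own files may differ, so look at a few lines of the .all.gff first.

```python
import sys
from collections import Counter


def count_by_source(lines, feature_types=("match", "mRNA")):
    """Count GFF3 rows of the given types, keyed by (source, type).

    Assumes MAKER-style GFF3: ab initio predictions as 'match' rows whose
    source column names the predictor, final models as 'mRNA' rows with
    source 'maker'. Stops at an embedded ##FASTA section.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows
        source, ftype = cols[1], cols[2]
        if ftype in feature_types:
            counts[(source, ftype)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python count_by_source.py genome.all.gff
    with open(sys.argv[1]) as handle:
        for (source, ftype), n in sorted(count_by_source(handle).items()):
            print(f"{source}\t{ftype}\t{n}")
```

Running it on the old and new .all.gff files side by side shows which predictor's count collapsed. Counting only parent "match" rows (not "match_part") keeps each prediction from being counted once per exon.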
If the numbers from one of them are off, that may point you to the source of the problem. We didn't have a very good experience with GM (GeneMark), as it used to overpredict an absurd number of proteins.

Xabi

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From keith.decker at bayer.com Mon Feb 4 11:09:35 2019
From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005])
Date: Mon, 4 Feb 2019 18:09:35 +0000
Subject: [maker-devel] MAKER on AWS
Message-ID: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>

I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be.
I found this old post on STARCLUSTER, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html, but my understanding is that STARCLUSTER and its successors (cfncluster and parallel cluster) can be challenging to set up and use.

So my questions are:

1. Has anyone had recent success running MAKER on cfncluster or parallel cluster in AWS?
2. Would it be reasonable to just split up N chromosomes across N ECS instances and collect the results at the end? If so, does it make sense to run each chromosome-level annotation on, for example, an m4.16xlarge instance with 64 cores and 256 GB of RAM? Or is there a maximum number of cores at which the benefits from parallelization saturate?

Thanks, and sorry for the long question.
Keith

This system contains confidential and copyrighted information. Access to the system is limited to users only and only for approved business purposes.
Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company.
Anyone using of viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company's sole discretion there is a business reason to do so.
If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose.

From carsonhh at gmail.com Mon Feb 4 11:31:29 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Feb 2019 11:31:29 -0700
Subject: [maker-devel] MAKER on AWS
In-Reply-To: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com>
Message-ID: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com>

You can try to stand up a cluster inside AWS or, as you said, just start independent instances, each with its own piece of the total dataset. There is a tool called fasta_tool inside MAKER that makes it easy to split up the dataset into equal-sized chunks.
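Carson's fasta_tool is the MAKER-native way to do this split. Its options aren't shown in this thread, so the following is only a standalone Python sketch of the same idea, not fasta_tool itself. It assumes you want N files with roughly equal total sequence length. It uses a greedy heuristic that places the longest sequences first, each into the currently lightest chunk, and it never splits a sequence. The output names (chunk_1.fasta, ...) are hypothetical.

```python
import heapq
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA text lines."""
    header, parts = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif line:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def balance_chunks(records, n_chunks):
    """Greedily assign records to n_chunks bins of similar total length.

    Longest sequences are placed first, each into the currently lightest
    bin (a min-heap keyed by total length). Returns a list of record lists.
    """
    heap = [(0, i) for i in range(n_chunks)]  # (total_length, chunk_index)
    chunks = [[] for _ in range(n_chunks)]
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, idx = heapq.heappop(heap)
        chunks[idx].append((header, seq))
        heapq.heappush(heap, (total + len(seq), idx))
    return chunks


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python split_fasta.py genome.fasta 8
    with open(sys.argv[1]) as handle:
        parts = balance_chunks(list(read_fasta(handle)), int(sys.argv[2]))
    for i, chunk in enumerate(parts, start=1):
        with open(f"chunk_{i}.fasta", "w") as out:
            for header, seq in chunk:
                out.write(f">{header}\n")
                for start in range(0, len(seq), 60):  # 60-column lines
                    out.write(seq[start:start + 60] + "\n")
```

Because whole sequences are kept intact, a single very long chromosome sets a lower bound on the largest chunk. This matches the "one chromosome per instance" plan in Keith's second question.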
Alternatively, CyVerse has set up an interesting MAKER wrapper (WQ-MAKER) that launches multiple cloud instances for MAKER and handles data chunking for you (they've been using XSEDE cloud resources through the NSF): http://ccl.cse.nd.edu/research/papers/maker-service-ic2e2018.pdf

Here is an example of an external project using their setup: http://onsnetwork.org/kubu4/2018/08/07/genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/

--Carson

From liorglck at gmail.com Mon Feb 4 02:00:29 2019
From: liorglck at gmail.com (Lior Glick)
Date: Mon, 4 Feb 2019 11:00:29 +0200
Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in maker_exe.ctl
Message-ID:

Dear MAKER users,

I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the software starts. Typing 'which RepeatMasker' shows the correct local path. I also use this path as the value of the maker_exe.ctl parameter 'RepeatMasker'.

To generalize my working environment, I am trying to use a conda env that is capable of running MAKER. This env comes with RepeatMasker as well. Once I activate this env, I can still run RepeatMasker, but it points to a different path. When I run MAKER within this env, it fails right away with the error message:

ERROR: Could not determine if RepBase is installed

Running the same configuration files locally (i.e. outside the conda env) results in a successful run. This leads me to think that MAKER is not actually using the path indicated in the maker_exe.ctl file, and instead looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions on how to overcome this issue?
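One way to test Lior's hypothesis is to compare the RepeatMasker path set in maker_exe.ctl with the one the current shell resolves, both inside and outside the conda env. The Python sketch below assumes the "key=value #comment" layout seen in MAKER's .ctl files, as in the maker_opts listing earlier in this thread. It only reports the two paths. It does not check RepBase, and this thread does not confirm how MAKER actually chooses which RepeatMasker to run.

```python
import os
import shutil
import sys


def parse_ctl(lines):
    """Parse MAKER-style .ctl lines of the form 'key=value #comment'."""
    settings = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if "=" not in line:
            continue  # section headers and blank lines
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def report(settings, tool="RepeatMasker"):
    """Return (path set in the .ctl, path found on $PATH or None)."""
    return settings.get(tool, ""), shutil.which(tool)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_exe.py maker_exe.ctl  (run inside and outside the env)
    with open(sys.argv[1]) as handle:
        configured, on_path = report(parse_ctl(handle))
    print("maker_exe.ctl:", configured or "(empty)")
    print("$PATH        :", on_path or "(not found)")
    if configured and on_path:
        same = os.path.realpath(configured) == os.path.realpath(on_path)
        print("Same executable" if same else "Paths differ")
```

If the paths differ only inside the env, that is consistent with the $PATH explanation, but it does not prove it.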
Thanks and best regards, Lior -------------- next part -------------- An HTML attachment was scrubbed... URL: From keith.decker at bayer.com Mon Feb 4 11:39:48 2019 From: keith.decker at bayer.com (DECKER, KEITH F [AG/1005]) Date: Mon, 4 Feb 2019 18:39:48 +0000 Subject: [maker-devel] MAKER on AWS In-Reply-To: <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com> References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com> <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com> Message-ID: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com> Thanks, Do you have metrics on how MAKER performs on annotating a single chromosome on a single machine? For example, will I see anything close to 16X speed-up using a 16 core machine, and does performance improvement saturate at a certain number of cores? -Keith From: Carson Holt Date: Monday, February 4, 2019 at 12:33 PM To: "DECKER, KEITH F [AG/1005]" Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER on AWS You can try and stand up a cluster inside AWS, or like you said just start independent instances each with their own piece of the total dataset. There is a tools called fasta_tool inside of maker that makes it easy to split up the dataset into equal sized chunks. Alternatively, CyVerse has set up an interesting MAKER wrapper (WQ-MAKER) that launches multiple cloud instances for MAKER and handles data chunking for you (they?ve been using XSEDE cloud resources through the NSF) ?> http://ccl.cse.nd.edu/research/papers/maker-service-ic2e2018.pdf Here is an example of an external project using their setup ?> http://onsnetwork.org/kubu4/2018/08/07/genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/ ?Carson On Feb 4, 2019, at 11:09 AM, DECKER, KEITH F [AG/1005] > wrote: I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be. 
I found this old post on STARCLUSTER, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html but my understanding is that STARCLUSTER and its successors (cfncluster and parallel cluster) can be challenging to set up and use. So my questions are 1. Has anyone had recent success running MAKER on cfncluster or parallel cluster in AWS? 2. Would it be reasonable to just split up N chromosomes across N ECS instances and collect the results at the end? If so, does it make sense to run each chromosome level annotation on for example an m4.16xlarge instance with 64 cores and 256 GB of RAM? Or is there a maximum number of cores at which the benefits from parallelization saturate? Thanks and sorry for the long question Keith This system contains confidential and copyrighted information. Access to the system is limited to users only and only for approved business purposes. Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company. Anyone using of viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company?s sole discretion there is a business reason to do so. If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose. _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org This system contains confidential and copyrighted information. Access to the system is limited to users only and only for approved business purposes. 
Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company. Anyone using of viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company?s sole discretion there is a business reason to do so. If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Feb 4 12:00:00 2019 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 4 Feb 2019 12:00:00 -0700 Subject: [maker-devel] MAKER on AWS In-Reply-To: <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com> References: <4660790F-38F4-470B-8B8E-9911A0BC36C3@contoso.com> <0934DD0D-9431-4454-A278-87E27D44F984@gmail.com> <1BAD7C53-AFA5-4A4A-B35B-D760B3D4C28D@monsanto.com> Message-ID: I don?t have cloud performance stats, but I do have cluster performance stats you may be able to somewhat correlate (attached). On a cluster we see nearly linear performance gains until ~100 CPU cores, and the plateau doesn?t fully level out until well after 600 cores (we are hitting IO and networking limits for inter-node communication). So if you are only using a single instance, you can essentially consider it the equivalent of a single real machine which would fall well under 100 CPU cores, and performance growth would be expected to be linear on that instance. ?Carson > On Feb 4, 2019, at 11:39 AM, DECKER, KEITH F [AG/1005] wrote: > > Thanks, > Do you have metrics on how MAKER performs on annotating a single chromosome on a single machine? 
For example, will I see anything close to 16X speed-up using a 16 core machine, and does performance improvement saturate at a certain number of cores? > > -Keith > > From: Carson Holt > > Date: Monday, February 4, 2019 at 12:33 PM > To: "DECKER, KEITH F [AG/1005]" > > Cc: "maker-devel at yandell-lab.org " > > Subject: Re: [maker-devel] MAKER on AWS > > You can try and stand up a cluster inside AWS, or like you said just start independent instances each with their own piece of the total dataset. There is a tools called fasta_tool inside of maker that makes it easy to split up the dataset into equal sized chunks. > > Alternatively, CyVerse has set up an interesting MAKER wrapper (WQ-MAKER) that launches multiple cloud instances for MAKER and handles data chunking for you (they?ve been using XSEDE cloud resources through the NSF) ?> > http://ccl.cse.nd.edu/research/papers/maker-service-ic2e2018.pdf > > Here is an example of an external project using their setup ?> http://onsnetwork.org/kubu4/2018/08/07/genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/ > > ?Carson > > > > > > On Feb 4, 2019, at 11:09 AM, DECKER, KEITH F [AG/1005] > wrote: > > I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be. > I found this old post on STARCLUSTER, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html > but my understanding is that STARCLUSTER and its successors (cfncluster and parallel cluster) can be challenging to set up and use. > > So my questions are > > 1. Has anyone had recent success running MAKER on cfncluster or parallel cluster in AWS? > 2. Would it be reasonable to just split up N chromosomes across N ECS instances and collect the results at the end? If so, does it make sense to run each chromosome level annotation on for example an m4.16xlarge instance with 64 cores and 256 GB of RAM? 
Or is there a maximum number of cores at which the benefits from parallelization saturate? > > Thanks and sorry for the long question > Keith > > > > This system contains confidential and copyrighted information. Access to the system is limited to users only and only for approved business purposes. > Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company. > Anyone using of viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company?s sole discretion there is a business reason to do so. > If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose. > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > This system contains confidential and copyrighted information. Access to the system is limited to users only and only for approved business purposes. > Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company. > Anyone using of viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company?s sole discretion there is a business reason to do so. 
> If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-2.pdf Type: application/pdf Size: 41425 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Tue Feb 5 15:42:40 2019 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez=2DCampos?=) Date: Wed, 6 Feb 2019 09:42:40 +1100 Subject: [maker-devel] Re-annotation, fewer gene predictions In-Reply-To: References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> Message-ID: Don't you use SNAP? It usually produces quite decent results. And easier to train than any of the other predictors In any case, the Augustus gene model is way off in both cases GM doesn't seem bad if your fungus has a rather usual genome... in the first. For the second, it looks bad I'm not too familiar with the reannotation but I'd rather create the gene models from scratch rather than reuse the ones from the Illumina-only genomes. Note that assemblies with long-reads, have a higher proportion of repetitive elements that need masking and RepeatMasker only may not be enough. In theory, this shouldn't affect Augustus model if trained through BUSCO as it uses defined conserved markers to create the gene model, but I'm not so sure about GM. If you trained Augustus with BUSCO, and this is the result, I'd discard the gene model and train it again by the "traditional way", i.e. as it used to be when we only had CEGMA. I had good results just by changing the training method. Hope it helps, Xabi On Wed, 6 Feb 2019 at 02:19, morgan sobol wrote: > Thank you, Xabi for the response. > The number of proteins from each source is greatly lower than before. 
> Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and > maker respectively. > The more recent numbers are 25, 857, 4418 respectively. > > So do you think maybe this hints that something is wrong from genemark? > > Morgan > > > ------------------------------ > *From:* Xabier V?zquez-Campos > *Sent:* Sunday, February 3, 2019 4:43 PM > *To:* morgan sobol > *Cc:* maker-devel at yandell-lab.org > *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions > > Hi Morgan, > > We had a similar issue with AUGUSTUS underpredicting when using a > BUSCO-derived gene model > https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ > > Also, check the number of proteins by each individual predictor. If the > numbers from one of them are off, you may find a possible source of issues. > We didn't have a very good experience with GM, as it used to overpredict > an absurd number of proteins. > > Xabi > > On Mon, 4 Feb 2019 at 06:15, morgan sobol wrote: > > Hello, > > I previously used Maker to annotate two different fungal genomes that were > created using Illumina sequences only. For these genomes, I had over 11,000 > genes predicted. > I recently obtained PacBio sequences for the same genomes, so I created > two hybrid assemblies. Both assemblies were very familiar in length and > completed number of orthologs to the Illumina only assembly, but had much > fewer, but longer contigs. > > I re-ran Maker using the settings below. For one of my genomes, I got > around 11,000 genes predicted again, as expected. However, for the other > genome, I am continuously getting ~4,400 predicted genes. > > I am asking for help as to how I can determine why I keep getting fewer > predicted genes for only one of my genomes, even though I ran them the same? > > Thanks, > Morgan S. 
> > maker_opts.log > #-----Genome (these are always required) > genome=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/repeatmasker/unicycler/1368D_unicycler_contigs.fasta.masked > #genome sequence (fasta file or$ > organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic > > #-----Re-annotation Using MAKER Derived GFF3 > maker_gff=/work/Geomicrobiology/msobol/IODP_329_SPG/1368D2H1/maker/1368D_2H1_contigs.fasta.maker.output/1368D_2H1_contigs.fasta.all.gff > #MAKER derive$ > est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no > altest_pass=1 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no > protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no > rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no > model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no > pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no > other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no > > #-----EST Evidence (for best results provide a file for at least one) > est= #set of ESTs or assembled mRNA-seq in fasta format > altest= #EST/cDNA sequence file in fasta format from an alternate organism > est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file > altest_gff= #aligned ESTs from a closly relate species in GFF3 format > > #-----Protein Homology Evidence (for best results provide a file for at > least one) > protein=/work/Geomicrobiology/msobol/IODP_329_SPG/uniprot_sprot.fasta > #protein sequence file in fasta format (i.e. 
from mutiple oransisms) > protein_gff= #aligned protein homology evidence from an external GFF3 file > > #-----Repeat Masking (leave values blank to skip repeat masking) > model_org= #select a model organism for RepBase masking in RepeatMasker > rmlib= #provide an organism specific repeat library in fasta format for > RepeatMasker > repeat_protein= #provide a fasta file of transposable element proteins for > RepeatRunner > rm_gff= #pre-identified repeat elements from an external GFF3 file > prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change > this), 1 = yes, 0 = no > softmask=0 #use soft-masking rather than hard-masking in BLAST (i.e. seg > and dust filtering) > > #-----Gene Prediction > snaphmm= #SNAP HMM file > gmhmm=/home/msobol/genemark/68D_2/output/gmhmm.mod #GeneMark HMM file > augustus_species=1368D_uni #Augustus gene prediction species model > fgenesh_par_file= #FGENESH parameter file > pred_gff= #ab-initio predictions from an external GFF3 file > model_gff= #annotated gene models from an external GFF3 file (annotation > pass-through) > est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no > protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no > trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no > snoscan_rrna= #rRNA file to have Snoscan find snoRNAs > unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = > yes, 0 = no > > #-----Other Annotation Feature Types (features MAKER doesn't recognize) > other_gff= #extra features to pass-through to final MAKER generated GFF3 > file > > #-----External Application Behavior Options > alt_peptide=C #amino acid used to replace non-standard amino acids in > BLAST databases > cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, > leave 1 when using MPI) > > #-----MAKER Behavior Options > max_dna_len=100000 #length for dividing up contigs into chunks > (increases/decreases memory usage) > min_contig=1 #skip genome contigs 
below this length (under 10kb are often > useless) > > pred_flank=200 #flank for extending evidence clusters sent to gene > predictors > pred_stats=1 #report AED and QI statistics for all predictions as well as > models > AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and > 1) > min_protein=0 #require at least this many amino acids in predicted proteins > alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = > yes, 0 = no > always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 > = no > map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = > yes, 0 = no > keep_preds=1 #Concordance threshold to add unsupported gene prediction > (bound by 0 and 1) > > split_hit=10000 #length for the splitting of hits (expected max intron > size for evidence alignments) > single_exon=1 #consider single exon EST evidence when generating > annotations, 1 = yes, 0 = no > single_length=250 #min length required for single exon ESTs if > 'single_exon is enabled' > correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion > genes > > tries=2 #number of times to try a contig if there is a failure for some > reason > clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 > = no > clean_up=0 #removes theVoid directory with individual analysis files, 1 = > yes, 0 = no > TMP= #specify a directory other than the system default temporary > directory for temporary files > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > -- > Xabier V?zquez-Campos, *PhD* > *Research Associate* > NSW Systems Biology Initiative > School of Biotechnology and Biomolecular Sciences > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > -- Xabier V?zquez-Campos, *PhD* *Research Associate* NSW Systems Biology Initiative School of Biotechnology and 
Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA From xvazquezc at gmail.com Wed Feb 6 15:33:47 2019 From: xvazquezc at gmail.com (Xabier Vázquez-Campos) Date: Thu, 7 Feb 2019 09:33:47 +1100 Subject: [maker-devel] Re-annotation, fewer gene predictions In-Reply-To: References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> Message-ID: SNAP is easy to train, works well on fungal genomes, and its training is explained in MAKER's wiki: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors Oh, sorry, I didn't explain myself well. What I was trying to say is that before BUSCO, when we only had CEGMA, we trained Augustus differently, because CEGMA didn't produce Augustus gene models automatically. I don't mean that you should use CEGMA. This is what I have in my own documentation about training Augustus "the old way": > AUGUSTUS: the old way > > Alternatively, you can train AUGUSTUS in a more "manual" way, as we did when we > were using CEGMA. The training starts with the output from the second > instance of fathom in the SNAP training section. > > cd ${MYGENOME_DIR}/maker/snap1 > perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb > > zff2augustus_gbk.pl generates a GenBank file from export.dna. > > The actual training of AUGUSTUS is done through the *webAUGUSTUS server*. > > Before proceeding, it is recommended to rename the FASTA headers, especially > if they contain special characters and/or are very long. This is the > main reason for failure of jobs submitted to webAUGUSTUS.
You can use > the simplifyFastaHeaders.pl script for that: > > perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map > > perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map > > nameStem is the base name used to name each of the sequences in the > multi-FASTA files; choose an appropriate value. Use *contig* > and *rna* for the assembly and RNA-seq files, respectively, or something > based on those. For example, "pgcontig" and "pgrna" for contigs and RNA from *Puccinia graminis*. > *DO NOT* give the same nameStem to both FASTA files, and don't use any > special characters. > > We need the following files (minimum): > > - ${MYGENOME}_assembly.fasta as *Genome file* > - ${MYGENOME}.train1.gb as *Training gene structure file* > > If we also have RNA-seq data: > > - ${MYGENOME}_assembled_transcripts.fasta as *cDNA file* > > Use ${MYGENOME}_v1 as *Species name*. We will need a different > species name in the retraining step. Otherwise, when MAKER2 is rerun, it > will see the same name and will not rerun AUGUSTUS, even though the species > profile is different. A versioned name such as ${MYGENOME}_v1 does the job and tracks the > version. > > Once the job is finished, the *Species parameter archive* > (parameters.tar.gz) will contain a folder with the model files for your > species. Copy it to the species folder of your AUGUSTUS installation. > Hope this helps. PS: hit "reply all" so this is logged on MAKER's mailing list in case anybody else experiences similar issues. On Thu, 7 Feb 2019 at 06:36, morgan sobol wrote: > I have not used SNAP or CEGMA; however, I see that CEGMA was discontinued > in 2015. > Do you think that will be a problem, or is it still worth using the old > version?
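For readers without the AUGUSTUS scripts at hand, the sketch below illustrates what the header-renaming step amounts to. It is not the code of simplifyFastaHeaders.pl: the numbering scheme (nameStem followed by a running index) and the tab-separated new/old map layout are assumptions made for illustration, and the script name rename_headers.py is hypothetical.

```python
import sys


def simplify_fasta_headers(fasta_text, name_stem):
    """Rename every FASTA record to <name_stem><n> (n = 1, 2, ...).

    Returns (renamed_fasta_text, header_map), where header_map is a list of
    (new_name, original_header) pairs. The naming scheme and map layout are
    assumptions, not simplifyFastaHeaders.pl's exact behaviour.
    """
    renamed_lines = []
    header_map = []
    count = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            count += 1
            new_name = f"{name_stem}{count}"
            header_map.append((new_name, line[1:].strip()))
            renamed_lines.append(f">{new_name}")
        elif line.strip():
            renamed_lines.append(line.strip())  # sequence lines are kept as-is
    renamed = "\n".join(renamed_lines) + "\n" if renamed_lines else ""
    return renamed, header_map


# Only act as a command-line tool when given all four arguments,
# in the same order as the perl command above: in.fasta nameStem out.fasta out.map
if __name__ == "__main__" and len(sys.argv) == 5:
    in_fa, stem, out_fa, out_map = sys.argv[1:5]
    with open(in_fa) as handle:
        renamed, header_map = simplify_fasta_headers(handle.read(), stem)
    with open(out_fa, "w") as out:
        out.write(renamed)
    with open(out_map, "w") as out:
        for new_name, old_header in header_map:
            out.write(f"{new_name}\t{old_header}\n")
```

Example: python rename_headers.py genome_assembly.fasta pgcontig genome_contigs_rename.fasta genome_contigs.map. Keeping the original headers in the map file means results on the renamed contigs can be traced back to the original assembly afterwards.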
> > > ------------------------------ > *From:* Xabier Vázquez-Campos > *Sent:* Tuesday, February 5, 2019 4:42 PM > *To:* morgan sobol; Maker Mailing List > *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions > > Don't you use SNAP? It usually produces quite decent results, and it is easier > to train than any of the other predictors. > > In any case, the Augustus gene model is way off in both cases. > GM (GeneMark) doesn't seem bad in the first run, if your fungus has a fairly typical genome; in the second, it looks bad. > > I'm not too familiar with re-annotation, but I'd rather create the gene > models from scratch than reuse the ones from the Illumina-only > genomes. > Note that long-read assemblies have a higher proportion of > repetitive elements that need masking, and RepeatMasker alone may not be > enough. In theory, this shouldn't affect the Augustus model if it was trained through > BUSCO, as BUSCO uses defined conserved markers to create the gene model, but > I'm not so sure about GM. > > If you trained Augustus with BUSCO and this is the result, I'd discard > the gene model and train it again the "traditional way", i.e. as we used > to when we only had CEGMA. I got good results just by changing the > training method. > > Hope it helps, > Xabi > > > > > On Wed, 6 Feb 2019 at 02:19, morgan sobol wrote: > > Thank you, Xabi, for the response. > The number of proteins from each source is much lower than before. > Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and > maker, respectively. > The more recent numbers are 25, 857, and 4,418, respectively. > > So do you think this hints that something is wrong with GeneMark?
> > Morgan > > > ------------------------------ > *From:* Xabier Vázquez-Campos > *Sent:* Sunday, February 3, 2019 4:43 PM > *To:* morgan sobol > *Cc:* maker-devel at yandell-lab.org > *Subject:* Re: [maker-devel] Re-annotation, fewer gene predictions > > Hi Morgan, > > We had a similar issue with AUGUSTUS underpredicting when using a > BUSCO-derived gene model: > https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ > > Also, check the number of proteins from each individual predictor. If the > numbers from one of them are off, that may point to the source of the problem. > We didn't have a very good experience with GM, as it used to overpredict > an absurd number of proteins. > > Xabi -- Xabier Vázquez-Campos, *PhD* *Research Associate* NSW Systems Biology Initiative School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney NSW 2052 AUSTRALIA
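Following the advice in this thread to check how many predictions each source contributes, one way is to tally the source column (column 2) of the MAKER GFF3 by feature type. This is a minimal sketch, assuming a merged MAKER GFF3 in which final models are mRNA features with source maker and ab initio predictions are match features whose source names the predictor (for example augustus_masked or genemark); those names can vary, so check them against your own file. The script name count_sources.py is hypothetical.

```python
import sys
from collections import Counter


def count_features_by_source(gff_lines, feature_types=("mRNA", "match")):
    """Count GFF3 features per (source, type), e.g. ('maker', 'mRNA').

    Only the listed feature types are counted, so child features such as
    match_part or exon do not inflate the totals. Comment lines are skipped
    and parsing stops at an embedded ##FASTA section.
    """
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a valid 9-column GFF3 feature line
        source, ftype = cols[1], cols[2]
        if ftype in feature_types:
            counts[(source, ftype)] += 1
    return counts


# Command-line use only when a GFF3 path is given: python count_sources.py genome.all.gff
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for (source, ftype), n in sorted(count_features_by_source(handle).items()):
            print(f"{source}\t{ftype}\t{n}")
```

Running it on the output for each genome (or for the Illumina-only and hybrid runs) and comparing the per-source totals side by side may help show which predictor's output changed between runs.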
From liorglic at mail.tau.ac.il Mon Feb 11 07:04:16 2019 From: liorglic at mail.tau.ac.il (Lior Glick) Date: Mon, 11 Feb 2019 16:04:16 +0200 Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in maker_exe.ctl Message-ID: Dear MAKER users, I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the program starts; typing 'which RepeatMasker' shows the correct local path. I also use this path as the value of the maker_exe.ctl parameter 'RepeatMasker'. To generalize my working environment, I am trying to use a conda env that is capable of running MAKER. This env comes with its own RepeatMasker. Once I activate the env, I can still run RepeatMasker, but it points to a different path. When I run MAKER within this env, it fails right away with the error message: ERROR: Could not determine if RepBase is installed Running the same configuration files locally (i.e. outside the conda env) results in a successful run. This leads me to think that MAKER is not actually using the path given in the maker_exe.ctl file, and instead looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions on how to overcome this issue? Thanks and best regards, Lior From liorglic at mail.tau.ac.il Mon Feb 11 07:12:25 2019 From: liorglic at mail.tau.ac.il (Lior Glick) Date: Mon, 11 Feb 2019 16:12:25 +0200 Subject: [maker-devel] Unknown (X) amino acids in predicted proteins Message-ID: Dear MAKER users, After completing a MAKER run, I looked at the protein FASTA files that MAKER outputs and noticed that a small fraction of the sequences include X characters, indicating unknown amino acids. I was wondering how such sequences arise: why would a prediction contain unknown amino acids? Is this an indication of low-quality predictions?
Is there any documentation describing the procedure that generates the protein sequences? Thanks a lot, Lior From kapeelc at gmail.com Thu Feb 7 12:43:47 2019 From: kapeelc at gmail.com (Kapeel Chougule) Date: Thu, 7 Feb 2019 14:43:47 -0500 Subject: [maker-devel] MAKER v3 Fgenesh ERROR Message-ID: Hi Carson, I have been getting this error with the fgenesh tool within MAKER. It runs fine on most of the assembly contigs but seems to fail on one contig (or part of a contig) with the error below: Widget::fgenesh: /mnt/grid/ware/hpc/home/data/mcampbel/applications/maker/bin/../lib/Widget/fgenesh/fgenesh_wrap /mnt/grid/ware/hpc_norepl/data/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/fgenesh /sonas-hs/ware/hpc_norepl/data/programs/fgenesh_v8/fgenesh_suite_v8.0.0a/Zeamays.mpar.dat.new /tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh.fasta -exon_table:/tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.xdef.fgenesh > /tmp/uge/53139300.1.primary.q/maker_j3ttxX/6/6_1.600610-613023.Zeamays.mpar.dat.new.auto_annotator.fgenesh #-------------------------------# ...processing 9 of 24 ...processing 8 of 28 ...processing 10 of 24 ...processing 9 of 28 ...processing 11 of 24 ...processing 10 of 28 ...processing 12 of 24 ...processing 11 of 28 deleted:0 genes ERROR: FgenesH failed --> rank=14, hostname=bnbcompute50 ERROR: Failed while annotating transcripts ERROR: Chunk failed at level:1, tier_type:4 FAILED CONTIG:Super-Scaffold_14.2_contig2 I updated the Perl module fgenesh.pm as suggested in previous threads. Attached are the maker_opts.ctl and STDERR log files. Thanks, Kapeel -- *Kapeel Chougule* *Computational Scientist Developer II* *One Bungtown Road Cold Spring Harbor, NY 11724* http://www.warelab.org/
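When only some contigs fail, as with the FGENESH error above, it can help to collect every contig MAKER reported as failed before rerunning or debugging them one at a time. A minimal sketch, assuming the log contains lines of the form FAILED CONTIG:<name> as in the excerpt; the script name failed_contigs.py is hypothetical. Names are de-duplicated because a contig may be reported more than once (for example across retries).

```python
import re
import sys

# Matches the "FAILED CONTIG:<name>" lines seen in MAKER's STDERR output.
FAILED_RE = re.compile(r"FAILED CONTIG:(\S+)")


def failed_contigs(log_text):
    """Return contig names reported as failed, in first-seen order, without duplicates."""
    seen = []
    for match in FAILED_RE.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


# Command-line use only when a log path is given: python failed_contigs.py stderr.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as handle:
        for name in failed_contigs(handle.read()):
            print(name)
```

The resulting names can be used to pull just those sequences into a small FASTA file for a targeted rerun while debugging.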
From fatih.sarigoel at durham.ac.uk Wed Feb 13 05:20:40 2019 From: fatih.sarigoel at durham.ac.uk (SARIGOEL, FATIH) Date: Wed, 13 Feb 2019 12:20:40 +0000 Subject: [maker-devel] Does Conda Maker actually work? Message-ID: Greetings, I notice that you never mention conda installation on your website, so I am curious whether the conda version is actually supposed to work; for me it didn't. I created a new conda environment and installed MAKER (I tried both installation options). When I run the example files, I get this error: "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 A problem was encountered while attempting to compile and install your Inline C code. The command that failed was: "make > out.make 2>&1" with error code 2" My conda environment is here: /fast_new/work/users/fsarigo_m/miniconda3 I don't understand why the program is trying to look in /home/conda, which does not exist. The output also begins with a "Possible precedence issue" warning. Thanks in advance for your help! Fatih +++++ Here is the full log up to the end of the contig: (MakerX) [fsarigo_m at med0223 MAKER]$ maker Possible precedence issue with control flow operator at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845. STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at: /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore To access files for individual sequences use the datastore index: /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log STATUS: Now running MAKER... examining contents of the fasta file and run log --Next Contig-- Processing run.log file... #--------------------------------------------------------------------- Now starting the contig!! SeqID: contig-dpp-500-500 Length: 32156 #--------------------------------------------------------------------- Running Mkbootstrap for IndexedBase_14e0 () chmod 644 "IndexedBase_14e0.bs" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644 "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" IndexedBase_14e0.xs > IndexedBase_14e0.xsc mv IndexedBase_14e0.xsc IndexedBase_14e0.c /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot 
"-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" IndexedBase_14e0.c /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 A problem was encountered while attempting to compile and install your Inline C code. The command that failed was: "make > out.make 2>&1" with error code 2 The build directory was: /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0 To debug the problem, cd to the build directory, and inspect the output files. Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin' at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275. --> rank=NA, hostname=med0223 ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869. --> rank=NA, hostname=med0223 at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 38. 
Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 306 Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, ARRAY(0x564b40673ad0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 426 Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm line 95 FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 478 Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", HASH(0x564b40673c80), 0, 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 341 Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 357 Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm line 287 Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683 Running Mkbootstrap for IndexedBase_14e0 () chmod 644 "IndexedBase_14e0.bs" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644 "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" IndexedBase_14e0.xs > IndexedBase_14e0.xsc mv IndexedBase_14e0.xsc IndexedBase_14e0.c 
/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" IndexedBase_14e0.c /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 A problem was encountered while attempting to compile and install your Inline C code. The command that failed was: "make > out.make 2>&1" with error code 2 The build directory was: /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0 To debug the problem, cd to the build directory, and inspect the output files. Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin' at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275. --> rank=NA, hostname=med0223 ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869. 
--> rank=NA, hostname=med0223 --> rank=NA, hostname=med0223 --> rank=NA, hostname=med0223 ERROR: Failed while examining contents of the fasta file and run log ERROR: Chunk failed at level:0, tier_type:0 FAILED CONTIG:contig-dpp-500-500 examining contents of the fasta file and run log -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Feb 13 07:51:44 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Feb 2019 07:51:44 -0700 Subject: [maker-devel] Does Conda Maker actually work? In-Reply-To: References: Message-ID: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com> The conda recipe was produced by another group. I do not currently recommend using it because I have seen a number of issues pop up on the list based on people attempting to install MAKER via conda. I know there is at least an issue with the conda RepeatMasker install, and there may be others. The specific failure you show is from Bio::DB::IndexedBase trying to compile an Inline::C function. It may be that conda is installing an older BioPerl where this issue still exists ?> https://github.com/bioperl/bioperl-live/issues/215 Or it may be that there is a new related issue (I?ve seen a handful of other examples that seem to relate back to Bio::DB::IndexedBase) ?> https://github.com/bioperl/bioperl-live/issues/305 Try installing MAKER without conda (make sure to remove any components that are in conda first to avoid conflicts). ?Carson > On Feb 13, 2019, at 5:20 AM, SARIGOEL, FATIH wrote: > > Greetings, > I notice that you never mention conda installation on your website, so I am curious if the conda version is actually supposed to be working fine or not; as for me it didn't. 
> I created a new conda environment and installed Maker (tried this with both installation options) > When I run the example files, I get this error: > > "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 > A problem was encountered while attempting to compile and install your Inline > C code. The command that failed was: > "make > out.make 2>&1" with error code 2" > > My conda environment is here > /fast_new/work/users/fsarigo_m/miniconda3 > I don't understand why the program is trying to look here: > /home/conda > which does not exist > > Also begins with a "possible precedence issue" > > Thanks for your help in advance! > Fatih > > +++++ > > Here is the full log until the end of the contig: > > (MakerX) [fsarigo_m at med0223 MAKER]$ maker > Possible precedence issue with control flow operator at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845. > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input... > A data structure will be created for you at: > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore > > To access files for individual sequences use the datastore index: > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log > > STATUS: Now running MAKER... > examining contents of the fasta file and run log > > > > --Next Contig-- > > Processing run.log file... > #--------------------------------------------------------------------- > Now starting the contig!! 
> SeqID: contig-dpp-500-500 > Length: 32156 > #--------------------------------------------------------------------- > > > Running Mkbootstrap for IndexedBase_14e0 () > chmod 644 "IndexedBase_14e0.bs" > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644 > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" IndexedBase_14e0.xs > IndexedBase_14e0.xsc > mv IndexedBase_14e0.xsc IndexedBase_14e0.c > /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" IndexedBase_14e0.c > /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory > make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 > > A problem was encountered while attempting to compile and install your Inline > C code. 
The command that failed was: > "make > out.make 2>&1" with error code 2 > > The build directory was: > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0 > > To debug the problem, cd to the build directory, and inspect the output files. > > Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin' > at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275. > --> rank=NA, hostname=med0223 > ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869. > --> rank=NA, hostname=med0223 > at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 38. > Error::_throw_Error_Simple(HASH(0x564b40c78870)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 306 > Error::subs::run_clauses(HASH(0x564b40688970), "Running Mkbootstrap for IndexedBase_14e0 ()\x{a}chmod 644 \"Indexe"..., undef, ARRAY(0x564b40673ad0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Error.pm line 426 > Error::subs::try(CODE(0x564b406899b8), HASH(0x564b40688970)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/FastaSeq.pm line 95 > FastaSeq::seq(FastaSeq=HASH(0x564b4068a7f0)) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 478 > Process::MpiChunk::_go(Process::MpiChunk=HASH(0x564b40673c08), "run", HASH(0x564b40673c80), 0, 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm line 341 > Process::MpiChunk::run(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiChunk.pm 
line 357 > Process::MpiChunk::run_all(Process::MpiChunk=HASH(0x564b40673c08), 0) called at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/../lib/Process/MpiTiers.pm line 287 > Process::MpiTiers::run_all(Process::MpiTiers=HASH(0x564b4053f9f0), 0) called at /fast/users/fsarigo_m/miniconda3/envs/MakerX/bin/maker line 683 > Running Mkbootstrap for IndexedBase_14e0 () > chmod 644 "IndexedBase_14e0.bs" > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- IndexedBase_14e0.bs blib/arch/auto/Bio/DB/IndexedBase_14e0/IndexedBase_14e0.bs 644 > "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin/perl" "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/ExtUtils/xsubpp" -typemap "/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/ExtUtils/typemap" IndexedBase_14e0.xs > IndexedBase_14e0.xsc > mv IndexedBase_14e0.xsc IndexedBase_14e0.c > /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc -c -I"/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/bin" -D_REENTRANT -D_GNU_SOURCE --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -DVERSION=\"0.00\" -DXS_VERSION=\"0.00\" -fPIC --sysroot=/home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/x86_64-conda_cos6-linux-gnu/sysroot "-I/fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/5.26.2/x86_64-linux-thread-multi/CORE" IndexedBase_14e0.c > /bin/sh: /home/conda/feedstock_root/build_artifacts/perl_1548813468557/_build_env/bin/x86_64-conda_cos6-linux-gnu-gcc: No such file or directory > make: *** [Makefile:330: IndexedBase_14e0.o] Error 127 > > A problem was encountered while attempting to compile and install your Inline > C code. 
The command that failed was: > "make > out.make 2>&1" with error code 2 > > The build directory was: > /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/_Inline/build/Bio/DB/IndexedBase_14e0 > > To debug the problem, cd to the build directory, and inspect the output files. > > Environment PATH = '/fast/users/fsarigo_m/miniconda3/envs/MakerX/bin:/fast/users/fsarigo_m/miniconda3/bin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/fast/users/fsarigo_m/.local/bin:/fast/users/fsarigo_m/bin' > at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 275. > --> rank=NA, hostname=med0223 > ...propagated at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Inline/C.pm line 869. > --> rank=NA, hostname=med0223 > --> rank=NA, hostname=med0223 > --> rank=NA, hostname=med0223 > ERROR: Failed while examining contents of the fasta file and run log > ERROR: Chunk failed at level:0, tier_type:0 > FAILED CONTIG:contig-dpp-500-500 > > examining contents of the fasta file and run log > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Feb 13 10:14:13 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Feb 2019 10:14:13 -0700 Subject: [maker-devel] MAKER not calling RepeatMasker exe indicated in maker_exe.ctl In-Reply-To: References: Message-ID: <6AFF11A9-9860-4047-A337-4B974C6C0F30@gmail.com>

The conda installation of RepeatMasker runs oddly. It does not appear to run the ./configure script during setup, and as a result it is missing files inside the repeat library.
--Carson

> On Feb 4, 2019, at 2:00 AM, Lior Glick wrote:
>
> Dear MAKER users,
>
> I've been using MAKER for a while now, with RepeatMasker installed locally. By that I mean that I can type 'RepeatMasker' in my terminal and the software starts. Typing 'which RepeatMasker' shows the correct local path.
> I also use this path as the value of the maker_exe.ctl parameter 'RepeatMasker'.
> To generalize my working environment, I am trying to use a conda env that is capable of running MAKER. This env comes with RepeatMasker as well. Once I activate this env, I can still run RepeatMasker, but it points to a different path. When I run MAKER within this env, it fails right away with the error message:
> ERROR: Could not determine if RepBase is installed
> Running the same configuration files locally (i.e. outside the conda env) results in a successful run.
> This leads me to think that MAKER is not actually using the path indicated in the maker_exe.ctl file, and instead looks for RepeatMasker in $PATH or something similar. Is that the expected behavior? Any suggestions on how to overcome this issue?
>
> Thanks and best regards,
> Lior

From carsonhh at gmail.com Wed Feb 13 10:18:44 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Feb 2019 10:18:44 -0700 Subject: [maker-devel] Unknown (X) amino acids in predicted proteins In-Reply-To: References: Message-ID: <1472E55C-62CB-4A73-B45D-C4BEF3E014B7@gmail.com>

If you use GFF3 as input, or use est2genome or protein2genome in your final run, you may have 'N' characters from the assembly as part of your CDS ('N' is the DNA ambiguity code, and it translates to 'X', the amino acid ambiguity code).
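To check how often this happens in a given run, the protein and transcript FASTA files from MAKER can be scanned for ambiguity codes: 'X' in proteins, 'N' in transcripts. The following is a minimal Python sketch, not a MAKER tool; the function names are hypothetical, and the real file names you would feed it depend on your run. Note that a protein's X count need not equal its transcript's N count, since a single codon can contain up to three Ns.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs; record_id is the first word of the header."""
    name, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_ambiguity(handle: TextIO, symbol: str) -> Dict[str, int]:
    """Return {record_id: occurrences of symbol} for records containing it at least once."""
    hits = {}
    for name, seq in read_fasta(handle):
        n = seq.upper().count(symbol.upper())
        if n:
            hits[name] = n
    return hits


if __name__ == "__main__":
    # Toy data; on real output you would open a merged MAKER proteins FASTA and
    # pass "X", or the matching transcripts FASTA and pass "N".
    proteins = io.StringIO(">m1\nMKVXLA\n>m2\nMKVLA\n")
    transcripts = io.StringIO(">m1\nATGAAAGTGNNNCTGGCC\n>m2\nATGAAAGTGCTGGCC\n")
    print(count_ambiguity(proteins, "X"))     # {'m1': 1}
    print(count_ambiguity(transcripts, "N"))  # {'m1': 3}
```

Models that show up in both lists point back to gaps in the assembly rather than to a problem with the predictor itself.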
Augustus will do internal gymnastics and completely splice out exons containing N's to try to avoid this issue, but it may not always succeed. It's an indication of genome assembly issues.

--Carson

> On Feb 11, 2019, at 7:12 AM, Lior Glick wrote:
>
> Dear MAKER users,
>
> After completing a MAKER run, I looked at the protein fasta files that MAKER outputs and noticed that a small fraction of the sequences include X characters, indicating unknown amino acids. I was wondering how such sequences arise; that is, how can there be unknown amino acids in the prediction? Is this an indication of low-quality predictions?
> Is there any documentation regarding the procedure that generates the protein sequences?
>
> Thanks a lot,
> Lior

From carsonhh at gmail.com Wed Feb 13 10:24:01 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Feb 2019 10:24:01 -0700 Subject: [maker-devel] Re-annotation, fewer gene predictions In-Reply-To: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> Message-ID:

One thing you can also do is use the old models as protein= input and run the protein2genome option just to see where things align. You may find that not all old models are recoverable in the new assembly. Fewer genes in the new assembly may mean that redundant/duplicate contigs were collapsed and split contigs were joined, so that multiple gene fragments became a single unified model. Make sure to always review contigs in a browser to see how models and evidence correlate.

--Carson

> On Feb 3, 2019, at 12:13 PM, morgan sobol wrote:
>
> Hello,
>
> I previously used Maker to annotate two different fungal genomes that were created using Illumina sequences only. For these genomes, I had over 11,000 genes predicted.
>
> I recently obtained PacBio sequences for the same genomes, so I created two hybrid assemblies. Both assemblies were very similar in length and number of complete orthologs to the Illumina-only assemblies, but had far fewer, longer contigs.
>
> I re-ran Maker using the settings below. For one of my genomes, I got around 11,000 genes predicted again, as expected. However, for the other genome, I am consistently getting ~4,400 predicted genes.
>
> How can I determine why I keep getting fewer predicted genes for only one of my genomes, even though I ran them the same way?
>
> Thanks,
> Morgan S.
>
> [...]

From liorglck at gmail.com Sun Feb 17 11:50:10 2019 From: liorglck at gmail.com (Lior Glick) Date: Sun, 17 Feb 2019 20:50:10 +0200 Subject: [maker-devel] Does Conda Maker actually work? In-Reply-To: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com> References: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com> Message-ID:

That's good to know. Are there any plans to create a stable conda package in the future? It'd be a very nice feature, especially since MAKER is not always straightforward to install.

On Wed, Feb 13, 2019 at 5:22 PM Carson Holt wrote:

> The conda recipe was produced by another group. I do not currently recommend using it because I have seen a number of issues pop up on the list from people attempting to install MAKER via conda. I know there is at least an issue with the conda RepeatMasker install, and there may be others. The specific failure you show is from Bio::DB::IndexedBase trying to compile an Inline::C function. It may be that conda is installing an older BioPerl where this issue still exists -->
> https://github.com/bioperl/bioperl-live/issues/215
>
> Or it may be that there is a new related issue (I've seen a handful of other examples that seem to relate back to Bio::DB::IndexedBase) -->
> https://github.com/bioperl/bioperl-live/issues/305
>
> Try installing MAKER without conda (make sure to remove any components that are in conda first to avoid conflicts).
>
> --Carson
>
>
> On Feb 13, 2019, at 5:20 AM, SARIGOEL, FATIH wrote:
>
> Greetings,
> I notice that you never mention conda installation on your website, so I am curious whether the conda version is actually supposed to work or not; for me it didn't.
> I created a new conda environment and installed Maker (I tried both installation options).
> When I run the example files, I get this error:
>
> "make: *** [Makefile:330: IndexedBase_14e0.o] Error 127
> A problem was encountered while attempting to compile and install your Inline
> C code.
> The command that failed was:
> "make > out.make 2>&1" with error code 2"
>
> My conda environment is here:
> /fast_new/work/users/fsarigo_m/miniconda3
> I don't understand why the program is trying to look here:
> /home/conda
> which does not exist.
>
> The output also begins with a "possible precedence issue" warning.
>
> Thanks for your help in advance!
> Fatih
>
> +++++
>
> Here is the full log until the end of the contig:
>
> (MakerX) [fsarigo_m at med0223 MAKER]$ maker
> Possible precedence issue with control flow operator at /fast_new/work/users/fsarigo_m/miniconda3/envs/MakerX/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845.
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at:
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_datastore
>
> To access files for individual sequences use the datastore index:
> /fast_new/work/users/fsarigo_m/scratch/tmp/urchin_holder/SeaUrchins/MAKER/dpp_contig.maker.output/dpp_contig_master_datastore_index.log
>
> STATUS: Now running MAKER...
> examining contents of the fasta file and run log
>
> --Next Contig--
>
> Processing run.log file...
> #---------------------------------------------------------------------
> Now starting the contig!!
> SeqID: contig-dpp-500-500
> Length: 32156
> #---------------------------------------------------------------------
> [...]
> ERROR: Failed while examining contents of the fasta file and run log
> ERROR: Chunk failed at level:0, tier_type:0
> FAILED CONTIG:contig-dpp-500-500

From morgan_starr_s at live.com Mon Feb 18 02:08:56 2019 From: morgan_starr_s at live.com (morgan sobol) Date: Mon, 18 Feb 2019 09:08:56 +0000 Subject: [maker-devel] Re-annotation, fewer gene predictions In-Reply-To: References: <77517CF5-7C20-43FE-94EB-7F45A3D70A4F@live.com> , Message-ID:

Thank you, Xabi and Carson. With your help, I was able to improve the annotation and get a more appropriate number of predictions.

Best,
Morgan

________________________________
From: Xabier Vázquez-Campos
Sent: Wednesday, February 6, 2019 11:33 PM
To: morgan sobol; Maker Mailing List
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

SNAP is easy to train, works well in fungal genomes, and is explained in Maker's wiki: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors

Oh, sorry, I didn't explain myself well. What I was trying to say is that before BUSCO, when we only had CEGMA, we trained Augustus in a different way, because CEGMA wouldn't produce Augustus gene models automatically. I don't mean that you should use CEGMA. This is what I have in my own documentation about how to train Augustus "the old way":

AUGUSTUS: the old way

Alternatively, you can train AUGUSTUS in a more 'manual' way, like when we were using CEGMA. The training starts with the output from the second instance of fathom in the SNAP training section.

cd ${MYGENOME_DIR}/maker/snap1
perl ~/bin/zff2augustus_gbk.pl > ${MYGENOME}.train1.gb

zff2augustus_gbk.pl generates a GenBank file from export.dna. The actual training of AUGUSTUS will be done through the webAUGUSTUS server.

Before proceeding, it is recommended to rename the fasta headers, especially if they contain special characters and/or are very long. This is the main cause of failure for jobs submitted to webAUGUSTUS.
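The renaming step itself is simple: each header is replaced by a short stem plus a running number, and a map from new to original names is kept. As a rough illustration (not a port of simplifyFastaHeaders.pl, whose exact naming and map format may differ), a minimal Python sketch could look like this; the function names and the tab-separated map layout are assumptions.

```python
import io
import re
from typing import List, TextIO, Tuple

SAFE_STEM = re.compile(r"[A-Za-z0-9]+")  # letters and digits only, no special characters


def simplify_headers(src: TextIO, dst: TextIO, stem: str) -> List[Tuple[str, str]]:
    """Copy FASTA from src to dst, renaming record n to <stem><n>.

    Returns (new_name, original_header) pairs so names can be mapped back later.
    """
    if not SAFE_STEM.fullmatch(stem):
        raise ValueError(f"stem {stem!r} should contain only letters and digits")
    mapping: List[Tuple[str, str]] = []
    for line in src:
        if line.startswith(">"):
            new_name = f"{stem}{len(mapping) + 1}"
            mapping.append((new_name, line[1:].rstrip("\r\n")))
            dst.write(f">{new_name}\n")
        else:
            dst.write(line)  # sequence lines are copied unchanged
    return mapping


def write_map(mapping: List[Tuple[str, str]], out: TextIO) -> None:
    """Write one tab-separated 'new<TAB>original' line per record."""
    for new_name, original in mapping:
        out.write(f"{new_name}\t{original}\n")


if __name__ == "__main__":
    fasta = io.StringIO(">scaffold_1 length=52311 cov=31.2|arrow\nACGTACGT\n>scaffold_2\nGGCC\n")
    renamed, table = io.StringIO(), io.StringIO()
    write_map(simplify_headers(fasta, renamed, "pgcontig"), table)
    print(renamed.getvalue(), end="")
    print(table.getvalue(), end="")
```

Run it once per input file with a distinct stem (for example pgcontig for the assembly and pgrna for the transcripts), and keep the map so that results can be translated back to the original contig names afterwards.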
You can use the simplifyFastaHeaders.pl script to rename them:

perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_assembly.fasta nameStem ${MYGENOME}_contigs_rename.fasta ${MYGENOME}_contigs.map
perl ~/bin/simplifyFastaHeaders.pl ${MYGENOME}_transcripts_assembled.fasta nameStem ${MYGENOME}_rna_rename.fasta ${MYGENOME}_rna.map

nameStem is the base name used to name each of the sequences in the multifasta files. Choose something appropriate: use contig and rna for the assembly and RNA-seq files, respectively, or something based on that. For example, 'pgcontig' and 'pgrna' for contigs and RNA from Puccinia graminis. DO NOT give the same nameStem to both fasta files, and don't use any special characters.

We need the following files (minimum):
* ${MYGENOME}_assembly.fasta as Genome file
* ${MYGENOME}.train1.gb as Training gene structure file

If we also have RNA-seq data:
* ${MYGENOME}_assembled_transcripts.fasta as cDNA file

Use ${MYGENOME}_v1 as Species name. We will need a different species name in the retraining step; otherwise, when Maker2 is rerun, it will see the same name and will not rerun AUGUSTUS, even though the species profile is different. So ${MYGENOME}_v1 does the job and tracks the version.

Once the job is finished, the Species parameter archive (parameters.tar.gz) will contain a folder with the model files for your species. Copy it to the species folder of your AUGUSTUS installation.

Hope this helps

PS: hit reply all so this is logged in Maker's mailing list in case anybody else experiences similar issues

On Thu, 7 Feb 2019 at 06:36, morgan sobol wrote:

I have not used SNAP or CEGMA; however, I see that CEGMA was discontinued in 2015. Do you think that will be a problem, or is it still worth using the old version?

________________________________
From: Xabier Vázquez-Campos
Sent: Tuesday, February 5, 2019 4:42 PM
To: morgan sobol; Maker Mailing List
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

Don't you use SNAP? It usually produces quite decent results, and it is easier to train than any of the other predictors.

In any case, the Augustus gene model is way off in both cases.

In the first, GM doesn't seem bad if your fungus has a rather usual genome; for the second, it looks bad.

I'm not too familiar with re-annotation, but I'd rather create the gene models from scratch than reuse the ones from the Illumina-only genomes. Note that long-read assemblies have a higher proportion of repetitive elements that need masking, and RepeatMasker alone may not be enough. In theory, this shouldn't affect the Augustus model if it was trained through BUSCO, as BUSCO uses defined conserved markers to create the gene model, but I'm not so sure about GM.

If you trained Augustus with BUSCO and this is the result, I'd discard the gene model and train it again the "traditional way", i.e. as it used to be done when we only had CEGMA. I had good results just by changing the training method.

Hope it helps,
Xabi

On Wed, 6 Feb 2019 at 02:19, morgan sobol wrote:

Thank you, Xabi, for the response. The number of proteins from each source is much lower than before. Previous numbers were 325, 10,899, and 11,243 for augustus, genemark, and maker, respectively. The more recent numbers are 25, 857, and 4,418, respectively. So do you think this hints that something is wrong with genemark?

Morgan

________________________________
From: Xabier Vázquez-Campos
Sent: Sunday, February 3, 2019 4:43 PM
To: morgan sobol
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Re-annotation, fewer gene predictions

Hi Morgan,

We had a similar issue with AUGUSTUS underpredicting when using a BUSCO-derived gene model https://groups.google.com/d/msg/maker-devel/ocnDG4nq1A8/NyCPzzRgAgAJ

Also, check the number of proteins from each individual predictor. If the numbers from one of them are off, that may point to the source of the issue.
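One way to get per-predictor numbers is to tally a GFF3 file instead of the protein FASTA files. The sketch below counts features per (source, type) pair in a MAKER-style GFF3 and stops at the ##FASTA section. It assumes that final models appear as mRNA features and ab initio predictions as match features, with the predictor name in column 2; source labels such as augustus_masked vary between runs, so check your own file before relying on the defaults. The function name is hypothetical.

```python
import io
from collections import Counter
from typing import Iterable, TextIO


def count_by_source(gff: TextIO, feature_types: Iterable[str] = ("mRNA", "match")) -> Counter:
    """Count features per (source, type) in a GFF3, stopping at a ##FASTA section."""
    wanted = set(feature_types)
    counts: Counter = Counter()
    for line in gff:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9:
            continue  # not a 9-column feature line; skip rather than guess
        source, ftype = cols[1], cols[2]
        if ftype in wanted:
            counts[(source, ftype)] += 1
    return counts


if __name__ == "__main__":
    # Made-up records; real source labels depend on the predictors in your run.
    demo = io.StringIO(
        "##gff-version 3\n"
        "ctg1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1\n"
        "ctg1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1\n"
        "ctg1\taugustus_masked\tmatch\t1\t880\t.\t+\t.\tID=a1\n"
        "ctg1\tgenemark\tmatch\t5\t900\t.\t+\t.\tID=gm1\n"
        "##FASTA\n>ctg1\nACGT\n"
    )
    for (source, ftype), n in sorted(count_by_source(demo).items()):
        print(f"{source}\t{ftype}\t{n}")
```

Running it over the merged GFF3 of each assembly (for example the output of MAKER's gff3_merge) and comparing the rows side by side shows which predictor's count dropped between runs. With alt_splice=0, the maker mRNA count should be close to the gene count.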
We didn't have a very good experience with GM, as it used to overpredict an absurd number of proteins.

Xabi

On Mon, 4 Feb 2019 at 06:15, morgan sobol wrote:

[...]

-- 
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From anthony.bretaudeau at inria.fr Mon Feb 18 02:53:39 2019 From: anthony.bretaudeau at inria.fr (Anthony Bretaudeau) Date: Mon, 18 Feb 2019 10:53:39 +0100 Subject: [maker-devel] Does Conda Maker actually work? In-Reply-To: References: <0A81593F-EB19-417F-9C9D-3C55178F5D0F@gmail.com> Message-ID: <3aa1eb97-f8bf-dd61-febf-464ad4b1626c@inria.fr> An HTML attachment was scrubbed...
URL:

From liorglic at mail.tau.ac.il  Sun Feb 24 05:50:49 2019
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Sun, 24 Feb 2019 14:50:49 +0200
Subject: [maker-devel] Profiling MAKER runs
Message-ID:

Dear MAKER users,

I was wondering whether any of you knows a way to profile MAKER runs. What I mean is that I'd like to know how much time was spent on each step of the analysis: am I spending most of the time masking repeats, BLASTing transcripts/proteins, running ab initio predictors, etc.?

Based on this information, I might want to adjust my configuration. For example, perhaps I'm spending a lot of time BLASTing transcripts, in which case reducing the number of input transcripts would reduce run time significantly without a major effect on the quality of the results.

As far as I can see, the main run log does not provide this information, and I'm not sure where else to look. Any ideas or directions would help.

Thanks!
Lior
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
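One rough, untested way to approximate per-step timing without relying on the run log is sketched below. It rests on an assumption that is not documented MAKER behaviour: that the individual analysis files MAKER keeps in its theVoid directories (retained when clean_up=0) are written one after another, so the gap between consecutive file modification times approximates how long the step producing the later file took. The function name time_by_extension is hypothetical, and which file extension corresponds to which analysis (repeat masking, BLAST, ab initio prediction) is not assumed here; that mapping would have to be checked against the actual files.

```python
import os
import sys
from collections import defaultdict


def time_by_extension(root):
    """Roughly estimate seconds spent per output-file extension under `root`.

    Hypothetical helper. Assumption (not documented MAKER behaviour): within one
    directory, files are written sequentially, so the gap between a file's
    modification time and that of the file written just before it approximates
    the time spent producing it. The first file in each directory has no
    predecessor and is not credited.
    """
    totals = defaultdict(float)
    for dirpath, _dirnames, filenames in os.walk(root):
        # (mtime, extension) pairs for this directory, oldest first
        stamped = sorted(
            (os.path.getmtime(os.path.join(dirpath, name)),
             os.path.splitext(name)[1] or "<none>")
            for name in filenames
        )
        # credit each gap between consecutive files to the later file's extension
        for (prev_time, _prev_ext), (cur_time, ext) in zip(stamped, stamped[1:]):
            totals[ext] += cur_time - prev_time
    return dict(totals)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    # print extensions with the most attributed time first
    for ext, seconds in sorted(time_by_extension(root).items(), key=lambda kv: -kv[1]):
        print(f"{ext}\t{seconds:.1f} s")
```

It would be run with the MAKER output directory as its only argument. Because it only looks at file timestamps, the totals are at best rough proportions: anything that runs concurrently within a directory, or files touched after the run, will distort them.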