From roscito at mpi-cbg.de Mon Oct 5 07:06:57 2015
From: roscito at mpi-cbg.de (roscito)
Date: Mon, 5 Oct 2015 14:06:57 +0200
Subject: [maker-devel] different exons predicted in different maker rounds
Message-ID: <2FD50373-B02F-49C0-8414-EF6705BB5826@mpi-cbg.de>

Dear all,

First of all, I'd like to thank everyone in this forum for all the tips and comments on the best strategies for running MAKER; they have been really helpful so far.
However, I still don't fully understand the behaviour of MAKER when it is run iteratively and I compare the predictions from each round. Let me explain:

My input data are the following:
- the repeat-masked genome of a vertebrate (~2Gb);
- mRNA data for this species mapped to the genome with tophat2 and assembled into transcripts with cufflinks;
- exonerate-mapped proteins in gff3 format to the reference genome, from closely related species (global alignment)

For the first round of MAKER, I provided both cufflinks and exonerate-mapped proteins with the options est2genome and protein2genome = 1. From the MAKER output, I generated the SNAP .hmm file (following the instructions at http://gmod.org/wiki/MAKER_Tutorial) and provided it as input to the second round of MAKER.
For this second round I still gave cufflinks + exonerate-mapped proteins, but switched both est2genome and protein2genome to 0. After it finished, I generated the SNAP .hmm once more and provided it for the 3rd and final round of MAKER, along with cufflinks and exonerate-mapped proteins and est/prot2genome=0.

As sort of a sanity check, I went on and ran a 4th round of MAKER with the SNAP .hmm file from round 3, cufflinks and exonerate-mapped proteins and est/prot2genome=0, this time specifying alt_splice=1.
For all the rounds, I also specified single_exon=1.

I loaded the gene predictions from each round plus the cufflinks transcripts and the exonerate-mapped proteins into the genome browser to visually inspect the output.
I saw a few strange cases where MAKER doesn't seem to use the protein/mRNA evidence for the gene predictions, and I would greatly appreciate any feedback/ideas on what I could possibly be doing wrong. Here are a few screenshots so you know what I'm talking about:

In this first example, MAKER misses a conserved exon for which there is both protein and mRNA evidence, and only if I specify alt_splice do I get the exon 'back'.

In this second example, MAKER completely ignores lots of exons, all conserved across vertebrates and supported by protein/mRNA evidence.

In the third example, there is no prediction from round 1, the one from round 2 matches the protein/mRNA evidence, and then in the final rounds 3 and 4 an extra exon appears.

(I hope you'll be able to see the images above.)
As I said, I would greatly appreciate any feedback on these strange cases. Perhaps I'm missing some parameter(s)?

Thanks a lot.
All the best,
Juliana
-------------- next part --------------
A non-text attachment was scrubbed...
Name: example1.png
Type: image/png
Size: 53178 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: example2.png
Type: image/png
Size: 55134 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: example3.png
Type: image/png
Size: 67598 bytes
Desc: not available

From carsonhh at gmail.com Fri Oct 9 13:58:02 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 9 Oct 2015 12:58:02 -0600
Subject: [maker-devel] different exons predicted in different maker rounds
In-Reply-To: <2FD50373-B02F-49C0-8414-EF6705BB5826@mpi-cbg.de>
References: <2FD50373-B02F-49C0-8414-EF6705BB5826@mpi-cbg.de>
Message-ID: <1D5FAF7F-4758-4C4A-90EB-31C7BAE34725@gmail.com>

Some of your cufflinks evidence is contradicting the existence of the exon.
When you have alt_splice=0, the evidence is passed in in its entirety, and all sources are equal. When you have alt_splice=1 set, certain pieces of spliced evidence are given higher priority (in an iterative fashion), and cannot be overridden by contradictory evidence. The result is that there is a specific combination of evidence-based hints that allows the gene predictor to find the exon, but when it sees everything, the HMM doesn't score the exon as very likely.

I'd recommend not running with cufflinks because its results usually have low specificity, so it generates a lot of bad hints. Use Trinity instead to assemble everything, then allow it to be aligned inside of MAKER. Trinity-assembled contigs give much greater specificity in the final results.

--Carson
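Carson's point that some evidence can contradict an exon is something one can check by hand before blaming the predictor. The sketch below is only an illustration, not MAKER's internal logic: it labels each evidence transcript as supporting an exon (it overlaps the exon), contradicting it (one of its introns spans the exon), or silent. The function names and all coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive (start, end), as in GFF3


def introns(exons: List[Interval]) -> List[Interval]:
    """Return the introns implied by a transcript's exons."""
    ordered = sorted(exons)
    return [
        (a_end + 1, b_start - 1)
        for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
        if b_start - a_end > 1  # skip abutting exons, which leave no intron
    ]


def classify_evidence(
    exon: Interval, transcripts: Dict[str, List[Interval]]
) -> Dict[str, str]:
    """Label each transcript as 'supports', 'contradicts', or 'silent' for `exon`."""
    start, end = exon
    labels = {}
    for name, exons in transcripts.items():
        if any(s <= end and start <= e for s, e in exons):
            # An evidence exon overlaps the exon of interest.
            labels[name] = "supports"
        elif any(s <= start and end <= e for s, e in introns(exons)):
            # An evidence intron fully spans the exon: the transcript skips it.
            labels[name] = "contradicts"
        else:
            labels[name] = "silent"
    return labels


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    evidence = {
        "cuff.1": [(100, 200), (500, 600), (900, 1000)],
        "cuff.2": [(100, 200), (900, 1000)],
        "cuff.3": [(1500, 1600)],
    }
    for name, label in classify_evidence((500, 600), evidence).items():
        print(name, label)
```

Counting how many transcripts fall into each class at a problem locus gives a rough idea of how mixed the hints are that the gene predictor receives when all evidence is passed in at once.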
From kmkocot at gmail.com Fri Oct 16 23:10:14 2015
From: kmkocot at gmail.com (Kevin Kocot)
Date: Sat, 17 Oct 2015 14:10:14 +1000
Subject: [maker-devel] Maker not producing expected output
Message-ID: <5621CA26.8050807@uq.edu.au>

Hello,

I've run Maker on a draft invertebrate genome and it seemed to finish successfully. However, many of the expected output files were not produced. If I go to, for example, XX_datastore/00/0C/scaffold-334630/, all I see is:

theVoid.scaffold-334630
run.log
scaffold-334630.gff

In particular, I'm looking for the transcripts and proteins fasta files. I'm sure I have a configuration setting incorrect or one of the dependencies not correctly installed, but I can't figure out what the problem is. Any thoughts on how I can resolve this issue and generate these files? Ideally I would love to be able to generate these files without having to run the whole pipeline again. Details on my configuration settings and the contents of the run.log file from my example above are pasted below.
Thank you,
Kevin

-----
run.log from the example folder above looks like this:
-----
SHARED_ID d574e9ca9b0019a9fe147ccb9db3588b
CTL_OPTIONS maker_gff
CTL_OPTIONS other_gff
CTL_OPTIONS est test-transcriptome.fa
CTL_OPTIONS est_reads
CTL_OPTIONS altest KK273.fa
CTL_OPTIONS est_gff
CTL_OPTIONS altest_gff
CTL_OPTIONS protein test-AA.fa
CTL_OPTIONS protein_gff
CTL_OPTIONS model_org all
CTL_OPTIONS repeat_protein te_proteins.fasta
CTL_OPTIONS rmlib
CTL_OPTIONS rm_gff
CTL_OPTIONS organism_type eukaryotic
CTL_OPTIONS predictor est2genome,genemark,protein2genome
CTL_OPTIONS est2genome 1
CTL_OPTIONS altest2genome 0
CTL_OPTIONS snaphmm
CTL_OPTIONS gmhmm output/gmhmm.mod
CTL_OPTIONS augustus_species
CTL_OPTIONS fgenesh_par_file
CTL_OPTIONS model_gff
CTL_OPTIONS pred_gff
CTL_OPTIONS max_dna_len 100000
CTL_OPTIONS split_hit 10000
CTL_OPTIONS pred_flank 200
CTL_OPTIONS pred_stats 0
CTL_OPTIONS min_protein 0
CTL_OPTIONS AED_threshold 1
CTL_OPTIONS single_exon 0
CTL_OPTIONS single_length 250
CTL_OPTIONS keep_preds 0
CTL_OPTIONS map_forward 0
CTL_OPTIONS est_forward 0
CTL_OPTIONS correct_est_fusion 0
CTL_OPTIONS alt_splice 0
CTL_OPTIONS always_complete 0
CTL_OPTIONS alt_peptide C
CTL_OPTIONS evaluate 0
CTL_OPTIONS blast_type ncbi+
CTL_OPTIONS softmask 1
CTL_OPTIONS pcov_blastn 0.8
CTL_OPTIONS pid_blastn 0.85
CTL_OPTIONS eval_blastn 1e-10
CTL_OPTIONS bit_blastn 40
CTL_OPTIONS depth_blastn 0
CTL_OPTIONS pcov_rm_blastx 0.5
CTL_OPTIONS pid_rm_blastx 0.4
CTL_OPTIONS eval_rm_blastx 1e-06
CTL_OPTIONS bit_rm_blastx 30
CTL_OPTIONS pcov_blastx 0.5
CTL_OPTIONS pid_blastx 0.4
CTL_OPTIONS depth_blastx 0
CTL_OPTIONS eval_blastx 1e-06
CTL_OPTIONS bit_blastx 30
CTL_OPTIONS pcov_tblastx 0.8
CTL_OPTIONS pid_tblastx 0.85
CTL_OPTIONS eval_tblastx 1e-10
CTL_OPTIONS bit_tblastx 40
CTL_OPTIONS depth_tblastx 0
CTL_OPTIONS ep_score_limit 20
CTL_OPTIONS en_score_limit 20
CTL_OPTIONS enable_fathom 0
CTL_OPTIONS unmask 0
CTL_OPTIONS model_pass 0
CTL_OPTIONS est_pass 0
CTL_OPTIONS altest_pass 0
CTL_OPTIONS protein_pass 0
CTL_OPTIONS rm_pass 0
CTL_OPTIONS other_pass 0
CTL_OPTIONS pred_pass 0
CTL_OPTIONS run genemark
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark
FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark
STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section
FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section
FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0

-----
maker_opts
-----
#-----Genome (these are always required)
genome=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.fas #genome sequence (fasta file or fasta embeded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-transcriptome.fa #set of ESTs or assembled mRNA-seq in fasta format
altest=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/KK273.fa #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-AA.fa #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/usr/local/bin/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/output/gmhmm.mod #GeneMark HMM file
augustus_species= #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

--
Kevin M. Kocot, Ph.D.
NSF International Postdoctoral Research Fellow
Degnan Lab
The University of Queensland
School of Biological Sciences
325 Goddard Building 8
St.
Lucia, QLD 4072
Australia
Ph: +61 0402 488 430

From dence at genetics.utah.edu Sat Oct 17 10:46:09 2015
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 17 Oct 2015 15:46:09 +0000
Subject: [maker-devel] Maker not producing expected output
In-Reply-To: <5621CA26.8050807@uq.edu.au>
References: <5621CA26.8050807@uq.edu.au>
Message-ID:

Hi Kevin,

So I have a couple of clarifying questions, and an explanation that'll hopefully be helpful.

If you look in the master datastore log, do you see an entry that shows that scaffold finished successfully? Each entry has the name of the scaffold, then the path to the results directory, and then a status. There should be one that shows that maker started working on it, and one that shows that maker finished it.

Second, what are the files that you're expecting to see? I think you're expecting to see a couple of fasta files and a gff3 file that contain all the annotation results gathered together. You can gather those results with the fasta_merge and gff3_merge scripts that came with maker.

To explain what you saw in that example results directory that you sent: if there weren't any models or predictions on that scaffold, then there won't be fasta files in the results directory. You could verify that by looking at the scaffold-334630.gff file. The fasta_merge and gff3_merge scripts will gather all of the results fasta and gff files for all the scaffolds and put them into a few fasta files and one gff3 file, respectively.

Let me know whether that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Sat Oct 17 15:24:42 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 17 Oct 2015 14:24:42 -0600
Subject: [maker-devel] Maker not producing expected output
In-Reply-To: <5621CA26.8050807@uq.edu.au>
References: <5621CA26.8050807@uq.edu.au>
Message-ID: <1117A475-526B-477A-B44A-E86A4A23262B@gmail.com>

You will only get fasta files for the contig when there are gene models present on that contig. The only ab initio predictor you provided parameters for is GeneMark, and apparently it did not predict any genes for the contig in question. If it had, you would have at least a fasta file in the output containing all predictions made by GeneMark.
MAKER doesn't make gene models itself; rather, it provides hints to other gene predictors based on the evidence alignments and then promotes and polishes the models they make. If they produce no models, then you will get no results. You can try adding additional gene predictors like SNAP (in case GeneMark just isn't performing well), or you can check the length of your contig (contigs shorter than about 10kb rarely produce any results - they are too short to be annotatable). Try looking at the results from one of the larger contigs, or use fasta_merge to gather all results from all contigs. --Carson > On Oct 16, 2015, at 10:10 PM, Kevin Kocot wrote: > > Hello, > > I've run Maker on a draft invertebrate genome and it seemed to finish successfully. However, many of the expected output files were not produced. If I go to, for example, XX_datastore/00/0C/scaffold-334630/, all I see is: > > theVoid.scaffold-334630 > run.log > scaffold-334630.gff > > In particular, I'm looking for the transcripts and proteins fasta files. I'm sure I have a configuration setting incorrect or one of the dependencies not correctly installed, but I can't figure out what the problem is. Any thoughts on how I can resolve this issue and generate these files? Ideally I would love to be able to generate these files without having to run the whole pipeline again. Details on my configuration settings and the contents of the run.log file from my example above are pasted below.
> > Thank you, > Kevin > > ----- > run.log from the example folder above looks like this: > ----- > SHARED_ID d574e9ca9b0019a9fe147ccb9db3588b > CTL_OPTIONS maker_gff > CTL_OPTIONS other_gff > CTL_OPTIONS est test-transcriptome.fa > CTL_OPTIONS est_reads > CTL_OPTIONS altest KK273.fa > CTL_OPTIONS est_gff > CTL_OPTIONS altest_gff > CTL_OPTIONS protein test-AA.fa > CTL_OPTIONS protein_gff > CTL_OPTIONS model_org all > CTL_OPTIONS repeat_protein te_proteins.fasta > CTL_OPTIONS rmlib > CTL_OPTIONS rm_gff > CTL_OPTIONS organism_type eukaryotic > CTL_OPTIONS predictor est2genome,genemark,protein2genome > CTL_OPTIONS est2genome 1 > CTL_OPTIONS altest2genome 0 > CTL_OPTIONS snaphmm > CTL_OPTIONS gmhmm output/gmhmm.mod > CTL_OPTIONS augustus_species > CTL_OPTIONS fgenesh_par_file > CTL_OPTIONS model_gff > CTL_OPTIONS pred_gff > CTL_OPTIONS max_dna_len 100000 > CTL_OPTIONS split_hit 10000 > CTL_OPTIONS pred_flank 200 > CTL_OPTIONS pred_stats 0 > CTL_OPTIONS min_protein 0 > CTL_OPTIONS AED_threshold 1 > CTL_OPTIONS single_exon 0 > CTL_OPTIONS single_length 250 > CTL_OPTIONS keep_preds 0 > CTL_OPTIONS map_forward 0 > CTL_OPTIONS est_forward 0 > CTL_OPTIONS correct_est_fusion 0 > CTL_OPTIONS alt_splice 0 > CTL_OPTIONS always_complete 0 > CTL_OPTIONS alt_peptide C > CTL_OPTIONS evaluate 0 > CTL_OPTIONS blast_type ncbi+ > CTL_OPTIONS softmask 1 > CTL_OPTIONS pcov_blastn 0.8 > CTL_OPTIONS pid_blastn 0.85 > CTL_OPTIONS eval_blastn 1e-10 > CTL_OPTIONS bit_blastn 40 > CTL_OPTIONS depth_blastn 0 > CTL_OPTIONS pcov_rm_blastx 0.5 > CTL_OPTIONS pid_rm_blastx 0.4 > CTL_OPTIONS eval_rm_blastx 1e-06 > CTL_OPTIONS bit_rm_blastx 30 > CTL_OPTIONS pcov_blastx 0.5 > CTL_OPTIONS pid_blastx 0.4 > CTL_OPTIONS depth_blastx 0 > CTL_OPTIONS eval_blastx 1e-06 > CTL_OPTIONS bit_blastx 30 > CTL_OPTIONS pcov_tblastx 0.8 > CTL_OPTIONS pid_tblastx 0.85 > CTL_OPTIONS eval_tblastx 1e-10 > CTL_OPTIONS bit_tblastx 40 > CTL_OPTIONS depth_tblastx 0 > CTL_OPTIONS ep_score_limit 20 > CTL_OPTIONS en_score_limit 20 
> CTL_OPTIONS enable_fathom 0 > CTL_OPTIONS unmask 0 > CTL_OPTIONS model_pass 0 > CTL_OPTIONS est_pass 0 > CTL_OPTIONS altest_pass 0 > CTL_OPTIONS protein_pass 0 > CTL_OPTIONS rm_pass 0 > CTL_OPTIONS other_pass 0 > CTL_OPTIONS pred_pass 0 > CTL_OPTIONS run genemark > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark > FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section > FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section > LOGCHILD 
/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section > FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > > ----- > maker_opts > ----- > 
#-----Genome (these are always required) > genome=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.fas #genome sequence (fasta file or fasta embeded in GFF3 file) > organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic > > #-----Re-annotation Using MAKER Derived GFF3 > maker_gff= #MAKER derived GFF3 file > est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no > altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no > protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no > rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no > model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no > pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no > other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no > > #-----EST Evidence (for best results provide a file for at least one) > est=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-transcriptome.fa #set of ESTs or assembled mRNA-seq in fasta format > altest=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/KK273.fa #EST/cDNA sequence file in fasta format from an alternate organism > est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file > altest_gff= #aligned ESTs from a closly relate species in GFF3 format > > #-----Protein Homology Evidence (for best results provide a file for at least one) > protein=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-AA.fa #protein sequence file in fasta format (i.e. 
from mutiple oransisms) > protein_gff= #aligned protein homology evidence from an external GFF3 file > > #-----Repeat Masking (leave values blank to skip repeat masking) > model_org=all #select a model organism for RepBase masking in RepeatMasker > rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker > repeat_protein=/usr/local/bin/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner > rm_gff= #pre-identified repeat elements from an external GFF3 file > prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no > softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering) > > #-----Gene Prediction > snaphmm= #SNAP HMM file > gmhmm=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/output/gmhmm.mod #GeneMark HMM file > augustus_species= #Augustus gene prediction species model > fgenesh_par_file= #FGENESH parameter file > pred_gff= #ab-initio predictions from an external GFF3 file > model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) > est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no > protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no > trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no > snoscan_rrna= #rRNA file to have Snoscan find snoRNAs > unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no > > #-----Other Annotation Feature Types (features MAKER doesn't recognize) > other_gff= #extra features to pass-through to final MAKER generated GFF3 file > > #-----External Application Behavior Options > alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases > cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) > > #-----MAKER Behavior Options > max_dna_len=100000 #length for dividing up contigs into chunks 
(increases/decreases memory usage) > min_contig=1 #skip genome contigs below this length (under 10kb are often useless) > > pred_flank=200 #flank for extending evidence clusters sent to gene predictors > pred_stats=0 #report AED and QI statistics for all predictions as well as models > AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) > min_protein=0 #require at least this many amino acids in predicted proteins > alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no > always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no > map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no > keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) > > split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments) > single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no > single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' > correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes > > tries=2 #number of times to try a contig if there is a failure for some reason > clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no > clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no > TMP= #specify a directory other than the system default temporary directory for temporary files > > -- > Kevin M. Kocot, Ph.D. > NSF International Postdoctoral Research Fellow > Degnan Lab > The University of Queensland > School of Biological Sciences > 325 Goddard Building 8 > St. 
Lucia, QLD 4072 > Australia > Ph: +61 0402 488 430 > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From 14.chewsoklim at gmail.com Mon Oct 19 21:49:21 2015 From: 14.chewsoklim at gmail.com (Sok Lim Chew) Date: Tue, 20 Oct 2015 10:49:21 +0800 Subject: [maker-devel] Failed while doing blastx of proteins Message-ID: Hi all, The following errors occurred while I was using MAKER for annotation. I have searched around this forum but seems like the solutions provided do not works for me. ################################################################# ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Must have defined a valid name for Hit STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:486 STACK: Bio::Search::Hit::GenericHit::new /usr/local/share/perl5/Bio/Search/Hit/GenericHit.pm:149 STACK: Bio::Search::Hit::PhatHit::Base::new maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm:127 STACK: Bio::Search::Hit::PhatHit::blastx::new maker/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm:125 STACK: Bio::Search::Hit::HitFactory::create /usr/local/share/perl5/Bio/Search/Hit/HitFactory.pm:124 STACK: Bio::Factory::ObjectFactoryI::create_object /usr/local/share/perl5/Bio/Factory/ObjectFactoryI.pm:114 STACK: Bio::Search::Iteration::GenericIteration::newhits_below_threshold /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:506 STACK: Bio::Search::Iteration::GenericIteration::newhits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:488 STACK: Bio::Search::Iteration::GenericIteration::hits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:469 STACK: Bio::Search::Result::BlastResult::hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:168 STACK: Bio::Search::Result::BlastResult::num_hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:242 
STACK: Widget::blastx::keepers maker/bin/../lib/Widget/blastx.pm:164 STACK: Widget::blastx::parse maker/bin/../lib/Widget/blastx.pm:132 STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2457 STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2466 STACK: Process::MpiChunk::_go maker/bin/../lib/Process/MpiChunk.pm:2687 STACK: Process::MpiChunk::run maker/bin/../lib/Process/MpiChunk.pm:341 STACK: Process::MpiChunk::run_all maker/bin/../lib/Process/MpiChunk.pm:357 STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 STACK: maker/bin/maker:686 ----------------------------------------------------------- --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 ERROR: Failed while doing blastx of proteins ERROR: Chunk failed at level:8, tier_type:3 FAILED CONTIG:Contig1 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:Contig1 examining contents of the fasta file and run log ########################################################### Is anyone has any idea on this? Thanks, SokLim -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Oct 20 10:52:33 2015 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 Oct 2015 09:52:33 -0600 Subject: [maker-devel] Failed while doing blastx of proteins In-Reply-To: References: Message-ID: <68B95736-10CD-4C90-8092-3AA754250799@gmail.com> Make sure you installed the CPAN version of BioPerl and not BioPerl live (Current version is 1.6.924). Also there are a couple of BLAST+ versions that have bugs. Use version BLAST+ version 2.2.28. What version of MAKER are you using? Should be 2.31.8. Also check that your /tmp directory is not full (will result in truncated output files). Thanks, Carson > On Oct 19, 2015, at 9:56 PM, Sok Lim Chew <14.chewsoklim at gmail.com> wrote: > > Hi all, > > The following errors occurred while I was using MAKER for annotation. I have searched around this forum but seems like the solutions provided do not works for me.
> > ################################################################# > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Must have defined a valid name for Hit > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:486 > STACK: Bio::Search::Hit::GenericHit::new /usr/local/share/perl5/Bio/Search/Hit/GenericHit.pm:149 > STACK: Bio::Search::Hit::PhatHit::Base::new maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm:127 > STACK: Bio::Search::Hit::PhatHit::blastx::new maker/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm:125 > STACK: Bio::Search::Hit::HitFactory::create /usr/local/share/perl5/Bio/Search/Hit/HitFactory.pm:124 > STACK: Bio::Factory::ObjectFactoryI::create_object /usr/local/share/perl5/Bio/Factory/ObjectFactoryI.pm:114 > STACK: Bio::Search::Iteration::GenericIteration::newhits_below_threshold /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:506 > STACK: Bio::Search::Iteration::GenericIteration::newhits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:488 > STACK: Bio::Search::Iteration::GenericIteration::hits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:469 > STACK: Bio::Search::Result::BlastResult::hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:168 > STACK: Bio::Search::Result::BlastResult::num_hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:242 > STACK: Widget::blastx::keepers > maker/bin/../lib/Widget/blastx.pm:164 > STACK: Widget::blastx::parse > maker/bin/../lib/Widget/blastx.pm:132 > STACK: GI::blastx_as_chunks > maker/bin/../lib/GI.pm:2457 > STACK: GI::blastx_as_chunks > maker/bin/../lib/GI.pm:2466 > STACK: Process::MpiChunk::_go > maker/bin/../lib/Process/MpiChunk.pm:2687 > STACK: Process::MpiChunk::run > maker/bin/../lib/Process/MpiChunk.pm:341 > STACK: Process::MpiChunk::run_all > maker/bin/../lib/Process/MpiChunk.pm:357 > STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 > STACK: 
Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 > STACK: maker/bin/maker:686 > ----------------------------------------------------------- > --> rank=NA, hostname=gena2 > --> rank=NA, hostname=gena2 > --> rank=NA, hostname=gena2 > --> rank=NA, hostname=gena2 > ERROR: Failed while doing blastx of proteins > ERROR: Chunk failed at level:8, tier_type:3 > FAILED CONTIG:Contig1 > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:Contig1 > > examining contents of the fasta file and run log > > ########################################################### > > Is anyone has any idea on this? > > Thanks, > SokLim > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcsimenc at gmail.com Tue Oct 20 18:54:21 2015 From: mcsimenc at gmail.com (Matt Simenc) Date: Tue, 20 Oct 2015 16:54:21 -0700 Subject: [maker-devel] MPI large load Message-ID: Hi, I am using OpenMPI to run MAKER on 2 nodes with 40 CPUs/node. The load is distributing across the nodes ok but with a very large number of processes on each node. Sometimes there are several hundred more processes than can be executed at one time by a node. Is this a problem? If so, any suggestions on how to fix? Thanks! Matt Simenc Der Lab California State University Fullerton -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Wed Oct 21 13:44:28 2015 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 21 Oct 2015 18:44:28 +0000 Subject: [maker-devel] Failed while doing blastx of proteins In-Reply-To: <68B95736-10CD-4C90-8092-3AA754250799@gmail.com> References: <68B95736-10CD-4C90-8092-3AA754250799@gmail.com> Message-ID: <5CCEB170-2B41-4CC1-A9CB-C246274345B9@illinois.edu> Agreed. 
It would be nice to know whether this is a Bioperl bug that needs addressing, but I'm not sure how easy it would be to pull out a test case. chris On Oct 20, 2015, at 10:52 AM, Carson Holt > wrote: Make sure you installed the CPAN version of BioPerl and not BioPerl live (Current version is 1.6.924). Also there are a couple of BLAST+ versions that have bugs. Use version BLAST+ version 2.2.28. What version of MAKER are you using? Should be 2.31.8. Also check that your /tmp directory is not full (will result in truncated output files). Thanks, Carson On Oct 19, 2015, at 9:56 PM, Sok Lim Chew <14.chewsoklim at gmail.com> wrote: Hi all, The following errors occurred while I was using MAKER for annotation. I have searched around this forum but seems like the solutions provided do not works for me. ################################################################# ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Must have defined a valid name for Hit STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:486 STACK: Bio::Search::Hit::GenericHit::new /usr/local/share/perl5/Bio/Search/Hit/GenericHit.pm:149 STACK: Bio::Search::Hit::PhatHit::Base::new maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm:127 STACK: Bio::Search::Hit::PhatHit::blastx::new maker/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm:125 STACK: Bio::Search::Hit::HitFactory::create /usr/local/share/perl5/Bio/Search/Hit/HitFactory.pm:124 STACK: Bio::Factory::ObjectFactoryI::create_object /usr/local/share/perl5/Bio/Factory/ObjectFactoryI.pm:114 STACK: Bio::Search::Iteration::GenericIteration::newhits_below_threshold /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:506 STACK: Bio::Search::Iteration::GenericIteration::newhits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:488 STACK: Bio::Search::Iteration::GenericIteration::hits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:469 STACK:
Bio::Search::Result::BlastResult::hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:168 STACK: Bio::Search::Result::BlastResult::num_hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:242 STACK: Widget::blastx::keepers maker/bin/../lib/Widget/blastx.pm:164 STACK: Widget::blastx::parse maker/bin/../lib/Widget/blastx.pm:132 STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2457 STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2466 STACK: Process::MpiChunk::_go maker/bin/../lib/Process/MpiChunk.pm:2687 STACK: Process::MpiChunk::run maker/bin/../lib/Process/MpiChunk.pm:341 STACK: Process::MpiChunk::run_all maker/bin/../lib/Process/MpiChunk.pm:357 STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 STACK: maker/bin/maker:686 ----------------------------------------------------------- --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 ERROR: Failed while doing blastx of proteins ERROR: Chunk failed at level:8, tier_type:3 FAILED CONTIG:Contig1 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:Contig1 examining contents of the fasta file and run log ########################################################### Is anyone has any idea on this? Thanks, SokLim _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Thu Oct 22 11:39:21 2015 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 22 Oct 2015 10:39:21 -0600 Subject: [maker-devel] MPI large load In-Reply-To: References: Message-ID: <4733DA58-3277-4703-A660-4EE819858694@gmail.com> Because MAKER is a pipeline, every external program it calls (e.g. BLAST) runs as a separate process. It will also spawn a couple of helper processes to watch communication and files. The helper processes use 0% CPU, and the main MAKER process will yield to external system calls and processes until they finish execution, so together they will never use a larger % of CPU than is specified. Also, the way MPI works is that it spawns a separate process for every CPU specified, so if you specify 40 CPUs you get 40 independent communicating processes rather than 1 process accessing 40 CPUs. So if you take into account the MPI processes, helper processes, and external system calls, a 40 CPU specification could result in up to three times that many processes existing simultaneously (even though no more than 40 will be active at a time). However, if your system is having an issue letting the required number of processes exist, then it is a ulimit issue: the limit is set too low. You can see what limits are set using the command 'ulimit -a'. You will need to get your system admin to fix it. --Carson > On Oct 20, 2015, at 5:54 PM, Matt Simenc wrote: > > Hi, > > > > I am using OpenMPI to run MAKER on 2 nodes with 40 CPUs/node. The load is distributing across the nodes ok but with a very large number of processes on each node. Sometimes there are several hundred more processes than can be executed at one time by a node. Is this a problem? If so, any suggestions on how to fix? > > > > Thanks!
> > > > Matt Simenc > > Der Lab > > California State University Fullerton > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmartin at genome.wustl.edu Tue Oct 27 16:35:38 2015 From: jmartin at genome.wustl.edu (John Martin) Date: Tue, 27 Oct 2015 16:35:38 -0500 Subject: [maker-devel] ERROR: Failed while clustering transcripts into genes for annotations Message-ID: <562FEE2A.4070907@genome.wustl.edu> I'm working on annotation for a de novo genomic assembly, and I have my jobs split into roughly equal (by seq length) batches. I am seeing ~2/3rds of these batches completing successfully, while the other 1/3rd is failing. I identified a problem contig, and have been doing test maker runs on that to try and figure out whats going on. The full batch was run using an older version of maker (v2.26), so I first tried updating to the latest version of maker (v2.31.8). That version did point out one problem in an EST evidence file I was using, which I fixed. That allowed maker to get much farther, but as it was nearing the end of the run it crashed again with this error message: ++++++++++++++++++++++++++++++++++ setting up GFF3 output and fasta chunks processing the chunk divide preparing evidence clusters for annotations Preparing evidence for hint based annotation in cluster::shadow_cluster... ...finished clustering. cleaning clusters.... 
total clusters:1 now processing 0 ...processing 0 of 8 ...processing 1 of 8 ...processing 2 of 8 ...processing 3 of 8 ...processing 4 of 8 ...processing 5 of 8 ...processing 6 of 8 ...processing 7 of 8 ...processing 0 of 13 ...processing 1 of 13 ...processing 2 of 13 ...processing 3 of 13 ...processing 4 of 13 ...processing 5 of 13 ...processing 6 of 13 ...processing 7 of 13 ...processing 8 of 13 ...processing 9 of 13 ...processing 10 of 13 ...processing 11 of 13 ...processing 12 of 13 annotating transcripts Making transcripts clustering transcripts into genes for annotations Processing transcripts into genes ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Calling translate without a seq argument! STACK: Error::throw STACK: Bio::Root::Root::throw /home/ec2-user/bin/BioPerl-1.6.923/Bio/Root/Root.pm:486 STACK: Bio::Tools::CodonTable::translate /home/ec2-user/bin/BioPerl-1.6.923/Bio/Tools/CodonTable.pm:414 STACK: PhatHit_utils::_adjust /home/ec2-user/bin/maker/bin/../lib/PhatHit_utils.pm:846 STACK: PhatHit_utils::adjust_start_stop /home/ec2-user/bin/maker/bin/../lib/PhatHit_utils.pm:794 STACK: maker::auto_annotator::load_transcript_struct /home/ec2-user/bin/maker/bin/../lib/maker/auto_annotator.pm:2198 STACK: maker::auto_annotator::group_transcripts /home/ec2-user/bin/maker/bin/../lib/maker/auto_annotator.pm:2676 STACK: maker::auto_annotator::annotate_genes /home/ec2-user/bin/maker/bin/../lib/maker/auto_annotator.pm:1018 STACK: Process::MpiChunk::_go /home/ec2-user/bin/maker/bin/../lib/Process/MpiChunk.pm:3847 STACK: Process::MpiChunk::run /home/ec2-user/bin/maker/bin/../lib/Process/MpiChunk.pm:341 STACK: Process::MpiChunk::run_all /home/ec2-user/bin/maker/bin/../lib/Process/MpiChunk.pm:357 STACK: Process::MpiTiers::run_all /home/ec2-user/bin/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all /home/ec2-user/bin/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: /home/ec2-user/bin/maker/bin/maker:686 
----------------------------------------------------------- --> rank=NA, hostname=ip-172-31-35-77.us-west-2.compute.internal ERROR: Failed while clustering transcripts into genes for annotations ERROR: Chunk failed at level:2, tier_type:4 FAILED CONTIG:ANCCEYDFT_Contig1675 ERROR: Chunk failed at level:6, tier_type:0 FAILED CONTIG:ANCCEYDFT_Contig1675 examining contents of the fasta file and run log --Next Contig-- Processing run.log file... Maker is now finished!!! Start_time: 1445912338 End_time: 1445913817 Elapsed: 1479 ++++++++++++++++++++++++++++++++++ The root of the error seems clearly stated: MSG: Calling translate without a seq argument! but I don't know what that means in real terms. All my inputs appear valid. The contig I am testing with has 1 plus strand gene represented in the evidence files. And I've set a local TMP directory since I've read that sometimes these kinds of problems can stem from the program running out of TMP space. I am pretty sure that is not happening here (I put TMP on a disk with 1.1Tb of space, and the test contig is only 13kbp). Can anyone help me figure out what is going on? Thanks, John Martin ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. 
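
[Editorial sketch] When only some batches fail, as in the report above, the affected contigs can be listed by searching the captured MAKER output for the "FAILED CONTIG:" lines that appear in the log excerpt. The snippet below is a minimal sketch under that assumption; the function name `failed_contigs` and the sample text are illustrative and are not part of MAKER.

```python
import re

# Pattern for the "FAILED CONTIG:<id>" lines seen in the log excerpt above.
FAILED_RE = re.compile(r"FAILED CONTIG:\s*(\S+)")


def failed_contigs(log_text):
    """Return contig IDs reported as failed, in first-seen order, without repeats."""
    seen = []
    for contig_id in FAILED_RE.findall(log_text):
        if contig_id not in seen:
            seen.append(contig_id)
    return seen


# Sample text copied from the error report; a real run would read a saved log file.
sample_log = """\
ERROR: Chunk failed at level:2, tier_type:4
FAILED CONTIG:ANCCEYDFT_Contig1675
ERROR: Chunk failed at level:6, tier_type:0
FAILED CONTIG:ANCCEYDFT_Contig1675
"""

print(failed_contigs(sample_log))
```

The resulting IDs could then be used to pull the matching sequences into a small test FASTA, which is the kind of single-contig test run described in the message.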
From roscito at mpi-cbg.de Mon Oct 5 06:06:57 2015 From: roscito at mpi-cbg.de (roscito) Date: Mon, 5 Oct 2015 14:06:57 +0200 Subject: [maker-devel] different exons predicted in different maker rounds Message-ID: <2FD50373-B02F-49C0-8414-EF6705BB5826@mpi-cbg.de> Dear all, First of all, I'd like to thank everyone in this forum for all the tips and comments on the best strategies for running MAKER, they have been really helpful so far. However, I still don't fully understand the behaviour of MAKER when ran iteratively, and I compare the predictions from each round. Let me explain: My input data are the following: - the repeat-masked genome of a vertebrate (~2Gb); - mRNA data for this species mapped to the genome with tophat2 and assembled into transcripts with cufflinks; - exonerate-mapped proteins in gff3 format to the reference genome, from closely related species (global alignment) For the first round of MAKER, I provided both cufflinks and exonerate-mapped proteins with the options est2genome and protein2genome = 1. From maker output, I generated the SNAP .hmm file (as the instructions in http://gmod.org/wiki/MAKER_Tutorial) and provided it as input to the second round of MAKER. For this second round I still gave cufflinks + exonerated proteins, but switched both est2genome ad protein2genome to 0. After finished, I generated SNAP .hmm once more and provided it for the 3rd and final round of MAKER, along with cufflinks and exonerated-mapped prots and est/prot2genome=0 As sort of a sanity check, I went on and ran a 4th round of MAKER with the SNAP .hmm file from round3, cufflinks and exonerated-mapped prots and est/prot2genome=0, and this time specifying alt_splice=1. For all the rounds, I also specified single_exon=1. I loaded the gene predictions from each round plus the cufflink transcripts and the exonerated proteins to the genome browser to visually inspect the output. 
I saw a few strange cases where MAKER doesn't seem to use the protein/mRNA evidences for the gene predictions, and I would greatly appreciate any feedback/ideas on what I could possible be doing wrong. Here are a few screenshots so you know what I'm talking about: In this first example, MAKER misses a conserved exon for which there is both protein and mRNA evidence, and only if I specify alt_splice I get the exon 'back'. In this second example, MAKER completely ignores lots of exons, all conserved across vertebrates, and supported by protein/mRNA evidence. In the third example, there is no prediction from round1, the one from round2 matches the protein/mRNA evidence, and then in the final round3 and 4, an extra exon appears. (hope you'l be able to see the images above) As I said, I would greatly appreciate any feedback on these strange cases. Perhaps I'm missing some parameter(s)? Thanks a lot. All the best, Juliana -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: example1.png Type: image/png Size: 53178 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: example2.png Type: image/png Size: 55134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: example3.png Type: image/png Size: 67598 bytes Desc: not available URL: From carsonhh at gmail.com Fri Oct 9 12:58:02 2015 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 9 Oct 2015 12:58:02 -0600 Subject: [maker-devel] different exons predicted in different maker rounds In-Reply-To: <2FD50373-B02F-49C0-8414-EF6705BB5826@mpi-cbg.de> References: <2FD50373-B02F-49C0-8414-EF6705BB5826@mpi-cbg.de> Message-ID: <1D5FAF7F-4758-4C4A-90EB-31C7BAE34725@gmail.com> Some of your cufflinks evidence is contradicting the existence of the exon. 
When you have alt_splice=0, the evidence is passed in in its entirety, and all sources are equal. When you have alt_splice=1 set, certain pieces of spliced evidence are given higher priority (in an iterative fashion), and cannot be overridden by contradictory evidence. The result is that there is a specific combination of evidence-based hints that allows the gene predictor to find the exon, but when it sees everything the HMM doesn't score the exon as very likely.

I'd recommend not running with cufflinks because its results usually have low specificity, so it generates a lot of bad hints. Use Trinity instead to assemble everything, then allow it to be aligned inside of MAKER. Trinity assembled contigs give much greater specificity in the final results.

--Carson

> On Oct 5, 2015, at 6:06 AM, roscito wrote:
>
> Dear all,
>
> First of all, I'd like to thank everyone in this forum for all the tips and comments on the best strategies for running MAKER, they have been really helpful so far.
> However, I still don't fully understand the behaviour of MAKER when ran iteratively, and I compare the predictions from each round. Let me explain:
>
> My input data are the following:
> - the repeat-masked genome of a vertebrate (~2Gb);
> - mRNA data for this species mapped to the genome with tophat2 and assembled into transcripts with cufflinks;
> - exonerate-mapped proteins in gff3 format to the reference genome, from closely related species (global alignment)
>
> For the first round of MAKER, I provided both cufflinks and exonerate-mapped proteins with the options est2genome and protein2genome = 1. From maker output, I generated the SNAP .hmm file (as the instructions in http://gmod.org/wiki/MAKER_Tutorial ) and provided it as input to the second round of MAKER.
> For this second round I still gave cufflinks + exonerated proteins, but switched both est2genome ad protein2genome to 0.
After finished, I generated SNAP .hmm once more and provided it for the 3rd and final round of MAKER, along with cufflinks and exonerated-mapped prots and est/prot2genome=0 > > As sort of a sanity check, I went on and ran a 4th round of MAKER with the SNAP .hmm file from round3, cufflinks and exonerated-mapped prots and est/prot2genome=0, and this time specifying alt_splice=1. > For all the rounds, I also specified single_exon=1. > > > I loaded the gene predictions from each round plus the cufflink transcripts and the exonerated proteins to the genome browser to visually inspect the output. I saw a few strange cases where MAKER doesn't seem to use the protein/mRNA evidences for the gene predictions, and I would greatly appreciate any feedback/ideas on what I could possible be doing wrong. Here are a few screenshots so you know what I'm talking about: > > In this first example, MAKER misses a conserved exon for which there is both protein and mRNA evidence, and only if I specify alt_splice I get the exon 'back'. > > > > In this second example, MAKER completely ignores lots of exons, all conserved across vertebrates, and supported by protein/mRNA evidence. > > > > In the third example, there is no prediction from round1, the one from round2 matches the protein/mRNA evidence, and then in the final round3 and 4, an extra exon appears. > > > > > (hope you'l be able to see the images above) > As I said, I would greatly appreciate any feedback on these strange cases. Perhaps I'm missing some parameter(s)? > > Thanks a lot. > All the best, > Juliana > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kmkocot at gmail.com Fri Oct 16 22:10:14 2015 From: kmkocot at gmail.com (Kevin Kocot) Date: Sat, 17 Oct 2015 14:10:14 +1000 Subject: [maker-devel] Maker not producing expected output Message-ID: <5621CA26.8050807@uq.edu.au> Hello, I've run Maker on a draft invertebrate genome and it seemed to finish successfully. However, many of the expected output files were not produced. If I go to, for example, XX_datastore/00/0C/scaffold-334630/, all I see is: theVoid.scaffold-334630 run.log scaffold-334630.gff In particular, I'm looking for the transcripts and proteins fasta files. I'm sure I have a configuration setting incorrect or one of the dependencies not correctly installed, but I can't figure out what the problem is. Any thoughts on how I can resolve this issue and generate these files? Ideally I would love to be able to generate these files without having to run the whole pipeline again. Details on my configuration settings and the contents of the run.log file from my example above are pasted below. 
Thank you, Kevin ----- run.log from the example folder above looks like this: ----- SHARED_ID d574e9ca9b0019a9fe147ccb9db3588b CTL_OPTIONS maker_gff CTL_OPTIONS other_gff CTL_OPTIONS est test-transcriptome.fa CTL_OPTIONS est_reads CTL_OPTIONS altest KK273.fa CTL_OPTIONS est_gff CTL_OPTIONS altest_gff CTL_OPTIONS protein test-AA.fa CTL_OPTIONS protein_gff CTL_OPTIONS model_org all CTL_OPTIONS repeat_protein te_proteins.fasta CTL_OPTIONS rmlib CTL_OPTIONS rm_gff CTL_OPTIONS organism_type eukaryotic CTL_OPTIONS predictor est2genome,genemark,protein2genome CTL_OPTIONS est2genome 1 CTL_OPTIONS altest2genome 0 CTL_OPTIONS snaphmm CTL_OPTIONS gmhmm output/gmhmm.mod CTL_OPTIONS augustus_species CTL_OPTIONS fgenesh_par_file CTL_OPTIONS model_gff CTL_OPTIONS pred_gff CTL_OPTIONS max_dna_len 100000 CTL_OPTIONS split_hit 10000 CTL_OPTIONS pred_flank 200 CTL_OPTIONS pred_stats 0 CTL_OPTIONS min_protein 0 CTL_OPTIONS AED_threshold 1 CTL_OPTIONS single_exon 0 CTL_OPTIONS single_length 250 CTL_OPTIONS keep_preds 0 CTL_OPTIONS map_forward 0 CTL_OPTIONS est_forward 0 CTL_OPTIONS correct_est_fusion 0 CTL_OPTIONS alt_splice 0 CTL_OPTIONS always_complete 0 CTL_OPTIONS alt_peptide C CTL_OPTIONS evaluate 0 CTL_OPTIONS blast_type ncbi+ CTL_OPTIONS softmask 1 CTL_OPTIONS pcov_blastn 0.8 CTL_OPTIONS pid_blastn 0.85 CTL_OPTIONS eval_blastn 1e-10 CTL_OPTIONS bit_blastn 40 CTL_OPTIONS depth_blastn 0 CTL_OPTIONS pcov_rm_blastx 0.5 CTL_OPTIONS pid_rm_blastx 0.4 CTL_OPTIONS eval_rm_blastx 1e-06 CTL_OPTIONS bit_rm_blastx 30 CTL_OPTIONS pcov_blastx 0.5 CTL_OPTIONS pid_blastx 0.4 CTL_OPTIONS depth_blastx 0 CTL_OPTIONS eval_blastx 1e-06 CTL_OPTIONS bit_blastx 30 CTL_OPTIONS pcov_tblastx 0.8 CTL_OPTIONS pid_tblastx 0.85 CTL_OPTIONS eval_tblastx 1e-10 CTL_OPTIONS bit_tblastx 40 CTL_OPTIONS depth_tblastx 0 CTL_OPTIONS ep_score_limit 20 CTL_OPTIONS en_score_limit 20 CTL_OPTIONS enable_fathom 0 CTL_OPTIONS unmask 0 CTL_OPTIONS model_pass 0 CTL_OPTIONS est_pass 0 CTL_OPTIONS altest_pass 0 CTL_OPTIONS 
protein_pass 0 CTL_OPTIONS rm_pass 0 CTL_OPTIONS other_pass 0 CTL_OPTIONS pred_pass 0 CTL_OPTIONS run genemark LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section LOGCHILD 
/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 ----- maker_opts ----- #-----Genome (these are always 
required) genome=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.fas #genome sequence (fasta file or fasta embeded in GFF3 file) organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic #-----Re-annotation Using MAKER Derived GFF3 maker_gff= #MAKER derived GFF3 file est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no #-----EST Evidence (for best results provide a file for at least one) est=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-transcriptome.fa #set of ESTs or assembled mRNA-seq in fasta format altest=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/KK273.fa #EST/cDNA sequence file in fasta format from an alternate organism est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file altest_gff= #aligned ESTs from a closly relate species in GFF3 format #-----Protein Homology Evidence (for best results provide a file for at least one) protein=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-AA.fa #protein sequence file in fasta format (i.e. 
from mutiple oransisms) protein_gff= #aligned protein homology evidence from an external GFF3 file #-----Repeat Masking (leave values blank to skip repeat masking) model_org=all #select a model organism for RepBase masking in RepeatMasker rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker repeat_protein=/usr/local/bin/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner rm_gff= #pre-identified repeat elements from an external GFF3 file prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering) #-----Gene Prediction snaphmm= #SNAP HMM file gmhmm=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/output/gmhmm.mod #GeneMark HMM file augustus_species= #Augustus gene prediction species model fgenesh_par_file= #FGENESH parameter file pred_gff= #ab-initio predictions from an external GFF3 file model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no snoscan_rrna= #rRNA file to have Snoscan find snoRNAs unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no #-----Other Annotation Feature Types (features MAKER doesn't recognize) other_gff= #extra features to pass-through to final MAKER generated GFF3 file #-----External Application Behavior Options alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) #-----MAKER Behavior Options max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) min_contig=1 #skip genome contigs below this 
length (under 10kb are often useless) pred_flank=200 #flank for extending evidence clusters sent to gene predictors pred_stats=0 #report AED and QI statistics for all predictions as well as models AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) min_protein=0 #require at least this many amino acids in predicted proteins alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments) single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes tries=2 #number of times to try a contig if there is a failure for some reason clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no TMP= #specify a directory other than the system default temporary directory for temporary files -- Kevin M. Kocot, Ph.D. NSF International Postdoctoral Research Fellow Degnan Lab The University of Queensland School of Biological Sciences 325 Goddard Building 8 St. 
Lucia, QLD 4072
Australia
Ph: +61 0402 488 430

From dence at genetics.utah.edu Sat Oct 17 09:46:09 2015
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 17 Oct 2015 15:46:09 +0000
Subject: [maker-devel] Maker not producing expected output
In-Reply-To: <5621CA26.8050807@uq.edu.au>
References: <5621CA26.8050807@uq.edu.au>
Message-ID:

Hi Kevin,

So I have a couple of clarifying questions, and an explanation that'll hopefully be helpful.

If you look in the master datastore log, do you see an entry that shows that scaffold finished successfully? It will have the name of the scaffold, then the path to the results directory, and then a status. There should be one that shows that maker started working on it, and one that shows that maker finished it.

Second, what are the files that you're expecting to see? I think you're expecting to see a couple of fasta files and a gff3 file that contain all the annotation results gathered together. You can gather those results with the fasta_merge and gff3_merge scripts that came with maker.

To explain what you saw in that example results directory that you sent: if there weren't any models or predictions on that scaffold, then there won't be fasta files in the results directory. You could verify that by looking at the scaffold-334630.gff file. The fasta_merge and gff3_merge scripts will gather all of the results fasta and gff files for all the scaffolds and put them into a few fasta files and one gff3 file, respectively.

Let me know whether that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

> On Oct 16, 2015, at 10:10 PM, Kevin Kocot wrote:
>
> Hello,
>
> I've run Maker on a draft invertebrate genome and it seemed to finish successfully. However, many of the expected output files were not produced.
If I go to, for example, XX_datastore/00/0C/scaffold-334630/, all I see is: > > theVoid.scaffold-334630 > run.log > scaffold-334630.gff > > In particular, I'm looking for the transcripts and proteins fasta files. I'm sure I have a configuration setting incorrect or one of the dependencies not correctly installed, but I can't figure out what the problem is. Any thoughts on how I can resolve this issue and generate these files? Ideally I would love to be able to generate these files without having to run the whole pipeline again. Details on my configuration settings and the contents of the run.log file from my example above are pasted below. > > Thank you, > Kevin > > ----- > run.log from the example folder above looks like this: > ----- > SHARED_ID d574e9ca9b0019a9fe147ccb9db3588b > CTL_OPTIONS maker_gff > CTL_OPTIONS other_gff > CTL_OPTIONS est test-transcriptome.fa > CTL_OPTIONS est_reads > CTL_OPTIONS altest KK273.fa > CTL_OPTIONS est_gff > CTL_OPTIONS altest_gff > CTL_OPTIONS protein test-AA.fa > CTL_OPTIONS protein_gff > CTL_OPTIONS model_org all > CTL_OPTIONS repeat_protein te_proteins.fasta > CTL_OPTIONS rmlib > CTL_OPTIONS rm_gff > CTL_OPTIONS organism_type eukaryotic > CTL_OPTIONS predictor est2genome,genemark,protein2genome > CTL_OPTIONS est2genome 1 > CTL_OPTIONS altest2genome 0 > CTL_OPTIONS snaphmm > CTL_OPTIONS gmhmm output/gmhmm.mod > CTL_OPTIONS augustus_species > CTL_OPTIONS fgenesh_par_file > CTL_OPTIONS model_gff > CTL_OPTIONS pred_gff > CTL_OPTIONS max_dna_len 100000 > CTL_OPTIONS split_hit 10000 > CTL_OPTIONS pred_flank 200 > CTL_OPTIONS pred_stats 0 > CTL_OPTIONS min_protein 0 > CTL_OPTIONS AED_threshold 1 > CTL_OPTIONS single_exon 0 > CTL_OPTIONS single_length 250 > CTL_OPTIONS keep_preds 0 > CTL_OPTIONS map_forward 0 > CTL_OPTIONS est_forward 0 > CTL_OPTIONS correct_est_fusion 0 > CTL_OPTIONS alt_splice 0 > CTL_OPTIONS always_complete 0 > CTL_OPTIONS alt_peptide C > CTL_OPTIONS evaluate 0 > CTL_OPTIONS blast_type ncbi+ > CTL_OPTIONS softmask 
1 > CTL_OPTIONS pcov_blastn 0.8 > CTL_OPTIONS pid_blastn 0.85 > CTL_OPTIONS eval_blastn 1e-10 > CTL_OPTIONS bit_blastn 40 > CTL_OPTIONS depth_blastn 0 > CTL_OPTIONS pcov_rm_blastx 0.5 > CTL_OPTIONS pid_rm_blastx 0.4 > CTL_OPTIONS eval_rm_blastx 1e-06 > CTL_OPTIONS bit_rm_blastx 30 > CTL_OPTIONS pcov_blastx 0.5 > CTL_OPTIONS pid_blastx 0.4 > CTL_OPTIONS depth_blastx 0 > CTL_OPTIONS eval_blastx 1e-06 > CTL_OPTIONS bit_blastx 30 > CTL_OPTIONS pcov_tblastx 0.8 > CTL_OPTIONS pid_tblastx 0.85 > CTL_OPTIONS eval_tblastx 1e-10 > CTL_OPTIONS bit_tblastx 40 > CTL_OPTIONS depth_tblastx 0 > CTL_OPTIONS ep_score_limit 20 > CTL_OPTIONS en_score_limit 20 > CTL_OPTIONS enable_fathom 0 > CTL_OPTIONS unmask 0 > CTL_OPTIONS model_pass 0 > CTL_OPTIONS est_pass 0 > CTL_OPTIONS altest_pass 0 > CTL_OPTIONS protein_pass 0 > CTL_OPTIONS rm_pass 0 > CTL_OPTIONS other_pass 0 > CTL_OPTIONS pred_pass 0 > CTL_OPTIONS run genemark > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark > FINISHED 
test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section > FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section > FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section > LOGCHILD 
/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > > ----- > maker_opts > ----- > #-----Genome (these are always required) > genome=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.fas #genome sequence (fasta file or fasta embeded in GFF3 file) > organism_type=eukaryotic #eukaryotic or prokaryotic. 
Default is eukaryotic > > #-----Re-annotation Using MAKER Derived GFF3 > maker_gff= #MAKER derived GFF3 file > est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no > altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no > protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no > rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no > model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no > pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no > other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no > > #-----EST Evidence (for best results provide a file for at least one) > est=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-transcriptome.fa #set of ESTs or assembled mRNA-seq in fasta format > altest=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/KK273.fa #EST/cDNA sequence file in fasta format from an alternate organism > est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file > altest_gff= #aligned ESTs from a closly relate species in GFF3 format > > #-----Protein Homology Evidence (for best results provide a file for at least one) > protein=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-AA.fa #protein sequence file in fasta format (i.e. from mutiple oransisms) > protein_gff= #aligned protein homology evidence from an external GFF3 file > > #-----Repeat Masking (leave values blank to skip repeat masking) > model_org=all #select a model organism for RepBase masking in RepeatMasker > rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker > repeat_protein=/usr/local/bin/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner > rm_gff= #pre-identified repeat elements from an external GFF3 file > prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no > softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. 
seg and dust filtering) > > #-----Gene Prediction > snaphmm= #SNAP HMM file > gmhmm=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/output/gmhmm.mod #GeneMark HMM file > augustus_species= #Augustus gene prediction species model > fgenesh_par_file= #FGENESH parameter file > pred_gff= #ab-initio predictions from an external GFF3 file > model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) > est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no > protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no > trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no > snoscan_rrna= #rRNA file to have Snoscan find snoRNAs > unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no > > #-----Other Annotation Feature Types (features MAKER doesn't recognize) > other_gff= #extra features to pass-through to final MAKER generated GFF3 file > > #-----External Application Behavior Options > alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases > cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) > > #-----MAKER Behavior Options > max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) > min_contig=1 #skip genome contigs below this length (under 10kb are often useless) > > pred_flank=200 #flank for extending evidence clusters sent to gene predictors > pred_stats=0 #report AED and QI statistics for all predictions as well as models > AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) > min_protein=0 #require at least this many amino acids in predicted proteins > alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no > always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no > map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no > keep_preds=0 
#Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
>
> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
> single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
>
> tries=2 #number of times to try a contig if there is a failure for some reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
> TMP= #specify a directory other than the system default temporary directory for temporary files
>
> --
> Kevin M. Kocot, Ph.D.
> NSF International Postdoctoral Research Fellow
> Degnan Lab
> The University of Queensland
> School of Biological Sciences
> 325 Goddard Building 8
> St. Lucia, QLD 4072
> Australia
> Ph: +61 0402 488 430
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Sat Oct 17 14:24:42 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 17 Oct 2015 14:24:42 -0600
Subject: [maker-devel] Maker not producing expected output
In-Reply-To: <5621CA26.8050807@uq.edu.au>
References: <5621CA26.8050807@uq.edu.au>
Message-ID: <1117A475-526B-477A-B44A-E86A4A23262B@gmail.com>

You will only get fasta files for the contig when there are gene models present on that contig. The only ab initio predictor you provided parameters for is GeneMark, and apparently it did not predict any genes for the contig in question. If it had, you would have at least a fasta file in the output that contained all predictions made by GeneMark.
MAKER doesn't make gene models itself; rather, it provides hints to other gene predictors based on the evidence alignments and then promotes and polishes the models they make. If they produce no models, then you will get no results. You can try adding additional gene predictors like SNAP (in case GeneMark just isn't performing well), or you can check the length of your contig (contigs shorter than about 10kb rarely produce any results - they are too short to be annotatable). Try looking at the results from one of the larger contigs, or use fasta_merge to gather all results from all contigs.

--Carson

> On Oct 16, 2015, at 10:10 PM, Kevin Kocot wrote:
>
> Hello,
>
> I've run Maker on a draft invertebrate genome and it seemed to finish successfully. However, many of the expected output files were not produced. If I go to, for example, XX_datastore/00/0C/scaffold-334630/, all I see is:
>
> theVoid.scaffold-334630
> run.log
> scaffold-334630.gff
>
> In particular, I'm looking for the transcripts and proteins fasta files. I'm sure I have a configuration setting incorrect or one of the dependencies not correctly installed, but I can't figure out what the problem is. Any thoughts on how I can resolve this issue and generate these files? Ideally I would love to be able to generate these files without having to run the whole pipeline again. Details on my configuration settings and the contents of the run.log file from my example above are pasted below.
> > Thank you, > Kevin > > ----- > run.log from the example folder above looks like this: > ----- > SHARED_ID d574e9ca9b0019a9fe147ccb9db3588b > CTL_OPTIONS maker_gff > CTL_OPTIONS other_gff > CTL_OPTIONS est test-transcriptome.fa > CTL_OPTIONS est_reads > CTL_OPTIONS altest KK273.fa > CTL_OPTIONS est_gff > CTL_OPTIONS altest_gff > CTL_OPTIONS protein test-AA.fa > CTL_OPTIONS protein_gff > CTL_OPTIONS model_org all > CTL_OPTIONS repeat_protein te_proteins.fasta > CTL_OPTIONS rmlib > CTL_OPTIONS rm_gff > CTL_OPTIONS organism_type eukaryotic > CTL_OPTIONS predictor est2genome,genemark,protein2genome > CTL_OPTIONS est2genome 1 > CTL_OPTIONS altest2genome 0 > CTL_OPTIONS snaphmm > CTL_OPTIONS gmhmm output/gmhmm.mod > CTL_OPTIONS augustus_species > CTL_OPTIONS fgenesh_par_file > CTL_OPTIONS model_gff > CTL_OPTIONS pred_gff > CTL_OPTIONS max_dna_len 100000 > CTL_OPTIONS split_hit 10000 > CTL_OPTIONS pred_flank 200 > CTL_OPTIONS pred_stats 0 > CTL_OPTIONS min_protein 0 > CTL_OPTIONS AED_threshold 1 > CTL_OPTIONS single_exon 0 > CTL_OPTIONS single_length 250 > CTL_OPTIONS keep_preds 0 > CTL_OPTIONS map_forward 0 > CTL_OPTIONS est_forward 0 > CTL_OPTIONS correct_est_fusion 0 > CTL_OPTIONS alt_splice 0 > CTL_OPTIONS always_complete 0 > CTL_OPTIONS alt_peptide C > CTL_OPTIONS evaluate 0 > CTL_OPTIONS blast_type ncbi+ > CTL_OPTIONS softmask 1 > CTL_OPTIONS pcov_blastn 0.8 > CTL_OPTIONS pid_blastn 0.85 > CTL_OPTIONS eval_blastn 1e-10 > CTL_OPTIONS bit_blastn 40 > CTL_OPTIONS depth_blastn 0 > CTL_OPTIONS pcov_rm_blastx 0.5 > CTL_OPTIONS pid_rm_blastx 0.4 > CTL_OPTIONS eval_rm_blastx 1e-06 > CTL_OPTIONS bit_rm_blastx 30 > CTL_OPTIONS pcov_blastx 0.5 > CTL_OPTIONS pid_blastx 0.4 > CTL_OPTIONS depth_blastx 0 > CTL_OPTIONS eval_blastx 1e-06 > CTL_OPTIONS bit_blastx 30 > CTL_OPTIONS pcov_tblastx 0.8 > CTL_OPTIONS pid_tblastx 0.85 > CTL_OPTIONS eval_tblastx 1e-10 > CTL_OPTIONS bit_tblastx 40 > CTL_OPTIONS depth_tblastx 0 > CTL_OPTIONS ep_score_limit 20 > CTL_OPTIONS en_score_limit 20 
> CTL_OPTIONS enable_fathom 0 > CTL_OPTIONS unmask 0 > CTL_OPTIONS model_pass 0 > CTL_OPTIONS est_pass 0 > CTL_OPTIONS altest_pass 0 > CTL_OPTIONS protein_pass 0 > CTL_OPTIONS rm_pass 0 > CTL_OPTIONS other_pass 0 > CTL_OPTIONS pred_pass 0 > CTL_OPTIONS run genemark > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark > FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section > FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section > LOGCHILD 
/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section > FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > > ----- > maker_opts > ----- > 
#-----Genome (these are always required) > genome=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.fas #genome sequence (fasta file or fasta embeded in GFF3 file) > organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic > > #-----Re-annotation Using MAKER Derived GFF3 > maker_gff= #MAKER derived GFF3 file > est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no > altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no > protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no > rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no > model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no > pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no > other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no > > #-----EST Evidence (for best results provide a file for at least one) > est=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-transcriptome.fa #set of ESTs or assembled mRNA-seq in fasta format > altest=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/KK273.fa #EST/cDNA sequence file in fasta format from an alternate organism > est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file > altest_gff= #aligned ESTs from a closly relate species in GFF3 format > > #-----Protein Homology Evidence (for best results provide a file for at least one) > protein=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-AA.fa #protein sequence file in fasta format (i.e. 
from mutiple oransisms) > protein_gff= #aligned protein homology evidence from an external GFF3 file > > #-----Repeat Masking (leave values blank to skip repeat masking) > model_org=all #select a model organism for RepBase masking in RepeatMasker > rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker > repeat_protein=/usr/local/bin/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner > rm_gff= #pre-identified repeat elements from an external GFF3 file > prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no > softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering) > > #-----Gene Prediction > snaphmm= #SNAP HMM file > gmhmm=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/output/gmhmm.mod #GeneMark HMM file > augustus_species= #Augustus gene prediction species model > fgenesh_par_file= #FGENESH parameter file > pred_gff= #ab-initio predictions from an external GFF3 file > model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) > est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no > protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no > trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no > snoscan_rrna= #rRNA file to have Snoscan find snoRNAs > unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no > > #-----Other Annotation Feature Types (features MAKER doesn't recognize) > other_gff= #extra features to pass-through to final MAKER generated GFF3 file > > #-----External Application Behavior Options > alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases > cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) > > #-----MAKER Behavior Options > max_dna_len=100000 #length for dividing up contigs into chunks 
(increases/decreases memory usage) > min_contig=1 #skip genome contigs below this length (under 10kb are often useless) > > pred_flank=200 #flank for extending evidence clusters sent to gene predictors > pred_stats=0 #report AED and QI statistics for all predictions as well as models > AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) > min_protein=0 #require at least this many amino acids in predicted proteins > alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no > always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no > map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no > keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) > > split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments) > single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no > single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' > correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes > > tries=2 #number of times to try a contig if there is a failure for some reason > clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no > clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no > TMP= #specify a directory other than the system default temporary directory for temporary files > > -- > Kevin M. Kocot, Ph.D. > NSF International Postdoctoral Research Fellow > Degnan Lab > The University of Queensland > School of Biological Sciences > 325 Goddard Building 8 > St. 
Lucia, QLD 4072
> Australia
> Ph: +61 0402 488 430

From 14.chewsoklim at gmail.com Mon Oct 19 20:49:21 2015
From: 14.chewsoklim at gmail.com (Sok Lim Chew)
Date: Tue, 20 Oct 2015 10:49:21 +0800
Subject: [maker-devel] Failed while doing blastx of proteins
Message-ID:

Hi all,

The following errors occurred while I was using MAKER for annotation. I have searched this forum, but the solutions provided do not seem to work for me.

#################################################################

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Must have defined a valid name for Hit
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:486
STACK: Bio::Search::Hit::GenericHit::new /usr/local/share/perl5/Bio/Search/Hit/GenericHit.pm:149
STACK: Bio::Search::Hit::PhatHit::Base::new maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm:127
STACK: Bio::Search::Hit::PhatHit::blastx::new maker/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm:125
STACK: Bio::Search::Hit::HitFactory::create /usr/local/share/perl5/Bio/Search/Hit/HitFactory.pm:124
STACK: Bio::Factory::ObjectFactoryI::create_object /usr/local/share/perl5/Bio/Factory/ObjectFactoryI.pm:114
STACK: Bio::Search::Iteration::GenericIteration::newhits_below_threshold /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:506
STACK: Bio::Search::Iteration::GenericIteration::newhits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:488
STACK: Bio::Search::Iteration::GenericIteration::hits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:469
STACK: Bio::Search::Result::BlastResult::hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:168
STACK: Bio::Search::Result::BlastResult::num_hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:242
STACK: Widget::blastx::keepers maker/bin/../lib/Widget/blastx.pm:164
STACK: Widget::blastx::parse maker/bin/../lib/Widget/blastx.pm:132
STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2457
STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2466
STACK: Process::MpiChunk::_go maker/bin/../lib/Process/MpiChunk.pm:2687
STACK: Process::MpiChunk::run maker/bin/../lib/Process/MpiChunk.pm:341
STACK: Process::MpiChunk::run_all maker/bin/../lib/Process/MpiChunk.pm:357
STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287
STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287
STACK: maker/bin/maker:686
-----------------------------------------------------------
--> rank=NA, hostname=gena2
--> rank=NA, hostname=gena2
--> rank=NA, hostname=gena2
--> rank=NA, hostname=gena2
ERROR: Failed while doing blastx of proteins
ERROR: Chunk failed at level:8, tier_type:3
FAILED CONTIG:Contig1

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:Contig1

examining contents of the fasta file and run log

###########################################################

Does anyone have any idea about this?

Thanks,
SokLim

From 14.chewsoklim at gmail.com Mon Oct 19 21:56:09 2015
From: 14.chewsoklim at gmail.com (Sok Lim Chew)
Date: Tue, 20 Oct 2015 11:56:09 +0800
Subject: [maker-devel] Failed while doing blastx of proteins
Message-ID:

Hi all,

The following errors occurred while I was using MAKER for annotation. I have searched this forum, but the solutions provided do not seem to work for me.
################################################################# ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Must have defined a valid name for Hit STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:486 STACK: Bio::Search::Hit::GenericHit::new /usr/local/share/perl5/Bio/Search/Hit/GenericHit.pm:149 STACK: Bio::Search::Hit::PhatHit::Base::new maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm:127 STACK: Bio::Search::Hit::PhatHit::blastx::new maker/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm:125 STACK: Bio::Search::Hit::HitFactory::create /usr/local/share/perl5/Bio/Search/Hit/HitFactory.pm:124 STACK: Bio::Factory::ObjectFactoryI::create_object /usr/local/share/perl5/Bio/Factory/ObjectFactoryI.pm:114 STACK: Bio::Search::Iteration::GenericIteration::newhits_below_threshold /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:506 STACK: Bio::Search::Iteration::GenericIteration::newhits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:488 STACK: Bio::Search::Iteration::GenericIteration::hits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:469 STACK: Bio::Search::Result::BlastResult::hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:168 STACK: Bio::Search::Result::BlastResult::num_hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:242 STACK: Widget::blastx::keepers maker/bin/../lib/Widget/blastx.pm:164 STACK: Widget::blastx::parse maker/bin/../lib/Widget/blastx.pm:132 STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2457 STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2466 STACK: Process::MpiChunk::_go maker/bin/../lib/Process/MpiChunk.pm:2687 STACK: Process::MpiChunk::run maker/bin/../lib/Process/MpiChunk.pm:341 STACK: Process::MpiChunk::run_all maker/bin/../lib/Process/MpiChunk.pm:357 STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 STACK: 
maker/bin/maker:686
-----------------------------------------------------------
--> rank=NA, hostname=gena2
--> rank=NA, hostname=gena2
--> rank=NA, hostname=gena2
--> rank=NA, hostname=gena2
ERROR: Failed while doing blastx of proteins
ERROR: Chunk failed at level:8, tier_type:3
FAILED CONTIG:Contig1

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:Contig1

examining contents of the fasta file and run log

###########################################################

Does anyone have any idea about this?

Thanks,
SokLim

From carsonhh at gmail.com Tue Oct 20 09:52:33 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 Oct 2015 09:52:33 -0600
Subject: [maker-devel] Failed while doing blastx of proteins
In-Reply-To:
References:
Message-ID: <68B95736-10CD-4C90-8092-3AA754250799@gmail.com>

Make sure you installed the CPAN version of BioPerl and not BioPerl live (the current version is 1.6.924). Also, a couple of BLAST+ versions have bugs; use BLAST+ version 2.2.28. What version of MAKER are you using? It should be 2.31.8. Also check that your /tmp directory is not full (a full /tmp will result in truncated output files).

Thanks,
Carson

> On Oct 19, 2015, at 9:56 PM, Sok Lim Chew <14.chewsoklim at gmail.com> wrote:
>
> Hi all,
>
> The following errors occurred while I was using MAKER for annotation. I have searched this forum, but the solutions provided do not seem to work for me.
> > ################################################################# > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Must have defined a valid name for Hit > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:486 > STACK: Bio::Search::Hit::GenericHit::new /usr/local/share/perl5/Bio/Search/Hit/GenericHit.pm:149 > STACK: Bio::Search::Hit::PhatHit::Base::new maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm:127 > STACK: Bio::Search::Hit::PhatHit::blastx::new maker/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm:125 > STACK: Bio::Search::Hit::HitFactory::create /usr/local/share/perl5/Bio/Search/Hit/HitFactory.pm:124 > STACK: Bio::Factory::ObjectFactoryI::create_object /usr/local/share/perl5/Bio/Factory/ObjectFactoryI.pm:114 > STACK: Bio::Search::Iteration::GenericIteration::newhits_below_threshold /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:506 > STACK: Bio::Search::Iteration::GenericIteration::newhits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:488 > STACK: Bio::Search::Iteration::GenericIteration::hits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:469 > STACK: Bio::Search::Result::BlastResult::hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:168 > STACK: Bio::Search::Result::BlastResult::num_hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:242 > STACK: Widget::blastx::keepers > maker/bin/../lib/Widget/blastx.pm:164 > STACK: Widget::blastx::parse > maker/bin/../lib/Widget/blastx.pm:132 > STACK: GI::blastx_as_chunks > maker/bin/../lib/GI.pm:2457 > STACK: GI::blastx_as_chunks > maker/bin/../lib/GI.pm:2466 > STACK: Process::MpiChunk::_go > maker/bin/../lib/Process/MpiChunk.pm:2687 > STACK: Process::MpiChunk::run > maker/bin/../lib/Process/MpiChunk.pm:341 > STACK: Process::MpiChunk::run_all > maker/bin/../lib/Process/MpiChunk.pm:357 > STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 > STACK: 
Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 > STACK: maker/bin/maker:686 > ----------------------------------------------------------- > --> rank=NA, hostname=gena2 > --> rank=NA, hostname=gena2 > --> rank=NA, hostname=gena2 > --> rank=NA, hostname=gena2 > ERROR: Failed while doing blastx of proteins > ERROR: Chunk failed at level:8, tier_type:3 > FAILED CONTIG:Contig1 > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:Contig1 > > examining contents of the fasta file and run log > > ########################################################### > > Is anyone has any idea on this? > > Thanks, > SokLim > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcsimenc at gmail.com Tue Oct 20 17:54:21 2015 From: mcsimenc at gmail.com (Matt Simenc) Date: Tue, 20 Oct 2015 16:54:21 -0700 Subject: [maker-devel] MPI large load Message-ID: Hi, I am using OpenMPI to run MAKER on 2 nodes with 40 CPUs/node. The load is distributing across the nodes ok but with a very large number of processes on each node. Sometimes there are several hundred more processes than can be executed at one time by a node. Is this a problem? If so, any suggestions on how to fix? Thanks! Matt Simenc Der Lab California State University Fullerton -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Wed Oct 21 12:44:28 2015 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 21 Oct 2015 18:44:28 +0000 Subject: [maker-devel] Failed while doing blastx of proteins In-Reply-To: <68B95736-10CD-4C90-8092-3AA754250799@gmail.com> References: <68B95736-10CD-4C90-8092-3AA754250799@gmail.com> Message-ID: <5CCEB170-2B41-4CC1-A9CB-C246274345B9@illinois.edu> Agreed. 
It would be nice to know whether this is a BioPerl bug that needs addressing, but I'm not sure how easy it would be to pull out a test case.

chris

On Oct 20, 2015, at 10:52 AM, Carson Holt wrote:

Make sure you installed the CPAN version of BioPerl and not BioPerl live (the current version is 1.6.924). Also, a couple of BLAST+ versions have bugs; use BLAST+ version 2.2.28. What version of MAKER are you using? It should be 2.31.8. Also check that your /tmp directory is not full (a full /tmp will result in truncated output files).

Thanks,
Carson

On Oct 19, 2015, at 9:56 PM, Sok Lim Chew <14.chewsoklim at gmail.com> wrote:

Hi all,

The following errors occurred while I was using MAKER for annotation. I have searched this forum, but the solutions provided do not seem to work for me.

#################################################################

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Must have defined a valid name for Hit
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:486
STACK: Bio::Search::Hit::GenericHit::new /usr/local/share/perl5/Bio/Search/Hit/GenericHit.pm:149
STACK: Bio::Search::Hit::PhatHit::Base::new maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm:127
STACK: Bio::Search::Hit::PhatHit::blastx::new maker/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm:125
STACK: Bio::Search::Hit::HitFactory::create /usr/local/share/perl5/Bio/Search/Hit/HitFactory.pm:124
STACK: Bio::Factory::ObjectFactoryI::create_object /usr/local/share/perl5/Bio/Factory/ObjectFactoryI.pm:114
STACK: Bio::Search::Iteration::GenericIteration::newhits_below_threshold /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:506
STACK: Bio::Search::Iteration::GenericIteration::newhits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:488
STACK: Bio::Search::Iteration::GenericIteration::hits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:469
STACK:
Bio::Search::Result::BlastResult::hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:168 STACK: Bio::Search::Result::BlastResult::num_hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:242 STACK: Widget::blastx::keepers maker/bin/../lib/Widget/blastx.pm:164 STACK: Widget::blastx::parse maker/bin/../lib/Widget/blastx.pm:132 STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2457 STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2466 STACK: Process::MpiChunk::_go maker/bin/../lib/Process/MpiChunk.pm:2687 STACK: Process::MpiChunk::run maker/bin/../lib/Process/MpiChunk.pm:341 STACK: Process::MpiChunk::run_all maker/bin/../lib/Process/MpiChunk.pm:357 STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 STACK: maker/bin/maker:686 ----------------------------------------------------------- --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 ERROR: Failed while doing blastx of proteins ERROR: Chunk failed at level:8, tier_type:3 FAILED CONTIG:Contig1 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:Contig1 examining contents of the fasta file and run log ########################################################### Is anyone has any idea on this? Thanks, SokLim _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From carsonhh at gmail.com Thu Oct 22 10:39:21 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 22 Oct 2015 10:39:21 -0600
Subject: [maker-devel] MPI large load
In-Reply-To:
References:
Message-ID: <4733DA58-3277-4703-A660-4EE819858694@gmail.com>

Because MAKER is a pipeline, every external program it calls (e.g. BLAST) runs as a separate process. It will also spawn a couple of helper processes to watch communication and files. The helper processes use 0% CPU, and the main MAKER process will yield to external system calls and processes until they finish execution, so together they will never use a larger % of CPU than is specified. Also, the way MPI works is that it spawns a separate process for every CPU specified, so if you specify 40 CPUs you get 40 independent communicating processes rather than 1 process accessing 40 CPUs. So if you take into account the MPI processes, helper processes, and external system calls, a 40 CPU specification could result in up to three times that many processes existing simultaneously (even though no more than 40 will be active at a time).

However, if your system is having an issue letting the required number of processes exist, then it is a ulimit issue: your administrator has the limit set too low. You can see what limits are set using the command "ulimit -a". You will need to get your system admin to fix it.

--Carson

> On Oct 20, 2015, at 5:54 PM, Matt Simenc wrote:
>
> Hi,
>
> I am using OpenMPI to run MAKER on 2 nodes with 40 CPUs/node. The load is distributing across the nodes ok but with a very large number of processes on each node. Sometimes there are several hundred more processes than can be executed at one time by a node. Is this a problem? If so, any suggestions on how to fix?
>
> Thanks!
> > > > Matt Simenc > > Der Lab > > California State University Fullerton > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmartin at genome.wustl.edu Tue Oct 27 15:35:38 2015 From: jmartin at genome.wustl.edu (John Martin) Date: Tue, 27 Oct 2015 16:35:38 -0500 Subject: [maker-devel] ERROR: Failed while clustering transcripts into genes for annotations Message-ID: <562FEE2A.4070907@genome.wustl.edu> I'm working on annotation for a de novo genomic assembly, and I have my jobs split into roughly equal (by seq length) batches. About two thirds of these batches complete successfully, while the remaining third fail. I identified a problem contig, and have been doing test maker runs on that to try and figure out what's going on. The full batch was run using an older version of maker (v2.26), so I first tried updating to the latest version of maker (v2.31.8). That version did point out one problem in an EST evidence file I was using, which I fixed. That allowed maker to get much farther, but as it was nearing the end of the run it crashed again with this error message: ++++++++++++++++++++++++++++++++++ setting up GFF3 output and fasta chunks processing the chunk divide preparing evidence clusters for annotations Preparing evidence for hint based annotation in cluster::shadow_cluster... ...finished clustering. cleaning clusters....
total clusters:1 now processing 0 ...processing 0 of 8 ...processing 1 of 8 ...processing 2 of 8 ...processing 3 of 8 ...processing 4 of 8 ...processing 5 of 8 ...processing 6 of 8 ...processing 7 of 8 ...processing 0 of 13 ...processing 1 of 13 ...processing 2 of 13 ...processing 3 of 13 ...processing 4 of 13 ...processing 5 of 13 ...processing 6 of 13 ...processing 7 of 13 ...processing 8 of 13 ...processing 9 of 13 ...processing 10 of 13 ...processing 11 of 13 ...processing 12 of 13 annotating transcripts Making transcripts clustering transcripts into genes for annotations Processing transcripts into genes ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Calling translate without a seq argument! STACK: Error::throw STACK: Bio::Root::Root::throw /home/ec2-user/bin/BioPerl-1.6.923/Bio/Root/Root.pm:486 STACK: Bio::Tools::CodonTable::translate /home/ec2-user/bin/BioPerl-1.6.923/Bio/Tools/CodonTable.pm:414 STACK: PhatHit_utils::_adjust /home/ec2-user/bin/maker/bin/../lib/PhatHit_utils.pm:846 STACK: PhatHit_utils::adjust_start_stop /home/ec2-user/bin/maker/bin/../lib/PhatHit_utils.pm:794 STACK: maker::auto_annotator::load_transcript_struct /home/ec2-user/bin/maker/bin/../lib/maker/auto_annotator.pm:2198 STACK: maker::auto_annotator::group_transcripts /home/ec2-user/bin/maker/bin/../lib/maker/auto_annotator.pm:2676 STACK: maker::auto_annotator::annotate_genes /home/ec2-user/bin/maker/bin/../lib/maker/auto_annotator.pm:1018 STACK: Process::MpiChunk::_go /home/ec2-user/bin/maker/bin/../lib/Process/MpiChunk.pm:3847 STACK: Process::MpiChunk::run /home/ec2-user/bin/maker/bin/../lib/Process/MpiChunk.pm:341 STACK: Process::MpiChunk::run_all /home/ec2-user/bin/maker/bin/../lib/Process/MpiChunk.pm:357 STACK: Process::MpiTiers::run_all /home/ec2-user/bin/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all /home/ec2-user/bin/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: /home/ec2-user/bin/maker/bin/maker:686 
----------------------------------------------------------- --> rank=NA, hostname=ip-172-31-35-77.us-west-2.compute.internal ERROR: Failed while clustering transcripts into genes for annotations ERROR: Chunk failed at level:2, tier_type:4 FAILED CONTIG:ANCCEYDFT_Contig1675 ERROR: Chunk failed at level:6, tier_type:0 FAILED CONTIG:ANCCEYDFT_Contig1675 examining contents of the fasta file and run log --Next Contig-- Processing run.log file... Maker is now finished!!! Start_time: 1445912338 End_time: 1445913817 Elapsed: 1479 ++++++++++++++++++++++++++++++++++ The root of the error seems clearly stated: MSG: Calling translate without a seq argument! but I don't know what that means in real terms. All my inputs appear valid. The contig I am testing with has 1 plus strand gene represented in the evidence files. And I've set a local TMP directory since I've read that sometimes these kinds of problems can stem from the program running out of TMP space. I am pretty sure that is not happening here (I put TMP on a disk with 1.1Tb of space, and the test contig is only 13kbp). Can anyone help me figure out what is going on? Thanks, John Martin ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. 
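For anyone trying to reproduce a failure like this on a single contig, a minimal sketch of pulling one record out of a multi-FASTA file might look like the following. This is only an illustration under stated assumptions: the function name extract_contig and the demo sequences are made up here, and only the contig ID is taken from the log above. It does not diagnose the translate error itself; it just isolates the input for a small test run.

```python
import io


def extract_contig(fasta_lines, contig_id):
    """Return the lines of the FASTA record whose ID equals contig_id.

    The ID is taken to be the first whitespace-delimited token after '>'.
    An empty list is returned when no record matches.
    """
    record = []
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # A new header starts: decide whether this record is the one we want.
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] == contig_id
        if keep:
            record.append(line.rstrip("\n"))
    return record


# Hypothetical demo input; a real run would pass an open file handle instead.
demo = io.StringIO(">Contig1 len=8\nACGTACGT\n>ANCCEYDFT_Contig1675\nGGCC\nTTAA\n")
print("\n".join(extract_contig(demo, "ANCCEYDFT_Contig1675")))
```

Writing the returned lines to a new file gives a one-contig genome that can be used for a quick test run, which keeps each attempt short while the evidence files are being checked.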
From roscito at mpi-cbg.de Mon Oct 5 06:06:57 2015 From: roscito at mpi-cbg.de (roscito) Date: Mon, 5 Oct 2015 14:06:57 +0200 Subject: [maker-devel] different exons predicted in different maker rounds Message-ID: <2FD50373-B02F-49C0-8414-EF6705BB5826@mpi-cbg.de> Dear all, First of all, I'd like to thank everyone in this forum for all the tips and comments on the best strategies for running MAKER, they have been really helpful so far. However, I still don't fully understand the behaviour of MAKER when it is run iteratively and I compare the predictions from each round. Let me explain: My input data are the following: - the repeat-masked genome of a vertebrate (~2Gb); - mRNA data for this species mapped to the genome with tophat2 and assembled into transcripts with cufflinks; - exonerate-mapped proteins in gff3 format to the reference genome, from closely related species (global alignment) For the first round of MAKER, I provided both cufflinks and exonerate-mapped proteins with the options est2genome and protein2genome = 1. From the maker output, I generated the SNAP .hmm file (following the instructions in http://gmod.org/wiki/MAKER_Tutorial) and provided it as input to the second round of MAKER. For this second round I still gave cufflinks + exonerated proteins, but switched both est2genome and protein2genome to 0. After it finished, I generated the SNAP .hmm once more and provided it for the 3rd and final round of MAKER, along with cufflinks and exonerate-mapped prots and est/prot2genome=0 As sort of a sanity check, I went on and ran a 4th round of MAKER with the SNAP .hmm file from round3, cufflinks and exonerate-mapped prots and est/prot2genome=0, and this time specifying alt_splice=1. For all the rounds, I also specified single_exon=1. I loaded the gene predictions from each round plus the cufflinks transcripts and the exonerate-mapped proteins to the genome browser to visually inspect the output.
I saw a few strange cases where MAKER doesn't seem to use the protein/mRNA evidence for the gene predictions, and I would greatly appreciate any feedback/ideas on what I could possibly be doing wrong. Here are a few screenshots so you know what I'm talking about: In this first example, MAKER misses a conserved exon for which there is both protein and mRNA evidence, and only if I specify alt_splice I get the exon 'back'. In this second example, MAKER completely ignores lots of exons, all conserved across vertebrates, and supported by protein/mRNA evidence. In the third example, there is no prediction from round1, the one from round2 matches the protein/mRNA evidence, and then in the final round3 and 4, an extra exon appears. (hope you'll be able to see the images above) As I said, I would greatly appreciate any feedback on these strange cases. Perhaps I'm missing some parameter(s)? Thanks a lot. All the best, Juliana -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: example1.png Type: image/png Size: 53178 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: example2.png Type: image/png Size: 55134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: example3.png Type: image/png Size: 67598 bytes Desc: not available URL: From carsonhh at gmail.com Fri Oct 9 12:58:02 2015 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 9 Oct 2015 12:58:02 -0600 Subject: [maker-devel] different exons predicted in different maker rounds In-Reply-To: <2FD50373-B02F-49C0-8414-EF6705BB5826@mpi-cbg.de> References: <2FD50373-B02F-49C0-8414-EF6705BB5826@mpi-cbg.de> Message-ID: <1D5FAF7F-4758-4C4A-90EB-31C7BAE34725@gmail.com> Some of your cufflinks evidence is contradicting the existence of the exon.
When you have alt_splice=0, the evidence is passed in in its entirety, and all sources are equal. When you have alt_splice=1 set, certain pieces of spliced evidence are given higher priority (in an iterative fashion), and cannot be overridden by contradictory evidence. The result is that there is a specific combination of evidence-based hints that allows the gene predictor to find the exon, but when it sees everything the HMM doesn't score the exon as very likely. I'd recommend not running with cufflinks because its results usually have low specificity, so it generates a lot of bad hints. Use Trinity instead to assemble everything, then allow it to be aligned inside of MAKER. Trinity assembled contigs give much greater specificity in the final results. --Carson > On Oct 5, 2015, at 6:06 AM, roscito wrote: > > Dear all, > > First of all, I'd like to thank everyone in this forum for all the tips and comments on the best strategies for running MAKER, they have been really helpful so far. > However, I still don't fully understand the behaviour of MAKER when it is run iteratively and I compare the predictions from each round. Let me explain: > > My input data are the following: > - the repeat-masked genome of a vertebrate (~2Gb); > - mRNA data for this species mapped to the genome with tophat2 and assembled into transcripts with cufflinks; > - exonerate-mapped proteins in gff3 format to the reference genome, from closely related species (global alignment) > > For the first round of MAKER, I provided both cufflinks and exonerate-mapped proteins with the options est2genome and protein2genome = 1. From the maker output, I generated the SNAP .hmm file (following the instructions in http://gmod.org/wiki/MAKER_Tutorial ) and provided it as input to the second round of MAKER. > For this second round I still gave cufflinks + exonerated proteins, but switched both est2genome and protein2genome to 0.
After it finished, I generated the SNAP .hmm once more and provided it for the 3rd and final round of MAKER, along with cufflinks and exonerate-mapped prots and est/prot2genome=0 > > As sort of a sanity check, I went on and ran a 4th round of MAKER with the SNAP .hmm file from round3, cufflinks and exonerate-mapped prots and est/prot2genome=0, and this time specifying alt_splice=1. > For all the rounds, I also specified single_exon=1. > > > I loaded the gene predictions from each round plus the cufflinks transcripts and the exonerate-mapped proteins to the genome browser to visually inspect the output. I saw a few strange cases where MAKER doesn't seem to use the protein/mRNA evidence for the gene predictions, and I would greatly appreciate any feedback/ideas on what I could possibly be doing wrong. Here are a few screenshots so you know what I'm talking about: > > In this first example, MAKER misses a conserved exon for which there is both protein and mRNA evidence, and only if I specify alt_splice I get the exon 'back'. > > > > In this second example, MAKER completely ignores lots of exons, all conserved across vertebrates, and supported by protein/mRNA evidence. > > > > In the third example, there is no prediction from round1, the one from round2 matches the protein/mRNA evidence, and then in the final round3 and 4, an extra exon appears. > > > > > (hope you'll be able to see the images above) > As I said, I would greatly appreciate any feedback on these strange cases. Perhaps I'm missing some parameter(s)? > > Thanks a lot. > All the best, > Juliana > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed...
URL: From kmkocot at gmail.com Fri Oct 16 22:10:14 2015 From: kmkocot at gmail.com (Kevin Kocot) Date: Sat, 17 Oct 2015 14:10:14 +1000 Subject: [maker-devel] Maker not producing expected output Message-ID: <5621CA26.8050807@uq.edu.au> Hello, I've run Maker on a draft invertebrate genome and it seemed to finish successfully. However, many of the expected output files were not produced. If I go to, for example, XX_datastore/00/0C/scaffold-334630/, all I see is: theVoid.scaffold-334630 run.log scaffold-334630.gff In particular, I'm looking for the transcripts and proteins fasta files. I'm sure I have a configuration setting incorrect or one of the dependencies not correctly installed, but I can't figure out what the problem is. Any thoughts on how I can resolve this issue and generate these files? Ideally I would love to be able to generate these files without having to run the whole pipeline again. Details on my configuration settings and the contents of the run.log file from my example above are pasted below. 
Thank you, Kevin ----- run.log from the example folder above looks like this: ----- SHARED_ID d574e9ca9b0019a9fe147ccb9db3588b CTL_OPTIONS maker_gff CTL_OPTIONS other_gff CTL_OPTIONS est test-transcriptome.fa CTL_OPTIONS est_reads CTL_OPTIONS altest KK273.fa CTL_OPTIONS est_gff CTL_OPTIONS altest_gff CTL_OPTIONS protein test-AA.fa CTL_OPTIONS protein_gff CTL_OPTIONS model_org all CTL_OPTIONS repeat_protein te_proteins.fasta CTL_OPTIONS rmlib CTL_OPTIONS rm_gff CTL_OPTIONS organism_type eukaryotic CTL_OPTIONS predictor est2genome,genemark,protein2genome CTL_OPTIONS est2genome 1 CTL_OPTIONS altest2genome 0 CTL_OPTIONS snaphmm CTL_OPTIONS gmhmm output/gmhmm.mod CTL_OPTIONS augustus_species CTL_OPTIONS fgenesh_par_file CTL_OPTIONS model_gff CTL_OPTIONS pred_gff CTL_OPTIONS max_dna_len 100000 CTL_OPTIONS split_hit 10000 CTL_OPTIONS pred_flank 200 CTL_OPTIONS pred_stats 0 CTL_OPTIONS min_protein 0 CTL_OPTIONS AED_threshold 1 CTL_OPTIONS single_exon 0 CTL_OPTIONS single_length 250 CTL_OPTIONS keep_preds 0 CTL_OPTIONS map_forward 0 CTL_OPTIONS est_forward 0 CTL_OPTIONS correct_est_fusion 0 CTL_OPTIONS alt_splice 0 CTL_OPTIONS always_complete 0 CTL_OPTIONS alt_peptide C CTL_OPTIONS evaluate 0 CTL_OPTIONS blast_type ncbi+ CTL_OPTIONS softmask 1 CTL_OPTIONS pcov_blastn 0.8 CTL_OPTIONS pid_blastn 0.85 CTL_OPTIONS eval_blastn 1e-10 CTL_OPTIONS bit_blastn 40 CTL_OPTIONS depth_blastn 0 CTL_OPTIONS pcov_rm_blastx 0.5 CTL_OPTIONS pid_rm_blastx 0.4 CTL_OPTIONS eval_rm_blastx 1e-06 CTL_OPTIONS bit_rm_blastx 30 CTL_OPTIONS pcov_blastx 0.5 CTL_OPTIONS pid_blastx 0.4 CTL_OPTIONS depth_blastx 0 CTL_OPTIONS eval_blastx 1e-06 CTL_OPTIONS bit_blastx 30 CTL_OPTIONS pcov_tblastx 0.8 CTL_OPTIONS pid_tblastx 0.85 CTL_OPTIONS eval_tblastx 1e-10 CTL_OPTIONS bit_tblastx 40 CTL_OPTIONS depth_tblastx 0 CTL_OPTIONS ep_score_limit 20 CTL_OPTIONS en_score_limit 20 CTL_OPTIONS enable_fathom 0 CTL_OPTIONS unmask 0 CTL_OPTIONS model_pass 0 CTL_OPTIONS est_pass 0 CTL_OPTIONS altest_pass 0 CTL_OPTIONS 
protein_pass 0 CTL_OPTIONS rm_pass 0 CTL_OPTIONS other_pass 0 CTL_OPTIONS pred_pass 0 CTL_OPTIONS run genemark LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section LOGCHILD 
/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 ----- maker_opts ----- #-----Genome (these are always 
required) genome=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.fas #genome sequence (fasta file or fasta embeded in GFF3 file) organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic #-----Re-annotation Using MAKER Derived GFF3 maker_gff= #MAKER derived GFF3 file est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no #-----EST Evidence (for best results provide a file for at least one) est=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-transcriptome.fa #set of ESTs or assembled mRNA-seq in fasta format altest=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/KK273.fa #EST/cDNA sequence file in fasta format from an alternate organism est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file altest_gff= #aligned ESTs from a closly relate species in GFF3 format #-----Protein Homology Evidence (for best results provide a file for at least one) protein=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-AA.fa #protein sequence file in fasta format (i.e. 
from mutiple oransisms) protein_gff= #aligned protein homology evidence from an external GFF3 file #-----Repeat Masking (leave values blank to skip repeat masking) model_org=all #select a model organism for RepBase masking in RepeatMasker rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker repeat_protein=/usr/local/bin/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner rm_gff= #pre-identified repeat elements from an external GFF3 file prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering) #-----Gene Prediction snaphmm= #SNAP HMM file gmhmm=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/output/gmhmm.mod #GeneMark HMM file augustus_species= #Augustus gene prediction species model fgenesh_par_file= #FGENESH parameter file pred_gff= #ab-initio predictions from an external GFF3 file model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no snoscan_rrna= #rRNA file to have Snoscan find snoRNAs unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no #-----Other Annotation Feature Types (features MAKER doesn't recognize) other_gff= #extra features to pass-through to final MAKER generated GFF3 file #-----External Application Behavior Options alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) #-----MAKER Behavior Options max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) min_contig=1 #skip genome contigs below this 
length (under 10kb are often useless) pred_flank=200 #flank for extending evidence clusters sent to gene predictors pred_stats=0 #report AED and QI statistics for all predictions as well as models AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) min_protein=0 #require at least this many amino acids in predicted proteins alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments) single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes tries=2 #number of times to try a contig if there is a failure for some reason clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no TMP= #specify a directory other than the system default temporary directory for temporary files -- Kevin M. Kocot, Ph.D. NSF International Postdoctoral Research Fellow Degnan Lab The University of Queensland School of Biological Sciences 325 Goddard Building 8 St. 
Lucia, QLD 4072 Australia Ph: +61 0402 488 430 From dence at genetics.utah.edu Sat Oct 17 09:46:09 2015 From: dence at genetics.utah.edu (Daniel Ence) Date: Sat, 17 Oct 2015 15:46:09 +0000 Subject: [maker-devel] Maker not producing expected output In-Reply-To: <5621CA26.8050807@uq.edu.au> References: <5621CA26.8050807@uq.edu.au> Message-ID: Hi Kevin, So I have a couple of clarifying questions, and an explanation that'll hopefully be helpful. If you look in the master datastore log, do you see an entry that shows that scaffold finished successfully? It will have the name of the scaffold, then the path to the results directory, and then a status. There should be one that shows that maker started working on it, and one that shows that maker finished it. Second, what are the files that you're expecting to see? I think you're expecting to see a couple of fasta files and a gff3 file that contain all the annotation results gathered together. You can gather those results with the fasta_merge and gff3_merge scripts that came with maker. To explain what you saw in that example results directory that you sent: if there weren't any models or predictions on that scaffold, then there won't be fasta files in the results directory. You could verify that by looking at the scaffold-334630.gff file. The fasta_merge and gff3_merge scripts will gather all of the results fasta and gff files for all the scaffolds and put them into a few fasta files and one gff3 file, respectively. Let me know whether that helps, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 > On Oct 16, 2015, at 10:10 PM, Kevin Kocot wrote: > > Hello, > > I've run Maker on a draft invertebrate genome and it seemed to finish successfully. However, many of the expected output files were not produced.
If I go to, for example, XX_datastore/00/0C/scaffold-334630/, all I see is: > > theVoid.scaffold-334630 > run.log > scaffold-334630.gff > > In particular, I'm looking for the transcripts and proteins fasta files. I'm sure I have a configuration setting incorrect or one of the dependencies not correctly installed, but I can't figure out what the problem is. Any thoughts on how I can resolve this issue and generate these files? Ideally I would love to be able to generate these files without having to run the whole pipeline again. Details on my configuration settings and the contents of the run.log file from my example above are pasted below. > > Thank you, > Kevin > > ----- > run.log from the example folder above looks like this: > ----- > SHARED_ID d574e9ca9b0019a9fe147ccb9db3588b > CTL_OPTIONS maker_gff > CTL_OPTIONS other_gff > CTL_OPTIONS est test-transcriptome.fa > CTL_OPTIONS est_reads > CTL_OPTIONS altest KK273.fa > CTL_OPTIONS est_gff > CTL_OPTIONS altest_gff > CTL_OPTIONS protein test-AA.fa > CTL_OPTIONS protein_gff > CTL_OPTIONS model_org all > CTL_OPTIONS repeat_protein te_proteins.fasta > CTL_OPTIONS rmlib > CTL_OPTIONS rm_gff > CTL_OPTIONS organism_type eukaryotic > CTL_OPTIONS predictor est2genome,genemark,protein2genome > CTL_OPTIONS est2genome 1 > CTL_OPTIONS altest2genome 0 > CTL_OPTIONS snaphmm > CTL_OPTIONS gmhmm output/gmhmm.mod > CTL_OPTIONS augustus_species > CTL_OPTIONS fgenesh_par_file > CTL_OPTIONS model_gff > CTL_OPTIONS pred_gff > CTL_OPTIONS max_dna_len 100000 > CTL_OPTIONS split_hit 10000 > CTL_OPTIONS pred_flank 200 > CTL_OPTIONS pred_stats 0 > CTL_OPTIONS min_protein 0 > CTL_OPTIONS AED_threshold 1 > CTL_OPTIONS single_exon 0 > CTL_OPTIONS single_length 250 > CTL_OPTIONS keep_preds 0 > CTL_OPTIONS map_forward 0 > CTL_OPTIONS est_forward 0 > CTL_OPTIONS correct_est_fusion 0 > CTL_OPTIONS alt_splice 0 > CTL_OPTIONS always_complete 0 > CTL_OPTIONS alt_peptide C > CTL_OPTIONS evaluate 0 > CTL_OPTIONS blast_type ncbi+ > CTL_OPTIONS softmask 
1 > CTL_OPTIONS pcov_blastn 0.8 > CTL_OPTIONS pid_blastn 0.85 > CTL_OPTIONS eval_blastn 1e-10 > CTL_OPTIONS bit_blastn 40 > CTL_OPTIONS depth_blastn 0 > CTL_OPTIONS pcov_rm_blastx 0.5 > CTL_OPTIONS pid_rm_blastx 0.4 > CTL_OPTIONS eval_rm_blastx 1e-06 > CTL_OPTIONS bit_rm_blastx 30 > CTL_OPTIONS pcov_blastx 0.5 > CTL_OPTIONS pid_blastx 0.4 > CTL_OPTIONS depth_blastx 0 > CTL_OPTIONS eval_blastx 1e-06 > CTL_OPTIONS bit_blastx 30 > CTL_OPTIONS pcov_tblastx 0.8 > CTL_OPTIONS pid_tblastx 0.85 > CTL_OPTIONS eval_tblastx 1e-10 > CTL_OPTIONS bit_tblastx 40 > CTL_OPTIONS depth_tblastx 0 > CTL_OPTIONS ep_score_limit 20 > CTL_OPTIONS en_score_limit 20 > CTL_OPTIONS enable_fathom 0 > CTL_OPTIONS unmask 0 > CTL_OPTIONS model_pass 0 > CTL_OPTIONS est_pass 0 > CTL_OPTIONS altest_pass 0 > CTL_OPTIONS protein_pass 0 > CTL_OPTIONS rm_pass 0 > CTL_OPTIONS other_pass 0 > CTL_OPTIONS pred_pass 0 > CTL_OPTIONS run genemark > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark > FINISHED 
test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section > FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section > FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section > LOGCHILD 
/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > > ----- > maker_opts > ----- > #-----Genome (these are always required) > genome=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.fas #genome sequence (fasta file or fasta embeded in GFF3 file) > organism_type=eukaryotic #eukaryotic or prokaryotic. 
Default is eukaryotic > > #-----Re-annotation Using MAKER Derived GFF3 > maker_gff= #MAKER derived GFF3 file > est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no > altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no > protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no > rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no > model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no > pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no > other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no > > #-----EST Evidence (for best results provide a file for at least one) > est=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-transcriptome.fa #set of ESTs or assembled mRNA-seq in fasta format > altest=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/KK273.fa #EST/cDNA sequence file in fasta format from an alternate organism > est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file > altest_gff= #aligned ESTs from a closly relate species in GFF3 format > > #-----Protein Homology Evidence (for best results provide a file for at least one) > protein=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-AA.fa #protein sequence file in fasta format (i.e. from mutiple oransisms) > protein_gff= #aligned protein homology evidence from an external GFF3 file > > #-----Repeat Masking (leave values blank to skip repeat masking) > model_org=all #select a model organism for RepBase masking in RepeatMasker > rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker > repeat_protein=/usr/local/bin/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner > rm_gff= #pre-identified repeat elements from an external GFF3 file > prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no > softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. 
seg and dust filtering) > > #-----Gene Prediction > snaphmm= #SNAP HMM file > gmhmm=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/output/gmhmm.mod #GeneMark HMM file > augustus_species= #Augustus gene prediction species model > fgenesh_par_file= #FGENESH parameter file > pred_gff= #ab-initio predictions from an external GFF3 file > model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) > est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no > protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no > trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no > snoscan_rrna= #rRNA file to have Snoscan find snoRNAs > unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no > > #-----Other Annotation Feature Types (features MAKER doesn't recognize) > other_gff= #extra features to pass-through to final MAKER generated GFF3 file > > #-----External Application Behavior Options > alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases > cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) > > #-----MAKER Behavior Options > max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) > min_contig=1 #skip genome contigs below this length (under 10kb are often useless) > > pred_flank=200 #flank for extending evidence clusters sent to gene predictors > pred_stats=0 #report AED and QI statistics for all predictions as well as models > AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) > min_protein=0 #require at least this many amino acids in predicted proteins > alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no > always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no > map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no > keep_preds=0 
#Concordance threshold to add unsupported gene prediction (bound by 0 and 1) > > split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments) > single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no > single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' > correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes > > tries=2 #number of times to try a contig if there is a failure for some reason > clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no > clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no > TMP= #specify a directory other than the system default temporary directory for temporary files > > -- > Kevin M. Kocot, Ph.D. > NSF International Postdoctoral Research Fellow > Degnan Lab > The University of Queensland > School of Biological Sciences > 325 Goddard Building 8 > St. Lucia, QLD 4072 > Australia > Ph: +61 0402 488 430 > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Sat Oct 17 14:24:42 2015 From: carsonhh at gmail.com (Carson Holt) Date: Sat, 17 Oct 2015 14:24:42 -0600 Subject: [maker-devel] Maker not producing expected output In-Reply-To: <5621CA26.8050807@uq.edu.au> References: <5621CA26.8050807@uq.edu.au> Message-ID: <1117A475-526B-477A-B44A-E86A4A23262B@gmail.com> You will only get fasta files for a contig when there are gene models present on that contig. The only ab initio predictor you provided parameters for is GeneMark, and apparently it did not predict any genes for the contig in question. If it had, you would have at least one fasta file in the output containing all predictions made by GeneMark.
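One way to check whether a contig has any gene models is to tally the feature rows in its GFF3 file. What follows is only a minimal sketch, not a MAKER tool: it assumes the standard nine-column, tab-separated GFF3 layout, and it assumes that final annotations carry the source label "maker" in column 2. The function names and the sample rows are hypothetical.

```python
from collections import Counter


def count_gff3_features(lines):
    """Count feature rows by (source, type) in GFF3 text.

    `lines` is any iterable of strings, e.g. an open file handle.
    Parsing stops at the ##FASTA directive, after which only
    sequence data follows.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue  # blank line, comment, or directive
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a well-formed feature row
        counts[(cols[1], cols[2])] += 1
    return counts


def has_gene_models(counts, source="maker"):
    """True if any gene or mRNA row comes from `source`.

    The default source label "maker" is an assumption about how
    final annotations are tagged in the GFF3.
    """
    return any(src == source and typ in ("gene", "mRNA")
               for (src, typ) in counts)


# Hypothetical rows standing in for a contig's .gff file.
sample = [
    "##gff-version 3\n",
    "ctg1\trepeatmasker\tmatch\t5\t50\t.\t+\t.\tID=r1\n",
    "ctg1\tblastx\tprotein_match\t10\t90\t.\t+\t.\tID=p1\n",
]
counts = count_gff3_features(sample)
for (src, typ), n in sorted(counts.items()):
    print(src, typ, n, sep="\t")
print("gene models present:", has_gene_models(counts))
```

To run this on a real file, pass an open handle instead of the sample list, e.g. `count_gff3_features(open(path))`. If no gene or mRNA rows turn up, then by the explanation above no transcript or protein fasta files should be expected for that contig.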
MAKER doesn't make gene models itself; rather, it provides hints to other gene predictors based on the evidence alignments and then promotes and polishes the models they make. If they produce no models, then you will get no results. You can try adding additional gene predictors like SNAP (in case GeneMark just isn't performing well), or you can check the length of your contig (contigs shorter than about 10kb rarely produce any results - they are too short to be annotatable). Try looking at the results from one of the larger contigs, or use fasta_merge to gather all results from all contigs. --Carson > On Oct 16, 2015, at 10:10 PM, Kevin Kocot wrote: > > Hello, > > I've run Maker on a draft invertebrate genome and it seemed to finish successfully. However, many of the expected output files were not produced. If I go to, for example, XX_datastore/00/0C/scaffold-334630/, all I see is: > > theVoid.scaffold-334630 > run.log > scaffold-334630.gff > > In particular, I'm looking for the transcripts and proteins fasta files. I'm sure I have a configuration setting incorrect or one of the dependencies not correctly installed, but I can't figure out what the problem is. Any thoughts on how I can resolve this issue and generate these files? Ideally I would love to be able to generate these files without having to run the whole pipeline again. Details on my configuration settings and the contents of the run.log file from my example above are pasted below.
> > Thank you, > Kevin > > ----- > run.log from the example folder above looks like this: > ----- > SHARED_ID d574e9ca9b0019a9fe147ccb9db3588b > CTL_OPTIONS maker_gff > CTL_OPTIONS other_gff > CTL_OPTIONS est test-transcriptome.fa > CTL_OPTIONS est_reads > CTL_OPTIONS altest KK273.fa > CTL_OPTIONS est_gff > CTL_OPTIONS altest_gff > CTL_OPTIONS protein test-AA.fa > CTL_OPTIONS protein_gff > CTL_OPTIONS model_org all > CTL_OPTIONS repeat_protein te_proteins.fasta > CTL_OPTIONS rmlib > CTL_OPTIONS rm_gff > CTL_OPTIONS organism_type eukaryotic > CTL_OPTIONS predictor est2genome,genemark,protein2genome > CTL_OPTIONS est2genome 1 > CTL_OPTIONS altest2genome 0 > CTL_OPTIONS snaphmm > CTL_OPTIONS gmhmm output/gmhmm.mod > CTL_OPTIONS augustus_species > CTL_OPTIONS fgenesh_par_file > CTL_OPTIONS model_gff > CTL_OPTIONS pred_gff > CTL_OPTIONS max_dna_len 100000 > CTL_OPTIONS split_hit 10000 > CTL_OPTIONS pred_flank 200 > CTL_OPTIONS pred_stats 0 > CTL_OPTIONS min_protein 0 > CTL_OPTIONS AED_threshold 1 > CTL_OPTIONS single_exon 0 > CTL_OPTIONS single_length 250 > CTL_OPTIONS keep_preds 0 > CTL_OPTIONS map_forward 0 > CTL_OPTIONS est_forward 0 > CTL_OPTIONS correct_est_fusion 0 > CTL_OPTIONS alt_splice 0 > CTL_OPTIONS always_complete 0 > CTL_OPTIONS alt_peptide C > CTL_OPTIONS evaluate 0 > CTL_OPTIONS blast_type ncbi+ > CTL_OPTIONS softmask 1 > CTL_OPTIONS pcov_blastn 0.8 > CTL_OPTIONS pid_blastn 0.85 > CTL_OPTIONS eval_blastn 1e-10 > CTL_OPTIONS bit_blastn 40 > CTL_OPTIONS depth_blastn 0 > CTL_OPTIONS pcov_rm_blastx 0.5 > CTL_OPTIONS pid_rm_blastx 0.4 > CTL_OPTIONS eval_rm_blastx 1e-06 > CTL_OPTIONS bit_rm_blastx 30 > CTL_OPTIONS pcov_blastx 0.5 > CTL_OPTIONS pid_blastx 0.4 > CTL_OPTIONS depth_blastx 0 > CTL_OPTIONS eval_blastx 1e-06 > CTL_OPTIONS bit_blastx 30 > CTL_OPTIONS pcov_tblastx 0.8 > CTL_OPTIONS pid_tblastx 0.85 > CTL_OPTIONS eval_tblastx 1e-10 > CTL_OPTIONS bit_tblastx 40 > CTL_OPTIONS depth_tblastx 0 > CTL_OPTIONS ep_score_limit 20 > CTL_OPTIONS en_score_limit 20 
> CTL_OPTIONS enable_fathom 0 > CTL_OPTIONS unmask 0 > CTL_OPTIONS model_pass 0 > CTL_OPTIONS est_pass 0 > CTL_OPTIONS altest_pass 0 > CTL_OPTIONS protein_pass 0 > CTL_OPTIONS rm_pass 0 > CTL_OPTIONS other_pass 0 > CTL_OPTIONS pred_pass 0 > CTL_OPTIONS run genemark > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark > FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section > FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section > LOGCHILD 
/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section > FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 > > ----- > maker_opts > ----- > 
#-----Genome (these are always required) > genome=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.fas #genome sequence (fasta file or fasta embeded in GFF3 file) > organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic > > #-----Re-annotation Using MAKER Derived GFF3 > maker_gff= #MAKER derived GFF3 file > est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no > altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no > protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no > rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no > model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no > pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no > other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no > > #-----EST Evidence (for best results provide a file for at least one) > est=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-transcriptome.fa #set of ESTs or assembled mRNA-seq in fasta format > altest=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/KK273.fa #EST/cDNA sequence file in fasta format from an alternate organism > est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file > altest_gff= #aligned ESTs from a closly relate species in GFF3 format > > #-----Protein Homology Evidence (for best results provide a file for at least one) > protein=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-AA.fa #protein sequence file in fasta format (i.e. 
from mutiple oransisms) > protein_gff= #aligned protein homology evidence from an external GFF3 file > > #-----Repeat Masking (leave values blank to skip repeat masking) > model_org=all #select a model organism for RepBase masking in RepeatMasker > rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker > repeat_protein=/usr/local/bin/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner > rm_gff= #pre-identified repeat elements from an external GFF3 file > prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no > softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering) > > #-----Gene Prediction > snaphmm= #SNAP HMM file > gmhmm=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/output/gmhmm.mod #GeneMark HMM file > augustus_species= #Augustus gene prediction species model > fgenesh_par_file= #FGENESH parameter file > pred_gff= #ab-initio predictions from an external GFF3 file > model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) > est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no > protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no > trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no > snoscan_rrna= #rRNA file to have Snoscan find snoRNAs > unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no > > #-----Other Annotation Feature Types (features MAKER doesn't recognize) > other_gff= #extra features to pass-through to final MAKER generated GFF3 file > > #-----External Application Behavior Options > alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases > cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) > > #-----MAKER Behavior Options > max_dna_len=100000 #length for dividing up contigs into chunks 
(increases/decreases memory usage) > min_contig=1 #skip genome contigs below this length (under 10kb are often useless) > > pred_flank=200 #flank for extending evidence clusters sent to gene predictors > pred_stats=0 #report AED and QI statistics for all predictions as well as models > AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) > min_protein=0 #require at least this many amino acids in predicted proteins > alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no > always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no > map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no > keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) > > split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments) > single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no > single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' > correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes > > tries=2 #number of times to try a contig if there is a failure for some reason > clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no > clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no > TMP= #specify a directory other than the system default temporary directory for temporary files > > -- > Kevin M. Kocot, Ph.D. > NSF International Postdoctoral Research Fellow > Degnan Lab > The University of Queensland > School of Biological Sciences > 325 Goddard Building 8 > St. 
Lucia, QLD 4072 > Australia > Ph: +61 0402 488 430 > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From 14.chewsoklim at gmail.com Mon Oct 19 20:49:21 2015 From: 14.chewsoklim at gmail.com (Sok Lim Chew) Date: Tue, 20 Oct 2015 10:49:21 +0800 Subject: [maker-devel] Failed while doing blastx of proteins Message-ID: Hi all, The following errors occurred while I was using MAKER for annotation. I have searched this forum, but the solutions provided do not seem to work for me. ################################################################# ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Must have defined a valid name for Hit STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:486 STACK: Bio::Search::Hit::GenericHit::new /usr/local/share/perl5/Bio/Search/Hit/GenericHit.pm:149 STACK: Bio::Search::Hit::PhatHit::Base::new maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm:127 STACK: Bio::Search::Hit::PhatHit::blastx::new maker/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm:125 STACK: Bio::Search::Hit::HitFactory::create /usr/local/share/perl5/Bio/Search/Hit/HitFactory.pm:124 STACK: Bio::Factory::ObjectFactoryI::create_object /usr/local/share/perl5/Bio/Factory/ObjectFactoryI.pm:114 STACK: Bio::Search::Iteration::GenericIteration::newhits_below_threshold /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:506 STACK: Bio::Search::Iteration::GenericIteration::newhits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:488 STACK: Bio::Search::Iteration::GenericIteration::hits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:469 STACK: Bio::Search::Result::BlastResult::hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:168 STACK: Bio::Search::Result::BlastResult::num_hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:242
STACK: Widget::blastx::keepers maker/bin/../lib/Widget/blastx.pm:164 STACK: Widget::blastx::parse maker/bin/../lib/Widget/blastx.pm:132 STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2457 STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2466 STACK: Process::MpiChunk::_go maker/bin/../lib/Process/MpiChunk.pm:2687 STACK: Process::MpiChunk::run maker/bin/../lib/Process/MpiChunk.pm:341 STACK: Process::MpiChunk::run_all maker/bin/../lib/Process/MpiChunk.pm:357 STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 STACK:
maker/bin/maker:686 ----------------------------------------------------------- --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 ERROR: Failed while doing blastx of proteins ERROR: Chunk failed at level:8, tier_type:3 FAILED CONTIG:Contig1 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:Contig1 examining contents of the fasta file and run log ########################################################### Does anyone have any idea about this? Thanks, SokLim -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Oct 20 09:52:33 2015 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 Oct 2015 09:52:33 -0600 Subject: [maker-devel] Failed while doing blastx of proteins In-Reply-To: References: Message-ID: <68B95736-10CD-4C90-8092-3AA754250799@gmail.com> Make sure you installed the CPAN version of BioPerl and not BioPerl live (the current version is 1.6.924). Also there are a couple of BLAST+ versions that have bugs. Use BLAST+ version 2.2.28. What version of MAKER are you using? It should be 2.31.8. Also check that your /tmp directory is not full (a full /tmp will result in truncated output files). Thanks, Carson > On Oct 19, 2015, at 9:56 PM, Sok Lim Chew <14.chewsoklim at gmail.com> wrote: > > Hi all, > > The following errors occurred while I was using MAKER for annotation. I have searched around this forum but seems like the solutions provided do not works for me.
> > ################################################################# > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Must have defined a valid name for Hit > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:486 > STACK: Bio::Search::Hit::GenericHit::new /usr/local/share/perl5/Bio/Search/Hit/GenericHit.pm:149 > STACK: Bio::Search::Hit::PhatHit::Base::new maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm:127 > STACK: Bio::Search::Hit::PhatHit::blastx::new maker/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm:125 > STACK: Bio::Search::Hit::HitFactory::create /usr/local/share/perl5/Bio/Search/Hit/HitFactory.pm:124 > STACK: Bio::Factory::ObjectFactoryI::create_object /usr/local/share/perl5/Bio/Factory/ObjectFactoryI.pm:114 > STACK: Bio::Search::Iteration::GenericIteration::newhits_below_threshold /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:506 > STACK: Bio::Search::Iteration::GenericIteration::newhits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:488 > STACK: Bio::Search::Iteration::GenericIteration::hits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:469 > STACK: Bio::Search::Result::BlastResult::hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:168 > STACK: Bio::Search::Result::BlastResult::num_hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:242 > STACK: Widget::blastx::keepers > maker/bin/../lib/Widget/blastx.pm:164 > STACK: Widget::blastx::parse > maker/bin/../lib/Widget/blastx.pm:132 > STACK: GI::blastx_as_chunks > maker/bin/../lib/GI.pm:2457 > STACK: GI::blastx_as_chunks > maker/bin/../lib/GI.pm:2466 > STACK: Process::MpiChunk::_go > maker/bin/../lib/Process/MpiChunk.pm:2687 > STACK: Process::MpiChunk::run > maker/bin/../lib/Process/MpiChunk.pm:341 > STACK: Process::MpiChunk::run_all > maker/bin/../lib/Process/MpiChunk.pm:357 > STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 > STACK: 
Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 > STACK: maker/bin/maker:686 > ----------------------------------------------------------- > --> rank=NA, hostname=gena2 > --> rank=NA, hostname=gena2 > --> rank=NA, hostname=gena2 > --> rank=NA, hostname=gena2 > ERROR: Failed while doing blastx of proteins > ERROR: Chunk failed at level:8, tier_type:3 > FAILED CONTIG:Contig1 > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:Contig1 > > examining contents of the fasta file and run log > > ########################################################### > > Is anyone has any idea on this? > > Thanks, > SokLim > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcsimenc at gmail.com Tue Oct 20 17:54:21 2015 From: mcsimenc at gmail.com (Matt Simenc) Date: Tue, 20 Oct 2015 16:54:21 -0700 Subject: [maker-devel] MPI large load Message-ID: Hi, I am using OpenMPI to run MAKER on 2 nodes with 40 CPUs/node. The load is distributing across the nodes ok but with a very large number of processes on each node. Sometimes there are several hundred more processes than can be executed at one time by a node. Is this a problem? If so, any suggestions on how to fix? Thanks! Matt Simenc Der Lab California State University Fullerton -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Wed Oct 21 12:44:28 2015 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 21 Oct 2015 18:44:28 +0000 Subject: [maker-devel] Failed while doing blastx of proteins In-Reply-To: <68B95736-10CD-4C90-8092-3AA754250799@gmail.com> References: <68B95736-10CD-4C90-8092-3AA754250799@gmail.com> Message-ID: <5CCEB170-2B41-4CC1-A9CB-C246274345B9@illinois.edu> Agreed. 
It would be nice to know whether this is a BioPerl bug that needs addressing, but I'm not sure how easy it would be to pull out a test case. chris On Oct 20, 2015, at 10:52 AM, Carson Holt wrote: Make sure you installed the CPAN version of BioPerl and not BioPerl live (Current version is 1.6.924). Also there are a couple of BLAST+ versions that have bugs. Use version BLAST+ version 2.2.28. What version of MAKER are you using? Should be 2.31.8. Also check that your /tmp directory is not full (will result in truncated output files). Thanks, Carson On Oct 19, 2015, at 9:56 PM, Sok Lim Chew <14.chewsoklim at gmail.com> wrote: Hi all, The following errors occurred while I was using MAKER for annotation. I have searched around this forum but seems like the solutions provided do not works for me. ################################################################# ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Must have defined a valid name for Hit STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:486 STACK: Bio::Search::Hit::GenericHit::new /usr/local/share/perl5/Bio/Search/Hit/GenericHit.pm:149 STACK: Bio::Search::Hit::PhatHit::Base::new maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm:127 STACK: Bio::Search::Hit::PhatHit::blastx::new maker/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm:125 STACK: Bio::Search::Hit::HitFactory::create /usr/local/share/perl5/Bio/Search/Hit/HitFactory.pm:124 STACK: Bio::Factory::ObjectFactoryI::create_object /usr/local/share/perl5/Bio/Factory/ObjectFactoryI.pm:114 STACK: Bio::Search::Iteration::GenericIteration::newhits_below_threshold /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:506 STACK: Bio::Search::Iteration::GenericIteration::newhits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:488 STACK: Bio::Search::Iteration::GenericIteration::hits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:469 STACK:
Bio::Search::Result::BlastResult::hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:168 STACK: Bio::Search::Result::BlastResult::num_hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:242 STACK: Widget::blastx::keepers maker/bin/../lib/Widget/blastx.pm:164 STACK: Widget::blastx::parse maker/bin/../lib/Widget/blastx.pm:132 STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2457 STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2466 STACK: Process::MpiChunk::_go maker/bin/../lib/Process/MpiChunk.pm:2687 STACK: Process::MpiChunk::run maker/bin/../lib/Process/MpiChunk.pm:341 STACK: Process::MpiChunk::run_all maker/bin/../lib/Process/MpiChunk.pm:357 STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287 STACK: maker/bin/maker:686 ----------------------------------------------------------- --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 --> rank=NA, hostname=gena2 ERROR: Failed while doing blastx of proteins ERROR: Chunk failed at level:8, tier_type:3 FAILED CONTIG:Contig1 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:Contig1 examining contents of the fasta file and run log ########################################################### Is anyone has any idea on this? Thanks, SokLim _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
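
Two of the checks suggested in this thread (the BLAST+ version and a full /tmp) can be scripted before a long run. The following is only a minimal sketch, not part of MAKER: the function names are made up for illustration, and the sample string "blastx: 2.2.28+" is an assumption about what `blastx -version` prints on its first line. The recommended version (2.2.28) is the one named in the message above.

```python
import re
import shutil
import tempfile

# Version recommended in the thread; treat it as advice from 2015, not a rule.
RECOMMENDED_BLAST = (2, 2, 28)


def parse_blast_version(version_output):
    """Return (major, minor, patch) from text such as 'blastx: 2.2.28+'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    return tuple(int(part) for part in match.groups())


def free_fraction(path):
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


if __name__ == "__main__":
    tmp_dir = tempfile.gettempdir()
    print("temp dir: %s, free fraction: %.2f" % (tmp_dir, free_fraction(tmp_dir)))
    # Sample text only; in practice pass in the captured output of `blastx -version`.
    found = parse_blast_version("blastx: 2.2.28+")
    print("BLAST+ version matches recommendation:", found == RECOMMENDED_BLAST)
```

A low free fraction does not prove that output was truncated; it only indicates that the /tmp explanation is worth ruling out before looking for a BioPerl bug.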
From carsonhh at gmail.com Thu Oct 22 10:39:21 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 22 Oct 2015 10:39:21 -0600
Subject: [maker-devel] MPI large load
In-Reply-To:
References:
Message-ID: <4733DA58-3277-4703-A660-4EE819858694@gmail.com>

Because MAKER is a pipeline, every external program it calls (BLAST, etc.) runs as a separate process. It also spawns a couple of helper processes to watch communication and files. The helper processes use 0% CPU, and the main MAKER process yields to external system calls and processes until they finish execution, so together they will never use a larger % of CPU than is specified.

Also, MPI works by spawning a separate process for every CPU specified, so if you specify 40 CPUs you get 40 independent communicating processes rather than 1 process accessing 40 CPUs. If you take into account the MPI processes, helper processes, and external system calls, a 40-CPU specification could result in up to three times that many processes existing simultaneously (even though no more than 40 will be active at a time).

However, if your system is having an issue letting the required number of processes exist, then it is a ulimit issue: your administrator has the limit set too low. You can see what limits are set using the command "ulimit -a". You will need to get your system admin to fix it.

--Carson

> On Oct 20, 2015, at 5:54 PM, Matt Simenc wrote: > > Hi, > > > > I am using OpenMPI to run MAKER on 2 nodes with 40 CPUs/node. The load is distributing across the nodes ok but with a very large number of processes on each node. Sometimes there are several hundred more processes than can be executed at one time by a node. Is this a problem? If so, any suggestions on how to fix? > > > > Thanks!
> > > > Matt Simenc > > Der Lab > > California State University Fullerton > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmartin at genome.wustl.edu Tue Oct 27 15:35:38 2015 From: jmartin at genome.wustl.edu (John Martin) Date: Tue, 27 Oct 2015 16:35:38 -0500 Subject: [maker-devel] ERROR: Failed while clustering transcripts into genes for annotations Message-ID: <562FEE2A.4070907@genome.wustl.edu> I'm working on annotation for a de novo genomic assembly, and I have my jobs split into roughly equal (by seq length) batches. I am seeing ~2/3rds of these batches completing successfully, while the other 1/3rd is failing. I identified a problem contig, and have been doing test maker runs on that to try and figure out whats going on. The full batch was run using an older version of maker (v2.26), so I first tried updating to the latest version of maker (v2.31.8). That version did point out one problem in an EST evidence file I was using, which I fixed. That allowed maker to get much farther, but as it was nearing the end of the run it crashed again with this error message: ++++++++++++++++++++++++++++++++++ setting up GFF3 output and fasta chunks processing the chunk divide preparing evidence clusters for annotations Preparing evidence for hint based annotation in cluster::shadow_cluster... ...finished clustering. cleaning clusters.... 
total clusters:1 now processing 0 ...processing 0 of 8 ...processing 1 of 8 ...processing 2 of 8 ...processing 3 of 8 ...processing 4 of 8 ...processing 5 of 8 ...processing 6 of 8 ...processing 7 of 8 ...processing 0 of 13 ...processing 1 of 13 ...processing 2 of 13 ...processing 3 of 13 ...processing 4 of 13 ...processing 5 of 13 ...processing 6 of 13 ...processing 7 of 13 ...processing 8 of 13 ...processing 9 of 13 ...processing 10 of 13 ...processing 11 of 13 ...processing 12 of 13 annotating transcripts Making transcripts clustering transcripts into genes for annotations Processing transcripts into genes ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Calling translate without a seq argument! STACK: Error::throw STACK: Bio::Root::Root::throw /home/ec2-user/bin/BioPerl-1.6.923/Bio/Root/Root.pm:486 STACK: Bio::Tools::CodonTable::translate /home/ec2-user/bin/BioPerl-1.6.923/Bio/Tools/CodonTable.pm:414 STACK: PhatHit_utils::_adjust /home/ec2-user/bin/maker/bin/../lib/PhatHit_utils.pm:846 STACK: PhatHit_utils::adjust_start_stop /home/ec2-user/bin/maker/bin/../lib/PhatHit_utils.pm:794 STACK: maker::auto_annotator::load_transcript_struct /home/ec2-user/bin/maker/bin/../lib/maker/auto_annotator.pm:2198 STACK: maker::auto_annotator::group_transcripts /home/ec2-user/bin/maker/bin/../lib/maker/auto_annotator.pm:2676 STACK: maker::auto_annotator::annotate_genes /home/ec2-user/bin/maker/bin/../lib/maker/auto_annotator.pm:1018 STACK: Process::MpiChunk::_go /home/ec2-user/bin/maker/bin/../lib/Process/MpiChunk.pm:3847 STACK: Process::MpiChunk::run /home/ec2-user/bin/maker/bin/../lib/Process/MpiChunk.pm:341 STACK: Process::MpiChunk::run_all /home/ec2-user/bin/maker/bin/../lib/Process/MpiChunk.pm:357 STACK: Process::MpiTiers::run_all /home/ec2-user/bin/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all /home/ec2-user/bin/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: /home/ec2-user/bin/maker/bin/maker:686 
----------------------------------------------------------- --> rank=NA, hostname=ip-172-31-35-77.us-west-2.compute.internal ERROR: Failed while clustering transcripts into genes for annotations ERROR: Chunk failed at level:2, tier_type:4 FAILED CONTIG:ANCCEYDFT_Contig1675 ERROR: Chunk failed at level:6, tier_type:0 FAILED CONTIG:ANCCEYDFT_Contig1675 examining contents of the fasta file and run log --Next Contig-- Processing run.log file... Maker is now finished!!! Start_time: 1445912338 End_time: 1445913817 Elapsed: 1479 ++++++++++++++++++++++++++++++++++ The root of the error seems clearly stated: MSG: Calling translate without a seq argument! but I don't know what that means in real terms. All my inputs appear valid. The contig I am testing with has 1 plus strand gene represented in the evidence files. And I've set a local TMP directory since I've read that sometimes these kinds of problems can stem from the program running out of TMP space. I am pretty sure that is not happening here (I put TMP on a disk with 1.1Tb of space, and the test contig is only 13kbp). Can anyone help me figure out what is going on? Thanks, John Martin ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. 
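
The approach described above (isolating one failing contig and re-running MAKER on it alone) can be made repeatable with a small helper. This is a minimal sketch under stated assumptions: the function names are hypothetical, record IDs are taken to be the first word of each FASTA header, and nothing here diagnoses the "translate" exception itself. It only builds the single-contig test input and flags zero-length records, which are one easy thing to rule out in genome or evidence files.

```python
def read_fasta(text):
    """Parse FASTA text into an ordered {id: sequence} dict (id = first header word)."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def extract_contig(text, contig_id, width=60):
    """Return one record as FASTA text, re-wrapped to `width` columns."""
    sequences = read_fasta(text)
    if contig_id not in sequences:
        raise KeyError("contig not found: %s" % contig_id)
    seq = sequences[contig_id]
    wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">%s\n%s\n" % (contig_id, "\n".join(wrapped))


def empty_records(text):
    """IDs of records whose sequence is empty."""
    return [name for name, seq in read_fasta(text).items() if not seq]


if __name__ == "__main__":
    demo = ">c1 first contig\nACGT\nAC\n>c2\n\n>c3\nGG\n"
    print(extract_contig(demo, "c1"), end="")
    print("empty records:", empty_records(demo))
```

For the case in this message, the same call would be made with the genome FASTA text and the contig name reported in the log (ANCCEYDFT_Contig1675).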
From carsonhh at gmail.com Fri Oct 9 12:58:02 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 9 Oct 2015 12:58:02 -0600
Subject: [maker-devel] different exons predicted in different maker rounds
In-Reply-To: <2FD50373-B02F-49C0-8414-EF6705BB5826@mpi-cbg.de>
References: <2FD50373-B02F-49C0-8414-EF6705BB5826@mpi-cbg.de>
Message-ID: <1D5FAF7F-4758-4C4A-90EB-31C7BAE34725@gmail.com>

Some of your cufflinks evidence is contradicting the existence of the exon.
When you have alt_splice=0, the evidence is passed in in its entirety, and all sources are equal. When you have alt_splice=1 set, certain pieces of spliced evidence are given higher priority (in an iterative fashion) and cannot be overridden by contradictory evidence. The result is that there is a specific combination of evidence-based hints that allows the gene predictor to find the exon, but when it sees everything the HMM doesn't score the exon as very likely.

I'd recommend not running with cufflinks, because its results usually have low specificity, so it generates a lot of bad hints. Use Trinity instead to assemble everything, then allow it to be aligned inside of MAKER. Trinity-assembled contigs give much greater specificity in the final results.

--Carson

> On Oct 5, 2015, at 6:06 AM, roscito wrote: > > Dear all, > > First of all, I'd like to thank everyone in this forum for all the tips and comments on the best strategies for running MAKER, they have been really helpful so far. > However, I still don't fully understand the behaviour of MAKER when ran iteratively, and I compare the predictions from each round. Let me explain: > > My input data are the following: > - the repeat-masked genome of a vertebrate (~2Gb); > - mRNA data for this species mapped to the genome with tophat2 and assembled into transcripts with cufflinks; > - exonerate-mapped proteins in gff3 format to the reference genome, from closely related species (global alignment) > > For the first round of MAKER, I provided both cufflinks and exonerate-mapped proteins with the options est2genome and protein2genome = 1. From maker output, I generated the SNAP .hmm file (as the instructions in http://gmod.org/wiki/MAKER_Tutorial ) and provided it as input to the second round of MAKER. > For this second round I still gave cufflinks + exonerated proteins, but switched both est2genome ad protein2genome to 0.
After finished, I generated SNAP .hmm once more and provided it for the 3rd and final round of MAKER, along with cufflinks and exonerated-mapped prots and est/prot2genome=0 > > As sort of a sanity check, I went on and ran a 4th round of MAKER with the SNAP .hmm file from round3, cufflinks and exonerated-mapped prots and est/prot2genome=0, and this time specifying alt_splice=1. > For all the rounds, I also specified single_exon=1. > > > I loaded the gene predictions from each round plus the cufflink transcripts and the exonerated proteins to the genome browser to visually inspect the output. I saw a few strange cases where MAKER doesn't seem to use the protein/mRNA evidences for the gene predictions, and I would greatly appreciate any feedback/ideas on what I could possible be doing wrong. Here are a few screenshots so you know what I'm talking about: > > In this first example, MAKER misses a conserved exon for which there is both protein and mRNA evidence, and only if I specify alt_splice I get the exon 'back'. > > > > In this second example, MAKER completely ignores lots of exons, all conserved across vertebrates, and supported by protein/mRNA evidence. > > > > In the third example, there is no prediction from round1, the one from round2 matches the protein/mRNA evidence, and then in the final round3 and 4, an extra exon appears. > > > > > (hope you'l be able to see the images above) > As I said, I would greatly appreciate any feedback on these strange cases. Perhaps I'm missing some parameter(s)? > > Thanks a lot. > All the best, > Juliana > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
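
One simple way to read "evidence contradicting the existence of the exon" is a spliced transcript whose intron spans the whole exon, that is, the transcript skips it. The sketch below is an illustration of that reading only; it is not MAKER's internal logic, and the function names, transcript names, and coordinates are all hypothetical (1-based, inclusive exon intervals on one strand). It could be used to list which cufflinks transcripts skip an exon seen in the browser.

```python
def introns(exons):
    """Intron intervals between consecutive exons of one transcript."""
    ordered = sorted(exons)
    return [(left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
            if right_start - 1 >= left_end + 1]


def contradicting_transcripts(exon, transcripts):
    """Names of transcripts that have an intron fully spanning `exon`."""
    start, end = exon
    hits = []
    for name, exons in transcripts.items():
        if any(i_start <= start and end <= i_end
               for i_start, i_end in introns(exons)):
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Hypothetical evidence: one transcript includes the exon, one skips it.
    evidence = {
        "tx_supports": [(100, 200), (300, 400), (500, 600)],
        "tx_skips": [(100, 200), (500, 600)],
    }
    print(contradicting_transcripts((300, 400), evidence))
```

If many assembled transcripts show up in such a list for well-conserved exons, that is consistent with the low-specificity concern raised in the reply, although it does not by itself show which transcripts are wrong.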
From kmkocot at gmail.com Fri Oct 16 22:10:14 2015
From: kmkocot at gmail.com (Kevin Kocot)
Date: Sat, 17 Oct 2015 14:10:14 +1000
Subject: [maker-devel] Maker not producing expected output
Message-ID: <5621CA26.8050807@uq.edu.au>

Hello,

I've run Maker on a draft invertebrate genome and it seemed to finish successfully. However, many of the expected output files were not produced. If I go to, for example, XX_datastore/00/0C/scaffold-334630/, all I see is:

theVoid.scaffold-334630
run.log
scaffold-334630.gff

In particular, I'm looking for the transcripts and proteins fasta files. I'm sure I have a configuration setting incorrect or one of the dependencies not correctly installed, but I can't figure out what the problem is. Any thoughts on how I can resolve this issue and generate these files? Ideally I would love to be able to generate these files without having to run the whole pipeline again.

Details on my configuration settings and the contents of the run.log file from my example above are pasted below.
Thank you, Kevin ----- run.log from the example folder above looks like this: ----- SHARED_ID d574e9ca9b0019a9fe147ccb9db3588b CTL_OPTIONS maker_gff CTL_OPTIONS other_gff CTL_OPTIONS est test-transcriptome.fa CTL_OPTIONS est_reads CTL_OPTIONS altest KK273.fa CTL_OPTIONS est_gff CTL_OPTIONS altest_gff CTL_OPTIONS protein test-AA.fa CTL_OPTIONS protein_gff CTL_OPTIONS model_org all CTL_OPTIONS repeat_protein te_proteins.fasta CTL_OPTIONS rmlib CTL_OPTIONS rm_gff CTL_OPTIONS organism_type eukaryotic CTL_OPTIONS predictor est2genome,genemark,protein2genome CTL_OPTIONS est2genome 1 CTL_OPTIONS altest2genome 0 CTL_OPTIONS snaphmm CTL_OPTIONS gmhmm output/gmhmm.mod CTL_OPTIONS augustus_species CTL_OPTIONS fgenesh_par_file CTL_OPTIONS model_gff CTL_OPTIONS pred_gff CTL_OPTIONS max_dna_len 100000 CTL_OPTIONS split_hit 10000 CTL_OPTIONS pred_flank 200 CTL_OPTIONS pred_stats 0 CTL_OPTIONS min_protein 0 CTL_OPTIONS AED_threshold 1 CTL_OPTIONS single_exon 0 CTL_OPTIONS single_length 250 CTL_OPTIONS keep_preds 0 CTL_OPTIONS map_forward 0 CTL_OPTIONS est_forward 0 CTL_OPTIONS correct_est_fusion 0 CTL_OPTIONS alt_splice 0 CTL_OPTIONS always_complete 0 CTL_OPTIONS alt_peptide C CTL_OPTIONS evaluate 0 CTL_OPTIONS blast_type ncbi+ CTL_OPTIONS softmask 1 CTL_OPTIONS pcov_blastn 0.8 CTL_OPTIONS pid_blastn 0.85 CTL_OPTIONS eval_blastn 1e-10 CTL_OPTIONS bit_blastn 40 CTL_OPTIONS depth_blastn 0 CTL_OPTIONS pcov_rm_blastx 0.5 CTL_OPTIONS pid_rm_blastx 0.4 CTL_OPTIONS eval_rm_blastx 1e-06 CTL_OPTIONS bit_rm_blastx 30 CTL_OPTIONS pcov_blastx 0.5 CTL_OPTIONS pid_blastx 0.4 CTL_OPTIONS depth_blastx 0 CTL_OPTIONS eval_blastx 1e-06 CTL_OPTIONS bit_blastx 30 CTL_OPTIONS pcov_tblastx 0.8 CTL_OPTIONS pid_tblastx 0.85 CTL_OPTIONS eval_tblastx 1e-10 CTL_OPTIONS bit_tblastx 40 CTL_OPTIONS depth_tblastx 0 CTL_OPTIONS ep_score_limit 20 CTL_OPTIONS en_score_limit 20 CTL_OPTIONS enable_fathom 0 CTL_OPTIONS unmask 0 CTL_OPTIONS model_pass 0 CTL_OPTIONS est_pass 0 CTL_OPTIONS altest_pass 0 CTL_OPTIONS 
protein_pass 0 CTL_OPTIONS rm_pass 0 CTL_OPTIONS other_pass 0 CTL_OPTIONS pred_pass 0 CTL_OPTIONS run genemark LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section LOGCHILD 
/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0 ----- maker_opts ----- #-----Genome (these are always 
required) genome=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.fas #genome sequence (fasta file or fasta embeded in GFF3 file) organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic #-----Re-annotation Using MAKER Derived GFF3 maker_gff= #MAKER derived GFF3 file est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no #-----EST Evidence (for best results provide a file for at least one) est=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-transcriptome.fa #set of ESTs or assembled mRNA-seq in fasta format altest=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/KK273.fa #EST/cDNA sequence file in fasta format from an alternate organism est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file altest_gff= #aligned ESTs from a closly relate species in GFF3 format #-----Protein Homology Evidence (for best results provide a file for at least one) protein=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-AA.fa #protein sequence file in fasta format (i.e. 
from mutiple oransisms) protein_gff= #aligned protein homology evidence from an external GFF3 file #-----Repeat Masking (leave values blank to skip repeat masking) model_org=all #select a model organism for RepBase masking in RepeatMasker rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker repeat_protein=/usr/local/bin/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner rm_gff= #pre-identified repeat elements from an external GFF3 file prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering) #-----Gene Prediction snaphmm= #SNAP HMM file gmhmm=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/output/gmhmm.mod #GeneMark HMM file augustus_species= #Augustus gene prediction species model fgenesh_par_file= #FGENESH parameter file pred_gff= #ab-initio predictions from an external GFF3 file model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no snoscan_rrna= #rRNA file to have Snoscan find snoRNAs unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no #-----Other Annotation Feature Types (features MAKER doesn't recognize) other_gff= #extra features to pass-through to final MAKER generated GFF3 file #-----External Application Behavior Options alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) #-----MAKER Behavior Options max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) min_contig=1 #skip genome contigs below this 
length (under 10kb are often useless) pred_flank=200 #flank for extending evidence clusters sent to gene predictors pred_stats=0 #report AED and QI statistics for all predictions as well as models AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) min_protein=0 #require at least this many amino acids in predicted proteins alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments) single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes tries=2 #number of times to try a contig if there is a failure for some reason clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no TMP= #specify a directory other than the system default temporary directory for temporary files -- Kevin M. Kocot, Ph.D. NSF International Postdoctoral Research Fellow Degnan Lab The University of Queensland School of Biological Sciences 325 Goddard Building 8 St. 
Lucia, QLD 4072 Australia Ph: +61 0402 488 430

From dence at genetics.utah.edu Sat Oct 17 09:46:09 2015
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 17 Oct 2015 15:46:09 +0000
Subject: [maker-devel] Maker not producing expected output
In-Reply-To: <5621CA26.8050807@uq.edu.au>
References: <5621CA26.8050807@uq.edu.au>
Message-ID:

Hi Kevin,

So I have a couple of clarifying questions, and an explanation that'll hopefully be helpful.

First, if you look in the master datastore log, do you see an entry that shows that scaffold finished successfully? It will have the name of the scaffold, then the path to the results directory, and then a status. There should be one entry that shows that maker started working on it, and one that shows that maker finished it.

Second, what are the files that you're expecting to see? I think you're expecting to see a couple of fasta files and a gff3 file that contain all the annotation results gathered together. You can gather those results with the fasta_merge and gff3_merge scripts that came with maker.

To explain what you saw in the example results directory that you sent: if there weren't any models or predictions on that scaffold, then there won't be fasta files in the results directory. You could verify that by looking at the scaffold-334630.gff file. The fasta_merge and gff3_merge scripts will gather all of the results fasta and gff files for all the scaffolds and put them into a few fasta files and one gff3 file, respectively.

Let me know whether that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

> On Oct 16, 2015, at 10:10 PM, Kevin Kocot wrote: > > Hello, > > I've run Maker on a draft invertebrate genome and it seemed to finish successfully. However, many of the expected output files were not produced.
If I go to, for example, XX_datastore/00/0C/scaffold-334630/, all I see is:
>
> theVoid.scaffold-334630
> run.log
> scaffold-334630.gff
>
> In particular, I'm looking for the transcripts and proteins fasta files. I'm sure I have a configuration setting incorrect or one of the dependencies not correctly installed, but I can't figure out what the problem is. Any thoughts on how I can resolve this issue and generate these files? Ideally I would love to be able to generate these files without having to run the whole pipeline again. Details on my configuration settings and the contents of the run.log file from my example above are pasted below.
>
> Thank you,
> Kevin
>
> -----
> run.log from the example folder above looks like this:
> -----
> SHARED_ID d574e9ca9b0019a9fe147ccb9db3588b
> CTL_OPTIONS maker_gff
> CTL_OPTIONS other_gff
> CTL_OPTIONS est test-transcriptome.fa
> CTL_OPTIONS est_reads
> CTL_OPTIONS altest KK273.fa
> CTL_OPTIONS est_gff
> CTL_OPTIONS altest_gff
> CTL_OPTIONS protein test-AA.fa
> CTL_OPTIONS protein_gff
> CTL_OPTIONS model_org all
> CTL_OPTIONS repeat_protein te_proteins.fasta
> CTL_OPTIONS rmlib
> CTL_OPTIONS rm_gff
> CTL_OPTIONS organism_type eukaryotic
> CTL_OPTIONS predictor est2genome,genemark,protein2genome
> CTL_OPTIONS est2genome 1
> CTL_OPTIONS altest2genome 0
> CTL_OPTIONS snaphmm
> CTL_OPTIONS gmhmm output/gmhmm.mod
> CTL_OPTIONS augustus_species
> CTL_OPTIONS fgenesh_par_file
> CTL_OPTIONS model_gff
> CTL_OPTIONS pred_gff
> CTL_OPTIONS max_dna_len 100000
> CTL_OPTIONS split_hit 10000
> CTL_OPTIONS pred_flank 200
> CTL_OPTIONS pred_stats 0
> CTL_OPTIONS min_protein 0
> CTL_OPTIONS AED_threshold 1
> CTL_OPTIONS single_exon 0
> CTL_OPTIONS single_length 250
> CTL_OPTIONS keep_preds 0
> CTL_OPTIONS map_forward 0
> CTL_OPTIONS est_forward 0
> CTL_OPTIONS correct_est_fusion 0
> CTL_OPTIONS alt_splice 0
> CTL_OPTIONS always_complete 0
> CTL_OPTIONS alt_peptide C
> CTL_OPTIONS evaluate 0
> CTL_OPTIONS blast_type ncbi+
> CTL_OPTIONS softmask 1
> CTL_OPTIONS pcov_blastn 0.8
> CTL_OPTIONS pid_blastn 0.85
> CTL_OPTIONS eval_blastn 1e-10
> CTL_OPTIONS bit_blastn 40
> CTL_OPTIONS depth_blastn 0
> CTL_OPTIONS pcov_rm_blastx 0.5
> CTL_OPTIONS pid_rm_blastx 0.4
> CTL_OPTIONS eval_rm_blastx 1e-06
> CTL_OPTIONS bit_rm_blastx 30
> CTL_OPTIONS pcov_blastx 0.5
> CTL_OPTIONS pid_blastx 0.4
> CTL_OPTIONS depth_blastx 0
> CTL_OPTIONS eval_blastx 1e-06
> CTL_OPTIONS bit_blastx 30
> CTL_OPTIONS pcov_tblastx 0.8
> CTL_OPTIONS pid_tblastx 0.85
> CTL_OPTIONS eval_tblastx 1e-10
> CTL_OPTIONS bit_tblastx 40
> CTL_OPTIONS depth_tblastx 0
> CTL_OPTIONS ep_score_limit 20
> CTL_OPTIONS en_score_limit 20
> CTL_OPTIONS enable_fathom 0
> CTL_OPTIONS unmask 0
> CTL_OPTIONS model_pass 0
> CTL_OPTIONS est_pass 0
> CTL_OPTIONS altest_pass 0
> CTL_OPTIONS protein_pass 0
> CTL_OPTIONS rm_pass 0
> CTL_OPTIONS other_pass 0
> CTL_OPTIONS pred_pass 0
> CTL_OPTIONS run genemark
> LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
> LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
> LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
> STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark
> FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark
> STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section
> FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section
> LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
> LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
> LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
> STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section
> FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section
> LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
> LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
> LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
>
> -----
> maker_opts
> -----
> #-----Genome (these are always required)
> genome=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.fas #genome sequence (fasta file or fasta embeded in GFF3 file)
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
>
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff= #MAKER derived GFF3 file
> est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no
>
> #-----EST Evidence (for best results provide a file for at least one)
> est=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-transcriptome.fa #set of ESTs or assembled mRNA-seq in fasta format
> altest=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/KK273.fa #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closly relate species in GFF3 format
>
> #-----Protein Homology Evidence (for best results provide a file for at least one)
> protein=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-AA.fa #protein sequence file in fasta format (i.e. from mutiple oransisms)
> protein_gff= #aligned protein homology evidence from an external GFF3 file
>
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org=all #select a model organism for RepBase masking in RepeatMasker
> rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
> repeat_protein=/usr/local/bin/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
> softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
>
> #-----Gene Prediction
> snaphmm= #SNAP HMM file
> gmhmm=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/output/gmhmm.mod #GeneMark HMM file
> augustus_species= #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff= #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
>
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3 file
>
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
> cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
>
> #-----MAKER Behavior Options
> max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
> min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>
> pred_flank=200 #flank for extending evidence clusters sent to gene predictors
> pred_stats=0 #report AED and QI statistics for all predictions as well as models
> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
> min_protein=0 #require at least this many amino acids in predicted proteins
> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
> keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
>
> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
> single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
>
> tries=2 #number of times to try a contig if there is a failure for some reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
> TMP= #specify a directory other than the system default temporary directory for temporary files
>
> --
> Kevin M. Kocot, Ph.D.
> NSF International Postdoctoral Research Fellow
> Degnan Lab
> The University of Queensland
> School of Biological Sciences
> 325 Goddard Building 8
> St. Lucia, QLD 4072
> Australia
> Ph: +61 0402 488 430
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Sat Oct 17 14:24:42 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 17 Oct 2015 14:24:42 -0600
Subject: [maker-devel] Maker not producing expected output
In-Reply-To: <5621CA26.8050807@uq.edu.au>
References: <5621CA26.8050807@uq.edu.au>
Message-ID: <1117A475-526B-477A-B44A-E86A4A23262B@gmail.com>

You will only get fasta files for the contig when there are gene models present on that contig. The only ab initio predictor you provided parameters for is GeneMark, and apparently it did not predict any genes for the contig in question. If it had, you would have at least a fasta file in the output containing all predictions made by GeneMark.
MAKER doesn't make gene models, rather it provides hints to other gene predictors based on the evidence alignments and then promotes and polishes the models they make. If they produce no models, then you will get no results. You can try adding additional gene predictors like SNAP (in case GeneMark just isn't performing well), or you can check the length of your contig (contigs shorter than about 10kb rarely produce any results - they are too short to be annotatable). Try looking at the results from one of the larger contigs, or use fasta_merge to gather all results from all contigs.

--Carson

> On Oct 16, 2015, at 10:10 PM, Kevin Kocot wrote:
>
> Hello,
>
> I've run Maker on a draft invertebrate genome and it seemed to finish successfully. However, many of the expected output files were not produced. If I go to, for example, XX_datastore/00/0C/scaffold-334630/, all I see is:
>
> theVoid.scaffold-334630
> run.log
> scaffold-334630.gff
>
> In particular, I'm looking for the transcripts and proteins fasta files. I'm sure I have a configuration setting incorrect or one of the dependencies not correctly installed, but I can't figure out what the problem is. Any thoughts on how I can resolve this issue and generate these files? Ideally I would love to be able to generate these files without having to run the whole pipeline again. Details on my configuration settings and the contents of the run.log file from my example above are pasted below.
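
The contig-length check suggested in the reply above can be sketched in Python. This is only an illustration under stated assumptions: the genome is an uncompressed FASTA file, the function names `contig_lengths` and `short_contigs` are made up for this example (they are not part of MAKER), and the 10,000 bp default simply mirrors the "about 10kb" rule of thumb from the reply rather than any hard limit.

```python
def contig_lengths(fasta_lines):
    """Return {contig_id: sequence_length} from an iterable of FASTA lines."""
    lengths = {}
    current = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # The contig id is the first whitespace-delimited word of the header.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def short_contigs(lengths, min_len=10_000):
    """Return the sorted ids of contigs shorter than min_len.

    min_len is an assumed cutoff taken from the rule of thumb above.
    """
    return sorted(name for name, n in lengths.items() if n < min_len)
```

Passing an open genome file to `contig_lengths` and the result to `short_contigs` lists the contigs that are probably too short to yield gene models, which helps decide which datastore directories are worth inspecting.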

From 14.chewsoklim at gmail.com Mon Oct 19 20:49:21 2015
From: 14.chewsoklim at gmail.com (Sok Lim Chew)
Date: Tue, 20 Oct 2015 10:49:21 +0800
Subject: [maker-devel] Failed while doing blastx of proteins
Message-ID:

Hi all,

The following errors occurred while I was using MAKER for annotation. I have searched this forum, but the solutions provided there do not work for me.

#################################################################

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Must have defined a valid name for Hit
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:486
STACK: Bio::Search::Hit::GenericHit::new /usr/local/share/perl5/Bio/Search/Hit/GenericHit.pm:149
STACK: Bio::Search::Hit::PhatHit::Base::new maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm:127
STACK: Bio::Search::Hit::PhatHit::blastx::new maker/bin/../lib/Bio/Search/Hit/PhatHit/blastx.pm:125
STACK: Bio::Search::Hit::HitFactory::create /usr/local/share/perl5/Bio/Search/Hit/HitFactory.pm:124
STACK: Bio::Factory::ObjectFactoryI::create_object /usr/local/share/perl5/Bio/Factory/ObjectFactoryI.pm:114
STACK: Bio::Search::Iteration::GenericIteration::newhits_below_threshold /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:506
STACK: Bio::Search::Iteration::GenericIteration::newhits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:488
STACK: Bio::Search::Iteration::GenericIteration::hits /usr/local/share/perl5/Bio/Search/Iteration/GenericIteration.pm:469
STACK: Bio::Search::Result::BlastResult::hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:168
STACK: Bio::Search::Result::BlastResult::num_hits /usr/local/share/perl5/Bio/Search/Result/BlastResult.pm:242
STACK: Widget::blastx::keepers maker/bin/../lib/Widget/blastx.pm:164
STACK: Widget::blastx::parse maker/bin/../lib/Widget/blastx.pm:132
STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2457
STACK: GI::blastx_as_chunks maker/bin/../lib/GI.pm:2466
STACK: Process::MpiChunk::_go maker/bin/../lib/Process/MpiChunk.pm:2687
STACK: Process::MpiChunk::run maker/bin/../lib/Process/MpiChunk.pm:341
STACK: Process::MpiChunk::run_all maker/bin/../lib/Process/MpiChunk.pm:357
STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287
STACK: Process::MpiTiers::run_all maker/bin/../lib/Process/MpiTiers.pm:287
STACK: maker/bin/maker:686
-----------------------------------------------------------
--> rank=NA, hostname=gena2
--> rank=NA, hostname=gena2
--> rank=NA, hostname=gena2
--> rank=NA, hostname=gena2
ERROR: Failed while doing blastx of proteins
ERROR: Chunk failed at level:8, tier_type:3
FAILED CONTIG:Contig1

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:Contig1

examining contents of the fasta file and run log

###########################################################

Does anyone have any idea what is causing this?

Thanks,
SokLim

From carsonhh at gmail.com Tue Oct 20 09:52:33 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 Oct 2015 09:52:33 -0600
Subject: [maker-devel] Failed while doing blastx of proteins
In-Reply-To:
References:
Message-ID: <68B95736-10CD-4C90-8092-3AA754250799@gmail.com>

Make sure you installed the CPAN version of BioPerl and not BioPerl live (Current version is 1.6.924). Also, there are a couple of BLAST+ versions that have bugs. Use BLAST+ version 2.2.28. What version of MAKER are you using? It should be 2.31.8. Also check that your /tmp directory is not full (a full /tmp will result in truncated output files).

Thanks,
Carson

> On Oct 19, 2015, at 9:56 PM, Sok Lim Chew <14.chewsoklim at gmail.com> wrote:
>
> Hi all,
>
> The following errors occurred while I was using MAKER for annotation. I have searched this forum, but the solutions provided there do not work for me.
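
The /tmp check mentioned in the reply above can be done with the Python standard library. A minimal sketch, assuming that 1 GiB of free space is a sensible warning threshold (an arbitrary figure chosen for illustration, not a MAKER requirement) and that MAKER is writing to the system default temporary directory; if the TMP= option in maker_opts points elsewhere, pass that path instead. The function names are hypothetical.

```python
import shutil
import tempfile

GIB = 1024 ** 3  # bytes in one gibibyte


def tmp_is_low(path=None, min_free_bytes=GIB):
    """Return True if the filesystem holding `path` has less free space
    than min_free_bytes (an assumed threshold, not a MAKER setting)."""
    if path is None:
        # Honors TMPDIR/TEMP/TMP, so this may differ from a literal /tmp.
        path = tempfile.gettempdir()
    return shutil.disk_usage(path).free < min_free_bytes


def describe_tmp(path=None):
    """Return a one-line summary of free and total space for `path`."""
    if path is None:
        path = tempfile.gettempdir()
    usage = shutil.disk_usage(path)
    return "{}: {:.1f} GiB free of {:.1f} GiB".format(
        path, usage.free / GIB, usage.total / GIB
    )
```

Running `describe_tmp()` on the machine that produced the error shows whether a full temporary directory is a plausible cause of truncated BLAST reports before reinstalling anything.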

From mcsimenc at gmail.com Tue Oct 20 17:54:21 2015
From: mcsimenc at gmail.com (Matt Simenc)
Date: Tue, 20 Oct 2015 16:54:21 -0700
Subject: [maker-devel] MPI large load
Message-ID:

Hi,

I am using OpenMPI to run MAKER on 2 nodes with 40 CPUs/node. The load is distributed across the nodes correctly, but with a very large number of processes on each node. Sometimes there are several hundred more processes than a node can execute at one time. Is this a problem? If so, any suggestions on how to fix it?

Thanks!

Matt Simenc
Der Lab
California State University Fullerton

From cjfields at illinois.edu Wed Oct 21 12:44:28 2015
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 21 Oct 2015 18:44:28 +0000
Subject: [maker-devel] Failed while doing blastx of proteins
In-Reply-To: <68B95736-10CD-4C90-8092-3AA754250799@gmail.com>
References: <68B95736-10CD-4C90-8092-3AA754250799@gmail.com>
Message-ID: <5CCEB170-2B41-4CC1-A9CB-C246274345B9@illinois.edu>

Agreed.
It would be nice to know whether this is a BioPerl bug that needs addressing, but I'm not sure how easy it would be to pull out a test case.

chris
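Carson's checklist earlier in this thread (BioPerl 1.6.924 from CPAN, BLAST+ 2.2.28, MAKER 2.31.8, and a /tmp directory that is not full) could be turned into a small pre-flight check. The following is only a sketch under stated assumptions: the function names are hypothetical, the installed versions are typed in by hand rather than queried from the tools, and the 1 GiB free-space threshold is an arbitrary illustration, not a MAKER requirement.

```python
import shutil
import tempfile

# Versions recommended in the thread (October 2015); not a general requirement.
RECOMMENDED = {"BioPerl": "1.6.924", "BLAST+": "2.2.28", "MAKER": "2.31.8"}


def parse_version(text):
    """Turn a plain dotted version string such as '2.31.8' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def version_mismatches(installed):
    """Return {tool: (installed, recommended)} for tools that are missing or differ."""
    mismatches = {}
    for tool, wanted in RECOMMENDED.items():
        found = installed.get(tool)
        if found is None or parse_version(found) != parse_version(wanted):
            mismatches[tool] = (found, wanted)
    return mismatches


def tmp_has_room(path=None, min_free_bytes=1 << 30):
    """Report whether the temporary directory has at least min_free_bytes free.

    The default threshold (1 GiB) is an assumption chosen for illustration.
    """
    path = path or tempfile.gettempdir()
    return shutil.disk_usage(path).free >= min_free_bytes


if __name__ == "__main__":
    # Hypothetical installed versions, entered by hand for illustration.
    installed = {"BioPerl": "1.6.923", "BLAST+": "2.2.28", "MAKER": "2.31.8"}
    print("version mismatches:", version_mismatches(installed))
    print("tmp has >= 1 GiB free:", tmp_has_room())
```

An empty dictionary from `version_mismatches` means the hand-entered versions match the ones named in the thread; it says nothing about whether other versions would also work.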
From carsonhh at gmail.com Thu Oct 22 10:39:21 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 22 Oct 2015 10:39:21 -0600
Subject: [maker-devel] MPI large load
In-Reply-To:
References:
Message-ID: <4733DA58-3277-4703-A660-4EE819858694@gmail.com>

Because MAKER is a pipeline, every external program it calls (e.g. BLAST) runs as a separate process. It also spawns a couple of helper processes to watch communication and files. The helper processes use 0% CPU, and the main MAKER process yields to external system calls and processes until they finish executing, so together they never use a larger percentage of CPU than is specified.

Also, MPI works by spawning a separate process for every CPU specified, so if you specify 40 CPUs you get 40 independent communicating processes rather than 1 process accessing 40 CPUs. If you take into account the MPI processes, helper processes, and external system calls, a 40 CPU specification can result in up to three times that many processes existing simultaneously (even though no more than 40 will be active at a time).

However, if your system is having trouble letting the required number of processes exist, then it is a ulimit issue: your administrator has the limit set too low. You can see what limits are set using the command "ulimit -a". You will need to get your system admin to fix it.

--Carson
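The process arithmetic in Carson's reply can be checked against the per-user process limit before launching a job. This is a rough sketch, not part of MAKER: the function names are hypothetical, the factor of three is the "up to three times" rule of thumb from the reply, and it assumes that the relevant limit is the "max user processes" value that `ulimit -a` reports (RLIMIT_NPROC on Linux). That limit counts every process the user already has running, so the real headroom is smaller than the raw number suggests.

```python
import resource


def estimated_peak_processes(mpi_cpus, factor=3):
    """Upper bound on coexisting processes for one MPI specification.

    factor=3 follows the 'up to three times that many' estimate in the reply.
    """
    return mpi_cpus * factor


def nproc_limit_allows(needed):
    """Compare `needed` with the soft per-user process limit (RLIMIT_NPROC)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return soft == resource.RLIM_INFINITY or soft >= needed


if __name__ == "__main__":
    needed = estimated_peak_processes(40)  # 40 CPUs, as in the example above
    print("estimated peak processes:", needed)
    print("soft limit is high enough:", nproc_limit_allows(needed))
```

If the second line prints False, that points to the ulimit situation described in the reply, which only a system administrator can change.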
From jmartin at genome.wustl.edu Tue Oct 27 15:35:38 2015
From: jmartin at genome.wustl.edu (John Martin)
Date: Tue, 27 Oct 2015 16:35:38 -0500
Subject: [maker-devel] ERROR: Failed while clustering transcripts into genes for annotations
Message-ID: <562FEE2A.4070907@genome.wustl.edu>

I'm working on annotation for a de novo genomic assembly, and I have my jobs split into batches of roughly equal total sequence length. About two thirds of these batches complete successfully, while the other third fail. I identified a problem contig and have been doing test MAKER runs on it to try to figure out what's going on.

The full batch was run using an older version of MAKER (v2.26), so I first tried updating to the latest version (v2.31.8). That version did point out one problem in an EST evidence file I was using, which I fixed. That allowed MAKER to get much farther, but as it was nearing the end of the run it crashed again with this error message:

++++++++++++++++++++++++++++++++++
setting up GFF3 output and fasta chunks
processing the chunk divide
preparing evidence clusters for annotations
Preparing evidence for hint based annotation
in cluster::shadow_cluster...
...finished clustering.
cleaning clusters....
total clusters:1 now processing 0
...processing 0 of 8
...processing 1 of 8
...processing 2 of 8
...processing 3 of 8
...processing 4 of 8
...processing 5 of 8
...processing 6 of 8
...processing 7 of 8
...processing 0 of 13
...processing 1 of 13
...processing 2 of 13
...processing 3 of 13
...processing 4 of 13
...processing 5 of 13
...processing 6 of 13
...processing 7 of 13
...processing 8 of 13
...processing 9 of 13
...processing 10 of 13
...processing 11 of 13
...processing 12 of 13
annotating transcripts
Making transcripts
clustering transcripts into genes for annotations
Processing transcripts into genes

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Calling translate without a seq argument!
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/ec2-user/bin/BioPerl-1.6.923/Bio/Root/Root.pm:486
STACK: Bio::Tools::CodonTable::translate /home/ec2-user/bin/BioPerl-1.6.923/Bio/Tools/CodonTable.pm:414
STACK: PhatHit_utils::_adjust /home/ec2-user/bin/maker/bin/../lib/PhatHit_utils.pm:846
STACK: PhatHit_utils::adjust_start_stop /home/ec2-user/bin/maker/bin/../lib/PhatHit_utils.pm:794
STACK: maker::auto_annotator::load_transcript_struct /home/ec2-user/bin/maker/bin/../lib/maker/auto_annotator.pm:2198
STACK: maker::auto_annotator::group_transcripts /home/ec2-user/bin/maker/bin/../lib/maker/auto_annotator.pm:2676
STACK: maker::auto_annotator::annotate_genes /home/ec2-user/bin/maker/bin/../lib/maker/auto_annotator.pm:1018
STACK: Process::MpiChunk::_go /home/ec2-user/bin/maker/bin/../lib/Process/MpiChunk.pm:3847
STACK: Process::MpiChunk::run /home/ec2-user/bin/maker/bin/../lib/Process/MpiChunk.pm:341
STACK: Process::MpiChunk::run_all /home/ec2-user/bin/maker/bin/../lib/Process/MpiChunk.pm:357
STACK: Process::MpiTiers::run_all /home/ec2-user/bin/maker/bin/../lib/Process/MpiTiers.pm:287
STACK: Process::MpiTiers::run_all /home/ec2-user/bin/maker/bin/../lib/Process/MpiTiers.pm:287
STACK: /home/ec2-user/bin/maker/bin/maker:686
-----------------------------------------------------------
--> rank=NA, hostname=ip-172-31-35-77.us-west-2.compute.internal
ERROR: Failed while clustering transcripts into genes for annotations
ERROR: Chunk failed at level:2, tier_type:4
FAILED CONTIG:ANCCEYDFT_Contig1675

ERROR: Chunk failed at level:6, tier_type:0
FAILED CONTIG:ANCCEYDFT_Contig1675

examining contents of the fasta file and run log

--Next Contig--

Processing run.log file...

Maker is now finished!!!

Start_time: 1445912338
End_time: 1445913817
Elapsed: 1479
++++++++++++++++++++++++++++++++++

The root of the error seems clearly stated:

MSG: Calling translate without a seq argument!

but I don't know what that means in practical terms. All my inputs appear valid. The contig I am testing with has one plus-strand gene represented in the evidence files. I've also set a local TMP directory, since I've read that these kinds of problems can sometimes stem from the program running out of TMP space. I am pretty sure that is not happening here (I put TMP on a disk with 1.1 TB of space, and the test contig is only 13 kbp).

Can anyone help me figure out what is going on?

Thanks,
John Martin
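When many batches fail, the `FAILED CONTIG:<name>` lines that MAKER prints (as in the log above) are a quick way to list the contigs worth re-running in isolation. Below is a minimal sketch, assuming the run's screen output was captured as text; the function name `failed_contigs` and the idea of scanning captured output are illustrative assumptions, not a MAKER feature.

```python
import re

# Matches lines such as "FAILED CONTIG:ANCCEYDFT_Contig1675" seen in the log above.
FAILED_CONTIG = re.compile(r"FAILED CONTIG:(\S+)")


def failed_contigs(log_text):
    """Return failed contig names in first-seen order, without duplicates."""
    seen = []
    for name in FAILED_CONTIG.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    sample = (
        "ERROR: Chunk failed at level:2, tier_type:4\n"
        "FAILED CONTIG:ANCCEYDFT_Contig1675\n"
        "ERROR: Chunk failed at level:6, tier_type:0\n"
        "FAILED CONTIG:ANCCEYDFT_Contig1675\n"
    )
    print(failed_contigs(sample))
```

Each contig is reported once even though MAKER prints the failure at more than one level, which keeps the list short enough to re-run contig by contig.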