From Cenny.Taslim at nationwidechildrens.org Thu Aug 8 08:12:23 2019
From: Cenny.Taslim at nationwidechildrens.org (Taslim, Cenny)
Date: Thu, 8 Aug 2019 13:12:23 +0000
Subject: [maker-devel] maker with mpi support on example still not done after two days
Message-ID: <82339125479f4b278b87bf8458c2c04d@l1perdwmbx02.childrensroot.net>

Hi Maker developers,

Thanks for approving my subscription.
I tried running maker with MPI support on the human FASTA file provided in example_01_basic. It has been running for 2 days and 21 hours, and I didn't think the example would take that long.
I'm hoping someone can help me pinpoint the problem.

I'm running it with 4 processes:
~/opt/mpich-3.3.1/bin/mpiexec -n 4 ~/opt/maker.4/maker/bin/maker -f 2> maker.error

maker_opts.ctl is the same as opts2.txt.

This is the log:
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore

To access files for individual sequences use the datastore index:
/maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log

--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: NT_010783.15
Length: 201444
#---------------------------------------------------------------------

setting up GFF3 output and fasta chunks
doing repeat masking
doing repeat masking
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /gpfs0/scratch/1895302/maker_kXmduG; /opt/maker.4/maker/exe/RepeatMasker/RepeatMasker /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.0.all.rb -species all -dir /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0 -pa 1
#-------------------------------#
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /gpfs0/scratch/1895302/maker_kXmduG; /opt/maker.4/maker/exe/RepeatMasker/RepeatMasker /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.all.rb -species all -dir /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0 -pa 1
#-------------------------------#
doing blastx repeats
formating database...
#--------- command -------------#
Widget::formater:
/opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.0
#-------------------------------#
doing blastx repeats
formating database...
#--------- command -------------#
Widget::formater:
/opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.1
#-------------------------------#
running blast search.
#--------- command -------------#
Widget::blastx:
/opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.0 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
running blast search.
#--------- command -------------#
Widget::blastx:
/opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.1 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.1.repeatrunner
#-------------------------------#
deleted:19 hits
deleted:18 hits
doing blastx repeats
doing blastx repeats
formating database...
#--------- command -------------#
Widget::formater:
/opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.2
#-------------------------------#
formating database...
#--------- command -------------#
Widget::formater:
/opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.3
#-------------------------------#
running blast search.
#--------- command -------------#
Widget::blastx:
/opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.2 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.2.repeatrunner
#-------------------------------#
running blast search.
#--------- command -------------#
Widget::blastx:
/opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.3 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner
#-------------------------------#
deleted:9 hits
deleted:9 hits
doing blastx repeats
doing blastx repeats
formating database...
#--------- command -------------#
Widget::formater:
/opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.5
#-------------------------------#
formating database...
#--------- command -------------#
Widget::formater:
/opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.4
#-------------------------------#
running blast search.
#--------- command -------------#
Widget::blastx:
/opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.5 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner
#-------------------------------#
running blast search.
#--------- command -------------#
Widget::blastx:
/opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.4 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.4.repeatrunner
#-------------------------------#
deleted:16 hits
deleted:16 hits
doing blastx repeats
doing blastx repeats
formating database...
#--------- command -------------#
Widget::formater:
/opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.6
#-------------------------------#
formating database...
#--------- command -------------#
Widget::formater:
/opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.7
#-------------------------------#
running blast search.
#--------- command -------------#
Widget::blastx:
/opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.6 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner
#-------------------------------#
running blast search.
#--------- command -------------#
Widget::blastx:
/opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.7 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.7.repeatrunner
#-------------------------------#
deleted:8 hits
deleted:12 hits
doing blastx repeats
doing blastx repeats
formating database...
#--------- command -------------#
Widget::formater:
/opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.8
#-------------------------------#
formating database...
#--------- command -------------#
Widget::formater:
/opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.9
#-------------------------------#
running blast search.
#--------- command -------------#
Widget::blastx:
/opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.8 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.8.repeatrunner
#-------------------------------#
running blast search.
#--------- command -------------#
Widget::blastx:
/opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.9 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.9.repeatrunner
#-------------------------------#
deleted:5 hits
deleted:7 hits
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
...finished clustering.

Let me know if you need more information.
Thank you in advance for your help.

From debojyoti.das.und at gmail.com Thu Aug 8 13:54:01 2019
From: debojyoti.das.und at gmail.com (Debojyoti Das)
Date: Thu, 8 Aug 2019 13:54:01 -0500
Subject: [maker-devel] maker predicting only part of a gene
Message-ID:

Hi,

I am working on a non-model reptile species. I tried running maker with est2genome=1 and protein2genome=1 with the following evidence:

1. est=transcriptome.fasta (de novo assembled)
2. protein in fasta format from two related species.

I keep getting predictions where the gene models identify multi-exon genes but fail to incorporate all the exons, even though they are present on the scaffolds. In the predicted gene models some exons were correctly identified while others were missed, even though the entire gene is present on the scaffolds; we checked this by loading the annotation in IGV.

Since we are not completely confident in our transcriptome assembly, we thought of using cDNA from a closely related species:

1. altest="cDNA in fasta format from a related species"
2. protein in fasta format from two related species.

However, when I do this I get the error
"ERROR: You must provide some form of EST evidence to use est2genome as a predictor."

Interestingly, if I switch est2genome off (setting it to zero), maker starts running.

Any suggestions on how to proceed?

Best,
Debojyoti

From debojyoti.das at und.edu Thu Aug 8 14:48:36 2019
From: debojyoti.das at und.edu (Das, Debojyoti)
Date: Thu, 8 Aug 2019 19:48:36 +0000
Subject: [maker-devel] maker predicting only part of a gene
Message-ID:

Hi Carson,

I am working on a non-model reptile species. I tried running maker with est2genome=1 and protein2genome=1 with the following evidence:

1. est=transcriptome.fasta (de novo assembled)
2. protein in fasta format from two related species.

I keep getting predictions where the gene models identify multi-exon genes but fail to incorporate all the exons, even though they are present on the scaffolds. In the predicted gene models some exons were correctly identified while others were missed, even though the entire gene is present on the scaffolds; we checked this by loading the annotation in IGV.

Since we are not completely confident in our transcriptome assembly, we thought of using cDNA from a closely related species:

1. altest="cDNA in fasta format from a related species"
2. protein in fasta format from two related species.

However, when I do this I get the error
"ERROR: You must provide some form of EST evidence to use est2genome as a predictor."

Interestingly, if I switch est2genome off (setting it to zero), maker starts running.

Any suggestions on how to proceed?

Best,
Debojyoti

From jmartin at wustl.edu Thu Aug 8 19:41:31 2019
From: jmartin at wustl.edu (Martin, John)
Date: Fri, 9 Aug 2019 00:41:31 +0000
Subject: [maker-devel] Running maker with suboptimal evidence
Message-ID:

Greetings,

I would like to annotate a worm genome of ~90 Mb, but the evidence I have is not all of good quality. I have 2 high-quality protein sets from previously finished and curated, closely related worms. I also have a small amount of RNAseq and some old EST data, neither of which gives good coverage of the transcriptome. And I have a previously run Maker gene set for this worm that I believe was generated using probably all nematodes from GenBank and that poor set of RNAseq. I also have a 'fair' set of predictions from running Braker2 using only the high-quality protein data I mentioned. My opinion that the previous Maker gene set is of poor quality comes from comparing it to my recent Braker2 annotations and to that RNAseq.

I would like to try to build an improved annotation for this assembly, but I'm unsure whether I should use all this evidence or whether I would be better off not using the low-coverage RNAseq, the EST data and the previous (questionable-looking) Maker gene set. I'm looking for opinions: should I throw all the evidence I have into Maker, or should I use only evidence that I consider of good quality?

Thanks,
John Martin

________________________________
The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature.
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail.

From Cenny.Taslim at nationwidechildrens.org Wed Aug 14 11:08:40 2019
From: Cenny.Taslim at nationwidechildrens.org (Taslim, Cenny)
Date: Wed, 14 Aug 2019 16:08:40 +0000
Subject: [maker-devel] maker with mpi support on example still not done after two days
Message-ID:

I figured out that the example job finishes when I run it without MPI, i.e. running the job like this works fine:
~/opt/maker.4/maker/bin/maker -f 2> maker.error

Any suggestions? Without MPI, I suspect it will take months to complete, as I have a human genome with ~3500 contigs.

From carsonhh at gmail.com Wed Aug 14 13:54:26 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Aug 2019 12:54:26 -0600
Subject: [maker-devel] Running maker with suboptimal evidence
In-Reply-To:
References:
Message-ID: <9C1C9005-8285-4A48-9F48-066D15D402A9@gmail.com>

Run it both ways, then look at the results in a browser like Apollo. You can then see visually how the evidence alignments compare to the models and whether they are spurious. While not practical in most situations, manual review is still the gold standard for genome annotation.

--Carson

> On Aug 8, 2019, at 6:41 PM, Martin, John wrote:
>
> Greetings,
>
> I would like to annotate a worm genome of ~90Mb but the evidence I
> have is not all of good quality. I have 2 high quality protein sets
> from previously finished & curated, closely related worms. I also have
> a small amount of RNAseq and some old EST data neither of which give
> good coverage of the transcriptome. And I have a previously run Maker
> geneset for this worm that I believe was generated using probably all
> nematodes from genbank and that poor set of RNAseq. I also have a
> 'fair' set of predictions from running Braker2 using only the high
> quality protein data I mentioned. My opinion that the previous Maker
> geneset is of poor quality comes from comparing that geneset to my
> recent Braker2 annotations and that RNAseq.
>
> I would like to try and build an improved annotation for this
> assembly but I'm unsure of whether I should use all this evidence, or if
> I would be better off not using the low coverage RNAseq, EST data and
> previous (questionable looking) Maker geneset. I'm looking for
> opinions on whether I should throw all the evidence I have into Maker or
> should I use only evidence that I consider of good quality?
>
> Thanks,
>
> John Martin
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Aug 14 13:57:39 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Aug 2019 12:57:39 -0600
Subject: [maker-devel] maker predicting only part of a gene
In-Reply-To:
References:
Message-ID: <31721D1F-7079-4935-BE34-A8F4786A5797@gmail.com>

Neither est2genome=1 nor protein2genome=1 actually predicts genes. They simply convert exonerate alignments that match ORFs into gene models. That is good enough to train a predictor like SNAP or Augustus, but those models should not be used as the final annotation.
If you review the documentation, you will see that both options should be turned off once you have trained a predictor. Here is an example:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors

--Carson

> On Aug 8, 2019, at 1:48 PM, Das, Debojyoti wrote:
>
> Hi Carson,
>
> I am working on a non-model reptile species. I tried running maker with est2genome=1 and protein2genome=1 with the following evidence:
>
> 1. est=transcriptome.fasta (de novo assembled)
> 2. protein in fasta format from two related species.
>
> I keep getting predictions where gene models identify multi-exon genes but fail to incorporate all the exons even though they are present on the scaffolds. The fact that in the predicted gene models some exons were correctly identified while missing others even though the entire gene is present on the scaffolds, we checked this by loading the annotation in IGV Viewer.
>
> Since we are not completely confident of our transciptome assembly, we thought of using cDNA from a closely related species.
> 1. altest="cDNA in fasta format from a related species"
> 2. protein in fasta format from two related species.
>
> However, when I do this I get the error
> "ERROR: You must provide some form of EST evidence to use est2genome as a predictor."
>
> Interestingly, if I switch est2genome off (setting it to zero) maker starts running.
>
> Any suggestions on how to proceed.
>
> Best,
> Debojyoti
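Carson's advice amounts to a two-pass control-file setup: run once with est2genome=1 and protein2genome=1 to get models for training SNAP or Augustus, then switch both off and point MAKER at the trained predictor. Below is a minimal POSIX shell sketch of that edit on a copy of maker_opts.ctl. The helper names (ctl_get, ctl_set, check_est2genome, second_pass) are hypothetical, not MAKER commands; the sketch assumes the `key=value #comment` layout that `maker -CTL` writes and uses the snaphmm key for a trained SNAP model. The est2genome check only mirrors the error reported in this thread when just altest= was filled in; it is not MAKER's own validation logic.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MAKER) for editing a copy of maker_opts.ctl.
# Assumes the "key=value #comment" layout used by MAKER control files.

# ctl_get FILE KEY: print KEY's value with the trailing comment and spaces removed.
ctl_get() {
    sed -n "s/^$2=\([^#]*\).*/\1/p" "$1" | sed 's/[[:space:]]*$//'
}

# ctl_set FILE KEY VALUE: rewrite the KEY= line (its trailing comment is dropped).
ctl_set() {
    tmp="$1.tmp.$$"
    sed "s|^$2=.*|$2=$3|" "$1" > "$tmp" && mv "$tmp" "$1"
}

# check_est2genome FILE: fail when est2genome=1 but est= is empty, which is the
# situation reported in this thread when only altest= was provided.
check_est2genome() {
    if [ "$(ctl_get "$1" est2genome)" = "1" ] && [ -z "$(ctl_get "$1" est)" ]; then
        echo "est2genome=1 but est= is empty" >&2
        return 1
    fi
    return 0
}

# second_pass FILE SNAP_HMM: turn off the alignment-derived gene models and
# use a trained SNAP HMM instead.
second_pass() {
    ctl_set "$1" est2genome 0
    ctl_set "$1" protein2genome 0
    ctl_set "$1" snaphmm "$2"
}
```

For example, copy the first-pass control file, run `second_pass` on the copy with the path to the trained HMM, and use the edited copy as maker_opts.ctl for the next run. Editing a copy keeps the first-pass settings available for comparison.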
From carsonhh at gmail.com Wed Aug 14 14:05:05 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Aug 2019 13:05:05 -0600
Subject: [maker-devel] maker with mpi support on example still not done after two days
In-Reply-To: <82339125479f4b278b87bf8458c2c04d@l1perdwmbx02.childrensroot.net>
References: <82339125479f4b278b87bf8458c2c04d@l1perdwmbx02.childrensroot.net>
Message-ID:

If no additional output is being produced, then it is frozen for some reason. My first suggestion would be to reinstall MPICH and then reinstall MAKER using the MPICH you just installed. Freezing issues are usually related to MPI communication, which is handled by the communicator you have installed (i.e. the mpiexec command that launches MAKER).

Alternatively, it can be related to the file system. Some less common network-mounted file systems do not correctly support hard links, which can cause programmatic file locks to freeze. You can try running with the -nolock flag if that is the case.

--Carson

> On Aug 8, 2019, at 7:12 AM, Taslim, Cenny wrote:
>
> Hi Maker developers,
>
> Thanks for approving my subscription.
> I tried running maker with mpi support on the human fasta file provided in example_01_basic. It's been running for 2 days and 21 hours. I didn't think the example require a long time to run.
> I'm hoping someone can help me point out the problem.
>
> I'm running it with 4 processes:
> ~/opt/mpich-3.3.1/bin/mpiexec -n 4 ~/opt/maker.4/maker/bin/maker -f 2> maker.error
>
> Let me know if you need more information.
> Thank you in advance for your help.
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Aug 14 14:06:41 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Aug 2019 13:06:41 -0600
Subject: [maker-devel] maker with mpi support on example still not done after two days
In-Reply-To:
References: <82339125479f4b278b87bf8458c2c04d@l1perdwmbx02.childrensroot.net>
Message-ID: <54BF4DAE-272A-453E-9522-A82FAADABEF7@gmail.com>

You can also try OpenMPI instead of MPICH to see if it behaves better (this requires reinstalling MAKER against the OpenMPI libraries).

--Carson

> On Aug 14, 2019, at 1:05 PM, Carson Holt wrote:
>
> If no additional output is produced, then it is frozen for some reason. My first suggestion would be to reinstall MPICH and then reinstall MAKER using the MPICH you just installed. Freezing issues are usually related to MPI communication, which is handled by the communicator you have installed (i.e. the mpiexec command that launches MAKER).
>
> Alternatively, it can be related to the file system. Some less common network-mounted file systems do not correctly support hardlinks, which can cause programmatic file locks to freeze. You can try running with the -nolock flag if that is the case.
>
> --Carson
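Both of Carson's diagnoses can be checked with a small script before reinstalling anything: whether the run is still writing output at all, and whether the working file system accepts hard links (the case where he suggests -nolock). This is a minimal Python sketch, not part of MAKER; the function names are hypothetical, it assumes stderr was redirected to maker.error as in the original command, and any idle threshold is a guess, since long BLAST or RepeatMasker steps can legitimately stay quiet for a while.

```python
import os
import tempfile
import time


def output_is_stale(path: str, max_idle_seconds: float) -> bool:
    """Return True if `path` has not been modified for longer than max_idle_seconds."""
    idle = time.time() - os.path.getmtime(path)
    return idle > max_idle_seconds


def supports_hardlinks(directory: str) -> bool:
    """Try to hard-link a scratch file inside `directory`.

    Returns False if the file system refuses the link. A True result only means
    link() succeeded; it does not prove that lock files behave correctly there.
    """
    fd, src = tempfile.mkstemp(dir=directory)
    os.close(fd)
    dst = src + ".link"
    try:
        os.link(src, dst)
        # A real hard link shares its source's inode and raises the link count.
        info = os.stat(src)
        return info.st_ino == os.stat(dst).st_ino and info.st_nlink >= 2
    except OSError:
        return False
    finally:
        # Remove both scratch names whether or not the link was created.
        for name in (src, dst):
            if os.path.exists(name):
                os.remove(name)
```

For example, `output_is_stale("maker.error", 6 * 3600)` flags a log that has been silent for six hours, and `supports_hardlinks(".")` tests the directory MAKER is running in. A False from the second check makes -nolock worth trying; a True does not fully rule the file system out.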
From pickettbd at gmail.com Thu Aug 15 15:48:46 2019
From: pickettbd at gmail.com (Brandon Pickett)
Date: Thu, 15 Aug 2019 13:48:46 -0700
Subject: [maker-devel] which files are expected after fasta_merge?
Message-ID:

Good afternoon!

I just finished my third round of maker. I trained snap, augustus, etc. between the rounds. I used fasta_merge and gff3_merge to extract files after each round of maker. gff3_merge performed as expected each time, but fasta_merge surprised me. I will show you which files fasta_merge generated after each round. Please note that, as many people do, I renamed my output files from the default. Accordingly, I will list all the files with a generalized prefix of "maker" and show the rest of the file name as it was generated for me. Also note that I've changed .fasta to .fa for brevity.

After round #1:
transcripts.fa
proteins.fa

After round #2:
non_overlapping_ab_initio.proteins.fa
non_overlapping_ab_initio.transcripts.fa
transcripts.fa
augustus_masked.proteins.fa
augustus_masked.transcripts.fa
evm.proteins.fa
evm.transcripts.fa
genemark.proteins.fa
genemark.transcripts.fa
snap_masked.proteins.fa
snap_masked.transcripts.fa
proteins.fa

After round #3:
non_overlapping_ab_initio.proteins.fa
non_overlapping_ab_initio.transcripts.fa
augustus_masked.proteins.fa
augustus_masked.transcripts.fa
genemark.proteins.fa
genemark.transcripts.fa
snap_masked.proteins.fa
snap_masked.transcripts.fa

I am unsurprised that I didn't get all these files after round #1 because I used round #1 to generate gene models from transcript evidence. I didn't expect so many files after round #2 (having only seen the output from round #1 up to that point), but it makes sense that I would get output from augustus, evidence modeler (evm), genemark, and snap since I provided them as input to this round (#2) of maker. Between rounds #2 and #3, I re-trained snap and augustus. Genemark was trained between rounds #1 and #2 without gene models from maker and thus did not require re-training. The only difference in my maker control files between rounds #2 and #3 were the paths to the snap and augustus files. In both #2 and #3, the control files had run_evm=1. I can provide my control files for each round, if needed.

*My question is why transcripts.fa, proteins.fa, evm.proteins.fa, and evm.transcripts.fa were not generated after round #3?* I recognize that this is probably not an error, rather a lack of my understanding of when each file is and is not generated.

Thank you,
Brandon Pickett
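Carson's answer later in this thread is that a round run without protein/EST evidence produces no annotations and EVM does not run, so a quick first check is to tally what the round-3 GFF3 actually contains. This is a minimal Python sketch with a hypothetical function name, assuming a merged GFF3 such as gff3_merge produces; the source/type names used in the test (protein2genome, protein_match) are illustrative, so check which names your own MAKER version writes.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_feature_types(lines: Iterable[str]) -> "Counter[Tuple[str, str]]":
    """Tally (source, type) pairs, i.e. GFF3 columns 2 and 3, over all feature lines.

    Comment and directive lines are skipped, and counting stops at ##FASTA because
    everything after that directive is sequence rather than features.
    """
    counts: "Counter[Tuple[str, str]]" = Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed nine-column feature line
        counts[(columns[1], columns[2])] += 1
    return counts
```

Passing an open handle for the round-3 GFF3 and printing `.most_common()` lists the feature sources and types present; if no alignment-type features show up at all, the round had no evidence for MAKER or EVM to work from.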
From jacques.dainat at nbis.se Tue Aug 20 03:14:53 2019
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Tue, 20 Aug 2019 10:14:53 +0200
Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences
Message-ID: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se>

Dear Carson,

I'm using maker/3.01.02 with Open MPI.
I realised that the maker_gff option in maker_opts.ctl works well as long as no FASTA sequence is embedded in the GFF3 file, e.g.:
```
###
##FASTA
>3098|quiver
TTTATGGGTTCAGGCGGACCCATGGCGCCGACCATATTTTGAGAGCTGGACGACTCTGTA
GGGTTGGGTATTGGCTGATTATTCATTCAAATCCCACGAGTAGCCTAGGAAGTGACGGTC
```
I ended up with GFF3 files containing FASTA sequences in a sequential manner (all contig1 features, then the sequence of contig1, then all contig2 features, then the sequence of contig2, etc. I point this out because one can also encounter GFF3 files where all the sequences are gathered at the end of the file). In such a case, MAKER only takes into account the features that appear before the first FASTA sequence in the file. It then stops processing the file and ignores the rest of it.

I didn't see any particular message, but my resulting annotation was obviously wrong: most of the repeat/alignment/model data contained in the GFF file was never passed to MAKER. Would it be possible to add a fix so that MAKER continues to parse a GFF file even after encountering a FASTA sequence?

Best regards,

/Jacques
-------------------------------------------------
Jacques Dainat, Ph.D.
NBIS (National Bioinformatics Infrastructure Sweden)
Genome Annotation Service
http://nbis.se/about/staff/jacques-dainat
https://github.com/NBISweden/GAAS
http://nbis.se

Contact
Address: Uppsala University, Biomedicinska Centrum
Department of Medical Biochemistry Microbiology, Genomics
Husargatan 3, box 582
S-75123 Uppsala Sweden
Phone: +46 18 471 46 25
From carsonhh at gmail.com Tue Aug 20 08:16:23 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 Aug 2019 07:16:23 -0600
Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences
In-Reply-To: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se>
References: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se>
Message-ID:

All FASTA entries must occur at the end of the file according to the GFF3 specification. If a FASTA entry is embedded in the middle, you have a corrupt file. If you are trying to merge GFF3 files, you can use the gff3_merge script. Concatenation via something like "cat", however, results in a broken file.

--Carson

Sent from my iPhone

From carsonhh at gmail.com Tue Aug 20 08:20:50 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 Aug 2019 07:20:50 -0600
Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences
In-Reply-To:
References: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se>
Message-ID:

Here is the relevant part of the format specification:

> ##FASTA
> This notation indicates that the annotation portion of the file is at an end and that the remainder of the file contains one or more sequences (nucleotide or protein) in FASTA format. This allows features and sequences to be bundled together. All FASTA sequences included in the file must be included together at the end of the file and may not be interspersed with the features lines. Once a ##FASTA section is encountered no other content beyond valid FASTA sequence is allowed.

--Carson

Sent from my iPhone
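Since MAKER, per Jacques's report, silently stops reading features at the first ##FASTA line, it can be worth checking a GFF3 file before passing it as maker_gff. This is a minimal Python sketch with a hypothetical function name, following the specification text Carson quotes: after the first ##FASTA directive it accepts only blank lines, ">" headers and plain sequence lines, and reports everything else. It is not a full GFF3 validator.

```python
import re
from typing import Iterable, List, Tuple

# A FASTA sequence line: residue letters plus gap/stop symbols and nothing else.
SEQUENCE_LINE = re.compile(r"^[A-Za-z*.\-]+$")


def content_after_fasta(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each line after the first ##FASTA that is not FASTA.

    Blank lines, '>' header lines and plain sequence lines are accepted; feature
    lines, '###' separators and repeated '##FASTA' directives are reported.
    """
    problems: List[Tuple[int, str]] = []
    in_fasta = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not in_fasta:
            # Still in the annotation section; look for the directive that ends it.
            in_fasta = line.startswith("##FASTA")
            continue
        if not line or line.startswith(">") or SEQUENCE_LINE.match(line):
            continue
        problems.append((number, line))
    return problems
```

An empty list means the file follows the rule. Otherwise the reported line numbers show where a further block of features begins, which is the pattern that concatenating files with cat produces; Carson's recommended route for combining files remains gff3_merge.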
From jacques.dainat at nbis.se Tue Aug 20 08:28:46 2019
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Tue, 20 Aug 2019 15:28:46 +0200
Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences
In-Reply-To:
References: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se>
Message-ID: <66852BCA-9F3C-40CB-B58A-9578E3418851@nbis.se>

Thank you for your quick answer; you are right, I should have read the GFF3 specification more carefully. I will investigate which of the steps I modified introduced the problem.

Thanks again.

/Jacques

From jacques.dainat at nbis.se Wed Aug 21 07:59:43 2019
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Wed, 21 Aug 2019 14:59:43 +0200
Subject: [maker-devel] GeneWise in MAKER?
Message-ID:

Dear Carson,

Reading this paper:

J. Armstrong, I. T. Fiddes, M. Diekhans, and B. Paten. Whole-Genome Alignment and Comparative Annotation. Annu Rev Anim Biosci, Oct 2018.

I discovered that MAKER is running GeneWise (see Table 4). Did they mix it up with Exonerate, or is it something well hidden within MAKER and its documentation?

Best regards,

Jacques

From ABoyher at danforthcenter.org Wed Aug 28 13:31:33 2019
From: ABoyher at danforthcenter.org (Boyher, Adam)
Date: Wed, 28 Aug 2019 18:31:33 +0000
Subject: [maker-devel] Haplotype specific annotations
Message-ID: <02119316-A412-43AB-A6A8-05037B14D972@contoso.com>

Hi,

I have a phased genome (haplotype-specific assemblies) that I have annotated separately with Maker. There are a couple of things I've had a little trouble figuring out in relation to this. The first is that I have three sets of genes: two sets that exist in one haplotype but not the other, and a third that exists in both. I want to name these genes according to the set they are in. For instance, genes that exist in both haplotypes would have the same name in both assemblies, while genes that exist in only one haplotype would be named specifically for that haplotype. Is there a straightforward way to do this?

The second issue is that I've discovered one gene that is annotated in one phase but not the other. However, when I BLAST the genomic sequence against the second haplotype, I find an exact match. Given that I used the exact same methods to annotate both haplotypes, starting with the same set of evidence (protein, transcriptome, repeat), why might Maker miss or exclude that gene?

Thanks,
Adam

From carsonhh at gmail.com Fri Aug 30 11:20:32 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 30 Aug 2019 10:20:32 -0600
Subject: [maker-devel] Haplotype specific annotations
In-Reply-To: <02119316-A412-43AB-A6A8-05037B14D972@contoso.com>
References: <02119316-A412-43AB-A6A8-05037B14D972@contoso.com>
Message-ID: <098E19B1-BDB4-4FDA-A94B-4E30CEB5B58A@gmail.com>

It may be that there are broken splice donors/acceptors in one haplotype versus the other, which would not be seen with a BLAST search. Look at both in a browser to see what the evidence looks like in each and how the ab initio predictions compare between the two.

As for naming, you could try reciprocal best BLAST hits to see which genes match which. Unfortunately, you will have to do a lot of manual review to make sure you are not just matching paralogs together.

--Carson
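Carson's reciprocal-best-hit suggestion can be prototyped on BLAST tabular output. This is a minimal Python sketch, assuming each haplotype's predicted proteins were searched against the other's (for example with blastp -outfmt 6, whose 12th column is the bit score); the function names are hypothetical, ties keep the first hit seen, and, as Carson warns, RBH pairs still need manual review because paralogs can end up paired.

```python
import csv
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def best_hits(rows: Iterable[Sequence[str]]) -> Dict[str, str]:
    """Map each query ID to its highest-bit-score subject ID (outfmt 6: column 12 = bitscore)."""
    best: Dict[str, Tuple[float, str]] = {}
    for row in rows:
        query, subject, bitscore = row[0], row[1], float(row[11])
        # Strictly greater keeps the first-seen hit when scores tie.
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Sequence[str]],
                         b_vs_a: Iterable[Sequence[str]]) -> Set[Tuple[str, str]]:
    """Return (gene_a, gene_b) pairs in which each gene is the other's best hit."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return {(a, b) for a, b in a_best.items() if b_best.get(b) == a}


def read_outfmt6(path: str) -> List[List[str]]:
    """Load a BLAST tabular file, skipping blank and '#' comment lines."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")
                if row and not row[0].startswith("#")]
```

Genes that appear in a pair are candidates for a shared name across both assemblies, while genes from either haplotype with no reciprocal partner are candidates for haplotype-specific names, pending the manual review Carson describes.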
Given that I used the exact same methods to annotate both haplotypes, starting with the same set of evidence (protein, transcriptome, repeat), why might maker miss or exclude that gene? > > Thanks > Adam > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Fri Aug 30 11:48:52 2019 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 30 Aug 2019 10:48:52 -0600 Subject: [maker-devel] which files are expected after fasta_merge? In-Reply-To: References: Message-ID: If you disabled evidence for round 3 (i.e. left protein= and est= empty), then you will get no annotations and EVM will not run. You can look at the GFF3 in a browser; if you see that there are no protein/EST alignments, then that is likely why. -Carson > On Aug 15, 2019, at 2:48 PM, Brandon Pickett wrote: > > Good afternoon! > > I just finished my third round of maker. I trained snap, augustus, etc. between the rounds. I used fasta_merge and gff3_merge to extract files after each round of maker. gff3_merge performed as expected each time, but fasta_merge surprised me. I will show you which files fasta_merge generated after each round. Please note that, as many people do, I renamed my output files from the default. Accordingly, I will list all the files with a generalized prefix of "maker" and show the rest of the file name as it was generated for me. Also note that I've changed .fasta to .fa for brevity.
> > After round #1: > transcripts.fa > proteins.fa > > After round #2: > non_overlapping_ab_initio.proteins.fa > non_overlapping_ab_initio.transcripts.fa > transcripts.fa > augustus_masked.proteins.fa > augustus_masked.transcripts.fa > evm.proteins.fa > evm.transcripts.fa > genemark.proteins.fa > genemark.transcripts.fa > snap_masked.proteins.fa > snap_masked.transcripts.fa > proteins.fa > > After round #3: > non_overlapping_ab_initio.proteins.fa > non_overlapping_ab_initio.transcripts.fa > augustus_masked.proteins.fa > augustus_masked.transcripts.fa > genemark.proteins.fa > genemark.transcripts.fa > snap_masked.proteins.fa > snap_masked.transcripts.fa > > I am unsurprised that I didn't get all these files after round #1 because I used round #1 to generate gene models from transcript evidence. I didn't expect so many files after round #2 (having only seen the output from round #1 up to that point), but it makes sense that I would get output from augustus, evidence modeler (evm), genemark, and snap since I provided them as input to this round (#2) of maker. Between rounds #2 and #3, I re-trained snap and augustus. Genemark was trained between rounds #1 and #2 without gene models from maker and thus did not require re-training. The only difference in my maker control files between rounds #2 and #3 were the paths to the snap and augustus files. In both #2 and #3, the control files had run_evm=1. I can provide my control files for each round, if needed. My question is why transcripts.fa, proteins.fa, evm.proteins.fa, and evm.transcripts.fa were not generated after round #3? I recognize that this is probably not an error, rather a lack of my understanding of when each file is and is not generated. 
> > Thank you, > Brandon Pickett > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From debojyoti.das.und at gmail.com Thu Aug 8 12:54:01 2019 From: debojyoti.das.und at gmail.com (Debojyoti Das) Date: Thu, 8 Aug 2019 13:54:01 -0500 Subject: [maker-devel] maker predicting only part of a gene Message-ID: Hi, I am working on a non-model reptile species. I tried running maker with est2genome=1 and protein2genome=1 with the following evidence: 1. est=transcriptome.fasta (de novo assembled) 2.
protein in fasta format from two related species. I keep getting predictions where gene models identify multi-exon genes but fail to incorporate all the exons, even though they are present on the scaffolds. We confirmed this by loading the annotation in IGV: in the predicted gene models some exons were correctly identified while others were missed, even though the entire gene is present on the scaffolds. Since we are not completely confident in our transcriptome assembly, we thought of using cDNA from a closely related species. 1. altest="cDNA in fasta format from a related species" 2. protein in fasta format from two related species. However, when I do this I get the error "ERROR: You must provide some form of EST evidence to use est2genome as a predictor." Interestingly, if I switch est2genome off (setting it to zero), maker starts running. Any suggestions on how to proceed? Best, Debojyoti From debojyoti.das at und.edu Thu Aug 8 13:48:36 2019 From: debojyoti.das at und.edu (Das, Debojyoti) Date: Thu, 8 Aug 2019 19:48:36 +0000 Subject: [maker-devel] maker predicting only part of a gene Message-ID: Hi Carson, I am working on a non-model reptile species. I tried running maker with est2genome=1 and protein2genome=1 with the following evidence: 1. est=transcriptome.fasta (de novo assembled) 2. protein in fasta format from two related species. I keep getting predictions where gene models identify multi-exon genes but fail to incorporate all the exons, even though they are present on the scaffolds. We confirmed this by loading the annotation in IGV: in the predicted gene models some exons were correctly identified while others were missed, even though the entire gene is present on the scaffolds. Since we are not completely confident in our transcriptome assembly, we thought of using cDNA from a closely related species. 1.
altest="cDNA in fasta format from a related species" 2. protein in fasta format from two related species. However, when I do this I get the error "ERROR: You must provide some form of EST evidence to use est2genome as a predictor." Interestingly, if I switch est2genome off (setting it to zero), maker starts running. Any suggestions on how to proceed? Best, Debojyoti From jmartin at wustl.edu Thu Aug 8 18:41:31 2019 From: jmartin at wustl.edu (Martin, John) Date: Fri, 9 Aug 2019 00:41:31 +0000 Subject: [maker-devel] Running maker with suboptimal evidence Message-ID: Greetings, I would like to annotate a worm genome of ~90 Mb, but the evidence I have is not all of good quality. I have 2 high-quality protein sets from previously finished and curated, closely related worms. I also have a small amount of RNAseq and some old EST data, neither of which gives good coverage of the transcriptome. I also have a previously run Maker gene set for this worm that I believe was probably generated using all nematode sequences from GenBank and that poor set of RNAseq. I also have a 'fair' set of predictions from running Braker2 using only the high-quality protein data I mentioned. My opinion that the previous Maker gene set is of poor quality comes from comparing it to my recent Braker2 annotations and that RNAseq. I would like to try to build an improved annotation for this assembly, but I'm unsure whether I should use all this evidence or whether I would be better off not using the low-coverage RNAseq, EST data, and previous (questionable-looking) Maker gene set. I'm looking for opinions on whether I should throw all the evidence I have into Maker or use only the evidence that I consider to be of good quality. Thanks, John Martin ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature.
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From Cenny.Taslim at nationwidechildrens.org Wed Aug 14 10:08:40 2019 From: Cenny.Taslim at nationwidechildrens.org (Taslim, Cenny) Date: Wed, 14 Aug 2019 16:08:40 +0000 Subject: [maker-devel] maker with mpi support on example still not done after two days Message-ID: I found that the example job finishes if I don't use MPI, i.e. running the job like this works fine: ~/opt/maker.4/maker/bin/maker -f 2> maker.error Any suggestions? Without MPI, I suspect it will take months to complete, as I have a human genome with ~3500 contigs. From carsonhh at gmail.com Wed Aug 14 12:54:26 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 14 Aug 2019 12:54:26 -0600 Subject: [maker-devel] Running maker with suboptimal evidence In-Reply-To: References: Message-ID: <9C1C9005-8285-4A48-9F48-066D15D402A9@gmail.com> Run it both ways, then look at the results in a browser like Apollo. You can then see visually how the evidence alignments compare to the models and whether they are spurious. While not practical in most situations, manual review is still the gold standard for genome annotation. -Carson > On Aug 8, 2019, at 6:41 PM, Martin, John wrote: > > Greetings, > > I would like to annotate a worm genome of ~90Mb but the evidence I > have is not all of good quality. I have 2 high quality protein sets > from previously finished & curated, closely related worms. I also have > a small amount of RNAseq and some old EST data neither of which give > good coverage of the transcriptome. And I have a previously run Maker > geneset for this worm that I believe was generated using probably all > nematodes from genbank and that poor set of RNAseq. I also have a
I also have a > 'fair' set of predictions from running Braker2 using only the high > quality protein data I mentioned. My opinion that the previous Maker > geneset is of poor quality comes from comparing that geneset to my > recent Braker2 annotations and that RNAseq. > > I would like to try and build an improved annotation for this > assembly but I'm unsure of whether I should use all this evidence, or if > I would be better off not using the low coverage RNAseq, EST data and > previous (questionable looking) Maker geneset. I'm looking for > opinions on whether I should throw all the evidence I have into Maker or > should I use only evidence that I consider of good quality? > > > Thanks, > > John Martin > > > > ________________________________ > The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Aug 14 12:57:39 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 14 Aug 2019 12:57:39 -0600 Subject: [maker-devel] maker predicting only part of a gene In-Reply-To: References: Message-ID: <31721D1F-7079-4935-BE34-A8F4786A5797@gmail.com> Both est2genome=1 and protein2genome=1 do not predict genes. They simply transfer exonerate alignments which match ORFs into gene models. It?s good enough to train a predictor like SNAP or Augustus, but should not be used as the final models. 
If you review the documentation, you will see that est2genome and protein2genome should be turned off once you train a predictor. Here is an example: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors -Carson > On Aug 8, 2019, at 1:48 PM, Das, Debojyoti wrote: > > Hi Carson, > > I am working on a non-model reptile species. I tried running maker with est2genome=1 and protein2genome=1 with the following evidence: > > 1. est=transcriptome.fasta (de novo assembled) > 2. protein in fasta format from two related species. > > > I keep getting predictions where gene models identify multi-exon genes but fail to incorporate all the exons even though they are present on the scaffolds. The fact that in the predicted gene models some exons were correctly identified while missing others even though the entire gene is present on the scaffolds, we checked this by loading the annotation in IGV Viewer. > > Since we are not completely confident of our transciptome assembly, we thought of using cDNA from a closely related species. > 1. altest="cDNA in fasta format from a related species" > 2. protein in fasta format from two related species. > > However, when I do this I get the error > "ERROR: You must provide some form of EST evidence to use est2genome as a predictor." > > Interestingly, if I switch est2genome off (setting it to zero) maker starts running. > > Any suggestions on how to proceed. > > Best, > Debojyoti > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Aug 14 13:05:05 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 14 Aug 2019 13:05:05 -0600 Subject: [maker-devel] maker with mpi support on example still not done after two days In-Reply-To: <82339125479f4b278b87bf8458c2c04d@l1perdwmbx02.childrensroot.net> References: <82339125479f4b278b87bf8458c2c04d@l1perdwmbx02.childrensroot.net> Message-ID: If no additional output is being produced, then it is frozen for some reason. My first suggestion would be to reinstall MPICH and then reinstall MAKER using the MPICH you just installed. Freezing issues are usually related to MPI communication, which is handled by the communicator you have installed (i.e. the mpiexec command that launches MAKER). Alternatively, it can be related to the file system. Some less common network-mounted file systems do not correctly support hard links, which can cause programmatic file locks to freeze. You can try running with the -nolock flag if that is the case. -Carson > On Aug 8, 2019, at 7:12 AM, Taslim, Cenny wrote: > > Hi Maker developers, > > Thanks for approving my subscription. > I tried running maker with mpi support on the human fasta file provided in example_01_basic. It's been running for 2 days and 21 hours. I didn't think the example require a long time to run. > I'm hoping someone can help me point out the problem. > > I'm running it with 4 processes: > ~/opt/mpich-3.3.1/bin/mpiexec -n 4 ~/opt/maker.4/maker/bin/maker -f 2> maker.error > > Maker_opts.ctl is the same as opts2.txt > > These are the log: > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at: > /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore > > To access files for individual sequences use the datastore index: > /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_master_datastore_index.log > > STATUS: Now running MAKER... > examining contents of the fasta file and run log > > > > --Next Contig-- > > #--------------------------------------------------------------------- > Now starting the contig!! > SeqID: NT_010783.15 > Length: 201444 > #--------------------------------------------------------------------- > > > setting up GFF3 output and fasta chunks > doing repeat masking > doing repeat masking > running repeat masker. > #--------- command -------------# > Widget::RepeatMasker: > cd /gpfs0/scratch/1895302/maker_kXmduG; /opt/maker.4/maker/exe/RepeatMasker/RepeatMasker /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.0.all.rb -species all -dir /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0 -pa 1 > #-------------------------------# > running repeat masker. > #--------- command -------------# > Widget::RepeatMasker: > cd /gpfs0/scratch/1895302/maker_kXmduG; /opt/maker.4/maker/exe/RepeatMasker/RepeatMasker /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.all.rb -species all -dir /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0 -pa 1 > #-------------------------------# > doing blastx repeats > formating database... 
> #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.0 > #-------------------------------# > doing blastx repeats > formating database... > #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.1 > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.0 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.1 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.1.repeatrunner > #-------------------------------# > deleted:19 hits > deleted:18 hits > doing blastx repeats > doing blastx repeats > formating database... 
> #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.2 > #-------------------------------# > formating database... > #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.3 > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.2 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.2.repeatrunner > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.3 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner > #-------------------------------# > deleted:9 hits > deleted:9 hits > doing blastx repeats > doing blastx repeats > formating database... 
> #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.5 > #-------------------------------# > formating database... > #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.4 > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.5 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.4 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.4.repeatrunner > #-------------------------------# > deleted:16 hits > deleted:16 hits > doing blastx repeats > doing blastx repeats > formating database... 
> #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.6 > #-------------------------------# > formating database... > #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.7 > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.6 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.7 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.7.repeatrunner > #-------------------------------# > deleted:8 hits > deleted:12 hits > doing blastx repeats > doing blastx repeats > formating database... 
> #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.8 > #-------------------------------# > formating database... > #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.9 > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.8 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.8.repeatrunner > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.9 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.9.repeatrunner > #-------------------------------# > deleted:5 hits > deleted:7 hits > collecting blastx repeatmasking > processing all repeats > in cluster::shadow_cluster... > ...finished clustering. > > Let me know if you need more information. 
Thank you in advance for your help. > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Aug 14 13:06:41 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 14 Aug 2019 13:06:41 -0600 Subject: [maker-devel] maker with mpi support on example still not done after two days In-Reply-To: References: <82339125479f4b278b87bf8458c2c04d@l1perdwmbx02.childrensroot.net> Message-ID: <54BF4DAE-272A-453E-9522-A82FAADABEF7@gmail.com>

You can also try OpenMPI instead of MPICH to see if it behaves better (this requires reinstalling MAKER against the OpenMPI libraries).

Carson

> On Aug 14, 2019, at 1:05 PM, Carson Holt wrote: > > If no additional output is produced, then it is frozen for some reason. My first suggestion would be to reinstall MPICH and then reinstall MAKER using the MPICH you just installed. Freezing issues are usually related to MPI communication, which is handled by the communicator you have installed (i.e. the mpiexec command that launches MAKER). > > Alternatively, it can be related to the file system. Some less common network-mounted file systems do not correctly support hard links, which can cause programmatic file locks to freeze. You can try running with the -nolock flag if that is the case. > > Carson > > >> On Aug 8, 2019, at 7:12 AM, Taslim, Cenny > wrote: >> >> Hi Maker developers, >> >> Thanks for approving my subscription. >> I tried running maker with mpi support on the human fasta file provided in example_01_basic. It's been running for 2 days and 21 hours. I didn't think the example require a long time to run. >> I'm hoping someone can help me point out the problem.
>> >> I'm running it with 4 processes: >> ~/opt/mpich-3.3.1/bin/mpiexec -n 4 ~/opt/maker.4/maker/bin/maker -f 2> maker.error >> >> Maker_opts.ctl is the same as opts2.txt >> >> These are the log: >> STATUS: Parsing control files... >> STATUS: Processing and indexing input FASTA files... >> STATUS: Setting up database for any GFF3 input... >> A data structure will be created for you at: >> /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore >> >> To access files for individual sequences use the datastore index: >> /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_master_datastore_index.log >> >> STATUS: Now running MAKER... >> examining contents of the fasta file and run log >> >> >> >> --Next Contig-- >> >> #--------------------------------------------------------------------- >> Now starting the contig!! >> SeqID: NT_010783.15 >> Length: 201444 >> #--------------------------------------------------------------------- >> >> >> setting up GFF3 output and fasta chunks >> doing repeat masking >> doing repeat masking >> running repeat masker. >> #--------- command -------------# >> Widget::RepeatMasker: >> cd /gpfs0/scratch/1895302/maker_kXmduG; /opt/maker.4/maker/exe/RepeatMasker/RepeatMasker /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.0.all.rb -species all -dir /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0 -pa 1 >> #-------------------------------# >> running repeat masker.
>> #--------- command -------------# >> Widget::RepeatMasker: >> cd /gpfs0/scratch/1895302/maker_kXmduG; /opt/maker.4/maker/exe/RepeatMasker/RepeatMasker /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.all.rb -species all -dir /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0 -pa 1 >> #-------------------------------# >> doing blastx repeats >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.0 >> #-------------------------------# >> doing blastx repeats >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.1 >> #-------------------------------# >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.0 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >> #-------------------------------# >> running blast search. 
>> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.1 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.1.repeatrunner >> #-------------------------------# >> deleted:19 hits >> deleted:18 hits >> doing blastx repeats >> doing blastx repeats >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.2 >> #-------------------------------# >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.3 >> #-------------------------------# >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.2 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.2.repeatrunner >> #-------------------------------# >> running blast search. 
>> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.3 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >> #-------------------------------# >> deleted:9 hits >> deleted:9 hits >> doing blastx repeats >> doing blastx repeats >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.5 >> #-------------------------------# >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.4 >> #-------------------------------# >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.5 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner >> #-------------------------------# >> running blast search. 
>> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.4 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.4.repeatrunner >> #-------------------------------# >> deleted:16 hits >> deleted:16 hits >> doing blastx repeats >> doing blastx repeats >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.6 >> #-------------------------------# >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.7 >> #-------------------------------# >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.6 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >> #-------------------------------# >> running blast search. 
>> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.7 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.7.repeatrunner >> #-------------------------------# >> deleted:8 hits >> deleted:12 hits >> doing blastx repeats >> doing blastx repeats >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.8 >> #-------------------------------# >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.9 >> #-------------------------------# >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.8 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.8.repeatrunner >> #-------------------------------# >> running blast search. 
>> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.9 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.9.repeatrunner >> #-------------------------------# >> deleted:5 hits >> deleted:7 hits >> collecting blastx repeatmasking >> processing all repeats >> in cluster::shadow_cluster... >> ...finished clustering. >> >> Let me know if you need more information. Thank you in advance for your help.

From pickettbd at gmail.com Thu Aug 15 14:48:46 2019 From: pickettbd at gmail.com (Brandon Pickett) Date: Thu, 15 Aug 2019 13:48:46 -0700 Subject: [maker-devel] which files are expected after fasta_merge? Message-ID:

Good afternoon!

I just finished my third round of maker. I trained snap, augustus, etc. between the rounds. I used fasta_merge and gff3_merge to extract files after each round of maker. gff3_merge performed as expected each time, but fasta_merge surprised me. I will show you which files fasta_merge generated after each round.

Please note that, as many people do, I renamed my output files from the default. Accordingly, I will list all the files with a generalized prefix of "maker" and show the rest of the file name as it was generated for me. Also note that I've changed .fasta to .fa for brevity.
After round #1:
transcripts.fa
proteins.fa

After round #2:
non_overlapping_ab_initio.proteins.fa
non_overlapping_ab_initio.transcripts.fa
transcripts.fa
augustus_masked.proteins.fa
augustus_masked.transcripts.fa
evm.proteins.fa
evm.transcripts.fa
genemark.proteins.fa
genemark.transcripts.fa
snap_masked.proteins.fa
snap_masked.transcripts.fa
proteins.fa

After round #3:
non_overlapping_ab_initio.proteins.fa
non_overlapping_ab_initio.transcripts.fa
augustus_masked.proteins.fa
augustus_masked.transcripts.fa
genemark.proteins.fa
genemark.transcripts.fa
snap_masked.proteins.fa
snap_masked.transcripts.fa

I am unsurprised that I didn't get all these files after round #1, because I used round #1 to generate gene models from transcript evidence. I didn't expect so many files after round #2 (having only seen the output from round #1 up to that point), but it makes sense that I would get output from augustus, evidence modeler (evm), genemark, and snap, since I provided them as input to this round (#2) of maker. Between rounds #2 and #3, I re-trained snap and augustus. Genemark was trained between rounds #1 and #2 without gene models from maker and thus did not require re-training. The only difference in my maker control files between rounds #2 and #3 was the paths to the snap and augustus files. In both #2 and #3, the control files had run_evm=1. I can provide my control files for each round, if needed.

*My question is: why were transcripts.fa, proteins.fa, evm.proteins.fa, and evm.transcripts.fa not generated after round #3?*

I recognize that this is probably not an error, but rather a gap in my understanding of when each file is and is not generated.

Thank you,
Brandon Pickett
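A quick way to see exactly what changed between rounds #2 and #3 is to diff the two file lists. The Python sketch below uses only the file names listed in the message above (generic "maker" prefix dropped, .fa extension as written there). The helper name `source_of` is hypothetical, and labelling files without a predictor prefix as "maker" (the final MAKER gene models) is an assumption based on fasta_merge's usual naming, not something stated in the thread.

```python
# File names copied from the round #2 and round #3 lists above
# (the generic "maker" prefix is dropped, as in the message).
ROUND_2 = {
    "non_overlapping_ab_initio.proteins.fa",
    "non_overlapping_ab_initio.transcripts.fa",
    "transcripts.fa",
    "augustus_masked.proteins.fa",
    "augustus_masked.transcripts.fa",
    "evm.proteins.fa",
    "evm.transcripts.fa",
    "genemark.proteins.fa",
    "genemark.transcripts.fa",
    "snap_masked.proteins.fa",
    "snap_masked.transcripts.fa",
    "proteins.fa",
}
ROUND_3 = {
    "non_overlapping_ab_initio.proteins.fa",
    "non_overlapping_ab_initio.transcripts.fa",
    "augustus_masked.proteins.fa",
    "augustus_masked.transcripts.fa",
    "genemark.proteins.fa",
    "genemark.transcripts.fa",
    "snap_masked.proteins.fa",
    "snap_masked.transcripts.fa",
}


def source_of(name):
    """Return the prediction source encoded in a file name (hypothetical helper).

    'snap_masked.proteins.fa' -> 'snap_masked'. Files with no prefix are
    labelled 'maker' (assumed to be the final MAKER gene models).
    """
    parts = name.split(".")
    return ".".join(parts[:-2]) or "maker"


missing = sorted(ROUND_2 - ROUND_3)
missing_sources = sorted({source_of(n) for n in missing})

print(missing)          # files present after round #2 but not after round #3
print(missing_sources)  # the sources those missing files belong to
```

The diff makes the pattern explicit: every per-predictor file (augustus_masked, genemark, snap_masked, non_overlapping_ab_initio) is still present after round #3, and only the unprefixed files and the evm files are missing.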
From jacques.dainat at nbis.se Tue Aug 20 02:14:53 2019 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Tue, 20 Aug 2019 10:14:53 +0200 Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences Message-ID: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se>

Dear Carson,

I'm using maker/3.01.02 with Open MPI. I realised that the maker_gff option in maker_opts.ctl works well as long as no FASTA sequence is embedded in the GFF3 file, e.g.:

```
###
##FASTA
>3098|quiver
TTTATGGGTTCAGGCGGACCCATGGCGCCGACCATATTTTGAGAGCTGGACGACTCTGTA
GGGTTGGGTATTGGCTGATTATTCATTCAAATCCCACGAGTAGCCTAGGAAGTGACGGTC
```

I ended up with GFF3 files in which the FASTA sequences are interleaved with the features (all contig1 features, then the sequence of contig1; all contig2 features, then the sequence of contig2; etc.). I mention this because other GFF3 files gather all the sequences at the end of the file. In the interleaved case, MAKER only takes into account the features it encounters before the first FASTA sequence in the file; it then stops processing the file and ignores the rest of it.

I haven't seen any particular message, but my resulting annotation was obviously wrong: most of the repeat/alignment/model data contained in the GFF file had not been passed to MAKER. Would it be possible to add a fix so that MAKER continues parsing a GFF file even after encountering a FASTA sequence?

Best regards,

/Jacques
-------------------------------------------------
Jacques Dainat, Ph.D.
NBIS (National Bioinformatics Infrastructure Sweden)
Genome Annotation Service
http://nbis.se/about/staff/jacques-dainat
https://github.com/NBISweden/GAAS
http://nbis.se

Contact
Address: Uppsala University, Biomedicinska Centrum
Department of Medical Biochemistry Microbiology, Genomics
Husargatan 3, box 582
S-75123 Uppsala Sweden
Phone: +46 18 471 46 25
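Under the GFF3 specification, a ##FASTA directive ends the annotation section, so a file built by concatenating per-contig GFF3 files (features, sequence, features, sequence, ...) is not valid GFF3. The Python sketch below, with hypothetical helper names, (1) reports lines that are not FASTA but appear after the first ##FASTA, and (2) rewrites such a file so that all feature lines come first and all sequences follow a single trailing ##FASTA line. It assumes a sequence line contains only letters, '*' or '-', and the contig names in the demo are made up. When merging MAKER outputs in the first place, the gff3_merge script distributed with MAKER is the safer route.

```python
import re

# Assumption for this sketch: a sequence line holds only residue letters plus
# '*' (stop) or '-' (gap). Feature lines contain tabs, so they never match.
SEQ_LINE = re.compile(r"^[A-Za-z*\-]+$")


def is_fasta_line(line):
    """True for FASTA headers ('>id') and sequence lines."""
    return line.startswith(">") or bool(SEQ_LINE.match(line))


def lines_after_fasta(lines):
    """Return 1-based numbers of non-FASTA lines found after the first ##FASTA.

    A file that follows the spec yields an empty list. Blank lines are ignored.
    """
    offending = []
    in_fasta = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not in_fasta:
            in_fasta = line.strip() == "##FASTA"
            continue
        if line.strip() and not is_fasta_line(line):
            offending.append(number)
    return offending


def move_fasta_to_end(lines):
    """Collect interleaved FASTA blocks into one trailing ##FASTA section."""
    features, sequences = [], []
    in_fasta = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() == "##FASTA":
            in_fasta = True
            continue
        if in_fasta and is_fasta_line(line):
            sequences.append(line)
            continue
        in_fasta = False  # any other line ends the current FASTA block
        features.append(line)
    return features + ["##FASTA"] + sequences if sequences else features


# Hypothetical two-contig file laid out the way described in the message.
interleaved = [
    "##gff-version 3",
    "ctg1\tmaker\tgene\t1\t10\t.\t+\t.\tID=g1",
    "##FASTA",
    ">ctg1",
    "ACGTACGTAC",
    "ctg2\tmaker\tgene\t1\t8\t.\t-\t.\tID=g2",
    "##FASTA",
    ">ctg2",
    "TTGACCAA",
]
print(lines_after_fasta(interleaved))  # [6, 7]
fixed = move_fasta_to_end(interleaved)
print(lines_after_fasta(fixed))        # []
```

Running the checker before handing a file to maker_gff would flag the layout described above, since everything after line 5 of the demo would otherwise sit past the first ##FASTA.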
From carsonhh at gmail.com Tue Aug 20 07:16:23 2019 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 Aug 2019 07:16:23 -0600 Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences In-Reply-To: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se> References: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se> Message-ID:

All FASTA entries must occur at the end of the file according to the GFF3 specification. If a FASTA entry is embedded in the middle, you have a corrupt file. If you are trying to merge GFF3 files, you can use the gff3_merge script. Concatenation via something like "cat", however, results in a broken file.

Carson

Sent from my iPhone

> On Aug 20, 2019, at 2:14 AM, Jacques Dainat wrote: > > Dear Carson, > > I'm using maker/3.01.02 with open MPI. > I realised that the option maker_gff from the maker_opts.ctl works great as long as no FASTA sequence is embeded in the GFF3 file. > e.g: > ``` > ### > ##FASTA > >3098|quiver > TTTATGGGTTCAGGCGGACCCATGGCGCCGACCATATTTTGAGAGCTGGACGACTCTGTA > GGGTTGGGTATTGGCTGATTATTCATTCAAATCCCACGAGTAGCCTAGGAAGTGACGGTC > ``` > I ended up with GFF3 files containing fasta sequences in a sequential manner (All contig1 features then the sequence of contig1, all contig2 features then the sequence of contig2, etc... I precise this because we can meet gff3 files where all the sequences are gather at the end of the file). In such case MAKER takes in consideration only the features met before to reach the first FASTA sequence in the file. Then it stops to process the file and doesn't consider the rest of it. > > I haven't seen any particular message but my resulting annotation was obviously wrong. Indeed most of the data repeat/alignment/models contained in the gff file haven't been passed to MAKER. Would it be possible to add a fix to continue to parse a gff file even after meeting a fasta sequence? > > > Best regards, > > /Jacques > ------------------------------------------------- > Jacques Dainat, Ph.D.
> NBIS (National Bioinformatics Infrastructure Sweden) > Genome Annotation Service > http://nbis.se/about/staff/jacques-dainat > https://github.com/NBISweden/GAAS > http://nbis.se > > Contact > Address: Uppsala University, Biomedicinska Centrum > Department of Medical Biochemistry Microbiology, Genomics > Husargatan 3, box 582 > S-75123 Uppsala Sweden > Phone: +46 18 471 46 25

From carsonhh at gmail.com Tue Aug 20 07:20:50 2019 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 Aug 2019 07:20:50 -0600 Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences In-Reply-To: References: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se> Message-ID:

Here is the relevant part of the format specification:

##FASTA
This notation indicates that the annotation portion of the file is at an end and that the remainder of the file contains one or more sequences (nucleotide or protein) in FASTA format. This allows features and sequences to be bundled together. All FASTA sequences included in the file must be included together at the end of the file and may not be interspersed with the features lines. Once a ##FASTA section is encountered no other content beyond valid FASTA sequence is allowed.

Carson

Sent from my iPhone

> On Aug 20, 2019, at 7:16 AM, Carson Holt wrote: > > All fasta entries must occur at the end of the file according to gff3 specification. If a fasta entry is embedded in the middle, you have a corrupt file. If you are trying to merge gff3, files you can use the gff3_merge script. Concatenation via something like "cat" however results in a broken file.
> > Carson > > Sent from my iPhone > >> On Aug 20, 2019, at 2:14 AM, Jacques Dainat wrote: >> >> Dear Carson, >> >> I'm using maker/3.01.02 with open MPI. >> I realised that the option maker_gff from the maker_opts.ctl works great as long as no FASTA sequence is embeded in the GFF3 file. >> e.g: >> ``` >> ### >> ##FASTA >> >3098|quiver >> TTTATGGGTTCAGGCGGACCCATGGCGCCGACCATATTTTGAGAGCTGGACGACTCTGTA >> GGGTTGGGTATTGGCTGATTATTCATTCAAATCCCACGAGTAGCCTAGGAAGTGACGGTC >> ``` >> I ended up with GFF3 files containing fasta sequences in a sequential manner (All contig1 features then the sequence of contig1, all contig2 features then the sequence of contig2, etc... I precise this because we can meet gff3 files where all the sequences are gather at the end of the file). In such case MAKER takes in consideration only the features met before to reach the first FASTA sequence in the file. Then it stops to process the file and doesn't consider the rest of it. >> >> I haven't seen any particular message but my resulting annotation was obviously wrong. Indeed most of the data repeat/alignment/models contained in the gff file haven't been passed to MAKER. Would it be possible to add a fix to continue to parse a gff file even after meeting a fasta sequence? >> >> >> Best regards, >> >> /Jacques >> ------------------------------------------------- >> Jacques Dainat, Ph.D. >> NBIS (National Bioinformatics Infrastructure Sweden) >> Genome Annotation Service >> http://nbis.se/about/staff/jacques-dainat >> https://github.com/NBISweden/GAAS >> http://nbis.se >> >> Contact
>> Address: Uppsala University, Biomedicinska Centrum >> Department of Medical Biochemistry Microbiology, Genomics >> Husargatan 3, box 582 >> S-75123 Uppsala Sweden >> Phone: +46 18 471 46 25

From jacques.dainat at nbis.se Tue Aug 20 07:28:46 2019 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Tue, 20 Aug 2019 15:28:46 +0200 Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences In-Reply-To: References: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se> Message-ID: <66852BCA-9F3C-40CB-B58A-9578E3418851@nbis.se>

Thank you for your quick answer; you are right, I should have read the GFF3 specification more carefully. I will investigate which of the steps I modified introduced the problem. Thanks again.

/Jacques

> On 20 Aug 2019, at 15:16, Carson Holt wrote: > > All fasta entries must occur at the end of the file according to gff3 specification. If a fasta entry is embedded in the middle, you have a corrupt file. If you are trying to merge gff3, files you can use the gff3_merge script. Concatenation via something like "cat" however results in a broken file. > > Carson > > Sent from my iPhone > > On Aug 20, 2019, at 2:14 AM, Jacques Dainat > wrote: > >> Dear Carson, >> >> I'm using maker/3.01.02 with open MPI. >> I realised that the option maker_gff from the maker_opts.ctl works great as long as no FASTA sequence is embeded in the GFF3 file.
>> e.g: >> ``` >> ### >> ##FASTA >> >3098|quiver >> TTTATGGGTTCAGGCGGACCCATGGCGCCGACCATATTTTGAGAGCTGGACGACTCTGTA >> GGGTTGGGTATTGGCTGATTATTCATTCAAATCCCACGAGTAGCCTAGGAAGTGACGGTC >> ``` >> I ended up with GFF3 files containing fasta sequences in a sequential manner (All contig1 features then the sequence of contig1, all contig2 features then the sequence of contig2, etc? I precise this because we can meet gff3 files where all the sequences are gather at the end of the file). In such case MAKER takes in consideration only the features met before to reach the first FASTA sequence in the file. Then it stops to process the file and doesn?t consider the rest of it. >> >> I haven?t seen any particular message but my resulting annotation was obviously wrong. Indeed most of the data repeat/alignment/models contained in the gff file haven?t been passed to MAKER. Would it be possible to add a fix to continue to parse a gff file even after meeting a fasta sequence? >> >> >> Best regards, >> >> /Jacques >> ------------------------------------------------- >> Jacques Dainat, Ph.D. >> NBIS (National Bioinformatics Infrastructure Sweden) >> Genome Annotation Service >> http://nbis.se/about/staff/jacques-dainat >> https://github.com/NBISweden/GAAS >> http://nbis.se >> >> ? Contact ? >> Address: Uppsala University, Biomedicinska Centrum >> Department of Medical Biochemistry Microbiology, Genomics >> Husargatan 3, box 582 >> S-75123 Uppsala Sweden >> Phone: +46 18 471 46 25 >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacques.dainat at nbis.se Wed Aug 21 06:59:43 2019 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Wed, 21 Aug 2019 14:59:43 +0200 Subject: [maker-devel] GeneWise in MAKER? Message-ID: Dear Carson, Reading this paper J. Armstrong, I. 
T. Fiddes, M. Diekhans, and B. Paten. Whole-Genome Alignment and Comparative Annotation. Annu Rev Anim Biosci, Oct 2018. I discovered that MAKER is listed as running GeneWise (see Table 4). Did they mix it up with Exonerate, or is it something well hidden within MAKER and its documentation? Best regards, Jacques ------------------------------------------------- Jacques Dainat, Ph.D. NBIS (National Bioinformatics Infrastructure Sweden) Genome Annotation Service http://nbis.se/about/staff/jacques-dainat https://github.com/NBISweden/GAAS http://nbis.se - Contact - Address: Uppsala University, Biomedicinska Centrum Department of Medical Biochemistry Microbiology, Genomics Husargatan 3, box 582 S-75123 Uppsala Sweden Phone: +46 18 471 46 25 From ABoyher at danforthcenter.org Wed Aug 28 12:31:33 2019 From: ABoyher at danforthcenter.org (Boyher, Adam) Date: Wed, 28 Aug 2019 18:31:33 +0000 Subject: [maker-devel] Haplotype specific annotations Message-ID: <02119316-A412-43AB-A6A8-05037B14D972@contoso.com> Hi I have a phased genome (haplotype-specific assemblies) that I have annotated separately with Maker. There are a couple of things I've had a little trouble figuring out in relation to this. The first is that I have 3 sets of genes: two sets that exist in one haplotype but not the other, and a third that exists in both. I want to name these genes based on which set they are in. For instance, genes that exist in both should have the same name in both assemblies, but genes that exist in only one haplotype should be named specifically for that haplotype. Is there a straightforward way to do this? The second issue is that I've discovered one gene that is annotated in one phase but not the other. However, when I blast the genomic sequence against the second haplotype, I find an exact match.
Given that I used the exact same methods to annotate both haplotypes, starting with the same set of evidence (protein, transcriptome, repeat), why might maker miss or exclude that gene? Thanks Adam From carsonhh at gmail.com Fri Aug 30 10:20:32 2019 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 30 Aug 2019 10:20:32 -0600 Subject: [maker-devel] Haplotype specific annotations In-Reply-To: <02119316-A412-43AB-A6A8-05037B14D972@contoso.com> References: <02119316-A412-43AB-A6A8-05037B14D972@contoso.com> Message-ID: <098E19B1-BDB4-4FDA-A94B-4E30CEB5B58A@gmail.com> It may be that one haplotype has broken splice donors/acceptors that the other does not, which would not be visible in a BLAST search. Look at both in a browser to see what the evidence looks like in each and how the ab initio predictions compare between the two. As for naming, you could try reciprocal best BLAST hits to see which genes match which. Unfortunately, you will have to do a lot of manual review to make sure you are not just matching paralogs together. -Carson > On Aug 28, 2019, at 12:31 PM, Boyher, Adam wrote: > > Hi > > I have a phased genome (haplotype-specific assemblies) that I have annotated separately with Maker. There are a couple of things I've had a little trouble figuring out in relation to this. The first is that I have 3 sets of genes: two sets that exist in one haplotype but not the other, and a third that exists in both. I want to name these genes based on which set they are in. For instance, genes that exist in both should have the same name in both assemblies, but genes that exist in only one haplotype should be named specifically for that haplotype. Is there a straightforward way to do this? > > The second issue is that I've discovered one gene that is annotated in one phase but not the other. However, when I blast the genomic sequence against the second haplotype, I find an exact match.
Given that I used the exact same methods to annotate both haplotypes, starting with the same set of evidence (protein, transcriptome, repeat), why might maker miss or exclude that gene? > > Thanks > Adam > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Fri Aug 30 10:48:52 2019 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 30 Aug 2019 10:48:52 -0600 Subject: [maker-devel] which files are expected after fasta_merge? In-Reply-To: References: Message-ID: If you disabled evidence for round 3 (i.e. left protein= and est= empty), then you will get no annotations and EVM will not run. You can look at the GFF3 in a browser; if you see no protein/EST alignments, that is likely why. -Carson > On Aug 15, 2019, at 2:48 PM, Brandon Pickett wrote: > > Good afternoon! > > I just finished my third round of maker. I trained snap, augustus, etc. between the rounds. I used fasta_merge and gff3_merge to extract files after each round of maker. gff3_merge performed as expected each time, but fasta_merge surprised me. I will show you which files fasta_merge generated after each round. Please note that, as many people do, I renamed my output files from the default. Accordingly, I will list all the files with a generalized prefix of "maker" and show the rest of the file name as it was generated for me. Also note that I've changed .fasta to .fa for brevity.
> > After round #1: > transcripts.fa > proteins.fa > > After round #2: > non_overlapping_ab_initio.proteins.fa > non_overlapping_ab_initio.transcripts.fa > transcripts.fa > augustus_masked.proteins.fa > augustus_masked.transcripts.fa > evm.proteins.fa > evm.transcripts.fa > genemark.proteins.fa > genemark.transcripts.fa > snap_masked.proteins.fa > snap_masked.transcripts.fa > proteins.fa > > After round #3: > non_overlapping_ab_initio.proteins.fa > non_overlapping_ab_initio.transcripts.fa > augustus_masked.proteins.fa > augustus_masked.transcripts.fa > genemark.proteins.fa > genemark.transcripts.fa > snap_masked.proteins.fa > snap_masked.transcripts.fa > > I am unsurprised that I didn't get all these files after round #1 because I used round #1 to generate gene models from transcript evidence. I didn't expect so many files after round #2 (having only seen the output from round #1 up to that point), but it makes sense that I would get output from augustus, evidence modeler (evm), genemark, and snap since I provided them as input to this round (#2) of maker. Between rounds #2 and #3, I re-trained snap and augustus. Genemark was trained between rounds #1 and #2 without gene models from maker and thus did not require re-training. The only difference in my maker control files between rounds #2 and #3 was the paths to the snap and augustus files. In both #2 and #3, the control files had run_evm=1. I can provide my control files for each round, if needed. My question is: why were transcripts.fa, proteins.fa, evm.proteins.fa, and evm.transcripts.fa not generated after round #3? I recognize that this is probably not an error, but rather a gap in my understanding of when each file is and is not generated.
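As a hedged illustration of Carson's point above (if the evidence options are left empty, there are no evidence-based annotations and EVM does not run), here is a minimal Python sketch that is not part of MAKER. It assumes maker_opts.ctl uses `key=value #comment` lines and that the evidence options are named `est`, `altest`, `est_gff`, `altest_gff`, `protein`, and `protein_gff`; the helper names `parse_ctl` and `check_evidence` are hypothetical. The `est2genome` check mirrors the error Debojyoti reports further down, under the assumption that only `est` or `est_gff` satisfies it; MAKER's own validation may differ.

```python
"""Hypothetical helper (not part of MAKER): summarize evidence settings
in a MAKER control file such as maker_opts.ctl.

Assumes the file uses `key=value #comment` lines.
"""
import sys

# Evidence options as they are assumed to appear in maker_opts.ctl.
EVIDENCE_KEYS = ("est", "altest", "est_gff", "altest_gff", "protein", "protein_gff")


def parse_ctl(text):
    """Return {key: value} for every `key=value` line, ignoring comments."""
    opts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comment
        if "=" not in line:
            continue  # blank or comment-only line
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def check_evidence(opts):
    """Return a list of warnings about missing or mismatched evidence."""
    warnings = []
    if not any(opts.get(key) for key in EVIDENCE_KEYS):
        warnings.append("no EST/protein evidence set: expect no evidence-based "
                        "annotations and no EVM output")
    # Assumption: est2genome needs est or est_gff; altest alone is not enough.
    if opts.get("est2genome") == "1" and not (opts.get("est") or opts.get("est_gff")):
        warnings.append("est2genome=1 but est and est_gff are empty")
    if opts.get("protein2genome") == "1" and not (opts.get("protein") or opts.get("protein_gff")):
        warnings.append("protein2genome=1 but protein and protein_gff are empty")
    return warnings


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for message in check_evidence(parse_ctl(fh.read())):
            print("WARNING:", message)
```

Run it as, e.g., `python check_ctl.py maker_opts.ctl` (the script name is hypothetical). It only inspects the control file; looking at the GFF3 in a browser, as Carson suggests, remains the direct test of whether any evidence alignments were produced.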
> > Thank you, > Brandon Pickett > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From Cenny.Taslim at nationwidechildrens.org Thu Aug 8 07:12:23 2019 From: Cenny.Taslim at nationwidechildrens.org (Taslim, Cenny) Date: Thu, 8 Aug 2019 13:12:23 +0000 Subject: [maker-devel] maker with mpi support on example still not done after two days Message-ID: <82339125479f4b278b87bf8458c2c04d@l1perdwmbx02.childrensroot.net> Hi Maker developers, Thanks for approving my subscription. I tried running maker with mpi support on the human fasta file provided in example_01_basic. It's been running for 2 days and 21 hours. I didn't think the example would require such a long time to run. I'm hoping someone can help me pinpoint the problem. I'm running it with 4 processes: ~/opt/mpich-3.3.1/bin/mpiexec -n 4 ~/opt/maker.4/maker/bin/maker -f 2> maker.error Maker_opts.ctl is the same as opts2.txt This is the log: STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... STATUS: Setting up database for any GFF3 input... A data structure will be created for you at: /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore To access files for individual sequences use the datastore index: /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_master_datastore_index.log STATUS: Now running MAKER... examining contents of the fasta file and run log --Next Contig-- #--------------------------------------------------------------------- Now starting the contig!! SeqID: NT_010783.15 Length: 201444 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks doing repeat masking doing repeat masking running repeat masker.
#--------- command -------------# Widget::RepeatMasker: cd /gpfs0/scratch/1895302/maker_kXmduG; /opt/maker.4/maker/exe/RepeatMasker/RepeatMasker /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.0.all.rb -species all -dir /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0 -pa 1 #-------------------------------# running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /gpfs0/scratch/1895302/maker_kXmduG; /opt/maker.4/maker/exe/RepeatMasker/RepeatMasker /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.all.rb -species all -dir /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0 -pa 1 #-------------------------------# doing blastx repeats formating database... #--------- command -------------# Widget::formater: /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.0 #-------------------------------# doing blastx repeats formating database... #--------- command -------------# Widget::formater: /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.1 #-------------------------------# running blast search. 
#--------- command -------------# Widget::blastx: /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.0 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner #-------------------------------# running blast search. #--------- command -------------# Widget::blastx: /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.1 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.1.repeatrunner #-------------------------------# deleted:19 hits deleted:18 hits doing blastx repeats doing blastx repeats formating database... #--------- command -------------# Widget::formater: /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.2 #-------------------------------# formating database... #--------- command -------------# Widget::formater: /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.3 #-------------------------------# running blast search. 
#--------- command -------------# Widget::blastx: /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.2 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.2.repeatrunner #-------------------------------# running blast search. #--------- command -------------# Widget::blastx: /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.3 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner #-------------------------------# deleted:9 hits deleted:9 hits doing blastx repeats doing blastx repeats formating database... #--------- command -------------# Widget::formater: /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.5 #-------------------------------# formating database... #--------- command -------------# Widget::formater: /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.4 #-------------------------------# running blast search. 
#--------- command -------------# Widget::blastx: /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.5 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner #-------------------------------# running blast search. #--------- command -------------# Widget::blastx: /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.4 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.4.repeatrunner #-------------------------------# deleted:16 hits deleted:16 hits doing blastx repeats doing blastx repeats formating database... #--------- command -------------# Widget::formater: /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.6 #-------------------------------# formating database... #--------- command -------------# Widget::formater: /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.7 #-------------------------------# running blast search. 
#--------- command -------------# Widget::blastx: /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.6 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner #-------------------------------# running blast search. #--------- command -------------# Widget::blastx: /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.7 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.7.repeatrunner #-------------------------------# deleted:8 hits deleted:12 hits doing blastx repeats doing blastx repeats formating database... #--------- command -------------# Widget::formater: /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.8 #-------------------------------# formating database... #--------- command -------------# Widget::formater: /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.9 #-------------------------------# running blast search. 
#--------- command -------------# Widget::blastx: /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.8 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.8.repeatrunner #-------------------------------# running blast search. #--------- command -------------# Widget::blastx: /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.9 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.9.repeatrunner #-------------------------------# deleted:5 hits deleted:7 hits collecting blastx repeatmasking processing all repeats in cluster::shadow_cluster... ...finished clustering. Let me know if you need more information. Thank you in advance for your help. -------------- next part -------------- An HTML attachment was scrubbed... URL: From debojyoti.das.und at gmail.com Thu Aug 8 12:54:01 2019 From: debojyoti.das.und at gmail.com (Debojyoti Das) Date: Thu, 8 Aug 2019 13:54:01 -0500 Subject: [maker-devel] maker predicting only part of a gene Message-ID: Hi, I am working on a non-model reptile species. I tried running maker with est2genome=1 and protein2genome=1 with the following evidence: 1. est=transcriptome.fasta (de novo assembled) 2. 
protein in fasta format from two related species. I keep getting predictions where gene models identify multi-exon genes but fail to incorporate all the exons even though they are present on the scaffolds. In the predicted gene models, some exons were correctly identified while others were missed, even though the entire gene is present on the scaffolds; we checked this by loading the annotation in IGV. Since we are not completely confident in our transcriptome assembly, we thought of using cDNA from a closely related species. 1. altest="cDNA in fasta format from a related species" 2. protein in fasta format from two related species. However, when I do this I get the error "ERROR: You must provide some form of EST evidence to use est2genome as a predictor." Interestingly, if I switch est2genome off (setting it to zero), maker starts running. Any suggestions on how to proceed? Best, Debojyoti From debojyoti.das at und.edu Thu Aug 8 13:48:36 2019 From: debojyoti.das at und.edu (Das, Debojyoti) Date: Thu, 8 Aug 2019 19:48:36 +0000 Subject: [maker-devel] maker predicting only part of a gene Message-ID: Hi Carson, I am working on a non-model reptile species. I tried running maker with est2genome=1 and protein2genome=1 with the following evidence: 1. est=transcriptome.fasta (de novo assembled) 2. protein in fasta format from two related species. I keep getting predictions where gene models identify multi-exon genes but fail to incorporate all the exons even though they are present on the scaffolds. In the predicted gene models, some exons were correctly identified while others were missed, even though the entire gene is present on the scaffolds; we checked this by loading the annotation in IGV. Since we are not completely confident in our transcriptome assembly, we thought of using cDNA from a closely related species. 1.
altest="cDNA in fasta format from a related species" 2. protein in fasta format from two related species. However, when I do this I get the error "ERROR: You must provide some form of EST evidence to use est2genome as a predictor." Interestingly, if I switch est2genome off (setting it to zero), maker starts running. Any suggestions on how to proceed? Best, Debojyoti From jmartin at wustl.edu Thu Aug 8 18:41:31 2019 From: jmartin at wustl.edu (Martin, John) Date: Fri, 9 Aug 2019 00:41:31 +0000 Subject: [maker-devel] Running maker with suboptimal evidence Message-ID: Greetings, I would like to annotate a worm genome of ~90 Mb, but the evidence I have is not all of good quality. I have 2 high-quality protein sets from previously finished & curated, closely related worms. I also have a small amount of RNAseq and some old EST data, neither of which gives good coverage of the transcriptome. And I have a previously run Maker gene set for this worm that I believe was generated using probably all nematodes from GenBank and that poor set of RNAseq. I also have a 'fair' set of predictions from running Braker2 using only the high-quality protein data I mentioned. My opinion that the previous Maker gene set is of poor quality comes from comparing it to my recent Braker2 annotations and that RNAseq. I would like to try to build an improved annotation for this assembly, but I'm unsure whether I should use all this evidence, or whether I would be better off not using the low-coverage RNAseq, the EST data, and the previous (questionable-looking) Maker gene set. I'm looking for opinions: should I throw all the evidence I have into Maker, or should I use only evidence that I consider of good quality? Thanks, John Martin ________________________________ The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature.
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail. From Cenny.Taslim at nationwidechildrens.org Wed Aug 14 10:08:40 2019 From: Cenny.Taslim at nationwidechildrens.org (Taslim, Cenny) Date: Wed, 14 Aug 2019 16:08:40 +0000 Subject: [maker-devel] maker with mpi support on example still not done after two days Message-ID: I found that the example job finishes if I don't use MPI, i.e. running the job like this works fine: ~/opt/maker.4/maker/bin/maker -f 2> maker.error Any suggestions? Without MPI, I suspect it will take months to complete, as I have a human genome with ~3500 contigs. From carsonhh at gmail.com Wed Aug 14 12:54:26 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 14 Aug 2019 12:54:26 -0600 Subject: [maker-devel] Running maker with suboptimal evidence In-Reply-To: References: Message-ID: <9C1C9005-8285-4A48-9F48-066D15D402A9@gmail.com> Run it both ways, then look at the results in a browser like Apollo. You can then visually see how the evidence alignments compare to the models and whether they are spurious. While not practical in most situations, manual review is still the gold standard for genome annotation. -Carson > On Aug 8, 2019, at 6:41 PM, Martin, John wrote: > > Greetings, > > I would like to annotate a worm genome of ~90 Mb, but the evidence I > have is not all of good quality. I have 2 high-quality protein sets > from previously finished & curated, closely related worms. I also have > a small amount of RNAseq and some old EST data, neither of which gives > good coverage of the transcriptome. And I have a previously run Maker > gene set for this worm that I believe was generated using probably all > nematodes from GenBank and that poor set of RNAseq.
I also have a > 'fair' set of predictions from running Braker2 using only the high > quality protein data I mentioned. My opinion that the previous Maker > gene set is of poor quality comes from comparing it to my > recent Braker2 annotations and that RNAseq. > > I would like to try to build an improved annotation for this > assembly, but I'm unsure whether I should use all this evidence, or whether > I would be better off not using the low-coverage RNAseq, the EST data, and the > previous (questionable-looking) Maker gene set. I'm looking for > opinions: should I throw all the evidence I have into Maker, or > should I use only evidence that I consider of good quality? > > > Thanks, > > John Martin > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Aug 14 12:57:39 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 14 Aug 2019 12:57:39 -0600 Subject: [maker-devel] maker predicting only part of a gene In-Reply-To: References: Message-ID: <31721D1F-7079-4935-BE34-A8F4786A5797@gmail.com> Neither est2genome=1 nor protein2genome=1 predicts genes. They simply turn Exonerate alignments that match ORFs into gene models. That is good enough to train a predictor like SNAP or Augustus, but those models should not be used as the final models.
If you review the documentation, you will see that they should be turned off once you have trained a predictor. Here is an example: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors -Carson > On Aug 8, 2019, at 1:48 PM, Das, Debojyoti wrote: > > Hi Carson, > > I am working on a non-model reptile species. I tried running maker with est2genome=1 and protein2genome=1 with the following evidence: > > 1. est=transcriptome.fasta (de novo assembled) > 2. protein in fasta format from two related species. > > > I keep getting predictions where gene models identify multi-exon genes but fail to incorporate all the exons even though they are present on the scaffolds. In the predicted gene models, some exons were correctly identified while others were missed, even though the entire gene is present on the scaffolds; we checked this by loading the annotation in IGV. > > Since we are not completely confident in our transcriptome assembly, we thought of using cDNA from a closely related species. > 1. altest="cDNA in fasta format from a related species" > 2. protein in fasta format from two related species. > > However, when I do this I get the error > "ERROR: You must provide some form of EST evidence to use est2genome as a predictor." > > Interestingly, if I switch est2genome off (setting it to zero), maker starts running. > > Any suggestions on how to proceed? > > Best, > Debojyoti > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Wed Aug 14 13:05:05 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 14 Aug 2019 13:05:05 -0600 Subject: [maker-devel] maker with mpi support on example still not done after two days In-Reply-To: <82339125479f4b278b87bf8458c2c04d@l1perdwmbx02.childrensroot.net> References: <82339125479f4b278b87bf8458c2c04d@l1perdwmbx02.childrensroot.net> Message-ID: If no additional output is being produced, then it has frozen for some reason. My first suggestion would be to reinstall MPICH and then reinstall MAKER using the MPICH you just installed. Freezing issues are usually related to MPI communication, which is handled by the communicator you have installed (i.e. the mpiexec command that launches MAKER). Alternatively, it can be related to the file system. Some less common network-mounted file systems do not correctly support hard links, which can cause programmatic file locks to freeze. If that is the case, you can try running with the -nolock flag. -Carson > On Aug 8, 2019, at 7:12 AM, Taslim, Cenny wrote: > > Hi Maker developers, > > Thanks for approving my subscription. > I tried running maker with mpi support on the human fasta file provided in example_01_basic. It's been running for 2 days and 21 hours. I didn't think the example would require such a long time to run. > I'm hoping someone can help me pinpoint the problem. > > I'm running it with 4 processes: > ~/opt/mpich-3.3.1/bin/mpiexec -n 4 ~/opt/maker.4/maker/bin/maker -f 2> maker.error > > Maker_opts.ctl is the same as opts2.txt > > This is the log: > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input...
> A data structure will be created for you at: > /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore > > To access files for individual sequences use the datastore index: > /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_master_datastore_index.log > > STATUS: Now running MAKER... > examining contents of the fasta file and run log > > > > --Next Contig-- > > #--------------------------------------------------------------------- > Now starting the contig!! > SeqID: NT_010783.15 > Length: 201444 > #--------------------------------------------------------------------- > > > setting up GFF3 output and fasta chunks > doing repeat masking > doing repeat masking > running repeat masker. > #--------- command -------------# > Widget::RepeatMasker: > cd /gpfs0/scratch/1895302/maker_kXmduG; /opt/maker.4/maker/exe/RepeatMasker/RepeatMasker /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.0.all.rb -species all -dir /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0 -pa 1 > #-------------------------------# > running repeat masker. > #--------- command -------------# > Widget::RepeatMasker: > cd /gpfs0/scratch/1895302/maker_kXmduG; /opt/maker.4/maker/exe/RepeatMasker/RepeatMasker /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.all.rb -species all -dir /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0 -pa 1 > #-------------------------------# > doing blastx repeats > formating database... 
> #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.0 > #-------------------------------# > doing blastx repeats > formating database... > #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.1 > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.0 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.1 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.1.repeatrunner > #-------------------------------# > deleted:19 hits > deleted:18 hits > doing blastx repeats > doing blastx repeats > formating database... 
> #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.2 > #-------------------------------# > formating database... > #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.3 > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.2 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.2.repeatrunner > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.3 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner > #-------------------------------# > deleted:9 hits > deleted:9 hits > doing blastx repeats > doing blastx repeats > formating database... 
> #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.5 > #-------------------------------# > formating database... > #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.4 > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.5 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.4 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.4.repeatrunner > #-------------------------------# > deleted:16 hits > deleted:16 hits > doing blastx repeats > doing blastx repeats > formating database... 
> #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.6 > #-------------------------------# > formating database... > #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.7 > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.6 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.7 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.7.repeatrunner > #-------------------------------# > deleted:8 hits > deleted:12 hits > doing blastx repeats > doing blastx repeats > formating database... 
> #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.8 > #-------------------------------# > formating database... > #--------- command -------------# > Widget::formater: > /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.9 > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.8 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.8.repeatrunner > #-------------------------------# > running blast search. > #--------- command -------------# > Widget::blastx: > /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.9 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.9.repeatrunner > #-------------------------------# > deleted:5 hits > deleted:7 hits > collecting blastx repeatmasking > processing all repeats > in cluster::shadow_cluster... > ...finished clustering. > > Let me know if you need more information. 
Thank you in advance for your help. > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Aug 14 13:06:41 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 14 Aug 2019 13:06:41 -0600 Subject: [maker-devel] maker with mpi support on example still not done after two days In-Reply-To: References: <82339125479f4b278b87bf8458c2c04d@l1perdwmbx02.childrensroot.net> Message-ID: <54BF4DAE-272A-453E-9522-A82FAADABEF7@gmail.com> You can also try OpenMPI instead of MPICH to see if it behaves better (requires MAKER reinstall against OpenMPI libraries). ?Carson > On Aug 14, 2019, at 1:05 PM, Carson Holt wrote: > > If no additional output is produced, then it is frozen for some reason. My first suggestion would be to reinstall MPICH and then reinstall MAKER using the MPICH you just installed. Freezing issues are usually related to MPI communication which is handled by the communicator you have installed (i.e. the mpiexec command that launches MAKER). > > Alternatively it can be related to the file system. Some less common network mounted file systems do not support do not correctly support hardlinks which can cause programatic file locks to freeze. You can try running with the -nolock flag if that is the case. > > ?Carson > > >> On Aug 8, 2019, at 7:12 AM, Taslim, Cenny > wrote: >> >> Hi Maker developers, >> >> Thanks for approving my subscription. >> I tried running maker with mpi support on the human fasta file provided in example_01_basic. It?s been running for 2 days and 21 hours. I didn?t think the example require a long time to run. >> I?m hoping someone can help me point out the problem. 
>> >> I?m running it with 4 processes: >> ~/opt/mpich-3.3.1/bin/mpiexec -n 4 ~/opt/maker.4/maker/bin/maker -f 2> maker.error >> >> Maker_opts.ctl is the same as opts2.txt >> >> These are the log: >> STATUS: Parsing control files... >> STATUS: Processing and indexing input FASTA files... >> STATUS: Setting up database for any GFF3 input... >> A data structure will be created for you at: >> /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore >> >> To access files for individual sequences use the datastore index: >> /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_master_datastore_index.log >> >> STATUS: Now running MAKER... >> examining contents of the fasta file and run log >> >> >> >> --Next Contig-- >> >> #--------------------------------------------------------------------- >> Now starting the contig!! >> SeqID: NT_010783.15 >> Length: 201444 >> #--------------------------------------------------------------------- >> >> >> setting up GFF3 output and fasta chunks >> doing repeat masking >> doing repeat masking >> running repeat masker. >> #--------- command -------------# >> Widget::RepeatMasker: >> cd /gpfs0/scratch/1895302/maker_kXmduG; /opt/maker.4/maker/exe/RepeatMasker/RepeatMasker /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.0.all.rb -species all -dir /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0 -pa 1 >> #-------------------------------# >> running repeat masker. 
>> #--------- command -------------# >> Widget::RepeatMasker: >> cd /gpfs0/scratch/1895302/maker_kXmduG; /opt/maker.4/maker/exe/RepeatMasker/RepeatMasker /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.all.rb -species all -dir /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0 -pa 1 >> #-------------------------------# >> doing blastx repeats >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.0 >> #-------------------------------# >> doing blastx repeats >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.1 >> #-------------------------------# >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.0 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >> #-------------------------------# >> running blast search. 
>> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.1 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.1.repeatrunner >> #-------------------------------# >> deleted:19 hits >> deleted:18 hits >> doing blastx repeats >> doing blastx repeats >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.2 >> #-------------------------------# >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.3 >> #-------------------------------# >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.2 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.2.repeatrunner >> #-------------------------------# >> running blast search. 
>> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.3 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.3.repeatrunner >> #-------------------------------# >> deleted:9 hits >> deleted:9 hits >> doing blastx repeats >> doing blastx repeats >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.5 >> #-------------------------------# >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.4 >> #-------------------------------# >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.5 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner >> #-------------------------------# >> running blast search. 
>> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.4 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.4.repeatrunner >> #-------------------------------# >> deleted:16 hits >> deleted:16 hits >> doing blastx repeats >> doing blastx repeats >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.6 >> #-------------------------------# >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.7 >> #-------------------------------# >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.6 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner >> #-------------------------------# >> running blast search. 
>> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.7 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.7.repeatrunner >> #-------------------------------# >> deleted:8 hits >> deleted:12 hits >> doing blastx repeats >> doing blastx repeats >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/1/blastprep/te_proteins%2Efasta.mpi.10.8 >> #-------------------------------# >> formating database... >> #--------- command -------------# >> Widget::formater: >> /opt/miniconda3/bin/makeblastdb -dbtype prot -in /gpfs0/scratch/1895302/maker_kXmduG/2/blastprep/te_proteins%2Efasta.mpi.10.9 >> #-------------------------------# >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.8 -query /gpfs0/scratch/1895302/maker_kXmduG/1/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.8.repeatrunner >> #-------------------------------# >> running blast search. 
>> #--------- command -------------# >> Widget::blastx: >> /opt/miniconda3/bin/blastx -db /gpfs0/scratch/1895302/maker_kXmduG/te_proteins%2Efasta.mpi.10.9 -query /gpfs0/scratch/1895302/maker_kXmduG/2/NT_010783%2E15.1 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /maker_tutorial/example_01_basic/hsap_contig.maker.output/hsap_contig_datastore/80/99/NT_010783.15//theVoid.NT_010783.15/0/NT_010783%2E15.1.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.9.repeatrunner >> #-------------------------------# >> deleted:5 hits >> deleted:7 hits >> collecting blastx repeatmasking >> processing all repeats >> in cluster::shadow_cluster... >> ...finished clustering. >> >> Let me know if you need more information. Thank you in advance for your help. >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From pickettbd at gmail.com Thu Aug 15 14:48:46 2019 From: pickettbd at gmail.com (Brandon Pickett) Date: Thu, 15 Aug 2019 13:48:46 -0700 Subject: [maker-devel] which files are expected after fasta_merge? Message-ID: Good afternoon! I just finished my third round of maker. I trained snap, augustus, etc. between the rounds. I used fasta_merge and gff3_merge to extract files after each round of maker. gff3_merge performed as expected each time, but fasta_merge surprised me. I will show you which files fasta_merge generated after each round. Please note that, as many people do, I renamed my output files from the default. Accordingly, I will list all the files with a generalized prefix of "maker" and show the rest of the file name as it was generated for me. Also note that I've changed .fasta to .fa for brevity. 
After round #1:
  transcripts.fa
  proteins.fa

After round #2:
  non_overlapping_ab_initio.proteins.fa
  non_overlapping_ab_initio.transcripts.fa
  transcripts.fa
  augustus_masked.proteins.fa
  augustus_masked.transcripts.fa
  evm.proteins.fa
  evm.transcripts.fa
  genemark.proteins.fa
  genemark.transcripts.fa
  snap_masked.proteins.fa
  snap_masked.transcripts.fa
  proteins.fa

After round #3:
  non_overlapping_ab_initio.proteins.fa
  non_overlapping_ab_initio.transcripts.fa
  augustus_masked.proteins.fa
  augustus_masked.transcripts.fa
  genemark.proteins.fa
  genemark.transcripts.fa
  snap_masked.proteins.fa
  snap_masked.transcripts.fa

I am not surprised that I didn't get all these files after round #1, because I used round #1 to generate gene models from transcript evidence. I didn't expect so many files after round #2 (having only seen the output from round #1 up to that point). However, it makes sense that I would get output from augustus, evidence modeler (evm), genemark, and snap, since I provided them as input to this round (#2) of maker.

Between rounds #2 and #3, I re-trained snap and augustus. Genemark was trained between rounds #1 and #2 without gene models from maker, so it did not require re-training. The only difference in my maker control files between rounds #2 and #3 was the paths to the snap and augustus files. In both #2 and #3, the control files had run_evm=1. I can provide my control files for each round, if needed.

My question is: why were transcripts.fa, proteins.fa, evm.proteins.fa, and evm.transcripts.fa not generated after round #3? I recognize that this is probably not an error. More likely it reflects a gap in my understanding of when each file is and is not generated.

Thank you,
Brandon Pickett
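A quick set comparison of the round #2 and round #3 listings (file names exactly as given above, with the shortened .fa suffix) makes the question concrete. This is only a bookkeeping sketch. `missing_outputs` is a hypothetical helper and is not part of MAKER or fasta_merge.

```python
from typing import Iterable, List


def missing_outputs(before: Iterable[str], after: Iterable[str]) -> List[str]:
    """Return file names present in `before` but absent from `after`, sorted."""
    return sorted(set(before) - set(after))


# Listings copied from the message ("maker" prefix omitted, .fasta shortened to .fa).
ROUND_2 = [
    "non_overlapping_ab_initio.proteins.fa",
    "non_overlapping_ab_initio.transcripts.fa",
    "transcripts.fa",
    "augustus_masked.proteins.fa",
    "augustus_masked.transcripts.fa",
    "evm.proteins.fa",
    "evm.transcripts.fa",
    "genemark.proteins.fa",
    "genemark.transcripts.fa",
    "snap_masked.proteins.fa",
    "snap_masked.transcripts.fa",
    "proteins.fa",
]
ROUND_3 = [
    "non_overlapping_ab_initio.proteins.fa",
    "non_overlapping_ab_initio.transcripts.fa",
    "augustus_masked.proteins.fa",
    "augustus_masked.transcripts.fa",
    "genemark.proteins.fa",
    "genemark.transcripts.fa",
    "snap_masked.proteins.fa",
    "snap_masked.transcripts.fa",
]

print(missing_outputs(ROUND_2, ROUND_3))
# ['evm.proteins.fa', 'evm.transcripts.fa', 'proteins.fa', 'transcripts.fa']
print(missing_outputs(ROUND_3, ROUND_2))
# []
```

Round #3 turns out to be a strict subset of round #2. No new files appeared, and exactly the four files asked about are the ones that disappeared.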
From jacques.dainat at nbis.se Tue Aug 20 02:14:53 2019
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Tue, 20 Aug 2019 10:14:53 +0200
Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences
Message-ID: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se>

Dear Carson,

I'm using maker/3.01.02 with Open MPI. I realised that the maker_gff option in maker_opts.ctl works great as long as no FASTA sequence is embedded in the GFF3 file, e.g.:

```
###
##FASTA
>3098|quiver
TTTATGGGTTCAGGCGGACCCATGGCGCCGACCATATTTTGAGAGCTGGACGACTCTGTA
GGGTTGGGTATTGGCTGATTATTCATTCAAATCCCACGAGTAGCCTAGGAAGTGACGGTC
```

I ended up with GFF3 files that contain FASTA sequences in a sequential manner: all contig1 features, then the sequence of contig1, then all contig2 features, then the sequence of contig2, etc. I point this out because some GFF3 files instead gather all the sequences at the end of the file. With the sequential layout, MAKER only takes into account the features that appear before the first FASTA sequence in the file. It then stops processing the file and ignores the rest of it.

I haven't seen any particular message, but my resulting annotation was obviously wrong. Indeed, most of the repeat/alignment/model data contained in the GFF file was never passed to MAKER. Would it be possible to add a fix so that MAKER continues parsing a GFF file even after it encounters a FASTA sequence?

Best regards,

/Jacques
-------------------------------------------------
Jacques Dainat, Ph.D.
NBIS (National Bioinformatics Infrastructure Sweden)
Genome Annotation Service
http://nbis.se/about/staff/jacques-dainat
https://github.com/NBISweden/GAAS
http://nbis.se

Contact
Address: Uppsala University, Biomedicinska Centrum
Department of Medical Biochemistry Microbiology, Genomics
Husargatan 3, box 582
S-75123 Uppsala Sweden
Phone: +46 18 471 46 25
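The GFF3 specification requires every FASTA sequence to appear together after a single ##FASTA directive at the end of the file. The hypothetical Python helpers below sketch one way to detect the interleaved layout described above and to regroup it. They are not MAKER code. The rule that a line containing a tab or starting with '#' is not FASTA is an assumption (a heuristic), not something taken from the specification.

```python
from typing import Iterable, List


def looks_like_fasta(line: str) -> bool:
    """Heuristic (assumption): FASTA lines are blank, '>' headers, or bare
    sequence lines that contain no tab and do not start with '#'."""
    return line == "" or line.startswith(">") or (
        "\t" not in line and not line.startswith("#")
    )


def is_spec_compliant(lines: Iterable[str]) -> bool:
    """True if nothing but FASTA content follows the first ##FASTA directive."""
    seen_fasta = False
    for raw in lines:
        line = raw.rstrip("\n")
        # After ##FASTA, any feature line or directive (including a second
        # ##FASTA) violates the specification.
        if seen_fasta and not looks_like_fasta(line):
            return False
        if line == "##FASTA":
            seen_fasta = True
    return True


def move_fasta_to_end(lines: Iterable[str]) -> List[str]:
    """Collect interleaved FASTA blocks and emit them once, after all features."""
    annotation: List[str] = []
    fasta: List[str] = []
    in_fasta = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "##FASTA":
            in_fasta = True  # start (or resume) a FASTA block; drop the directive
            continue
        if in_fasta and looks_like_fasta(line):
            if line:  # skip blank separator lines inside FASTA blocks
                fasta.append(line)
            continue
        in_fasta = False  # a feature or directive line: annotation has resumed
        annotation.append(line)
    if fasta:
        annotation.append("##FASTA")
    return annotation + fasta


# Toy input mimicking the sequential layout (features, sequence, features, sequence).
interleaved = [
    "##gff-version 3",
    "ctg1\tsrc\tgene\t1\t8\t.\t+\t.\tID=g1",
    "###",
    "##FASTA",
    ">ctg1",
    "ACGTACGT",
    "ctg2\tsrc\tgene\t1\t8\t.\t+\t.\tID=g2",
    "###",
    "##FASTA",
    ">ctg2",
    "TTGATTGA",
]

print(is_spec_compliant(interleaved))  # False
fixed = move_fasta_to_end(interleaved)
print(is_spec_compliant(fixed))  # True
```

The heuristic would misclassify, for example, a feature line whose columns are separated by spaces instead of tabs. For real data it is safer to avoid producing such files in the first place. Concatenating several GFF3 files that each end in a ##FASTA section (e.g. with cat) yields exactly this interleaved layout, which is why merging them with gff3_merge is preferable.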
URL: From carsonhh at gmail.com Tue Aug 20 07:16:23 2019 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 Aug 2019 07:16:23 -0600 Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences In-Reply-To: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se> References: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se> Message-ID: All fasta entries must occur at the end of the file according to gff3 specification. If a fasta entry is embedded in the middle, you have a corrupt file. If you are trying to merge gff3, files you can use the gff3_merge script. Concatenation via something like ?cat? however results in a broken file. ?Carson Sent from my iPhone > On Aug 20, 2019, at 2:14 AM, Jacques Dainat wrote: > > Dear Carson, > > I?m using maker/3.01.02 with open MPI. > I realised that the option maker_gff from the maker_opts.ctl works great as long as no FASTA sequence is embeded in the GFF3 file. > e.g: > ``` > ### > ##FASTA > >3098|quiver > TTTATGGGTTCAGGCGGACCCATGGCGCCGACCATATTTTGAGAGCTGGACGACTCTGTA > GGGTTGGGTATTGGCTGATTATTCATTCAAATCCCACGAGTAGCCTAGGAAGTGACGGTC > ``` > I ended up with GFF3 files containing fasta sequences in a sequential manner (All contig1 features then the sequence of contig1, all contig2 features then the sequence of contig2, etc? I precise this because we can meet gff3 files where all the sequences are gather at the end of the file). In such case MAKER takes in consideration only the features met before to reach the first FASTA sequence in the file. Then it stops to process the file and doesn?t consider the rest of it. > > I haven?t seen any particular message but my resulting annotation was obviously wrong. Indeed most of the data repeat/alignment/models contained in the gff file haven?t been passed to MAKER. Would it be possible to add a fix to continue to parse a gff file even after meeting a fasta sequence? > > > Best regards, > > /Jacques > ------------------------------------------------- > Jacques Dainat, Ph.D. 
> NBIS (National Bioinformatics Infrastructure Sweden) > Genome Annotation Service > http://nbis.se/about/staff/jacques-dainat > https://github.com/NBISweden/GAAS > http://nbis.se > > ? Contact ? > Address: Uppsala University, Biomedicinska Centrum > Department of Medical Biochemistry Microbiology, Genomics > Husargatan 3, box 582 > S-75123 Uppsala Sweden > Phone: +46 18 471 46 25 > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Aug 20 07:20:50 2019 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 Aug 2019 07:20:50 -0600 Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences In-Reply-To: References: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se> Message-ID: Here is the relevant part of the format specification ?> ##FASTA This notation indicates that the annotation portion of the file is at an end and that the remainder of the file contains one or more sequences (nucleotide or protein) in FASTA format. This allows features and sequences to be bundled together. All FASTA sequences included in the file must be included together at the end of the file and may not be interspersed with the features lines. Once a ##FASTA section is encountered no other content beyond valid FASTA sequence is allowed. ?Carson Sent from my iPhone > On Aug 20, 2019, at 7:16 AM, Carson Holt wrote: > > All fasta entries must occur at the end of the file according to gff3 specification. If a fasta entry is embedded in the middle, you have a corrupt file. If you are trying to merge gff3, files you can use the gff3_merge script. Concatenation via something like ?cat? however results in a broken file. 
>
> --Carson
>
> Sent from my iPhone
>
>> On Aug 20, 2019, at 2:14 AM, Jacques Dainat wrote:
>>
>> Dear Carson,
>>
>> I'm using maker/3.01.02 with open MPI.
>> I realised that the option maker_gff from the maker_opts.ctl works great as long as no FASTA sequence is embedded in the GFF3 file.
>> e.g:
>> ```
>> ###
>> ##FASTA
>> >3098|quiver
>> TTTATGGGTTCAGGCGGACCCATGGCGCCGACCATATTTTGAGAGCTGGACGACTCTGTA
>> GGGTTGGGTATTGGCTGATTATTCATTCAAATCCCACGAGTAGCCTAGGAAGTGACGGTC
>> ```
>> I ended up with GFF3 files containing fasta sequences in a sequential manner (all contig1 features then the sequence of contig1, all contig2 features then the sequence of contig2, etc. I specify this because we can also encounter GFF3 files where all the sequences are gathered at the end of the file). In such a case MAKER considers only the features met before reaching the first FASTA sequence in the file. It then stops processing the file and doesn't consider the rest of it.
>>
>> I haven't seen any particular message, but my resulting annotation was obviously wrong. Indeed, most of the repeat/alignment/model data contained in the gff file wasn't passed to MAKER. Would it be possible to add a fix to continue parsing a gff file even after meeting a fasta sequence?
>>
>>
>> Best regards,
>>
>> /Jacques
>> -------------------------------------------------
>> Jacques Dainat, Ph.D.
>> NBIS (National Bioinformatics Infrastructure Sweden)
>> Genome Annotation Service
>> http://nbis.se/about/staff/jacques-dainat
>> https://github.com/NBISweden/GAAS
>> http://nbis.se
>>
>> Contact
>> Address: Uppsala University, Biomedicinska Centrum
>> Department of Medical Biochemistry Microbiology, Genomics
>> Husargatan 3, box 582
>> S-75123 Uppsala Sweden
>> Phone: +46 18 471 46 25
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jacques.dainat at nbis.se Tue Aug 20 07:28:46 2019
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Tue, 20 Aug 2019 15:28:46 +0200
Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences
In-Reply-To:
References: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se>
Message-ID: <66852BCA-9F3C-40CB-B58A-9578E3418851@nbis.se>

Thank you for your quick answer; you are right, I should have read the GFF3 specification more carefully. I will investigate which step I modified that introduced the problem.

Thanks again.

/Jacques

> On 20 Aug 2019, at 15:16, Carson Holt wrote:
>
> All FASTA entries must occur at the end of the file according to the GFF3 specification. If a FASTA entry is embedded in the middle, you have a corrupt file. If you are trying to merge GFF3 files, you can use the gff3_merge script. Concatenation via something like "cat", however, results in a broken file.
>
> --Carson
>
> Sent from my iPhone
>
> On Aug 20, 2019, at 2:14 AM, Jacques Dainat wrote:
>
>> Dear Carson,
>>
>> I'm using maker/3.01.02 with open MPI.
>> I realised that the option maker_gff from the maker_opts.ctl works great as long as no FASTA sequence is embedded in the GFF3 file.
>> e.g:
>> ```
>> ###
>> ##FASTA
>> >3098|quiver
>> TTTATGGGTTCAGGCGGACCCATGGCGCCGACCATATTTTGAGAGCTGGACGACTCTGTA
>> GGGTTGGGTATTGGCTGATTATTCATTCAAATCCCACGAGTAGCCTAGGAAGTGACGGTC
>> ```
>> I ended up with GFF3 files containing fasta sequences in a sequential manner (all contig1 features then the sequence of contig1, all contig2 features then the sequence of contig2, etc. I specify this because we can also encounter GFF3 files where all the sequences are gathered at the end of the file). In such a case MAKER considers only the features met before reaching the first FASTA sequence in the file. It then stops processing the file and doesn't consider the rest of it.
>>
>> I haven't seen any particular message, but my resulting annotation was obviously wrong. Indeed, most of the repeat/alignment/model data contained in the gff file wasn't passed to MAKER. Would it be possible to add a fix to continue parsing a gff file even after meeting a fasta sequence?
>>
>>
>> Best regards,
>>
>> /Jacques
>> -------------------------------------------------
>> Jacques Dainat, Ph.D.
>> NBIS (National Bioinformatics Infrastructure Sweden)
>> Genome Annotation Service
>> http://nbis.se/about/staff/jacques-dainat
>> https://github.com/NBISweden/GAAS
>> http://nbis.se
>>
>> Contact
>> Address: Uppsala University, Biomedicinska Centrum
>> Department of Medical Biochemistry Microbiology, Genomics
>> Husargatan 3, box 582
>> S-75123 Uppsala Sweden
>> Phone: +46 18 471 46 25
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jacques.dainat at nbis.se Wed Aug 21 06:59:43 2019
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Wed, 21 Aug 2019 14:59:43 +0200
Subject: [maker-devel] GeneWise in MAKER?
Message-ID:

Dear Carson,

Reading this paper: J. Armstrong, I.
T. Fiddes, M. Diekhans, and B. Paten. Whole-Genome Alignment and Comparative Annotation. Annu Rev Anim Biosci, Oct 2018.

I discovered that MAKER is running GeneWise (i.e. Table 4). Did they mix it up with Exonerate, or is it something well hidden within MAKER and its documentation?

Best regards,

Jacques
-------------------------------------------------
Jacques Dainat, Ph.D.
NBIS (National Bioinformatics Infrastructure Sweden)
Genome Annotation Service
http://nbis.se/about/staff/jacques-dainat
https://github.com/NBISweden/GAAS
http://nbis.se

Contact
Address: Uppsala University, Biomedicinska Centrum
Department of Medical Biochemistry Microbiology, Genomics
Husargatan 3, box 582
S-75123 Uppsala Sweden
Phone: +46 18 471 46 25
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ABoyher at danforthcenter.org Wed Aug 28 12:31:33 2019
From: ABoyher at danforthcenter.org (Boyher, Adam)
Date: Wed, 28 Aug 2019 18:31:33 +0000
Subject: [maker-devel] Haplotype specific annotations
Message-ID: <02119316-A412-43AB-A6A8-05037B14D972@contoso.com>

Hi

I have a phased genome (haplotype-specific assemblies) that I have annotated separately with Maker. There are a couple of things I've had a little trouble figuring out in relation to this. The first is that I have 3 sets of genes: two sets that exist in one haplotype but not the other, and a third that exists in both. I want to name these genes specifically based on what set they are in. So for instance, genes that exist in both have the same name in both assemblies, but genes that exist in only one haplotype are named specific to that haplotype. Is there a straightforward way to do this?

The second issue is that I've discovered one gene that is annotated in one phase, but not the other. However, when I blast the genomic sequence against the second haplotype, I find an exact match.
Given that I used the exact same methods to annotate both haplotypes, starting with the same set of evidence (protein, transcriptome, repeat), why might maker miss or exclude that gene?

Thanks
Adam
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Aug 30 10:20:32 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 30 Aug 2019 10:20:32 -0600
Subject: [maker-devel] Haplotype specific annotations
In-Reply-To: <02119316-A412-43AB-A6A8-05037B14D972@contoso.com>
References: <02119316-A412-43AB-A6A8-05037B14D972@contoso.com>
Message-ID: <098E19B1-BDB4-4FDA-A94B-4E30CEB5B58A@gmail.com>

It may be that there are broken splice donors/acceptors in one versus the other, which would not be seen with a BLAST search. Look at both in a browser to see what the evidence looks like in each and how the ab initio predictions compare between the two.

As for naming, you could try reciprocal best BLAST hits to see which genes match which. Unfortunately, you will have to do a lot of manual review to make sure you are not just matching paralogs together.

--Carson

> On Aug 28, 2019, at 12:31 PM, Boyher, Adam wrote:
>
> Hi
>
> I have a phased genome (haplotype-specific assemblies) that I have annotated separately with Maker. There are a couple of things I've had a little trouble figuring out in relation to this. The first is that I have 3 sets of genes: two sets that exist in one haplotype but not the other, and a third that exists in both. I want to name these genes specifically based on what set they are in. So for instance, genes that exist in both have the same name in both assemblies, but genes that exist in only one haplotype are named specific to that haplotype. Is there a straightforward way to do this?
>
> The second issue is that I've discovered one gene that is annotated in one phase, but not the other. However, when I blast the genomic sequence against the second haplotype, I find an exact match.
> Given that I used the exact same methods to annotate both haplotypes, starting with the same set of evidence (protein, transcriptome, repeat), why might maker miss or exclude that gene?
>
> Thanks
> Adam
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Aug 30 10:48:52 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 30 Aug 2019 10:48:52 -0600
Subject: [maker-devel] which files are expected after fasta_merge?
In-Reply-To:
References:
Message-ID:

If you disabled evidence for round 3 (i.e. protein= and est=), then you will get no annotations and EVM will not run. You can look at the GFF3 in a browser, and if you see that there are no protein/EST alignments, then that is likely why.

--Carson

> On Aug 15, 2019, at 2:48 PM, Brandon Pickett wrote:
>
> Good afternoon!
>
> I just finished my third round of maker. I trained snap, augustus, etc. between the rounds. I used fasta_merge and gff3_merge to extract files after each round of maker. gff3_merge performed as expected each time, but fasta_merge surprised me. I will show you which files fasta_merge generated after each round. Please note that, as many people do, I renamed my output files from the default. Accordingly, I will list all the files with a generalized prefix of "maker" and show the rest of the file name as it was generated for me. Also note that I've changed .fasta to .fa for brevity.
>
> After round #1:
> transcripts.fa
> proteins.fa
>
> After round #2:
> non_overlapping_ab_initio.proteins.fa
> non_overlapping_ab_initio.transcripts.fa
> transcripts.fa
> augustus_masked.proteins.fa
> augustus_masked.transcripts.fa
> evm.proteins.fa
> evm.transcripts.fa
> genemark.proteins.fa
> genemark.transcripts.fa
> snap_masked.proteins.fa
> snap_masked.transcripts.fa
> proteins.fa
>
> After round #3:
> non_overlapping_ab_initio.proteins.fa
> non_overlapping_ab_initio.transcripts.fa
> augustus_masked.proteins.fa
> augustus_masked.transcripts.fa
> genemark.proteins.fa
> genemark.transcripts.fa
> snap_masked.proteins.fa
> snap_masked.transcripts.fa
>
> I am unsurprised that I didn't get all these files after round #1 because I used round #1 to generate gene models from transcript evidence. I didn't expect so many files after round #2 (having only seen the output from round #1 up to that point), but it makes sense that I would get output from augustus, evidence modeler (evm), genemark, and snap since I provided them as input to this round (#2) of maker. Between rounds #2 and #3, I re-trained snap and augustus. Genemark was trained between rounds #1 and #2 without gene models from maker and thus did not require re-training. The only difference in my maker control files between rounds #2 and #3 were the paths to the snap and augustus files. In both #2 and #3, the control files had run_evm=1. I can provide my control files for each round, if needed.
>
> My question is why transcripts.fa, proteins.fa, evm.proteins.fa, and evm.transcripts.fa were not generated after round #3? I recognize that this is probably not an error, rather a lack of my understanding of when each file is and is not generated.
>
> Thank you,
> Brandon Pickett
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From debojyoti.das.und at gmail.com Thu Aug 8 12:54:01 2019
From: debojyoti.das.und at gmail.com (Debojyoti Das)
Date: Thu, 8 Aug 2019 13:54:01 -0500
Subject: [maker-devel] maker predicting only part of a gene
Message-ID:

Hi,

I am working on a non-model reptile species. I tried running maker with est2genome=1 and protein2genome=1 with the following evidence:

1. est=transcriptome.fasta (de novo assembled)
2.
protein in fasta format from two related species.

I keep getting predictions where gene models identify multi-exon genes but fail to incorporate all the exons even though they are present on the scaffolds. In the predicted gene models, some exons were correctly identified while others were missed, even though the entire gene is present on the scaffolds; we checked this by loading the annotation in IGV.

Since we are not completely confident of our transcriptome assembly, we thought of using cDNA from a closely related species:

1. altest="cDNA in fasta format from a related species"
2. protein in fasta format from two related species.

However, when I do this I get the error *"ERROR: You must provide some form of EST evidence to use est2genome as a predictor."*

Interestingly, if I switch est2genome off (setting it to zero), maker starts running.

Any suggestions on how to proceed?

Best,
Debojyoti
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From debojyoti.das at und.edu Thu Aug 8 13:48:36 2019
From: debojyoti.das at und.edu (Das, Debojyoti)
Date: Thu, 8 Aug 2019 19:48:36 +0000
Subject: [maker-devel] maker predicting only part of a gene
Message-ID:

Hi Carson,

I am working on a non-model reptile species. I tried running maker with est2genome=1 and protein2genome=1 with the following evidence:

1. est=transcriptome.fasta (de novo assembled)
2. protein in fasta format from two related species.

I keep getting predictions where gene models identify multi-exon genes but fail to incorporate all the exons even though they are present on the scaffolds. In the predicted gene models, some exons were correctly identified while others were missed, even though the entire gene is present on the scaffolds; we checked this by loading the annotation in IGV.

Since we are not completely confident of our transcriptome assembly, we thought of using cDNA from a closely related species:

1.
altest="cDNA in fasta format from a related species"
2. protein in fasta format from two related species.

However, when I do this I get the error "ERROR: You must provide some form of EST evidence to use est2genome as a predictor."

Interestingly, if I switch est2genome off (setting it to zero), maker starts running.

Any suggestions on how to proceed?

Best,
Debojyoti
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jmartin at wustl.edu Thu Aug 8 18:41:31 2019
From: jmartin at wustl.edu (Martin, John)
Date: Fri, 9 Aug 2019 00:41:31 +0000
Subject: [maker-devel] Running maker with suboptimal evidence
Message-ID:

Greetings,

I would like to annotate a worm genome of ~90Mb but the evidence I have is not all of good quality. I have 2 high quality protein sets from previously finished & curated, closely related worms. I also have a small amount of RNAseq and some old EST data neither of which give good coverage of the transcriptome. And I have a previously run Maker geneset for this worm that I believe was generated using probably all nematodes from genbank and that poor set of RNAseq. I also have a 'fair' set of predictions from running Braker2 using only the high quality protein data I mentioned. My opinion that the previous Maker geneset is of poor quality comes from comparing that geneset to my recent Braker2 annotations and that RNAseq.

I would like to try and build an improved annotation for this assembly but I'm unsure of whether I should use all this evidence, or if I would be better off not using the low coverage RNAseq, EST data and previous (questionable looking) Maker geneset. I'm looking for opinions on whether I should throw all the evidence I have into Maker or should I use only evidence that I consider of good quality?

Thanks,

John Martin

________________________________
The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature.
If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail.

From Cenny.Taslim at nationwidechildrens.org Wed Aug 14 10:08:40 2019
From: Cenny.Taslim at nationwidechildrens.org (Taslim, Cenny)
Date: Wed, 14 Aug 2019 16:08:40 +0000
Subject: [maker-devel] maker with mpi support on example still not done after two days
Message-ID:

I figured out that the example job will finish if I'm not using MPI, i.e. running the job like this is fine:

~/opt/maker.4/maker/bin/maker -f 2> maker.error

Any suggestions? Without MPI, I suspect it will take months to complete, as I have a human genome with ~3500 contigs.

From carsonhh at gmail.com Wed Aug 14 12:54:26 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Aug 2019 12:54:26 -0600
Subject: [maker-devel] Running maker with suboptimal evidence
In-Reply-To:
References:
Message-ID: <9C1C9005-8285-4A48-9F48-066D15D402A9@gmail.com>

Run it both ways, then look at it in a browser like Apollo. You can then visually see how evidence alignments compare to the models and whether they are spurious. While not practical in most situations, manual review is still the gold standard for genome annotation.

--Carson

> On Aug 8, 2019, at 6:41 PM, Martin, John wrote:
>
> Greetings,
>
> I would like to annotate a worm genome of ~90Mb but the evidence I
> have is not all of good quality. I have 2 high quality protein sets
> from previously finished & curated, closely related worms. I also have
> a small amount of RNAseq and some old EST data neither of which give
> good coverage of the transcriptome. And I have a previously run Maker
> geneset for this worm that I believe was generated using probably all
> nematodes from genbank and that poor set of RNAseq.
> I also have a
> 'fair' set of predictions from running Braker2 using only the high
> quality protein data I mentioned. My opinion that the previous Maker
> geneset is of poor quality comes from comparing that geneset to my
> recent Braker2 annotations and that RNAseq.
>
> I would like to try and build an improved annotation for this
> assembly but I'm unsure of whether I should use all this evidence, or if
> I would be better off not using the low coverage RNAseq, EST data and
> previous (questionable looking) Maker geneset. I'm looking for
> opinions on whether I should throw all the evidence I have into Maker or
> should I use only evidence that I consider of good quality?
>
>
> Thanks,
>
> John Martin
>
>
>
> ________________________________
> The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail.
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Aug 14 12:57:39 2019
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Aug 2019 12:57:39 -0600
Subject: [maker-devel] maker predicting only part of a gene
In-Reply-To:
References:
Message-ID: <31721D1F-7079-4935-BE34-A8F4786A5797@gmail.com>

Neither est2genome=1 nor protein2genome=1 predicts genes. They simply transfer Exonerate alignments that match ORFs into gene models. That is good enough to train a predictor like SNAP or Augustus, but these models should not be used as the final models.
If you review the documentation, you will see that they should be turned off once you have trained a predictor. Here is an example: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018#Training_ab_initio_Gene_Predictors -Carson > On Aug 8, 2019, at 1:48 PM, Das, Debojyoti wrote: > > Hi Carson, > > I am working on a non-model reptile species. I tried running maker with est2genome=1 and protein2genome=1 with the following evidence: > > 1. est=transcriptome.fasta (de novo assembled) > 2. protein in fasta format from two related species. > > > I keep getting predictions where gene models identify multi-exon genes but fail to incorporate all the exons. In the predicted gene models some exons were correctly identified while others were missed, even though the entire gene is present on the scaffolds; we checked this by loading the annotation in IGV. > > Since we are not completely confident of our transcriptome assembly, we thought of using cDNA from a closely related species. > 1. altest="cDNA in fasta format from a related species" > 2. protein in fasta format from two related species. > > However, when I do this I get the error > "ERROR: You must provide some form of EST evidence to use est2genome as a predictor." > > Interestingly, if I switch est2genome off (setting it to zero) maker starts running. > > Any suggestions on how to proceed? > > Best, > Debojyoti
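The two pitfalls in this thread (setting est2genome=1 with only altest= evidence, which triggers the error quoted above, and leaving est2genome/protein2genome on after a predictor has been trained) can be checked before launching MAKER. The following is a minimal Python sketch, not part of MAKER. It assumes the `key=value #comment` line layout of maker_opts.ctl and the option names est, est_gff, altest, est2genome, protein2genome, snaphmm and augustus_species. The helper names `parse_ctl` and `check_gene_model_opts` are hypothetical, and treating est_gff= as acceptable EST evidence is an assumption rather than something confirmed in this thread.

```python
from typing import Dict, List


def parse_ctl(text: str) -> Dict[str, str]:
    """Parse 'key=value #comment' lines (maker_opts.ctl layout) into a dict."""
    opts: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue  # blank, comment-only or malformed line
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def check_gene_model_opts(opts: Dict[str, str]) -> List[str]:
    """Return warnings for the est2genome/protein2genome pitfalls in this thread."""
    problems: List[str] = []
    est2genome = opts.get("est2genome", "0") == "1"
    protein2genome = opts.get("protein2genome", "0") == "1"

    # Assumption: est= or est_gff= counts as EST evidence; altest= alone does
    # not, consistent with the error Debojyoti reported.
    if est2genome and not (opts.get("est") or opts.get("est_gff")):
        problems.append("est2genome=1 but no est= evidence (altest= alone is not enough)")

    # Alignment-derived models are for training only; once a trained SNAP HMM
    # or Augustus species model is configured they should be switched off.
    trained = opts.get("snaphmm") or opts.get("augustus_species")
    if trained and (est2genome or protein2genome):
        problems.append("trained predictor configured; set est2genome=0 and protein2genome=0")
    return problems


if __name__ == "__main__":
    # Hypothetical control-file excerpt mirroring the altest-only setup above.
    ctl_text = """
est= #no EST/transcript file
altest=related_species_cdna.fasta
protein=related_proteins.fasta
est2genome=1
protein2genome=1
snaphmm=
"""
    for problem in check_gene_model_opts(parse_ctl(ctl_text)):
        print("WARNING:", problem)
```

For the excerpt in the demo, only the first warning fires: est= is empty, and no trained predictor is set yet, so the second rule stays quiet until snaphmm= or augustus_species= is filled in.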
From carsonhh at gmail.com Wed Aug 14 13:05:05 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 14 Aug 2019 13:05:05 -0600 Subject: [maker-devel] maker with mpi support on example still not done after two days In-Reply-To: <82339125479f4b278b87bf8458c2c04d@l1perdwmbx02.childrensroot.net> References: <82339125479f4b278b87bf8458c2c04d@l1perdwmbx02.childrensroot.net> Message-ID: If no additional output is being produced, then it is frozen for some reason. My first suggestion would be to reinstall MPICH and then reinstall MAKER against the MPICH you just installed. Freezing issues are usually related to MPI communication, which is handled by the MPI implementation you have installed (i.e. the one providing the mpiexec command that launches MAKER). Alternatively, it can be related to the file system. Some less common network-mounted file systems do not correctly support hardlinks, which can cause MAKER's programmatic file locks to freeze. If that is the case, you can try running with the -nolock flag. -Carson > On Aug 8, 2019, at 7:12 AM, Taslim, Cenny wrote: > > Hi Maker developers, > > Thanks for approving my subscription. > I tried running maker with mpi support on the human fasta file provided in example_01_basic. It's been running for 2 days and 21 hours. I didn't think the example require a long time to run. > I'm hoping someone can help me point out the problem. > > I'm running it with 4 processes: > ~/opt/mpich-3.3.1/bin/mpiexec -n 4 ~/opt/maker.4/maker/bin/maker -f 2> maker.error > > Maker_opts.ctl is the same as opts2.txt > > These are the log: > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input...
> Thank you in advance for your help. From carsonhh at gmail.com Wed Aug 14 13:06:41 2019 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 14 Aug 2019 13:06:41 -0600 Subject: [maker-devel] maker with mpi support on example still not done after two days In-Reply-To: References: <82339125479f4b278b87bf8458c2c04d@l1perdwmbx02.childrensroot.net> Message-ID: <54BF4DAE-272A-453E-9522-A82FAADABEF7@gmail.com> You can also try OpenMPI instead of MPICH to see if it behaves better (this requires reinstalling MAKER against the OpenMPI libraries). -Carson > On Aug 14, 2019, at 1:05 PM, Carson Holt wrote: > > If no additional output is produced, then it is frozen for some reason. My first suggestion would be to reinstall MPICH and then reinstall MAKER using the MPICH you just installed. Freezing issues are usually related to MPI communication which is handled by the communicator you have installed (i.e. the mpiexec command that launches MAKER). > > Alternatively it can be related to the file system. Some less common network mounted file systems do not correctly support hardlinks which can cause programmatic file locks to freeze. You can try running with the -nolock flag if that is the case. > > -Carson > > >> On Aug 8, 2019, at 7:12 AM, Taslim, Cenny > wrote: >> >> Hi Maker developers, >> >> Thanks for approving my subscription. >> I tried running maker with mpi support on the human fasta file provided in example_01_basic. It's been running for 2 days and 21 hours. I didn't think the example require a long time to run. >> I'm hoping someone can help me point out the problem.
>> Let me know if you need more information. Thank you in advance for your help. From pickettbd at gmail.com Thu Aug 15 14:48:46 2019 From: pickettbd at gmail.com (Brandon Pickett) Date: Thu, 15 Aug 2019 13:48:46 -0700 Subject: [maker-devel] which files are expected after fasta_merge? Message-ID: Good afternoon! I just finished my third round of maker. I trained snap, augustus, etc. between the rounds. I used fasta_merge and gff3_merge to extract files after each round of maker. gff3_merge performed as expected each time, but fasta_merge surprised me. I will show you which files fasta_merge generated after each round. Please note that, as many people do, I renamed my output files from the default. Accordingly, I will list all the files with a generalized prefix of "maker" and show the rest of the file name as it was generated for me. Also note that I've changed .fasta to .fa for brevity.
After round #1:
transcripts.fa
proteins.fa

After round #2:
non_overlapping_ab_initio.proteins.fa
non_overlapping_ab_initio.transcripts.fa
transcripts.fa
augustus_masked.proteins.fa
augustus_masked.transcripts.fa
evm.proteins.fa
evm.transcripts.fa
genemark.proteins.fa
genemark.transcripts.fa
snap_masked.proteins.fa
snap_masked.transcripts.fa
proteins.fa

After round #3:
non_overlapping_ab_initio.proteins.fa
non_overlapping_ab_initio.transcripts.fa
augustus_masked.proteins.fa
augustus_masked.transcripts.fa
genemark.proteins.fa
genemark.transcripts.fa
snap_masked.proteins.fa
snap_masked.transcripts.fa

I am unsurprised that I didn't get all these files after round #1, because I used round #1 to generate gene models from transcript evidence. I didn't expect so many files after round #2 (having only seen the output from round #1 up to that point), but it makes sense that I would get output from augustus, EVidenceModeler (evm), genemark, and snap, since I provided them as input to this round (#2) of maker. Between rounds #2 and #3, I re-trained snap and augustus. Genemark was trained between rounds #1 and #2 without gene models from maker and thus did not require re-training. The only difference in my maker control files between rounds #2 and #3 was the paths to the snap and augustus files. In both #2 and #3, the control files had run_evm=1. I can provide my control files for each round, if needed. *My question is: why were transcripts.fa, proteins.fa, evm.proteins.fa, and evm.transcripts.fa not generated after round #3?* I recognize that this is probably not an error, but rather a gap in my understanding of when each file is and is not generated. Thank you, Brandon Pickett
From jacques.dainat at nbis.se Tue Aug 20 02:14:53 2019 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Tue, 20 Aug 2019 10:14:53 +0200 Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences Message-ID: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se> Dear Carson, I'm using maker/3.01.02 with OpenMPI. I realised that the maker_gff option in maker_opts.ctl works well as long as no FASTA sequence is embedded in the GFF3 file, e.g.:
```
###
##FASTA
>3098|quiver
TTTATGGGTTCAGGCGGACCCATGGCGCCGACCATATTTTGAGAGCTGGACGACTCTGTA
GGGTTGGGTATTGGCTGATTATTCATTCAAATCCCACGAGTAGCCTAGGAAGTGACGGTC
```
I ended up with GFF3 files containing FASTA sequences in a sequential layout (all contig1 features, then the sequence of contig1, then all contig2 features, then the sequence of contig2, etc.; I mention this because one can also encounter GFF3 files where all the sequences are gathered at the end of the file). In such cases MAKER only takes into account the features encountered before the first FASTA sequence in the file; it then stops processing the file and ignores the rest of it. I didn't see any particular message, but my resulting annotation was obviously wrong: most of the repeat/alignment/model data contained in the GFF file was never passed to MAKER. Would it be possible to add a fix so that MAKER keeps parsing a GFF file even after encountering a FASTA sequence? Best regards, /Jacques ------------------------------------------------- Jacques Dainat, Ph.D. NBIS (National Bioinformatics Infrastructure Sweden) Genome Annotation Service http://nbis.se/about/staff/jacques-dainat https://github.com/NBISweden/GAAS http://nbis.se - Contact - Address: Uppsala University, Biomedicinska Centrum Department of Medical Biochemistry Microbiology, Genomics Husargatan 3, box 582 S-75123 Uppsala Sweden Phone: +46 18 471 46 25
URL: From carsonhh at gmail.com Tue Aug 20 07:16:23 2019 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 Aug 2019 07:16:23 -0600 Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences In-Reply-To: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se> References: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se> Message-ID: All FASTA entries must occur at the end of the file according to the GFF3 specification. If a FASTA entry is embedded in the middle, you have a corrupt file. If you are trying to merge GFF3 files, you can use the gff3_merge script. Concatenation via something like 'cat', however, results in a broken file. -Carson Sent from my iPhone > On Aug 20, 2019, at 2:14 AM, Jacques Dainat wrote: > > Dear Carson, > > I'm using maker/3.01.02 with Open MPI. > I realised that the maker_gff option in maker_opts.ctl works great as long as no FASTA sequence is embedded in the GFF3 file. > e.g.: > ``` > ### > ##FASTA > >3098|quiver > TTTATGGGTTCAGGCGGACCCATGGCGCCGACCATATTTTGAGAGCTGGACGACTCTGTA > GGGTTGGGTATTGGCTGATTATTCATTCAAATCCCACGAGTAGCCTAGGAAGTGACGGTC > ``` > I ended up with GFF3 files containing FASTA sequences in a sequential manner (all contig1 features, then the sequence of contig1, all contig2 features, then the sequence of contig2, etc. I specify this because some GFF3 files have all the sequences gathered at the end of the file). In such a case, MAKER only takes into account the features encountered before the first FASTA sequence in the file. It then stops processing the file and ignores the rest of it. > > I haven't seen any particular message, but my resulting annotation was obviously wrong. Indeed, most of the repeat/alignment/model data contained in the GFF file was never passed to MAKER. Would it be possible to add a fix so that MAKER continues parsing a GFF file even after encountering a FASTA sequence? > > > Best regards, > > /Jacques > ------------------------------------------------- > Jacques Dainat, Ph.D. 
> NBIS (National Bioinformatics Infrastructure Sweden) > Genome Annotation Service > http://nbis.se/about/staff/jacques-dainat > https://github.com/NBISweden/GAAS > http://nbis.se > > - Contact - > Address: Uppsala University, Biomedicinska Centrum > Department of Medical Biochemistry Microbiology, Genomics > Husargatan 3, box 582 > S-75123 Uppsala Sweden > Phone: +46 18 471 46 25 > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Aug 20 07:20:50 2019 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 Aug 2019 07:20:50 -0600 Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences In-Reply-To: References: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se> Message-ID: Here is the relevant part of the format specification: > ##FASTA This notation indicates that the annotation portion of the file is at an end and that the remainder of the file contains one or more sequences (nucleotide or protein) in FASTA format. This allows features and sequences to be bundled together. All FASTA sequences included in the file must be included together at the end of the file and may not be interspersed with the features lines. Once a ##FASTA section is encountered no other content beyond valid FASTA sequence is allowed. -Carson Sent from my iPhone > On Aug 20, 2019, at 7:16 AM, Carson Holt wrote: > > All FASTA entries must occur at the end of the file according to the GFF3 specification. If a FASTA entry is embedded in the middle, you have a corrupt file. If you are trying to merge GFF3 files, you can use the gff3_merge script. Concatenation via something like 'cat', however, results in a broken file. 
> > -Carson > > Sent from my iPhone > >> On Aug 20, 2019, at 2:14 AM, Jacques Dainat wrote: >> >> Dear Carson, >> >> I'm using maker/3.01.02 with Open MPI. >> I realised that the maker_gff option in maker_opts.ctl works great as long as no FASTA sequence is embedded in the GFF3 file. >> e.g.: >> ``` >> ### >> ##FASTA >> >3098|quiver >> TTTATGGGTTCAGGCGGACCCATGGCGCCGACCATATTTTGAGAGCTGGACGACTCTGTA >> GGGTTGGGTATTGGCTGATTATTCATTCAAATCCCACGAGTAGCCTAGGAAGTGACGGTC >> ``` >> I ended up with GFF3 files containing FASTA sequences in a sequential manner (all contig1 features, then the sequence of contig1, all contig2 features, then the sequence of contig2, etc. I specify this because some GFF3 files have all the sequences gathered at the end of the file). In such a case, MAKER only takes into account the features encountered before the first FASTA sequence in the file. It then stops processing the file and ignores the rest of it. >> >> I haven't seen any particular message, but my resulting annotation was obviously wrong. Indeed, most of the repeat/alignment/model data contained in the GFF file was never passed to MAKER. Would it be possible to add a fix so that MAKER continues parsing a GFF file even after encountering a FASTA sequence? >> >> >> Best regards, >> >> /Jacques >> ------------------------------------------------- >> Jacques Dainat, Ph.D. >> NBIS (National Bioinformatics Infrastructure Sweden) >> Genome Annotation Service >> http://nbis.se/about/staff/jacques-dainat >> https://github.com/NBISweden/GAAS >> http://nbis.se >> >> - Contact - 
>> Address: Uppsala University, Biomedicinska Centrum >> Department of Medical Biochemistry Microbiology, Genomics >> Husargatan 3, box 582 >> S-75123 Uppsala Sweden >> Phone: +46 18 471 46 25 >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacques.dainat at nbis.se Tue Aug 20 07:28:46 2019 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Tue, 20 Aug 2019 15:28:46 +0200 Subject: [maker-devel] maker_gff parameter - problem when gff contains fasta sequences In-Reply-To: References: <0EBF46CF-C0C8-4985-93D4-7BA587413DA7@nbis.se> Message-ID: <66852BCA-9F3C-40CB-B58A-9578E3418851@nbis.se> Thank you for your quick answer; you are right, I should have read the GFF3 specification more carefully. I will investigate which of the steps I modified introduced the problem. Thanks again. /Jacques > On 20 Aug 2019, at 15:16, Carson Holt wrote: > > All FASTA entries must occur at the end of the file according to the GFF3 specification. If a FASTA entry is embedded in the middle, you have a corrupt file. If you are trying to merge GFF3 files, you can use the gff3_merge script. Concatenation via something like 'cat', however, results in a broken file. > > -Carson > > Sent from my iPhone > > On Aug 20, 2019, at 2:14 AM, Jacques Dainat > wrote: > >> Dear Carson, >> >> I'm using maker/3.01.02 with Open MPI. >> I realised that the maker_gff option in maker_opts.ctl works great as long as no FASTA sequence is embedded in the GFF3 file. 
>> e.g.: >> ``` >> ### >> ##FASTA >> >3098|quiver >> TTTATGGGTTCAGGCGGACCCATGGCGCCGACCATATTTTGAGAGCTGGACGACTCTGTA >> GGGTTGGGTATTGGCTGATTATTCATTCAAATCCCACGAGTAGCCTAGGAAGTGACGGTC >> ``` >> I ended up with GFF3 files containing FASTA sequences in a sequential manner (all contig1 features, then the sequence of contig1, all contig2 features, then the sequence of contig2, etc. I specify this because some GFF3 files have all the sequences gathered at the end of the file). In such a case, MAKER only takes into account the features encountered before the first FASTA sequence in the file. It then stops processing the file and ignores the rest of it. >> >> I haven't seen any particular message, but my resulting annotation was obviously wrong. Indeed, most of the repeat/alignment/model data contained in the GFF file was never passed to MAKER. Would it be possible to add a fix so that MAKER continues parsing a GFF file even after encountering a FASTA sequence? >> >> >> Best regards, >> >> /Jacques >> ------------------------------------------------- >> Jacques Dainat, Ph.D. >> NBIS (National Bioinformatics Infrastructure Sweden) >> Genome Annotation Service >> http://nbis.se/about/staff/jacques-dainat >> https://github.com/NBISweden/GAAS >> http://nbis.se >> >> - Contact - >> Address: Uppsala University, Biomedicinska Centrum >> Department of Medical Biochemistry Microbiology, Genomics >> Husargatan 3, box 582 >> S-75123 Uppsala Sweden >> Phone: +46 18 471 46 25 >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacques.dainat at nbis.se Wed Aug 21 06:59:43 2019 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Wed, 21 Aug 2019 14:59:43 +0200 Subject: [maker-devel] GeneWise in MAKER? Message-ID: Dear Carson, Reading this paper J. Armstrong, I. 
T. Fiddes, M. Diekhans, and B. Paten. Whole-Genome Alignment and Comparative Annotation. Annu Rev Anim Biosci, Oct 2018, I discovered that MAKER runs GeneWise (see Table 4). Did they mix it up with Exonerate, or is it something well hidden within MAKER and its documentation? Best regards, Jacques ------------------------------------------------- Jacques Dainat, Ph.D. NBIS (National Bioinformatics Infrastructure Sweden) Genome Annotation Service http://nbis.se/about/staff/jacques-dainat https://github.com/NBISweden/GAAS http://nbis.se - Contact - Address: Uppsala University, Biomedicinska Centrum Department of Medical Biochemistry Microbiology, Genomics Husargatan 3, box 582 S-75123 Uppsala Sweden Phone: +46 18 471 46 25 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ABoyher at danforthcenter.org Wed Aug 28 12:31:33 2019 From: ABoyher at danforthcenter.org (Boyher, Adam) Date: Wed, 28 Aug 2019 18:31:33 +0000 Subject: [maker-devel] Haplotype specific annotations Message-ID: <02119316-A412-43AB-A6A8-05037B14D972@contoso.com> Hi, I have a phased genome (haplotype-specific assemblies) that I have annotated separately with MAKER. There are a couple of things I've had a little trouble figuring out in relation to this. The first is that I have 3 sets of genes: two sets that exist in one haplotype but not the other, and a third that exists in both. I want to name these genes based on which set they are in. For instance, genes that exist in both haplotypes should have the same name in both assemblies, but genes that exist in only one haplotype should be named specifically for that haplotype. Is there a straightforward way to do this? The second issue is that I've discovered one gene that is annotated in one phase but not the other. However, when I blast the genomic sequence against the second haplotype, I find an exact match. 
Given that I used the exact same methods to annotate both haplotypes, starting with the same set of evidence (protein, transcriptome, repeat), why might MAKER miss or exclude that gene? Thanks, Adam -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Aug 30 10:20:32 2019 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 30 Aug 2019 10:20:32 -0600 Subject: [maker-devel] Haplotype specific annotations In-Reply-To: <02119316-A412-43AB-A6A8-05037B14D972@contoso.com> References: <02119316-A412-43AB-A6A8-05037B14D972@contoso.com> Message-ID: <098E19B1-BDB4-4FDA-A94B-4E30CEB5B58A@gmail.com> It may be that there are broken splice donors/acceptors in one versus the other, which would not be seen with a BLAST search. Look at both in a browser to see what the evidence looks like in each and how the ab initio predictions compare between the two. As for naming, you could try reciprocal best BLAST hits to see which genes match which. Unfortunately, you will have to do a lot of manual review to make sure you are not just matching paralogs together. -Carson > On Aug 28, 2019, at 12:31 PM, Boyher, Adam wrote: > > Hi, > > I have a phased genome (haplotype-specific assemblies) that I have annotated separately with MAKER. There are a couple of things I've had a little trouble figuring out in relation to this. The first is that I have 3 sets of genes: two sets that exist in one haplotype but not the other, and a third that exists in both. I want to name these genes based on which set they are in. For instance, genes that exist in both haplotypes should have the same name in both assemblies, but genes that exist in only one haplotype should be named specifically for that haplotype. Is there a straightforward way to do this? > > The second issue is that I've discovered one gene that is annotated in one phase but not the other. However, when I blast the genomic sequence against the second haplotype, I find an exact match. 
Given that I used the exact same methods to annotate both haplotypes, starting with the same set of evidence (protein, transcriptome, repeat), why might MAKER miss or exclude that gene? > > Thanks, > Adam > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Aug 30 10:48:52 2019 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 30 Aug 2019 10:48:52 -0600 Subject: [maker-devel] which files are expected after fasta_merge? In-Reply-To: References: Message-ID: If you disabled evidence for round 3 (i.e. left protein= and est= empty), then you will get no annotations and EVM will not run. You can look at the GFF3 in a browser; if you see that there are no protein/EST alignments, then that is likely why. -Carson > On Aug 15, 2019, at 2:48 PM, Brandon Pickett wrote: > > Good afternoon! > > I just finished my third round of maker. I trained snap, augustus, etc. between the rounds. I used fasta_merge and gff3_merge to extract files after each round of maker. gff3_merge performed as expected each time, but fasta_merge surprised me. I will show you which files fasta_merge generated after each round. Please note that, as many people do, I renamed my output files from the default. Accordingly, I will list all the files with a generalized prefix of "maker" and show the rest of the file name as it was generated for me. Also note that I've changed .fasta to .fa for brevity. 
> > After round #1: > transcripts.fa > proteins.fa > > After round #2: > non_overlapping_ab_initio.proteins.fa > non_overlapping_ab_initio.transcripts.fa > transcripts.fa > augustus_masked.proteins.fa > augustus_masked.transcripts.fa > evm.proteins.fa > evm.transcripts.fa > genemark.proteins.fa > genemark.transcripts.fa > snap_masked.proteins.fa > snap_masked.transcripts.fa > proteins.fa > > After round #3: > non_overlapping_ab_initio.proteins.fa > non_overlapping_ab_initio.transcripts.fa > augustus_masked.proteins.fa > augustus_masked.transcripts.fa > genemark.proteins.fa > genemark.transcripts.fa > snap_masked.proteins.fa > snap_masked.transcripts.fa > > I am unsurprised that I didn't get all these files after round #1 because I used round #1 to generate gene models from transcript evidence. I didn't expect so many files after round #2 (having only seen the output from round #1 up to that point), but it makes sense that I would get output from augustus, evidence modeler (evm), genemark, and snap since I provided them as input to this round (#2) of maker. Between rounds #2 and #3, I re-trained snap and augustus. Genemark was trained between rounds #1 and #2 without gene models from maker and thus did not require re-training. The only differences in my maker control files between rounds #2 and #3 were the paths to the snap and augustus files. In both #2 and #3, the control files had run_evm=1. I can provide my control files for each round, if needed. My question is: why were transcripts.fa, proteins.fa, evm.proteins.fa, and evm.transcripts.fa not generated after round #3? I recognize that this is probably not an error, but rather a gap in my understanding of when each file is and is not generated. 
> > Thank you, > Brandon Pickett > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL:
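Carson's suggestion to look for protein/EST alignments in the round-3 GFF3 can also be checked from the command line. The following is a minimal Python sketch, not a MAKER utility, that counts evidence alignment features in a merged GFF3 file (for example, the output of gff3_merge). It assumes that MAKER records EST and protein alignments with the column-3 feature types `expressed_sequence_match` and `protein_match`. The `EVIDENCE_TYPES` tuple is an illustrative assumption, so adjust it if your output uses other types. An empty result would be consistent with evidence having been disabled for that round.

```python
from collections import Counter
from typing import Iterable

# Assumption: MAKER writes evidence alignments with these column-3 types.
EVIDENCE_TYPES = ("expressed_sequence_match", "protein_match")


def count_evidence(lines: Iterable[str]) -> Counter:
    """Count (source, type) pairs for evidence alignment features in GFF3 text.

    Parsing stops at ##FASTA because only sequence data may follow it.
    """
    counts: Counter = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, comments and directives.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Column 2 is the source (e.g. est2genome), column 3 the feature type.
        if len(cols) == 9 and cols[2] in EVIDENCE_TYPES:
            counts[(cols[1], cols[2])] += 1
    return counts
```

Running `count_evidence(open("round3.all.gff"))` (a placeholder file name) on each round's merged GFF3 and comparing the totals should show whether round 3 received any alignment evidence at all.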