From jennifer.anderson at ebc.uu.se Tue Jul 3 06:57:56 2018
From: jennifer.anderson at ebc.uu.se (Jennifer Anderson)
Date: Tue, 3 Jul 2018 13:57:56 +0200
Subject: [maker-devel] Genemark XXX.mod files
Message-ID: <902950FF-775C-46DC-987A-5666A56A6650@ebc.uu.se>

Hello,

I am working on annotations for fungal genomes, using GeneMark-ES with --fungi for gene prediction. In earlier attempts I did not use the training flag, and I did get the output gmhmm file. Now I have tried with the training flag and do not get this file. In the /run/ directory I do get the mod files ES_A.mod, ES_B.mod, and ES_C.mod, as well as ini.mod. Does one of these files work as the ES.mod file, as in "gmhmm=../train_genemark/es.mod #GeneMark HMM file" from http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/The_MAKER_control_files_explained? I can't find documentation of the GeneMark-ES output online.

Thank you.

Jenni

E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/om-uu/dataskydd-personuppgifter/

From liorglic at mail.tau.ac.il Wed Jul 4 07:32:05 2018
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Wed, 4 Jul 2018 14:32:05 +0200
Subject: [maker-devel] How sensitive is MAKER to redundant/partial transcripts?
Message-ID:

Dear MAKER users,

I am new to MAKER and would like your advice.
I am planning to annotate multiple genomes of tomato variants and wild relatives. To this end, I have been working on generating a diverse transcript data set to be used as input for MAKER (along with protein sequences and the 'official' tomato annotation).
My transcript set was generated by collecting multiple available RNA-Seq results from SRA, covering diverse variants, conditions and tissues, and assembling them into transcripts using Trinity. My goal is to have a data set as diverse and broad as possible.
Now I have ~30 fasta files of transcripts, originating from different studies. Of course, many of the transcripts are redundant and/or partial. I am exploring ways to merge the multiple data sets into a non-redundant one, while also stitching partial transcripts into longer ones based on overlaps.
However, this turns out to be not so trivial, and I am wondering whether it is really necessary in order to get a good annotation. Maybe I can just concatenate all my transcriptome assembly results, and MAKER will handle redundant and partial transcripts?
Can someone clarify how this works, and try to assess whether an annotation based on a merged data set should be superior to one that didn't undergo such a process? If someone has actual experience with such data, that would be really helpful, but any advice would be highly appreciated.

Thanks a lot and best regards,
Lior

From jason.stajich at gmail.com Thu Jul 5 13:13:57 2018
From: jason.stajich at gmail.com (Jason Stajich)
Date: Thu, 5 Jul 2018 11:13:57 -0700
Subject: [maker-devel] Genemark XXX.mod files
In-Reply-To: <902950FF-775C-46DC-987A-5666A56A6650@ebc.uu.se>
References: <902950FF-775C-46DC-987A-5666A56A6650@ebc.uu.se>
Message-ID:

The run/ES_C.mod file should be the right one, if it is there. Is it possible that the run is crashing during one of the training/retraining steps?

Jason Stajich
jason.stajich at gmail.com

From carsonhh at gmail.com Thu Jul 5 13:47:38 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 5 Jul 2018 12:47:38 -0600
Subject: [maker-devel] How sensitive is MAKER to redundant/partial transcripts?
In-Reply-To:
References:
Message-ID: <788E84AB-DB85-43AD-8FE1-C1D8A7DBD4B5@gmail.com>

MAKER will collapse redundant evidence after alignment, so the extra data will primarily just increase run time. The main issue with so many datasets would be false-positive alignments (assembled background transcription). You can look at individual contigs in Apollo, IGV, or another browser to see where spurious alignments occur and whether they are associated with a particular dataset (it's OK to throw out a noisy dataset, especially if you have additional data).

--Carson
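
Carson's suggestion above, checking whether spurious alignments come mostly from one dataset, is easier to follow if every transcript ID records which study it came from. The following is only a minimal sketch of one way to do that before concatenating the Trinity assemblies. The function names, the underscore separator, and the toy sequences are all hypothetical, and nothing in the thread says MAKER requires this step; it is a convenience for tracing evidence back to its source.

```python
import io


def tag_fasta_ids(lines, label, sep="_"):
    """Yield FASTA lines with `label` + `sep` prepended to every record ID."""
    for line in lines:
        if line.startswith(">"):
            # Header line: keep everything after '>' and put the label in front.
            yield ">" + label + sep + line[1:]
        else:
            yield line


def merge_datasets(datasets, out_handle):
    """Write every dataset to out_handle with dataset-tagged IDs.

    `datasets` maps a dataset label to an iterable of FASTA lines
    (for example an open file handle).
    """
    for label, lines in datasets.items():
        for line in tag_fasta_ids(lines, label):
            # Guarantee each line ends with a newline so records never fuse.
            out_handle.write(line if line.endswith("\n") else line + "\n")


if __name__ == "__main__":
    # Two made-up assemblies that reuse the same Trinity-style ID.
    demo = {
        "study_A": [">TRINITY_DN1_c0_g1_i1 len=9\n", "ATGGCGTAA\n"],
        "study_B": [">TRINITY_DN1_c0_g1_i1 len=6\n", "ATGTAA\n"],
    }
    merged = io.StringIO()
    merge_datasets(demo, merged)
    print(merged.getvalue(), end="")
```

With real data the dictionary values would be open file handles for the ~30 assemblies, and the output handle a single merged fasta file. A dataset that turns out to be noisy can then be recognized by its label in the browser and filtered out again.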

From carsonhh at gmail.com Thu Jul 5 13:50:36 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 5 Jul 2018 12:50:36 -0600
Subject: [maker-devel] [CAUTION: Suspicious Link] map_forward=1 not mapping reference ID's to output correctly
In-Reply-To:
References:
Message-ID: <4EE96E7F-5F5B-4988-BC9C-FC441848B768@gmail.com>

A quick overview of MAKER behavior: MAKER will keep everything in model_gff as long as you don't provide another predictor to run or a pred_gff file to use. But if you give it a predictor to run, it takes that as an indicator that you want to update models, so a model_gff entry may get replaced by another prediction that overlaps it but scores better. So, depending on the behavior you want, make sure you are using model_gff, and do or don't provide a gene predictor to run.

--Carson

> On Jun 22, 2018, at 2:04 PM, Poelchau, Monica wrote:
>
> Hi Kapeel,
>
> If you just want your community annotations to replace models in an existing gene set, we have a tool for this:
>
> https://github.com/NAL-i5K/GFF3toolkit
>
> You'd need to run gff3_QC on your annotation files first to make sure your annotations are okay, then use gff3_merge to merge your community annotations with your existing gene set (in gff3 format). If you end up trying this out: we're actively developing the GFF3toolkit, so feel free to post an issue if you notice any problems.
>
> Hth,
>
> Monica
>
> From: maker-devel on behalf of Kapeel Chougule
> Date: Friday, June 22, 2018 at 13:53
> To: "maker-devel at yandell-lab.org"
> Subject: [CAUTION: Suspicious Link][maker-devel] map_forward=1 not mapping reference ID's to output correctly
>
> Hi,
>
> I am trying to update a community annotation in the light of new evidence data, but my MAKER runs are not keeping all the genes from the community annotation.
>
> Community annotation feature count:
> 2 1 bicolor
> 239969 CDS
> 266301 exon
> 51066 five_prime_UTR
> 34129 gene
> 47121 mRNA
> 53708 three_prime_UTR
>
> MAKER gene count:
> awk '$3=="gene"{print}' maker_output.all.gff | grep "Sobic*" | wc -l
> 21105
>
> In the attached maker_opts.ctl file I set keep_preds=1 and map_forward=1, which should keep all the community gene models even if they don't have evidence support. This was explained here:
> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Updating_annotations_in_light_of_new_data
> So I am not sure why not all of the community gene models are mapped in the MAKER output.
>
> Thanks
>
> Kapeel
> --
>
> Kapeel Chougule
> Computational Scientist Developer II
> One Bungtown Road Cold Spring Harbor, NY 11724
> http://www.warelab.org/

From carsonhh at gmail.com Thu Jul 5 14:17:14 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 5 Jul 2018 13:17:14 -0600
Subject: [maker-devel] Maker Error : Thread 1 terminated abnormally..
In-Reply-To:
References:
Message-ID:

Sorry for the slow reply.
Make sure you find out what flavor of MPI you are using (MPICH, MVAPICH2, Intel MPI, or OpenMPI). MAKER does not work with MVAPICH2. It can work with Intel MPI and OpenMPI with some command-line modification. It always works with MPICH, but MPICH may not be able to scale to more than ~100 CPUs. The option "-mca btl ^openib", for example, applies only to OpenMPI. Also, if using OpenMPI, set LD_PRELOAD in accordance with the INSTALL documentation.

Also make sure you do not have multiple MPI flavors installed, with MAKER compiled against one and then run with a different one. That will cause failure shortly after starting MAKER.

Try looking further back in your STDERR for the actual cause. The "Thread 1 terminated abnormally:" message is the tail end of the failure snowball, so the actual cause is often much further back.

--Carson
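
Carson's advice above, to look further back in STDERR than the "Thread 1 terminated abnormally" line, can be partly automated. The following is only a rough sketch: the keyword list is a guess at what an underlying failure might look like, not a list of messages MAKER or any MPI implementation is known to print, and the function name and demo log lines are hypothetical.

```python
import re

# The message Carson describes as the tail end of the failure snowball.
TAIL_MARKER = re.compile(r"Thread \d+ terminated abnormally")
# Assumed keywords only; adjust them to whatever your own log contains.
SUSPECT = re.compile(r"error|fatal|died|failed|cannot|no such file", re.IGNORECASE)


def earliest_suspects(lines, limit=5):
    """Return up to `limit` (line_number, text) pairs that look like errors
    and occur before the first 'terminated abnormally' line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if TAIL_MARKER.search(line):
            break  # everything from here on is downstream of the real cause
        if SUSPECT.search(line):
            hits.append((number, line.rstrip("\n")))
            if len(hits) == limit:
                break
    return hits


if __name__ == "__main__":
    # Made-up log lines for illustration; a real run would pass open("2.err").
    demo_log = [
        "starting\n",
        "ERROR: made-up early failure\n",
        "deleted:0 hits\n",
        "Thread 1 terminated abnormally: example\n",
        "FATAL: Thread terminated, causing all processes to fail\n",
    ]
    for number, text in earliest_suspects(demo_log):
        print(number, text)
```

In André's case the input would be the 2.err file written by "2>2.err". The script only narrows down where to start reading; the surrounding lines still need to be read by eye.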

From andremmachado25 at gmail.com Wed Jul 4 06:16:08 2018
From: andremmachado25 at gmail.com (André Machado)
Date: Wed, 4 Jul 2018 12:16:08 +0100
Subject: [maker-devel] Maker Error : Thread 1 terminated abnormally..
Message-ID:

Hi,

First of all, thanks for your efforts on the MAKER pipeline. It's a tremendous help for people who work with genomes.
For the last 4 days I have been racking my brain over an error, but still have no solution.
I found this old thread: https://groups.google.com/forum/#!msg/maker-devel/X2-76BH9gvg/rU4kLJ3B6tsJ
It seems to be quite similar, but it doesn't point to a specific solution.
I have run MAKER with the test data and everything ran OK. MAKER finished the entire process without errors.
Recently I have been trying to apply my own data on an MPI cluster, but this error occurs frequently:

Thread 1 terminated abnormally: ../dna.maker.output/mpi_blastdb/dna%2Efa.mpi.1/dna%2Efa.mpi.1.0
--> rank=8, hostname=compute-0-1.local, at ../Analysis/Geno/maker/bin/maker line 1451 thread 1.
--> rank=8, hostname=compute-0-1.local
deleted:0 hits
deleted:0 hits
preparing ab-inits
deleted:0 hits
deleted:0 hits
FATAL: Thread terminated, causing all processes to fail
--> rank=8, hostname=compute-0-1.local
deleted:0 hits

Basically, I'm trying to run MAKER with dna.fa, rna.fa, prot.fa and my_custom_lib_of_repeats.fa, to produce raw gene models which will be used to train SNAP.

I have already tried several command lines and all gave me the same error. The only difference between the tests was the location of the error: sometimes it happened on compute-0-1.local, other times on compute-0-4.local or on another node.

mpiexec -n 63 --hostfile Host maker 1>1.log 2>2.err
mpiexec --hostfile Host maker 1>1.log 2>2.err
mpiexec -mca btl ^openib -n 63 --hostfile Host maker 1>1.log 2>2.err
nohup mpiexec -mca btl ^openib -n 63 --hostfile Host maker -a 1>1.log 2>2.err

The log file as well as the option files are provided below.

Many thanks in advance,

André

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2.log
Type: text/x-log
Size: 38654 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1223 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4547 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1412 bytes
Desc: not available
URL:

From liorglck at gmail.com Wed Jul 4 07:28:14 2018
From: liorglck at gmail.com (Lior Glick)
Date: Wed, 4 Jul 2018 14:28:14 +0200
Subject: [maker-devel] How sensitive is MAKER to redundant/partial transcripts?
Message-ID:

Dear MAKER users,

I am new to MAKER and would like your advice.
I am planning to annotate multiple genomes of tomato variants and wild relatives. To this end, I have been working on generating a diverse transcript data set to be used as input for MAKER (along with protein sequences and the 'official' tomato annotation). My transcript set was generated by collecting multiple available RNA-Seq results from SRA, covering diverse variants, conditions and tissues, and assembling them into transcripts using Trinity. My goal is to have a data set as diverse and broad as possible.
Now I have ~30 fasta files of transcripts, originating from different studies. Of course, many of the transcripts are redundant and/or partial. I am exploring ways to merge the multiple data sets into a non-redundant one, while also stitching partial transcripts into longer ones based on overlaps.
However, this turns out to be not so trivial, and I am wondering whether it is really necessary in order to get a good annotation. Maybe I can just concatenate all my transcriptome assembly results, and MAKER will handle redundant and partial transcripts?
Can someone clarify how this works, and try to assess whether an annotation based on a merged data set should be superior to one that didn't undergo such a process? If someone has actual experience with such data, that would be really helpful, but any advice would be highly appreciated.

Thanks a lot and best regards,
Lior

From carsonhh at gmail.com Thu Jul 12 15:05:00 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 12 Jul 2018 14:05:00 -0600
Subject: [maker-devel] Maker Error : Thread 1 terminated abnormally..
In-Reply-To:
References:
Message-ID: <5F1E5499-239E-405E-81EC-CECC755D7838@gmail.com>

Because you truncated/removed the lines before the actual error (I need to see the several hundred lines that came before "Thread 1 terminated abnormally: ../dna.maker.output/mpi_blastdb/dna%2Efa.mpi.1/dna%2Efa.mpi.1.0"), I can't give you more info. But you are getting a lot of OpenMPI complaints at the start.
You may need to reinstall OpenMPI or use MPICH instead (both will require you to reinstall MAKER, as it will need to rebuild the MPI C/Perl binding for the new installation). Also, when using OpenMPI, make sure to export LD_PRELOAD in the way outlined in the .../maker/INSTALL instructions.

--Carson

From carsonhh at gmail.com Thu Jul 12 15:38:33 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 12 Jul 2018 14:38:33 -0600
Subject: [maker-devel] How sensitive is MAKER to redundant/partial transcripts?
In-Reply-To:
References:
Message-ID:

MAKER will automatically collapse redundant evidence. The only thing you may need to worry about with too many datasets is background transcription. With more datasets you will have more spurious assemblies from background transcription (if you sequence deep enough, everything is transcribed at some level). You should also look at the results in a browser like Apollo; you may find that some datasets are noisier than others, and that it would be beneficial to drop them, especially if they are redundant. So always do a visual review of results.

--Carson

From shijunpeng at cau.edu.cn Sat Jul 14 03:04:38 2018
From: shijunpeng at cau.edu.cn (史俊鹏)
Date: Sat, 14 Jul 2018 16:04:38 +0800 (GMT+08:00)
Subject: [maker-devel] Ask for help about the collapse of Maker (version 2.31.9) when annotated with Fgenesh
In-Reply-To:
References:
Message-ID: <183e519e.83bf.16497d1fd4b.Coremail.shijunpeng@cau.edu.cn>

Dear Carson,

First of all, I must apologize that I couldn't post my question in the Google group, since I can't get access to Google in mainland China.

I am using MAKER (version 2.31.9) to annotate several foxtail millet genomes.
I combined Augustus and Fgenesh (v.3.1.1) for the de novo annotation of these genomes. The majority of contigs were annotated well with the MAKER pipeline. However, several contigs failed when annotated with Fgenesh, with the following error information:

#--------- command -------------#
Widget::fgenesh:
/NAS7/home/shijunpeng/software/maker/bin/../lib/Widget/fgenesh/fgenesh_wrap /NAS7/home/shijunpeng/software/fgenesh/fgenesh /NAS7/home/shijunpeng/software/fgenesh/Monocots /tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-4597401.Monocots.auto_annotator.fgenesh.fasta -exon_table:/tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-4597401.Monocots.auto_annotator.xdef.fgenesh > /tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-
#-------------------------------#
ERROR: FgenesH failed
--> rank=NA, hostname=bioinfor3.local
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:scaffold_1

ERROR: Chunk failed at level:6, tier_type:0
FAILED CONTIG:scaffold_1

A system core file was generated after this crash. I checked the temporary fasta file 108_0.4597215-4597401.Monocots.auto_annotator.fgenesh.fasta and it looks normal, about ~300 bp long. I also checked my original sequence file and confirmed there is no problem with it (only A, T, C, G and N). I also tried changing the pred_flank option from 200 (the original value) to 0, and the error still occurs. I ran the MAKER pipeline on a single node with 16 processors and 256 GB of RAM, so it is probably not due to MPI problems.
Below are my detailed MAKER behavior options:

#-----MAKER Behavior Options
max_dna_len=300000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=10000 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=0 #flank for extending evidence clusters sent to gene predictors
pred_stats=1 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=1 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=1 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=1 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
tries=5 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

Could you please help me to solve this error? I am looking forward to hearing from you.

Sincerely,
Junpeng

--
Junpeng Shi, PhD
State Key Lab For Agrobiotech, China Agricultural University
National Maize Improvement Center of China
Center For Life Science, NO.2, The West Street of Yuanmingyuan Park, Beijing, P.R.China
Tel: +86-13581863941

From liorglic at mail.tau.ac.il Tue Jul 24 02:45:06 2018
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Tue, 24 Jul 2018 09:45:06 +0200
Subject: [maker-devel] Annotation of a new variant within a species
Message-ID:

Hello,

I am trying to annotate multiple variants of tomato. While a good annotation of the reference genome is available, I have de novo assembled other variants of the same species and wish to annotate them. Most MAKER documentation refers to annotation of a new species, using transcripts and proteins from either the exact same sample (individual) or from "an alternate organism", so I'm not sure what to do in this case, where I am annotating various samples from the same species.

I have two questions:

1. Regarding transcript data, how should I use transcripts from other variants of the same species? Namely, should I use the est or the altest parameter? What is the actual difference in behavior?
2. Is there a way to incorporate gene models (in gff format) from the reference annotation? I expect high similarity among my assembled variants, but not identity in terms of content and coordinates, so neither pred_gff nor model_gff sounds like what I need, as far as I understand. I could also use the reference annotation and sequence to extract cDNAs and provide them as EST data. Is this the way to go? It feels like some information on introns might be lost this way.

I would highly appreciate your answers to these questions, or any other advice.
Thank you very much!
URL: From roscito at mpi-cbg.de Tue Jul 31 07:59:58 2018 From: roscito at mpi-cbg.de (Ju Roscito) Date: Tue, 31 Jul 2018 14:59:58 +0200 Subject: [maker-devel] Few alternative isoforms when alt_splice=0 Message-ID: <2C92DF72-0733-490F-A2EE-6F3724EF7099@mpi-cbg.de> Dear all, I have a question about the behaviour of alt_splice option, seems there?s not much about it on the forum. I have run a single round of MAKER (2.31.9) on a vertebrate genome, with trinity mRNA data and mapped proteins from closely-related species. I set alt_splice to 0, but still got from two to four mRNAs for ~20 out of the 19.000 predicted genes. Has someone also seen the same? Any idea why would that happen? Thanks a lot in advance. From timo.metz at googlemail.com Fri Jul 20 07:20:05 2018 From: timo.metz at googlemail.com (Timo Metz) Date: Fri, 20 Jul 2018 12:20:05 -0000 Subject: [maker-devel] MAKER chooser algorithm Message-ID: Hey, I am working on the improvement of an already existing annotation. I could find that sometimes MAKER would split or merge genes where it intuitively does not look correct when looking at the evidence. Please find two examples attached. The first track is the old annotation, the second track the new annotation, then there is RNA-seq data, proteins, repeats, snap prediction, augustus prediction. It is visible, that in both cases the evidence supports two genes, and one gene predictor in each case tends to create one gene where the other one creates two genes. I do not understand why in this case the gene is merged, if evidence and also one ab initio prediction support rather two genes. Are there any suggestions on how to solve this? best Timo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Picture1.png Type: image/png Size: 26778 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Picutre2.png
Type: image/png
Size: 24145 bytes
Desc: not available

From cganote at iu.edu Tue Jul 24 11:31:02 2018
From: cganote at iu.edu (Ganote, Carrie L)
Date: Tue, 24 Jul 2018 16:31:02 -0000
Subject: [maker-devel] Maker ignores evidence and just returns gffs with genome contigs
Message-ID:

Running maker, I don't see anything in the gff except the names of the contigs and their lengths:

##gff-version 3
SczI0sq_2092%3%3D3122 . contig 1 119548 . . . ID=SczI0sq_2092%3%3D3122;Name=SczI0sq_2092%3%3D3122
###
SczI0sq_842%3%3D1778 . contig 1 4693 . . . ID=SczI0sq_842%3B%3D1778;Name=SczI0sq_842%3%3D1778
###
...

In my opts file, I have:

#-----Genome (these are always required)
genome=/projects/Reference/genome.chr.fa #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est= #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff=/projects/Reference/Maker/EST_assembled.all.gff #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein= #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff=/projects/Reference/Maker/exonerate_withCC.gff3 #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species= #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff=/projects/Reference/Maker/augustus_output.reformated.gff #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

It ran for ~3 hours and all contigs in the log file said FINISHED. No failures. Did I set something wrong?

-Carrie

From carsonhh at gmail.com Thu Jul 5 12:47:38 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 5 Jul 2018 12:47:38 -0600
Subject: [maker-devel] How sensitive is MAKER to redundant/partial transcripts?
In-Reply-To:
References:
Message-ID: <788E84AB-DB85-43AD-8FE1-C1D8A7DBD4B5@gmail.com>

MAKER will collapse redundant evidence after alignment, so it will primarily just increase run time. The main issue with so many datasets would be false positive alignments (assembled background transcription). You can look at individual contigs in Apollo, IGV, or another browser to see where spurious alignments occur and whether they are associated with a particular dataset (it's ok to throw out a noisy dataset, especially if you have additional data).

--Carson

> On Jul 4, 2018, at 6:32 AM, Lior Glick wrote:
>
> Dear MAKER users,
>
> I am new to MAKER and would like your advice.
> I am planning to annotate multiple genomes of tomato variants and wild relatives. To this end, I have been working on generating a diverse transcripts data set to be used as input for MAKER (along with protein sequences and the 'official' tomato annotation).
> My transcripts set was generated by collecting multiple available RNA-Seq results from SRA, covering diverse variants, conditions and tissues, and assembling them into transcripts using Trinity. My goal is to have a data set as diverse and broad as possible.
> Now I have ~30 fasta files of transcripts, originating from different studies. Of course, many of the transcripts are redundant and/or partial. I am exploring ways to merge the multiple data sets into a non-redundant one, while also stitching partial transcripts into longer ones based on overlaps.
> However, this turns out to be not-so-trivial and I am wondering if this is really necessary in order to get a good annotation? Maybe I can just concatenate all my transcriptome assembly results, and MAKER will handle redundant and partial transcripts?
> Can someone clarify how this works, and try to assess if an annotation based on a merged data set should be superior to one that didn't undergo such a process? If someone has actual experience with such data, that would be really helpful, but any advice would be highly appreciated.
>
> Thanks a lot and best regards,
> Lior
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Thu Jul 5 12:50:36 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 5 Jul 2018 12:50:36 -0600
Subject: [maker-devel] [CAUTION: Suspicious Link] map_forward=1 not mapping reference ID's to output correctly
In-Reply-To:
References:
Message-ID: <4EE96E7F-5F5B-4988-BC9C-FC441848B768@gmail.com>

A quick overview of MAKER behavior: MAKER will keep everything in model_gff as long as you don't provide another predictor to run or a pred_gff file to use. But if you give it a predictor to run, it takes that as an indicator that you want to update models. So a model_gff entry may get replaced by another prediction that overlaps it but scores better. Depending on the behavior you want, make sure you are using model_gff and do or don't provide a gene predictor to run.

--Carson

> On Jun 22, 2018, at 2:04 PM, Poelchau, Monica wrote:
>
> Hi Kapeel,
>
> If you just want your community annotations to replace models in an existing gene set, we have a tool for this:
>
> https://github.com/NAL-i5K/GFF3toolkit
>
> You'd need to run gff3_QC on your annotation files first to make sure your annotations are okay, then use gff3_merge to merge your community annotations with your existing gene set (in gff3 format). If you end up trying this out - we're actively developing the GFF3toolkit, so feel free to post an issue if you notice any problems.
>
> Hth,
>
> Monica
>
> From: maker-devel on behalf of Kapeel Chougule
> Date: Friday, June 22, 2018 at 13:53
> To: "maker-devel at yandell-lab.org"
> Subject: [CAUTION: Suspicious Link][maker-devel] map_forward=1 not mapping reference ID's to output correctly
>
> Hi,
>
> I am trying to update a community annotation in the light of new evidence data, but my MAKER runs are not keeping all the genes from the community annotation.
>
> Community annotation feature count:
>       2
>       1 bicolor
>  239969 CDS
>  266301 exon
>   51066 five_prime_UTR
>   34129 gene
>   47121 mRNA
>   53708 three_prime_UTR
>
> MAKER gene count:
> awk '$3=="gene"{print}' maker_output.all.gff | grep "Sobic*" | wc -l
> 21105
>
> In the attached maker_opts.ctl file, I set keep_preds=1 and map_forward=1, which should keep all the community gene models even if they don't have evidence support.
> This was explained here:
> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Updating_annotations_in_light_of_new_data
> So I am not sure why not all of the community gene models are mapped in the MAKER output.
>
> Thanks
>
> Kapeel
> --
>
> Kapeel Chougule
> Computational Scientist Developer II
> One Bungtown Road Cold Spring Harbor, NY 11724
> http://www.warelab.org/

From carsonhh at gmail.com Thu Jul 5 13:17:14 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 5 Jul 2018 13:17:14 -0600
Subject: [maker-devel] Maker Error : Thread 1 terminated abnormally..
In-Reply-To:
References:
Message-ID:

Sorry for the slow reply.

Make sure you find out what flavor of MPI you are using (MPICH, MVAPICH2, Intel MPI, or OpenMPI). MAKER does not work with MVAPICH2. It can work with Intel MPI and OpenMPI with some command line modification. It always works with MPICH, but MPICH may not be able to scale to more than ~100 CPUs. The option "-mca btl ^openib", for example, is only for OpenMPI. Also, if using OpenMPI, set LD_PRELOAD in accordance with the INSTALL documentation.

Also make sure you do not have multiple MPI flavors installed, with MAKER compiled against one and then run with a different one. That will cause failure shortly after starting MAKER.
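One way to check whether more than one MPI installation is visible is to list every launcher found on PATH. The following is a minimal sketch, assuming the launcher is called mpiexec (as in the commands in this thread) or mpirun; `find_on_path` is a hypothetical helper, not part of MAKER or of any MPI distribution.

```python
import os


def find_on_path(name, path=None):
    """Return every executable called `name` found along a PATH-style string.

    Symlinks are resolved so that each hit shows the installation it
    really belongs to; duplicates are reported only once.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Keep regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            real = os.path.realpath(candidate)
            if real not in hits:
                hits.append(real)
    return hits


if __name__ == "__main__":
    for launcher in ("mpiexec", "mpirun"):
        print(launcher, find_on_path(launcher))
```

More than one hit for the same launcher does not prove a mismatch, but it is a hint that the MPI used to build MAKER and the one used to start it may differ.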
Try looking further back in your STDERR for the actual cause. The "Thread 1 terminated abnormally:" message is the tail end of the failure snowball, so the actual cause is often much further back.

--Carson

> On Jun 26, 2018, at 9:36 AM, André Machado wrote:
>
> Hi,
>
> First of all, thanks for your efforts on the MAKER pipeline. It is a tremendous help for people who work with genomes.
> For the last 4 days I have been racking my brain over an error, still without a solution.
> I found this old thread: https://groups.google.com/forum/#!msg/maker-devel/X2-76BH9gvg/rU4kLJ3B6tsJ
> It seems quite similar, but it doesn't point to a specific solution.
> I ran MAKER with the test data and everything ran fine; MAKER finished the entire process without errors.
> Recently I have been trying to apply my own data on an MPI cluster, but this error occurs frequently:
> Thread 1 terminated abnormally: ../dna.maker.output/mpi_blastdb/dna%2Efa.mpi.1/dna%2Efa.mpi.1.0
> --> rank=8, hostname=compute-0-1.local, at ../Analysis/Geno/maker/bin/maker line 1451 thread 1.
> --> rank=8, hostname=compute-0-1.local
> deleted:0 hits
> deleted:0 hits
> preparing ab-inits
> deleted:0 hits
> deleted:0 hits
> FATAL: Thread terminated, causing all processes to fail
> --> rank=8, hostname=compute-0-1.local
> deleted:0 hits
>
> Basically I am trying to run MAKER with dna.fa, rna.fa, prot.fa and my_custom_lib_of_repeats.fa, to produce raw gene models which will be used to train SNAP.
>
> I have already tried several command lines and all gave me the same error. The only difference between tests was the location of the error: sometimes it happened on compute-0-1.local, other times on compute-0-4.local or another node.
> mpiexec -n 63 --hostfile Host maker 1>1.log 2>2.err
> mpiexec --hostfile Host maker 1>1.log 2>2.err
> mpiexec -mca btl ^openib -n 63 --hostfile Host maker 1>1.log 2>2.err
> nohup mpiexec -mca btl ^openib -n 63 --hostfile Host maker -a 1>1.log 2>2.err
>
> The log file as well as the option files are provided below.
>
> Many thanks in advance,
>
> André
>
> [attachment: 2.log]

From andremmachado25 at gmail.com Wed Jul 4 05:16:08 2018
From: andremmachado25 at gmail.com (André Machado)
Date: Wed, 4 Jul 2018 12:16:08 +0100
Subject: [maker-devel] Maker Error : Thread 1 terminated abnormally..
Message-ID:

Hi,

First of all, thanks for your efforts on the MAKER pipeline. It is a tremendous help for people who work with genomes.
For the last 4 days I have been racking my brain over an error, still without a solution.
I found this old thread: https://groups.google.com/forum/#!msg/maker-devel/X2-76BH9gvg/rU4kLJ3B6tsJ
It seems quite similar, but it doesn't point to a specific solution.
I ran MAKER with the test data and everything ran fine; MAKER finished the entire process without errors.
Recently I have been trying to apply my own data on an MPI cluster, but this error occurs frequently:

Thread 1 terminated abnormally: ../dna.maker.output/mpi_blastdb/dna%2Efa.mpi.1/dna%2Efa.mpi.1.0
--> rank=8, hostname=compute-0-1.local, at ../Analysis/Geno/maker/bin/maker line 1451 thread 1.
--> rank=8, hostname=compute-0-1.local
deleted:0 hits
deleted:0 hits
preparing ab-inits
deleted:0 hits
deleted:0 hits
FATAL: Thread terminated, causing all processes to fail
--> rank=8, hostname=compute-0-1.local
deleted:0 hits

Basically I am trying to run MAKER with dna.fa, rna.fa, prot.fa and my_custom_lib_of_repeats.fa, to produce raw gene models which will be used to train SNAP.

I have already tried several command lines and all gave me the same error. The only difference between tests was the location of the error: sometimes it happened on compute-0-1.local, other times on compute-0-4.local or another node.

mpiexec -n 63 --hostfile Host maker 1>1.log 2>2.err
mpiexec --hostfile Host maker 1>1.log 2>2.err
mpiexec -mca btl ^openib -n 63 --hostfile Host maker 1>1.log 2>2.err
nohup mpiexec -mca btl ^openib -n 63 --hostfile Host maker -a 1>1.log 2>2.err

The log file as well as the option files are provided below.

Many thanks in advance,

André
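Following the earlier advice in this thread to look further back in STDERR, here is a minimal sketch that pulls out the lines preceding the first "terminated abnormally" message in a captured stderr file such as 2.err. The function name `lines_before_marker` and the default of 200 context lines are hypothetical choices, not part of MAKER.

```python
import sys
from collections import deque


def lines_before_marker(lines, marker="terminated abnormally", context=200):
    """Return up to `context` lines that precede the first line containing `marker`.

    Returns an empty list when the marker never appears.
    """
    window = deque(maxlen=context)  # rolling buffer of the most recent lines
    for line in lines:
        if marker in line:
            return list(window)
        window.append(line.rstrip("\n"))
    return []


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed file name): python before_marker.py 2.err
    with open(sys.argv[1], errors="replace") as handle:
        for line in lines_before_marker(handle):
            print(line)
```

Only a bounded window is held in memory, so this stays cheap even for a very long log.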
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2.log
Type: text/x-log
Size: 38654 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1223 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4547 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1412 bytes
Desc: not available

From liorglck at gmail.com Wed Jul 4 06:28:14 2018
From: liorglck at gmail.com (Lior Glick)
Date: Wed, 4 Jul 2018 14:28:14 +0200
Subject: [maker-devel] How sensitive is MAKER to redundant/partial transcripts?
Message-ID:

Dear MAKER users,

I am new to MAKER and would like your advice.
I am planning to annotate multiple genomes of tomato variants and wild relatives. To this end, I have been working on generating a diverse transcripts data set to be used as input for MAKER (along with protein sequences and the 'official' tomato annotation). My transcripts set was generated by collecting multiple available RNA-Seq results from SRA, covering diverse variants, conditions and tissues, and assembling them into transcripts using Trinity. My goal is to have a data set as diverse and broad as possible.
Now I have ~30 fasta files of transcripts, originating from different studies. Of course, many of the transcripts are redundant and/or partial. I am exploring ways to merge the multiple data sets into a non-redundant one, while also stitching partial transcripts into longer ones based on overlaps.
However, this turns out to be not-so-trivial and I am wondering if this is really necessary in order to get a good annotation?
Maybe I can just concatenate all my transcriptome assembly results, and MAKER will handle redundant and partial transcripts?
Can someone clarify how this works, and try to assess if an annotation based on a merged data set should be superior to one that didn't undergo such a process? If someone has actual experience with such data, that would be really helpful, but any advice would be highly appreciated.

Thanks a lot and best regards,
Lior

From carsonhh at gmail.com Thu Jul 12 14:05:00 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 12 Jul 2018 14:05:00 -0600
Subject: [maker-devel] Maker Error : Thread 1 terminated abnormally..
In-Reply-To:
References:
Message-ID: <5F1E5499-239E-405E-81EC-CECC755D7838@gmail.com>

Because you truncated or removed the lines before the actual error (I need to see the several hundred lines that came before "Thread 1 terminated abnormally: ../dna.maker.output/mpi_blastdb/dna%2Efa.mpi.1/dna%2Efa.mpi.1.0"), I can't give you more info. But you are getting a lot of OpenMPI complaints at the start. You may need to reinstall OpenMPI or use MPICH instead (both will require you to reinstall MAKER, as it will need to rebuild the MPI C/Perl binding for the new installation). Also, when using OpenMPI, make sure to export LD_PRELOAD in the way outlined in the .../maker/INSTALL instructions.

--Carson

> On Jul 4, 2018, at 5:16 AM, André Machado wrote:
>
> Hi,
>
> First of all, thanks for your efforts on the MAKER pipeline. It is a tremendous help for people who work with genomes.
> For the last 4 days I have been racking my brain over an error, still without a solution.
> I found this old thread: https://groups.google.com/forum/#!msg/maker-devel/X2-76BH9gvg/rU4kLJ3B6tsJ
> It seems quite similar, but it doesn't point to a specific solution.
> I ran MAKER with the test data and everything ran fine; MAKER finished the entire process without errors.
> Recently I have been trying to apply my own data on an MPI cluster, but this error occurs frequently:
> Thread 1 terminated abnormally: ../dna.maker.output/mpi_blastdb/dna%2Efa.mpi.1/dna%2Efa.mpi.1.0
> --> rank=8, hostname=compute-0-1.local, at ../Analysis/Geno/maker/bin/maker line 1451 thread 1.
> --> rank=8, hostname=compute-0-1.local
> deleted:0 hits
> deleted:0 hits
> preparing ab-inits
> deleted:0 hits
> deleted:0 hits
> FATAL: Thread terminated, causing all processes to fail
> --> rank=8, hostname=compute-0-1.local
> deleted:0 hits
>
> Basically I am trying to run MAKER with dna.fa, rna.fa, prot.fa and my_custom_lib_of_repeats.fa, to produce raw gene models which will be used to train SNAP.
>
> I have already tried several command lines and all gave me the same error. The only difference between tests was the location of the error: sometimes it happened on compute-0-1.local, other times on compute-0-4.local or another node.
> mpiexec -n 63 --hostfile Host maker 1>1.log 2>2.err
> mpiexec --hostfile Host maker 1>1.log 2>2.err
> mpiexec -mca btl ^openib -n 63 --hostfile Host maker 1>1.log 2>2.err
> nohup mpiexec -mca btl ^openib -n 63 --hostfile Host maker -a 1>1.log 2>2.err
>
> The log file as well as the option files are provided below.
>
> Many thanks in advance,
>
> André
>
> [attachment: 2.log]

From carsonhh at gmail.com Thu Jul 12 14:38:33 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 12 Jul 2018 14:38:33 -0600
Subject: [maker-devel] How sensitive is MAKER to redundant/partial transcripts?
In-Reply-To:
References:
Message-ID:

MAKER will automatically collapse redundant evidence. The only thing you may need to worry about with too many datasets is background transcription. With more datasets you will have more spurious assemblies from background transcription (if you sequence deep enough, everything is transcribed at some level).
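Since spurious alignments may trace back to particular datasets, it can help to tag each transcript with its source before concatenating the assemblies, so a noisy dataset can be recognized later in a browser. A minimal sketch, assuming plain FASTA text held in memory; the helper names, the dataset labels, the `_` separator and the example IDs are all made up for illustration and are not something MAKER requires.

```python
def tag_fasta(text, dataset):
    """Prefix every FASTA header in `text` with `dataset` so its origin stays visible."""
    tagged = []
    for line in text.splitlines():
        if line.startswith(">"):
            tagged.append(">" + dataset + "_" + line[1:])
        else:
            tagged.append(line)
    return "".join(line + "\n" for line in tagged)


def concatenate(datasets):
    """Join several {dataset_name: fasta_text} entries into one tagged FASTA string."""
    return "".join(tag_fasta(text, name) for name, text in datasets.items())


# Two made-up assemblies that happen to reuse the same transcript ID.
example = {
    "studyA": ">TRINITY_DN1_c0_g1_i1 len=6\nATGAAA\n",
    "studyB": ">TRINITY_DN1_c0_g1_i1 len=6\nATGCCC\n",
}
print(concatenate(example), end="")
```

Tagging also keeps identifiers unique when independent assemblies happen to reuse the same transcript names.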
You should also look at the results in a browser like Apollo. You may find that some datasets are noisier than others, and it would be beneficial to drop them, especially if they are redundant. So always do a visual review of the results.

--Carson

From shijunpeng at cau.edu.cn Sat Jul 14 02:04:38 2018
From: shijunpeng at cau.edu.cn (史俊鹏)
Date: Sat, 14 Jul 2018 16:04:38 +0800 (GMT+08:00)
Subject: [maker-devel] Ask for help about the collapse of Maker (version 2.31.9) when annotated with Fgenesh
In-Reply-To:
References:
Message-ID: <183e519e.83bf.16497d1fd4b.Coremail.shijunpeng@cau.edu.cn>

Dear Carson,

First of all, I must apologize that I couldn't post my question in the Google group, since I can't access Google from mainland China.

I am using MAKER (version 2.31.9) to annotate several foxtail millet genomes. I combined Augustus and Fgenesh (v.3.1.1) for the de novo annotation of these genomes. The majority of contigs were annotated well with the MAKER pipeline. However, several contigs failed when annotated with Fgenesh, with the following error output:

#--------- command -------------#
Widget::fgenesh:
/NAS7/home/shijunpeng/software/maker/bin/../lib/Widget/fgenesh/fgenesh_wrap /NAS7/home/shijunpeng/software/fgenesh/fgenesh /NAS7/home/shijunpeng/software/fgenesh/Monocots /tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-4597401.Monocots.auto_annotator.fgenesh.fasta -exon_table:/tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-4597401.Monocots.auto_annotator.xdef.fgenesh > /tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-
#-------------------------------#
ERROR: FgenesH failed
--> rank=NA, hostname=bioinfor3.local
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:scaffold_1

ERROR: Chunk failed at level:6, tier_type:0
FAILED CONTIG:scaffold_1
###############################################################################################################################################

A system core file was generated after
this crash. I checked the temporary FASTA file 108_0.4597215-4597401.Monocots.auto_annotator.fgenesh.fasta and it looks normal (~300 bp). I also checked my original sequence file and confirmed that it has no problems (only A, T, C, G and N). I also tried changing the pred_flank option from 200 (the original value) to 0, and the error still occurs. I ran the MAKER pipeline on a single node with 16 processors and 256 GB of RAM, so it is probably not an MPI problem. Below are my detailed MAKER behavior options:

#-----MAKER Behavior Options
max_dna_len=300000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=10000 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=0 #flank for extending evidence clusters sent to gene predictors
pred_stats=1 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=1 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=1 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=1 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
tries=5 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

Could you please help me solve this error? I look forward to hearing from you.

Sincerely,
Junpeng

--
Junpeng Shi, PhD
State Key Lab For Agrobiotech, China Agricultural University
National Maize Improvement Center of China
Center For Life Science, NO.2, The West Street of Yuanmingyuan Park, Beijing, P.R.China
Tel: +86-13581863941

From liorglic at mail.tau.ac.il Tue Jul 24 01:45:06 2018
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Tue, 24 Jul 2018 09:45:06 +0200
Subject: [maker-devel] Annotation of a new variant within a species
Message-ID:

Hello,

I am trying to annotate multiple variants of tomato. While a good annotation of the reference genome is available, I have de novo assembled other variants of the same species and wish to annotate them. Most MAKER documentation refers to annotation of a new species, using transcripts and proteins from either the exact same sample (individual) or from "an alternate organism", so I'm not sure what to do in this case, where I am annotating various samples from the same species. I have two questions:

1. Regarding transcript data, how should I use transcripts from other variants of the same species? Namely, should I use the est or the altest parameter? What is the actual difference in behavior?
2. Is there a way to incorporate gene models (in GFF format) from the reference annotation? I expect high similarity in my assembled variants, but not identity in terms of content and coordinates, so neither pred_gff nor model_gff sounds like what I need, as far as I understand. I could also use the reference annotation and sequence to extract cDNAs and provide them as EST data. Is this the way to go? It feels like some information on introns might be lost this way.

I would highly appreciate your answers to these questions, or any other advice.
Thank you very much!

From roscito at mpi-cbg.de Tue Jul 31 06:59:58 2018
From: roscito at mpi-cbg.de (Ju Roscito)
Date: Tue, 31 Jul 2018 14:59:58 +0200
Subject: [maker-devel] Few alternative isoforms when alt_splice=0
Message-ID: <2C92DF72-0733-490F-A2EE-6F3724EF7099@mpi-cbg.de>

Dear all,

I have a question about the behaviour of the alt_splice option; there seems to be not much about it on the forum. I ran a single round of MAKER (2.31.9) on a vertebrate genome, with Trinity mRNA data and mapped proteins from closely related species. I set alt_splice to 0, but still got two to four mRNAs for ~20 of the 19,000 predicted genes. Has anyone else seen the same? Any idea why that would happen?

Thanks a lot in advance.

From timo.metz at googlemail.com Fri Jul 20 06:20:05 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Fri, 20 Jul 2018 12:20:05 -0000
Subject: [maker-devel] MAKER chooser algorithm
Message-ID:

Hey,

I am working on improving an existing annotation. I have found that MAKER sometimes splits or merges genes where, intuitively, that does not look correct given the evidence. Please find two examples attached. The first track is the old annotation and the second track the new annotation, followed by RNA-seq data, proteins, repeats, the SNAP prediction, and the Augustus prediction. In both cases the evidence supports two genes, and in each case one gene predictor tends to create one gene where the other creates two. I do not understand why the gene is merged in this case, if the evidence and also one ab initio prediction rather support two genes. Are there any suggestions on how to solve this?

best
Timo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Picture1.png
Type: image/png
Size: 26778 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Picutre2.png
Type: image/png
Size: 24145 bytes
Desc: not available

From cganote at iu.edu Tue Jul 24 10:31:02 2018
From: cganote at iu.edu (Ganote, Carrie L)
Date: Tue, 24 Jul 2018 16:31:02 -0000
Subject: [maker-devel] Maker ignores evidence and just returns gffs with genome contigs
Message-ID:

Running MAKER, I don't see anything in the GFF except the names of the contigs and their lengths:

##gff-version 3
SczI0sq_2092%3%3D3122 . contig 1 119548 . . . ID=SczI0sq_2092%3%3D3122;Name=SczI0sq_2092%3%3D3122
###
SczI0sq_842%3%3D1778 . contig 1 4693 . . . ID=SczI0sq_842%3B%3D1778;Name=SczI0sq_842%3%3D1778
###
...

In my opts file, I have:

#-----Genome (these are always required)
genome=/projects/Reference/genome.chr.fa #genome sequence (fasta file or fasta embeded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est= #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff=/projects/Reference/Maker/EST_assembled.all.gff #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein= #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff=/projects/Reference/Maker/exonerate_withCC.gff3 #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species= #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff=/projects/Reference/Maker/augustus_output.reformated.gff #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

It ran for ~3 hours and all contigs in the log file said FINISHED. No failures. Did I set something wrong?

-Carrie

From carsonhh at gmail.com Thu Jul 5 12:47:38 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 5 Jul 2018 12:47:38 -0600
Subject: [maker-devel] How sensitive is MAKER to redundant/partial transcripts?
In-Reply-To:
References:
Message-ID: <788E84AB-DB85-43AD-8FE1-C1D8A7DBD4B5@gmail.com>

MAKER will collapse redundant evidence after alignment, so redundancy will primarily just increase run time. The main issue with so many datasets would be false-positive alignments (assembled background transcription). You can look at individual contigs in Apollo, IGV, or another browser to see where spurious alignments occur and whether they are associated overall with a particular dataset (it's OK to throw out a noisy dataset, especially if you have additional data).

--Carson

From carsonhh at gmail.com Thu Jul 5 12:50:36 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 5 Jul 2018 12:50:36 -0600
Subject: [maker-devel] [CAUTION: Suspicious Link] map_forward=1 not mapping reference ID's to output correctly
In-Reply-To:
References:
Message-ID: <4EE96E7F-5F5B-4988-BC9C-FC441848B768@gmail.com>

A quick overview of MAKER behavior. MAKER will keep everything in model_gff as long as you don't provide another predictor to run or a pred_gff file to use. But if you give it a predictor to run, it takes that as an indicator that you want to update models.
So model_gff may get replaced by another prediction that overlaps it but scores better. So, depending on the behavior you want, make sure you are using model_gff, and do or don't provide a gene predictor to run accordingly.

--Carson

> On Jun 22, 2018, at 2:04 PM, Poelchau, Monica wrote:
>
> Hi Kapeel,
>
> If you just want your community annotations to replace models in an existing gene set, we have a tool for this:
>
> https://github.com/NAL-i5K/GFF3toolkit
>
> You'd need to run gff3_QC on your annotation files first to make sure your annotations are okay, then use gff3_merge to merge your community annotations with your existing gene set (in gff3 format). If you end up trying this out - we're actively developing the GFF3toolkit, so feel free to post an issue if you notice any problems.
>
> Hth,
>
> Monica
>
> From: maker-devel on behalf of Kapeel Chougule
> Date: Friday, June 22, 2018 at 13:53
> To: "maker-devel at yandell-lab.org"
> Subject: [CAUTION: Suspicious Link][maker-devel] map_forward=1 not mapping reference ID's to output correctly
>
> Hi,
>
> I am trying to update a community annotation in the light of new evidence data, but my MAKER runs are not keeping all the genes from the community annotation.
>
> Community annotation feature count:
> 2 1 bicolor
> 239969 CDS
> 266301 exon
> 51066 five_prime_UTR
> 34129 gene
> 47121 mRNA
> 53708 three_prime_UTR
>
> MAKER gene count:
> awk '$3=="gene"{print}' maker_output.all.gff | grep "Sobic*" | wc -l
> 21105
>
> In the attached maker_opts.ctl file, I set keep_preds=1 and map_forward=1, which should keep all the community gene models even if they don't have evidence support. This was explained here:
> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Updating_annotations_in_light_of_new_data
> So I am not sure why not all of the community gene models are mapped in the MAKER output.
>
> Thanks
>
> Kapeel
> --
> Kapeel Chougule
> Computational Scientist Developer II
> One Bungtown Road Cold Spring Harbor, NY 11724
> http://www.warelab.org/

From carsonhh at gmail.com Thu Jul 5 13:17:14 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 5 Jul 2018 13:17:14 -0600
Subject: [maker-devel] Maker Error : Thread 1 terminated abnormally..
In-Reply-To:
References:
Message-ID:

Sorry for the slow reply.

Make sure you find out what flavor of MPI you are using (MPICH, MVAPICH2, Intel MPI, or OpenMPI). MAKER does not work with MVAPICH2. It can work with Intel MPI and OpenMPI with some command-line modification. And it always works with MPICH, but MPICH may not be able to scale to more than ~100 CPUs. The option "-mca btl ^openib", for example, applies only to OpenMPI. Also, if using OpenMPI, set LD_PRELOAD in accordance with the INSTALL documentation.

Also make sure you do not have multiple MPI flavors installed, with MAKER compiled against one and then run with a different one. That will cause failure shortly after starting MAKER.
Try looking further back in your STDERR for the actual cause. The "Thread 1 terminated abnormally:" message is the tail end of the failure snowball, so the actual cause is often much further back.

--Carson

From andremmachado25 at gmail.com Wed Jul 4 05:16:08 2018
From: andremmachado25 at gmail.com (André Machado)
Date: Wed, 4 Jul 2018 12:16:08 +0100
Subject: [maker-devel] Maker Error : Thread 1 terminated abnormally..
Message-ID:

Hi,

First of all, thanks for your efforts on the MAKER pipeline. It's a tremendous help for people who work with genomes.
For the last 4 days I have been racking my brain over an error, but I am still without a solution.
I found this old thread: https://groups.google.com/forum/#!msg/maker-devel/X2-76BH9gvg/rU4kLJ3B6tsJ
It seems to be quite similar, but it doesn't point to a specific solution.
I ran MAKER with the test data and everything ran OK. MAKER finished the entire process without errors.
Recently, I have been trying to apply it to my own data on an MPI cluster, but this error occurs frequently:

Thread 1 terminated abnormally: ../dna.maker.output/mpi_blastdb/dna%2Efa.mpi.1/dna%2Efa.mpi.1.0
--> rank=8, hostname=compute-0-1.local, at ../Analysis/Geno/maker/bin/maker line 1451 thread 1.
--> rank=8, hostname=compute-0-1.local
deleted:0 hits
deleted:0 hits
preparing ab-inits
deleted:0 hits
deleted:0 hits
FATAL: Thread terminated, causing all processes to fail
--> rank=8, hostname=compute-0-1.local
deleted:0 hits

Basically, I'm trying to run MAKER with dna.fa, rna.fa, prot.fa and my_custom_lib_of_repeats.fa to produce raw gene models, which will be used to train SNAP.

I have already tried several command lines and all gave me the same error. The only difference between tests was the location of the error: sometimes it happened on compute-0-1.local, other times on compute-0-4.local or another node.

mpiexec -n 63 --hostfile Host maker 1>1.log 2>2.err
mpiexec --hostfile Host maker 1>1.log 2>2.err
mpiexec -mca btl ^openib -n 63 --hostfile Host maker 1>1.log 2>2.err
nohup mpiexec -mca btl ^openib -n 63 --hostfile Host maker -a 1>1.log 2>2.err

The log file as well as the option files are provided below.

Many thanks in advance,

André
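
The advice given earlier on this thread for this error is to look further back in STDERR, because the "Thread 1 terminated abnormally" line is only the tail end of the failure. A minimal Python sketch of that search follows, assuming STDERR was redirected to a file such as 2.err as in the commands above. The function name, the keyword list, and the placeholder ERROR line in the sample are illustrative assumptions; they are not MAKER output and not part of MAKER itself.

```python
import re

# Hypothetical keyword list for "suspicious" lines; not taken from MAKER's source.
SUSPECT = re.compile(
    r"\b(error|fatal|failed|died|cannot|segmentation fault|no such file)\b",
    re.IGNORECASE,
)

def earliest_suspects(stderr_text, marker="Thread 1 terminated abnormally", limit=5):
    """Return up to `limit` (line_number, text) pairs for suspicious lines
    that occur before the first line containing `marker`."""
    lines = stderr_text.splitlines()
    # Index of the first marker line; scan the whole text if the marker is absent.
    stop = next((i for i, line in enumerate(lines) if marker in line), len(lines))
    hits = [
        (i + 1, line.strip())
        for i, line in enumerate(lines[:stop])
        if SUSPECT.search(line)
    ]
    return hits[:limit]

# Sample built from lines quoted in the message above, plus one clearly
# labelled placeholder standing in for whatever the real earlier error is.
SAMPLE = "\n".join([
    "deleted:0 hits",
    "ERROR: placeholder line standing in for the real root cause",
    "preparing ab-inits",
    "Thread 1 terminated abnormally: ../dna.maker.output/mpi_blastdb/dna%2Efa.mpi.1/dna%2Efa.mpi.1.0",
    "FATAL: Thread terminated, causing all processes to fail",
])

if __name__ == "__main__":
    for number, text in earliest_suspects(SAMPLE):
        print(f"line {number}: {text}")
```

To scan a real run, pass in the contents of the redirected file, for example earliest_suspects(open("2.err").read()). Matches are only candidates; the actual cause still has to be confirmed by reading the surrounding lines.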
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2.log
Type: text/x-log
Size: 38655 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1224 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4548 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1413 bytes
Desc: not available

From liorglck at gmail.com Wed Jul 4 06:28:14 2018
From: liorglck at gmail.com (Lior Glick)
Date: Wed, 4 Jul 2018 14:28:14 +0200
Subject: [maker-devel] How sensitive is MAKER to redundant/partial transcripts?
Message-ID:

Dear MAKER users,

I am new to MAKER and would like your advice.
I am planning to annotate multiple genomes of tomato variants and wild relatives. To this end, I have been working on generating a diverse transcripts data set to be used as input for MAKER (along with protein sequences and the 'official' tomato annotation). My transcripts set was generated by collecting multiple available RNA-Seq results from SRA, covering diverse variants, conditions and tissues, and assembling them into transcripts using Trinity. My goal is to have a data set as diverse and broad as possible.
Now I have ~30 fasta files of transcripts, originating from different studies. Of course, many of the transcripts are redundant and/or partial. I am exploring ways to merge the multiple data sets into a non-redundant one, while also stitching partial transcripts into longer ones based on overlaps.
However, this turns out to be not so trivial, and I am wondering if this is really necessary in order to get a good annotation. Maybe I can just concatenate all my transcriptome assembly results, and MAKER will handle redundant and partial transcripts?
Can someone clarify how this works, and try to assess if an annotation based on a merged data set should be superior to one that didn't undergo such a process? If someone has actual experience with such data, that would be really helpful, but any advice would be highly appreciated.

Thanks a lot and best regards,
Lior

From carsonhh at gmail.com Thu Jul 12 14:05:00 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 12 Jul 2018 14:05:00 -0600
Subject: [maker-devel] Maker Error : Thread 1 terminated abnormally..
In-Reply-To:
References:
Message-ID: <5F1E5499-239E-405E-81EC-CECC755D7838@gmail.com>

Because you truncated/removed the lines before the actual error (I need to see the several hundred lines that came before "Thread 1 terminated abnormally: ../dna.maker.output/mpi_blastdb/dna%2Efa.mpi.1/dna%2Efa.mpi.1.0"), I can't give you more info. But you are getting a lot of OpenMPI complaints at the start. You may need to reinstall OpenMPI or use MPICH instead (both will require you to reinstall MAKER, as it will need to rebuild the MPI C/Perl binding for the new installation). Also, when using OpenMPI, make sure to export LD_PRELOAD in the way outlined in the .../maker/INSTALL instructions.

--Carson

> On Jul 4, 2018, at 5:16 AM, André Machado wrote:
>
> Hi,
>
> First of all, thanks for your efforts on the MAKER pipeline. It's a tremendous help for people who work with genomes.
> For the last 4 days I have been stuck on an error, still without a solution.
> I found this old thread: https://groups.google.com/forum/#!msg/maker-devel/X2-76BH9gvg/rU4kLJ3B6tsJ
> It seems to be quite similar, but it doesn't point to a specific solution.
> I ran MAKER with the test data and everything ran OK; MAKER finished the entire process without errors.
> Recently I have been trying to run my own data on an MPI cluster, but this error occurs frequently:
> Thread 1 terminated abnormally: ../dna.maker.output/mpi_blastdb/dna%2Efa.mpi.1/dna%2Efa.mpi.1.0
> --> rank=8, hostname=compute-0-1.local, at ../Analysis/Geno/maker/bin/maker line 1451 thread 1.
> --> rank=8, hostname=compute-0-1.local
> deleted:0 hits
> deleted:0 hits
> preparing ab-inits
> deleted:0 hits
> deleted:0 hits
> FATAL: Thread terminated, causing all processes to fail
> --> rank=8, hostname=compute-0-1.local
> deleted:0 hits
>
> Basically, I'm trying to run MAKER with dna.fa, rna.fa, prot.fa and my_custom_lib_of_repeats.fa to produce raw gene models which will be used to train SNAP.
>
> I have already tried several command lines and all gave me the same error. The only change between tests was the location of the error: sometimes it happened on compute-0-1.local, other times on compute-0-4.local or on another node.
> mpiexec -n 63 --hostfile Host maker 1>1.log 2>2.err
>
> mpiexec --hostfile Host maker 1>1.log 2>2.err
> mpiexec -mca btl ^openib -n 63 --hostfile Host maker 1>1.log 2>2.err
> nohup mpiexec -mca btl ^openib -n 63 --hostfile Host maker -a 1>1.log 2>2.err
>
> The log file as well as the option files are provided below.
>
> Many thanks in advance,
>
> André
>
> <2.log>

From carsonhh at gmail.com Thu Jul 12 14:38:33 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 12 Jul 2018 14:38:33 -0600
Subject: [maker-devel] How sensitive is MAKER to redundant/partial transcripts?
In-Reply-To:
References:
Message-ID:

MAKER will automatically collapse redundant evidence. The only thing you may need to worry about with too many datasets is background transcription. With more datasets you will have more spurious assemblies from background transcription (if you sequence deep enough, everything is transcribed at some level).
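Since redundant evidence is collapsed automatically, plain concatenation of the assemblies is one workable route. What follows is only a minimal sketch, not a MAKER utility: the helper names tag_fasta and merge_assemblies, the assemblies directory and the .fasta suffix are all assumptions made for illustration. Each sequence ID is prefixed with the name of the file it came from, so that a spurious alignment can later be traced back to its dataset.

```shell
#!/bin/sh
# tag_fasta LABEL FILE: print FILE with "LABEL_" prepended to every sequence ID.
# (Hypothetical helper for illustration; not part of MAKER.)
tag_fasta() {
    awk -v tag="$1" '
        /^>/ { print ">" tag "_" substr($0, 2); next }  # header line: add the label
             { print }                                   # sequence line: unchanged
    ' "$2"
}

# merge_assemblies DIR: concatenate every DIR/*.fasta to stdout,
# labelling each record with the base name of its source file.
merge_assemblies() {
    for f in "$1"/*.fasta; do
        [ -e "$f" ] || continue    # no matching files: skip the literal pattern
        tag_fasta "$(basename "$f" .fasta)" "$f"
    done
}

# Usage (directory and output names are assumptions):
#   merge_assemblies assemblies > all_transcripts.fasta
```

The merged file could then be supplied through the est= option in maker_opts.ctl.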
You should also look at the results in a browser like Apollo. You may find that some datasets are noisier than others, and it would be beneficial to drop them, especially if they are redundant. So always do a visual review of results.

--Carson

> On Jul 4, 2018, at 6:28 AM, Lior Glick wrote:
>
> Dear MAKER users,
>
> I am new to MAKER and would like your advice.
> I am planning to annotate multiple genomes of tomato variants and wild relatives. To this end, I have been working on generating a diverse transcripts data set to be used as input for MAKER (along with protein sequences and the 'official' tomato annotation). My transcripts set was generated by collecting multiple available RNA-Seq results from SRA, covering diverse variants, conditions and tissues, and assembling them into transcripts using Trinity. My goal is to have a data set as diverse and broad as possible.
> Now I have ~30 fasta files of transcripts, originating from different studies. Of course, many of the transcripts are redundant and/or partial. I am exploring ways to merge the multiple data sets into a non-redundant one, while also stitching partial transcripts into longer ones based on overlaps.
> However, this turns out to be not so trivial, and I am wondering if this is really necessary in order to get a good annotation. Maybe I can just concatenate all my transcriptome assembly results, and MAKER will handle redundant and partial transcripts?
> Can someone clarify how this works, and try to assess if an annotation based on a merged data set should be superior to one that didn't undergo such a process? If someone has actual experience with such data, that would be really helpful, but any advice would be highly appreciated.
>
> Thanks a lot and best regards,
> Lior

From shijunpeng at cau.edu.cn Sat Jul 14 02:04:38 2018
From: shijunpeng at cau.edu.cn (史俊鹏)
Date: Sat, 14 Jul 2018 16:04:38 +0800 (GMT+08:00)
Subject: [maker-devel] Ask for help about the collapse of Maker (version 2.31.9) when annotated with Fgenesh
In-Reply-To:
References:
Message-ID: <183e519e.83bf.16497d1fd4b.Coremail.shijunpeng@cau.edu.cn>

Dear Carson,

First of all, I must apologize that I couldn't post my question in the Google group, since I can't get access to Google in mainland China.

I am using MAKER (version 2.31.9) to annotate several foxtail millet genomes. I combined Augustus and Fgenesh (v.3.1.1) for the de novo annotation of these genomes. The majority of contigs were annotated well by the MAKER pipeline. However, several contigs failed when annotated with Fgenesh, with the following error information:

#--------- command -------------#
Widget::fgenesh:
/NAS7/home/shijunpeng/software/maker/bin/../lib/Widget/fgenesh/fgenesh_wrap /NAS7/home/shijunpeng/software/fgenesh/fgenesh /NAS7/home/shijunpeng/software/fgenesh/Monocots /tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-4597401.Monocots.auto_annotator.fgenesh.fasta -exon_table:/tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-4597401.Monocots.auto_annotator.xdef.fgenesh > /tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-
#-------------------------------#
ERROR: FgenesH failed
--> rank=NA, hostname=bioinfor3.local
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:scaffold_1
ERROR: Chunk failed at level:6, tier_type:0
FAILED CONTIG:scaffold_1
################################################################################

A system core file was generated after this crash. I checked the temporary fasta file 108_0.4597215-4597401.Monocots.auto_annotator.fgenesh.fasta and it looks normal (about 300 bp). I also checked my original sequence file and found no problem (only A, T, C, G and N). I also tried changing the pred_flank option from 200 (original) to 0, and the error still occurs. I ran the MAKER pipeline on a single node with 16 processors and 256 GB of RAM, so it is probably not due to MPI problems. Below are my detailed MAKER behavior options:

#-----MAKER Behavior Options
max_dna_len=300000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=10000 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=0 #flank for extending evidence clusters sent to gene predictors
pred_stats=1 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=1 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=1 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=1 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
tries=5 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

Could you please help me to solve this error? I am looking forward to hearing from you.

Sincerely,
Junpeng

--
Junpeng Shi, PhD
State Key Lab For Agrobiotech, China Agricultural University
National Maize Improvement Center of China
Center For Life Science, NO.2, The West Street of Yuanmingyuan Park, Beijing, P.R.China
Tel: +86-13581863941

From liorglic at mail.tau.ac.il Tue Jul 24 01:45:06 2018
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Tue, 24 Jul 2018 09:45:06 +0200
Subject: [maker-devel] Annotation of a new variant within a species
Message-ID:

Hello,

I am trying to annotate multiple variants of tomato. While a good annotation of the reference genome is available, I have de novo assembled other variants of the same species and wish to annotate them. Most MAKER documentation refers to annotation of a new species, using transcripts and proteins either from the exact same sample (individual) or from "an alternate organism", so I'm not sure what to do in this case, where I am annotating various samples from the same species. I have two questions:

1. Regarding transcript data, how should I use transcripts from other variants of the same species? Namely, should I use the est or the altest parameter? What is the actual difference in behavior?
2. Is there a way to incorporate gene models (in gff format) from the reference annotation? I expect high similarity in my assembled variants, but not identity in terms of content and coordinates, so neither pred_gff nor model_gff sounds like what I need, as far as I understand. I could also use the reference annotation and sequence to extract cDNAs and provide them as EST data. Is this the way to go? It feels like some information on introns might be lost this way.

I would highly appreciate your answers to these questions or any other advice.
Thank you very much!

From roscito at mpi-cbg.de Tue Jul 31 06:59:58 2018
From: roscito at mpi-cbg.de (Ju Roscito)
Date: Tue, 31 Jul 2018 14:59:58 +0200
Subject: [maker-devel] Few alternative isoforms when alt_splice=0
Message-ID: <2C92DF72-0733-490F-A2EE-6F3724EF7099@mpi-cbg.de>

Dear all,

I have a question about the behaviour of the alt_splice option; it seems there's not much about it on the forum.
I have run a single round of MAKER (2.31.9) on a vertebrate genome, with Trinity mRNA data and mapped proteins from closely related species. I set alt_splice to 0, but still got two to four mRNAs for ~20 of the 19,000 predicted genes.
Has anyone else seen the same? Any idea why that would happen?

Thanks a lot in advance.

From timo.metz at googlemail.com Fri Jul 20 06:20:05 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Fri, 20 Jul 2018 12:20:05 -0000
Subject: [maker-devel] MAKER chooser algorithm
Message-ID:

Hey,

I am working on the improvement of an already existing annotation. I found that MAKER sometimes splits or merges genes where this intuitively does not look correct given the evidence. Please find two examples attached. The first track is the old annotation, the second track is the new annotation, followed by RNA-seq data, proteins, repeats, the SNAP prediction and the Augustus prediction.
In both cases the evidence visibly supports two genes, and in each case one gene predictor tends to create one gene where the other creates two. I do not understand why the gene is merged in this case, if the evidence and also one ab initio prediction rather support two genes.
Are there any suggestions on how to solve this?

best
Timo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Picture1.png
Type: image/png
Size: 26778 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Picutre2.png
Type: image/png
Size: 24145 bytes
Desc: not available

From cganote at iu.edu Tue Jul 24 10:31:02 2018
From: cganote at iu.edu (Ganote, Carrie L)
Date: Tue, 24 Jul 2018 16:31:02 -0000
Subject: [maker-devel] Maker ignores evidence and just returns gffs with genome contigs
Message-ID:

Running MAKER, I don't see anything in the gff except the names of the contigs and their lengths:

##gff-version 3
SczI0sq_2092%3%3D3122 . contig 1 119548 . . . ID=SczI0sq_2092%3%3D3122;Name=SczI0sq_2092%3%3D3122
###
SczI0sq_842%3%3D1778 . contig 1 4693 . . . ID=SczI0sq_842%3B%3D1778;Name=SczI0sq_842%3%3D1778
###
...

In my opts file, I have:

#-----Genome (these are always required)
genome=/projects/Reference/genome.chr.fa #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est= #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff=/projects/Reference/Maker/EST_assembled.all.gff #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein= #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff=/projects/Reference/Maker/exonerate_withCC.gff3 #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species= #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff=/projects/Reference/Maker/augustus_output.reformated.gff #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

It ran for ~3 hours and all contigs in the log file said FINISHED. No failures. Did I set something wrong?

-Carrie

From carsonhh at gmail.com Thu Jul 5 12:47:38 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 5 Jul 2018 12:47:38 -0600
Subject: [maker-devel] How sensitive is MAKER to redundant/partial transcripts?
In-Reply-To:
References:
Message-ID: <788E84AB-DB85-43AD-8FE1-C1D8A7DBD4B5@gmail.com>

MAKER will collapse redundant evidence after alignment, so extra data will primarily just increase run time. The main issue with so many datasets would be false positive alignments (assembled background transcription). You can look at individual contigs in Apollo, IGV, or another browser to see where spurious alignments occur and whether they are mostly associated with a particular dataset (it's OK to throw out a noisy dataset, especially if you have additional data).

--Carson

> On Jul 4, 2018, at 6:32 AM, Lior Glick wrote:
>
> Dear MAKER users,
>
> I am new to MAKER and would like your advice.
> I am planning to annotate multiple genomes of tomato variants and wild relatives. To this end, I have been working on generating a diverse transcripts data set to be used as input for MAKER (along with protein sequences and the 'official' tomato annotation).
My transcripts set was generated by collecting multiple available RNA-Seq results from SRA, covering diverse variants, conditions and tissues, and assembling them into transcripts using Trinity. My goal is to have a data set as diverse and broad as possible. > Now I have ~30 fasta files of transcripts, originating from different studies. Of course, many of the transcripts are redundant and/or partial. I am exploring ways to merge the multiple data sets into a non-redundant one, while also stitching partial transcripts into longer ones based on overlaps. > However, this turns out to be not-so-trivial and I am wandering if this is really necessary in order to get a good annotation? Maybe I can just concatenate all my transcriptome assembly results, and MAKER will handle redundant and partial transcripts? > Can someone clarify how this works, and try to assess if an annotation based on a merged data set should be superior to one that didn't undergo such a process? If someone has actual experience with such data, that would be really helpful, but any advice would be highly appreciated. > > Thanks a lot and best regards, > Lior > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Jul 5 12:50:36 2018 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 5 Jul 2018 12:50:36 -0600 Subject: [maker-devel] [CAUTION: Suspicious Link] map_forward=1 not mapping reference ID's to output correctly In-Reply-To: References: Message-ID: <4EE96E7F-5F5B-4988-BC9C-FC441848B768@gmail.com> A quick overview of MAKER behavior. MAKER will keep everything in model_gff as long as you don?t provide another predictor to run or pred_gff file to use. But if you give it a predictor to run, it takes that as an indicator that you want to update models. 
So model_gff may get replaced by another prediction that overlaps it but scores better. So depending on the behavior you want, make sure you are using model_gff and do or don?t provide a gene predictor to run. ?Carson > On Jun 22, 2018, at 2:04 PM, Poelchau, Monica wrote: > > Hi Kapeel, > > If you just want your community annotations to replace models in an existing gene set, we have a tool for this: > > https://github.com/NAL-i5K/GFF3toolkit > > You?d need to run gff3_QC on your annotation files first to make sure your annotations are okay, then use gff3_merge to merge your community annotations with your existing gene set (in gff3 format). If you end up trying this out - we?re actively developing the GFF3toolkit, so feel free to post an issue if you notice any problems. > > Hth, > > Monica > > From: maker-devel > on behalf of Kapeel Chougule > > Date: Friday, June 22, 2018 at 13:53 > To: "maker-devel at yandell-lab.org " > > Subject: [CAUTION: Suspicious Link][maker-devel] map_forward=1 not mapping reference ID's to output correctly > > PROCEED WITH CAUTION: This message triggered warnings of potentially malicious web content. Evaluate this email by considering whether you are expecting the message, along with inspection for suspicious links. > > Questions: Spam.Abuse at wdc.usda.gov > > Hi, > > I am trying to update community annotation in the light of new evidence data but my MAKER runs are not keeping all the genes from the community annotation. > > > Community annotation feature count: 2 1 bicolor 239969 CDS 266301 exon 51066 five_prime_UTR 34129 gene 47121 mRNA 53708 three_prime_UTR > MAKER gene count-> > awk '$3=="gene"{print}' maker_output.all.gff | grep "Sobic*" | wc -l 21105 > > In the maker_opts.ctl file attached, I did make keep_preds=1 and map_forward=1 which keep all the community gene models even if they dont have evidence support. 
This was explained here:
> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Updating_annotations_in_light_of_new_data
> So I am not sure why we don't have all the community gene models mapped in the MAKER output.
>
> Thanks
>
> Kapeel
> --
>
> Kapeel Chougule
> Computational Scientist Developer II
> One Bungtown Road Cold Spring Harbor, NY 11724
> http://www.warelab.org/
>
> This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Thu Jul 5 13:17:14 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 5 Jul 2018 13:17:14 -0600
Subject: [maker-devel] Maker Error : Thread 1 terminated abnormally..
In-Reply-To:
References:
Message-ID:

Sorry for the slow reply. Make sure you find out what flavor of MPI you are using (MPICH, MVAPICH2, Intel MPI, or OpenMPI). MAKER does not work with MVAPICH2. It can work with Intel MPI and OpenMPI with some command line modification. And it always works with MPICH, but MPICH may not be able to scale to more than ~100 CPUs. The option "-mca btl ^openib", for example, is only for OpenMPI. Also if using OpenMPI, set LD_PRELOAD in accordance with the INSTALL documentation. Also make sure you do not have multiple MPI flavors installed, so that you compiled MAKER with one and are running it with a different one. That will cause failure shortly after starting MAKER.
Try looking further back in your STDERR for the actual cause. The "Thread 1 terminated abnormally:" message is the tail end of the failure snowball, so the actual cause is often much further back.

--Carson

> On Jun 26, 2018, at 9:36 AM, André Machado wrote:
>
> Hi,
>
> First of all thanks for your efforts in Maker pipeline. It's a tremendous help for the people that work with genomes.
> In the last 4 days I have broken my head with an error, but I am still without a solution.
> I found this old thread: https://groups.google.com/forum/#!msg/maker-devel/X2-76BH9gvg/rU4kLJ3B6tsJ
> It seems to be quite similar, but it doesn't point to a specific solution.
> I have run maker with the test data and it all ran OK. Maker finalized the entire process without errors.
> Recently, I'm trying to apply my own data on an MPI cluster, but this error frequently occurs:
> Thread 1 terminated abnormally: ../dna.maker.output/mpi_blastdb/dna%2Efa.mpi.1/dna%2Efa.mpi.1.0
> --> rank=8, hostname=compute-0-1.local, at ../Analysis/Geno/maker/bin/maker line 1451 thread 1.
> --> rank=8, hostname=compute-0-1.local
> deleted:0 hits
> deleted:0 hits
> preparing ab-inits
> deleted:0 hits
> deleted:0 hits
> FATAL: Thread terminated, causing all processes to fail
> --> rank=8, hostname=compute-0-1.local
> deleted:0 hits
>
> Basically I'm trying to run maker with dna.fa, rna.fa, prot.fa and my_custom_lib_of_repeats.fa, to produce raw gene models which will be used to train SNAP.
>
> I already used several command lines and all gave me the same error. The only change between the different tests was the location of the error: sometimes it happened in compute-0-1.local, other times in compute-0-4.local or in another one.
> mpiexec -n 63 --hostfile Host maker 1>1.log 2>2.err
>
> mpiexec --hostfile Host maker 1>1.log 2>2.err
> mpiexec -mca btl ^openib -n 63 --hostfile Host maker 1>1.log 2>2.err
> nohup mpiexec -mca btl ^openib -n 63 --hostfile Host maker -a 1>1.log 2>2.err
>
> The log file as well as the option files are provided below.
>
> Many thanks in advance,
>
> André
>
> <2.log>_______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From andremmachado25 at gmail.com Wed Jul 4 05:16:08 2018
From: andremmachado25 at gmail.com (André Machado)
Date: Wed, 4 Jul 2018 12:16:08 +0100
Subject: [maker-devel] Maker Error : Thread 1 terminated abnormally..
Message-ID:

Hi,

First of all thanks for your efforts in Maker pipeline. It's a tremendous help for the people that work with genomes.
In the last 4 days I have broken my head with an error, but I am still without a solution.
I found this old thread: https://groups.google.com/forum/#!msg/maker-devel/X2-76BH9gvg/rU4kLJ3B6tsJ
It seems to be quite similar, but it doesn't point to a specific solution.
I have run maker with the test data and it all ran OK. Maker finalized the entire process without errors.
Recently, I'm trying to apply my own data on an MPI cluster, but this error frequently occurs:

Thread 1 terminated abnormally: ../dna.maker.output/mpi_blastdb/dna%2Efa.mpi.1/dna%2Efa.mpi.1.0
--> rank=8, hostname=compute-0-1.local, at ../Analysis/Geno/maker/bin/maker line 1451 thread 1.
--> rank=8, hostname=compute-0-1.local
deleted:0 hits
deleted:0 hits
preparing ab-inits
deleted:0 hits
deleted:0 hits
FATAL: Thread terminated, causing all processes to fail
--> rank=8, hostname=compute-0-1.local
deleted:0 hits

Basically I'm trying to run maker with dna.fa, rna.fa, prot.fa and my_custom_lib_of_repeats.fa, to produce raw gene models which will be used to train SNAP.

I already used several command lines and all gave me the same error. The only change between the different tests was the location of the error: sometimes it happened in compute-0-1.local, other times in compute-0-4.local or in another one.

mpiexec -n 63 --hostfile Host maker 1>1.log 2>2.err
mpiexec --hostfile Host maker 1>1.log 2>2.err
mpiexec -mca btl ^openib -n 63 --hostfile Host maker 1>1.log 2>2.err
nohup mpiexec -mca btl ^openib -n 63 --hostfile Host maker -a 1>1.log 2>2.err

The log file as well as the option files are provided below.

Many thanks in advance,

André
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2.log
Type: text/x-log
Size: 38655 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1224 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4548 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1413 bytes
Desc: not available
URL:

From liorglck at gmail.com Wed Jul 4 06:28:14 2018
From: liorglck at gmail.com (Lior Glick)
Date: Wed, 4 Jul 2018 14:28:14 +0200
Subject: [maker-devel] How sensitive is MAKER to redundant/partial transcripts?
Message-ID:

Dear MAKER users,

I am new to MAKER and would like your advice.
I am planning to annotate multiple genomes of tomato variants and wild relatives. To this end, I have been working on generating a diverse transcripts data set to be used as input for MAKER (along with protein sequences and the 'official' tomato annotation). My transcripts set was generated by collecting multiple available RNA-Seq results from SRA, covering diverse variants, conditions and tissues, and assembling them into transcripts using Trinity. My goal is to have a data set as diverse and broad as possible.
Now I have ~30 fasta files of transcripts, originating from different studies. Of course, many of the transcripts are redundant and/or partial. I am exploring ways to merge the multiple data sets into a non-redundant one, while also stitching partial transcripts into longer ones based on overlaps.
However, this turns out to be not-so-trivial and I am wondering if this is really necessary in order to get a good annotation?
Maybe I can just concatenate all my transcriptome assembly results, and MAKER will handle redundant and partial transcripts?
Can someone clarify how this works, and try to assess if an annotation based on a merged data set should be superior to one that didn't undergo such a process? If someone has actual experience with such data, that would be really helpful, but any advice would be highly appreciated.

Thanks a lot and best regards,
Lior

From carsonhh at gmail.com Thu Jul 12 14:05:00 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 12 Jul 2018 14:05:00 -0600
Subject: [maker-devel] Maker Error : Thread 1 terminated abnormally..
In-Reply-To:
References:
Message-ID: <5F1E5499-239E-405E-81EC-CECC755D7838@gmail.com>

Because you truncated or removed the lines before the actual error (I need to see the several hundred lines that came before "Thread 1 terminated abnormally: ../dna.maker.output/mpi_blastdb/dna%2Efa.mpi.1/dna%2Efa.mpi.1.0"), I can't give you more info. But you are getting a lot of OpenMPI complaints at the start. You may need to reinstall OpenMPI or use MPICH instead (both will require you to reinstall maker, as it will need to rebuild the MPI C/Perl binding for the new installation). Also when using OpenMPI, make sure to export LD_PRELOAD in the way outlined in the .../maker/INSTALL instructions.

--Carson

> On Jul 4, 2018, at 5:16 AM, André Machado wrote:
>
> Hi,
>
> First of all thanks for your efforts in Maker pipeline. It's a tremendous help for the people that work with genomes.
> In the last 4 days I have broken my head with an error, but I am still without a solution.
> I found this old thread: https://groups.google.com/forum/#!msg/maker-devel/X2-76BH9gvg/rU4kLJ3B6tsJ
> It seems to be quite similar, but it doesn't point to a specific solution.
> I have run maker with the test data and it all ran OK. Maker finalized the entire process without errors.
> Recently, I'm trying to apply my own data on an MPI cluster, but this error frequently occurs:
> Thread 1 terminated abnormally: ../dna.maker.output/mpi_blastdb/dna%2Efa.mpi.1/dna%2Efa.mpi.1.0
> --> rank=8, hostname=compute-0-1.local, at ../Analysis/Geno/maker/bin/maker line 1451 thread 1.
> --> rank=8, hostname=compute-0-1.local
> deleted:0 hits
> deleted:0 hits
> preparing ab-inits
> deleted:0 hits
> deleted:0 hits
> FATAL: Thread terminated, causing all processes to fail
> --> rank=8, hostname=compute-0-1.local
> deleted:0 hits
>
> Basically I'm trying to run maker with dna.fa, rna.fa, prot.fa and my_custom_lib_of_repeats.fa, to produce raw gene models which will be used to train SNAP.
>
> I already used several command lines and all gave me the same error. The only change between the different tests was the location of the error: sometimes it happened in compute-0-1.local, other times in compute-0-4.local or in another one.
> mpiexec -n 63 --hostfile Host maker 1>1.log 2>2.err
>
> mpiexec --hostfile Host maker 1>1.log 2>2.err
> mpiexec -mca btl ^openib -n 63 --hostfile Host maker 1>1.log 2>2.err
> nohup mpiexec -mca btl ^openib -n 63 --hostfile Host maker -a 1>1.log 2>2.err
>
> The log file as well as the option files are provided below.
>
> Many thanks in advance,
>
> André
>
> <2.log>_______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Thu Jul 12 14:38:33 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 12 Jul 2018 14:38:33 -0600
Subject: [maker-devel] How sensitive is MAKER to redundant/partial transcripts?
In-Reply-To:
References:
Message-ID:

MAKER will automatically collapse redundant evidence. The only thing you may need to worry about with too many datasets is background transcription. With more datasets you will have more spurious assemblies from background transcription (if you sequence deep enough everything is transcribed at some level).
You should also look at the results in a browser like Apollo. You may find that some datasets are noisier than others and that it would be beneficial to drop them, especially if they are redundant. So always do a visual review of results.

--Carson

> On Jul 4, 2018, at 6:28 AM, Lior Glick wrote:
>
> Dear MAKER users,
>
> I am new to MAKER and would like your advice.
> I am planning to annotate multiple genomes of tomato variants and wild relatives. To this end, I have been working on generating a diverse transcripts data set to be used as input for MAKER (along with protein sequences and the 'official' tomato annotation). My transcripts set was generated by collecting multiple available RNA-Seq results from SRA, covering diverse variants, conditions and tissues, and assembling them into transcripts using Trinity. My goal is to have a data set as diverse and broad as possible.
> Now I have ~30 fasta files of transcripts, originating from different studies. Of course, many of the transcripts are redundant and/or partial. I am exploring ways to merge the multiple data sets into a non-redundant one, while also stitching partial transcripts into longer ones based on overlaps.
> However, this turns out to be not-so-trivial and I am wondering if this is really necessary in order to get a good annotation? Maybe I can just concatenate all my transcriptome assembly results, and MAKER will handle redundant and partial transcripts?
> Can someone clarify how this works, and try to assess if an annotation based on a merged data set should be superior to one that didn't undergo such a process? If someone has actual experience with such data, that would be really helpful, but any advice would be highly appreciated.
>
> Thanks a lot and best regards,
> Lior
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From shijunpeng at cau.edu.cn Sat Jul 14 02:04:38 2018
From: shijunpeng at cau.edu.cn (史俊鹏)
Date: Sat, 14 Jul 2018 16:04:38 +0800 (GMT+08:00)
Subject: [maker-devel] Ask for help about the collapse of Maker (version 2.31.9) when annotated with Fgenesh
In-Reply-To:
References:
Message-ID: <183e519e.83bf.16497d1fd4b.Coremail.shijunpeng@cau.edu.cn>

Dear Carson,

First of all, I must apologize that I couldn't post my question in the Google group, since I can't get access to Google in mainland China.

I am using Maker (version 2.31.9) to annotate several foxtail millet genomes. I combined Augustus and Fgenesh (v.3.1.1) for the de novo annotation of these genomes. The majority of contigs were annotated well with the maker pipeline. However, several contigs failed when annotated with Fgenesh, with the following error information:

#--------- command -------------#
Widget::fgenesh:
/NAS7/home/shijunpeng/software/maker/bin/../lib/Widget/fgenesh/fgenesh_wrap /NAS7/home/shijunpeng/software/fgenesh/fgenesh /NAS7/home/shijunpeng/software/fgenesh/Monocots /tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-4597401.Monocots.auto_annotator.fgenesh.fasta -exon_table:/tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-4597401.Monocots.auto_annotator.xdef.fgenesh > /tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-
#-------------------------------#
ERROR: FgenesH failed
--> rank=NA, hostname=bioinfor3.local
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:scaffold_1

ERROR: Chunk failed at level:6, tier_type:0
FAILED CONTIG:scaffold_1
###############################################################################################################################################

A system core file was generated after this collapse. I checked the temporary fasta file 108_0.4597215-4597401.Monocots.auto_annotator.fgenesh.fasta and it looks normal, about ~300 bp. I also checked my original sequence file and confirmed there is no problem (only A, T, C, G and N). I also tried changing the pred_flank option from 200 (original) to 0, and the error still exists. I ran the Maker pipeline on a single node with 16 processors and 256 GB of RAM, so it is probably not due to MPI problems.

Below are my detailed maker behavior options:

#-----MAKER Behavior Options
max_dna_len=300000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=10000 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=0 #flank for extending evidence clusters sent to gene predictors
pred_stats=1 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=1 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=1 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=1 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
tries=5 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

Could you please help me to solve this error? I am looking forward to hearing from you.

Sincerely,
Junpeng

--
Junpeng Shi, PhD
State Key Lab For Agrobiotech, China Agricultural University
National Maize Improvement Center of China
Center For Life Science, NO.2, The West Street of Yuanmingyuan Park, Beijing, P.R.China
Tel: +86-13581863941

From liorglic at mail.tau.ac.il Tue Jul 24 01:45:06 2018
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Tue, 24 Jul 2018 09:45:06 +0200
Subject: [maker-devel] Annotation of a new variant within a species
Message-ID:

Hello,

I am trying to annotate multiple variants of tomato. While a good annotation of the reference genome is available, I have de novo assembled other variants of the same species and wish to annotate them. Most MAKER documentation refers to annotation of a new species, while using transcripts and proteins from either the exact same sample (individual) or from "an alternate organism", so I'm not sure what to do in this case, where I am annotating various samples from the same species.

I have two questions:
1. Regarding transcripts data, how should I use transcripts from other variants of the same species? Namely, should I use the est or the altest parameter? What is the actual difference in behavior?
2. Is there a way to incorporate gene models (in GFF format) from the reference annotation? I expect high similarity in my assembled variants, but not identity in terms of content and coordinates, so neither pred_gff nor model_gff sound like what I need, as far as I understand. I could also use the reference annotation and sequence to extract cDNA and provide them as EST data. Is this the way to go? It feels like some information on introns might be lost this way.

I would highly appreciate your answers to these questions or any other advice.
Thank you very much!
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From roscito at mpi-cbg.de Tue Jul 31 06:59:58 2018
From: roscito at mpi-cbg.de (Ju Roscito)
Date: Tue, 31 Jul 2018 14:59:58 +0200
Subject: [maker-devel] Few alternative isoforms when alt_splice=0
Message-ID: <2C92DF72-0733-490F-A2EE-6F3724EF7099@mpi-cbg.de>

Dear all,

I have a question about the behaviour of the alt_splice option; there seems to be not much about it on the forum.
I have run a single round of MAKER (2.31.9) on a vertebrate genome, with Trinity mRNA data and mapped proteins from closely related species. I set alt_splice to 0, but still got from two to four mRNAs for ~20 out of the 19,000 predicted genes.
Has someone also seen the same? Any idea why that would happen?

Thanks a lot in advance.

From timo.metz at googlemail.com Fri Jul 20 06:20:05 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Fri, 20 Jul 2018 12:20:05 -0000
Subject: [maker-devel] MAKER chooser algorithm
Message-ID:

Hey,

I am working on the improvement of an already existing annotation. I found that MAKER sometimes splits or merges genes where, looking at the evidence, it intuitively does not look correct. Please find two examples attached. The first track is the old annotation, the second track the new annotation; then there are RNA-seq data, proteins, repeats, the SNAP prediction and the Augustus prediction. It is visible that in both cases the evidence supports two genes, and in each case one gene predictor tends to create one gene where the other one creates two genes. I do not understand why the gene is merged in this case, if the evidence and also one ab initio prediction rather support two genes. Are there any suggestions on how to solve this?

best
Timo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Picture1.png
Type: image/png
Size: 26778 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Picutre2.png
Type: image/png
Size: 24145 bytes
Desc: not available
URL:

From cganote at iu.edu Tue Jul 24 10:31:02 2018
From: cganote at iu.edu (Ganote, Carrie L)
Date: Tue, 24 Jul 2018 16:31:02 -0000
Subject: [maker-devel] Maker ignores evidence and just returns gffs with genome contigs
Message-ID:

Running maker, I don't see anything in the gff except the names of the contigs and their lengths:

##gff-version 3
SczI0sq_2092%3%3D3122 . contig 1 119548 . . . ID=SczI0sq_2092%3%3D3122;Name=SczI0sq_2092%3%3D3122
###
SczI0sq_842%3%3D1778 . contig 1 4693 . . . ID=SczI0sq_842%3B%3D1778;Name=SczI0sq_842%3%3D1778
###
...

In my opts file, I have:

#-----Genome (these are always required)
genome=/projects/Reference/genome.chr.fa #genome sequence (fasta file or fasta embeded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est= #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff=/projects/Reference/Maker/EST_assembled.all.gff #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein= #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff=/projects/Reference/Maker/exonerate_withCC.gff3 #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org= #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species= #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff=/projects/Reference/Maker/augustus_output.reformated.gff #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

It ran for ~3 hours and all contigs in the log file said FINISHED. No failures. Did I set something wrong?

-Carrie
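One check that may be worth making for a symptom like the one above, offered as an assumption and not as a diagnosis confirmed anywhere in this thread: the sequence IDs in column 1 of the external GFF3 files (est_gff, protein_gff, pred_gff) have to refer to the same sequences as the headers of the genome fasta, and the contig names shown in the output contain percent-escaped characters. The sketch below compares the two sets of names after percent-decoding the GFF3 column. The function names are hypothetical helpers written for this illustration; they are not part of MAKER.

```python
from urllib.parse import unquote


def fasta_ids(lines):
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff3_seqids(lines):
    """Collect percent-decoded column-1 seqids from GFF3 lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # directives, comments and blank lines
        ids.add(unquote(line.split("\t", 1)[0]))
    return ids


def unmatched_seqids(fasta_lines, gff_lines):
    """Return GFF3 seqids that have no matching FASTA header, sorted."""
    return sorted(gff3_seqids(gff_lines) - fasta_ids(fasta_lines))
```

A possible way to use it is to open the genome fasta and one evidence file, pass both file handles to `unmatched_seqids`, and print the result. An empty list would suggest the names agree, in which case the cause lies elsewhere.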