From carsonhh at gmail.com Wed Sep 1 10:25:23 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 1 Sep 2021 10:25:23 -0600
Subject: [maker-devel] Fatal error in PMPI_Send
In-Reply-To: <7bad655de17b4a1ab537e1b624c1da06@unil.ch>
References: <7bad655de17b4a1ab537e1b624c1da06@unil.ch>
Message-ID: <96237564-C1AC-4BB8-B332-F93560999165@gmail.com>

Set --debugmpi as a command line option to maker. It will produce a lot of extra output that could help, so you will want to capture the output to a file. Example:

maker --debugmpi &> all_output

That might help track down exactly where it happens. From the error below I gather that there is too much data to transfer over MPI. The size of the data message makes the count variable overflow its maximum positive value, so it wraps around to a negative value. It makes me think something is odd with your input data, e.g. you are trying to align raw mRNA-seq reads as evidence instead of assembled reads, you set max_dna_len in the control files to a value that is way too large, or you have an odd contig and evidence dataset resulting in alignment depth in the tens of thousands for a single locus. If that's the case, set all of the depth_blast parameters in maker_bopts (20 is a good number).

--Carson

> On Aug 30, 2021, at 2:45 AM, Patrick Tran Van wrote:
>
> Hi,
>
> I have this strange error that kills my job and that seems to occur randomly:
>
> running est2genome search.
> #--------- command -------------#
> Widget::exonerate::est2genome:
> exonerate -q /tmp/maker_JZ_DUt/12/MSTRG%2E13631%2E1.for.154952409-154955259.12.fasta -t /tmp/maker_JZ_DUt/12/Tps_LRv5b_scf2.154952409-154955259.12.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 100000 --showcigar --percent 20 > /tmp/maker_JZ_DUt/12/Tps_LRv5b_scf2.154952409-154955259.MSTRG%2E13631%2E1.e.exonerate
> #-------------------------------#
> Fatal error in PMPI_Send: Invalid count, error stack:
> PMPI_Send(159): MPI_Send(buf=0x7f526282e010, count=-1326766804, MPI_CHAR, dest=22, tag=9999, MPI_COMM_WORLD) failed
> PMPI_Send(99).: Negative count, value is -1326766804
>
> Do you know what could be the problem?
>
> Best,
>
> Patrick Tran Van
>
> Bioinformatician: Lab Chapuisat & Schwander
> Department of Ecology and Evolution
> University of Lausanne
> Lausanne - Switzerland
> Office 3206
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

From kmkocot at ua.edu Fri Sep 10 09:19:14 2021
From: kmkocot at ua.edu (Kevin Kocot)
Date: Fri, 10 Sep 2021 15:19:14 +0000
Subject: [maker-devel] Troubleshooting Maker failure
In-Reply-To:
References:
Message-ID:

Hi Carson and all,

I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I'm not using MPI).
The run.log files just end with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or whether my evidence .gff3 files are incorrectly formatted?

I've uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/

Any guidance on what might be the issue would be greatly appreciated.

Thanks!
Kevin

Kevin M. Kocot
he/him/his
Associate Professor & Curator of Invertebrates
Department of Biological Sciences & Alabama Museum of Natural History
The University of Alabama
307 Mary Harmon Bryant Hall
Box 870344
Tuscaloosa, AL 35487
phone 205-348-4052 | fax 205-348-4039
kmkocot at ua.edu | www.kocotlab.com
https://uasystem.zoom.us/j/3755490727

From kmkocot at ua.edu Fri Sep 3 16:09:57 2021
From: kmkocot at ua.edu (Kevin Kocot)
Date: Fri, 3 Sep 2021 22:09:57 +0000
Subject: [maker-devel] Maker failing but not sure what dependency is the culprit
Message-ID:

Hello!

I ran Maker on a chromosome-level mollusc genome using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. I can't figure out where the problem lies, though. The run.log files just end with "DIED RANK 0" and "DIED COUNT 3." Attached is a sample output folder (from this location: /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_datastore/0A/5A) as well as my config files and master datastore index log.

Any guidance on what might be the issue would be greatly appreciated.

Thanks!
Kevin

[Scrubbed attachments: PGA_assembly_shortened_headers.fasta_master_datastore_index.log (474332 bytes), maker_opts.log (5207 bytes), maker_exe.log (1610 bytes), maker_evm.log (893 bytes), maker_bopts.log (1479 bytes), 5A.zip (48184 bytes)]

From stuckerta at gmail.com Mon Sep 6 14:50:13 2021
From: stuckerta at gmail.com (Adam Stuckert)
Date: Mon, 6 Sep 2021 14:50:13 -0600
Subject: [maker-devel] Issues including Repeat Masker gff in Maker runs
Message-ID:

Hi,

I have been working on this problem for a while now, but can't seem to come up with a solution despite extensive searching. When I include a RepeatMasker gff, the run always fails with messages like this:

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: P_RNA_scaffold_115
Length: 2076116
#---------------------------------------------------------------------

setting up GFF3 output and fasta chunks
doing repeat masking

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Did not specify a Hit End or Hit Begin
STACK: Error::throw
STACK: Bio::Root::Root::throw /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Root/Root.pm:447
STACK: Bio::Search::HSP::GenericHSP::_subject_seq_feature /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Search/HSP/GenericHSP.pm:1603
STACK: Bio::Search::HSP::GenericHSP::hit /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Search/HSP/GenericHSP.pm:987
STACK: repeat_mask_seq::separate_types /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/repeat_mask_seq.pm:307
STACK: repeat_mask_seq::mask_chunk /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/repeat_mask_seq.pm:191
STACK: Process::MpiChunk::_go /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:762
STACK: Process::MpiChunk::run /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:340
STACK: Process::MpiChunk::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:356
STACK: Process::MpiTiers::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiTiers.pm:287
STACK: Process::MpiTiers::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiTiers.pm:287
STACK: /mnt/lustre/macmaneslab/macmanes/test/maker/bin/maker:679
-----------------------------------------------------------
--> rank=NA, hostname=node143.rcchpc
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:P_RNA_scaffold_54

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:P_RNA_scaffold_54

I have tried a number of ways to modify the output of RepeatMasker to fit with
Maker's expectation, and to me, the gff looks fine (see attached, which is the first 500 lines of a repeatmasker.gff I've used. Note it says "similarity" for hit type, but I've also changed the gff to only have "match_part" and had the same issue). I have had this issue across multiple clusters.

Any suggestions?

Thanks,
Adam

[Scrubbed attachment: rm.500lines.gff (55243 bytes)]

From carsonhh at gmail.com Tue Sep 14 20:51:37 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Sep 2021 20:51:37 -0600
Subject: [maker-devel] Troubleshooting Maker failure
In-Reply-To:
References:
Message-ID: <7CC4EAAC-85A5-4696-8919-D5243F997942@gmail.com>

What I really need is the captured STDERR from the failed run.

--Carson

From carsonhh at gmail.com Tue Sep 14 21:20:17 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 Sep 2021 21:20:17 -0600
Subject: [maker-devel] Issues including Repeat Masker gff in Maker runs
In-Reply-To:
References:
Message-ID: <94399EE4-8EB1-49DA-997C-3A8B4D210D46@gmail.com>

A couple of things. First, let MAKER run RepeatMasker itself. Don't provide the RepeatMasker GFF as input to MAKER. The file you sent is GFF version 2, for example, which is not compatible with GFF version 3. Second, use the latest version of MAKER2 or MAKER3. There is an issue with RepeatMasker sometimes producing start/end coordinates that are 0 or even negative numbers. The current releases of MAKER2/3 know how to find and fix features with invalid coordinates.

--Carson

From kmkocot at ua.edu Mon Sep 20 04:55:59 2021
From: kmkocot at ua.edu (Kevin Kocot)
Date: Mon, 20 Sep 2021 10:55:59 +0000
Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure
Message-ID:

Thanks Carson! I've uploaded that file here:
http://genomes.ua.edu/Kocot/2021-09-10_Maker/round1_run_maker_try2.log

From carsonhh at gmail.com Mon Sep 20 07:06:29 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 20 Sep 2021 07:06:29 -0600
Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure
In-Reply-To:
References:
Message-ID: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com>

Hi Kevin,

The files are already being skipped because of previous failures. Can you increase the try count (-t on the command line) to something like 6 and send me the STDERR after it generates a new failure?

--Carson

From kmkocot at ua.edu Wed Sep 22 08:06:38 2021
From: kmkocot at ua.edu (Kevin Kocot)
Date: Wed, 22 Sep 2021 14:06:38 +0000
Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure
In-Reply-To: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com>
References: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com>
Message-ID:

Thanks Carson,

I think I see the problem now. Here's what I'm getting:

-----
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_datastore

To access files for individual sequences use the datastore index:
/home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log

--Next Contig--

Processing run.log file...
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: PGA_scaffold0
Length: 141759199
Tries: 5!!
#---------------------------------------------------------------------

setting up GFF3 output and fasta chunks
prepare section files
Gathering GFF3 input into hits - chunk:0
ERROR: Non-unique top level ID for match.19561.56
While this is technically legal in GFF3, it usually
indicates a poorly fomatted GFF3 file (perhaps you
tried to merge two GFF3 files without accounting for
unique IDs). MAKER will not handle these correctly.

--> rank=NA, hostname=wirenia
ERROR: Failed while prepare section files
ERROR: Chunk failed at level:12, tier_type:3
FAILED CONTIG:PGA_scaffold0

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:PGA_scaffold0

examining contents of the fasta file and run log
-----

It looks like Maker doesn't like the format of the exonerate gff3 I am using. I tried 'fixing' it with agat_convert_sp_gxf2gxf.pl, which seemed to work on my Braker output, but that just produced an empty gff3 file for both my exonerate and PASA gff3 files. Any advice on how to prepare exonerate or PASA gff3 files for Maker?

Thanks!
Kevin

From carsonhh at gmail.com Wed Sep 22 10:21:29 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 22 Sep 2021 10:21:29 -0600
Subject: [maker-devel] [EXTERNAL] Troubleshooting Maker failure
In-Reply-To:
References: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com>
Message-ID: <92CB6C3D-688F-4EFC-B763-BD51CF455FFA@gmail.com>

I'd have to see the GFF, but in general you should organize sequence alignments as match/match_part features.

Here is an example from the GFF3 format specification:

ctg123 . cDNA_match 1200 9000 . . . ID=cDNA00001
ctg123 . match_part 1200 3200 2.2e-30 + . ID=match00002;Parent=cDNA00001;Target=mjm1123.5 5 506;Gap=M301 D1499 M201
ctg123 . match_part 7000 9000 7.4e-32 - . ID=match00003;Parent=cDNA00001;Target=mjm1123.3 1 502;Gap=M101 D1499 M401

Also make sure you are not inadvertently using GFF2 or GTF. They are not backwards compatible with GFF3.
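
As a rough pre-flight check for the two problems raised in this thread (the "Non-unique top level ID" error and GFF2/GTF files passed in where GFF3 is expected), something along the lines of the Python sketch below could be run on an evidence file before handing it to MAKER. This is only an illustration and is not part of MAKER or AGAT: the function names parse_attributes and check_gff3 are made up, and it assumes that a feature counts as "top level" when its ninth column has an ID but no Parent. It also assumes the file uses real tab-separated columns, as GFF3 requires.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_gff3(lines):
    """Return (duplicated top-level IDs, line numbers that do not look like GFF3)."""
    top_level = Counter()
    suspicious = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and directives
        columns = line.split("\t")
        if len(columns) != 9:
            suspicious.append(number)
            continue
        attrs = parse_attributes(columns[8])
        if not attrs:
            # GFF2/GTF write column 9 as 'key "value"' with no '=' sign
            suspicious.append(number)
            continue
        # Assumption: "top level" means an ID with no Parent attribute
        if "ID" in attrs and "Parent" not in attrs:
            top_level[attrs["ID"]] += 1
    duplicates = sorted(i for i, count in top_level.items() if count > 1)
    return duplicates, suspicious
```

For example, check_gff3(open("evidence.gff3")) (the file name is hypothetical) returns the top-level IDs that occur more than once and the line numbers whose ninth column has no key=value pairs. Duplicates are what you would expect to see after concatenating two GFF3 files without renaming their IDs, which is the situation the MAKER error message describes.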
?Carson > On Sep 22, 2021, at 8:06 AM, Kevin Kocot wrote: > > Thanks Carson, > > I think I see the problem now. Here's what I'm getting: > > ----- > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input... > A data structure will be created for you at: > /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_datastore > > To access files for individual sequences use the datastore index: > /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_master_datastore_index.log > > STATUS: Now running MAKER... > examining contents of the fasta file and run log > > > > --Next Contig-- > > Processing run.log file... > #--------------------------------------------------------------------- > Now retrying the contig!! > SeqID: PGA_scaffold0 > Length: 141759199 > Tries: 5!! > #--------------------------------------------------------------------- > > > setting up GFF3 output and fasta chunks > prepare section files > Gathering GFF3 input into hits - chunk:0 > ERROR: Non-unique top level ID for match.19561.56 > While this is technically legal in GFF3, it usually > indicates a poorly fomatted GFF3 file (perhaps you > tried to merge two GFF3 files without accounting for > unique IDs). MAKER will not handle these correctly. > > --> rank=NA, hostname=wirenia > ERROR: Failed while prepare section files > ERROR: Chunk failed at level:12, tier_type:3 > FAILED CONTIG:PGA_scaffold0 > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:PGA_scaffold0 > > examining contents of the fasta file and run log > ----- > > It looks like maker doesn't like the format of the exonerate gff3 I am using. 
I tried 'fixing' it with agat_convert_sp_gxf2gxf.pl, which seemed to work on my Braker output, but that just produced an empty gff3 file for both my exonerate and PASA gff3 files. Any advice on how to prepare exonerate or PASA gff3 files for Maker? > > Thanks! > Kevin > From: Carson Holt > > Sent: Monday, September 20, 2021 8:06 AM > To: Kevin Kocot > > Cc: maker-devel at yandell-lab.org > > Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure > > Hi Kevin, > > The files are already being skipped because of previous failures. Can you increase the try count (-t on the command line) to something like 6, and send me the STDERR after it generates a new failure. > > ?Carson > > Sent from my iPhone > >> On Sep 20, 2021, at 4:56 AM, Kevin Kocot > wrote: >> >> ? >> Thanks Carson! I've uploaded that file here: >> http://genomes.ua.edu/Kocot/2021-09-10_Maker/round1_run_maker_try2.log >> >> >> From: Carson Holt >> Sent: Tuesday, September 14, 2021 9:51 PM >> To: Kevin Kocot >> Cc: maker-devel at yandell-lab.org >> Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure >> >> What I really need is the captured STDERR from the failed run. >> >> ?Carson >> >>> On Sep 10, 2021, at 9:19 AM, Kevin Kocot > wrote: >>> >>> Hi Carson and all, >>> >>> I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I?m not using MPI). The run.log files just ends with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted? 
>>> >>> I?ve uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/ >>> >>> Any guidance on what might be the issue would be greatly appreciated. >>> >>> Thanks! >>> Kevin >>> >>> Kevin M. Kocot >>> he/him/his >>> Associate Professor & Curator of Invertebrates >>> Department of Biological Sciences & Alabama Museum of Natural History >>> The University of Alabama >>> 307 Mary Harmon Bryant Hall >>> Box 870344 >>> Tuscaloosa, AL 35487 >>> phone 205-348-4052 | fax 205-348-4039 >>> kmkocot at ua.edu | www.kocotlab.com >>> https://uasystem.zoom.us/j/3755490727 >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at yandell-lab.org >>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1376 bytes Desc: not available URL: From jacques.dainat at nbis.se Wed Sep 22 13:09:52 2021 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Wed, 22 Sep 2021 21:09:52 +0200 Subject: [maker-devel] [EXTERNAL] Troubleshooting Maker failure In-Reply-To: <92CB6C3D-688F-4EFC-B763-BD51CF455FFA@gmail.com> References: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com> <92CB6C3D-688F-4EFC-B763-BD51CF455FFA@gmail.com> Message-ID: <9FBD8DD4-660C-4A6D-90A2-9DF1B948F236@nbis.se> Hi Kevin, About using AGAT (agat_convert_sp_gxf2gxf.pl), two reasons to get empty output files. I) The feature types (3rd column) are not yet handled by AGAT. You can inform AGAT how to deal with it See https://agat.readthedocs.io/en/latest/troubleshooting.html#agat-throws-features-out-because-the-feature-type-is-not-yet-taken-into-account II) The features are thrown by AGAT because child feature are missing (e.g. 
gene feature expect at least one transcript linked to it). See https://agat.readthedocs.io/en/latest/troubleshooting.html#agat-throws-features-out-because-child-features-are-not-provided I invite you to open an issue in the AGAT GitHub repository. Once the file is parsed correctly you can use the script agat_sp_alignment_output_style.pl to turn level1 feature types (e.g. gene) and level2 feature types (e.g. mRNA) into match and match_part features respectively as it can be preferred by MAKER. Best regards, Jacques Dainat, Ph.D. > On 22 Sep 2021, at 18:21, Carson Holt wrote: > > I?d have to see the GFF, but in general you should organize sequence alignments as match/match_part features. > > Here is n example from the GFF3 format specification: > > ctg123 . cDNA_match 1200 9000 . . . ID=cDNA00001 > ctg123 . match_part 1200 3200 2.2e-30 + . ID=match00002;Parent=cDNA00001;Target=mjm1123.5 5 506;Gap=M301 D1499 M201 > ctg123 . match_part 7000 9000 7.4e-32 - . ID=match00003;Parent=cDNA00001;Target=mjm1123.3 1 502;Gap=M101 D1499 M401 > > Also make sure you are not inadvertently using GFF2 or GTF. They are not backwards compatible with GFF3. > > ?Carson > > >> On Sep 22, 2021, at 8:06 AM, Kevin Kocot > wrote: >> >> Thanks Carson, >> >> I think I see the problem now. Here's what I'm getting: >> >> ----- >> STATUS: Parsing control files... >> STATUS: Processing and indexing input FASTA files... >> STATUS: Setting up database for any GFF3 input... >> A data structure will be created for you at: >> /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_datastore >> >> To access files for individual sequences use the datastore index: >> /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_master_datastore_index.log >> >> STATUS: Now running MAKER... 
>> examining contents of the fasta file and run log
>>
>>
>>
>> --Next Contig--
>>
>> Processing run.log file...
>> #---------------------------------------------------------------------
>> Now retrying the contig!!
>> SeqID: PGA_scaffold0
>> Length: 141759199
>> Tries: 5!!
>> #---------------------------------------------------------------------
>>
>>
>> setting up GFF3 output and fasta chunks
>> prepare section files
>> Gathering GFF3 input into hits - chunk:0
>> ERROR: Non-unique top level ID for match.19561.56
>> While this is technically legal in GFF3, it usually
>> indicates a poorly fomatted GFF3 file (perhaps you
>> tried to merge two GFF3 files without accounting for
>> unique IDs). MAKER will not handle these correctly.
>>
>> --> rank=NA, hostname=wirenia
>> ERROR: Failed while prepare section files
>> ERROR: Chunk failed at level:12, tier_type:3
>> FAILED CONTIG:PGA_scaffold0
>>
>> ERROR: Chunk failed at level:4, tier_type:0
>> FAILED CONTIG:PGA_scaffold0
>>
>> examining contents of the fasta file and run log
>> -----
>>
>> It looks like maker doesn't like the format of the exonerate gff3 I am using. I tried 'fixing' it with agat_convert_sp_gxf2gxf.pl, which seemed to work on my Braker output, but that just produced an empty gff3 file for both my exonerate and PASA gff3 files. Any advice on how to prepare exonerate or PASA gff3 files for Maker?
>>
>> Thanks!
>> Kevin
>> From: Carson Holt
>> Sent: Monday, September 20, 2021 8:06 AM
>> To: Kevin Kocot
>> Cc: maker-devel at yandell-lab.org
>> Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure
>>
>> Hi Kevin,
>>
>> The files are already being skipped because of previous failures. Can you increase the try count (-t on the command line) to something like 6, and send me the STDERR after it generates a new failure.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>>> On Sep 20, 2021, at 4:56 AM, Kevin Kocot wrote:
>>>
>>> Thanks Carson!
>>> I've uploaded that file here:
>>> http://genomes.ua.edu/Kocot/2021-09-10_Maker/round1_run_maker_try2.log
>>>
>>>
>>> From: Carson Holt
>>> Sent: Tuesday, September 14, 2021 9:51 PM
>>> To: Kevin Kocot
>>> Cc: maker-devel at yandell-lab.org
>>> Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure
>>>
>>> What I really need is the captured STDERR from the failed run.
>>>
>>> --Carson
>>>
>>>> On Sep 10, 2021, at 9:19 AM, Kevin Kocot wrote:
>>>>
>>>> Hi Carson and all,
>>>>
>>>> I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I'm not using MPI). The run.log files just end with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted?
>>>>
>>>> I've uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/
>>>>
>>>> Any guidance on what might be the issue would be greatly appreciated.
>>>>
>>>> Thanks!
>>>> Kevin
>>>>
>>>> Kevin M. Kocot
>>>> he/him/his
>>>> Associate Professor & Curator of Invertebrates
>>>> Department of Biological Sciences & Alabama Museum of Natural History
>>>> The University of Alabama
>>>> 307 Mary Harmon Bryant Hall
>>>> Box 870344
>>>> Tuscaloosa, AL 35487
>>>> phone 205-348-4052 | fax 205-348-4039
>>>> kmkocot at ua.edu | www.kocotlab.com
>>>> https://uasystem.zoom.us/j/3755490727
>>>>
>>>> _______________________________________________
>>>> maker-devel mailing list
>>>> maker-devel at yandell-lab.org
>>>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kmkocot at ua.edu Thu Sep 23 10:42:12 2021
From: kmkocot at ua.edu (Kevin Kocot)
Date: Thu, 23 Sep 2021 16:42:12 +0000
Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure
In-Reply-To: <9FBD8DD4-660C-4A6D-90A2-9DF1B948F236@nbis.se>
References: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com> <92CB6C3D-688F-4EFC-B763-BD51CF455FFA@gmail.com> <9FBD8DD4-660C-4A6D-90A2-9DF1B948F236@nbis.se>
Message-ID:

Hi Carson and Jacques,

Thank you both very much for the help. It looks like my PASA and exonerate (via Funannotate) gff3 files are not correctly formatted for maker. I've uploaded them here just in case it might be helpful for you to see them, but I will try to figure out how to reformat them correctly following Jacques's advice.
http://genomes.ua.edu/Kocot/2021-09-10_Maker/pasa_predictions.gff3
http://genomes.ua.edu/Kocot/2021-09-10_Maker/protein_alignments.gff3

Is there a standalone tool or Maker feature I'm not seeing that can assess whether a gff3 file is correctly formatted for Maker?

Thank you again!
Kevin

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jacques.dainat at gmail.com Fri Sep 24 02:41:49 2021
From: jacques.dainat at gmail.com (Jacques Dainat)
Date: Fri, 24 Sep 2021 10:41:49 +0200
Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure
In-Reply-To: <3161576B-6A96-4876-B348-5CFF993188F4@nbis.se>
References: <3161576B-6A96-4876-B348-5CFF993188F4@nbis.se>
Message-ID:

Hi Kevin,

I have checked the 2 files.

About pasa_predictions.gff3:
It is parsed without any problem with agat_convert_sp_gxf2gxf.pl in about 10 mins. MAKER (>=v3) should be able to use it like that, but you might prefer to convert it into match/match_part style using agat_sp_alignment_output_style.pl. The match/match_part style works in all MAKER versions.

About protein_alignments.gff3:
This file is more problematic; it contains only 1 feature type, which is level1 in AGAT (i.e. like a gene), and in the current state is expecting sub-features. On top of that, the ID attribute is confusing because it is supposed to be unique, and it is not.
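A quick way to confirm the non-unique IDs before fixing anything, as a minimal sketch only: the helper name `dup_ids` is made up for illustration, and it assumes a tab-separated GFF3 with the attributes in column 9.

```shell
# List ID values that occur more than once in a GFF3 file.
# Hypothetical helper; assumes tab-separated columns, attributes in column 9.
dup_ids() {
  awk -F'\t' '
    /^#/ || NF < 9 { next }            # skip comments and short lines
    {
      n = split($9, attrs, ";")        # attributes are ";"-separated key=value pairs
      for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^ID=/) count[substr(attrs[i], 4)]++
    }
    END {
      for (id in count)
        if (count[id] > 1) print count[id] "\t" id
    }
  ' "$1" | sort -k1,1nr                # most frequently repeated IDs first
}
```

For example, `dup_ids protein_alignments.gff3 | head` would show the most repeated IDs. GFF3 allows a multi-line feature (a discontinuous CDS, say) to share one ID, so a hit here is a prompt to look at the file, not proof of an error.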
So here are the commands you should run to get a proper file for MAKER:

First
```
sed 's/nucleotide_to_protein_match/match_part/' protein_alignments.gff3 | sed 's/ID=/Parent=/' > protein_alignments_repared.gff3
```
Then
```
agat_convert_sp_gxf2gxf.pl --gff protein_alignments_repared.gff3 -o protein_alignments_clean.gff3
```
Then you should be good with a proper match/match_part file.

Best,
/Jacques

--
Jacques Dainat
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kchilds at msu.edu Sun Sep 26 13:54:06 2021
From: kchilds at msu.edu (Childs, Kevin)
Date: Sun, 26 Sep 2021 19:54:06 +0000
Subject: [maker-devel] mising UTRs even with transcript evidence
Message-ID:

Carson,

I am helping a student to annotate a genome. Despite having provided transcript evidence, there are many cases where the MAKER gene models are missing obvious UTRs that are found in the transcripts. The link below is for a screenshot with some examples.

https://figshare.com/s/0b29e0b0ac10f868dbe9

In the figure, there are genes at ~148 kbp, ~152 kbp, ~160 kbp, ~170 kbp that are missing UTRs that should have been predicted by the transcripts in the second track. The annotation is pretty much riddled with this type of behavior.

One mea culpa, the transcript data was provided as a gff file, and that file had all transcripts defined with a source of "bed2gff" instead of "est2genome". I feel that I had found that to be important during some past annotation work, but it doesn't explain why many but not all genes have UTRs when evidence is present.

Thanks.
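One way to see which source and type labels an evidence file actually carries before handing it to MAKER, as a sketch only: the helper name `gff_source_types` and the file name `transcripts.gff` are hypothetical, and nothing in this thread establishes that the source label is the cause of the missing UTRs.

```shell
# Count each (source, type) pair found in columns 2 and 3 of a GFF3 file.
# Hypothetical helper; assumes tab-separated columns.
gff_source_types() {
  awk -F'\t' '!/^#/ && NF >= 9 { print $2 "\t" $3 }' "$1" | sort | uniq -c
}
```

Running `gff_source_types transcripts.gff` prints one line per pair with its count, which makes it easy to spot a file where every feature carries a single label such as bed2gff.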
Kevin

---
Kevin Childs, PhD

Assistant Professor - Fixed Term
Director MSU Genomics Core Facility
Plant Biology Department
Michigan State University

kchilds at msu.edu
517-775-2844 (m)
517-884-6926 (o)

http://childslab.plantbiology.msu.edu
https://rtsf.natsci.msu.edu/genomics/

From carsonhh at gmail.com Mon Sep 27 09:01:22 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 27 Sep 2021 09:01:22 -0600
Subject: [maker-devel] mising UTRs even with transcript evidence
In-Reply-To:
References:
Message-ID: <9DBC6B87-E086-42CB-9B4D-80385652CA9A@gmail.com>

MAKER needs a little more info about an alignment that may not exist in the GFF3. MAKER2 will reject GFF3 input for UTR generation, but MAKER3 will use it as long as it can generate some of the missing info internally using certain assumptions about the alignment. Also, alignments that have non-canonical splicing will be rejected for UTR generation.

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1376 bytes
Desc: not available
URL:

From kmkocot at ua.edu Mon Sep 27 11:36:18 2021
From: kmkocot at ua.edu (Kevin Kocot)
Date: Mon, 27 Sep 2021 17:36:18 +0000
Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure
In-Reply-To:
References: <3161576B-6A96-4876-B348-5CFF993188F4@nbis.se>
Message-ID:

Hi all,

Thank you again for the help! This solved my issue with the file formatting and I was able to successfully run Maker on a test scaffold. I have the full run going now.

Thanks again!
Kevin

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From kmkocot at ua.edu Fri Sep 10 09:19:14 2021 From: kmkocot at ua.edu (Kevin Kocot) Date: Fri, 10 Sep 2021 15:19:14 +0000 Subject: [maker-devel] Troubleshooting Maker failure In-Reply-To: References: Message-ID: Hi Carson and all, I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I'm not using MPI).
The run.log files just ends with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted? I've uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/ Any guidance on what might be the issue would be greatly appreciated. Thanks! Kevin Kevin M. Kocot he/him/his Associate Professor & Curator of Invertebrates Department of Biological Sciences & Alabama Museum of Natural History The University of Alabama 307 Mary Harmon Bryant Hall Box 870344 Tuscaloosa, AL 35487 phone 205-348-4052 | fax 205-348-4039 kmkocot at ua.edu | www.kocotlab.com https://uasystem.zoom.us/j/3755490727 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kmkocot at ua.edu Fri Sep 3 16:09:57 2021 From: kmkocot at ua.edu (Kevin Kocot) Date: Fri, 3 Sep 2021 22:09:57 +0000 Subject: [maker-devel] Maker failing but not sure what dependency is the culprit Message-ID: Hello! I ran Maker on a chromosome-level mollusc genome using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. I can't figure out where the problem lies, though. The run.log files just end with "DIED RANK 0" and "DIED COUNT 3." Attached is a sample output folder (from this location: /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_datastore/0A/5A) as well as my config files and master datastore index log. Any guidance on what might be the issue would be greatly appreciated. Thanks! Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PGA_assembly_shortened_headers.fasta_master_datastore_index.log Type: text/x-log Size: 474332 bytes Desc: PGA_assembly_shortened_headers.fasta_master_datastore_index.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.log Type: text/x-log Size: 5207 bytes Desc: maker_opts.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_exe.log Type: text/x-log Size: 1610 bytes Desc: maker_exe.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_evm.log Type: text/x-log Size: 893 bytes Desc: maker_evm.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_bopts.log Type: text/x-log Size: 1479 bytes Desc: maker_bopts.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 5A.zip Type: application/zip Size: 48184 bytes Desc: 5A.zip URL: From stuckerta at gmail.com Mon Sep 6 14:50:13 2021 From: stuckerta at gmail.com (Adam Stuckert) Date: Mon, 6 Sep 2021 14:50:13 -0600 Subject: [maker-devel] Issues including Repeat Masker gff in Maker runs Message-ID: Hi, I have been working on this problem for a while now, but can't seem to come up with a solution despite extensive searching. When I include a repeat masker gff, it always fails with messages like this: #--------------------------------------------------------------------- Now starting the contig!! 
SeqID: P_RNA_scaffold_115 Length: 2076116 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks doing repeat masking ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Did not specify a Hit End or Hit Begin STACK: Error::throw STACK: Bio::Root::Root::throw /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Root/Root.pm:447 STACK: Bio::Search::HSP::GenericHSP::_subject_seq_feature /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Search/HSP/GenericHSP.pm:1603 STACK: Bio::Search::HSP::GenericHSP::hit /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Search/HSP/GenericHSP.pm:987 STACK: repeat_mask_seq::separate_types /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/ repeat_mask_seq.pm:307 STACK: repeat_mask_seq::mask_chunk /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/ repeat_mask_seq.pm:191 STACK: Process::MpiChunk::_go /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:762 STACK: Process::MpiChunk::run /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:340 STACK: Process::MpiChunk::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:356 STACK: Process::MpiTiers::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: /mnt/lustre/macmaneslab/macmanes/test/maker/bin/maker:679 ----------------------------------------------------------- --> rank=NA, hostname=node143.rcchpc ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:P_RNA_scaffold_54 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:P_RNA_scaffold_54 I have tried a number of ways to modify the output of RepeatMasker to fit with 
Maker's expectation, and to me, the gff looks fine (see attached, which is the first 500 lines of a repeatmasker.gff I've used. Note it says "similarity" for hit type, but I've also changed the gff to only have "match_part" and had the same issue). I have had this issue across multiple clusters. Any suggestions? Thanks, Adam -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rm.500lines.gff Type: application/octet-stream Size: 55244 bytes Desc: not available URL: From carsonhh at gmail.com Tue Sep 14 20:51:37 2021 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 14 Sep 2021 20:51:37 -0600 Subject: [maker-devel] Troubleshooting Maker failure In-Reply-To: References: Message-ID: <7CC4EAAC-85A5-4696-8919-D5243F997942@gmail.com> What I really need is the captured STDERR from the failed run. ?Carson > On Sep 10, 2021, at 9:19 AM, Kevin Kocot wrote: > > Hi Carson and all, > > I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I?m not using MPI). The run.log files just ends with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted? > > I?ve uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/ > > Any guidance on what might be the issue would be greatly appreciated. > > Thanks! > Kevin > > Kevin M. 
Kocot > he/him/his > Associate Professor & Curator of Invertebrates > Department of Biological Sciences & Alabama Museum of Natural History > The University of Alabama > 307 Mary Harmon Bryant Hall > Box 870344 > Tuscaloosa, AL 35487 > phone 205-348-4052 | fax 205-348-4039 > kmkocot at ua.edu | www.kocotlab.com > https://uasystem.zoom.us/j/3755490727 > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1376 bytes Desc: not available URL: From carsonhh at gmail.com Tue Sep 14 21:20:17 2021 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 14 Sep 2021 21:20:17 -0600 Subject: [maker-devel] Issues including Repeat Masker gff in Maker runs In-Reply-To: References: Message-ID: <94399EE4-8EB1-49DA-997C-3A8B4D210D46@gmail.com> A couple of things. First, let MAKER run RepeatMasker. Don't provide the RepeatMasker GFF as input to MAKER. The file you sent is GFF version 2, for example, which is not backwards compatible with GFF version 3. Second, use the latest versions of MAKER2 or MAKER3. There is an issue with RepeatMasker sometimes producing start/end coordinates that are 0 or even negative numbers. The current releases of MAKER2/3 know how to find and fix features with invalid coordinates. --Carson > On Sep 6, 2021, at 2:50 PM, Adam Stuckert wrote: > > Hi, > > I have been working on this problem for a while now, but can't seem to come up with a solution despite extensive searching. When I include a repeat masker gff, it always fails with messages like this: > > #--------------------------------------------------------------------- > Now starting the contig!!
> SeqID: P_RNA_scaffold_115 > Length: 2076116 > #--------------------------------------------------------------------- > > > setting up GFF3 output and fasta chunks > doing repeat masking > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Did not specify a Hit End or Hit Begin > STACK: Error::throw > STACK: Bio::Root::Root::throw /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Root/Root.pm:447 > STACK: Bio::Search::HSP::GenericHSP::_subject_seq_feature /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Search/HSP/GenericHSP.pm:1603 > STACK: Bio::Search::HSP::GenericHSP::hit /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Search/HSP/GenericHSP.pm:987 > STACK: repeat_mask_seq::separate_types /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/repeat_mask_seq.pm:307 > STACK: repeat_mask_seq::mask_chunk /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/repeat_mask_seq.pm:191 > STACK: Process::MpiChunk::_go /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:762 > STACK: Process::MpiChunk::run /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:340 > STACK: Process::MpiChunk::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:356 > STACK: Process::MpiTiers::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiTiers.pm:287 > STACK: Process::MpiTiers::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiTiers.pm:287 > STACK: /mnt/lustre/macmaneslab/macmanes/test/maker/bin/maker:679 > ----------------------------------------------------------- > --> rank=NA, hostname=node143.rcchpc > ERROR: Failed while doing repeat masking > ERROR: Chunk failed at level:0, tier_type:1 > FAILED CONTIG:P_RNA_scaffold_54 > > ERROR: Chunk failed at level:2, tier_type:0 > FAILED CONTIG:P_RNA_scaffold_54 > > I have tried a number 
of ways to modify the output of RepeatMasker to fit with Maker's expectation, and to me, the gff looks fine (see attached, which is the first 500 lines of a repeatmasker.gff I've used. Note it says "similarity" for hit type, but I've also changed the gff to only have "match_part" and had the same issue). I have had this issue across multiple clusters. > > Any suggestions? > > Thanks, > Adam > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1376 bytes Desc: not available URL: From kmkocot at ua.edu Mon Sep 20 04:55:59 2021 From: kmkocot at ua.edu (Kevin Kocot) Date: Mon, 20 Sep 2021 10:55:59 +0000 Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure Message-ID: Thanks Carson! I've uploaded that file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/round1_run_maker_try2.log ________________________________ From: Carson Holt Sent: Tuesday, September 14, 2021 9:51 PM To: Kevin Kocot Cc: maker-devel at yandell-lab.org Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure What I really need is the captured STDERR from the failed run. ?Carson On Sep 10, 2021, at 9:19 AM, Kevin Kocot > wrote: Hi Carson and all, I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I?m not using MPI). The run.log files just ends with "DIED RANK 0" and "DIED COUNT 3." 
Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted? I've uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/ Any guidance on what might be the issue would be greatly appreciated. Thanks! Kevin Kevin M. Kocot he/him/his Associate Professor & Curator of Invertebrates Department of Biological Sciences & Alabama Museum of Natural History The University of Alabama 307 Mary Harmon Bryant Hall Box 870344 Tuscaloosa, AL 35487 phone 205-348-4052 | fax 205-348-4039 kmkocot at ua.edu | www.kocotlab.com https://uasystem.zoom.us/j/3755490727 _______________________________________________ maker-devel mailing list maker-devel at yandell-lab.org http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Sep 20 07:06:29 2021 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 20 Sep 2021 07:06:29 -0600 Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure In-Reply-To: References: Message-ID: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com> Hi Kevin, The files are already being skipped because of previous failures. Can you increase the try count (-t on the command line) to something like 6, and send me the STDERR after it generates a new failure? --Carson Sent from my iPhone > On Sep 20, 2021, at 4:56 AM, Kevin Kocot wrote: > > Thanks Carson! I've uploaded that file here: > http://genomes.ua.edu/Kocot/2021-09-10_Maker/round1_run_maker_try2.log > > > From: Carson Holt > Sent: Tuesday, September 14, 2021 9:51 PM > To: Kevin Kocot > Cc: maker-devel at yandell-lab.org > Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure > > What I really need is the captured STDERR from the failed run.
> > ?Carson > >> On Sep 10, 2021, at 9:19 AM, Kevin Kocot wrote: >> >> Hi Carson and all, >> >> I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I?m not using MPI). The run.log files just ends with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted? >> >> I?ve uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/ >> >> Any guidance on what might be the issue would be greatly appreciated. >> >> Thanks! >> Kevin >> >> Kevin M. Kocot >> he/him/his >> Associate Professor & Curator of Invertebrates >> Department of Biological Sciences & Alabama Museum of Natural History >> The University of Alabama >> 307 Mary Harmon Bryant Hall >> Box 870344 >> Tuscaloosa, AL 35487 >> phone 205-348-4052 | fax 205-348-4039 >> kmkocot at ua.edu | www.kocotlab.com >> https://uasystem.zoom.us/j/3755490727 >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at yandell-lab.org >> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kmkocot at ua.edu Wed Sep 22 08:06:38 2021 From: kmkocot at ua.edu (Kevin Kocot) Date: Wed, 22 Sep 2021 14:06:38 +0000 Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure In-Reply-To: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com> References: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com> Message-ID: Thanks Carson, I think I see the problem now. Here's what I'm getting: ----- STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... STATUS: Setting up database for any GFF3 input... A data structure will be created for you at: /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_datastore To access files for individual sequences use the datastore index: /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_master_datastore_index.log STATUS: Now running MAKER... examining contents of the fasta file and run log --Next Contig-- Processing run.log file... #--------------------------------------------------------------------- Now retrying the contig!! SeqID: PGA_scaffold0 Length: 141759199 Tries: 5!! #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks prepare section files Gathering GFF3 input into hits - chunk:0 ERROR: Non-unique top level ID for match.19561.56 While this is technically legal in GFF3, it usually indicates a poorly fomatted GFF3 file (perhaps you tried to merge two GFF3 files without accounting for unique IDs). MAKER will not handle these correctly. 
--> rank=NA, hostname=wirenia ERROR: Failed while prepare section files ERROR: Chunk failed at level:12, tier_type:3 FAILED CONTIG:PGA_scaffold0 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:PGA_scaffold0 examining contents of the fasta file and run log ----- It looks like maker doesn't like the format of the exonerate gff3 I am using. I tried 'fixing' it with agat_convert_sp_gxf2gxf.pl, which seemed to work on my Braker output, but that just produced an empty gff3 file for both my exonerate and PASA gff3 files. Any advice on how to prepare exonerate or PASA gff3 files for Maker? Thanks! Kevin ________________________________ From: Carson Holt Sent: Monday, September 20, 2021 8:06 AM To: Kevin Kocot Cc: maker-devel at yandell-lab.org Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure Hi Kevin, The files are already being skipped because of previous failures. Can you increase the try count (-t on the command line) to something like 6, and send me the STDERR after it generates a new failure. ?Carson Sent from my iPhone On Sep 20, 2021, at 4:56 AM, Kevin Kocot wrote: ? Thanks Carson! I've uploaded that file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/round1_run_maker_try2.log ________________________________ From: Carson Holt Sent: Tuesday, September 14, 2021 9:51 PM To: Kevin Kocot Cc: maker-devel at yandell-lab.org Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure What I really need is the captured STDERR from the failed run. ?Carson On Sep 10, 2021, at 9:19 AM, Kevin Kocot > wrote: Hi Carson and all, I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I?m not using MPI). 
The run.log files just end with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted? I've uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/ Any guidance on what might be the issue would be greatly appreciated. Thanks! Kevin Kevin M. Kocot he/him/his Associate Professor & Curator of Invertebrates Department of Biological Sciences & Alabama Museum of Natural History The University of Alabama 307 Mary Harmon Bryant Hall Box 870344 Tuscaloosa, AL 35487 phone 205-348-4052 | fax 205-348-4039 kmkocot at ua.edu | www.kocotlab.com https://uasystem.zoom.us/j/3755490727 _______________________________________________ maker-devel mailing list maker-devel at yandell-lab.org http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Sep 22 10:21:29 2021 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 22 Sep 2021 10:21:29 -0600 Subject: [maker-devel] [EXTERNAL] Troubleshooting Maker failure In-Reply-To: References: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com> Message-ID: <92CB6C3D-688F-4EFC-B763-BD51CF455FFA@gmail.com> I'd have to see the GFF, but in general you should organize sequence alignments as match/match_part features. Here is an example from the GFF3 format specification:

ctg123 . cDNA_match 1200 9000 . . . ID=cDNA00001
ctg123 . match_part 1200 3200 2.2e-30 + . ID=match00002;Parent=cDNA00001;Target=mjm1123.5 5 506;Gap=M301 D1499 M201
ctg123 . match_part 7000 9000 7.4e-32 - . ID=match00003;Parent=cDNA00001;Target=mjm1123.3 1 502;Gap=M101 D1499 M401

Also make sure you are not inadvertently using GFF2 or GTF. They are not backwards compatible with GFF3.
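For readers who hit the same "Non-unique top level ID" error, the following is a minimal Python sketch of a pre-flight check on an evidence GFF3 file. It is not MAKER's own code; it only mirrors the wording of the error message, under the assumption that a "top level" feature is one with an ID attribute and no Parent attribute. The function names and the second cDNA_match line in the sample are hypothetical and exist only for illustration.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def duplicate_top_level_ids(lines):
    """Return {ID: count} for IDs shared by more than one Parent-less feature line."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed tab-separated GFF3 feature line
        attrs = parse_attributes(columns[8])
        if "ID" in attrs and "Parent" not in attrs:
            counts[attrs["ID"]] += 1
    return {feature_id: n for feature_id, n in counts.items() if n > 1}


# Real GFF3 columns are tab-separated. The last line is a made-up duplicate
# of a top-level ID, added here only to trigger the check.
sample = [
    "##gff-version 3",
    "ctg123\t.\tcDNA_match\t1200\t9000\t.\t.\t.\tID=cDNA00001",
    "ctg123\t.\tmatch_part\t1200\t3200\t2.2e-30\t+\t.\tID=match00002;Parent=cDNA00001",
    "ctg123\t.\tcDNA_match\t12000\t15000\t.\t.\t.\tID=cDNA00001",
]

print(duplicate_top_level_ids(sample))
```

Because GFF3 itself permits one ID on several lines (for discontinuous features), a non-empty result is a prompt to inspect the file rather than proof that it is invalid. To check a real file, pass an open file handle instead of the sample list.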
?Carson > On Sep 22, 2021, at 8:06 AM, Kevin Kocot wrote: > > Thanks Carson, > > I think I see the problem now. Here's what I'm getting: > > ----- > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input... > A data structure will be created for you at: > /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_datastore > > To access files for individual sequences use the datastore index: > /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_master_datastore_index.log > > STATUS: Now running MAKER... > examining contents of the fasta file and run log > > > > --Next Contig-- > > Processing run.log file... > #--------------------------------------------------------------------- > Now retrying the contig!! > SeqID: PGA_scaffold0 > Length: 141759199 > Tries: 5!! > #--------------------------------------------------------------------- > > > setting up GFF3 output and fasta chunks > prepare section files > Gathering GFF3 input into hits - chunk:0 > ERROR: Non-unique top level ID for match.19561.56 > While this is technically legal in GFF3, it usually > indicates a poorly fomatted GFF3 file (perhaps you > tried to merge two GFF3 files without accounting for > unique IDs). MAKER will not handle these correctly. > > --> rank=NA, hostname=wirenia > ERROR: Failed while prepare section files > ERROR: Chunk failed at level:12, tier_type:3 > FAILED CONTIG:PGA_scaffold0 > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:PGA_scaffold0 > > examining contents of the fasta file and run log > ----- > > It looks like maker doesn't like the format of the exonerate gff3 I am using. 
I tried 'fixing' it with agat_convert_sp_gxf2gxf.pl, which seemed to work on my Braker output, but that just produced an empty gff3 file for both my exonerate and PASA gff3 files. Any advice on how to prepare exonerate or PASA gff3 files for Maker? > > Thanks! > Kevin > From: Carson Holt > > Sent: Monday, September 20, 2021 8:06 AM > To: Kevin Kocot > > Cc: maker-devel at yandell-lab.org > > Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure > > Hi Kevin, > > The files are already being skipped because of previous failures. Can you increase the try count (-t on the command line) to something like 6, and send me the STDERR after it generates a new failure. > > ?Carson > > Sent from my iPhone > >> On Sep 20, 2021, at 4:56 AM, Kevin Kocot > wrote: >> >> ? >> Thanks Carson! I've uploaded that file here: >> http://genomes.ua.edu/Kocot/2021-09-10_Maker/round1_run_maker_try2.log >> >> >> From: Carson Holt >> Sent: Tuesday, September 14, 2021 9:51 PM >> To: Kevin Kocot >> Cc: maker-devel at yandell-lab.org >> Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure >> >> What I really need is the captured STDERR from the failed run. >> >> ?Carson >> >>> On Sep 10, 2021, at 9:19 AM, Kevin Kocot > wrote: >>> >>> Hi Carson and all, >>> >>> I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I?m not using MPI). The run.log files just ends with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted? 
>>> >>> I've uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/ >>> >>> Any guidance on what might be the issue would be greatly appreciated. >>> >>> Thanks! >>> Kevin >>> >>> Kevin M. Kocot >>> he/him/his >>> Associate Professor & Curator of Invertebrates >>> Department of Biological Sciences & Alabama Museum of Natural History >>> The University of Alabama >>> 307 Mary Harmon Bryant Hall >>> Box 870344 >>> Tuscaloosa, AL 35487 >>> phone 205-348-4052 | fax 205-348-4039 >>> kmkocot at ua.edu | www.kocotlab.com >>> https://uasystem.zoom.us/j/3755490727 >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at yandell-lab.org >>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1376 bytes Desc: not available URL: From jacques.dainat at nbis.se Wed Sep 22 13:09:52 2021 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Wed, 22 Sep 2021 21:09:52 +0200 Subject: [maker-devel] [EXTERNAL] Troubleshooting Maker failure In-Reply-To: <92CB6C3D-688F-4EFC-B763-BD51CF455FFA@gmail.com> References: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com> <92CB6C3D-688F-4EFC-B763-BD51CF455FFA@gmail.com> Message-ID: <9FBD8DD4-660C-4A6D-90A2-9DF1B948F236@nbis.se> Hi Kevin, Regarding AGAT (agat_convert_sp_gxf2gxf.pl), there are two likely reasons for getting empty output files. I) The feature types (3rd column) are not yet handled by AGAT. You can tell AGAT how to deal with them; see https://agat.readthedocs.io/en/latest/troubleshooting.html#agat-throws-features-out-because-the-feature-type-is-not-yet-taken-into-account II) AGAT throws the features out because child features are missing (e.g. a gene feature is expected to have at least one transcript linked to it); see https://agat.readthedocs.io/en/latest/troubleshooting.html#agat-throws-features-out-because-child-features-are-not-provided
I invite you to open an issue in the AGAT GitHub repository. Once the file is parsed correctly, you can use the script agat_sp_alignment_output_style.pl to turn level1 feature types (e.g. gene) and level2 feature types (e.g. mRNA) into match and match_part features, respectively, which MAKER may prefer. Best regards, Jacques Dainat, Ph.D. > On 22 Sep 2021, at 18:21, Carson Holt wrote: > > I'd have to see the GFF, but in general you should organize sequence alignments as match/match_part features. > > Here is an example from the GFF3 format specification: > > ctg123 . cDNA_match 1200 9000 . . . ID=cDNA00001 > ctg123 . match_part 1200 3200 2.2e-30 + . ID=match00002;Parent=cDNA00001;Target=mjm1123.5 5 506;Gap=M301 D1499 M201 > ctg123 . match_part 7000 9000 7.4e-32 - . ID=match00003;Parent=cDNA00001;Target=mjm1123.3 1 502;Gap=M101 D1499 M401 > > Also make sure you are not inadvertently using GFF2 or GTF. They are not backwards compatible with GFF3. > > --Carson > > >> On Sep 22, 2021, at 8:06 AM, Kevin Kocot wrote: >> >> Thanks Carson, >> >> I think I see the problem now. Here's what I'm getting: >> >> ----- >> STATUS: Parsing control files... >> STATUS: Processing and indexing input FASTA files... >> STATUS: Setting up database for any GFF3 input... >> A data structure will be created for you at: >> /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_datastore >> >> To access files for individual sequences use the datastore index: >> /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_master_datastore_index.log >> >> STATUS: Now running MAKER...
>> examining contents of the fasta file and run log >> >> >> >> --Next Contig-- >> >> Processing run.log file... >> #--------------------------------------------------------------------- >> Now retrying the contig!! >> SeqID: PGA_scaffold0 >> Length: 141759199 >> Tries: 5!! >> #--------------------------------------------------------------------- >> >> >> setting up GFF3 output and fasta chunks >> prepare section files >> Gathering GFF3 input into hits - chunk:0 >> ERROR: Non-unique top level ID for match.19561.56 >> While this is technically legal in GFF3, it usually >> indicates a poorly fomatted GFF3 file (perhaps you >> tried to merge two GFF3 files without accounting for >> unique IDs). MAKER will not handle these correctly. >> >> --> rank=NA, hostname=wirenia >> ERROR: Failed while prepare section files >> ERROR: Chunk failed at level:12, tier_type:3 >> FAILED CONTIG:PGA_scaffold0 >> >> ERROR: Chunk failed at level:4, tier_type:0 >> FAILED CONTIG:PGA_scaffold0 >> >> examining contents of the fasta file and run log >> ----- >> >> It looks like maker doesn't like the format of the exonerate gff3 I am using. I tried 'fixing' it with agat_convert_sp_gxf2gxf.pl, which seemed to work on my Braker output, but that just produced an empty gff3 file for both my exonerate and PASA gff3 files. Any advice on how to prepare exonerate or PASA gff3 files for Maker? >> >> Thanks! >> Kevin >> From: Carson Holt > >> Sent: Monday, September 20, 2021 8:06 AM >> To: Kevin Kocot > >> Cc: maker-devel at yandell-lab.org > >> Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure >> >> Hi Kevin, >> >> The files are already being skipped because of previous failures. Can you increase the try count (-t on the command line) to something like 6, and send me the STDERR after it generates a new failure. >> >> ?Carson >> >> Sent from my iPhone >> >>> On Sep 20, 2021, at 4:56 AM, Kevin Kocot > wrote: >>> >>> ? >>> Thanks Carson! 
I've uploaded that file here: >>> http://genomes.ua.edu/Kocot/2021-09-10_Maker/round1_run_maker_try2.log >>> >>> >>> From: Carson Holt >>> Sent: Tuesday, September 14, 2021 9:51 PM >>> To: Kevin Kocot >>> Cc: maker-devel at yandell-lab.org >>> Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure >>> >>> What I really need is the captured STDERR from the failed run. >>> >>> ?Carson >>> >>>> On Sep 10, 2021, at 9:19 AM, Kevin Kocot > wrote: >>>> >>>> Hi Carson and all, >>>> >>>> I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I?m not using MPI). The run.log files just ends with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted? >>>> >>>> I?ve uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/ >>>> >>>> Any guidance on what might be the issue would be greatly appreciated. >>>> >>>> Thanks! >>>> Kevin >>>> >>>> Kevin M. 
Kocot >>>> he/him/his >>>> Associate Professor & Curator of Invertebrates >>>> Department of Biological Sciences & Alabama Museum of Natural History >>>> The University of Alabama >>>> 307 Mary Harmon Bryant Hall >>>> Box 870344 >>>> Tuscaloosa, AL 35487 >>>> phone 205-348-4052 | fax 205-348-4039 >>>> kmkocot at ua.edu | www.kocotlab.com >>>> https://uasystem.zoom.us/j/3755490727 >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at yandell-lab.org >>>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From kmkocot at ua.edu Thu Sep 23 10:42:12 2021 From: kmkocot at ua.edu (Kevin Kocot) Date: Thu, 23 Sep 2021 16:42:12 +0000 Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure In-Reply-To: <9FBD8DD4-660C-4A6D-90A2-9DF1B948F236@nbis.se> References: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com> <92CB6C3D-688F-4EFC-B763-BD51CF455FFA@gmail.com> <9FBD8DD4-660C-4A6D-90A2-9DF1B948F236@nbis.se> Message-ID: Hi Carson and Jacques, Thank you both very much for the help. It looks like my PASA and exonerate (via Funannotate) gff3 files are not correctly formatted for maker. I've uploaded them here just in case it might be helpful for you to see them, but I will try to figure out how to reformat them correctly following Jacques's advice. http://genomes.ua.edu/Kocot/2021-09-10_Maker/pasa_predictions.gff3 http://genomes.ua.edu/Kocot/2021-09-10_Maker/protein_alignments.gff3 Is there a standalone tool or Maker feature I'm not seeing that can assess whether a gff3 file is correctly formatted for Maker? Thank you again! 
Kevin

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jacques.dainat at gmail.com Fri Sep 24 02:41:49 2021
From: jacques.dainat at gmail.com (Jacques Dainat)
Date: Fri, 24 Sep 2021 10:41:49 +0200
Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure
In-Reply-To: <3161576B-6A96-4876-B348-5CFF993188F4@nbis.se>
References: <3161576B-6A96-4876-B348-5CFF993188F4@nbis.se>
Message-ID:

Hi Kevin,

I have checked the 2 files.

About pasa_predictions.gff3:
It is parsed without any problem by agat_convert_sp_gxf2gxf.pl in about 10 minutes. MAKER (>=v3) should be able to use it as is, but you might prefer to convert it into match/match_part style using agat_sp_alignment_output_style.pl. The match/match_part style works in all MAKER versions.

About protein_alignments.gff3:
This file is more problematic: it contains only one feature type, which AGAT treats as level1 (i.e. like a gene), so in its current state AGAT expects sub-features. On top of that, the ID attribute is confusing because it is supposed to be unique, and it is not.
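Both problems (a single feature type and repeated IDs) can be spotted before handing a file to MAKER. The following is a minimal Python sketch, not part of AGAT or MAKER: the function names `parse_attributes` and `summarize_gff3` are made up for illustration, and it assumes a tab-delimited, nine-column GFF3. It tallies the feature types in column 3 and lists any ID shared by more than one feature that has no Parent. GFF3 itself allows a discontinuous feature to reuse one ID across lines, so a hit here is a warning sign rather than proof that the file is invalid.

```python
from collections import Counter


def parse_attributes(field):
    """Split a GFF3 column-9 string into a dict (sketch: no URL-decoding)."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def summarize_gff3(lines):
    """Return (feature-type counts, IDs reused by parent-less features)."""
    type_counts = Counter()
    top_level_ids = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; stop reading features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line; ignored in this sketch
        type_counts[cols[2]] += 1
        attrs = parse_attributes(cols[8])
        # A feature without a Parent is treated as top level.
        if "ID" in attrs and "Parent" not in attrs:
            top_level_ids[attrs["ID"]] += 1
    duplicates = {fid: n for fid, n in top_level_ids.items() if n > 1}
    return type_counts, duplicates
```

Passing an open file handle (for example `summarize_gff3(open("protein_alignments.gff3"))`) returns the two summaries. Given the description above, one would expect a single feature type and a non-empty set of repeated IDs for that file; the sketch checks nothing else about GFF3 validity.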
So here are the commands you should run to get a proper file for MAKER.

First:
```
sed 's/nucleotide_to_protein_match/match_part/' protein_alignments.gff3 | sed 's/ID=/Parent=/' > protein_alignments_repared.gff3
```
Then:
```
agat_convert_sp_gxf2gxf.pl --gff protein_alignments_repared.gff3 -o protein_alignments_clean.gff3
```
After that you should have a proper match/match_part file.

Best,
/Jacques

--
Jacques Dainat

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kchilds at msu.edu Sun Sep 26 13:54:06 2021
From: kchilds at msu.edu (Childs, Kevin)
Date: Sun, 26 Sep 2021 19:54:06 +0000
Subject: [maker-devel] mising UTRs even with transcript evidence
Message-ID:

Carson,

I am helping a student to annotate a genome. Despite having provided transcript evidence, there are many cases where the MAKER gene models are missing obvious UTRs that are found in the transcripts. The link below is for a screenshot with some examples.

https://figshare.com/s/0b29e0b0ac10f868dbe9

In the figure, there are genes at ~148 kbp, ~152 kbp, ~160 kbp, ~170 kbp that are missing UTRs that should have been predicted by the transcripts in the second track. The annotation is pretty much riddled with this type of behavior.

One mea culpa: the transcript data was provided as a gff file, and that file had all transcripts defined with a source of "bed2gff" instead of "est2genome". I feel that I had found that to be important during some past annotation work, but it doesn't explain why many but not all genes have UTRs when evidence is present.

Thanks.
Kevin

---
Kevin Childs, PhD

Assistant Professor - Fixed Term
Director MSU Genomics Core Facility
Plant Biology Department
Michigan State University

kchilds at msu.edu
517-775-2844 (m)
517-884-6926 (o)

http://childslab.plantbiology.msu.edu
https://rtsf.natsci.msu.edu/genomics/

From carsonhh at gmail.com Mon Sep 27 09:01:22 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 27 Sep 2021 09:01:22 -0600
Subject: [maker-devel] mising UTRs even with transcript evidence
In-Reply-To:
References:
Message-ID: <9DBC6B87-E086-42CB-9B4D-80385652CA9A@gmail.com>

MAKER needs a little more information about an alignment, and that information may not exist in the GFF3. MAKER2 will reject GFF3 input for UTR generation, but MAKER3 will use it as long as it can generate some of the missing info internally using certain assumptions about the alignment. Also, alignments that have non-canonical splicing will be rejected for UTR generation.

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1376 bytes
Desc: not available
URL:

From kmkocot at ua.edu Mon Sep 27 11:36:18 2021
From: kmkocot at ua.edu (Kevin Kocot)
Date: Mon, 27 Sep 2021 17:36:18 +0000
Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure
In-Reply-To:
References: <3161576B-6A96-4876-B348-5CFF993188F4@nbis.se>
Message-ID:

Hi all,

Thank you again for the help! This solved my issue with the file formatting and I was able to successfully run Maker on a test scaffold. I have the full run going now.

Thanks again!
Kevin

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From kmkocot at ua.edu Fri Sep 10 09:19:14 2021 From: kmkocot at ua.edu (Kevin Kocot) Date: Fri, 10 Sep 2021 15:19:14 +0000 Subject: [maker-devel] Troubleshooting Maker failure In-Reply-To: References: Message-ID: Hi Carson and all, I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I'm not using MPI).
The run.log files just ends with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted? I've uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/ Any guidance on what might be the issue would be greatly appreciated. Thanks! Kevin Kevin M. Kocot he/him/his Associate Professor & Curator of Invertebrates Department of Biological Sciences & Alabama Museum of Natural History The University of Alabama 307 Mary Harmon Bryant Hall Box 870344 Tuscaloosa, AL 35487 phone 205-348-4052 | fax 205-348-4039 kmkocot at ua.edu | www.kocotlab.com https://uasystem.zoom.us/j/3755490727 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kmkocot at ua.edu Fri Sep 3 16:09:57 2021 From: kmkocot at ua.edu (Kevin Kocot) Date: Fri, 3 Sep 2021 22:09:57 +0000 Subject: [maker-devel] Maker failing but not sure what dependency is the culprit Message-ID: Hello! I ran Maker on a chromosome-level mollusc genome using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. I can't figure out where the problem lies, though. The run.log files just end with "DIED RANK 0" and "DIED COUNT 3." Attached is a sample output folder (from this location: /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_datastore/0A/5A) as well as my config files and master datastore index log. Any guidance on what might be the issue would be greatly appreciated. Thanks! Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PGA_assembly_shortened_headers.fasta_master_datastore_index.log Type: text/x-log Size: 474332 bytes Desc: PGA_assembly_shortened_headers.fasta_master_datastore_index.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.log Type: text/x-log Size: 5207 bytes Desc: maker_opts.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_exe.log Type: text/x-log Size: 1610 bytes Desc: maker_exe.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_evm.log Type: text/x-log Size: 893 bytes Desc: maker_evm.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_bopts.log Type: text/x-log Size: 1479 bytes Desc: maker_bopts.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 5A.zip Type: application/zip Size: 48184 bytes Desc: 5A.zip URL: From stuckerta at gmail.com Mon Sep 6 14:50:13 2021 From: stuckerta at gmail.com (Adam Stuckert) Date: Mon, 6 Sep 2021 14:50:13 -0600 Subject: [maker-devel] Issues including Repeat Masker gff in Maker runs Message-ID: Hi, I have been working on this problem for a while now, but can't seem to come up with a solution despite extensive searching. When I include a repeat masker gff, it always fails with messages like this: #--------------------------------------------------------------------- Now starting the contig!! 
SeqID: P_RNA_scaffold_115 Length: 2076116 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks doing repeat masking ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Did not specify a Hit End or Hit Begin STACK: Error::throw STACK: Bio::Root::Root::throw /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Root/Root.pm:447 STACK: Bio::Search::HSP::GenericHSP::_subject_seq_feature /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Search/HSP/GenericHSP.pm:1603 STACK: Bio::Search::HSP::GenericHSP::hit /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Search/HSP/GenericHSP.pm:987 STACK: repeat_mask_seq::separate_types /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/ repeat_mask_seq.pm:307 STACK: repeat_mask_seq::mask_chunk /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/ repeat_mask_seq.pm:191 STACK: Process::MpiChunk::_go /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:762 STACK: Process::MpiChunk::run /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:340 STACK: Process::MpiChunk::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:356 STACK: Process::MpiTiers::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: /mnt/lustre/macmaneslab/macmanes/test/maker/bin/maker:679 ----------------------------------------------------------- --> rank=NA, hostname=node143.rcchpc ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:P_RNA_scaffold_54 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:P_RNA_scaffold_54 I have tried a number of ways to modify the output of RepeatMasker to fit with 
Maker's expectation, and to me, the gff looks fine (see attached, which is the first 500 lines of a repeatmasker.gff I've used. Note it says "similarity" for hit type, but I've also changed the gff to only have "match_part" and had the same issue). I have had this issue across multiple clusters. Any suggestions? Thanks, Adam -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rm.500lines.gff Type: application/octet-stream Size: 55244 bytes Desc: not available URL: From carsonhh at gmail.com Tue Sep 14 20:51:37 2021 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 14 Sep 2021 20:51:37 -0600 Subject: [maker-devel] Troubleshooting Maker failure In-Reply-To: References: Message-ID: <7CC4EAAC-85A5-4696-8919-D5243F997942@gmail.com> What I really need is the captured STDERR from the failed run. ?Carson > On Sep 10, 2021, at 9:19 AM, Kevin Kocot wrote: > > Hi Carson and all, > > I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I?m not using MPI). The run.log files just ends with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted? > > I?ve uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/ > > Any guidance on what might be the issue would be greatly appreciated. > > Thanks! > Kevin > > Kevin M. 
Kocot > he/him/his > Associate Professor & Curator of Invertebrates > Department of Biological Sciences & Alabama Museum of Natural History > The University of Alabama > 307 Mary Harmon Bryant Hall > Box 870344 > Tuscaloosa, AL 35487 > phone 205-348-4052 | fax 205-348-4039 > kmkocot at ua.edu | www.kocotlab.com > https://uasystem.zoom.us/j/3755490727 > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
From carsonhh at gmail.com Tue Sep 14 21:20:17 2021 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 14 Sep 2021 21:20:17 -0600 Subject: [maker-devel] Issues including Repeat Masker gff in Maker runs In-Reply-To: References: Message-ID: <94399EE4-8EB1-49DA-997C-3A8B4D210D46@gmail.com> A couple of things. First, let MAKER run RepeatMasker itself. Don't provide the RepeatMasker GFF as input to MAKER. The file you sent, for example, is GFF version 2, which is not backwards compatible with GFF version 3. Second, use the latest version of MAKER2 or MAKER3. RepeatMasker sometimes produces start/end coordinates that are 0 or even negative. The current releases of MAKER2/3 know how to find and fix features with invalid coordinates. --Carson > On Sep 6, 2021, at 2:50 PM, Adam Stuckert wrote: > > Hi, > > I have been working on this problem for a while now, but can't seem to come up with a solution despite extensive searching. When I include a repeat masker gff, it always fails with messages like this: > > #--------------------------------------------------------------------- > Now starting the contig!!
> SeqID: P_RNA_scaffold_115 > Length: 2076116 > #--------------------------------------------------------------------- > > > setting up GFF3 output and fasta chunks > doing repeat masking > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Did not specify a Hit End or Hit Begin > STACK: Error::throw > STACK: Bio::Root::Root::throw /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Root/Root.pm:447 > STACK: Bio::Search::HSP::GenericHSP::_subject_seq_feature /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Search/HSP/GenericHSP.pm:1603 > STACK: Bio::Search::HSP::GenericHSP::hit /mnt/oldhome/software/anaconda/colsa/envs/maker-3.01.02/lib/perl5/site_perl/5.22.0/Bio/Search/HSP/GenericHSP.pm:987 > STACK: repeat_mask_seq::separate_types /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/repeat_mask_seq.pm:307 > STACK: repeat_mask_seq::mask_chunk /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/repeat_mask_seq.pm:191 > STACK: Process::MpiChunk::_go /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:762 > STACK: Process::MpiChunk::run /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:340 > STACK: Process::MpiChunk::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiChunk.pm:356 > STACK: Process::MpiTiers::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiTiers.pm:287 > STACK: Process::MpiTiers::run_all /mnt/oldhome/macmaneslab/macmanes/test/maker/bin/../lib/Process/MpiTiers.pm:287 > STACK: /mnt/lustre/macmaneslab/macmanes/test/maker/bin/maker:679 > ----------------------------------------------------------- > --> rank=NA, hostname=node143.rcchpc > ERROR: Failed while doing repeat masking > ERROR: Chunk failed at level:0, tier_type:1 > FAILED CONTIG:P_RNA_scaffold_54 > > ERROR: Chunk failed at level:2, tier_type:0 > FAILED CONTIG:P_RNA_scaffold_54 > > I have tried a number 
of ways to modify the output of RepeatMasker to fit with Maker's expectation, and to me, the gff looks fine (see attached, which is the first 500 lines of a repeatmasker.gff I've used. Note it says "similarity" for hit type, but I've also changed the gff to only have "match_part" and had the same issue). I have had this issue across multiple clusters. > > Any suggestions? > > Thanks, > Adam > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1376 bytes Desc: not available URL: From kmkocot at ua.edu Mon Sep 20 04:55:59 2021 From: kmkocot at ua.edu (Kevin Kocot) Date: Mon, 20 Sep 2021 10:55:59 +0000 Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure Message-ID: Thanks Carson! I've uploaded that file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/round1_run_maker_try2.log ________________________________ From: Carson Holt Sent: Tuesday, September 14, 2021 9:51 PM To: Kevin Kocot Cc: maker-devel at yandell-lab.org Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure What I really need is the captured STDERR from the failed run. ?Carson On Sep 10, 2021, at 9:19 AM, Kevin Kocot > wrote: Hi Carson and all, I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I?m not using MPI). The run.log files just ends with "DIED RANK 0" and "DIED COUNT 3." 
Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted? I?ve uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/ Any guidance on what might be the issue would be greatly appreciated. Thanks! Kevin Kevin M. Kocot he/him/his Associate Professor & Curator of Invertebrates Department of Biological Sciences & Alabama Museum of Natural History The University of Alabama 307 Mary Harmon Bryant Hall Box 870344 Tuscaloosa, AL 35487 phone 205-348-4052 | fax 205-348-4039 kmkocot at ua.edu | www.kocotlab.com https://uasystem.zoom.us/j/3755490727 _______________________________________________ maker-devel mailing list maker-devel at yandell-lab.org http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Sep 20 07:06:29 2021 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 20 Sep 2021 07:06:29 -0600 Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure In-Reply-To: References: Message-ID: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com> Hi Kevin, The files are already being skipped because of previous failures. Can you increase the try count (-t on the command line) to something like 6, and send me the STDERR after it generates a new failure. ?Carson Sent from my iPhone > On Sep 20, 2021, at 4:56 AM, Kevin Kocot wrote: > > ? > Thanks Carson! I've uploaded that file here: > http://genomes.ua.edu/Kocot/2021-09-10_Maker/round1_run_maker_try2.log > > > From: Carson Holt > Sent: Tuesday, September 14, 2021 9:51 PM > To: Kevin Kocot > Cc: maker-devel at yandell-lab.org > Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure > > What I really need is the captured STDERR from the failed run. 
> > ?Carson > >> On Sep 10, 2021, at 9:19 AM, Kevin Kocot wrote: >> >> Hi Carson and all, >> >> I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I?m not using MPI). The run.log files just ends with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted? >> >> I?ve uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/ >> >> Any guidance on what might be the issue would be greatly appreciated. >> >> Thanks! >> Kevin >> >> Kevin M. Kocot >> he/him/his >> Associate Professor & Curator of Invertebrates >> Department of Biological Sciences & Alabama Museum of Natural History >> The University of Alabama >> 307 Mary Harmon Bryant Hall >> Box 870344 >> Tuscaloosa, AL 35487 >> phone 205-348-4052 | fax 205-348-4039 >> kmkocot at ua.edu | www.kocotlab.com >> https://uasystem.zoom.us/j/3755490727 >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at yandell-lab.org >> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kmkocot at ua.edu Wed Sep 22 08:06:38 2021 From: kmkocot at ua.edu (Kevin Kocot) Date: Wed, 22 Sep 2021 14:06:38 +0000 Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure In-Reply-To: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com> References: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com> Message-ID: Thanks Carson, I think I see the problem now. Here's what I'm getting: ----- STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... STATUS: Setting up database for any GFF3 input... A data structure will be created for you at: /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_datastore To access files for individual sequences use the datastore index: /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_master_datastore_index.log STATUS: Now running MAKER... examining contents of the fasta file and run log --Next Contig-- Processing run.log file... #--------------------------------------------------------------------- Now retrying the contig!! SeqID: PGA_scaffold0 Length: 141759199 Tries: 5!! #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks prepare section files Gathering GFF3 input into hits - chunk:0 ERROR: Non-unique top level ID for match.19561.56 While this is technically legal in GFF3, it usually indicates a poorly fomatted GFF3 file (perhaps you tried to merge two GFF3 files without accounting for unique IDs). MAKER will not handle these correctly. 
--> rank=NA, hostname=wirenia ERROR: Failed while prepare section files ERROR: Chunk failed at level:12, tier_type:3 FAILED CONTIG:PGA_scaffold0 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:PGA_scaffold0 examining contents of the fasta file and run log ----- It looks like maker doesn't like the format of the exonerate gff3 I am using. I tried 'fixing' it with agat_convert_sp_gxf2gxf.pl, which seemed to work on my Braker output, but that just produced an empty gff3 file for both my exonerate and PASA gff3 files. Any advice on how to prepare exonerate or PASA gff3 files for Maker? Thanks! Kevin ________________________________ From: Carson Holt Sent: Monday, September 20, 2021 8:06 AM To: Kevin Kocot Cc: maker-devel at yandell-lab.org Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure Hi Kevin, The files are already being skipped because of previous failures. Can you increase the try count (-t on the command line) to something like 6, and send me the STDERR after it generates a new failure. ?Carson Sent from my iPhone On Sep 20, 2021, at 4:56 AM, Kevin Kocot wrote: ? Thanks Carson! I've uploaded that file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/round1_run_maker_try2.log ________________________________ From: Carson Holt Sent: Tuesday, September 14, 2021 9:51 PM To: Kevin Kocot Cc: maker-devel at yandell-lab.org Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure What I really need is the captured STDERR from the failed run. ?Carson On Sep 10, 2021, at 9:19 AM, Kevin Kocot > wrote: Hi Carson and all, I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I?m not using MPI). 
The run.log files just ends with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted? I've uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/ Any guidance on what might be the issue would be greatly appreciated. Thanks! Kevin Kevin M. Kocot he/him/his Associate Professor & Curator of Invertebrates Department of Biological Sciences & Alabama Museum of Natural History The University of Alabama 307 Mary Harmon Bryant Hall Box 870344 Tuscaloosa, AL 35487 phone 205-348-4052 | fax 205-348-4039 kmkocot at ua.edu | www.kocotlab.com https://uasystem.zoom.us/j/3755490727 _______________________________________________ maker-devel mailing list maker-devel at yandell-lab.org http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
From carsonhh at gmail.com Wed Sep 22 10:21:29 2021 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 22 Sep 2021 10:21:29 -0600 Subject: [maker-devel] [EXTERNAL] Troubleshooting Maker failure In-Reply-To: References: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com> Message-ID: <92CB6C3D-688F-4EFC-B763-BD51CF455FFA@gmail.com> I'd have to see the GFF, but in general you should organize sequence alignments as match/match_part features. Here is an example from the GFF3 format specification:
ctg123 . cDNA_match 1200 9000 . . . ID=cDNA00001
ctg123 . match_part 1200 3200 2.2e-30 + . ID=match00002;Parent=cDNA00001;Target=mjm1123.5 5 506;Gap=M301 D1499 M201
ctg123 . match_part 7000 9000 7.4e-32 - . ID=match00003;Parent=cDNA00001;Target=mjm1123.3 1 502;Gap=M101 D1499 M401
Also make sure you are not inadvertently using GFF2 or GTF. They are not backwards compatible with GFF3.
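For anyone hitting the same "Non-unique top level ID" message, here is a rough Python sketch for listing the offending IDs before handing an evidence file to MAKER. It is not part of MAKER or AGAT. The function names are made up for illustration, and it assumes a tab-delimited GFF3 file in which a "top level" feature is one with an ID attribute and no Parent attribute.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attributes column into a dict of tag -> value."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attrs[tag.strip()] = value.strip()
    return attrs


def duplicate_top_level_ids(lines):
    """Return the sorted IDs used by more than one feature that has no Parent.

    Hypothetical helper: 'top level' is taken here to mean a feature line
    with an ID attribute and no Parent attribute.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; stop parsing
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attrs = parse_attributes(cols[8])
        if "ID" in attrs and "Parent" not in attrs:
            counts[attrs["ID"]] += 1
    return sorted(i for i, n in counts.items() if n > 1)
```

To use it on a real file, something like `duplicate_top_level_ids(open("exonerate.gff3"))` should work, where the file name is only a placeholder. An empty list means no top-level ID is repeated. A non-empty list usually means two GFF3 files were merged without renaming IDs, as the MAKER message suggests.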
?Carson > On Sep 22, 2021, at 8:06 AM, Kevin Kocot wrote: > > Thanks Carson, > > I think I see the problem now. Here's what I'm getting: > > ----- > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > STATUS: Setting up database for any GFF3 input... > A data structure will be created for you at: > /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_datastore > > To access files for individual sequences use the datastore index: > /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_master_datastore_index.log > > STATUS: Now running MAKER... > examining contents of the fasta file and run log > > > > --Next Contig-- > > Processing run.log file... > #--------------------------------------------------------------------- > Now retrying the contig!! > SeqID: PGA_scaffold0 > Length: 141759199 > Tries: 5!! > #--------------------------------------------------------------------- > > > setting up GFF3 output and fasta chunks > prepare section files > Gathering GFF3 input into hits - chunk:0 > ERROR: Non-unique top level ID for match.19561.56 > While this is technically legal in GFF3, it usually > indicates a poorly fomatted GFF3 file (perhaps you > tried to merge two GFF3 files without accounting for > unique IDs). MAKER will not handle these correctly. > > --> rank=NA, hostname=wirenia > ERROR: Failed while prepare section files > ERROR: Chunk failed at level:12, tier_type:3 > FAILED CONTIG:PGA_scaffold0 > > ERROR: Chunk failed at level:4, tier_type:0 > FAILED CONTIG:PGA_scaffold0 > > examining contents of the fasta file and run log > ----- > > It looks like maker doesn't like the format of the exonerate gff3 I am using. 
I tried 'fixing' it with agat_convert_sp_gxf2gxf.pl, which seemed to work on my Braker output, but that just produced an empty gff3 file for both my exonerate and PASA gff3 files. Any advice on how to prepare exonerate or PASA gff3 files for Maker? > > Thanks! > Kevin > From: Carson Holt > > Sent: Monday, September 20, 2021 8:06 AM > To: Kevin Kocot > > Cc: maker-devel at yandell-lab.org > > Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure > > Hi Kevin, > > The files are already being skipped because of previous failures. Can you increase the try count (-t on the command line) to something like 6, and send me the STDERR after it generates a new failure. > > ?Carson > > Sent from my iPhone > >> On Sep 20, 2021, at 4:56 AM, Kevin Kocot > wrote: >> >> ? >> Thanks Carson! I've uploaded that file here: >> http://genomes.ua.edu/Kocot/2021-09-10_Maker/round1_run_maker_try2.log >> >> >> From: Carson Holt >> Sent: Tuesday, September 14, 2021 9:51 PM >> To: Kevin Kocot >> Cc: maker-devel at yandell-lab.org >> Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure >> >> What I really need is the captured STDERR from the failed run. >> >> ?Carson >> >>> On Sep 10, 2021, at 9:19 AM, Kevin Kocot > wrote: >>> >>> Hi Carson and all, >>> >>> I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I?m not using MPI). The run.log files just ends with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted? 
>>> >>> I?ve uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/ >>> >>> Any guidance on what might be the issue would be greatly appreciated. >>> >>> Thanks! >>> Kevin >>> >>> Kevin M. Kocot >>> he/him/his >>> Associate Professor & Curator of Invertebrates >>> Department of Biological Sciences & Alabama Museum of Natural History >>> The University of Alabama >>> 307 Mary Harmon Bryant Hall >>> Box 870344 >>> Tuscaloosa, AL 35487 >>> phone 205-348-4052 | fax 205-348-4039 >>> kmkocot at ua.edu | www.kocotlab.com >>> https://uasystem.zoom.us/j/3755490727 >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at yandell-lab.org >>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1376 bytes Desc: not available URL: From jacques.dainat at nbis.se Wed Sep 22 13:09:52 2021 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Wed, 22 Sep 2021 21:09:52 +0200 Subject: [maker-devel] [EXTERNAL] Troubleshooting Maker failure In-Reply-To: <92CB6C3D-688F-4EFC-B763-BD51CF455FFA@gmail.com> References: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com> <92CB6C3D-688F-4EFC-B763-BD51CF455FFA@gmail.com> Message-ID: <9FBD8DD4-660C-4A6D-90A2-9DF1B948F236@nbis.se> Hi Kevin, About using AGAT (agat_convert_sp_gxf2gxf.pl), two reasons to get empty output files. I) The feature types (3rd column) are not yet handled by AGAT. You can inform AGAT how to deal with it See https://agat.readthedocs.io/en/latest/troubleshooting.html#agat-throws-features-out-because-the-feature-type-is-not-yet-taken-into-account II) The features are thrown by AGAT because child feature are missing (e.g. 
a gene feature expects at least one transcript linked to it). See https://agat.readthedocs.io/en/latest/troubleshooting.html#agat-throws-features-out-because-child-features-are-not-provided I invite you to open an issue in the AGAT GitHub repository. Once the file is parsed correctly, you can use the script agat_sp_alignment_output_style.pl to turn level1 feature types (e.g. gene) and level2 feature types (e.g. mRNA) into match and match_part features respectively, which MAKER may prefer. Best regards, Jacques Dainat, Ph.D. > On 22 Sep 2021, at 18:21, Carson Holt wrote: > > I'd have to see the GFF, but in general you should organize sequence alignments as match/match_part features. > > Here is an example from the GFF3 format specification: > > ctg123 . cDNA_match 1200 9000 . . . ID=cDNA00001 > ctg123 . match_part 1200 3200 2.2e-30 + . ID=match00002;Parent=cDNA00001;Target=mjm1123.5 5 506;Gap=M301 D1499 M201 > ctg123 . match_part 7000 9000 7.4e-32 - . ID=match00003;Parent=cDNA00001;Target=mjm1123.3 1 502;Gap=M101 D1499 M401 > > Also make sure you are not inadvertently using GFF2 or GTF. They are not backwards compatible with GFF3. > > --Carson > > >> On Sep 22, 2021, at 8:06 AM, Kevin Kocot > wrote: >> >> Thanks Carson, >> >> I think I see the problem now. Here's what I'm getting: >> >> ----- >> STATUS: Parsing control files... >> STATUS: Processing and indexing input FASTA files... >> STATUS: Setting up database for any GFF3 input... >> A data structure will be created for you at: >> /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_datastore >> >> To access files for individual sequences use the datastore index: >> /home/wirenia/Desktop/2021-08-11_MAKER_Dreissena_rostriformis/PGA_assembly_shortened_headers.fasta.maker.output/PGA_assembly_shortened_headers.fasta_master_datastore_index.log >> >> STATUS: Now running MAKER...
>> examining contents of the fasta file and run log >> >> >> >> --Next Contig-- >> >> Processing run.log file... >> #--------------------------------------------------------------------- >> Now retrying the contig!! >> SeqID: PGA_scaffold0 >> Length: 141759199 >> Tries: 5!! >> #--------------------------------------------------------------------- >> >> >> setting up GFF3 output and fasta chunks >> prepare section files >> Gathering GFF3 input into hits - chunk:0 >> ERROR: Non-unique top level ID for match.19561.56 >> While this is technically legal in GFF3, it usually >> indicates a poorly fomatted GFF3 file (perhaps you >> tried to merge two GFF3 files without accounting for >> unique IDs). MAKER will not handle these correctly. >> >> --> rank=NA, hostname=wirenia >> ERROR: Failed while prepare section files >> ERROR: Chunk failed at level:12, tier_type:3 >> FAILED CONTIG:PGA_scaffold0 >> >> ERROR: Chunk failed at level:4, tier_type:0 >> FAILED CONTIG:PGA_scaffold0 >> >> examining contents of the fasta file and run log >> ----- >> >> It looks like maker doesn't like the format of the exonerate gff3 I am using. I tried 'fixing' it with agat_convert_sp_gxf2gxf.pl, which seemed to work on my Braker output, but that just produced an empty gff3 file for both my exonerate and PASA gff3 files. Any advice on how to prepare exonerate or PASA gff3 files for Maker? >> >> Thanks! >> Kevin >> From: Carson Holt > >> Sent: Monday, September 20, 2021 8:06 AM >> To: Kevin Kocot > >> Cc: maker-devel at yandell-lab.org > >> Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure >> >> Hi Kevin, >> >> The files are already being skipped because of previous failures. Can you increase the try count (-t on the command line) to something like 6, and send me the STDERR after it generates a new failure. >> >> ?Carson >> >> Sent from my iPhone >> >>> On Sep 20, 2021, at 4:56 AM, Kevin Kocot > wrote: >>> >>> ? >>> Thanks Carson! 
>>> I've uploaded that file here:
>>> http://genomes.ua.edu/Kocot/2021-09-10_Maker/round1_run_maker_try2.log
>>>
>>>
>>> From: Carson Holt
>>> Sent: Tuesday, September 14, 2021 9:51 PM
>>> To: Kevin Kocot
>>> Cc: maker-devel at yandell-lab.org
>>> Subject: [EXTERNAL] Re: [maker-devel] Troubleshooting Maker failure
>>>
>>> What I really need is the captured STDERR from the failed run.
>>>
>>> --Carson
>>>
>>>> On Sep 10, 2021, at 9:19 AM, Kevin Kocot wrote:
>>>>
>>>> Hi Carson and all,
>>>>
>>>> I ran Maker 3.01.03 on a chromosome-level mollusc genome assembly using some evidence I previously generated with Funannotate and BRAKER2, but Maker is not completing successfully. Every scaffold in the datastore_index.log file has both STARTED and FAILED statuses. I can't figure out where the problem lies, though. Running ./Build status indicates all the dependencies are there (I'm not using MPI). The run.log files just end with "DIED RANK 0" and "DIED COUNT 3." Is there a way to tell which (if any) of the dependencies is misbehaving here (they all seem to run fine independently) or if my evidence .gff3 files are not correctly formatted?
>>>>
>>>> I've uploaded a zipped sample output folder as well as my config files and the datastore.log file here: http://genomes.ua.edu/Kocot/2021-09-10_Maker/
>>>>
>>>> Any guidance on what might be the issue would be greatly appreciated.
>>>>
>>>> Thanks!
>>>> Kevin
>>>>
>>>> Kevin M. Kocot
>>>> he/him/his
>>>> Associate Professor & Curator of Invertebrates
>>>> Department of Biological Sciences & Alabama Museum of Natural History
>>>> The University of Alabama
>>>> 307 Mary Harmon Bryant Hall
>>>> Box 870344
>>>> Tuscaloosa, AL 35487
>>>> phone 205-348-4052 | fax 205-348-4039
>>>> kmkocot at ua.edu | www.kocotlab.com
>>>> https://uasystem.zoom.us/j/3755490727

From kmkocot at ua.edu Thu Sep 23 10:42:12 2021
From: kmkocot at ua.edu (Kevin Kocot)
Date: Thu, 23 Sep 2021 16:42:12 +0000
Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure
In-Reply-To: <9FBD8DD4-660C-4A6D-90A2-9DF1B948F236@nbis.se>
References: <3F3F5F45-E733-4AC5-892F-39E4964C23D0@gmail.com> <92CB6C3D-688F-4EFC-B763-BD51CF455FFA@gmail.com> <9FBD8DD4-660C-4A6D-90A2-9DF1B948F236@nbis.se>
Message-ID:

Hi Carson and Jacques,

Thank you both very much for the help. It looks like my PASA and exonerate (via Funannotate) gff3 files are not correctly formatted for maker. I've uploaded them here just in case it might be helpful for you to see them, but I will try to figure out how to reformat them correctly following Jacques's advice.

http://genomes.ua.edu/Kocot/2021-09-10_Maker/pasa_predictions.gff3
http://genomes.ua.edu/Kocot/2021-09-10_Maker/protein_alignments.gff3

Is there a standalone tool or Maker feature I'm not seeing that can assess whether a gff3 file is correctly formatted for Maker?

Thank you again!
Kevin

From jacques.dainat at gmail.com Fri Sep 24 02:41:49 2021
From: jacques.dainat at gmail.com (Jacques Dainat)
Date: Fri, 24 Sep 2021 10:41:49 +0200
Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure
In-Reply-To: <3161576B-6A96-4876-B348-5CFF993188F4@nbis.se>
References: <3161576B-6A96-4876-B348-5CFF993188F4@nbis.se>
Message-ID:

Hi Kevin,

I have checked the two files.

About pasa_predictions.gff3:
It is parsed without any problem by agat_convert_sp_gxf2gxf.pl in about 10 minutes. MAKER (>= v3) should be able to use it as is, but you might prefer to convert it into match/match_part style using agat_sp_alignment_output_style.pl. The match/match_part style works in all MAKER versions.

About protein_alignments.gff3:
This file is more problematic. It contains only one feature type, which is level1 in AGAT (i.e. like a gene), so in its current state AGAT expects sub-features below it. On top of that, the ID attribute is confusing because it is supposed to be unique, and it is not.
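The duplicated IDs can be confirmed before repairing the file with a short script along these lines. This is only a sketch and not part of AGAT or MAKER: it assumes a plain tab-separated, nine-column GFF3, and the function name find_duplicate_ids is made up for illustration.

```python
from collections import Counter


def find_duplicate_ids(gff3_lines):
    """Return {ID: count} for every ID attribute found on more than one feature line.

    Hypothetical helper for illustration; assumes tab-separated 9-column GFF3.
    """
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and directives such as ##gff-version 3
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a feature line (e.g. an embedded FASTA section)
        # Column 9 holds semicolon-separated key=value attributes
        for attribute in columns[8].split(";"):
            key, _, value = attribute.strip().partition("=")
            if key == "ID" and value:
                counts[value] += 1
    return {feature_id: n for feature_id, n in counts.items() if n > 1}
```

Passing it the open file handle of protein_alignments.gff3 would return the repeated IDs with their counts. Keep in mind that GFF3 itself allows a discontinuous feature to repeat its ID across several lines, so a hit is something to inspect rather than proof of a broken file.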
So here are the commands you should run to get a proper file for MAKER.

First:
```
sed 's/nucleotide_to_protein_match/match_part/' protein_alignments.gff3 | sed 's/ID=/Parent=/' > protein_alignments_repared.gff3
```
Then:
```
agat_convert_sp_gxf2gxf.pl --gff protein_alignments_repared.gff3 -o protein_alignments_clean.gff3
```
Then you should be good with a proper match/match_part file.

Best,
/Jacques

--
Jacques Dainat

From kchilds at msu.edu Sun Sep 26 13:54:06 2021
From: kchilds at msu.edu (Childs, Kevin)
Date: Sun, 26 Sep 2021 19:54:06 +0000
Subject: [maker-devel] mising UTRs even with transcript evidence
Message-ID:

Carson,

I am helping a student to annotate a genome. Despite having provided transcript evidence, there are many cases where the MAKER gene models are missing obvious UTRs that are found in the transcripts. The link below is for a screenshot with some examples.

https://figshare.com/s/0b29e0b0ac10f868dbe9

In the figure, there are genes at ~148 kbp, ~152 kbp, ~160 kbp, and ~170 kbp that are missing UTRs that should have been predicted from the transcripts in the second track. The annotation is pretty much riddled with this type of behavior.

One mea culpa: the transcript data was provided as a gff file, and that file had all transcripts defined with a source of "bed2gff" instead of "est2genome". I feel that I had found that to be important during some past annotation work, but it doesn't explain why many but not all genes have UTRs when evidence is present.

Thanks.
Kevin

---
Kevin Childs, PhD

Assistant Professor - Fixed Term
Director MSU Genomics Core Facility
Plant Biology Department
Michigan State University

kchilds at msu.edu
517-775-2844 (m)
517-884-6926 (o)

http://childslab.plantbiology.msu.edu
https://rtsf.natsci.msu.edu/genomics/

From carsonhh at gmail.com Mon Sep 27 09:01:22 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 27 Sep 2021 09:01:22 -0600
Subject: [maker-devel] mising UTRs even with transcript evidence
In-Reply-To:
References:
Message-ID: <9DBC6B87-E086-42CB-9B4D-80385652CA9A@gmail.com>

MAKER needs a little more info about an alignment, and that info may not exist in the GFF3. MAKER2 will reject GFF3 input for UTR generation, but MAKER3 will use it as long as it can generate some of the missing info internally using certain assumptions about the alignment. Also, alignments that have non-canonical splicing will be rejected for UTR generation.

--Carson

From kmkocot at ua.edu Mon Sep 27 11:36:18 2021
From: kmkocot at ua.edu (Kevin Kocot)
Date: Mon, 27 Sep 2021 17:36:18 +0000
Subject: [maker-devel] [EXTERNAL] Re: Troubleshooting Maker failure
In-Reply-To:
References: <3161576B-6A96-4876-B348-5CFF993188F4@nbis.se>
Message-ID:

Hi all,

Thank you again for the help! This solved my issue with the file formatting and I was able to successfully run Maker on a test scaffold. I have the full run going now.

Thanks again!
Kevin
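For anyone landing on this thread with a similar file, the match/match_part layout discussed above can be sanity-checked before handing evidence to MAKER. The sketch below only verifies one GFF3 invariant, namely that every Parent value refers to an ID present in the same file. It is an illustration under assumptions (tab-separated nine-column GFF3; the function name find_orphan_parents is hypothetical) and not a MAKER or AGAT feature, and it does not replace a real validator.

```python
def find_orphan_parents(gff3_lines):
    """Return a sorted list of Parent values that never occur as an ID.

    Hypothetical helper for illustration; assumes tab-separated 9-column GFF3.
    """
    seen_ids = set()
    referenced_parents = set()
    for line in gff3_lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and directives
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a feature line
        for attribute in columns[8].split(";"):
            key, _, value = attribute.strip().partition("=")
            if key == "ID" and value:
                seen_ids.add(value)
            elif key == "Parent" and value:
                # A feature may list several parents separated by commas
                referenced_parents.update(value.split(","))
    return sorted(referenced_parents - seen_ids)
```

An empty list means every match_part (or other child feature) points at a parent that exists in the file; any names returned are parents worth tracking down before the run.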