From carsonhh at gmail.com Fri Dec 11 15:52:57 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 11 Dec 2020 15:52:57 -0700
Subject: [maker-devel] Maker yields sequences without start and stop codon
In-Reply-To:
References:
Message-ID: <3C8780FE-E998-4BB4-BF49-C1CC31F1E4BB@gmail.com>

Maybe. I've never tried it. MAKER just wanders upstream and downstream, then gives up if it doesn't find anything to add. Perhaps this script does something similar. If it takes GFF3, may be worth a shot.

--Carson

> On Oct 27, 2020, at 9:01 PM, Emmanuel Nnadi wrote:
>
> Hi Carson
> I always set always_complete=1 yet I still get sequences without start and stop codon.
>
> Can https://github.com/Gaius-Augustus/Augustus/blob/master/scripts/fix_in_frame_stop_codon_genes.py be used on the final sequence to fix start and stop codon problem?
>
>
> Nnaemeka Emmanuel Nnadi,Ph.D
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> +2348068124819
> Publications:
> https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From charlesgumbi at gmail.com Tue Dec 8 12:24:28 2020
From: charlesgumbi at gmail.com (Charles Gumbi)
Date: Tue, 8 Dec 2020 14:24:28 -0500
Subject: [maker-devel] Possible precedence issue with control flow operator at /apps/maker/3.01.03/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805
Message-ID:

Dear maker support team

I need your help to troubleshoot this error. I don't know what I'm doing wrong, this is my first time annotating a genome. I have gone through almost three maker tutorials online but it's like the annotation doesn't generate the datastore folder and I don't know why because I have provided all the input files.
I am running this analysis on the cluster platform. Below I have pasted the slurm script and the error message. Any suggestions and help would be highly appreciated.

Slurm script

#!/bin/bash
#SBATCH --account=austin
#SBATCH --job-name=maker
#SBATCH --mail-type=ALL
#SBATCH --mail-user=charlesgumbi at ufl.edu
#SBATCH --mem=30gb
#SBATCH --ntasks=1
#SBATCH --cpus-per-task2
#SBATCH --time=48:00:00
#SBATCH --output=maker%j.out
#SBATCH --error=maker%j.err
date;hostname;pwd

#loading modules
module purge
module load maker/3.01.03

#runing maker
maker -base natalensis -fix_nucleotides -dsindex maker_bopts.ctl maker_exe.ctl maker_opts.ctl

#making gff3 files
cd natalensis.maker.output
gff3_merge -d natalensis.maker.output/natalensis_master_datastore_index.log
fasta_merge -d natalensis.maker.output/natalensis_master_datastore_index.log

Error file

Possible precedence issue with control flow operator at /apps/maker/3.01.03/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/blue/austin/bonginkosi.gumbi/mastomys/genome/wtdg/annotation/maker/natalensis.maker.output/natalensis_datastore
To access files for individual sequences use the datastore index:
/blue/austin/bonginkosi.gumbi/mastomys/genome/wtdg/annotation/maker/natalensis.maker.output/natalensis_master_datastore_index.log
ERROR: The file 'natalensis.maker.output/natalensis_master_datastore_index.log' does not exist
ERROR: The file 'natalensis.maker.output/natalensis_master_datastore_index.log' does not exist
maker61784163.err (END)

Humble regards
charles

From carsonhh at gmail.com Fri Dec 11 15:49:55 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 11 Dec 2020 15:49:55 -0700
Subject: [maker-devel] Why my gene is not annotated ?
In-Reply-To:
References:
Message-ID:

Are you running a trained gene predictor? Perhaps the predictor does not like something about the area, so even with hints, it cannot make a sensible model. Non-canonical splice sites, runs of NNN, or internal stop codons (which indicate an assembly issue in the region) can be the cause.

--Carson

> On Nov 17, 2020, at 9:22 AM, Patrick Tran Van wrote:
>
> Dear Maker developer,
>
> I use a transcript and a protein file with MAKER; here is the intermediate GFF after the blast:
>
> ###
> ctg889_racon_pilon3 tblastx translated_nucleotide_match 153760 157392 137 - . ID=ctg889_racon_pilon3:hit:0:3.6.0.0;Name=Tsi_rna
> ctg889_racon_pilon3 tblastx match_part 157186 157392 263 - . ID=ctg889_racon_pilon3:hsp:0:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 259 465 -;Gap=M69
> ctg889_racon_pilon3 tblastx match_part 157218 157391 284 - . ID=ctg889_racon_pilon3:hsp:1:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 291 464 -;Gap=M58
> ctg889_racon_pilon3 tblastx match_part 157193 157390 241 - . ID=ctg889_racon_pilon3:hsp:2:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 266 463 -;Gap=M66
> ctg889_racon_pilon3 tblastx match_part 156985 157056 87 - . ID=ctg889_racon_pilon3:hsp:3:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 223 294 -;Gap=M24
> ctg889_racon_pilon3 tblastx match_part 156990 157055 96 - . ID=ctg889_racon_pilon3:hsp:4:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 228 293 -;Gap=M22
> ctg889_racon_pilon3 tblastx match_part 156880 156993 166 - . ID=ctg889_racon_pilon3:hsp:5:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 117 230 -;Gap=M38
> ctg889_racon_pilon3 tblastx match_part 156879 156986 154 - . ID=ctg889_racon_pilon3:hsp:6:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 116 223 -;Gap=M36
> ctg889_racon_pilon3 tblastx match_part 156881 156985 141 - . ID=ctg889_racon_pilon3:hsp:7:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 118 222 -;Gap=M35
> ctg889_racon_pilon3 tblastx match_part 153762 153845 128 - . ID=ctg889_racon_pilon3:hsp:8:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 3 86 -;Gap=M28
> ctg889_racon_pilon3 tblastx match_part 153761 153844 134 - . ID=ctg889_racon_pilon3:hsp:9:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 2 85 -;Gap=M28
> ctg889_racon_pilon3 tblastx match_part 153760 153843 137 - . ID=ctg889_racon_pilon3:hsp:10:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 1 84 -;Gap=M28
> ctg889_racon_pilon3 cdna2genome expressed_sequence_match 153760 157392 2022 + . ID=ctg889_racon_pilon3:hit:1:3.6.0.0;Name=Tsi_rna
> ctg889_racon_pilon3 cdna2genome match_part 153760 153845 2022 + . ID=ctg889_racon_pilon3:hsp:11:3.6.0.0;Parent=ctg889_racon_pilon3:hit:1:3.6.0.0;Target=Tsi_rna 1 86 +;Gap=M86
> ctg889_racon_pilon3 cdna2genome match_part 155744 155774 2022 + . ID=ctg889_racon_pilon3:hsp:12:3.6.0.0;Parent=ctg889_racon_pilon3:hit:1:3.6.0.0;Target=Tsi_rna 87 117 +;Gap=M31
> ctg889_racon_pilon3 cdna2genome match_part 156881 157055 2022 + . ID=ctg889_racon_pilon3:hsp:13:3.6.0.0;Parent=ctg889_racon_pilon3:hit:1:3.6.0.0;Target=Tsi_rna 118 293 +;Gap=M105 I1 M70
> ctg889_racon_pilon3 cdna2genome match_part 157221 157392 2022 + . ID=ctg889_racon_pilon3:hsp:14:3.6.0.0;Parent=ctg889_racon_pilon3:hit:1:3.6.0.0;Target=Tsi_rna 294 465 +;Gap=M172
> ctg889_racon_pilon3 blastx protein_match 153760 157389 133 + . ID=ctg889_racon_pilon3:hit:2:3.10.0.0;Name=TSI_CENPA
> ctg889_racon_pilon3 blastx match_part 153760 153846 133 + . ID=ctg889_racon_pilon3:hsp:15:3.10.0.0;Parent=ctg889_racon_pilon3:hit:2:3.10.0.0;Target=TSI_CENPA 1 29;Gap=M29
> ctg889_racon_pilon3 blastx match_part 156881 156994 191 + . ID=ctg889_racon_pilon3:hsp:16:3.10.0.0;Parent=ctg889_racon_pilon3:hit:2:3.10.0.0;Target=TSI_CENPA 40 77;Gap=M38
> ctg889_racon_pilon3 blastx match_part 156985 157389 314 + . ID=ctg889_racon_pilon3:hsp:17:3.10.0.0;Parent=ctg889_racon_pilon3:hit:2:3.10.0.0;Target=TSI_CENPA 75 154;Gap=M19 D35 M4 D20 M57
> ctg889_racon_pilon3 protein2genome protein_match 153760 157389 668 + . ID=ctg889_racon_pilon3:hit:3:3.10.0.0;Name=TSI_CENPA
> ctg889_racon_pilon3 protein2genome match_part 153760 153845 668 + . ID=ctg889_racon_pilon3:hsp:18:3.10.0.0;Parent=ctg889_racon_pilon3:hit:3:3.10.0.0;Target=TSI_CENPA 1 28;Gap=M28 F2
> ctg889_racon_pilon3 protein2genome match_part 155744 155774 668 + . ID=ctg889_racon_pilon3:hsp:19:3.10.0.0;Parent=ctg889_racon_pilon3:hit:3:3.10.0.0;Target=TSI_CENPA 29 39;Gap=R2 M11
> ctg889_racon_pilon3 protein2genome match_part 156881 157055 668 + . ID=ctg889_racon_pilon3:hsp:20:3.10.0.0;Parent=ctg889_racon_pilon3:hit:3:3.10.0.0;Target=TSI_CENPA 40 97;Gap=M35 I1 M1 F2 M20 F1 F1
> ctg889_racon_pilon3 protein2genome match_part 157221 157389 668 + . ID=ctg889_racon_pilon3:hsp:21:3.10.0.0;Parent=ctg889_racon_pilon3:hit:3:3.10.0.0;Target=TSI_CENPA 98 154;Gap=R2 M57
>
>
> There is obviously a gene here, but in the end I don't have anything annotated by MAKER. Why?
>
> Thanks for your help.
>
> Patrick Tran Van
>
> Bioinformatician: Lab Chapuisat & Schwander
> Department of Ecology and Evolution
> University of Lausanne
> Lausanne - Switzerland
> Office 3206
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
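
Note on the reply above: two of the causes Carson lists (non-canonical splice sites and runs of N) can be checked by hand from the match_part coordinates. The following is only a minimal Python sketch under stated assumptions, not part of MAKER. The function names are hypothetical, it handles plus-strand features only, and the toy sequence is made up; the contig sequence itself would have to be loaded from the assembly FASTA.

```python
def introns_between(exons):
    """Return 1-based inclusive intron intervals lying between sorted exons."""
    exons = sorted(exons)
    return [(left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(exons, exons[1:])]


def splice_sites(seq, intron):
    """Return (donor, acceptor) dinucleotides of a plus-strand intron."""
    start, end = intron
    intron_seq = seq[start - 1:end].upper()
    return intron_seq[:2], intron_seq[-2:]


def has_n_run(seq, start, end, min_len=3):
    """True if the 1-based inclusive region holds a run of at least min_len Ns."""
    return "N" * min_len in seq[start - 1:end].upper()


# Exon coordinates copied from the cdna2genome match_part rows quoted above.
tsi_exons = [(153760, 153845), (155744, 155774), (156881, 157055), (157221, 157392)]
tsi_introns = introns_between(tsi_exons)

# Toy sequence (made up for illustration): exon, GT...AG intron, exon.
toy_seq = "ATGAAA" + "GTAAGTTTCAG" + "CCCTGA"
toy_introns = introns_between([(1, 6), (18, 23)])
toy_sites = [splice_sites(toy_seq, intron) for intron in toy_introns]
```

Any intron whose pair is not ("GT", "AG"), or any region where has_n_run is true, would be a candidate explanation of the kind described in the reply; internal stop codons would need a separate translation check.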

From carsonhh at gmail.com Fri Dec 11 16:02:14 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 11 Dec 2020 16:02:14 -0700
Subject: [maker-devel] Format error detected with maker2eval_gtf
In-Reply-To:
References:
Message-ID: <800FEDCE-F74F-44EF-8AB8-09A9DAC7A03C@gmail.com>

Truncated line in the GFF3. Network mounted file systems can sometimes return true on write operations that actually failed (should have killed the program, but NFS lies, and the program keeps running). Since MAKER does such high IO, this comes up from time to time. If you can identify the contig involved, you can either rerun it or delete just the gff3 files in the folder and let MAKER rewrite them.

--Carson

> On Nov 16, 2020, at 8:29 PM, Zoe Clarke wrote:
>
> Hello!
>
> I am currently working with an annotation from Maker that has just finished. Unfortunately I think there are some errors in my annotation, as a lot of my reads from a test RNA-seq alignment got discarded from my analysis. The analysis tool suggested my annotation might have overlapping reads.
>
> I am trying to figure out what went wrong with my annotation, and the closest hint I have are error messages that I receive from running maker2eval_gtf on my output merged gff file. It appears every line of my gff is producing the following errors:
>
> Use of uninitialized value $att in pattern match (m//) at maker2eval_gtf line 244, line 200199481.
> Use of uninitialized value $att in pattern match (m//) at maker2eval_gtf line 245, line 200199481.
> Use of uninitialized value $att in pattern match (m//) at maker2eval_gtf line 246, line 200199481.
> Use of uninitialized value in string eq at maker2eval_gtf line 35, line 200199481.
> Use of uninitialized value in string eq at maker2eval_gtf line 37, line 200199481.
> Use of uninitialized value in string eq at maker2eval_gtf line 38, line 200199481.
> Use of uninitialized value in string eq at maker2eval_gtf line 39, line 200199481.
> Use of uninitialized value in string eq at maker2eval_gtf line 40, line 200199481.
> Use of uninitialized value in string eq at maker2eval_gtf line 41, line 200199481.
>
> Do you know what issue this might indicate for my gff? At this point, I have done no filtering, just merged the files with gff3_merge.
>
> Thank you so much, and please let me know if there is anything else I can provide to inform you more about my issue!
>
> Zoe
> ______________________________________
> Zoe Clarke
> PhD candidate in Computational Biology at U of T
> Lab profile: http://baderlab.org/Zoe%20Clarke
> Personal website: https://zoe-clarke.weebly.com/
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

From revale at well.ox.ac.uk Thu Dec 24 09:29:23 2020
From: revale at well.ox.ac.uk (Santiago Revale)
Date: Thu, 24 Dec 2020 16:29:23 +0000
Subject: [maker-devel] processing of simple and complex repeats
Message-ID:

Dear Maker developers,

Can Maker distinguish between simple and complex repeats from a gff3 file of pre-aligned repeats?

I'm trying to annotate a genome of a non-model Drosophila species and I've already generated a gff3 file with both simple and complex repeats for this species. I would like to use this gff3 file as input for Repeat Masking so Maker won't have to align repeats from any library.
My maker_opts.ctl file looks like this:

#-----Repeat Masking
model_org=
rmlib=
repeat_/path/to/te_proteins.fasta
rm_gff=/path/to/Dato_genome.Dato-first.full_mask.out.reformat.gff3
prok_rm=0
softmask=1

By using softmask=1 I understand that Maker will softmask only low complexity repeats (while complex ones will be hardmasked). My question is whether Maker can distinguish between simple and complex repeats from the gff3 file in order to softmask only simple repeats.

Also, do you think it would be better to only include complex repeats in the gff3 file and let Maker find simple repeats on its own by using model_org=simple?

Thank you very much in advance.

Best regards,
Santiago

From nanshangogo at gmail.com Wed Dec 23 00:03:35 2020
From: nanshangogo at gmail.com (nanshan yang)
Date: Wed, 23 Dec 2020 15:03:35 +0800
Subject: [maker-devel] Qusetion about the ab initio gene predictors result
Message-ID:

Hi MAKER community:

I have questions about the MAKER output files. I got results from the *ab initio* gene predictors (SNAP and Augustus, run through MAKER), and after the fasta_merge step there are these fasta files:

result.maker.augustus_masked.proteins.fasta
result.maker.augustus_masked.transcripts.fasta
result.maker.augustus.proteins.fasta
result.maker.augustus.transcripts.fasta
result.maker.non_overlapping_ab_initio.proteins.fasta
result.maker.non_overlapping_ab_initio.transcripts.fasta
result.maker.snap.proteins.fasta
result.maker.snap_masked..transcripts.fasta
result.maker.snap_masked..proteins.fasta
result.maker.snap.transcripts.fasta
result.maker.proteins.fasta
result.maker.transcripts.fasta

If I continue to analyse the fasta files, which fasta should I choose?
Because I chose *ab initio* gene predictors, can the result.maker.non_overlapping_ab_initio*fasta files be used in the downstream analysis, or should I use result.maker.proteins.fasta?

Thanks very much for any help or insights

From revale at well.ox.ac.uk Tue Dec 29 12:19:11 2020
From: revale at well.ox.ac.uk (Santiago Revale)
Date: Tue, 29 Dec 2020 19:19:11 +0000
Subject: [maker-devel] dispersed_repeat features on rm_gff GFF3 file
Message-ID:

Dear Maker developers,

Can Maker deal with a "dispersed_repeat" one-level features GFF3 file provided to the rm_gff option in the control file?

We're trying to annotate a genome of a non-model Drosophila species using a previously generated GFF3 file with repeats for this species. Some months ago, Carson Holt replied to a post (https://groups.google.com/g/maker-devel/c/BhBMTF8dze8/m/y-QwJYRFAQAJ) saying that "They (repeat models) must be match/match_part two level feature for rm_gff". However, all features in our GFF3 file are 'dispersed_repeat' one-level features.

Here is what our file looks like (an extract of the file for one scaffold is attached as well, repeats.gff3):

##sequence-region scaffold1062|size32755 1 32755
scaffold1062|size32755 RepeatMasker dispersed_repeat 1 47 271 - . Target=DNAREP1_DM 26 73;ID=2740
scaffold1062|size32755 RepeatMasker dispersed_repeat 48 83 263 + . Target=Dbuz_kz3_5_36225 704 739;ID=2741
scaffold1062|size32755 RepeatMasker dispersed_repeat 340 386 299 - . Target=rnd-1_family-30 19 65;ID=2742
scaffold1062|size32755 RepeatMasker dispersed_repeat 349 387 247 + . Target=rnd-1_family-29 1 71;ID=2743
scaffold1062|size32755 RepeatMasker dispersed_repeat 388 430 240 + . Target=rnd-5_family-3333_TIR_P 3669 3713;ID=2744
scaffold1062|size32755 RepeatMasker dispersed_repeat 446 555 408 - . Target=rnd-1_family-15 16 138;ID=2745

I'm attaching both the log file and the CTL file for what it's worth (Dtest.log / maker_opts.ctl).
I wonder whether Maker handled right our rm_gff file, whether the generated output is reliable. If not, do you think modifying the GFF3 file manually duplicating each row and changing the third column to 'match/match_part' could do the trick? Thank you very much in advance. Best regards, Santiago -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Dtest.log Type: application/octet-stream Size: 40356 bytes Desc: Dtest.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4995 bytes Desc: maker_opts.ctl URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: repeats.gff3 Type: application/octet-stream Size: 1355 bytes Desc: repeats.gff3 URL: From carsonhh at gmail.com Fri Dec 11 15:52:57 2020 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 11 Dec 2020 15:52:57 -0700 Subject: [maker-devel] Maker yields sequences without start and stop codon In-Reply-To: References: Message-ID: <3C8780FE-E998-4BB4-BF49-C1CC31F1E4BB@gmail.com> Maybe. I?ve never tried it. MAKER just wanders upstream and downstream, then gives up if it doesn?t find anything to add. Perhaps this script does something similar. If it takes GFF3, may be worth a shot. ?Carson > On Oct 27, 2020, at 9:01 PM, Emmanuel Nnadi wrote: > > Hi Carson > I always set always_complete=1 yet I still get sequences without start and stop codon. > > Can https://github.com/Gaius-Augustus/Augustus/blob/master/scripts/fix_in_frame_stop_codon_genes.py be used on the final sequence to fix start and stop codon problem? > > > Nnaemeka Emmanuel Nnadi,Ph.D > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. 
> +2348068124819 > Publications: > https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1376 bytes Desc: not available URL: From charlesgumbi at gmail.com Tue Dec 8 12:24:28 2020 From: charlesgumbi at gmail.com (Charles Gumbi) Date: Tue, 8 Dec 2020 14:24:28 -0500 Subject: [maker-devel] Possible precedence issue with control flow operator at /apps/maker/3.01.03/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805 Message-ID: Dear maker support team I need your help to troubleshoot this error. I don't know what I'm doing wrong, this is my first time annotating a genome. I have gone through almost three maker tutorials online but it's like the annotation doesn't generate the datastore folder and I don't know why because I have provided all the input files. I am running this analysis on the cluster platform. Below I have pasted the slurm script and the error message. Any suggestions and help would be highly appropriated. Slurm script #!/bin/bash #SBATCH --account=austin #SBATCH --job-name=maker #SBATCH --mail-type=ALL #SBATCH --mail-user=charlesgumbi at ufl.edu #SBATCH --mem=30gb #SBATCH --ntasks=1 #SBATCH --cpus-per-task2 #SBATCH --time=48:00:00 #SBATCH --output=maker%j.out #SBATCH --error=maker%j.err date;hostname;pwd #loading modules module purge module load maker/3.01.03 #runing maker maker -base natalensis -fix_nucleotides -dsindex maker_bopts.ctl maker_exe.ctl maker_opts.ctl #making gff3 files cd natalensis.maker.output gff3_merge -d natalensis.maker.output/natalensis_master_datastore_index.log fasta_merge -d natalensis.maker.output/natalensis_master_datastore_index.log. Error file Possible precedence issue with control flow operator at /apps/maker/3.01.03/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805. STATUS: Parsing control files... 
STATUS: Processing and indexing input FASTA files... STATUS: Setting up database for any GFF3 input... A data structure will be created for you at: /blue/austin/bonginkosi.gumbi/mastomys/genome/wtdg/annotation/maker/natalensis.maker.output/natalensis_datastore To access files for individual sequences use the datastore index: /blue/austin/bonginkosi.gumbi/mastomys/genome/wtdg/annotation/maker/natalensis.maker.output/natalensis_master_datastore_index.log ERROR: The file 'natalensis.maker.output/natalensis_master_datastore_index.log' does not exist ERROR: The file 'natalensis.maker.output/natalensis_master_datastore_index.log' does not exist maker61784163.err (END) Humble regards charles -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Dec 11 15:49:55 2020 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 11 Dec 2020 15:49:55 -0700 Subject: [maker-devel] Why my gene is not annotated ? In-Reply-To: References: Message-ID: Are you running a trained gene predictor? Perhaps the predictor does not like something about the area, so even with hints, it cannot make a sensible model. Non-canonical splice sites, runs of NNN, internal stop codons (indicates assembly issue in the region) can be the cause. ?Carson > On Nov 17, 2020, at 9:22 AM, Patrick Tran Van wrote: > > Dear Maker developper, > > I use a transcipt and a protein file with MAKER, here is the intermediate GFF after the blast : > > ### > ctg889_racon_pilon3 tblastx translated_nucleotide_match 153760 157392 137 - . ID=ctg889_racon_pilon3:hit:0:3.6.0.0;Name=Tsi_rna > ctg889_racon_pilon3 tblastx match_part 157186 157392 263 - . ID=ctg889_racon_pilon3:hsp:0:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 259 465 -;Gap=M69 > ctg889_racon_pilon3 tblastx match_part 157218 157391 284 - . 
ID=ctg889_racon_pilon3:hsp:1:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 291 464 -;Gap=M58 > ctg889_racon_pilon3 tblastx match_part 157193 157390 241 - . ID=ctg889_racon_pilon3:hsp:2:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 266 463 -;Gap=M66 > ctg889_racon_pilon3 tblastx match_part 156985 157056 87 - . ID=ctg889_racon_pilon3:hsp:3:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 223 294 -;Gap=M24 > ctg889_racon_pilon3 tblastx match_part 156990 157055 96 - . ID=ctg889_racon_pilon3:hsp:4:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 228 293 -;Gap=M22 > ctg889_racon_pilon3 tblastx match_part 156880 156993 166 - . ID=ctg889_racon_pilon3:hsp:5:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 117 230 -;Gap=M38 > ctg889_racon_pilon3 tblastx match_part 156879 156986 154 - . ID=ctg889_racon_pilon3:hsp:6:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 116 223 -;Gap=M36 > ctg889_racon_pilon3 tblastx match_part 156881 156985 141 - . ID=ctg889_racon_pilon3:hsp:7:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 118 222 -;Gap=M35 > ctg889_racon_pilon3 tblastx match_part 153762 153845 128 - . ID=ctg889_racon_pilon3:hsp:8:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 3 86 -;Gap=M28 > ctg889_racon_pilon3 tblastx match_part 153761 153844 134 - . ID=ctg889_racon_pilon3:hsp:9:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 2 85 -;Gap=M28 > ctg889_racon_pilon3 tblastx match_part 153760 153843 137 - . ID=ctg889_racon_pilon3:hsp:10:3.6.0.0;Parent=ctg889_racon_pilon3:hit:0:3.6.0.0;Target=Tsi_rna 1 84 -;Gap=M28 > ctg889_racon_pilon3 cdna2genome expressed_sequence_match 153760 157392 2022 + . ID=ctg889_racon_pilon3:hit:1:3.6.0.0;Name=Tsi_rna > ctg889_racon_pilon3 cdna2genome match_part 153760 153845 2022 + . 
ID=ctg889_racon_pilon3:hsp:11:3.6.0.0;Parent=ctg889_racon_pilon3:hit:1:3.6.0.0;Target=Tsi_rna 1 86 +;Gap=M86 > ctg889_racon_pilon3 cdna2genome match_part 155744 155774 2022 + . ID=ctg889_racon_pilon3:hsp:12:3.6.0.0;Parent=ctg889_racon_pilon3:hit:1:3.6.0.0;Target=Tsi_rna 87 117 +;Gap=M31 > ctg889_racon_pilon3 cdna2genome match_part 156881 157055 2022 + . ID=ctg889_racon_pilon3:hsp:13:3.6.0.0;Parent=ctg889_racon_pilon3:hit:1:3.6.0.0;Target=Tsi_rna 118 293 +;Gap=M105 I1 M70 > ctg889_racon_pilon3 cdna2genome match_part 157221 157392 2022 + . ID=ctg889_racon_pilon3:hsp:14:3.6.0.0;Parent=ctg889_racon_pilon3:hit:1:3.6.0.0;Target=Tsi_rna 294 465 +;Gap=M172 > ctg889_racon_pilon3 blastx protein_match 153760 157389 133 + . ID=ctg889_racon_pilon3:hit:2:3.10.0.0;Name=TSI_CENPA > ctg889_racon_pilon3 blastx match_part 153760 153846 133 + . ID=ctg889_racon_pilon3:hsp:15:3.10.0.0;Parent=ctg889_racon_pilon3:hit:2:3.10.0.0;Target=TSI_CENPA 1 29;Gap=M29 > ctg889_racon_pilon3 blastx match_part 156881 156994 191 + . ID=ctg889_racon_pilon3:hsp:16:3.10.0.0;Parent=ctg889_racon_pilon3:hit:2:3.10.0.0;Target=TSI_CENPA 40 77;Gap=M38 > ctg889_racon_pilon3 blastx match_part 156985 157389 314 + . ID=ctg889_racon_pilon3:hsp:17:3.10.0.0;Parent=ctg889_racon_pilon3:hit:2:3.10.0.0;Target=TSI_CENPA 75 154;Gap=M19 D35 M4 D20 M57 > ctg889_racon_pilon3 protein2genome protein_match 153760 157389 668 + . ID=ctg889_racon_pilon3:hit:3:3.10.0.0;Name=TSI_CENPA > ctg889_racon_pilon3 protein2genome match_part 153760 153845 668 + . ID=ctg889_racon_pilon3:hsp:18:3.10.0.0;Parent=ctg889_racon_pilon3:hit:3:3.10.0.0;Target=TSI_CENPA 1 28;Gap=M28 F2 > ctg889_racon_pilon3 protein2genome match_part 155744 155774 668 + . ID=ctg889_racon_pilon3:hsp:19:3.10.0.0;Parent=ctg889_racon_pilon3:hit:3:3.10.0.0;Target=TSI_CENPA 29 39;Gap=R2 M11 > ctg889_racon_pilon3 protein2genome match_part 156881 157055 668 + . 
ID=ctg889_racon_pilon3:hsp:20:3.10.0.0;Parent=ctg889_racon_pilon3:hit:3:3.10.0.0;Target=TSI_CENPA 40 97;Gap=M35 I1 M1 F2 M20 F1 F1 > ctg889_racon_pilon3 protein2genome match_part 157221 157389 668 + . ID=ctg889_racon_pilon3:hsp:21:3.10.0.0;Parent=ctg889_racon_pilon3:hit:3:3.10.0.0;Target=TSI_CENPA 98 154;Gap=R2 M57 > > > There is obviously a gene here but at the end I don't have anything annotated by MAKER, why ? > > Thanks for your help. > > Patrick Tran Van > > Bioinformatician: Lab Chapuisat & Schwander > Department of Ecology and Evolution > University of Lausanne > Lausanne - Switzerland > Office 3206 > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1376 bytes Desc: not available URL: From carsonhh at gmail.com Fri Dec 11 16:02:14 2020 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 11 Dec 2020 16:02:14 -0700 Subject: [maker-devel] Format error detected with maker2eval_gtf In-Reply-To: References: Message-ID: <800FEDCE-F74F-44EF-8AB8-09A9DAC7A03C@gmail.com> Truncated line in the GFF3. Network mounted file systems can sometimes return true on write operations that actually failed (should have killed the program, but NFS lies, and the program keeps running). Since MAKER does such high IO, this comes up from time to time. If you can identify the contig involved, you can either rerun it or delete just the gff3 files in the folder and let MAEKR rewrite them. ?Carson > On Nov 16, 2020, at 8:29 PM, Zoe Clarke wrote: > > Hello! > > I am currently working with an annotation from Maker that has just finished. 
Unfortunately I think there are some errors in my annotation, as a lot of my reads from a test RNA-seq alignment got discarded from my analysis. The analysis tool suggested my annotation might have overlapping reads. > > I am trying to figure out what went wrong with my annotation, and the closest hint I have are error messages that I receive from running maker2eval_gtf on my output merged gff file. It appears every line of my gff is producing the following errors: > > Use of uninitialized value $att in pattern match (m//) at maker2eval_gtf line 244, line 200199481. > Use of uninitialized value $att in pattern match (m//) at maker2eval_gtf line 245, line 200199481. > Use of uninitialized value $att in pattern match (m//) at maker2eval_gtf line 246, line 200199481. > Use of uninitialized value in string eq at maker2eval_gtf line 35, line 200199481. > Use of uninitialized value in string eq at maker2eval_gtf line 37, line 200199481. > Use of uninitialized value in string eq at maker2eval_gtf line 38, line 200199481. > Use of uninitialized value in string eq at maker2eval_gtf line 39, line 200199481. > Use of uninitialized value in string eq at maker2eval_gtf line 40, line 200199481. > Use of uninitialized value in string eq at maker2eval_gtf line 41, line 200199481. > > Do you know what issue this might indicate for my gff? At this point, I have done no filtering, just merged the files with gff3_merge. > > Thank you so much, and please let me know if there is anything else I can provide to inform you more about my issue! 
> > Zoe > ______________________________________ > Zoe Clarke > PhD candidate in Computational Biology at U of T > Lab profile: http://baderlab.org/Zoe%20Clarke > Personal website: https://zoe-clarke.weebly.com/ _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1376 bytes Desc: not available URL: From revale at well.ox.ac.uk Thu Dec 24 09:29:23 2020 From: revale at well.ox.ac.uk (Santiago Revale) Date: Thu, 24 Dec 2020 16:29:23 +0000 Subject: [maker-devel] processing of simple and complex repeats Message-ID: Dear Maker developers, Can Maker distinguish between simple and complex repeats from a gff3 file of pre-aligned repeats? I'm trying to annotate a genome of a non-model Drosophila species and I've already generated a gff3 file with both simple and complex repeats for this species. I would like to use this gff3 file as input for Repeat Masking so Maker won't have to align repeats from any library. My maker_opts.ctl file looks like this: #-----Repeat Masking model_org= rmlib= repeat_/path/to/te_proteins.fasta rm_gff=/path/to/Dato_genome.Dato-first.full_mask.out.reformat.gff3 prok_rm=0 softmask=1 By using softmask=1 I understand that Maker will softmask only low complexity repeats (while complex ones will be hardmasked). My question is whether Maker can distinguish between simple and complex repeats from the gff3 file in order to softmask only simple repeats. Also, do you think it would be better to only include complex repeats in the gff3 file and let Maker find simple repeats on its own by using model_org=simple? Thank you very much in advance. 
Best regards, Santiago -------------- next part -------------- An HTML attachment was scrubbed... URL: From nanshangogo at gmail.com Wed Dec 23 00:03:35 2020 From: nanshangogo at gmail.com (nanshan yang) Date: Wed, 23 Dec 2020 15:03:35 +0800 Subject: [maker-devel] Qusetion about the ab initio gene predictors result Message-ID: Hi MAKER community : I have questions about MAKER output files.I get result from *ab initio* gene predictors which use snap and augustus by maker,and after fasta_merge step,there are some fasta files as: result.maker.augustus_masked.proteins.fasta result.maker.augustus_masked.transcripts.fasta result.maker.augustus.proteins.fasta result.maker.augustus.transcripts.fasta result.maker.non_overlapping_ab_initio.proteins.fasta result.maker.non_overlapping_ab_initio.transcripts.fasta result.maker.snap.proteins.fasta result.maker.snap_masked..transcripts.fasta result.maker.snap_masked..proteins.fasta result.maker.snap.transcripts.fasta result.maker.proteins.fasta result.maker.transcripts.fasta if i continue to analysis the fasta files,which fasta should i choose? because i choose *ab initio* gene predictors,so the result .maker.non_overlapping_ab_initio*fasta can be uesed into the downstream analysis?or the result.maker.proteins.fasta Thanks verymuch for any help or insights -------------- next part -------------- An HTML attachment was scrubbed... URL: From revale at well.ox.ac.uk Tue Dec 29 12:19:11 2020 From: revale at well.ox.ac.uk (Santiago Revale) Date: Tue, 29 Dec 2020 19:19:11 +0000 Subject: [maker-devel] dispersed_repeat features on rm_gff GFF3 file Message-ID: Dear Maker developers, Can Maker deal with a "dispersed_repeat" one-level features GFF3 file provided to the rm_gff option in the control file? We're trying to annotate a genome of a non-model Drosophila species using a previously generated GFF3 file with repeats for this species. 
Some months ago, Carson Holt replied to a post (https://groups.google.com/g/maker-devel/c/BhBMTF8dze8/m/y-QwJYRFAQAJ) saying that "They (repeat models) must be match/match_part two level feature for rm_gff". However, all features in our GFF3 file are one-level 'dispersed_repeat' features.

Here is what our file looks like (an extract for one scaffold is also attached as repeats.gff3):

##sequence-region scaffold1062|size32755 1 32755
scaffold1062|size32755 RepeatMasker dispersed_repeat 1 47 271 - . Target=DNAREP1_DM 26 73;ID=2740
scaffold1062|size32755 RepeatMasker dispersed_repeat 48 83 263 + . Target=Dbuz_kz3_5_36225 704 739;ID=2741
scaffold1062|size32755 RepeatMasker dispersed_repeat 340 386 299 - . Target=rnd-1_family-30 19 65;ID=2742
scaffold1062|size32755 RepeatMasker dispersed_repeat 349 387 247 + . Target=rnd-1_family-29 1 71;ID=2743
scaffold1062|size32755 RepeatMasker dispersed_repeat 388 430 240 + . Target=rnd-5_family-3333_TIR_P 3669 3713;ID=2744
scaffold1062|size32755 RepeatMasker dispersed_repeat 446 555 408 - . Target=rnd-1_family-15 16 138;ID=2745

I'm attaching both the log file and the CTL file for what it's worth (Dtest.log / maker_opts.ctl).

I wonder whether Maker handled our rm_gff file correctly and whether the generated output is reliable. If not, do you think manually modifying the GFF3 file, duplicating each row and changing the third column to 'match/match_part', could do the trick?

Thank you very much in advance.

Best regards,
Santiago

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Dtest.log
Type: application/octet-stream
Size: 40356 bytes
Desc: Dtest.log
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4995 bytes
Desc: maker_opts.ctl
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: repeats.gff3
Type: application/octet-stream
Size: 1355 bytes
Desc: repeats.gff3
URL:
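
The workaround proposed in the message above (duplicating each row and retyping the copies as match and match_part) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name to_two_level, the ":part" suffix for child IDs, and the Name attribute on the parent row are hypothetical choices, and nothing in this thread confirms that MAKER accepts the resulting file through rm_gff. It also assumes tab-separated, nine-column GFF3 rows.

```python
def to_two_level(lines, one_level_type="dispersed_repeat"):
    """Expand each one-level repeat row into a match parent and a match_part child.

    Assumption: rows are tab-separated with nine columns, as GFF3 requires.
    Comments, directives, and rows of any other feature type pass through unchanged.
    """
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != one_level_type:
            yield line
            continue

        # Parse the attribute column, e.g. "Target=DNAREP1_DM 26 73;ID=2740".
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        feat_id = attrs.get("ID", f"repeat{n}")  # hypothetical fallback ID
        target = attrs.get("Target")

        parent_attrs = f"ID={feat_id}"
        child_attrs = f"ID={feat_id}:part;Parent={feat_id}"  # hypothetical ID scheme
        if target:
            parent_attrs += f";Name={target.split()[0]}"
            child_attrs += f";Target={target}"

        # Same coordinates, score, strand, and phase on both levels.
        yield "\t".join(cols[:2] + ["match"] + cols[3:8] + [parent_attrs])
        yield "\t".join(cols[:2] + ["match_part"] + cols[3:8] + [child_attrs])


# Demo on the first feature row quoted in the message, rebuilt with tabs.
sample = "\t".join([
    "scaffold1062|size32755", "RepeatMasker", "dispersed_repeat",
    "1", "47", "271", "-", ".", "Target=DNAREP1_DM 26 73;ID=2740",
])
for row in to_two_level([sample]):
    print(row)
```

Keeping the original ID on the parent and deriving the child ID from it keeps the two rows linked through Parent without renumbering the file. Whether repeats of different kinds should share a parent, or whether MAKER needs further attributes, is not something this sketch settles.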