From patrick.tranvan at unil.ch Sun May 3 05:39:59 2020
From: patrick.tranvan at unil.ch (Patrick Tran Van)
Date: Sun, 3 May 2020 11:39:59 +0000
Subject: [maker-devel] Multiple UTR ?
In-Reply-To:
References:
Message-ID: <16630d833d1448e7a771a4f2b19b0476@unil.ch>

Hi Carson,

for instance, if I have this:

SCFXX maker five_prime_UTR 5164370 5164715 . - . ID=GENE-RA:five_prime_utr;Parent=GENE-RA;
SCFXX maker five_prime_UTR 5156091 5156136 . - . ID=GENE-RA:five_prime_utr;Parent=GENE-RA;

Does it mean that the real coordinate of the 5' UTR is from 5156091 to 5164715?

Patrick Tran Van

Bioinformatician: Lab Chapuisat & Schwander
Department of Ecology and Evolution
University of Lausanne
Lausanne - Switzerland
Office 3206

________________________________
From: Carson Holt
Sent: Wednesday, February 26, 2020 8:27:43 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Multiple UTR ?

Sorry for the very slow reply. I found this way, way down in my inbox.

The UTR features are the parts of the exons that are not CDS. So multiple UTR means it spans multiple exons, and must be assembled to generate the full UTR in a browser. Any exon that is fully non-coding will produce a UTR feature that mirrors the exon's coordinates, and if it's partially coding the UTR will share the same start or end but will terminate somewhere in the middle, with a CDS filling up the remaining coordinates. The UTR and CDS features get tiled over the top of the exon features when assembling a gene model.

--Carson

On Dec 18, 2019, at 7:19 AM, Patrick Tran Van wrote:

Hi Carson,

I have seen something strange in my annotation: multiple UTR. How can we explain this?

Thanks!

Scaffold maker mRNA 12117462 12128433 . - . ID=GENE_02395-RA;Parent=GENE_02395;Name=GENE_02395-RA;Alias=maker-Scaffold-augustus-gene-40.12-mRNA-3;_AED=0.02;_QI=5383|1|1|1|0.88|0.9|10|247|238;_eAED=0.02;Note=Protein of unknown function;
Scaffold maker exon 12128112 12128433 . - . ID=GENE_02395-RA:exon:571;Parent=GENE_02395-RA;
Scaffold maker exon 12117462 12118046 . - . ID=GENE_02395-RB:exon:569;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
Scaffold maker exon 12118141 12118301 . - . ID=GENE_02395-RB:exon:568;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
Scaffold maker exon 12118386 12118539 . - . ID=GENE_02395-RB:exon:567;Parent=GENE_02395-RB,GENE_02395-RA;
Scaffold maker exon 12118818 12122493 . - . ID=GENE_02395-RB:exon:566;Parent=GENE_02395-RB,GENE_02395-RA;
Scaffold maker exon 12123591 12123893 . - . ID=GENE_02395-RB:exon:565;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
Scaffold maker exon 12123995 12124303 . - . ID=GENE_02395-RB:exon:564;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
Scaffold maker exon 12125119 12125418 . - . ID=GENE_02395-RB:exon:563;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
Scaffold maker exon 12126005 12126313 . - . ID=GENE_02395-RB:exon:562;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
Scaffold maker exon 12127460 12127687 . - . ID=GENE_02395-RB:exon:561;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
Scaffold maker five_prime_UTR 12128112 12128433 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
Scaffold maker five_prime_UTR 12127460 12127687 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
Scaffold maker five_prime_UTR 12126005 12126313 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
Scaffold maker five_prime_UTR 12125119 12125418 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
Scaffold maker five_prime_UTR 12123995 12124303 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
Scaffold maker five_prime_UTR 12123591 12123893 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
Scaffold maker five_prime_UTR 12118882 12122493 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
Scaffold maker CDS 12118818 12118881 . - 0 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
Scaffold maker CDS 12118386 12118539 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
Scaffold maker CDS 12118141 12118301 . - 1 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
Scaffold maker CDS 12117709 12118046 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
Scaffold maker three_prime_UTR 12117462 12117708 . - . ID=GENE_02395-RA:three_prime_utr;Parent=GENE_02395-RA;

Patrick Tran Van

Bioinformatician: Lab Chapuisat & Schwander
Department of Ecology and Evolution
University of Lausanne
Lausanne - Switzerland
Office 3206
_______________________________________________
maker-devel mailing list
maker-devel at yandell-lab.org
http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From liorglic at mail.tau.ac.il Tue May 5 03:46:01 2020
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Tue, 5 May 2020 12:46:01 +0300
Subject: [maker-devel] Unable to reproduce MAKER blastn results
Message-ID:

Hello,

I am running MAKER 2.31.10 with a very simple configuration, with only EST evidence and the est2genome option enabled (basically a lift-over procedure). I noticed that some of my transcripts are not included in the annotation output, and when I looked at the blastn results the reason was clear - they do not pass the coverage cutoff defined in maker_bopts.ctl.

Interestingly, when I tried running blastn myself, using the same command (taken from the maker log) and the same blastn version, I got slightly different results. Specifically, for some of the transcripts the MAKER blastn run produced fewer HSPs than my blastn run, resulting in a lower total coverage. The additional HSPs seem to have good % identity and E-values, so I don't understand why and how they are discarded. Are the blastn results changed by MAKER in subsequent steps (after the blastn run)?

Please find attached the blastn results from the MAKER run and from my run. You can look at transcript AT1G01740.3 as an example.
In my.blastn, there are 8 HSPs, while MAKER.blastn only has 3 of them. Can you explain the difference? Maybe it has to do with repeat masking or other processing of the genome sequence?

Just to make sure you have all the details:

Relevant maker_bopts parameters:
pcov_blastn=0.7 #Blastn Percent Coverage Threhold EST-Genome Alignments
pid_blastn=0.85 #Blastn Percent Identity Threshold EST-Genome Aligments
eval_blastn=1e-10 #Blastn eval cutoff
bit_blastn=40 #Blastn bit cutoff
depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff)

Blastn command:
blastn -db /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/TAIR10_longest_trans%2Efasta.mpi.10.0 -query /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/0/chunk00.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -word_size 28 -reward 1 -penalty -5 -gapopen 5 -gapextend 5 -dbsize 1000 -searchsp 500000000 -num_threads 10 -lcase_masking -dust yes -soft_masking true -show_gis -out

Thank you!

-------------- next part --------------
A non-text attachment was scrubbed...
Name: my.blastn
Type: application/octet-stream
Size: 440089 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MAKER.blastn
Type: application/octet-stream
Size: 1143935 bytes
Desc: not available
URL:

From zuyao.liu.0910 at gmail.com Tue May 5 14:41:25 2020
From: zuyao.liu.0910 at gmail.com (祖尧刘)
Date: Tue, 5 May 2020 22:41:25 +0200
Subject: [maker-devel] Question about maker. Maker2 failed
Message-ID:

Hi maker developer,

I'm using maker 2 to annotate a vertebrate genome. When I try to provide rm_gff file, it always fails.
Here is log: Now starting the contig!! SeqID: chr_XXII Length: 12689475 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks doing repeat masking ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Did not specify a Hit End or Hit Begin STACK: Error::throw STACK: Bio::Root::Root::throw /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Root/Root.pm:449 STACK: Bio::Search::HSP::GenericHSP::_subject_seq_feature /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:1604 STACK: Bio::Search::HSP::GenericHSP::hit /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:988 STACK: repeat_mask_seq::separate_types /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/ repeat_mask_seq.pm:307 STACK: repeat_mask_seq::mask_chunk /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/ repeat_mask_seq.pm:191 STACK: Process::MpiChunk::_go /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:763 STACK: Process::MpiChunk::run /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:341 STACK: Process::MpiChunk::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:357 STACK: Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: /home/ubelix/iee/zl19g775/miniconda3/envs/maker/bin/maker:689 ----------------------------------------------------------- --> rank=NA, hostname=submit02.ubelix.unibe.ch ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:chr_XXII ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:chr_XXII examining contents of the fasta file and run log I also searched the google 
group and tried update my bioperl to 1.7.7 the latest version, but it didn't help. Could you please help me? Thanks a lot. Zuyao -------------- next part -------------- An HTML attachment was scrubbed... URL: From c143dad at ufl.edu Wed May 6 11:36:40 2020 From: c143dad at ufl.edu (Carneiro,Celine M) Date: Wed, 6 May 2020 17:36:40 +0000 Subject: [maker-devel] gene:multiple_Einit and overlaps_prev_exon errors in first round of SNAP training Message-ID: Hello, I am getting the errors gene:multiple_Einit, gene:multiple_Eterm, and exon:overlaps_prev_exon, at just about every gene model. I've ran the first round of maker on a bird genome I'm annotating with no errors and have started the steps to train SNAP. However, after running fathom -categorize, just about every single gene model has the same set of errors. Here is an example from my log file after running fathom -categorize: MODEL117 1 1 8 - errors(6): gene:multiple_Einit gene:multiple_Eterm exon-7:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL851 1 1 100 - errors(78): gene:multiple_Einit gene:multiple_Eterm exon-99:overlaps_prev_exon exon-98:overlaps_prev_exon exon-97:overlaps_prev_exon exon-95:overlaps_prev_exon exon-94:overlaps_prev_exon exon-93:overlaps_prev_exon exon-91:overlaps_prev_exon exon-90:overlaps_prev_exon exon-89:overlaps_prev_exon exon-87:overlaps_prev_exon exon-86:overlaps_prev_exon exon-85:overlaps_prev_exon exon-83:overlaps_prev_exon exon-82:overlaps_prev_exon exon-81:overlaps_prev_exon exon-79:overlaps_prev_exon exon-78:overlaps_prev_exon exon-77:overlaps_prev_exon exon-75:overlaps_prev_exon exon-74:overlaps_prev_exon exon-73:overlaps_prev_exon exon-71:overlaps_prev_exon exon-70:overlaps_prev_exon exon-69:overlaps_prev_exon exon-67:overlaps_prev_exon exon-66:overlaps_prev_exon exon-65:overlaps_prev_exon exon-63:overlaps_prev_exon exon-62:overlaps_prev_exon exon-61:overlaps_prev_exon exon-59:overlaps_prev_exon exon-58:overlaps_prev_exon 
exon-57:overlaps_prev_exon exon-55:overlaps_prev_exon exon-54:overlaps_prev_exon exon-53:overlaps_prev_exon exon-51:overlaps_prev_exon exon-50:overlaps_prev_exon exon-49:overlaps_prev_exon exon-48:overlaps_prev_exon exon-47:overlaps_prev_exon exon-46:overlaps_prev_exon exon-45:overlaps_prev_exon exon-43:overlaps_prev_exon exon-42:overlaps_prev_exon exon-41:overlaps_prev_exon exon-39:overlaps_prev_exon exon-38:overlaps_prev_exon exon-37:overlaps_prev_exon exon-35:overlaps_prev_exon exon-34:overlaps_prev_exon exon-33:overlaps_prev_exon exon-31:overlaps_prev_exon exon-30:overlaps_prev_exon exon-29:overlaps_prev_exon exon-27:overlaps_prev_exon exon-26:overlaps_prev_exon exon-25:overlaps_prev_exon exon-23:overlaps_prev_exon exon-22:overlaps_prev_exon exon-21:overlaps_prev_exon exon-19:overlaps_prev_exon exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-14:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-10:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-2:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL190 1 1 39 + errors(35): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-3:overlaps_prev_exon exon-4:overlaps_prev_exon exon-5:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-9:overlaps_prev_exon exon-11:overlaps_prev_exon exon-12:overlaps_prev_exon exon-13:overlaps_prev_exon exon-14:overlaps_prev_exon exon-15:overlaps_prev_exon exon-16:overlaps_prev_exon exon-17:overlaps_prev_exon exon-18:overlaps_prev_exon exon-20:overlaps_prev_exon exon-21:overlaps_prev_exon exon-22:overlaps_prev_exon exon-23:overlaps_prev_exon exon-24:overlaps_prev_exon exon-25:overlaps_prev_exon exon-26:overlaps_prev_exon exon-27:overlaps_prev_exon exon-29:overlaps_prev_exon exon-30:overlaps_prev_exon exon-32:overlaps_prev_exon exon-33:overlaps_prev_exon 
exon-34:overlaps_prev_exon exon-35:overlaps_prev_exon exon-36:overlaps_prev_exon exon-38:overlaps_prev_exon exon-39:overlaps_prev_exon MODEL424 1 1 10 - errors(8): gene:multiple_Einit gene:multiple_Eterm exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL902 1 1 20 - errors(14): gene:multiple_Einit gene:multiple_Eterm exon-19:overlaps_prev_exon exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL238 1 1 14 - errors(11): gene:multiple_Einit gene:multiple_Eterm exon-13:overlaps_prev_exon exon-12:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL39 1 1 6 - errors(1): exon-3:overlaps_prev_exon MODEL119 1 1 10 + errors(8): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-4:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-10:overlaps_prev_exon Furthermore, I checked my genome.ann file and noticed that my Einit and Exon sites are duplicated. 
For example: >ScdimlH_1004;HRSCAF=1084 Einit 38730 38677 MODEL851 Exon 38255 38178 MODEL851 Exon 38074 38021 MODEL851 Exon 24755 24717 MODEL851 Exon 24213 24149 MODEL851 Exon 23176 23098 MODEL851 Exon 22037 21961 MODEL851 Exon 21269 21080 MODEL851 Exon 20232 20167 MODEL851 Exon 19742 19704 MODEL851 Exon 14705 14590 MODEL851 Exon 14255 13980 MODEL851 Exon 14169 13980 MODEL851 Exon 13303 13223 MODEL851 Exon 13303 13223 MODEL851 Exon 12782 12639 MODEL851 Exon 12782 12639 MODEL851 Exon 5761 5592 MODEL851 Exon 5482 5404 MODEL851 Exon 5140 5064 MODEL851 Exon 4951 4750 MODEL851 Exon 4567 4502 MODEL851 Exon 4256 4185 MODEL851 Exon 3569 3403 MODEL851 Exon 3157 3076 MODEL851 Exon 2936 2800 MODEL851 Eterm 2186 2000 MODEL851 Einit 38730 38677 MODEL851 Exon 38255 38178 MODEL851 Exon 38074 38021 MODEL851 Exon 24755 24717 MODEL851 Exon 24213 24149 MODEL851 Exon 23176 23098 MODEL851 Exon 22037 21961 MODEL851 Exon 21269 21080 MODEL851 Exon 20232 20167 MODEL851 Exon 19742 19704 MODEL851 Exon 14705 14590 MODEL851 Exon 14255 13980 MODEL851 Exon 14169 13980 MODEL851 Exon 13303 13223 MODEL851 Exon 13303 13223 MODEL851 Exon 12782 12639 MODEL851 Exon 12782 12639 MODEL851 Exon 5761 5592 MODEL851 Exon 5482 5404 MODEL851 Exon 5140 5064 MODEL851 Exon 4951 4750 MODEL851 Exon 4567 4502 MODEL851 Exon 4256 4185 MODEL851 Exon 3569 3403 MODEL851 Exon 3157 3076 MODEL851 Exon 2936 2800 MODEL851 Eterm 2186 2000 MODEL851 Any ideas why I'm seeing this duplication? Lastly, any ideas why my exons are overlapping so much? I appreciate any input and please let me know if you require any more information. Thank you! Celine -------------- next part -------------- An HTML attachment was scrubbed... URL: From peruzzaluca at gmail.com Tue May 19 02:31:22 2020 From: peruzzaluca at gmail.com (Luca Peruzza) Date: Tue, 19 May 2020 10:31:22 +0200 Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question Message-ID: Hi There, I have two questions and I hope you guys can help me with them: 1. 
I have seen that maker version 3.01 is now out. Is there a change log available to see the changes in comparison to the previous maker version and have a glimpse of the new features of this release?

2. If I were to improve the annotation of my 3' UTRs within a certain (non-model species) gff3, is there a particular way or a protocol to follow? I was thinking for example that Lexogen has released their 3' UTR kit for RNA-seq of the three-prime end of transcripts. Would it be possible to feed those reads to maker and somehow indicate that the reads originate from the three-prime end, so that this info is then passed on to the gff3 file?

Thanks a lot in advance for your help
Best
Luca

From zuyao.liu.0910 at gmail.com Tue May 19 03:10:30 2020
From: zuyao.liu.0910 at gmail.com (祖尧刘)
Date: Tue, 19 May 2020 11:10:30 +0200
Subject: [maker-devel] Question about maker. Maker2 failed
Message-ID:

Hi maker developers,

I'm using maker 2 to annotate a fish genome. When I try to provide an rm_gff file, it always fails. Here is the log:

collecting blastx repeatmasking
doing repeat masking
processing all repeats
deleted:0 hits
in cluster::shadow_cluster...
Died at /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
--> rank=23, hostname=hnode48
ERROR: Failed while processing all repeats
ERROR: Chunk failed at level:3, tier_type:1
FAILED CONTIG:chr_XXIII
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:chr_XXIII

I use maker 2.3.10 with repeatmasker 4.0.9. I saw that someone else got this error as well and I followed the suggested solutions. I tried updating to blast 2.9.0, rmblast 2.9.0 and bioperl 1.7.7, and also checked the rm gff file with a gff3 validator, but the error persisted.

Do you have any suggestions? Thanks a lot for your help.

From jason.stajich at gmail.com Tue May 19 17:29:38 2020
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 19 May 2020 16:29:38 -0700
Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question
In-Reply-To:
References:
Message-ID:

Luca - I would suggest PASA as a tool for 3' UTR (and 5' UTR) improvement in gene annotation too.
https://github.com/PASApipeline/PASApipeline

Funannotate has a step that can be used to run and update gene models if you also want to carry on from an existing maker run - https://funannotate.readthedocs.io/en/latest/

Jason

Jason Stajich
jason.stajich at gmail.com

From peruzzaluca at gmail.com Wed May 20 06:41:14 2020
From: peruzzaluca at gmail.com (Luca Peruzza)
Date: Wed, 20 May 2020 14:41:14 +0200
Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question
In-Reply-To:
References:
Message-ID: <4ABBF9F2-4F9E-4D7F-B821-3276D4D3EFD1@gmail.com>

Thanks Jason,

Yes, my idea was to add extra 3' UTR info to an existing maker gff3 file. If you say that funannotate can do it, I'll have a look.

Thanks
Luca

From niconm89 at gmail.com Thu May 21 07:58:34 2020
From: niconm89 at gmail.com (Nicolás Moreyra)
Date: Thu, 21 May 2020 10:58:34 -0300
Subject: [maker-devel] different number of annotated genes and transcripts
Message-ID:

Dear all,

First of all, thank you for sharing your experiences here. I tried to find this issue in the posts already made but failed.
Secondly, I am sorry for asking what is probably a silly question, but after completing the genome annotation of four species, I obtained fewer transcripts than genes. I do not understand why MAKER would annotate genes that cannot be transcribed.
I was trying to find the reason for this issue to discuss it in my thesis, but I am a bit lost. Has this happened to anyone? Is there any possible cause that comes to mind?

Thanks in advance.

Nicolás

--
Nicolas Nahuel Moreyra
BSc/MSc in Bioinformatics
CONICET PhD Fellow @ IEGEBA
PhD Student in Comparative Genomics @ EGE (FCEyN - UBA) -> nmoreyra at ege.fcen.uba.ar
Professor of Bioinformatics @ Favaloro University
Professor of Informatics @ IFTS N° 7
Argentina

From yujin at genomics.cn Fri May 22 23:46:33 2020
From: yujin at genomics.cn (余进(Jin Yu))
Date: Sat, 23 May 2020 05:46:33 +0000
Subject: [maker-devel] maker error-ERROR: Failed while annotating transcripts
Message-ID:

Hi, Dear developers.

I'm using maker-3.01.03 to annotate a plant genome, but I ran into this error:

Can't locate object method "add_entry" via package "1" (perhaps you forgot to load "1"?) at /vol2/liuyang_group/liuyang/software/maker-3.01.03/bin/../lib/Widget/snap.pm line 540.
ERROR: Failed while annotating transcripts

The attached file is the full STDERR from maker. I have searched the archived mailing list and found a similar question (https://groups.google.com/forum/#!topic/maker-devel/fGGCKXhi6cw), but I didn't find any error that occurred before this one in the log.
I would appreciate it a lot if you could help me!

Best regards
Jin Yu

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker.log
Type: application/octet-stream
Size: 3935522 bytes
Desc: maker.log
URL:

From zhoux233 at mail2.sysu.edu.cn Sun May 24 17:43:43 2020
From: zhoux233 at mail2.sysu.edu.cn (周鑫)
Date: Mon, 25 May 2020 07:43:43 +0800
Subject: [maker-devel] Trouble in opening the registration page for Maker
Message-ID:

Hello, Developers of Maker,

I'm a student from SYSU, China. Recently, I wanted to download Maker from your website for my lab's annotation work, but I have had trouble opening the registration page for days and could not figure out why. I also failed to install maker with conda, so could you please tell me how to deal with it? Or could you please send me a copy of the source?

If it is convenient for you to send me a copy, here is my information:

Name: Zhou Xin
Email address: zhoux233 at mail2.sysu.edu.cn
Software needed: Maker
PI name: Huang ShengFeng
Research: Genome Annotation for zebrafish
Institute: Life Science School, Sun Yat-Sen University
Institute URL: http://lifesciences.sysu.edu.cn/
Country: China
Province: GuangDong
City: Guang Zhou

If anything else is needed, please email me and I will add it as soon as I see it.
Thank you very much for your attention! Any reply will be much appreciated.

Regards!
Zhou Xin

From carsonhh at gmail.com Tue May 26 11:54:45 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 11:54:45 -0600
Subject: [maker-devel] different number of annotated genes and transcripts
In-Reply-To:
References:
Message-ID:

Perhaps you are counting wrong. If you want to know the number of genes, you must look at the GFF3. You can use "grep -c -P '\tgene\t' file.gff"; the number of transcripts would then be "grep -c -P 'RNA\t' file.gff". Note that if you are using things like tRNAscan, you will get tRNA transcripts and associated genes. If you are trying to count from the fasta files, make sure you use the right file (maker.proteins.fasta and maker.transcripts.fasta).

Thanks,
Carson
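The counting suggested in the reply above can also be sketched in Python. This is only an illustration, not part of MAKER: the function name count_features and the toy GFF3 lines are made up for the example, and unlike the grep pattern it looks only at the feature-type column (column 3), counting "gene" features and any type ending in "RNA" (mRNA, tRNA, ...).

```python
from collections import Counter


def count_features(gff_lines):
    """Count gene features and *RNA features from GFF3 lines.

    Only the feature-type column (column 3) is inspected, so text in the
    attributes column cannot inflate the counts.
    """
    counts = Counter()
    for line in gff_lines:
        # Skip comments, directives and blank lines
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A GFF3 feature line has nine tab-separated columns
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    genes = counts["gene"]
    transcripts = sum(n for ftype, n in counts.items() if ftype.endswith("RNA"))
    return genes, transcripts


# Hypothetical toy input: two genes, three transcripts (two mRNA, one tRNA)
example = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1",
    "chr1\tmaker\tmRNA\t1\t100\t.\t+\t.\tID=g1-RA;Parent=g1",
    "chr1\tmaker\tmRNA\t1\t90\t.\t+\t.\tID=g1-RB;Parent=g1",
    "chr1\tmaker\tgene\t200\t280\t.\t-\t.\tID=g2",
    "chr1\tmaker\ttRNA\t200\t280\t.\t-\t.\tID=g2-RA;Parent=g2",
]
genes, transcripts = count_features(example)
print(genes, transcripts)
```

For a real file, the same function could be called on an open file handle, e.g. count_features(open("file.gff")). A transcript count lower than the gene count from this kind of tally would point at the GFF3 itself rather than at a counting mistake.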
From carsonhh at gmail.com Tue May 26 12:10:16 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 12:10:16 -0600
Subject: [maker-devel] Maker 0 genes after SNAP or with proteins.gff
In-Reply-To: <84c5fc195df0fcc5e03484e65076fa9c@uni-duesseldorf.de>
References: <84c5fc195df0fcc5e03484e65076fa9c@uni-duesseldorf.de>
Message-ID: <053268A7-E5B0-4878-92F2-63B01869B677@gmail.com>

You would have to look at the alignments, but I suspect they do not align to the gene models in a way that supplies sufficient support for the annotation. If it is the maker2zff script producing 0 genes, that is because it requires at least some EST evidence. You can change that using the command line options.

--Carson

> On Apr 24, 2020, at 8:27 AM, Ricardo Nuno Ferreira Martins Guerreiro wrote:
>
> Dear Makers list,
>
> I am struggling with Maker after many successful attempts. I don't understand why but my final .gff does not contain any genes, 0.
>
> I am running first an Evidence based modelling, with proteins only. Here I get around 40 thousand genes if I give the proteins as a fasta to align (if I provide a protein.gff from a previous maker try, I get 0 genes, same problem).
>
> Afterwards I'm creating a SNAP hmm and running maker again, turning protein2genome=0 and snaphmm=snap.hmm as you say, but now I have 0 genes. This happens whether I keep providing proteins as a fasta or as .gff of a previous run.
>
> I have done this many times and it always worked. The only difference now is that I am using no ESTs whatsoever, only proteins. It's also strange that it works on the first round of maker but doesn't work on the SNAP rounds.
>
> Hope you can help,
> Ricardo

From carsonhh at gmail.com Tue May 26 12:14:28 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 12:14:28 -0600
Subject: [maker-devel] Problems with openMPI in multiple computing nodes
In-Reply-To:
References:
Message-ID: <78BFA69F-C631-4190-9F97-6B6ECC7AE15B@gmail.com>

You don't have BerkeleyDB installed on your system, so BioPerl is trying to fall back to another index format that has issues on network-mounted file systems. You can try to install BerkeleyDB and then the related Perl module (https://metacpan.org/pod/BerkeleyDB). You would then need to reinstall BioPerl and MAKER. You can also try running on a single CPU until indexing finishes, then launch MAKER. That might be enough to get around any early race conditions.

--Carson

> On Apr 26, 2020, at 12:58 AM, Xu, taosheng wrote:
>
> Hello,
> I am using a computer cluster with 20 nodes (40 CPUs per node) for gene annotation. I submit my maker task to one node with 40 CPUs using openMPI. Everything is well.
> But I encounter the problem when submitting the same maker task to the cluster with multiple nodes (120 CPUs). There are errors shown below.
> I would also appreciate any advice. Thank you.
>
> Best regards,
> Taosheng
>
> STATUS: Processing and indexing input FASTA files...
> cannot remove directory for home/20200425/genome.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10//.dbtmp0: No such file or directory at /maker/bin/../lib/FastaDB.pm line 145.
> cannot remove directory for /home/20200425/genome.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10//.dbtmp0: Directory not empty at /maker/bin/../lib/FastaDB.pm line 145.
> cannot remove directory for /home/20200425/genome.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10//.dbtmp0: Directory not empty at /maker/bin/../lib/FastaDB.pm line 145. > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 26 12:48:14 2020 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 26 May 2020 12:48:14 -0600 Subject: [maker-devel] Missing genes in lift-over with est2genome In-Reply-To: References: <373413EA-9D4C-44CF-AA51-632C0F54B7AC@gmail.com> Message-ID: If using the est_forward=1 options for the leftover, you can also anchor a search to a specific contig or region by adding a tag to the fasta header ( maker_coor=contig:1-1000; ). The tag will force Exonerate to only run on that region. Sometimes that can rescue a model. When you pass results into model_gff=, it will leave them unchanged. It just accepts or rejects them as is. But the model itself is considered evidence, and can alter clustering. Other_gff= just passes things through with no processing or evaluation (it?s like cut and paste). You can also try deFusion on result models for resolving gene fusions ?> https://wjidea.github.io/defusion/Introduction.html ?Carson > On Apr 30, 2020, at 6:58 AM, Lior Glick wrote: > > Thanks Carson - your answer was very helpful. > Another question related to the lift-over process, if I may. > I want to take the resulting gff and pass it on to another MAKER run, where I provide further, lower confidence evidence (ESTs and proteins). I'm not sure which option to use though. According to this helpful post , I tried using pred_gff and model_gff, but both created cases of fusion genes when genes are very adjacent to one another (see attached picture), even with the correct_est_fusion parameter enabled. 
> It looks like the only way to take lifted-over genes "as-is" would be to use other_gff, but I figure that this was not really intended for genes. Would you recommend this usage? Am I missing something?
> Thank you!
>
> On Apr 23, 2020, at 20:43, Carson Holt wrote:
> There are percent cutoffs for the est2genome algorithm you can set in the maker_bopts.ctl file. Additionally, maker will give the alignment but not produce a gene model if it can't translate through the est2genome alignment (i.e. stop codons in the assembly). I believe the cutoff is 50%. If you add est_forward=1 to the maker_opts.ctl file, names will be copied from the alignment source and the score column in the GFF3 will be the percent match to the original transcript.
>
> --Carson
>
> > On Apr 21, 2020, at 7:08 AM, Lior Glick wrote:
> >
> > Hello,
> > I am using MAKER to annotate a plant genome assembly. A high-quality reference genome and annotation exists for another variety of the same species, so my first step is lifting over reference genes to my genome. I do this by setting est2genome = 1 and providing MAKER with the reference cDNA (transcriptome). No other evidence is provided and no prediction is performed. Repeat masking is done using the reference repeats library.
> > When checking the results, I found out lots of reference genes missing from the lift-over result. However, if I blast the sequences of these genes myself, I get good matches. I even see these matches when I look at the blast results buried in the MAKER data_store.
> > For example, a transcript of length 1077 got a match of length 855 - 100% identity and no gaps. Bitscore was 1709 and E-value 0. This looks like a pretty good match, but it is not found in the final MAKER results (gff/fasta).
> > Why is this happening? Are there some cutoffs that are not satisfied? If so, what are they and how can they be configured?
> >
> > Thanks,
> > Lior
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at yandell-lab.org
> > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue May 26 12:51:52 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 12:51:52 -0600
Subject: [maker-devel] Multiple UTR ?
In-Reply-To: <16630d833d1448e7a771a4f2b19b0476@unil.ch>
References: <16630d833d1448e7a771a4f2b19b0476@unil.ch>
Message-ID: <003DC610-63E0-4D04-8CD2-431C92337C5F@gmail.com>

The UTR is split across two exons. The intron is not considered part of the UTR. The UTR exists in the post-splicing mRNA, so the corresponding genomic coordinates will have gaps, because the introns that exist in the genome have been spliced out of the mRNA. So while the UTR is continuous in the mRNA, it is punctuated by introns in the genome.

--Carson

> On May 3, 2020, at 5:39 AM, Patrick Tran Van wrote:
>
> Hi Carson,
>
> for instance, if have this:
>
> SCFXX maker five_prime_UTR 5164370 5164715 . - . ID=GENE-RA:five_prime_utr;Parent=GENE-RA;
> SCFXX maker five_prime_UTR 5156091 5156136 . - . ID=GENE-RA:five_prime_utr;Parent=GENE-RA;
>
> Does it mean that real coordinate of the 5' UTR is from 5156091 to 5164715 ?
>
> Patrick Tran Van
>
> Bioinformatician: Lab Chapuisat & Schwander
> Department of Ecology and Evolution
> University of Lausanne
> Lausanne - Switzerland
> Office 3206
> From: Carson Holt
> Sent: Wednesday, February 26, 2020 8:27:43 PM
> To: Patrick Tran Van
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Multiple UTR ?
>
> Sorry for the very slow reply. I found this way way down in my inbox.
>
> The UTR features are the parts of the exons that are not CDS. So multiple UTR means it spans multiple exons, and must be assembled to generate the full UTR in a browser.
> Any exon that is fully non-coding will produce a UTR feature that mirrors the exon's coordinates, and if it's partially coding, the UTR will share the same start or end but will terminate somewhere in the middle, with a CDS filling up the remaining coordinates. The UTR and CDS features get tiled over the top of the exon features when assembling a gene model.
>
> --Carson
>
>> On Dec 18, 2019, at 7:19 AM, Patrick Tran Van wrote:
>>
>> Hi Carson,
>>
>> I have seen something strange in my annotation: multiple UTR. How can we explain this ? Thanks!
>>
>> Scaffold maker mRNA 12117462 12128433 . - . ID=GENE_02395-RA;Parent=GENE_02395;Name=GENE_02395-RA;Alias=maker-Scaffold-augustus-gene-40.12-mRNA-3;_AED=0.02;_QI=5383|1|1|1|0.88|0.9|10|247|238;_eAED=0.02;Note=Protein of unknown function;
>> Scaffold maker exon 12128112 12128433 . - . ID=GENE_02395-RA:exon:571;Parent=GENE_02395-RA;
>> Scaffold maker exon 12117462 12118046 . - . ID=GENE_02395-RB:exon:569;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
>> Scaffold maker exon 12118141 12118301 . - . ID=GENE_02395-RB:exon:568;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
>> Scaffold maker exon 12118386 12118539 . - . ID=GENE_02395-RB:exon:567;Parent=GENE_02395-RB,GENE_02395-RA;
>> Scaffold maker exon 12118818 12122493 . - . ID=GENE_02395-RB:exon:566;Parent=GENE_02395-RB,GENE_02395-RA;
>> Scaffold maker exon 12123591 12123893 . - . ID=GENE_02395-RB:exon:565;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
>> Scaffold maker exon 12123995 12124303 . - . ID=GENE_02395-RB:exon:564;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
>> Scaffold maker exon 12125119 12125418 . - . ID=GENE_02395-RB:exon:563;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
>> Scaffold maker exon 12126005 12126313 . - . ID=GENE_02395-RB:exon:562;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
>> Scaffold maker exon 12127460 12127687 . - . ID=GENE_02395-RB:exon:561;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
>> Scaffold maker five_prime_UTR 12128112 12128433 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
>> Scaffold maker five_prime_UTR 12127460 12127687 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
>> Scaffold maker five_prime_UTR 12126005 12126313 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
>> Scaffold maker five_prime_UTR 12125119 12125418 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
>> Scaffold maker five_prime_UTR 12123995 12124303 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
>> Scaffold maker five_prime_UTR 12123591 12123893 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
>> Scaffold maker five_prime_UTR 12118882 12122493 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
>> Scaffold maker CDS 12118818 12118881 . - 0 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
>> Scaffold maker CDS 12118386 12118539 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
>> Scaffold maker CDS 12118141 12118301 . - 1 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
>> Scaffold maker CDS 12117709 12118046 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
>> Scaffold maker three_prime_UTR 12117462 12117708 . - . ID=GENE_02395-RA:three_prime_utr;Parent=GENE_02395-RA;
>>
>> Patrick Tran Van
>>
>> Bioinformatician: Lab Chapuisat & Schwander
>> Department of Ecology and Evolution
>> University of Lausanne
>> Lausanne - Switzerland
>> Office 3206
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at yandell-lab.org
>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From niconm89 at gmail.com Tue May 26 12:58:22 2020
From: niconm89 at gmail.com (Nicolás Moreyra)
Date: Tue, 26 May 2020 15:58:22 -0300
Subject: [maker-devel] different number of annotated genes and transcripts
In-Reply-To:
References:
Message-ID:

Hi Carson, thanks for your reply.

Yes, I did the same as you. Here are different outputs for the same annotation file:

> grep -c -P "\tgene\t" Dato_struct-annot.noseq.gff
> 17688
> grep -c -P "RNA\t" Dato_struct-annot.noseq.gff
> 17688
> grep -c -P "mRNA\t" Dato_struct-annot.noseq.gff
> 17205
> grep -P "RNA\t" Dato_struct-annot.noseq.gff| cut -f3 | sort -u
> mRNA
> tRNA

After using a tool to extract transcript sequences into a FASTA file, I obtained 17205 sequences. Looking for the genes without an associated transcript, it seems that only tRNAs are annotated there. It is odd:

> Backbone_23 maker gene 486041 486112 . - . ID=Dato03103;Name=Dato03103;Alias=trnascan-Backbone_23-noncoding-Glu_CTC-gene-4.38;
> Backbone_23 maker tRNA 486041 486112 . - . ID=Dato03103-RA;Parent=Dato03103;Name=Dato03103-RA;_AED=1.00;_QI=0|-1|0|0|-1|0|1|73|0;_eAED=1.00;
> Backbone_23 maker exon 486041 486112 . - . ID=Dato03103-RA:exon:45875;Parent=Dato03103-RA;

The AED is bad in this example, so I'm thinking it is possible that this gene had no evidence supporting it. I also do not understand the "Alias" on the gene line; it looks like tRNAscan detected the gene. Any ideas?
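
As a cross-check on the grep counts above, here is a minimal Python sketch (not from the original thread) that tallies features by the third GFF3 column. The function name is illustrative, and the example rows are copied from the tRNA example quoted in this message with tabs restored, on the assumption that the file is standard tab-delimited GFF3.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Count GFF3 features by the value in column 3 (the feature type)."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a well-formed nine-column feature line
        counts[cols[2]] += 1
    return counts


# Rows taken from the tRNA example in this message (tabs restored).
example = [
    "##gff-version 3",
    "Backbone_23\tmaker\tgene\t486041\t486112\t.\t-\t.\tID=Dato03103;Name=Dato03103;",
    "Backbone_23\tmaker\ttRNA\t486041\t486112\t.\t-\t.\tID=Dato03103-RA;Parent=Dato03103;",
    "Backbone_23\tmaker\texon\t486041\t486112\t.\t-\t.\tID=Dato03103-RA:exon:45875;Parent=Dato03103-RA;",
]

counts = count_feature_types(example)
for feature_type, n in sorted(counts.items()):
    print(feature_type, n)
```

On a full annotation, comparing the gene count with the sum of the mRNA and tRNA counts would show whether the tRNA genes account for the whole difference between genes and mRNA transcripts.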

Nicolás

--
Nicolas Nahuel Moreyra
BSc/MSc in Bioinformatics
CONICET PhD Fellow @ IEGEBA
PhD Student in Comparative Genomics @ EGE (FCEyN - UBA) -> nmoreyra at ege.fcen.uba.ar
Professor of Bioinformatics @ Favaloro University
Professor of Informatics @ IFTS N° 7
Argentina

On Tue, May 26, 2020 at 14:54, Carson Holt (carsonhh at gmail.com) wrote:

> Perhaps you are counting wrong. If you want to know the number of genes,
> you must look at the GFF3. You can use 'grep -c -P "\tgene\t" file.gff';
> the number of transcripts would then be 'grep -c -P "RNA\t" file.gff'.
>
> Note that if you are using things like tRNAscan, you will get tRNA
> transcripts and associated genes. If you are trying to count from the
> fasta files, make sure you use the right file (maker.proteins.fasta and
> maker.transcripts.fasta).
>
> Thanks,
> Carson
>
>
> On May 21, 2020, at 7:58 AM, Nicolás Moreyra wrote:
>
> Dear all,
>
> First of all, thank you for sharing your experiences here. I tried to find
> this issue in the posts already made but failed.
> Secondly, I am sorry for asking you a silly question (I think), but after
> I complete the genome annotation of four species, I obtained fewer
> transcripts than genes. I do not understand why MAKER annotated genes
> unable to transcribe.
> I was trying to find the reason for this issue to discuss it in my thesis
> but I am a bit lost. Has this happened to anyone? Is there any possible
> cause that comes to mind?
>
> Thanks in advance.
>
> Nicolás
>
> --
> Nicolas Nahuel Moreyra
> BSc/MSc in Bioinformatics
> CONICET PhD Fellow @ IEGEBA
> PhD Student in Comparative Genomics @ EGE (FCEyN - UBA) -> nmoreyra at ege.fcen.uba.ar
> Professor of Bioinformatics @ Favaloro University
> Professor of Informatics @ IFTS N° 7
> Argentina
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue May 26 13:01:38 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 13:01:38 -0600
Subject: [maker-devel] Unable to reproduce MAKER blastn results
In-Reply-To:
References:
Message-ID: <39657402-9B41-4B82-8164-199ECCA634AC@gmail.com>

The only downstream change to the blast results would be the removal of HSPs that do not meet the bit_blastn minimum bitscore. Also note that pcov is not a BLAST parameter; it is a post-BLAST filter. The HSPs are tiled and flattened, then the percent coverage against the original query is calculated (i.e. if every base of the query is represented at least once in the result, then coverage is 100%).

The blast results are used only for identifying the rough region a model overlaps, which is then passed to Exonerate. The Exonerate alignment is used to generate the splice-aware est2genome model. Many good blastn alignments will produce poor Exonerate alignments, and no est2genome results.

--Carson

> On May 5, 2020, at 3:46 AM, Lior Glick wrote:
>
> Hello,
>
> I am running MAKER 2.31.10 with a very simple configuration, with only EST evidence and the est2genome option enabled (basically a lift-over procedure).
> I noticed that some of my transcripts are not included in the annotation output and when I looked at the blastn results the reason was clear - they do not pass the coverage cutoff defined in maker_bopts.ctl. Interestingly, when I tried running blastn myself, using the same command (taken from the maker log) and the same blastn version, I got slightly different results. Specifically, for some of the transcripts the MAKER blastn run produced less HSPs than my blastn run, resulting in a lower total coverage.
> The additional HSPs seem to have good % identity and E-values, so I don't understand why and how they are discarded. Are the blastn results changed by MAKER in subsequent steps (after the blastn run)?
> Please find attached blastn results from MAKER run and from my run. You can look at transcript AT1G01740.3 as an example. in my.blastn, there are 8 HSPs, while MAKER.blastn only has 3 of them.
> Can you explain the difference? Maybe it has to do with repeat masking or other processing of the genome sequence?
>
> Just to make sure you have all the details:
> Relevant maker_bopts parameters:
> pcov_blastn=0.7 #Blastn Percent Coverage Threhold EST-Genome Alignments
> pid_blastn=0.85 #Blastn Percent Identity Threshold EST-Genome Aligments
> eval_blastn=1e-10 #Blastn eval cutoff
> bit_blastn=40 #Blastn bit cutoff
> depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff)
>
> Blastn command:
> blastn -db /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/TAIR10_longest_trans%2Efasta.mpi.10.0 -query /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/0/chunk00.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -word_size 28 -reward 1 -penalty -5 -gapopen 5 -gapextend 5 -dbsize 1000 -searchsp 500000000 -num_threads 10 -lcase_masking -dust yes -soft_masking true -show_gis -out
>
> Thank you!
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Tue May 26 13:03:31 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 13:03:31 -0600
Subject: [maker-devel] Question about maker. Maker2 failed
In-Reply-To:
References:
Message-ID: <2F9626D1-F7C2-4BDA-A59F-C6566C0D558D@gmail.com>

It is probably the formatting of the features provided; something is wrong with them. For rm_gff they must be two-level match/match_part features. You can send us the file, and I can take a look if it helps.

--Carson

> On May 5, 2020, at 2:41 PM, 祖尧刘 wrote:
>
> Hi maker developer,
>
> I'm using maker 2 to annotate a vertebrate genome.
> When I try to provide rm_gff file, it always fails.
> Here is log:
> Now starting the contig!!
> SeqID: chr_XXII
> Length: 12689475
> #---------------------------------------------------------------------
>
>
> setting up GFF3 output and fasta chunks
> doing repeat masking
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Did not specify a Hit End or Hit Begin
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Root/Root.pm:449
> STACK: Bio::Search::HSP::GenericHSP::_subject_seq_feature /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:1604
> STACK: Bio::Search::HSP::GenericHSP::hit /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:988
> STACK: repeat_mask_seq::separate_types /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/repeat_mask_seq.pm:307
> STACK: repeat_mask_seq::mask_chunk /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/repeat_mask_seq.pm:191
> STACK: Process::MpiChunk::_go /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:763
> STACK: Process::MpiChunk::run /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:341
> STACK: Process::MpiChunk::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:357
> STACK: Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287
> STACK: Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287
> STACK: /home/ubelix/iee/zl19g775/miniconda3/envs/maker/bin/maker:689
> -----------------------------------------------------------
> --> rank=NA, hostname=submit02.ubelix.unibe.ch
> ERROR: Failed while doing repeat masking
> ERROR: Chunk failed at level:0, tier_type:1
> FAILED CONTIG:chr_XXII
>
> ERROR: Chunk failed at level:2, tier_type:0
> FAILED CONTIG:chr_XXII
>
> examining contents of the fasta file and run log
>
>
>
> I also searched the google group and tried update my bioperl to 1.7.7 the latest version, but it didn't help.
>
> Could you please help me?
>
> Thanks a lot.
>
> Zuyao
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue May 26 13:15:20 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 13:15:20 -0600
Subject: [maker-devel] gene:multiple_Einit and overlaps_prev_exon errors in first round of SNAP training
In-Reply-To:
References:
Message-ID: <92F27280-90B8-4AF2-8C1D-B54955A7C521@gmail.com>

You have an ID collision in the GFF3. Check the GFF3 being sent to maker2zff. If you are using GFF3 as input to MAKER, you likely have non-unique IDs there that are causing the issue in the first place.

--Carson

> On May 6, 2020, at 11:36 AM, Carneiro,Celine M wrote:
>
> Hello,
>
> I am getting the errors gene:multiple_Einit, gene:multiple_Eterm, and exon:overlaps_prev_exon, at just about every gene model. I've ran the first round of maker on a bird genome I'm annotating with no errors and have started the steps to train SNAP. However, after running fathom -categorize, just about every single gene model has the same set of errors.
Here is an example from my log file after running fathom -categorize: > > MODEL117 1 1 8 - errors(6): gene:multiple_Einit gene:multiple_Eterm exon-7:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL851 1 1 100 - errors(78): gene:multiple_Einit gene:multiple_Eterm exon-99:overlaps_prev_exon exon-98:overlaps_prev_exon exon-97:overlaps_prev_exon exon-95:overlaps_prev_exon exon-94:overlaps_prev_exon exon-93:overlaps_prev_exon exon-91:overlaps_prev_exon exon-90:overlaps_prev_exon exon-89:overlaps_prev_exon exon-87:overlaps_prev_exon exon-86:overlaps_prev_exon exon-85:overlaps_prev_exon exon-83:overlaps_prev_exon exon-82:overlaps_prev_exon exon-81:overlaps_prev_exon exon-79:overlaps_prev_exon exon-78:overlaps_prev_exon exon-77:overlaps_prev_exon exon-75:overlaps_prev_exon exon-74:overlaps_prev_exon exon-73:overlaps_prev_exon exon-71:overlaps_prev_exon exon-70:overlaps_prev_exon exon-69:overlaps_prev_exon exon-67:overlaps_prev_exon exon-66:overlaps_prev_exon exon-65:overlaps_prev_exon exon-63:overlaps_prev_exon exon-62:overlaps_prev_exon exon-61:overlaps_prev_exon exon-59:overlaps_prev_exon exon-58:overlaps_prev_exon exon-57:overlaps_prev_exon exon-55:overlaps_prev_exon exon-54:overlaps_prev_exon exon-53:overlaps_prev_exon exon-51:overlaps_prev_exon exon-50:overlaps_prev_exon exon-49:overlaps_prev_exon exon-48:overlaps_prev_exon exon-47:overlaps_prev_exon exon-46:overlaps_prev_exon exon-45:overlaps_prev_exon exon-43:overlaps_prev_exon exon-42:overlaps_prev_exon exon-41:overlaps_prev_exon exon-39:overlaps_prev_exon exon-38:overlaps_prev_exon exon-37:overlaps_prev_exon exon-35:overlaps_prev_exon exon-34:overlaps_prev_exon exon-33:overlaps_prev_exon exon-31:overlaps_prev_exon exon-30:overlaps_prev_exon exon-29:overlaps_prev_exon exon-27:overlaps_prev_exon exon-26:overlaps_prev_exon exon-25:overlaps_prev_exon exon-23:overlaps_prev_exon exon-22:overlaps_prev_exon exon-21:overlaps_prev_exon exon-19:overlaps_prev_exon 
exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-14:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-10:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-2:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL190 1 1 39 + errors(35): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-3:overlaps_prev_exon exon-4:overlaps_prev_exon exon-5:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-9:overlaps_prev_exon exon-11:overlaps_prev_exon exon-12:overlaps_prev_exon exon-13:overlaps_prev_exon exon-14:overlaps_prev_exon exon-15:overlaps_prev_exon exon-16:overlaps_prev_exon exon-17:overlaps_prev_exon exon-18:overlaps_prev_exon exon-20:overlaps_prev_exon exon-21:overlaps_prev_exon exon-22:overlaps_prev_exon exon-23:overlaps_prev_exon exon-24:overlaps_prev_exon exon-25:overlaps_prev_exon exon-26:overlaps_prev_exon exon-27:overlaps_prev_exon exon-29:overlaps_prev_exon exon-30:overlaps_prev_exon exon-32:overlaps_prev_exon exon-33:overlaps_prev_exon exon-34:overlaps_prev_exon exon-35:overlaps_prev_exon exon-36:overlaps_prev_exon exon-38:overlaps_prev_exon exon-39:overlaps_prev_exon > MODEL424 1 1 10 - errors(8): gene:multiple_Einit gene:multiple_Eterm exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL902 1 1 20 - errors(14): gene:multiple_Einit gene:multiple_Eterm exon-19:overlaps_prev_exon exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL238 1 1 14 - errors(11): gene:multiple_Einit 
gene:multiple_Eterm exon-13:overlaps_prev_exon exon-12:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL39 1 1 6 - errors(1): exon-3:overlaps_prev_exon > MODEL119 1 1 10 + errors(8): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-4:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-10:overlaps_prev_exon > > Furthermore, I checked my genome.ann file and noticed that my Einit and Exon sites are duplicated. For example: > > >ScdimlH_1004;HRSCAF=1084 > Einit 38730 38677 MODEL851 > Exon 38255 38178 MODEL851 > Exon 38074 38021 MODEL851 > Exon 24755 24717 MODEL851 > Exon 24213 24149 MODEL851 > Exon 23176 23098 MODEL851 > Exon 22037 21961 MODEL851 > Exon 21269 21080 MODEL851 > Exon 20232 20167 MODEL851 > Exon 19742 19704 MODEL851 > Exon 14705 14590 MODEL851 > Exon 14255 13980 MODEL851 > Exon 14169 13980 MODEL851 > Exon 13303 13223 MODEL851 > Exon 13303 13223 MODEL851 > Exon 12782 12639 MODEL851 > Exon 12782 12639 MODEL851 > Exon 5761 5592 MODEL851 > Exon 5482 5404 MODEL851 > Exon 5140 5064 MODEL851 > Exon 4951 4750 MODEL851 > Exon 4567 4502 MODEL851 > Exon 4256 4185 MODEL851 > Exon 3569 3403 MODEL851 > Exon 3157 3076 MODEL851 > Exon 2936 2800 MODEL851 > Eterm 2186 2000 MODEL851 > Einit 38730 38677 MODEL851 > Exon 38255 38178 MODEL851 > Exon 38074 38021 MODEL851 > Exon 24755 24717 MODEL851 > Exon 24213 24149 MODEL851 > Exon 23176 23098 MODEL851 > Exon 22037 21961 MODEL851 > Exon 21269 21080 MODEL851 > Exon 20232 20167 MODEL851 > Exon 19742 19704 MODEL851 > Exon 14705 14590 MODEL851 > Exon 14255 13980 MODEL851 > Exon 14169 13980 MODEL851 > Exon 13303 13223 MODEL851 > Exon 13303 13223 MODEL851 > Exon 12782 12639 MODEL851 > Exon 12782 12639 MODEL851 > Exon 5761 5592 MODEL851 > Exon 5482 5404 MODEL851 > Exon 5140 5064 MODEL851 > Exon 4951 4750 
MODEL851
> Exon 4567 4502 MODEL851
> Exon 4256 4185 MODEL851
> Exon 3569 3403 MODEL851
> Exon 3157 3076 MODEL851
> Exon 2936 2800 MODEL851
> Eterm 2186 2000 MODEL851
>
> Any ideas why I'm seeing this duplication? Lastly, any ideas why my exons are overlapping so much? I appreciate any input and please let me know if you require any more information.
>
> Thank you!
>
> Celine
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue May 26 13:26:03 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 13:26:03 -0600
Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question
In-Reply-To:
References:
Message-ID: <07D0AA31-0FCF-4E60-AC50-1E377CBF870C@gmail.com>

> 2. If I was to improve the annotation of my 3' UTRs within a certain (non-model species) gff3, is there a particular way or a protocol to follow? I was thinking for example that Lexogen has released their 3' UTR kit for RNA-seq of the three prime end of transcripts. Would it be possible to feed those reads to maker and somehow suggest that the reads are originating from the three-prime end so that this info is then passed in the gff3 file?

You could pass the final models back in as pred_gff (with no UTR on the models), then pass in just the evidence you want used for UTR as est_gff (it would have to be assembled, not individual reads). As long as they overlap the pred_gff models, MAKER would try to make UTR out of them. Might be worth an experiment.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shawn.trojahn at wsu.edu Fri May 29 15:49:38 2020
From: shawn.trojahn at wsu.edu (Trojahn, Shawn Michael)
Date: Fri, 29 May 2020 21:49:38 +0000
Subject: [maker-devel] Intron lengths below minimum cutoff
Message-ID:

Hello,

I have been having a problem with the final annotation coming from Maker2 where I have a few thousand introns that are below the minimum intron value I have set. Most of the exons around these problem introns have no support in the final merged gff file, but a few are supported by blast hits. Is there a reason why these introns would remain in the final gff?

Thanks,
Shawn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From liorglic at mail.tau.ac.il Tue May 5 03:46:01 2020
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Tue, 5 May 2020 12:46:01 +0300
Subject: [maker-devel] Unable to reproduce MAKER blastn results
Message-ID:

Hello,

I am running MAKER 2.31.10 with a very simple configuration, with only EST evidence and the est2genome option enabled (basically a lift-over procedure).
I noticed that some of my transcripts are not included in the annotation output, and when I looked at the blastn results the reason was clear: they do not pass the coverage cutoff defined in maker_bopts.ctl.

Interestingly, when I ran blastn myself, using the same command (taken from the MAKER log) and the same blastn version, I got slightly different results. Specifically, for some of the transcripts the MAKER blastn run produced fewer HSPs than my blastn run, resulting in a lower total coverage. The additional HSPs seem to have good % identity and E-values, so I don't understand why and how they are discarded. Are the blastn results changed by MAKER in subsequent steps (after the blastn run)?

Please find attached the blastn results from the MAKER run and from my run. You can look at transcript AT1G01740.3 as an example. In my.blastn there are 8 HSPs, while MAKER.blastn has only 3 of them. Can you explain the difference? Maybe it has to do with repeat masking or other processing of the genome sequence?
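
To make the coverage reasoning concrete, here is a minimal sketch of how dropping HSPs can push a transcript under the cutoff. It assumes that coverage means the fraction of query bases covered by the union of the HSP query intervals; how MAKER actually computes it is not shown in this thread, so treat that definition as an assumption. The HSP coordinates and the query length of 1000 are hypothetical and are not taken from the attached files; only the 0.7 threshold comes from the pcov_blastn setting.

```python
def query_coverage(hsps, query_len):
    """Return the fraction of query bases covered by at least one HSP.

    hsps: iterable of (start, end) query coordinates, 1-based and inclusive.
    Overlapping or adjacent HSPs are merged so no base is counted twice.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted((min(s, e), max(s, e)) for s, e in hsps):
        if cur_end is None or start > cur_end + 1:
            # Gap before this HSP: close the previous merged interval.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / query_len


PCOV_BLASTN = 0.7  # threshold quoted from maker_bopts.ctl in this message

# Hypothetical HSP sets for a 1000 bp transcript (illustration only).
three_hsps = [(1, 200), (150, 400), (900, 1000)]
eight_hsps = three_hsps + [(401, 520), (521, 640), (641, 760),
                           (761, 899), (950, 1000)]

for label, hsps in (("3 HSPs", three_hsps), ("8 HSPs", eight_hsps)):
    cov = query_coverage(hsps, 1000)
    print(f"{label}: coverage={cov:.3f} passes={cov >= PCOV_BLASTN}")
```

Under these assumptions the same transcript passes with all eight HSPs and fails with three, which is the pattern described above.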

Just to make sure you have all the details:

Relevant maker_bopts parameters:
pcov_blastn=0.7 #Blastn Percent Coverage Threhold EST-Genome Alignments
pid_blastn=0.85 #Blastn Percent Identity Threshold EST-Genome Aligments
eval_blastn=1e-10 #Blastn eval cutoff
bit_blastn=40 #Blastn bit cutoff
depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff)

Blastn command:
blastn -db /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/TAIR10_longest_trans%2Efasta.mpi.10.0 -query /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/0/chunk00.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -word_size 28 -reward 1 -penalty -5 -gapopen 5 -gapextend 5 -dbsize 1000 -searchsp 500000000 -num_threads 10 -lcase_masking -dust yes -soft_masking true -show_gis -out

Thank you!

-------------- next part --------------
A non-text attachment was scrubbed...
Name: my.blastn
Type: application/octet-stream
Size: 440089 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MAKER.blastn
Type: application/octet-stream
Size: 1143935 bytes
Desc: not available

From zuyao.liu.0910 at gmail.com Tue May 5 14:41:25 2020
From: zuyao.liu.0910 at gmail.com (=?UTF-8?B?56WW5bCn5YiY?=)
Date: Tue, 5 May 2020 22:41:25 +0200
Subject: [maker-devel] Question about maker. Maker2 failed
Message-ID:

Hi maker developer,

I'm using maker 2 to annotate a vertebrate genome. When I try to provide an rm_gff file, it always fails. Here is the log:

Now starting the contig!!
SeqID: chr_XXII
Length: 12689475
#---------------------------------------------------------------------
setting up GFF3 output and fasta chunks
doing repeat masking
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Did not specify a Hit End or Hit Begin
STACK: Error::throw
STACK: Bio::Root::Root::throw /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Root/Root.pm:449
STACK: Bio::Search::HSP::GenericHSP::_subject_seq_feature /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:1604
STACK: Bio::Search::HSP::GenericHSP::hit /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:988
STACK: repeat_mask_seq::separate_types /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/repeat_mask_seq.pm:307
STACK: repeat_mask_seq::mask_chunk /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/repeat_mask_seq.pm:191
STACK: Process::MpiChunk::_go /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:763
STACK: Process::MpiChunk::run /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:341
STACK: Process::MpiChunk::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:357
STACK: Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287
STACK: Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287
STACK: /home/ubelix/iee/zl19g775/miniconda3/envs/maker/bin/maker:689
-----------------------------------------------------------
--> rank=NA, hostname=submit02.ubelix.unibe.ch
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:chr_XXII
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:chr_XXII
examining contents of the fasta file and run log

I also searched the Google group and tried updating my BioPerl to 1.7.7,
the latest version, but it didn't help. Could you please help me?

Thanks a lot.
Zuyao

From c143dad at ufl.edu Wed May 6 11:36:40 2020
From: c143dad at ufl.edu (Carneiro,Celine M)
Date: Wed, 6 May 2020 17:36:40 +0000
Subject: [maker-devel] gene:multiple_Einit and overlaps_prev_exon errors in first round of SNAP training
Message-ID:

Hello,

I am getting the errors gene:multiple_Einit, gene:multiple_Eterm, and exon:overlaps_prev_exon for just about every gene model. I've run the first round of MAKER on a bird genome I'm annotating with no errors and have started the steps to train SNAP. However, after running fathom -categorize, just about every single gene model has the same set of errors. Here is an example from my log file after running fathom -categorize:

MODEL117 1 1 8 - errors(6): gene:multiple_Einit gene:multiple_Eterm exon-7:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon
MODEL851 1 1 100 - errors(78): gene:multiple_Einit gene:multiple_Eterm exon-99:overlaps_prev_exon exon-98:overlaps_prev_exon exon-97:overlaps_prev_exon exon-95:overlaps_prev_exon exon-94:overlaps_prev_exon exon-93:overlaps_prev_exon exon-91:overlaps_prev_exon exon-90:overlaps_prev_exon exon-89:overlaps_prev_exon exon-87:overlaps_prev_exon exon-86:overlaps_prev_exon exon-85:overlaps_prev_exon exon-83:overlaps_prev_exon exon-82:overlaps_prev_exon exon-81:overlaps_prev_exon exon-79:overlaps_prev_exon exon-78:overlaps_prev_exon exon-77:overlaps_prev_exon exon-75:overlaps_prev_exon exon-74:overlaps_prev_exon exon-73:overlaps_prev_exon exon-71:overlaps_prev_exon exon-70:overlaps_prev_exon exon-69:overlaps_prev_exon exon-67:overlaps_prev_exon exon-66:overlaps_prev_exon exon-65:overlaps_prev_exon exon-63:overlaps_prev_exon exon-62:overlaps_prev_exon exon-61:overlaps_prev_exon exon-59:overlaps_prev_exon exon-58:overlaps_prev_exon exon-57:overlaps_prev_exon exon-55:overlaps_prev_exon exon-54:overlaps_prev_exon exon-53:overlaps_prev_exon exon-51:overlaps_prev_exon exon-50:overlaps_prev_exon exon-49:overlaps_prev_exon exon-48:overlaps_prev_exon exon-47:overlaps_prev_exon exon-46:overlaps_prev_exon exon-45:overlaps_prev_exon exon-43:overlaps_prev_exon exon-42:overlaps_prev_exon exon-41:overlaps_prev_exon exon-39:overlaps_prev_exon exon-38:overlaps_prev_exon exon-37:overlaps_prev_exon exon-35:overlaps_prev_exon exon-34:overlaps_prev_exon exon-33:overlaps_prev_exon exon-31:overlaps_prev_exon exon-30:overlaps_prev_exon exon-29:overlaps_prev_exon exon-27:overlaps_prev_exon exon-26:overlaps_prev_exon exon-25:overlaps_prev_exon exon-23:overlaps_prev_exon exon-22:overlaps_prev_exon exon-21:overlaps_prev_exon exon-19:overlaps_prev_exon exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-14:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-10:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-2:overlaps_prev_exon exon-1:overlaps_prev_exon
MODEL190 1 1 39 + errors(35): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-3:overlaps_prev_exon exon-4:overlaps_prev_exon exon-5:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-9:overlaps_prev_exon exon-11:overlaps_prev_exon exon-12:overlaps_prev_exon exon-13:overlaps_prev_exon exon-14:overlaps_prev_exon exon-15:overlaps_prev_exon exon-16:overlaps_prev_exon exon-17:overlaps_prev_exon exon-18:overlaps_prev_exon exon-20:overlaps_prev_exon exon-21:overlaps_prev_exon exon-22:overlaps_prev_exon exon-23:overlaps_prev_exon exon-24:overlaps_prev_exon exon-25:overlaps_prev_exon exon-26:overlaps_prev_exon exon-27:overlaps_prev_exon exon-29:overlaps_prev_exon exon-30:overlaps_prev_exon exon-32:overlaps_prev_exon exon-33:overlaps_prev_exon exon-34:overlaps_prev_exon exon-35:overlaps_prev_exon exon-36:overlaps_prev_exon exon-38:overlaps_prev_exon exon-39:overlaps_prev_exon
MODEL424 1 1 10 - errors(8): gene:multiple_Einit gene:multiple_Eterm exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon
MODEL902 1 1 20 - errors(14): gene:multiple_Einit gene:multiple_Eterm exon-19:overlaps_prev_exon exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon
MODEL238 1 1 14 - errors(11): gene:multiple_Einit gene:multiple_Eterm exon-13:overlaps_prev_exon exon-12:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon
MODEL39 1 1 6 - errors(1): exon-3:overlaps_prev_exon
MODEL119 1 1 10 + errors(8): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-4:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-10:overlaps_prev_exon

Furthermore, I checked my genome.ann file and noticed that my Einit and Exon sites are duplicated.
For example:

>ScdimlH_1004;HRSCAF=1084
Einit 38730 38677 MODEL851
Exon 38255 38178 MODEL851
Exon 38074 38021 MODEL851
Exon 24755 24717 MODEL851
Exon 24213 24149 MODEL851
Exon 23176 23098 MODEL851
Exon 22037 21961 MODEL851
Exon 21269 21080 MODEL851
Exon 20232 20167 MODEL851
Exon 19742 19704 MODEL851
Exon 14705 14590 MODEL851
Exon 14255 13980 MODEL851
Exon 14169 13980 MODEL851
Exon 13303 13223 MODEL851
Exon 13303 13223 MODEL851
Exon 12782 12639 MODEL851
Exon 12782 12639 MODEL851
Exon 5761 5592 MODEL851
Exon 5482 5404 MODEL851
Exon 5140 5064 MODEL851
Exon 4951 4750 MODEL851
Exon 4567 4502 MODEL851
Exon 4256 4185 MODEL851
Exon 3569 3403 MODEL851
Exon 3157 3076 MODEL851
Exon 2936 2800 MODEL851
Eterm 2186 2000 MODEL851
Einit 38730 38677 MODEL851
Exon 38255 38178 MODEL851
Exon 38074 38021 MODEL851
Exon 24755 24717 MODEL851
Exon 24213 24149 MODEL851
Exon 23176 23098 MODEL851
Exon 22037 21961 MODEL851
Exon 21269 21080 MODEL851
Exon 20232 20167 MODEL851
Exon 19742 19704 MODEL851
Exon 14705 14590 MODEL851
Exon 14255 13980 MODEL851
Exon 14169 13980 MODEL851
Exon 13303 13223 MODEL851
Exon 13303 13223 MODEL851
Exon 12782 12639 MODEL851
Exon 12782 12639 MODEL851
Exon 5761 5592 MODEL851
Exon 5482 5404 MODEL851
Exon 5140 5064 MODEL851
Exon 4951 4750 MODEL851
Exon 4567 4502 MODEL851
Exon 4256 4185 MODEL851
Exon 3569 3403 MODEL851
Exon 3157 3076 MODEL851
Exon 2936 2800 MODEL851
Eterm 2186 2000 MODEL851

Any ideas why I'm seeing this duplication? Lastly, any ideas why my exons are overlapping so much?

I appreciate any input, and please let me know if you require any more information.

Thank you!
Celine

From peruzzaluca at gmail.com Tue May 19 02:31:22 2020
From: peruzzaluca at gmail.com (Luca Peruzza)
Date: Tue, 19 May 2020 10:31:22 +0200
Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question
Message-ID:

Hi there,

I have two questions and I hope you can help me with them:

1. I have seen that MAKER version 3.01 is now out. Is there a change log available, so I can see what changed relative to the previous MAKER version and get a glimpse of the new features of this release?

2. If I wanted to improve the annotation of the 3' UTRs in a gff3 for a non-model species, is there a particular approach or protocol to follow? For example, Lexogen has released a 3' UTR kit for RNA-seq of the three-prime end of transcripts. Would it be possible to feed those reads to MAKER and somehow indicate that the reads originate from the three-prime end, so that this information is passed on to the gff3 file?

Thanks a lot in advance for your help.
Best,
Luca

From zuyao.liu.0910 at gmail.com Tue May 19 03:10:30 2020
From: zuyao.liu.0910 at gmail.com (=?UTF-8?B?56WW5bCn5YiY?=)
Date: Tue, 19 May 2020 11:10:30 +0200
Subject: [maker-devel] Question about maker. Maker2 failed
Message-ID:

Hi maker developers,

I'm using maker 2 to annotate a fish genome. When I try to provide an rm_gff file, it always fails. Here is the log:

collecting blastx repeatmasking
doing repeat masking
processing all repeats
deleted:0 hits
in cluster::shadow_cluster...
Died at /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
--> rank=23, hostname=hnode48
ERROR: Failed while processing all repeats
ERROR: Chunk failed at level:3, tier_type:1
FAILED CONTIG:chr_XXIII
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:chr_XXIII

I use maker 2.3.10 with repeatmasker 4.0.9. I saw that someone else got this error as well, and I followed the suggested solutions. I tried updating to blast 2.9.0, rmblast 2.9.0, and bioperl 1.7.7, and I also checked the rm gff file with a gff3 validator, but the error persists. Do you have any suggestions?

Thanks a lot for your help.

From jason.stajich at gmail.com Tue May 19 17:29:38 2020
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 19 May 2020 16:29:38 -0700
Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question
In-Reply-To:
References:
Message-ID:

Luca - I would suggest PASA as a tool for 3'UTR (and 5'UTR) improvement in gene annotation too.
https://github.com/PASApipeline/PASApipeline

Funannotate has a step that can be used to run PASA and update gene models, if you also want to start from an existing MAKER run:
https://funannotate.readthedocs.io/en/latest/

Jason

Jason Stajich
jason.stajich at gmail.com

From peruzzaluca at gmail.com Wed May 20 06:41:14 2020
From: peruzzaluca at gmail.com (Luca Peruzza)
Date: Wed, 20 May 2020 14:41:14 +0200
Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question
In-Reply-To:
References:
Message-ID: <4ABBF9F2-4F9E-4D7F-B821-3276D4D3EFD1@gmail.com>

Thanks Jason,

Yes, my idea was to add extra 3' UTR info to an existing MAKER gff3 file. If you say that funannotate can do it, I'll have a look.

Thanks,
Luca

From niconm89 at gmail.com Thu May 21 07:58:34 2020
From: niconm89 at gmail.com (=?UTF-8?Q?Nicol=C3=A1s_Moreyra?=)
Date: Thu, 21 May 2020 10:58:34 -0300
Subject: [maker-devel] different number of annotated genes and transcripts
Message-ID:

Dear all,

First of all, thank you for sharing your experiences here. I tried to find this issue in previous posts but failed.
Secondly, I am sorry for asking what may be a silly question, but after I completed the genome annotation of four species, I obtained fewer transcripts than genes. I do not understand why MAKER annotated genes that cannot be transcribed.
I was trying to find the reason for this so I can discuss it in my thesis, but I am a bit lost. Has this happened to anyone? Is there any possible cause that comes to mind?

Thanks in advance.

Nicolás

--
Nicolas Nahuel Moreyra
BSc/MSc in Bioinformatics
CONICET PhD Fellow @ IEGEBA
PhD Student in Comparative Genomics @ EGE (FCEyN - UBA) -> nmoreyra at ege.fcen.uba.ar
Professor of Bioinformatics @ Favaloro University
Professor of Informatics @ IFTS N° 7
Argentina

From yujin at genomics.cn Fri May 22 23:46:33 2020
From: yujin at genomics.cn (=?gb2312?B?0+C9+ChKaW4gWXUp?=)
Date: Sat, 23 May 2020 05:46:33 +0000
Subject: [maker-devel] maker error-ERROR: Failed while annotating transcripts
Message-ID:

Hi, dear developers.

I'm using maker-3.01.03 to annotate a plant genome, but I ran into this error:

Can't locate object method "add_entry" via package "1" (perhaps you forgot to load "1"?) at /vol2/liuyang_group/liuyang/software/maker-3.01.03/bin/../lib/Widget/snap.pm line 540.
ERROR: Failed while annotating transcripts

The attached file is the full STDERR from maker.
I have searched the archived mailing list and found a similar question (https://groups.google.com/forum/#!topic/maker-devel/fGGCKXhi6cw), but I didn't find any error which occurred before this one in the log.
I would appreciate it a lot if you could help me!

Best regards,
Jin Yu

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker.log
Type: application/octet-stream
Size: 3935522 bytes
Desc: maker.log

From zhoux233 at mail2.sysu.edu.cn Sun May 24 17:43:43 2020
From: zhoux233 at mail2.sysu.edu.cn (=?utf-8?B?5ZGo6ZGr?=)
Date: Mon, 25 May 2020 07:43:43 +0800
Subject: [maker-devel] Trouble in opening the registration page for Maker
Message-ID:

Hello, developers of Maker,

I'm a student from SYSU, China. Recently I wanted to download Maker from your website for my lab's annotation work, but for days I have had trouble opening the registration page, and I have not figured out why. I also failed to install maker with conda, so could you please tell me how to deal with it? Or could you please send me a copy of the source?
If it is convenient for you to send me a copy, here is my information:

Name: Zhou Xin
Email address: zhoux233 at mail2.sysu.edu.cn
Software needed: Maker
PI name: Huang ShengFeng
Research: Genome Annotation for zebrafish
Institute: Life Science School, Sun Yat-Sen University
Institute URL: http://lifesciences.sysu.edu.cn/
Country: China
Province: GuangDong
City: Guang Zhou

If anything else is needed, please email me and I will add it as soon as I see it.
Thank you very much for your attention. Any reply will be much appreciated!

Regards,
Zhou Xin

From carsonhh at gmail.com Tue May 26 11:54:45 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 11:54:45 -0600
Subject: [maker-devel] different number of annotated genes and transcripts
In-Reply-To:
References:
Message-ID:

Perhaps you are counting wrong. If you want to know the number of genes, you must look at the GFF3. You can use "grep -c -P '\tgene\t' file.gff"; the number of transcripts would then be "grep -c -P 'RNA\t' file.gff".

Note that if you are using things like tRNAscan, you will get tRNA transcripts and associated genes.

If you are trying to count from the fasta files, make sure you use the right files (maker.proteins.fasta and maker.transcripts.fasta).

Thanks,
Carson
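
The counting approach Carson describes in the reply above can also be sketched in Python, which makes it easy to restrict the match to the feature-type column. This is a minimal sketch, not part of MAKER: the function name count_feature_types and the example records are hypothetical, and it assumes a tab-delimited GFF3 whose third column holds the feature type. Counting every type that ends in "RNA" mirrors the 'RNA\t' grep pattern, so tRNA and other non-coding transcripts are included.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Count GFF3 records by feature type (column 3).

    Comment and directive lines are skipped, and parsing stops at a
    ##FASTA directive so embedded sequence is never counted.
    """
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9:
            counts[cols[2]] += 1
    return counts


# Hypothetical records, for illustration only.
example = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-RA;Parent=g1",
    "chr1\tmaker\tmRNA\t100\t800\t.\t+\t.\tID=g1-RB;Parent=g1",
    "chr1\tmaker\texon\t100\t300\t.\t+\t.\tID=g1-RA:exon:1;Parent=g1-RA,g1-RB",
    "chr1\tmaker\tgene\t2000\t2072\t.\t-\t.\tID=g2",
    "chr1\tmaker\ttRNA\t2000\t2072\t.\t-\t.\tID=g2-RA;Parent=g2",
    "##FASTA",
    ">chr1",
    "ACGTACGT",
]

counts = count_feature_types(example)
genes = counts["gene"]
# Any feature type ending in "RNA" (mRNA, tRNA, ...) counts as a transcript.
transcripts = sum(n for ftype, n in counts.items() if ftype.endswith("RNA"))
print(f"genes={genes} transcripts={transcripts}")
```

For a real file, the same function can be called as count_feature_types(open("file.gff")). Because one gene can carry several isoforms, the transcript count from the GFF3 is normally equal to or larger than the gene count.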

From carsonhh at gmail.com Tue May 26 12:10:16 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 12:10:16 -0600
Subject: [maker-devel] Maker 0 genes after SNAP or with proteins.gff
In-Reply-To: <84c5fc195df0fcc5e03484e65076fa9c@uni-duesseldorf.de>
References: <84c5fc195df0fcc5e03484e65076fa9c@uni-duesseldorf.de>
Message-ID: <053268A7-E5B0-4878-92F2-63B01869B677@gmail.com>

You would have to look at the alignments, but I suspect they do not align to the gene models in a way that supplies sufficient support for the annotation.

If it is the maker2zff script producing 0 genes, that is because it requires at least some EST evidence. You can change that using the command line options.

--Carson

> On Apr 24, 2020, at 8:27 AM, Ricardo Nuno Ferreira Martins Guerreiro wrote:
>
> Dear Makers list,
>
> I am struggling with Maker after many successful attempts. I don't understand why, but my final .gff does not contain any genes, 0.
>
> I am first running evidence-based modelling, with proteins only. Here I get around 40 thousand genes if I give the proteins as a fasta to align (if I provide a protein.gff from a previous maker try, I get 0 genes, same problem).
>
> Afterwards I'm creating a SNAP hmm and running maker again, setting protein2genome=0 and snaphmm=snap.hmm as you say, but now I have 0 genes. This happens whether I keep providing proteins as a fasta or as the .gff of a previous run.
>
> I have done this many times and it always worked. The only difference now is that I am using no ESTs whatsoever, only proteins. It's also strange that it works on the first round of maker but doesn't work on the SNAP rounds.
>
> Hope you can help,
> Ricardo

From carsonhh at gmail.com Tue May 26 12:14:28 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 12:14:28 -0600
Subject: [maker-devel] Problems with openMPI in multiple computing nodes
In-Reply-To:
References:
Message-ID: <78BFA69F-C631-4190-9F97-6B6ECC7AE15B@gmail.com>

You don't have BerkeleyDB installed on your system, so BioPerl is trying to fall back to another index format that has issues on network-mounted file systems. You can try to install BerkeleyDB and then the related Perl module (https://metacpan.org/pod/BerkeleyDB). You would then need to reinstall BioPerl and MAKER.

You can also try running on a single CPU until indexing finishes, then launch MAKER. That might be enough to get around any early race conditions.

--Carson

> On Apr 26, 2020, at 12:58 AM, Xu, taosheng wrote:
>
> Hello,
> I am using a computer cluster with 20 nodes (40 CPUs per node) for gene annotation. I submit my maker task to one node with 40 CPUs using openMPI. Everything is well.
> But I encounter a problem when submitting the same maker task to the cluster with multiple nodes (120 CPUs). The errors are shown below.
> I would also appreciate any advice. Thank you.
>
> Best regards,
> Taosheng
>
> STATUS: Processing and indexing input FASTA files...
> cannot remove directory for home/20200425/genome.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10//.dbtmp0: No such file or directory at /maker/bin/../lib/FastaDB.pm line 145.
> cannot remove directory for /home/20200425/genome.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10//.dbtmp0: Directory not empty at /maker/bin/../lib/FastaDB.pm line 145.
> cannot remove directory for /home/20200425/genome.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10//.dbtmp0: Directory not empty at /maker/bin/../lib/FastaDB.pm line 145.

From carsonhh at gmail.com Tue May 26 12:48:14 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 12:48:14 -0600
Subject: [maker-devel] Missing genes in lift-over with est2genome
In-Reply-To:
References: <373413EA-9D4C-44CF-AA51-632C0F54B7AC@gmail.com>
Message-ID:

If you are using the est_forward=1 option for the lift-over, you can also anchor a search to a specific contig or region by adding a tag to the fasta header ( maker_coor=contig:1-1000; ). The tag will force Exonerate to run only on that region. Sometimes that can rescue a model.

When you pass results into model_gff=, MAKER will leave them unchanged. It just accepts or rejects them as is. But the model itself is considered evidence, and can alter clustering. other_gff= just passes things through with no processing or evaluation (it's like cut and paste).

You can also try deFusion on the resulting models to resolve gene fusions -> https://wjidea.github.io/defusion/Introduction.html

--Carson

> On Apr 30, 2020, at 6:58 AM, Lior Glick wrote:
>
> Thanks Carson - your answer was very helpful.
> Another question related to the lift-over process, if I may.
> I want to take the resulting gff and pass it on to another MAKER run, where I provide further, lower confidence evidence (ESTs and proteins). I'm not sure which option to use though. According to this helpful post, I tried using pred_gff and model_gff, but both created cases of fusion genes when genes are very adjacent to one another (see attached picture), even with the correct_est_fusion parameter enabled. It looks like the only way to take lifted-over genes "as-is" would be to use other_gff, but I figure that this was not really intended for genes. Would you recommend this usage? Am I missing something?
> Thank you!
>
> On 23 Apr 2020 at 20:43, Carson Holt wrote:
> There are percent cutoffs for the est2genome algorithm that you can set in the maker_bopts.ctl file. Additionally, MAKER will give the alignment but not produce a gene model if it can't translate through the est2genome alignment (i.e. stop codons in the assembly). I believe the cutoff is 50%. If you add est_forward=1 to the maker_opts.ctl file, names will be copied from the alignment source and the score in the GFF3 column will be the percent match to the original transcript.
>
> --Carson
>
> > On Apr 21, 2020, at 7:08 AM, Lior Glick wrote:
> >
> > Hello,
> > I am using MAKER to annotate a plant genome assembly. A high-quality reference genome and annotation exists for another variety of the same species, so my first step is lifting over reference genes to my genome. I do this by setting est2genome = 1 and providing MAKER with the reference cDNA (transcriptome). No other evidence is provided and no prediction is performed. Repeat masking is done using the reference repeats library.
> > When checking the results, I found lots of reference genes missing from the lift-over result. However, if I blast the sequences of these genes myself, I get good matches. I even see these matches when I look at the blast results buried in the MAKER data_store.
> > For example, a transcript of length 1077 got a match of length 855 - 100% identity and no gaps. Bitscore was 1709 and E-value 0. This looks like a pretty good match, but it is not found in the final MAKER results (gff/fasta).
> > Why is this happening? Are there some cutoffs that are not satisfied? If so, what are they and how can they be configured?
> > > > Thanks, > > Lior > > _______________________________________________ > > maker-devel mailing list > > maker-devel at yandell-lab.org > > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 26 12:51:52 2020 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 26 May 2020 12:51:52 -0600 Subject: [maker-devel] Multiple UTR ? In-Reply-To: <16630d833d1448e7a771a4f2b19b0476@unil.ch> References: <16630d833d1448e7a771a4f2b19b0476@unil.ch> Message-ID: <003DC610-63E0-4D04-8CD2-431C92337C5F@gmail.com> The UTR is split across two exons. The intron is not considered part of the UTR. The UTR exists in the post splicing mRNA, so the corresponding genomic coordinates will have gaps because the introns that exist in the genome have been spliced out of the mRNA. So while the UTR is continuous in the mRNA, it is punctuated in the genome. ?Carson > On May 3, 2020, at 5:39 AM, Patrick Tran Van wrote: > > Hi Carson, > > for instance, if have this: > > SCFXX maker > five_prime_UTR 5164370 > 5164715 . > - . ID=GENE-RA:five_prime_utr;Parent=GENE-RA; > SCFXX maker > five_prime_UTR 5156091 > 5156136 . > - . ID=GENE-RA:five_prime_utr;Parent=GENE-RA; > > > Does it mean that real coordinate of the 5' UTR is from 5156091 to 5164715 ? > > Patrick Tran Van > > Bioinformatician: Lab Chapuisat & Schwander > Department of Ecology and Evolution > University of Lausanne > Lausanne - Switzerland > Office 3206 > From: Carson Holt > > Sent: Wednesday, February 26, 2020 8:27:43 PM > To: Patrick Tran Van > Cc: maker-devel at yandell-lab.org > Subject: Re: [maker-devel] Multiple UTR ? > > Sorry for the very slow reply. I found this way way down in my inbox. > > The UTR features are the parts of the exons that are not CDS. So multiple UTR, means it spans multiple exons, and must assembled to generate the full UTR in a browser. 
> Any exon that is fully non-coding will produce a UTR feature that mirrors the exon's coordinates, and if it's partially coding, the UTR will share the same start or end but will terminate somewhere in the middle, with a CDS filling up the remaining coordinates. The UTR and CDS features get tiled over the top of the exon features when assembling a gene model.
>
> --Carson
>
>> On Dec 18, 2019, at 7:19 AM, Patrick Tran Van wrote:
>>
>> Hi Carson,
>>
>> I have seen something strange in my annotation: multiple UTR. How can we explain this? Thanks!
>>
>> Scaffold maker mRNA 12117462 12128433 . - . ID=GENE_02395-RA;Parent=GENE_02395;Name=GENE_02395-RA;Alias=maker-Scaffold-augustus-gene-40.12-mRNA-3;_AED=0.02;_QI=5383|1|1|1|0.88|0.9|10|247|238;_eAED=0.02;Note=Protein of unknown function;
>> Scaffold maker exon 12128112 12128433 . - . ID=GENE_02395-RA:exon:571;Parent=GENE_02395-RA;
>> Scaffold maker exon 12117462 12118046 . - . ID=GENE_02395-RB:exon:569;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
>> Scaffold maker exon 12118141 12118301 . - . ID=GENE_02395-RB:exon:568;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
>> Scaffold maker exon 12118386 12118539 . - . ID=GENE_02395-RB:exon:567;Parent=GENE_02395-RB,GENE_02395-RA;
>> Scaffold maker exon 12118818 12122493 . - . ID=GENE_02395-RB:exon:566;Parent=GENE_02395-RB,GENE_02395-RA;
>> Scaffold maker exon 12123591 12123893 . - . ID=GENE_02395-RB:exon:565;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
>> Scaffold maker exon 12123995 12124303 . - . ID=GENE_02395-RB:exon:564;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
>> Scaffold maker exon 12125119 12125418 . - . ID=GENE_02395-RB:exon:563;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
>> Scaffold maker exon 12126005 12126313 . - . ID=GENE_02395-RB:exon:562;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
>> Scaffold maker exon 12127460 12127687 . - . ID=GENE_02395-RB:exon:561;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
>> Scaffold maker five_prime_UTR 12128112 12128433 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
>> Scaffold maker five_prime_UTR 12127460 12127687 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
>> Scaffold maker five_prime_UTR 12126005 12126313 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
>> Scaffold maker five_prime_UTR 12125119 12125418 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
>> Scaffold maker five_prime_UTR 12123995 12124303 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
>> Scaffold maker five_prime_UTR 12123591 12123893 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
>> Scaffold maker five_prime_UTR 12118882 12122493 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
>> Scaffold maker CDS 12118818 12118881 . - 0 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
>> Scaffold maker CDS 12118386 12118539 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
>> Scaffold maker CDS 12118141 12118301 . - 1 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
>> Scaffold maker CDS 12117709 12118046 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
>> Scaffold maker three_prime_UTR 12117462 12117708 . - . ID=GENE_02395-RA:three_prime_utr;Parent=GENE_02395-RA;
>>
>> Patrick Tran Van
>>
>> Bioinformatician: Lab Chapuisat & Schwander
>> Department of Ecology and Evolution
>> University of Lausanne
>> Lausanne - Switzerland
>> Office 3206

From niconm89 at gmail.com Tue May 26 12:58:22 2020
From: niconm89 at gmail.com (Nicolás Moreyra)
Date: Tue, 26 May 2020 15:58:22 -0300
Subject: [maker-devel] different number of annotated genes and transcripts
In-Reply-To:
References:
Message-ID:

Hi Carson, thanks for your reply.

Yes, I did the same as you. Here are different outputs for the same annotation file:

> grep -c -P "\tgene\t" Dato_struct-annot.noseq.gff
> 17688
> grep -c -P "RNA\t" Dato_struct-annot.noseq.gff
> 17688
> grep -c -P "mRNA\t" Dato_struct-annot.noseq.gff
> 17205
> grep -P "RNA\t" Dato_struct-annot.noseq.gff| cut -f3 | sort -u
> mRNA
> tRNA

After using a tool to extract transcript sequences into a FASTA file, I obtained 17205 sequences. Looking for those genes without an associated transcript, it seems that you can only find tRNAs annotated there. It is odd:

> Backbone_23 maker gene 486041 486112 . - . ID=Dato03103;Name=Dato03103;Alias=trnascan-Backbone_23-noncoding-Glu_CTC-gene-4.38;
> Backbone_23 maker tRNA 486041 486112 . - . ID=Dato03103-RA;Parent=Dato03103;Name=Dato03103-RA;_AED=1.00;_QI=0|-1|0|0|-1|0|1|73|0;_eAED=1.00;
> Backbone_23 maker exon 486041 486112 . - . ID=Dato03103-RA:exon:45875;Parent=Dato03103-RA;

The AED is bad in this example, so I'm thinking it is possible that this gene had no evidence supporting it. I do not understand the "Alias" on the gene line either; it looks like tRNAscan detected the gene. Any ideas?
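
The counts above can be reproduced without grep. The following is a minimal Python sketch, assuming a tab-separated GFF3 like the one in this message; the three sample lines are the tRNA gene quoted above, and the function name count_feature_types is only illustrative (it is not part of MAKER).

```python
from collections import Counter

# The three feature lines quoted in the message, with tab-separated GFF3 columns.
GFF3_LINES = [
    "Backbone_23\tmaker\tgene\t486041\t486112\t.\t-\t.\tID=Dato03103;Name=Dato03103;Alias=trnascan-Backbone_23-noncoding-Glu_CTC-gene-4.38;",
    "Backbone_23\tmaker\ttRNA\t486041\t486112\t.\t-\t.\tID=Dato03103-RA;Parent=Dato03103;Name=Dato03103-RA;_AED=1.00;_QI=0|-1|0|0|-1|0|1|73|0;_eAED=1.00;",
    "Backbone_23\tmaker\texon\t486041\t486112\t.\t-\t.\tID=Dato03103-RA:exon:45875;Parent=Dato03103-RA;",
]


def count_feature_types(lines):
    """Count GFF3 feature types (column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 9:  # only well-formed nine-column feature lines
            counts[columns[2]] += 1
    return counts


counts = count_feature_types(GFF3_LINES)
# Assumption: any feature type ending in "RNA" is a transcript,
# which mirrors the grep pattern "RNA\t" used in the thread.
transcripts = sum(n for kind, n in counts.items() if kind.endswith("RNA"))
print(dict(counts), "transcripts:", transcripts, "mRNA:", counts["mRNA"])
```

Run on a whole annotation file (for example by passing an open file handle instead of the list), this kind of count would show the gap between genes and mRNA features being made up of tRNA transcripts, which is consistent with the grep output above.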

Nicolás

--
Nicolas Nahuel Moreyra
BSc/MSc in Bioinformatics
CONICET PhD Fellow @ IEGEBA
PhD Student in Comparative Genomics @ EGE (FCEyN - UBA) -> nmoreyra at ege.fcen.uba.ar
Professor of Bioinformatics @ Favaloro University
Professor of Informatics @ IFTS N° 7
Argentina

On Tue, 26 May 2020 at 14:54, Carson Holt (carsonhh at gmail.com) wrote:

> Perhaps you are counting wrong. If you want to know the number of genes, you must look at the GFF3. You can use 'grep -c -P "\tgene\t" file.gff'; the number of transcripts would then be 'grep -c -P "RNA\t" file.gff'.
>
> Note that if you are using things like tRNAscan, you will get tRNA transcripts and associated genes. If you are trying to count from the fasta files, make sure you use the right file (maker.proteins.fasta and maker.transcripts.fasta).
>
> Thanks,
> Carson
>
> On May 21, 2020, at 7:58 AM, Nicolás Moreyra wrote:
>
> Dear all,
>
> First of all, thank you for sharing your experiences here. I tried to find this issue in the posts already made but failed.
> Secondly, I am sorry for asking you a silly question (I think), but after I completed the genome annotation of four species, I obtained fewer transcripts than genes. I do not understand why MAKER annotated genes unable to transcribe.
> I was trying to find the reason for this issue to discuss it in my thesis but I am a bit lost. Has this happened to anyone? Is there any possible cause that comes to mind?
>
> Thanks in advance.
>
> Nicolás
>
> --
> Nicolas Nahuel Moreyra
> BSc/MSc in Bioinformatics
> CONICET PhD Fellow @ IEGEBA
> PhD Student in Comparative Genomics @ EGE (FCEyN - UBA) -> nmoreyra at ege.fcen.uba.ar
> Professor of Bioinformatics @ Favaloro University
> Professor of Informatics @ IFTS N° 7
> Argentina

From carsonhh at gmail.com Tue May 26 13:01:38 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 13:01:38 -0600
Subject: [maker-devel] Unable to reproduce MAKER blastn results
In-Reply-To:
References:
Message-ID: <39657402-9B41-4B82-8164-199ECCA634AC@gmail.com>

The only downstream change to the blast results would be the removal of HSPs not meeting the bit_blastn minimum bitscore. Also note that pcov is not a blast parameter. It is a post-blast filter. The HSPs are tiled and flattened, then the percent coverage against the original query is calculated (i.e., if every base of the query is represented at least once in the result, then coverage is 100%).

The blast results are used only for identifying the rough region a model overlaps, which is then passed to exonerate. The exonerate alignment is used to generate the splice-aware est2genome model. Many good blastn alignments will produce poor exonerate alignments, and no est2genome results.

--Carson

> On May 5, 2020, at 3:46 AM, Lior Glick wrote:
>
> Hello,
>
> I am running MAKER 2.31.10 with a very simple configuration, with only EST evidence and the est2genome option enabled (basically a lift-over procedure).
> I noticed that some of my transcripts are not included in the annotation output and when I looked at the blastn results the reason was clear - they do not pass the coverage cutoff defined in maker_bopts.ctl. Interestingly, when I tried running blastn myself, using the same command (taken from the maker log) and the same blastn version, I got slightly different results. Specifically, for some of the transcripts the MAKER blastn run produced fewer HSPs than my blastn run, resulting in a lower total coverage.
> The additional HSPs seem to have good % identity and E-values, so I don't understand why and how they are discarded. Are the blastn results changed by MAKER in subsequent steps (after the blastn run)?
> Please find attached blastn results from the MAKER run and from my run. You can look at transcript AT1G01740.3 as an example: in my.blastn there are 8 HSPs, while MAKER.blastn only has 3 of them.
> Can you explain the difference? Maybe it has to do with repeat masking or other processing of the genome sequence?
>
> Just to make sure you have all the details:
> Relevant maker_bopts parameters:
> pcov_blastn=0.7 #Blastn Percent Coverage Threhold EST-Genome Alignments
> pid_blastn=0.85 #Blastn Percent Identity Threshold EST-Genome Aligments
> eval_blastn=1e-10 #Blastn eval cutoff
> bit_blastn=40 #Blastn bit cutoff
> depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff)
>
> Blastn command:
> blastn -db /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/TAIR10_longest_trans%2Efasta.mpi.10.0 -query /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/0/chunk00.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -word_size 28 -reward 1 -penalty -5 -gapopen 5 -gapextend 5 -dbsize 1000 -searchsp 500000000 -num_threads 10 -lcase_masking -dust yes -soft_masking true -show_gis -out
>
> Thank you!

From carsonhh at gmail.com Tue May 26 13:03:31 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 13:03:31 -0600
Subject: [maker-devel] Question about maker. Maker2 failed
In-Reply-To:
References:
Message-ID: <2F9626D1-F7C2-4BDA-A59F-C6566C0D558D@gmail.com>

It is probably the formatting of the models provided. There is something wrong with them. For rm_gff they must be two-level match/match_part features. You can send us the file, and I can take a look if it helps.

--Carson

> On May 5, 2020, at 2:41 PM, 祖尧刘 wrote:
>
> Hi maker developer,
>
> I'm using maker 2 to annotate a vertebrate genome.
> When I try to provide an rm_gff file, it always fails.
> Here is the log:
> Now starting the contig!!
> SeqID: chr_XXII
> Length: 12689475
> #---------------------------------------------------------------------
>
> setting up GFF3 output and fasta chunks
> doing repeat masking
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Did not specify a Hit End or Hit Begin
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Root/Root.pm:449
> STACK: Bio::Search::HSP::GenericHSP::_subject_seq_feature /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:1604
> STACK: Bio::Search::HSP::GenericHSP::hit /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:988
> STACK: repeat_mask_seq::separate_types /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/repeat_mask_seq.pm:307
> STACK: repeat_mask_seq::mask_chunk /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/repeat_mask_seq.pm:191
> STACK: Process::MpiChunk::_go /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:763
> STACK: Process::MpiChunk::run /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:341
> STACK: Process::MpiChunk::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:357
> STACK: Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287
> STACK: Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287
> STACK: /home/ubelix/iee/zl19g775/miniconda3/envs/maker/bin/maker:689
> -----------------------------------------------------------
> --> rank=NA, hostname=submit02.ubelix.unibe.ch
> ERROR: Failed while doing repeat masking
> ERROR: Chunk failed at level:0, tier_type:1
> FAILED CONTIG:chr_XXII
>
> ERROR: Chunk failed at level:2, tier_type:0
> FAILED CONTIG:chr_XXII
>
> examining contents of the fasta file and run log
>
> I also searched the google group and tried updating my bioperl to 1.7.7, the latest version, but it didn't help.
>
> Could you please help me?
>
> Thanks a lot.
>
> Zuyao

From carsonhh at gmail.com Tue May 26 13:15:20 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 13:15:20 -0600
Subject: [maker-devel] gene:multiple_Einit and overlaps_prev_exon errors in first round of SNAP training
In-Reply-To:
References:
Message-ID: <92F27280-90B8-4AF2-8C1D-B54955A7C521@gmail.com>

You have an ID collision in the GFF3. Check the GFF3 being sent to maker2zff. If you are using GFF3 as input to MAKER, you likely have non-unique IDs there that are causing the issue in the first place.

--Carson

> On May 6, 2020, at 11:36 AM, Carneiro,Celine M wrote:
>
> Hello,
>
> I am getting the errors gene:multiple_Einit, gene:multiple_Eterm, and exon:overlaps_prev_exon at just about every gene model. I've run the first round of maker on a bird genome I'm annotating with no errors and have started the steps to train SNAP. However, after running fathom -categorize, just about every single gene model has the same set of errors.
Here is an example from my log file after running fathom -categorize: > > MODEL117 1 1 8 - errors(6): gene:multiple_Einit gene:multiple_Eterm exon-7:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL851 1 1 100 - errors(78): gene:multiple_Einit gene:multiple_Eterm exon-99:overlaps_prev_exon exon-98:overlaps_prev_exon exon-97:overlaps_prev_exon exon-95:overlaps_prev_exon exon-94:overlaps_prev_exon exon-93:overlaps_prev_exon exon-91:overlaps_prev_exon exon-90:overlaps_prev_exon exon-89:overlaps_prev_exon exon-87:overlaps_prev_exon exon-86:overlaps_prev_exon exon-85:overlaps_prev_exon exon-83:overlaps_prev_exon exon-82:overlaps_prev_exon exon-81:overlaps_prev_exon exon-79:overlaps_prev_exon exon-78:overlaps_prev_exon exon-77:overlaps_prev_exon exon-75:overlaps_prev_exon exon-74:overlaps_prev_exon exon-73:overlaps_prev_exon exon-71:overlaps_prev_exon exon-70:overlaps_prev_exon exon-69:overlaps_prev_exon exon-67:overlaps_prev_exon exon-66:overlaps_prev_exon exon-65:overlaps_prev_exon exon-63:overlaps_prev_exon exon-62:overlaps_prev_exon exon-61:overlaps_prev_exon exon-59:overlaps_prev_exon exon-58:overlaps_prev_exon exon-57:overlaps_prev_exon exon-55:overlaps_prev_exon exon-54:overlaps_prev_exon exon-53:overlaps_prev_exon exon-51:overlaps_prev_exon exon-50:overlaps_prev_exon exon-49:overlaps_prev_exon exon-48:overlaps_prev_exon exon-47:overlaps_prev_exon exon-46:overlaps_prev_exon exon-45:overlaps_prev_exon exon-43:overlaps_prev_exon exon-42:overlaps_prev_exon exon-41:overlaps_prev_exon exon-39:overlaps_prev_exon exon-38:overlaps_prev_exon exon-37:overlaps_prev_exon exon-35:overlaps_prev_exon exon-34:overlaps_prev_exon exon-33:overlaps_prev_exon exon-31:overlaps_prev_exon exon-30:overlaps_prev_exon exon-29:overlaps_prev_exon exon-27:overlaps_prev_exon exon-26:overlaps_prev_exon exon-25:overlaps_prev_exon exon-23:overlaps_prev_exon exon-22:overlaps_prev_exon exon-21:overlaps_prev_exon exon-19:overlaps_prev_exon 
exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-14:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-10:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-2:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL190 1 1 39 + errors(35): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-3:overlaps_prev_exon exon-4:overlaps_prev_exon exon-5:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-9:overlaps_prev_exon exon-11:overlaps_prev_exon exon-12:overlaps_prev_exon exon-13:overlaps_prev_exon exon-14:overlaps_prev_exon exon-15:overlaps_prev_exon exon-16:overlaps_prev_exon exon-17:overlaps_prev_exon exon-18:overlaps_prev_exon exon-20:overlaps_prev_exon exon-21:overlaps_prev_exon exon-22:overlaps_prev_exon exon-23:overlaps_prev_exon exon-24:overlaps_prev_exon exon-25:overlaps_prev_exon exon-26:overlaps_prev_exon exon-27:overlaps_prev_exon exon-29:overlaps_prev_exon exon-30:overlaps_prev_exon exon-32:overlaps_prev_exon exon-33:overlaps_prev_exon exon-34:overlaps_prev_exon exon-35:overlaps_prev_exon exon-36:overlaps_prev_exon exon-38:overlaps_prev_exon exon-39:overlaps_prev_exon > MODEL424 1 1 10 - errors(8): gene:multiple_Einit gene:multiple_Eterm exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL902 1 1 20 - errors(14): gene:multiple_Einit gene:multiple_Eterm exon-19:overlaps_prev_exon exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL238 1 1 14 - errors(11): gene:multiple_Einit 
gene:multiple_Eterm exon-13:overlaps_prev_exon exon-12:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL39 1 1 6 - errors(1): exon-3:overlaps_prev_exon > MODEL119 1 1 10 + errors(8): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-4:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-10:overlaps_prev_exon > > Furthermore, I checked my genome.ann file and noticed that my Einit and Exon sites are duplicated. For example: > > >ScdimlH_1004;HRSCAF=1084 > Einit 38730 38677 MODEL851 > Exon 38255 38178 MODEL851 > Exon 38074 38021 MODEL851 > Exon 24755 24717 MODEL851 > Exon 24213 24149 MODEL851 > Exon 23176 23098 MODEL851 > Exon 22037 21961 MODEL851 > Exon 21269 21080 MODEL851 > Exon 20232 20167 MODEL851 > Exon 19742 19704 MODEL851 > Exon 14705 14590 MODEL851 > Exon 14255 13980 MODEL851 > Exon 14169 13980 MODEL851 > Exon 13303 13223 MODEL851 > Exon 13303 13223 MODEL851 > Exon 12782 12639 MODEL851 > Exon 12782 12639 MODEL851 > Exon 5761 5592 MODEL851 > Exon 5482 5404 MODEL851 > Exon 5140 5064 MODEL851 > Exon 4951 4750 MODEL851 > Exon 4567 4502 MODEL851 > Exon 4256 4185 MODEL851 > Exon 3569 3403 MODEL851 > Exon 3157 3076 MODEL851 > Exon 2936 2800 MODEL851 > Eterm 2186 2000 MODEL851 > Einit 38730 38677 MODEL851 > Exon 38255 38178 MODEL851 > Exon 38074 38021 MODEL851 > Exon 24755 24717 MODEL851 > Exon 24213 24149 MODEL851 > Exon 23176 23098 MODEL851 > Exon 22037 21961 MODEL851 > Exon 21269 21080 MODEL851 > Exon 20232 20167 MODEL851 > Exon 19742 19704 MODEL851 > Exon 14705 14590 MODEL851 > Exon 14255 13980 MODEL851 > Exon 14169 13980 MODEL851 > Exon 13303 13223 MODEL851 > Exon 13303 13223 MODEL851 > Exon 12782 12639 MODEL851 > Exon 12782 12639 MODEL851 > Exon 5761 5592 MODEL851 > Exon 5482 5404 MODEL851 > Exon 5140 5064 MODEL851 > Exon 4951 4750 
MODEL851
> Exon 4567 4502 MODEL851
> Exon 4256 4185 MODEL851
> Exon 3569 3403 MODEL851
> Exon 3157 3076 MODEL851
> Exon 2936 2800 MODEL851
> Eterm 2186 2000 MODEL851
>
> Any ideas why I'm seeing this duplication? Lastly, any ideas why my exons are overlapping so much? I appreciate any input and please let me know if you require any more information.
>
> Thank you!
>
> Celine

From carsonhh at gmail.com Tue May 26 13:26:03 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 13:26:03 -0600
Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question
In-Reply-To:
References:
Message-ID: <07D0AA31-0FCF-4E60-AC50-1E377CBF870C@gmail.com>

> 2. If I was to improve the annotation of my 3' UTRs within a certain (non-model species) gff3, is there a particular way or a protocol to follow? I was thinking for example that Lexogen has released their 3' UTR kit for RNA-seq of the three prime end of transcripts. Would it be possible to feed those reads to maker and somehow suggest that the reads are originating from the three-prime end so that this info is then passed in the gff3 file?

You could pass the final models back in as pred_gff (no UTR on the models), then pass in just the evidence you want used as UTR as est_gff (it would have to be assembled, not individual reads). As long as they overlap the pred_gff models, MAKER would try to make UTR out of them. Might be worth an experiment.

--Carson
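
Before running such an experiment, it may help to check which assembled evidence alignments overlap the models at all. The following is a minimal Python sketch of that pre-check, under stated assumptions: the coordinates are made up, the names Interval, overlaps and evidence_touching_models are hypothetical, and requiring the same strand is a guess, since the exact overlap rule MAKER applies is not described in this thread.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    seqid: str
    start: int   # 1-based, inclusive, as in GFF3
    end: int     # 1-based, inclusive
    strand: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b share at least one base on the same sequence and strand."""
    return (
        a.seqid == b.seqid
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def evidence_touching_models(models: List[Interval], evidence: List[Interval]) -> List[Interval]:
    """Return the evidence intervals that overlap at least one model."""
    return [e for e in evidence if any(overlaps(e, m) for m in models)]


# Made-up coordinates for illustration only.
models = [Interval("chr1", 1000, 2000, "+")]
evidence = [
    Interval("chr1", 1900, 2400, "+"),  # runs past the 3' end of the model
    Interval("chr1", 2500, 2900, "+"),  # no overlap with the model
    Interval("chr1", 1500, 2300, "-"),  # overlapping coordinates, opposite strand
]
kept = evidence_touching_models(models, evidence)
print(kept)
```

Only the first evidence interval is kept here; alignments that never touch a model would presumably contribute nothing to UTR extension.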

From shawn.trojahn at wsu.edu Fri May 29 15:49:38 2020
From: shawn.trojahn at wsu.edu (Trojahn, Shawn Michael)
Date: Fri, 29 May 2020 21:49:38 +0000
Subject: [maker-devel] Intron lengths below minimum cutoff
Message-ID:

Hello,

I have been having a problem with the final annotation coming from Maker2: a few thousand introns are shorter than the minimum intron length I have set. Most of the exons around these problem introns have no support in the final merged gff file, but a few are supported by blast hits. Is there a reason why these introns would remain in the final gff?

Thanks,
Shawn

From patrick.tranvan at unil.ch Sun May 3 05:39:59 2020
From: patrick.tranvan at unil.ch (Patrick Tran Van)
Date: Sun, 3 May 2020 11:39:59 +0000
Subject: [maker-devel] Multiple UTR ?
In-Reply-To:
References:
Message-ID: <16630d833d1448e7a771a4f2b19b0476@unil.ch>

Hi Carson,

for instance, if I have this:

SCFXX maker five_prime_UTR 5164370 5164715 . - . ID=GENE-RA:five_prime_utr;Parent=GENE-RA;
SCFXX maker five_prime_UTR 5156091 5156136 . - . ID=GENE-RA:five_prime_utr;Parent=GENE-RA;

Does it mean that the real coordinates of the 5' UTR are from 5156091 to 5164715?

Patrick Tran Van

Bioinformatician: Lab Chapuisat & Schwander
Department of Ecology and Evolution
University of Lausanne
Lausanne - Switzerland
Office 3206
________________________________
From: Carson Holt
Sent: Wednesday, February 26, 2020 8:27:43 PM
To: Patrick Tran Van
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Multiple UTR ?

Sorry for the very slow reply. I found this way way down in my inbox.

The UTR features are the parts of the exons that are not CDS. So multiple UTR features mean that the UTR spans multiple exons, and they must be assembled to generate the full UTR in a browser.
Any exon that is fully non-coding will produce a UTR feature that mirrors an exons coordinates, and if it?s partially coding the UTR will share the same start or end by will terminate somewhere in the middle with a CDS filling up the remains coordinates. The UTR and CDS features get tiled over the top of the exon features when assembling a gene model. ?Carson On Dec 18, 2019, at 7:19 AM, Patrick Tran Van > wrote: Hi Carson, I have seen something strange in my annotation: multiple UTR. How can we explain this ? Thanks! Scaffold maker mRNA 12117462 12128433 . - . ID=GENE_02395-RA;Parent=GENE_02395;Name=GENE_02395-RA;Alias=maker-Scaffold-augustus-gene-40.12-mRNA-3;_AED=0.02;_QI=5383|1|1|1|0.88|0.9|10|247|238;_eAED=0.02;Note=Protein of unknown function; Scaffold maker exon 12128112 12128433 . - . ID=GENE_02395-RA:exon:571;Parent=GENE_02395-RA; Scaffold maker exon 12117462 12118046 . - . ID=GENE_02395-RB:exon:569;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; Scaffold maker exon 12118141 12118301 . - . ID=GENE_02395-RB:exon:568;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; Scaffold maker exon 12118386 12118539 . - . ID=GENE_02395-RB:exon:567;Parent=GENE_02395-RB,GENE_02395-RA; Scaffold maker exon 12118818 12122493 . - . ID=GENE_02395-RB:exon:566;Parent=GENE_02395-RB,GENE_02395-RA; Scaffold maker exon 12123591 12123893 . - . ID=GENE_02395-RB:exon:565;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; Scaffold maker exon 12123995 12124303 . - . ID=GENE_02395-RB:exon:564;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; Scaffold maker exon 12125119 12125418 . - . ID=GENE_02395-RB:exon:563;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; Scaffold maker exon 12126005 12126313 . - . ID=GENE_02395-RB:exon:562;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; Scaffold maker exon 12127460 12127687 . - . ID=GENE_02395-RB:exon:561;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; Scaffold maker five_prime_UTR 12128112 12128433 . - . 
ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; Scaffold maker five_prime_UTR 12127460 12127687 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; Scaffold maker five_prime_UTR 12126005 12126313 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; Scaffold maker five_prime_UTR 12125119 12125418 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; Scaffold maker five_prime_UTR 12123995 12124303 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; Scaffold maker five_prime_UTR 12123591 12123893 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; Scaffold maker five_prime_UTR 12118882 12122493 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; Scaffold maker CDS 12118818 12118881 . - 0 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA; Scaffold maker CDS 12118386 12118539 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA; Scaffold maker CDS 12118141 12118301 . - 1 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA; Scaffold maker CDS 12117709 12118046 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA; Scaffold maker three_prime_UTR 12117462 12117708 . - . ID=GENE_02395-RA:three_prime_utr;Parent=GENE_02395-RA; Patrick Tran Van Bioinformatician: Lab Chapuisat & Schwander Department of Ecology and Evolution University of Lausanne Lausanne - Switzerland Office 3206 _______________________________________________ maker-devel mailing list maker-devel at yandell-lab.org http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From liorglic at mail.tau.ac.il Tue May 5 03:46:01 2020 From: liorglic at mail.tau.ac.il (Lior Glick) Date: Tue, 5 May 2020 12:46:01 +0300 Subject: [maker-devel] Unable to reproduce MAKER blastn results Message-ID: Hello, I am running MAKER 2.31.10 with a very simple configuration, with only EST evidence and the est2genome option enabled (basically a lift-over procedure). 
I noticed that some of my transcripts are not included in the annotation output and when I looked at the blastn results the reason was clear - they do not pass the coverage cutoff defined in maker_bopts.ctl. Interestingly, when I tried running blastn myself, using the same command (taken from the maker log) and the same blastn version, I got slightly different results. Specifically, for some of the transcripts the MAKER blastn run produced less HSPs than my blastn run, resulting in a lower total coverage. The additional HSPs seem to have good % identity and E-values, so I don't understand why and how they are discarded. Are the blastn results changed by MAKER in subsequent steps (after the blastn run)? Please find attached blastn results from MAKER run and from my run. You can look at transcript AT1G01740.3 as an example. in my.blastn, there are 8 HSPs, while MAKER.blastn only has 3 of them. Can you explain the difference? Maybe it has to do with repeat masking or other processing of the genome sequence? 
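
The reply to this thread describes the coverage cutoff as a post-blast filter: the HSPs are tiled and flattened, and the percent coverage of the original query is then calculated. A minimal Python sketch of that idea follows, assuming HSPs are given as 1-based inclusive query coordinates; the function name query_coverage and the example numbers are hypothetical, and this is not MAKER's own code.

```python
def query_coverage(hsps, query_length):
    """Fraction of query bases covered by at least one HSP.

    hsps: iterable of (start, end) query coordinates, 1-based and inclusive.
    Overlapping HSPs are merged first so that no base is counted twice.
    """
    covered = 0
    block_start = block_end = None
    # Normalise each HSP so start <= end, then walk them in sorted order.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in hsps):
        if block_end is None or start > block_end:
            # A new block begins: bank the length of the previous one.
            if block_end is not None:
                covered += block_end - block_start + 1
            block_start, block_end = start, end
        else:
            # The HSP overlaps the current block: extend it if needed.
            block_end = max(block_end, end)
    if block_end is not None:
        covered += block_end - block_start + 1
    return covered / query_length


# Made-up example: three HSPs on a 1000-base transcript, two of them overlapping.
few_hsps = [(1, 300), (250, 400), (700, 900)]
print(query_coverage(few_hsps, 1000))                  # below a 0.7 cutoff
print(query_coverage(few_hsps + [(401, 699)], 1000))   # one more HSP closes the gap
```

Under this reading, a handful of missing HSPs is enough to move a transcript from passing to failing a cutoff such as pcov_blastn=0.7, which matches the symptom described in this message.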
Just to make sure you have all the details: Relevant maker_bopts parameters: pcov_blastn=0.7 #Blastn Percent Coverage Threhold EST-Genome Alignments pid_blastn=0.85 #Blastn Percent Identity Threshold EST-Genome Aligments eval_blastn=1e-10 #Blastn eval cutoff bit_blastn=40 #Blastn bit cutoff depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff) Blastn command: blastn -db /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/TAIR10_longest_trans%2Efasta.mpi.10.0 -query /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/0/chunk00.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -word_size 28 -reward 1 -penalty -5 -gapopen 5 -gapextend 5 -dbsize 1000 -searchsp 500000000 -num_threads 10 -lcase_masking -dust yes -soft_masking true -show_gis -out Thank you! -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: my.blastn Type: application/octet-stream Size: 440090 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: MAKER.blastn Type: application/octet-stream Size: 1143936 bytes Desc: not available URL: From zuyao.liu.0910 at gmail.com Tue May 5 14:41:25 2020 From: zuyao.liu.0910 at gmail.com (=?UTF-8?B?56WW5bCn5YiY?=) Date: Tue, 5 May 2020 22:41:25 +0200 Subject: [maker-devel] Question about maker. Maker2 failed Message-ID: Hi maker developer, I'm using maker 2 to annotate a vertebrate genome. When I try to provide rm_gff file, it always fails. Here is log: Now starting the contig!! 
SeqID: chr_XXII Length: 12689475 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks doing repeat masking ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Did not specify a Hit End or Hit Begin STACK: Error::throw STACK: Bio::Root::Root::throw /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Root/Root.pm:449 STACK: Bio::Search::HSP::GenericHSP::_subject_seq_feature /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:1604 STACK: Bio::Search::HSP::GenericHSP::hit /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:988 STACK: repeat_mask_seq::separate_types /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/ repeat_mask_seq.pm:307 STACK: repeat_mask_seq::mask_chunk /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/ repeat_mask_seq.pm:191 STACK: Process::MpiChunk::_go /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:763 STACK: Process::MpiChunk::run /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:341 STACK: Process::MpiChunk::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:357 STACK: Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: /home/ubelix/iee/zl19g775/miniconda3/envs/maker/bin/maker:689 ----------------------------------------------------------- --> rank=NA, hostname=submit02.ubelix.unibe.ch ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:chr_XXII ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:chr_XXII examining contents of the fasta file and run log I also searched the google group and tried update my bioperl to 1.7.7 
the latest version, but it didn't help. Could you please help me? Thanks a lot. Zuyao -------------- next part -------------- An HTML attachment was scrubbed... URL: From c143dad at ufl.edu Wed May 6 11:36:40 2020 From: c143dad at ufl.edu (Carneiro,Celine M) Date: Wed, 6 May 2020 17:36:40 +0000 Subject: [maker-devel] gene:multiple_Einit and overlaps_prev_exon errors in first round of SNAP training Message-ID: Hello, I am getting the errors gene:multiple_Einit, gene:multiple_Eterm, and exon:overlaps_prev_exon, at just about every gene model. I've ran the first round of maker on a bird genome I'm annotating with no errors and have started the steps to train SNAP. However, after running fathom -categorize, just about every single gene model has the same set of errors. Here is an example from my log file after running fathom -categorize: MODEL117 1 1 8 - errors(6): gene:multiple_Einit gene:multiple_Eterm exon-7:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL851 1 1 100 - errors(78): gene:multiple_Einit gene:multiple_Eterm exon-99:overlaps_prev_exon exon-98:overlaps_prev_exon exon-97:overlaps_prev_exon exon-95:overlaps_prev_exon exon-94:overlaps_prev_exon exon-93:overlaps_prev_exon exon-91:overlaps_prev_exon exon-90:overlaps_prev_exon exon-89:overlaps_prev_exon exon-87:overlaps_prev_exon exon-86:overlaps_prev_exon exon-85:overlaps_prev_exon exon-83:overlaps_prev_exon exon-82:overlaps_prev_exon exon-81:overlaps_prev_exon exon-79:overlaps_prev_exon exon-78:overlaps_prev_exon exon-77:overlaps_prev_exon exon-75:overlaps_prev_exon exon-74:overlaps_prev_exon exon-73:overlaps_prev_exon exon-71:overlaps_prev_exon exon-70:overlaps_prev_exon exon-69:overlaps_prev_exon exon-67:overlaps_prev_exon exon-66:overlaps_prev_exon exon-65:overlaps_prev_exon exon-63:overlaps_prev_exon exon-62:overlaps_prev_exon exon-61:overlaps_prev_exon exon-59:overlaps_prev_exon exon-58:overlaps_prev_exon exon-57:overlaps_prev_exon 
exon-55:overlaps_prev_exon exon-54:overlaps_prev_exon exon-53:overlaps_prev_exon exon-51:overlaps_prev_exon exon-50:overlaps_prev_exon exon-49:overlaps_prev_exon exon-48:overlaps_prev_exon exon-47:overlaps_prev_exon exon-46:overlaps_prev_exon exon-45:overlaps_prev_exon exon-43:overlaps_prev_exon exon-42:overlaps_prev_exon exon-41:overlaps_prev_exon exon-39:overlaps_prev_exon exon-38:overlaps_prev_exon exon-37:overlaps_prev_exon exon-35:overlaps_prev_exon exon-34:overlaps_prev_exon exon-33:overlaps_prev_exon exon-31:overlaps_prev_exon exon-30:overlaps_prev_exon exon-29:overlaps_prev_exon exon-27:overlaps_prev_exon exon-26:overlaps_prev_exon exon-25:overlaps_prev_exon exon-23:overlaps_prev_exon exon-22:overlaps_prev_exon exon-21:overlaps_prev_exon exon-19:overlaps_prev_exon exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-14:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-10:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-2:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL190 1 1 39 + errors(35): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-3:overlaps_prev_exon exon-4:overlaps_prev_exon exon-5:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-9:overlaps_prev_exon exon-11:overlaps_prev_exon exon-12:overlaps_prev_exon exon-13:overlaps_prev_exon exon-14:overlaps_prev_exon exon-15:overlaps_prev_exon exon-16:overlaps_prev_exon exon-17:overlaps_prev_exon exon-18:overlaps_prev_exon exon-20:overlaps_prev_exon exon-21:overlaps_prev_exon exon-22:overlaps_prev_exon exon-23:overlaps_prev_exon exon-24:overlaps_prev_exon exon-25:overlaps_prev_exon exon-26:overlaps_prev_exon exon-27:overlaps_prev_exon exon-29:overlaps_prev_exon exon-30:overlaps_prev_exon exon-32:overlaps_prev_exon exon-33:overlaps_prev_exon exon-34:overlaps_prev_exon 
exon-35:overlaps_prev_exon exon-36:overlaps_prev_exon exon-38:overlaps_prev_exon exon-39:overlaps_prev_exon MODEL424 1 1 10 - errors(8): gene:multiple_Einit gene:multiple_Eterm exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL902 1 1 20 - errors(14): gene:multiple_Einit gene:multiple_Eterm exon-19:overlaps_prev_exon exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL238 1 1 14 - errors(11): gene:multiple_Einit gene:multiple_Eterm exon-13:overlaps_prev_exon exon-12:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL39 1 1 6 - errors(1): exon-3:overlaps_prev_exon MODEL119 1 1 10 + errors(8): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-4:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-10:overlaps_prev_exon Furthermore, I checked my genome.ann file and noticed that my Einit and Exon sites are duplicated. 
For example: >ScdimlH_1004;HRSCAF=1084 Einit 38730 38677 MODEL851 Exon 38255 38178 MODEL851 Exon 38074 38021 MODEL851 Exon 24755 24717 MODEL851 Exon 24213 24149 MODEL851 Exon 23176 23098 MODEL851 Exon 22037 21961 MODEL851 Exon 21269 21080 MODEL851 Exon 20232 20167 MODEL851 Exon 19742 19704 MODEL851 Exon 14705 14590 MODEL851 Exon 14255 13980 MODEL851 Exon 14169 13980 MODEL851 Exon 13303 13223 MODEL851 Exon 13303 13223 MODEL851 Exon 12782 12639 MODEL851 Exon 12782 12639 MODEL851 Exon 5761 5592 MODEL851 Exon 5482 5404 MODEL851 Exon 5140 5064 MODEL851 Exon 4951 4750 MODEL851 Exon 4567 4502 MODEL851 Exon 4256 4185 MODEL851 Exon 3569 3403 MODEL851 Exon 3157 3076 MODEL851 Exon 2936 2800 MODEL851 Eterm 2186 2000 MODEL851 Einit 38730 38677 MODEL851 Exon 38255 38178 MODEL851 Exon 38074 38021 MODEL851 Exon 24755 24717 MODEL851 Exon 24213 24149 MODEL851 Exon 23176 23098 MODEL851 Exon 22037 21961 MODEL851 Exon 21269 21080 MODEL851 Exon 20232 20167 MODEL851 Exon 19742 19704 MODEL851 Exon 14705 14590 MODEL851 Exon 14255 13980 MODEL851 Exon 14169 13980 MODEL851 Exon 13303 13223 MODEL851 Exon 13303 13223 MODEL851 Exon 12782 12639 MODEL851 Exon 12782 12639 MODEL851 Exon 5761 5592 MODEL851 Exon 5482 5404 MODEL851 Exon 5140 5064 MODEL851 Exon 4951 4750 MODEL851 Exon 4567 4502 MODEL851 Exon 4256 4185 MODEL851 Exon 3569 3403 MODEL851 Exon 3157 3076 MODEL851 Exon 2936 2800 MODEL851 Eterm 2186 2000 MODEL851 Any ideas why I'm seeing this duplication? Lastly, any ideas why my exons are overlapping so much? I appreciate any input and please let me know if you require any more information. Thank you! Celine -------------- next part -------------- An HTML attachment was scrubbed... URL: From peruzzaluca at gmail.com Tue May 19 02:31:22 2020 From: peruzzaluca at gmail.com (Luca Peruzza) Date: Tue, 19 May 2020 10:31:22 +0200 Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question Message-ID: Hi There, I have two questions and I hope you guys can help me with them: 1. 
I have seen that maker version 3.01 is now out. Is there a change log available to see the changes in comparison to the previous maker version and have a glimpse of the new features of this release? 2. If I was to improve the annotation of my 3? UTRs within a certain (non-model species) gff3, is there a particular way or a protocol to follow? I was thinking for example that Lexogen has released their 3? UTR kit for RNA-seq of the three prime end of transcripts. Would it be possible to feed those reads to maker and somehow suggest that the reads are originating from the three-prime end so that this info is then passed in the gff3 file? Thanks a lot in advance for your help Best Luca From zuyao.liu.0910 at gmail.com Tue May 19 03:10:30 2020 From: zuyao.liu.0910 at gmail.com (=?UTF-8?B?56WW5bCn5YiY?=) Date: Tue, 19 May 2020 11:10:30 +0200 Subject: [maker-devel] Question about maker. Maker2 failed Message-ID: Hi maker developers I'm using maker 2 to annotate a fish genome. When I try to provide rm_gff file, it always fails. Here is log: collecting blastx repeatmasking doing repeat masking processing all repeats deleted:0 hits in cluster::shadow_cluster... Died at /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188. --> rank=23, hostname=hnode48 ERROR: Failed while processing all repeats ERROR: Chunk failed at level:3, tier_type:1 FAILED CONTIG:chr_XXIII ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:chr_XXIII I use maker 2.3.10 with repeatmasker 4.0.9. I saw someone got this error as well and I followed the solutions. I tried update to blast 2.9.0, rmblast 2.9.0,bioperl1.7.7 and also checked rm gff file with gff3 validator. But the error still existed. Do you have any suggestions? Thanks a lot for your help. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jason.stajich at gmail.com Tue May 19 17:29:38 2020 From: jason.stajich at gmail.com (Jason Stajich) Date: Tue, 19 May 2020 16:29:38 -0700 Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question In-Reply-To: References: Message-ID: Luca - I would suggest PASA as a tool for 3'UTR (and 5'UTR) improvement in gene annotation too. https://github.com/PASApipeline/PASApipeline Funannotate has a step that can be use to run and update gene models if you want to also take on from an existing maker run - https://funannotate.readthedocs.io/en/latest/ Jason Jason Stajich jason.stajich at gmail.com On Tue, May 19, 2020 at 1:33 AM Luca Peruzza wrote: > Hi There, > I have two questions and I hope you guys can help me with them: > > 1. I have seen that maker version 3.01 is now out. Is there a change log > available to see the changes in comparison to the previous maker version > and have a glimpse of the new features of this release? > > 2. If I was to improve the annotation of my 3? UTRs within a certain > (non-model species) gff3, is there a particular way or a protocol to > follow? I was thinking for example that Lexogen has released their 3? UTR > kit for RNA-seq of the three prime end of transcripts. Would it be possible > to feed those reads to maker and somehow suggest that the reads are > originating from the three-prime end so that this info is then passed in > the gff3 file? > > Thanks a lot in advance for your help > Best > Luca > > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peruzzaluca at gmail.com Wed May 20 06:41:14 2020 From: peruzzaluca at gmail.com (Luca Peruzza) Date: Wed, 20 May 2020 14:41:14 +0200 Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question In-Reply-To: References: Message-ID: <4ABBF9F2-4F9E-4D7F-B821-3276D4D3EFD1@gmail.com> Thanks Jason, Yes, my idea was to add extra 3?UTR info to an existing maker gff3 file. If you say that funannotate can do it, I?ll have a look. Thanks Luca > On 20 May 2020, at 01:29, Jason Stajich wrote: > > Luca - I would suggest PASA as a tool for 3'UTR (and 5'UTR) improvement in gene annotation too. https://github.com/PASApipeline/PASApipeline > > Funannotate has a step that can be use to run and update gene models if you want to also take on from an existing maker run - https://funannotate.readthedocs.io/en/latest/ > > Jason > Jason Stajich > jason.stajich at gmail.com > > > On Tue, May 19, 2020 at 1:33 AM Luca Peruzza > wrote: > Hi There, > I have two questions and I hope you guys can help me with them: > > 1. I have seen that maker version 3.01 is now out. Is there a change log available to see the changes in comparison to the previous maker version and have a glimpse of the new features of this release? > > 2. If I was to improve the annotation of my 3? UTRs within a certain (non-model species) gff3, is there a particular way or a protocol to follow? I was thinking for example that Lexogen has released their 3? UTR kit for RNA-seq of the three prime end of transcripts. Would it be possible to feed those reads to maker and somehow suggest that the reads are originating from the three-prime end so that this info is then passed in the gff3 file? > > Thanks a lot in advance for your help > Best > Luca > > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From niconm89 at gmail.com Thu May 21 07:58:34 2020 From: niconm89 at gmail.com (=?UTF-8?Q?Nicol=C3=A1s_Moreyra?=) Date: Thu, 21 May 2020 10:58:34 -0300 Subject: [maker-devel] different number of annotated genes and transcripts Message-ID: Dear all, First of all, thank you for sharing your experiences here. I tried to find this issue in the posts already made but failed. Secondly, I am sorry for asking you a silly question (I think), but after I complete the genome annotation of four species, I obtained fewer transcripts than genes. I do not understand why MAKER annotated genes unable to transcribe. I was trying to find the reason for this issue to discuss it in my thesis but I am a bit lost. Has this happened to anyone? Is there any possible cause that comes to mind? Thanks in advance. Nicol?s *--* *Nicolas Nahuel Moreyra* *BSc/MSc in Bioinformatics* *CONICET PhD Fellow @ IEGEBA* *PhD Student in Comparative Genomics @ EGE (**FCEyN - UBA) **-> **nmoreyra at ege.fcen.uba.ar * Professor of Bioinformatics @ Favaloro University Professor of Informatics @ IFTS N? 7 *Argentina* -------------- next part -------------- An HTML attachment was scrubbed... URL: From yujin at genomics.cn Fri May 22 23:46:33 2020 From: yujin at genomics.cn (=?gb2312?B?0+C9+ChKaW4gWXUp?=) Date: Sat, 23 May 2020 05:46:33 +0000 Subject: [maker-devel] maker error-ERROR: Failed while annotating transcripts Message-ID: Hi, Dear developers. I'm using maker-3.01.03 to annotate a plant genome. But I met this error: Can't locate object method "add_entry" via package "1" (perhaps you forgot to load "1"?) at /vol2/liuyang_group/liuyang/software/maker-3.01.03/bin/../lib/Widget/snap.pm line 540. ERROR: Failed while annotating transcripts The attached file is the full STDERR from maker. I have searched the archived mailing list, and found a similar question (https://groups.google.com/forum/#!topic/maker-devel/fGGCKXhi6cw), but I didn't find any error which occurred before this one in the log. 
I would appreciate it a lot if you could help me!

Best regards
Jin Yu
15527740380
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker.log
Type: application/octet-stream
Size: 3935522 bytes
Desc: maker.log
URL: 

From zhoux233 at mail2.sysu.edu.cn Sun May 24 17:43:43 2020
From: zhoux233 at mail2.sysu.edu.cn (=?utf-8?B?5ZGo6ZGr?=)
Date: Mon, 25 May 2020 07:43:43 +0800
Subject: [maker-devel] Trouble in opening the registration page for Maker
Message-ID: 

Hello, Developers of Maker,

I'm a student from SYSU, China. Recently, I wanted to download Maker from your website for my lab's annotation work, but I have had trouble opening the registration page for days and could not figure out why. I also failed to install Maker with conda, so could you please tell me how to deal with it? Or could you please send me a copy of the source?

If it is convenient for you to send me a copy, here is my information:

Name: Zhou Xin
Email address: zhoux233 at mail2.sysu.edu.cn
Software needed: Maker
PI name: Huang ShengFeng
Research: Genome Annotation for zebrafish
Institute: Life Science School, Sun Yat-Sen University
Institute URL: http://lifesciences.sysu.edu.cn/
Country: China
Province: GuangDong
City: Guang Zhou

If anything else is needed, please email me and I will add it as soon as I see it.

Anyway, thank you very much for your attention! Any reply will be much appreciated!

Regards!
Zhou Xin

From carsonhh at gmail.com Tue May 26 11:54:45 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 11:54:45 -0600
Subject: [maker-devel] different number of annotated genes and transcripts
In-Reply-To: 
References: 
Message-ID: 

Perhaps you are counting wrong. If you want to know the number of genes, you must look at the GFF3. You can use 'grep -c -P "\tgene\t" file.gff'; the number of transcripts would then be 'grep -c -P "RNA\t" file.gff'.

Note that if you are using things like tRNAscan, you will get tRNA transcripts and associated genes. If you are trying to count from the fasta files, make sure you use the right files (maker.proteins.fasta and maker.transcripts.fasta).

Thanks,
Carson

> On May 21, 2020, at 7:58 AM, Nicolás Moreyra wrote:
>
> Dear all,
>
> First of all, thank you for sharing your experiences here. I tried to find this issue in the posts already made but failed.
> Secondly, I am sorry for asking you a silly question (I think), but after I complete the genome annotation of four species, I obtained fewer transcripts than genes. I do not understand why MAKER annotated genes unable to transcribe.
> I was trying to find the reason for this issue to discuss it in my thesis but I am a bit lost. Has this happened to anyone? Is there any possible cause that comes to mind?
>
> Thanks in advance.
>
> Nicolás
>
> --
> Nicolas Nahuel Moreyra
> BSc/MSc in Bioinformatics
> CONICET PhD Fellow @ IEGEBA
> PhD Student in Comparative Genomics @ EGE (FCEyN - UBA) -> nmoreyra at ege.fcen.uba.ar
> Professor of Bioinformatics @ Favaloro University
> Professor of Informatics @ IFTS N° 7
> Argentina
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
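The counting described in the reply above can also be sketched in Python. This is only a minimal illustration, assuming a tab-delimited GFF3 whose third column holds the feature type; the function name count_feature_types and the sample records are hypothetical and are not part of MAKER.

```python
from collections import Counter


def count_feature_types(lines):
    """Tally GFF3 feature types (column 3) from an iterable of lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature rows
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        columns = line.split("\t")
        if len(columns) >= 9:
            counts[columns[2]] += 1
    return counts


# Hypothetical records, abbreviated in the style of MAKER output.
sample = [
    "##gff-version 3",
    "ctg1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
    "ctg1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-RA;Parent=g1",
    "ctg1\tmaker\tgene\t2000\t2072\t.\t-\t.\tID=g2",
    "ctg1\tmaker\ttRNA\t2000\t2072\t.\t-\t.\tID=g2-RA;Parent=g2",
]

counts = count_feature_types(sample)
# Every feature type ending in "RNA" (mRNA, tRNA, ...) counts as a transcript.
rna_total = sum(n for kind, n in counts.items() if kind.endswith("RNA"))
print(counts["gene"], counts["mRNA"], counts["tRNA"], rna_total)
```

For a real annotation, an open file handle can be passed in place of the list, for example count_feature_types(open("file.gff")). Comparing the per-type counts shows directly whether a gap between gene and mRNA totals is explained by tRNA (or other non-coding) genes.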
From carsonhh at gmail.com Tue May 26 12:10:16 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 12:10:16 -0600
Subject: [maker-devel] Maker 0 genes after SNAP or with proteins.gff
In-Reply-To: <84c5fc195df0fcc5e03484e65076fa9c@uni-duesseldorf.de>
References: <84c5fc195df0fcc5e03484e65076fa9c@uni-duesseldorf.de>
Message-ID: <053268A7-E5B0-4878-92F2-63B01869B677@gmail.com>

You would have to look at the alignments, but I suspect they do not align to the gene models in a way that supplies sufficient support for the annotation. If it is the maker2zff script producing 0 genes, that is because it requires at least some EST evidence. You can change that using the command line options.

--Carson

> On Apr 24, 2020, at 8:27 AM, Ricardo Nuno Ferreira Martins Guerreiro wrote:
>
> Dear Makers list,
>
>
> I am struggling with Maker after many successful attempts. I don't understand why but my final .gff does not contain any genes, 0.
>
> I am running first an Evidence based modelling, with proteins only. Here I get around 40 thousand genes if I give the proteins as a fasta to align (if I provide a protein.gff from a previous maker try, I get 0 genes, same problem).
>
> Afterwards I'm creating a SNAP hmm and running maker again, turning protein2genome=0 and snaphmm=snap.hmm as you say, but now I have 0 genes. This happens either I keep providing proteins as a fasta or as .gff of a previous run.
>
> I have done this many times and it always worked. The only difference now is that I am using no ESTs whatsoever, only proteins. It's also strange that it works on the first round of maker but doesn't work on the SNAP rounds.
>
>
> Hope you can help,
> Ricardo
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Tue May 26 12:14:28 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 12:14:28 -0600
Subject: [maker-devel] Problems with openMPI in multiple computing nodes
In-Reply-To: 
References: 
Message-ID: <78BFA69F-C631-4190-9F97-6B6ECC7AE15B@gmail.com>

You don't have Berkeley DB installed on your system, so BioPerl is trying to fall back to another index format that has issues on network-mounted file systems. You can try installing Berkeley DB and then the related Perl module (https://metacpan.org/pod/BerkeleyDB). You would then need to reinstall BioPerl and MAKER.

You can also try running on a single CPU until indexing finishes, then launch the full MAKER job. That might be enough to get around any early race conditions.

--Carson

> On Apr 26, 2020, at 12:58 AM, Xu, taosheng wrote:
>
> Hello,
> I am using a computer cluster with 20 nodes(40cpus per node) for gene annotation. I submit my maker task to one node with 40 CPUs using openMPI. Everything is well.
> But I encounter the problem when submitting the same maker task to the cluster with multiple nodes (120 cpus) There are errors shown below.
> I would also appreciate any advice. Thank you.
>
> Best regards,
> Taosheng
>
>
> STATUS: Processing and indexing input FASTA files...
> cannot remove directory for home/20200425/genome.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10//.dbtmp0: No such file or directory at /maker/bin/../lib/FastaDB.pm line 145.
> cannot remove directory for /home/20200425/genome.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10//.dbtmp0: Directory not empty at /maker/bin/../lib/FastaDB.pm line 145.
> cannot remove directory for /home/20200425/genome.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10//.dbtmp0: Directory not empty at /maker/bin/../lib/FastaDB.pm line 145.
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Tue May 26 12:48:14 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 12:48:14 -0600
Subject: [maker-devel] Missing genes in lift-over with est2genome
In-Reply-To: 
References: <373413EA-9D4C-44CF-AA51-632C0F54B7AC@gmail.com>
Message-ID: 

If using the est_forward=1 option for the lift-over, you can also anchor a search to a specific contig or region by adding a tag to the FASTA header (maker_coor=contig:1-1000;). The tag will force Exonerate to run only on that region. Sometimes that can rescue a model.

When you pass results into model_gff=, it will leave them unchanged. It just accepts or rejects them as is. But the model itself is considered evidence and can alter clustering. other_gff= just passes things through with no processing or evaluation (it's like cut and paste).

You can also try deFusion on the resulting models to resolve gene fusions -> https://wjidea.github.io/defusion/Introduction.html

--Carson

> On Apr 30, 2020, at 6:58 AM, Lior Glick wrote:
>
> Thanks Carson - your answer was very helpful.
> Another question related to the lift-over process, if I may.
> I want to take the resulting gff and pass it on to another MAKER run, where I provide further, lower confidence evidence (ESTs and proteins). I'm not sure which option to use though. According to this helpful post, I tried using pred_gff and model_gff, but both created cases of fusion genes when genes are very adjacent to one another (see attached picture), even with the correct_est_fusion parameter enabled.
It looks like the only way to take lifted-over genes "as-is" would be to use other_gff, but I figure that this was not really intended for genes. Would you recommend this usage? Am I missing something?
> Thank you!
>
> On Thu, 23 Apr 2020 at 20:43, Carson Holt wrote:
> There are percent cutoffs for the est2genome algorithm you can set in the maker_bopts.ctl file. Additionally, maker will give the alignment but not produce a gene model if it can't translate through the est2genome alignment (i.e. stop codons in the assembly). I believe the cutoff is 50%. If you add est_forward=1 to the maker_opts.ctl file, names will be copied from the alignment source and the score in the GFF3 column will be the percent match to the original transcript.
>
> --Carson
>
>
>
> > On Apr 21, 2020, at 7:08 AM, Lior Glick wrote:
> >
> > Hello,
> > I am using MAKER to annotate a plant genome assembly. A high-quality reference genome and annotation exists for another variety of the same species, so my first step is lifting over reference genes to my genome. I do this by setting est2genome = 1 and providing MAKER with the reference cDNA (transcriptome). No other evidence is provided and no prediction is performed. Repeat masking is done using the reference repeats library.
> > When checking the results, I found out lots of reference genes missing from the lift-over result. However, if I blast the sequences of these genes myself, I get good matches. I even see these matches when I look at the blast results buried in the MAKER data_store.
> > For example, a transcript of length 1077 got a match of length 855 - 100% identity and no gaps. Bitscore was 1709 and E-value 0. This looks like a pretty good match, but it is not found in the final MAKER results (gff/fasta).
> > Why is this happening? Are there some cutoffs that are not satisfied? If so, what are they and how can they be configured?
> >
> > Thanks,
> > Lior
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at yandell-lab.org
> > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
> >

From carsonhh at gmail.com Tue May 26 12:51:52 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 12:51:52 -0600
Subject: [maker-devel] Multiple UTR ?
In-Reply-To: <16630d833d1448e7a771a4f2b19b0476@unil.ch>
References: <16630d833d1448e7a771a4f2b19b0476@unil.ch>
Message-ID: <003DC610-63E0-4D04-8CD2-431C92337C5F@gmail.com>

The UTR is split across two exons. The intron is not considered part of the UTR. The UTR exists in the post-splicing mRNA, so the corresponding genomic coordinates will have gaps, because the introns that exist in the genome have been spliced out of the mRNA. So while the UTR is continuous in the mRNA, it is discontinuous in the genome.

--Carson

> On May 3, 2020, at 5:39 AM, Patrick Tran Van wrote:
>
> Hi Carson,
>
> for instance, if have this:
>
> SCFXX maker five_prime_UTR 5164370 5164715 . - . ID=GENE-RA:five_prime_utr;Parent=GENE-RA;
> SCFXX maker five_prime_UTR 5156091 5156136 . - . ID=GENE-RA:five_prime_utr;Parent=GENE-RA;
>
>
> Does it mean that real coordinate of the 5' UTR is from 5156091 to 5164715 ?
>
> Patrick Tran Van
>
> Bioinformatician: Lab Chapuisat & Schwander
> Department of Ecology and Evolution
> University of Lausanne
> Lausanne - Switzerland
> Office 3206
> From: Carson Holt
> Sent: Wednesday, February 26, 2020 8:27:43 PM
> To: Patrick Tran Van
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Multiple UTR ?
>
> Sorry for the very slow reply. I found this way way down in my inbox.
>
> The UTR features are the parts of the exons that are not CDS. So multiple UTR, means it spans multiple exons, and must assembled to generate the full UTR in a browser.
Any exon that is fully non-coding will produce a UTR feature that mirrors an exons coordinates, and if it?s partially coding the UTR will share the same start or end by will terminate somewhere in the middle with a CDS filling up the remains coordinates. The UTR and CDS features get tiled over the top of the exon features when assembling a gene model. > > ?Carson > > > >> On Dec 18, 2019, at 7:19 AM, Patrick Tran Van > wrote: >> >> Hi Carson, >> >> I have seen something strange in my annotation: multiple UTR. How can we explain this ? Thanks! >> >> Scaffold maker >> mRNA 12117462 >> 12128433 . >> - . >> ID=GENE_02395-RA;Parent=GENE_02395;Name=GENE_02395-RA;Alias=maker-Scaffold-augustus-gene-40.12-mRNA-3;_AED=0.02;_QI=5383|1|1|1|0.88|0.9|10|247|238;_eAED=0.02;Note=Protein of unknown function; >> Scaffold maker >> exon 12128112 >> 12128433 . >> - . >> ID=GENE_02395-RA:exon:571;Parent=GENE_02395-RA; >> Scaffold maker >> exon 12117462 >> 12118046 . >> - . >> ID=GENE_02395-RB:exon:569;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; >> Scaffold maker >> exon 12118141 >> 12118301 . >> - . >> ID=GENE_02395-RB:exon:568;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; >> Scaffold maker >> exon 12118386 >> 12118539 . >> - . >> ID=GENE_02395-RB:exon:567;Parent=GENE_02395-RB,GENE_02395-RA; >> Scaffold maker >> exon 12118818 >> 12122493 . >> - . >> ID=GENE_02395-RB:exon:566;Parent=GENE_02395-RB,GENE_02395-RA; >> Scaffold maker >> exon 12123591 >> 12123893 . >> - . >> ID=GENE_02395-RB:exon:565;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; >> Scaffold maker >> exon 12123995 >> 12124303 . >> - . >> ID=GENE_02395-RB:exon:564;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; >> Scaffold maker >> exon 12125119 >> 12125418 . >> - . >> ID=GENE_02395-RB:exon:563;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; >> Scaffold maker >> exon 12126005 >> 12126313 . >> - . 
>> ID=GENE_02395-RB:exon:562;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; >> Scaffold maker >> exon 12127460 >> 12127687 . >> - . >> ID=GENE_02395-RB:exon:561;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; >> Scaffold maker >> five_prime_UTR 12128112 >> 12128433 . >> - . >> ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; >> Scaffold maker >> five_prime_UTR 12127460 >> 12127687 . >> - . >> ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; >> Scaffold maker >> five_prime_UTR 12126005 >> 12126313 . >> - . >> ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; >> Scaffold maker >> five_prime_UTR 12125119 >> 12125418 . >> - . >> ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; >> Scaffold maker >> five_prime_UTR 12123995 >> 12124303 . >> - . >> ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; >> Scaffold maker >> five_prime_UTR 12123591 >> 12123893 . >> - . >> ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; >> Scaffold maker >> five_prime_UTR 12118882 >> 12122493 . >> - . >> ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; >> Scaffold maker >> CDS 12118818 >> 12118881 . >> - 0 >> ID=GENE_02395-RA:cds;Parent=GENE_02395-RA; >> Scaffold maker >> CDS 12118386 >> 12118539 . >> - 2 >> ID=GENE_02395-RA:cds;Parent=GENE_02395-RA; >> Scaffold maker >> CDS 12118141 >> 12118301 . >> - 1 >> ID=GENE_02395-RA:cds;Parent=GENE_02395-RA; >> Scaffold maker >> CDS 12117709 >> 12118046 . >> - 2 >> ID=GENE_02395-RA:cds;Parent=GENE_02395-RA; >> Scaffold maker >> three_prime_UTR 12117462 >> 12117708 . >> - . 
>> ID=GENE_02395-RA:three_prime_utr;Parent=GENE_02395-RA; >> >> >> >> Patrick Tran Van >> >> Bioinformatician: Lab Chapuisat & Schwander >> Department of Ecology and Evolution >> University of Lausanne >> Lausanne - Switzerland >> Office 3206 >> _______________________________________________ >> maker-devel mailing list >> maker-devel at yandell-lab.org >> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From niconm89 at gmail.com Tue May 26 12:58:22 2020 From: niconm89 at gmail.com (=?UTF-8?Q?Nicol=C3=A1s_Moreyra?=) Date: Tue, 26 May 2020 15:58:22 -0300 Subject: [maker-devel] different number of annotated genes and transcripts In-Reply-To: References: Message-ID: Hi Carson, thanks for your reply. Yes, I did the same as you. Here are different outputs for the same annotation file: > grep -c -P "\tgene\t" Dato_struct-annot.noseq.gff > 17688 > grep -c -P "RNA\t" Dato_struct-annot.noseq.gff > 17688 > grep -c -P "mRNA\t" Dato_struct-annot.noseq.gff > 17205 > grep -P "RNA\t" Dato_struct-annot.noseq.gff| cut -f3 | sort -u > mRNA > tRNA After using a tool to extract transcripts sequences in a Fasta file, y obtained 17205 sequences. Looking for those genes without an associated transcript, it seems that you can only find tRNAs annotated there. It is odd: > Backbone_23 maker gene 486041 486112 . - . > ID=Dato03103;Name=Dato03103;Alias=trnascan-Backbone_23-noncoding-Glu_CTC-gene-4.38; > Backbone_23 maker tRNA 486041 486112 . - . > ID=Dato03103-RA;Parent=Dato03103;Name=Dato03103-RA;_AED=1.00;_QI=0|-1|0|0|-1|0|1|73|0;_eAED=1.00; > Backbone_23 maker exon 486041 486112 . - . > ID=Dato03103-RA:exon:45875;Parent=Dato03103-RA; The AED is bad in this example, so I'm thinking that it would be possible this gene had no evidence supporting it. I do not understand either the "Alias" for the gene line, it looks like trnaScan detected the gene. Any ideas? 
Nicolás

*--*
*Nicolas Nahuel Moreyra*
*BSc/MSc in Bioinformatics*
*CONICET PhD Fellow @ IEGEBA*
*PhD Student in Comparative Genomics @ EGE (FCEyN - UBA) -> nmoreyra at ege.fcen.uba.ar*
Professor of Bioinformatics @ Favaloro University
Professor of Informatics @ IFTS N° 7
*Argentina*

On Tue, May 26, 2020 at 14:54, Carson Holt (carsonhh at gmail.com) wrote:

> Perhaps you are counting wrong. If you want to know the number of genes,
> you must look at the GFF3. You can use 'grep -c -P "\tgene\t" file.gff';
> the number of transcripts would then be 'grep -c -P "RNA\t" file.gff'.
>
> Note that if you are using things like tRNAscan, you will get tRNA
> transcripts and associated genes. If you are trying to count from the
> fasta files, make sure you use the right file (maker.proteins.fasta and
> maker.transcripts.fasta).
>
> Thanks,
> Carson
>
>
> On May 21, 2020, at 7:58 AM, Nicolás Moreyra wrote:
>
> Dear all,
>
> First of all, thank you for sharing your experiences here. I tried to find
> this issue in the posts already made but failed.
> Secondly, I am sorry for asking you a silly question (I think), but after
> I completed the genome annotation of four species, I obtained fewer
> transcripts than genes. I do not understand why MAKER annotated genes
> without an associated transcript.
> I was trying to find the reason for this issue to discuss it in my thesis
> but I am a bit lost. Has this happened to anyone? Is there any possible
> cause that comes to mind?
>
> Thanks in advance.
>
> Nicolás
>
> *--*
> *Nicolas Nahuel Moreyra*
> *BSc/MSc in Bioinformatics*
> *CONICET PhD Fellow @ IEGEBA*
> *PhD Student in Comparative Genomics @ EGE (FCEyN - UBA) -> nmoreyra at ege.fcen.uba.ar*
> Professor of Bioinformatics @ Favaloro University
> Professor of Informatics @ IFTS N°
7
> *Argentina*
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue May 26 13:01:38 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 13:01:38 -0600
Subject: [maker-devel] Unable to reproduce MAKER blastn results
In-Reply-To:
References:
Message-ID: <39657402-9B41-4B82-8164-199ECCA634AC@gmail.com>

The only downstream change to the blast results would be the removal of HSPs that do not meet the minimum bit score set by bit_blastn. Also note that pcov is not a blast parameter; it is a post-blast filter. The HSPs are tiled and flattened, then the percent coverage against the original query is calculated (i.e., if every base of the query is represented at least once in the result, then coverage is 100%).

The blast results are used only for identifying the rough region a model overlaps, which is then passed to exonerate. The exonerate alignment is used to generate the splice-aware est2genome model. Many good blastn alignments will produce poor exonerate alignments and no est2genome results.

--Carson

> On May 5, 2020, at 3:46 AM, Lior Glick wrote:
>
> Hello,
>
> I am running MAKER 2.31.10 with a very simple configuration, with only EST evidence and the est2genome option enabled (basically a lift-over procedure).
> I noticed that some of my transcripts are not included in the annotation output and when I looked at the blastn results the reason was clear - they do not pass the coverage cutoff defined in maker_bopts.ctl. Interestingly, when I tried running blastn myself, using the same command (taken from the maker log) and the same blastn version, I got slightly different results. Specifically, for some of the transcripts the MAKER blastn run produced fewer HSPs than my blastn run, resulting in a lower total coverage.
The additional HSPs seem to have good % identity and E-values, so I don't understand why and how they are discarded. Are the blastn results changed by MAKER in subsequent steps (after the blastn run)? > Please find attached blastn results from MAKER run and from my run. You can look at transcript AT1G01740.3 as an example. in my.blastn, there are 8 HSPs, while MAKER.blastn only has 3 of them. > Can you explain the difference? Maybe it has to do with repeat masking or other processing of the genome sequence? > > Just to make sure you have all the details: > Relevant maker_bopts parameters: > pcov_blastn=0.7 #Blastn Percent Coverage Threhold EST-Genome Alignments > pid_blastn=0.85 #Blastn Percent Identity Threshold EST-Genome Aligments > eval_blastn=1e-10 #Blastn eval cutoff > bit_blastn=40 #Blastn bit cutoff > depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff) > > Blastn command: > blastn -db /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/TAIR10_longest_trans%2Efasta.mpi.10.0 -query /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/0/chunk00.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -word_size 28 -reward 1 -penalty -5 -gapopen 5 -gapextend 5 -dbsize 1000 -searchsp 500000000 -num_threads 10 -lcase_masking -dust yes -soft_masking true -show_gis -out > > Thank you! > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue May 26 13:03:31 2020 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 26 May 2020 13:03:31 -0600 Subject: [maker-devel] Question about maker. 
Maker2 failed
In-Reply-To:
References:
Message-ID: <2F9626D1-F7C2-4BDA-A59F-C6566C0D558D@gmail.com>

It is probably the formatting of the models provided. There is something wrong with them. For rm_gff they must be two-level match/match_part features. You can send us the file, and I can take a look if it helps.

--Carson

> On May 5, 2020, at 2:41 PM, 祖尧刘 wrote:
>
> Hi maker developer,
>
> I'm using maker 2 to annotate a vertebrate genome.
> When I try to provide rm_gff file, it always fails.
> Here is log:
> Now starting the contig!!
> SeqID: chr_XXII
> Length: 12689475
> #---------------------------------------------------------------------
>
>
> setting up GFF3 output and fasta chunks
> doing repeat masking
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Did not specify a Hit End or Hit Begin
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Root/Root.pm:449
> STACK: Bio::Search::HSP::GenericHSP::_subject_seq_feature /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:1604
> STACK: Bio::Search::HSP::GenericHSP::hit /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:988
> STACK: repeat_mask_seq::separate_types /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/repeat_mask_seq.pm:307
> STACK: repeat_mask_seq::mask_chunk /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/repeat_mask_seq.pm:191
> STACK: Process::MpiChunk::_go /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:763
> STACK: Process::MpiChunk::run /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:341
> STACK: Process::MpiChunk::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:357
> STACK: Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287
> STACK:
Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287
> STACK: /home/ubelix/iee/zl19g775/miniconda3/envs/maker/bin/maker:689
> -----------------------------------------------------------
> --> rank=NA, hostname=submit02.ubelix.unibe.ch
> ERROR: Failed while doing repeat masking
> ERROR: Chunk failed at level:0, tier_type:1
> FAILED CONTIG:chr_XXII
>
> ERROR: Chunk failed at level:2, tier_type:0
> FAILED CONTIG:chr_XXII
>
> examining contents of the fasta file and run log
>
>
>
> I also searched the Google group and tried updating my BioPerl to 1.7.7, the latest version, but it didn't help.
>
> Could you please help me?
>
> Thanks a lot.
>
> Zuyao
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue May 26 13:15:20 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 13:15:20 -0600
Subject: [maker-devel] gene:multiple_Einit and overlaps_prev_exon errors in first round of SNAP training
In-Reply-To:
References:
Message-ID: <92F27280-90B8-4AF2-8C1D-B54955A7C521@gmail.com>

You have an ID collision in the GFF3. Check the GFF3 being sent to maker2zff. If you are using GFF3 as input to MAKER, you likely have non-unique IDs there that are causing the issue in the first place.

--Carson

> On May 6, 2020, at 11:36 AM, Carneiro,Celine M wrote:
>
> Hello,
>
> I am getting the errors gene:multiple_Einit, gene:multiple_Eterm, and exon:overlaps_prev_exon at just about every gene model. I've run the first round of maker on a bird genome I'm annotating with no errors and have started the steps to train SNAP. However, after running fathom -categorize, just about every single gene model has the same set of errors.
Here is an example from my log file after running fathom -categorize: > > MODEL117 1 1 8 - errors(6): gene:multiple_Einit gene:multiple_Eterm exon-7:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL851 1 1 100 - errors(78): gene:multiple_Einit gene:multiple_Eterm exon-99:overlaps_prev_exon exon-98:overlaps_prev_exon exon-97:overlaps_prev_exon exon-95:overlaps_prev_exon exon-94:overlaps_prev_exon exon-93:overlaps_prev_exon exon-91:overlaps_prev_exon exon-90:overlaps_prev_exon exon-89:overlaps_prev_exon exon-87:overlaps_prev_exon exon-86:overlaps_prev_exon exon-85:overlaps_prev_exon exon-83:overlaps_prev_exon exon-82:overlaps_prev_exon exon-81:overlaps_prev_exon exon-79:overlaps_prev_exon exon-78:overlaps_prev_exon exon-77:overlaps_prev_exon exon-75:overlaps_prev_exon exon-74:overlaps_prev_exon exon-73:overlaps_prev_exon exon-71:overlaps_prev_exon exon-70:overlaps_prev_exon exon-69:overlaps_prev_exon exon-67:overlaps_prev_exon exon-66:overlaps_prev_exon exon-65:overlaps_prev_exon exon-63:overlaps_prev_exon exon-62:overlaps_prev_exon exon-61:overlaps_prev_exon exon-59:overlaps_prev_exon exon-58:overlaps_prev_exon exon-57:overlaps_prev_exon exon-55:overlaps_prev_exon exon-54:overlaps_prev_exon exon-53:overlaps_prev_exon exon-51:overlaps_prev_exon exon-50:overlaps_prev_exon exon-49:overlaps_prev_exon exon-48:overlaps_prev_exon exon-47:overlaps_prev_exon exon-46:overlaps_prev_exon exon-45:overlaps_prev_exon exon-43:overlaps_prev_exon exon-42:overlaps_prev_exon exon-41:overlaps_prev_exon exon-39:overlaps_prev_exon exon-38:overlaps_prev_exon exon-37:overlaps_prev_exon exon-35:overlaps_prev_exon exon-34:overlaps_prev_exon exon-33:overlaps_prev_exon exon-31:overlaps_prev_exon exon-30:overlaps_prev_exon exon-29:overlaps_prev_exon exon-27:overlaps_prev_exon exon-26:overlaps_prev_exon exon-25:overlaps_prev_exon exon-23:overlaps_prev_exon exon-22:overlaps_prev_exon exon-21:overlaps_prev_exon exon-19:overlaps_prev_exon 
exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-14:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-10:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-2:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL190 1 1 39 + errors(35): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-3:overlaps_prev_exon exon-4:overlaps_prev_exon exon-5:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-9:overlaps_prev_exon exon-11:overlaps_prev_exon exon-12:overlaps_prev_exon exon-13:overlaps_prev_exon exon-14:overlaps_prev_exon exon-15:overlaps_prev_exon exon-16:overlaps_prev_exon exon-17:overlaps_prev_exon exon-18:overlaps_prev_exon exon-20:overlaps_prev_exon exon-21:overlaps_prev_exon exon-22:overlaps_prev_exon exon-23:overlaps_prev_exon exon-24:overlaps_prev_exon exon-25:overlaps_prev_exon exon-26:overlaps_prev_exon exon-27:overlaps_prev_exon exon-29:overlaps_prev_exon exon-30:overlaps_prev_exon exon-32:overlaps_prev_exon exon-33:overlaps_prev_exon exon-34:overlaps_prev_exon exon-35:overlaps_prev_exon exon-36:overlaps_prev_exon exon-38:overlaps_prev_exon exon-39:overlaps_prev_exon > MODEL424 1 1 10 - errors(8): gene:multiple_Einit gene:multiple_Eterm exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL902 1 1 20 - errors(14): gene:multiple_Einit gene:multiple_Eterm exon-19:overlaps_prev_exon exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL238 1 1 14 - errors(11): gene:multiple_Einit 
gene:multiple_Eterm exon-13:overlaps_prev_exon exon-12:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL39 1 1 6 - errors(1): exon-3:overlaps_prev_exon > MODEL119 1 1 10 + errors(8): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-4:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-10:overlaps_prev_exon > > Furthermore, I checked my genome.ann file and noticed that my Einit and Exon sites are duplicated. For example: > > >ScdimlH_1004;HRSCAF=1084 > Einit 38730 38677 MODEL851 > Exon 38255 38178 MODEL851 > Exon 38074 38021 MODEL851 > Exon 24755 24717 MODEL851 > Exon 24213 24149 MODEL851 > Exon 23176 23098 MODEL851 > Exon 22037 21961 MODEL851 > Exon 21269 21080 MODEL851 > Exon 20232 20167 MODEL851 > Exon 19742 19704 MODEL851 > Exon 14705 14590 MODEL851 > Exon 14255 13980 MODEL851 > Exon 14169 13980 MODEL851 > Exon 13303 13223 MODEL851 > Exon 13303 13223 MODEL851 > Exon 12782 12639 MODEL851 > Exon 12782 12639 MODEL851 > Exon 5761 5592 MODEL851 > Exon 5482 5404 MODEL851 > Exon 5140 5064 MODEL851 > Exon 4951 4750 MODEL851 > Exon 4567 4502 MODEL851 > Exon 4256 4185 MODEL851 > Exon 3569 3403 MODEL851 > Exon 3157 3076 MODEL851 > Exon 2936 2800 MODEL851 > Eterm 2186 2000 MODEL851 > Einit 38730 38677 MODEL851 > Exon 38255 38178 MODEL851 > Exon 38074 38021 MODEL851 > Exon 24755 24717 MODEL851 > Exon 24213 24149 MODEL851 > Exon 23176 23098 MODEL851 > Exon 22037 21961 MODEL851 > Exon 21269 21080 MODEL851 > Exon 20232 20167 MODEL851 > Exon 19742 19704 MODEL851 > Exon 14705 14590 MODEL851 > Exon 14255 13980 MODEL851 > Exon 14169 13980 MODEL851 > Exon 13303 13223 MODEL851 > Exon 13303 13223 MODEL851 > Exon 12782 12639 MODEL851 > Exon 12782 12639 MODEL851 > Exon 5761 5592 MODEL851 > Exon 5482 5404 MODEL851 > Exon 5140 5064 MODEL851 > Exon 4951 4750 
MODEL851
> Exon 4567 4502 MODEL851
> Exon 4256 4185 MODEL851
> Exon 3569 3403 MODEL851
> Exon 3157 3076 MODEL851
> Exon 2936 2800 MODEL851
> Eterm 2186 2000 MODEL851
>
> Any ideas why I'm seeing this duplication? Lastly, any ideas why my exons are overlapping so much? I appreciate any input and please let me know if you require any more information.
>
> Thank you!
>
> Celine
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue May 26 13:26:03 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 13:26:03 -0600
Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question
In-Reply-To:
References:
Message-ID: <07D0AA31-0FCF-4E60-AC50-1E377CBF870C@gmail.com>

> 2. If I was to improve the annotation of my 3' UTRs within a certain (non-model species) gff3, is there a particular way or a protocol to follow? I was thinking for example that Lexogen has released their 3' UTR kit for RNA-seq of the three prime end of transcripts. Would it be possible to feed those reads to maker and somehow suggest that the reads are originating from the three-prime end so that this info is then passed in the gff3 file?

You could pass the final models back in as pred_gff (with no UTR on the models), then pass in just the evidence you want used for UTR as est_gff (it would have to be assembled, not individual reads). As long as they overlap the pred_gff models, MAKER would try to make UTR out of them. Might be worth an experiment.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From shawn.trojahn at wsu.edu Fri May 29 15:49:38 2020 From: shawn.trojahn at wsu.edu (Trojahn, Shawn Michael) Date: Fri, 29 May 2020 21:49:38 +0000 Subject: [maker-devel] Intron lengths below minimum cutoff Message-ID: Hello, I have been having a problem with the final annotation coming from Maker2 where I have a few thousand introns that are below the minimum intron value I have set. Most of the exons around these problem introns have no support in the final merged gff file, but a few are supported by blast hits. Is there a reason why these introns would remain in the final gff? Thanks, Shawn -------------- next part -------------- An HTML attachment was scrubbed... URL: From liorglic at mail.tau.ac.il Tue May 5 03:46:01 2020 From: liorglic at mail.tau.ac.il (Lior Glick) Date: Tue, 5 May 2020 12:46:01 +0300 Subject: [maker-devel] Unable to reproduce MAKER blastn results Message-ID: Hello, I am running MAKER 2.31.10 with a very simple configuration, with only EST evidence and the est2genome option enabled (basically a lift-over procedure).
I noticed that some of my transcripts are not included in the annotation output and when I looked at the blastn results the reason was clear - they do not pass the coverage cutoff defined in maker_bopts.ctl. Interestingly, when I tried running blastn myself, using the same command (taken from the maker log) and the same blastn version, I got slightly different results. Specifically, for some of the transcripts the MAKER blastn run produced less HSPs than my blastn run, resulting in a lower total coverage. The additional HSPs seem to have good % identity and E-values, so I don't understand why and how they are discarded. Are the blastn results changed by MAKER in subsequent steps (after the blastn run)? Please find attached blastn results from MAKER run and from my run. You can look at transcript AT1G01740.3 as an example. in my.blastn, there are 8 HSPs, while MAKER.blastn only has 3 of them. Can you explain the difference? Maybe it has to do with repeat masking or other processing of the genome sequence? 
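For reference, the coverage figure at issue is described in Carson Holt's reply in this thread as a post-BLAST filter: HSPs are tiled and flattened, and the fraction of query bases hit at least once is compared with pcov_blastn. The following is a minimal Python sketch of that idea under stated assumptions; it is not MAKER's code, the function name `query_coverage` is hypothetical, and coordinates are assumed to be 1-based inclusive positions on the query (the transcript).

```python
def query_coverage(hsps, query_length):
    """Fraction of query bases covered by at least one HSP.

    hsps: iterable of (start, end) query coordinates, 1-based inclusive;
    reversed pairs (start > end) are normalized first.
    """
    intervals = sorted((min(s, e), max(s, e)) for s, e in hsps)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Close the previous merged block before opening a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping HSPs are flattened into a single block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / query_length


# Illustrative numbers only: three HSPs on a 1000-base query.
coverage = query_coverage([(1, 300), (250, 500), (801, 1000)], 1000)
print(coverage)
```

In this toy case the first two HSPs merge into bases 1-500 and the third adds 801-1000, so 700 of 1000 bases are covered. Dropping any one HSP lowers the figure, which is consistent with a transcript failing the cutoff when fewer HSPs survive.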
Just to make sure you have all the details: Relevant maker_bopts parameters: pcov_blastn=0.7 #Blastn Percent Coverage Threhold EST-Genome Alignments pid_blastn=0.85 #Blastn Percent Identity Threshold EST-Genome Aligments eval_blastn=1e-10 #Blastn eval cutoff bit_blastn=40 #Blastn bit cutoff depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff) Blastn command: blastn -db /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/TAIR10_longest_trans%2Efasta.mpi.10.0 -query /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/0/chunk00.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -word_size 28 -reward 1 -penalty -5 -gapopen 5 -gapextend 5 -dbsize 1000 -searchsp 500000000 -num_threads 10 -lcase_masking -dust yes -soft_masking true -show_gis -out Thank you! -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: my.blastn Type: application/octet-stream Size: 440090 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: MAKER.blastn Type: application/octet-stream Size: 1143936 bytes Desc: not available URL: From zuyao.liu.0910 at gmail.com Tue May 5 14:41:25 2020 From: zuyao.liu.0910 at gmail.com (=?UTF-8?B?56WW5bCn5YiY?=) Date: Tue, 5 May 2020 22:41:25 +0200 Subject: [maker-devel] Question about maker. Maker2 failed Message-ID: Hi maker developer, I'm using maker 2 to annotate a vertebrate genome. When I try to provide rm_gff file, it always fails. Here is log: Now starting the contig!! 
SeqID: chr_XXII Length: 12689475 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks doing repeat masking ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Did not specify a Hit End or Hit Begin STACK: Error::throw STACK: Bio::Root::Root::throw /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Root/Root.pm:449 STACK: Bio::Search::HSP::GenericHSP::_subject_seq_feature /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:1604 STACK: Bio::Search::HSP::GenericHSP::hit /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:988 STACK: repeat_mask_seq::separate_types /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/ repeat_mask_seq.pm:307 STACK: repeat_mask_seq::mask_chunk /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/ repeat_mask_seq.pm:191 STACK: Process::MpiChunk::_go /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:763 STACK: Process::MpiChunk::run /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:341 STACK: Process::MpiChunk::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:357 STACK: Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287 STACK: /home/ubelix/iee/zl19g775/miniconda3/envs/maker/bin/maker:689 ----------------------------------------------------------- --> rank=NA, hostname=submit02.ubelix.unibe.ch ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:chr_XXII ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:chr_XXII examining contents of the fasta file and run log I also searched the google group and tried update my bioperl to 1.7.7 
the latest version, but it didn't help. Could you please help me? Thanks a lot. Zuyao -------------- next part -------------- An HTML attachment was scrubbed... URL: From c143dad at ufl.edu Wed May 6 11:36:40 2020 From: c143dad at ufl.edu (Carneiro,Celine M) Date: Wed, 6 May 2020 17:36:40 +0000 Subject: [maker-devel] gene:multiple_Einit and overlaps_prev_exon errors in first round of SNAP training Message-ID: Hello, I am getting the errors gene:multiple_Einit, gene:multiple_Eterm, and exon:overlaps_prev_exon, at just about every gene model. I've ran the first round of maker on a bird genome I'm annotating with no errors and have started the steps to train SNAP. However, after running fathom -categorize, just about every single gene model has the same set of errors. Here is an example from my log file after running fathom -categorize: MODEL117 1 1 8 - errors(6): gene:multiple_Einit gene:multiple_Eterm exon-7:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL851 1 1 100 - errors(78): gene:multiple_Einit gene:multiple_Eterm exon-99:overlaps_prev_exon exon-98:overlaps_prev_exon exon-97:overlaps_prev_exon exon-95:overlaps_prev_exon exon-94:overlaps_prev_exon exon-93:overlaps_prev_exon exon-91:overlaps_prev_exon exon-90:overlaps_prev_exon exon-89:overlaps_prev_exon exon-87:overlaps_prev_exon exon-86:overlaps_prev_exon exon-85:overlaps_prev_exon exon-83:overlaps_prev_exon exon-82:overlaps_prev_exon exon-81:overlaps_prev_exon exon-79:overlaps_prev_exon exon-78:overlaps_prev_exon exon-77:overlaps_prev_exon exon-75:overlaps_prev_exon exon-74:overlaps_prev_exon exon-73:overlaps_prev_exon exon-71:overlaps_prev_exon exon-70:overlaps_prev_exon exon-69:overlaps_prev_exon exon-67:overlaps_prev_exon exon-66:overlaps_prev_exon exon-65:overlaps_prev_exon exon-63:overlaps_prev_exon exon-62:overlaps_prev_exon exon-61:overlaps_prev_exon exon-59:overlaps_prev_exon exon-58:overlaps_prev_exon exon-57:overlaps_prev_exon 
exon-55:overlaps_prev_exon exon-54:overlaps_prev_exon exon-53:overlaps_prev_exon exon-51:overlaps_prev_exon exon-50:overlaps_prev_exon exon-49:overlaps_prev_exon exon-48:overlaps_prev_exon exon-47:overlaps_prev_exon exon-46:overlaps_prev_exon exon-45:overlaps_prev_exon exon-43:overlaps_prev_exon exon-42:overlaps_prev_exon exon-41:overlaps_prev_exon exon-39:overlaps_prev_exon exon-38:overlaps_prev_exon exon-37:overlaps_prev_exon exon-35:overlaps_prev_exon exon-34:overlaps_prev_exon exon-33:overlaps_prev_exon exon-31:overlaps_prev_exon exon-30:overlaps_prev_exon exon-29:overlaps_prev_exon exon-27:overlaps_prev_exon exon-26:overlaps_prev_exon exon-25:overlaps_prev_exon exon-23:overlaps_prev_exon exon-22:overlaps_prev_exon exon-21:overlaps_prev_exon exon-19:overlaps_prev_exon exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-14:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-10:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-2:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL190 1 1 39 + errors(35): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-3:overlaps_prev_exon exon-4:overlaps_prev_exon exon-5:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-9:overlaps_prev_exon exon-11:overlaps_prev_exon exon-12:overlaps_prev_exon exon-13:overlaps_prev_exon exon-14:overlaps_prev_exon exon-15:overlaps_prev_exon exon-16:overlaps_prev_exon exon-17:overlaps_prev_exon exon-18:overlaps_prev_exon exon-20:overlaps_prev_exon exon-21:overlaps_prev_exon exon-22:overlaps_prev_exon exon-23:overlaps_prev_exon exon-24:overlaps_prev_exon exon-25:overlaps_prev_exon exon-26:overlaps_prev_exon exon-27:overlaps_prev_exon exon-29:overlaps_prev_exon exon-30:overlaps_prev_exon exon-32:overlaps_prev_exon exon-33:overlaps_prev_exon exon-34:overlaps_prev_exon 
exon-35:overlaps_prev_exon exon-36:overlaps_prev_exon exon-38:overlaps_prev_exon exon-39:overlaps_prev_exon MODEL424 1 1 10 - errors(8): gene:multiple_Einit gene:multiple_Eterm exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL902 1 1 20 - errors(14): gene:multiple_Einit gene:multiple_Eterm exon-19:overlaps_prev_exon exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL238 1 1 14 - errors(11): gene:multiple_Einit gene:multiple_Eterm exon-13:overlaps_prev_exon exon-12:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon MODEL39 1 1 6 - errors(1): exon-3:overlaps_prev_exon MODEL119 1 1 10 + errors(8): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-4:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-10:overlaps_prev_exon Furthermore, I checked my genome.ann file and noticed that my Einit and Exon sites are duplicated. 
For example: >ScdimlH_1004;HRSCAF=1084 Einit 38730 38677 MODEL851 Exon 38255 38178 MODEL851 Exon 38074 38021 MODEL851 Exon 24755 24717 MODEL851 Exon 24213 24149 MODEL851 Exon 23176 23098 MODEL851 Exon 22037 21961 MODEL851 Exon 21269 21080 MODEL851 Exon 20232 20167 MODEL851 Exon 19742 19704 MODEL851 Exon 14705 14590 MODEL851 Exon 14255 13980 MODEL851 Exon 14169 13980 MODEL851 Exon 13303 13223 MODEL851 Exon 13303 13223 MODEL851 Exon 12782 12639 MODEL851 Exon 12782 12639 MODEL851 Exon 5761 5592 MODEL851 Exon 5482 5404 MODEL851 Exon 5140 5064 MODEL851 Exon 4951 4750 MODEL851 Exon 4567 4502 MODEL851 Exon 4256 4185 MODEL851 Exon 3569 3403 MODEL851 Exon 3157 3076 MODEL851 Exon 2936 2800 MODEL851 Eterm 2186 2000 MODEL851 Einit 38730 38677 MODEL851 Exon 38255 38178 MODEL851 Exon 38074 38021 MODEL851 Exon 24755 24717 MODEL851 Exon 24213 24149 MODEL851 Exon 23176 23098 MODEL851 Exon 22037 21961 MODEL851 Exon 21269 21080 MODEL851 Exon 20232 20167 MODEL851 Exon 19742 19704 MODEL851 Exon 14705 14590 MODEL851 Exon 14255 13980 MODEL851 Exon 14169 13980 MODEL851 Exon 13303 13223 MODEL851 Exon 13303 13223 MODEL851 Exon 12782 12639 MODEL851 Exon 12782 12639 MODEL851 Exon 5761 5592 MODEL851 Exon 5482 5404 MODEL851 Exon 5140 5064 MODEL851 Exon 4951 4750 MODEL851 Exon 4567 4502 MODEL851 Exon 4256 4185 MODEL851 Exon 3569 3403 MODEL851 Exon 3157 3076 MODEL851 Exon 2936 2800 MODEL851 Eterm 2186 2000 MODEL851 Any ideas why I'm seeing this duplication? Lastly, any ideas why my exons are overlapping so much? I appreciate any input and please let me know if you require any more information. Thank you! Celine -------------- next part -------------- An HTML attachment was scrubbed... URL: From peruzzaluca at gmail.com Tue May 19 02:31:22 2020 From: peruzzaluca at gmail.com (Luca Peruzza) Date: Tue, 19 May 2020 10:31:22 +0200 Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question Message-ID: Hi There, I have two questions and I hope you guys can help me with them: 1. 
I have seen that maker version 3.01 is now out. Is there a change log available to see the changes in comparison to the previous maker version and have a glimpse of the new features of this release?

2. If I were to improve the annotation of my 3' UTRs within a certain (non-model species) gff3, is there a particular way or a protocol to follow? I was thinking, for example, that Lexogen has released their 3' UTR kit for RNA-seq of the three-prime end of transcripts. Would it be possible to feed those reads to maker and somehow indicate that the reads originate from the three-prime end so that this info is then passed on to the gff3 file?

Thanks a lot in advance for your help
Best
Luca

From zuyao.liu.0910 at gmail.com Tue May 19 03:10:30 2020
From: zuyao.liu.0910 at gmail.com (祖尧刘)
Date: Tue, 19 May 2020 11:10:30 +0200
Subject: [maker-devel] Question about maker. Maker2 failed
Message-ID:

Hi maker developers

I'm using maker 2 to annotate a fish genome. When I try to provide an rm_gff file, it always fails. Here is the log:

collecting blastx repeatmasking
doing repeat masking
processing all repeats
deleted:0 hits
in cluster::shadow_cluster...
Died at /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
--> rank=23, hostname=hnode48
ERROR: Failed while processing all repeats
ERROR: Chunk failed at level:3, tier_type:1
FAILED CONTIG:chr_XXIII
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:chr_XXIII

I use maker 2.3.10 with repeatmasker 4.0.9. I saw that someone else got this error as well, and I followed the suggested solutions. I tried updating to blast 2.9.0, rmblast 2.9.0, and bioperl 1.7.7, and also checked the rm gff file with a gff3 validator, but the error persists. Do you have any suggestions? Thanks a lot for your help.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jason.stajich at gmail.com Tue May 19 17:29:38 2020
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 19 May 2020 16:29:38 -0700
Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question
In-Reply-To:
References:
Message-ID:

Luca - I would suggest PASA as a tool for 3'UTR (and 5'UTR) improvement in gene annotation too.
https://github.com/PASApipeline/PASApipeline

Funannotate has a step that can be used to run and update gene models if you want to also take on from an existing maker run -
https://funannotate.readthedocs.io/en/latest/

Jason

Jason Stajich
jason.stajich at gmail.com

On Tue, May 19, 2020 at 1:33 AM Luca Peruzza wrote:

> Hi There,
> I have two questions and I hope you guys can help me with them:
>
> 1. I have seen that maker version 3.01 is now out. Is there a change log
> available to see the changes in comparison to the previous maker version
> and have a glimpse of the new features of this release?
>
> 2. If I was to improve the annotation of my 3' UTRs within a certain
> (non-model species) gff3, is there a particular way or a protocol to
> follow? I was thinking for example that Lexogen has released their 3' UTR
> kit for RNA-seq of the three prime end of transcripts. Would it be possible
> to feed those reads to maker and somehow suggest that the reads are
> originating from the three-prime end so that this info is then passed in
> the gff3 file?
>
> Thanks a lot in advance for your help
> Best
> Luca
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From peruzzaluca at gmail.com Wed May 20 06:41:14 2020
From: peruzzaluca at gmail.com (Luca Peruzza)
Date: Wed, 20 May 2020 14:41:14 +0200
Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question
In-Reply-To:
References:
Message-ID: <4ABBF9F2-4F9E-4D7F-B821-3276D4D3EFD1@gmail.com>

Thanks Jason,
Yes, my idea was to add extra 3'UTR info to an existing maker gff3 file. If you say that funannotate can do it, I'll have a look.
Thanks
Luca

> On 20 May 2020, at 01:29, Jason Stajich wrote:
>
> Luca - I would suggest PASA as a tool for 3'UTR (and 5'UTR) improvement in gene annotation too. https://github.com/PASApipeline/PASApipeline
>
> Funannotate has a step that can be use to run and update gene models if you want to also take on from an existing maker run - https://funannotate.readthedocs.io/en/latest/
>
> Jason
> Jason Stajich
> jason.stajich at gmail.com
>
>
> On Tue, May 19, 2020 at 1:33 AM Luca Peruzza wrote:
> Hi There,
> I have two questions and I hope you guys can help me with them:
>
> 1. I have seen that maker version 3.01 is now out. Is there a change log available to see the changes in comparison to the previous maker version and have a glimpse of the new features of this release?
>
> 2. If I was to improve the annotation of my 3' UTRs within a certain (non-model species) gff3, is there a particular way or a protocol to follow? I was thinking for example that Lexogen has released their 3' UTR kit for RNA-seq of the three prime end of transcripts. Would it be possible to feed those reads to maker and somehow suggest that the reads are originating from the three-prime end so that this info is then passed in the gff3 file?
>
> Thanks a lot in advance for your help
> Best
> Luca
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From niconm89 at gmail.com Thu May 21 07:58:34 2020
From: niconm89 at gmail.com (Nicolás Moreyra)
Date: Thu, 21 May 2020 10:58:34 -0300
Subject: [maker-devel] different number of annotated genes and transcripts
Message-ID:

Dear all,

First of all, thank you for sharing your experiences here. I tried to find this issue in the posts already made but failed.
Secondly, I am sorry for asking what may be a silly question, but after I completed the genome annotation of four species, I obtained fewer transcripts than genes. I do not understand why MAKER would annotate genes that have no transcript.
I was trying to find the reason for this issue to discuss it in my thesis, but I am a bit lost. Has this happened to anyone? Is there any possible cause that comes to mind?

Thanks in advance.

Nicolás

*--*
*Nicolas Nahuel Moreyra*
*BSc/MSc in Bioinformatics*
*CONICET PhD Fellow @ IEGEBA*
*PhD Student in Comparative Genomics @ EGE (**FCEyN - UBA) **-> **nmoreyra at ege.fcen.uba.ar *
Professor of Bioinformatics @ Favaloro University
Professor of Informatics @ IFTS N° 7
*Argentina*
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From yujin at genomics.cn Fri May 22 23:46:33 2020
From: yujin at genomics.cn (余进(Jin Yu))
Date: Sat, 23 May 2020 05:46:33 +0000
Subject: [maker-devel] maker error-ERROR: Failed while annotating transcripts
Message-ID:

Hi, Dear developers.

I'm using maker-3.01.03 to annotate a plant genome, but I ran into this error:

Can't locate object method "add_entry" via package "1" (perhaps you forgot to load "1"?) at /vol2/liuyang_group/liuyang/software/maker-3.01.03/bin/../lib/Widget/snap.pm line 540.
ERROR: Failed while annotating transcripts

The attached file is the full STDERR from maker. I have searched the archived mailing list and found a similar question (https://groups.google.com/forum/#!topic/maker-devel/fGGCKXhi6cw), but I didn't find any error which occurred before this one in the log.
I would appreciate it a lot if you could help me!

Best regards
Jin Yu
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker.log
Type: application/octet-stream
Size: 3935522 bytes
Desc: maker.log
URL:
From zhoux233 at mail2.sysu.edu.cn Sun May 24 17:43:43 2020
From: zhoux233 at mail2.sysu.edu.cn (周鑫)
Date: Mon, 25 May 2020 07:43:43 +0800
Subject: [maker-devel] Trouble in opening the registration page for Maker
Message-ID:

Hello, Developers of Maker,

I'm a student from SYSU, China. Recently, I wanted to download Maker from your website for my lab's annotation work, but I have had trouble opening the registration page for days, and I haven't figured out why. I also failed to install maker with conda, so could you please tell me how to deal with it? Or could you please send me a copy of the source?
If it is convenient for you to send me a copy, here is my information:

Name: Zhou Xin
Email address: zhoux233 at mail2.sysu.edu.cn
Software needed: Maker
PI name: Huang ShengFeng
Research: Genome Annotation for zebrafish
Institute: Life Science School, Sun Yat-Sen University
Institute URL: http://lifesciences.sysu.edu.cn/
Country: China
Province: GuangDong
City: Guang Zhou

If anything else is needed, please email me and I will add it as soon as I see it.
Anyway, thank you very much for your attention! Any reply will be very much appreciated!

Regards!
Zhou Xin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Tue May 26 11:54:45 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 11:54:45 -0600
Subject: [maker-devel] different number of annotated genes and transcripts
In-Reply-To:
References:
Message-ID:

Perhaps you are counting wrong. If you want to know the number of genes, you must look at the GFF3. You can use 'grep -c -P "\tgene\t" file.gff'; the number of transcripts would then be 'grep -c -P "RNA\t" file.gff'.

Note that if you are using things like tRNAscan, you will get tRNA transcripts and associated genes. If you are trying to count from the fasta files, make sure you use the right file (maker.proteins.fasta and maker.transcripts.fasta).

Thanks,
Carson

> On May 21, 2020, at 7:58 AM, Nicolás Moreyra wrote:
>
> Dear all,
>
> First of all, thank you for sharing your experiences here. I tried to find this issue in the posts already made but failed.
> Secondly, I am sorry for asking you a silly question (I think), but after I complete the genome annotation of four species, I obtained fewer transcripts than genes. I do not understand why MAKER annotated genes unable to transcribe.
> I was trying to find the reason for this issue to discuss it in my thesis but I am a bit lost. Has this happened to anyone? Is there any possible cause that comes to mind?
>
> Thanks in advance.
>
> Nicolás
>
> --
> Nicolas Nahuel Moreyra
> BSc/MSc in Bioinformatics
> CONICET PhD Fellow @ IEGEBA
> PhD Student in Comparative Genomics @ EGE (FCEyN - UBA) -> nmoreyra at ege.fcen.uba.ar
> Professor of Bioinformatics @ Favaloro University
> Professor of Informatics @ IFTS N° 7
> Argentina
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Tue May 26 12:10:16 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 12:10:16 -0600
Subject: [maker-devel] Maker 0 genes after SNAP or with proteins.gff
In-Reply-To: <84c5fc195df0fcc5e03484e65076fa9c@uni-duesseldorf.de>
References: <84c5fc195df0fcc5e03484e65076fa9c@uni-duesseldorf.de>
Message-ID: <053268A7-E5B0-4878-92F2-63B01869B677@gmail.com>

You would have to look at the alignments, but I suspect they do not align to the gene models in a way that supplies sufficient support for the annotation. If it is the maker2zff script producing 0 genes, that is because it requires at least some EST evidence. You can change that using the command line options.

--Carson

> On Apr 24, 2020, at 8:27 AM, Ricardo Nuno Ferreira Martins Guerreiro wrote:
>
> Dear Makers list,
>
>
> I am struggling with Maker after many successful attempts. I don't understand why but my final .gff does not contain any genes, 0.
>
> I am running first an Evidence based modelling, with proteins only. Here I get around 40 thousand genes if I give the proteins as a fasta to align (if I provide a protein.gff from a previous maker try, I get 0 genes, same problem).
>
> Afterwards I'm creating a SNAP hmm and running maker again, turning protein2genome=0 and snaphmm=snap.hmm as you say, but now I have 0 genes. This happens either I keep providing proteins as a fasta or as .gff of a previous run.
>
> I have done this many times and it always worked. The only difference now is that I am using no ESTs whatsoever, only proteins. It's also strange that it works on the first round of maker but doesn't work on the SNAP rounds.
>
>
> Hope you can help,
> Ricardo
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Tue May 26 12:14:28 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 12:14:28 -0600
Subject: [maker-devel] Problems with openMPI in multiple computing nodes
In-Reply-To:
References:
Message-ID: <78BFA69F-C631-4190-9F97-6B6ECC7AE15B@gmail.com>

You don't have BerkeleyDB installed on your system, so BioPerl is trying to fall back to another index format that has issues on network-mounted file systems. You can try to install BerkeleyDB and then the related Perl module (https://metacpan.org/pod/BerkeleyDB). You would then need to reinstall BioPerl and MAKER.

You can also try running on a single CPU until indexing finishes, then launch MAKER. That might be enough to get around any early race conditions.

--Carson

> On Apr 26, 2020, at 12:58 AM, Xu, taosheng wrote:
>
> Hello,
> I am using a computer cluster with 20 nodes(40cpus per node) for gene annotation. I submit my maker task to one node with 40 CPUs using openMPI. Everything is well.
> But I encounter the problem when submitting the same maker task to the cluster with multiple nodes (120 cpus) There are errors shown below.
> I would also appreciate any advice. Thank you.
>
> Best regards,
> Taosheng
>
>
> STATUS: Processing and indexing input FASTA files...
> cannot remove directory for home/20200425/genome.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10//.dbtmp0: No such file or directory at /maker/bin/../lib/FastaDB.pm line 145.
> cannot remove directory for /home/20200425/genome.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10//.dbtmp0: Directory not empty at /maker/bin/../lib/FastaDB.pm line 145.
> cannot remove directory for /home/20200425/genome.maker.output/mpi_blastdb/te_proteins%2Efasta.mpi.10//.dbtmp0: Directory not empty at /maker/bin/../lib/FastaDB.pm line 145.
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Tue May 26 12:48:14 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 12:48:14 -0600
Subject: [maker-devel] Missing genes in lift-over with est2genome
In-Reply-To:
References: <373413EA-9D4C-44CF-AA51-632C0F54B7AC@gmail.com>
Message-ID:

If using the est_forward=1 option for the lift-over, you can also anchor a search to a specific contig or region by adding a tag to the fasta header ( maker_coor=contig:1-1000; ). The tag will force Exonerate to run only on that region. Sometimes that can rescue a model.

When you pass results into model_gff=, it will leave them unchanged. It just accepts or rejects them as is. But the model itself is considered evidence, and can alter clustering. other_gff= just passes things through with no processing or evaluation (it's like cut and paste).

You can also try deFusion on the resulting models for resolving gene fusions --> https://wjidea.github.io/defusion/Introduction.html

--Carson

> On Apr 30, 2020, at 6:58 AM, Lior Glick wrote:
>
> Thanks Carson - your answer was very helpful.
> Another question related to the lift-over process, if I may.
> I want to take the resulting gff and pass it on to another MAKER run, where I provide further, lower confidence evidence (ESTs and proteins). I'm not sure which option to use though. According to this helpful post, I tried using pred_gff and model_gff, but both created cases of fusion genes when genes are very adjacent to one another (see attached picture), even with the correct_est_fusion parameter enabled.
It looks like the only way to take lifted-over genes "as-is" would be to use other_gff, but I figure that this was not really intended for genes. Would you recommend this usage? Am I missing something? > Thank you! > > ??????? ??? ??, 23 ????? 2020 ?-20:43 ??? ?Carson Holt?? ??>:? > There are percent cutoffs for the est2genome algorithm you can set in the maker_bopts.ctl file. Additionally, maker will give the alignment but not produce a gene model if it can?t translate through the est2genome alignment (i.e. stop codons in the assembly). I believe the cutoff is 50%. If you add est_forward=1 to the maker_opts.ctl file names will be copied from the alignment source and the score in the GFF3 column will be the percent match to the original transcript. > > ?Carson > > > > > On Apr 21, 2020, at 7:08 AM, Lior Glick > wrote: > > > > Hello, > > I am using MAKER to annotate a plant genome assembly. A high-quality reference genome and annotation exists for another variety of the same species, so my first step is lifting over reference genes to my genome. I do this by setting est2genome = 1 and providing MAKER with the reference cDNA (transcriptome). No other evidence is provided and no prediction is performed. Repeat masking is done using the reference repeats library. > > When checking the results, I found out lots of reference genes missing from the lift-over result. However, if I blast the sequences of these genes myself, I get good matches. I even see these matches when I look at the blast results buried in the MAKER data_store. > > For example, a transcript of length 1077 got a match of length 855 - 100% identity and no gaps. Bitscore was 1709 and E-value 0. This looks like a pretty good match, but it is not found in the final MAKER results (gff/fasta). > > Why is this happening? Are there some cutoffs that are not satisfied? If so, what are they and how can they be configured? 
> >
> >
> > Thanks,
> > Lior
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at yandell-lab.org
> > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Tue May 26 12:51:52 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 12:51:52 -0600
Subject: [maker-devel] Multiple UTR ?
In-Reply-To: <16630d833d1448e7a771a4f2b19b0476@unil.ch>
References: <16630d833d1448e7a771a4f2b19b0476@unil.ch>
Message-ID: <003DC610-63E0-4D04-8CD2-431C92337C5F@gmail.com>

The UTR is split across two exons. The intron is not considered part of the UTR. The UTR exists in the post-splicing mRNA, so the corresponding genomic coordinates will have gaps because the introns that exist in the genome have been spliced out of the mRNA. So while the UTR is continuous in the mRNA, it is punctuated in the genome.

--Carson

> On May 3, 2020, at 5:39 AM, Patrick Tran Van wrote:
>
> Hi Carson,
>
> for instance, if have this:
>
> SCFXX maker five_prime_UTR 5164370 5164715 . - . ID=GENE-RA:five_prime_utr;Parent=GENE-RA;
> SCFXX maker five_prime_UTR 5156091 5156136 . - . ID=GENE-RA:five_prime_utr;Parent=GENE-RA;
>
>
> Does it mean that real coordinate of the 5' UTR is from 5156091 to 5164715 ?
>
> Patrick Tran Van
>
> Bioinformatician: Lab Chapuisat & Schwander
> Department of Ecology and Evolution
> University of Lausanne
> Lausanne - Switzerland
> Office 3206
> From: Carson Holt
> Sent: Wednesday, February 26, 2020 8:27:43 PM
> To: Patrick Tran Van
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Multiple UTR ?
>
> Sorry for the very slow reply. I found this way way down in my inbox.
>
> The UTR features are the parts of the exons that are not CDS. So multiple UTR, means it spans multiple exons, and must assembled to generate the full UTR in a browser.
Any exon that is fully non-coding will produce a UTR feature that mirrors an exons coordinates, and if it?s partially coding the UTR will share the same start or end by will terminate somewhere in the middle with a CDS filling up the remains coordinates. The UTR and CDS features get tiled over the top of the exon features when assembling a gene model. > > ?Carson > > > >> On Dec 18, 2019, at 7:19 AM, Patrick Tran Van > wrote: >> >> Hi Carson, >> >> I have seen something strange in my annotation: multiple UTR. How can we explain this ? Thanks! >> >> Scaffold maker >> mRNA 12117462 >> 12128433 . >> - . >> ID=GENE_02395-RA;Parent=GENE_02395;Name=GENE_02395-RA;Alias=maker-Scaffold-augustus-gene-40.12-mRNA-3;_AED=0.02;_QI=5383|1|1|1|0.88|0.9|10|247|238;_eAED=0.02;Note=Protein of unknown function; >> Scaffold maker >> exon 12128112 >> 12128433 . >> - . >> ID=GENE_02395-RA:exon:571;Parent=GENE_02395-RA; >> Scaffold maker >> exon 12117462 >> 12118046 . >> - . >> ID=GENE_02395-RB:exon:569;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; >> Scaffold maker >> exon 12118141 >> 12118301 . >> - . >> ID=GENE_02395-RB:exon:568;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; >> Scaffold maker >> exon 12118386 >> 12118539 . >> - . >> ID=GENE_02395-RB:exon:567;Parent=GENE_02395-RB,GENE_02395-RA; >> Scaffold maker >> exon 12118818 >> 12122493 . >> - . >> ID=GENE_02395-RB:exon:566;Parent=GENE_02395-RB,GENE_02395-RA; >> Scaffold maker >> exon 12123591 >> 12123893 . >> - . >> ID=GENE_02395-RB:exon:565;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; >> Scaffold maker >> exon 12123995 >> 12124303 . >> - . >> ID=GENE_02395-RB:exon:564;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; >> Scaffold maker >> exon 12125119 >> 12125418 . >> - . >> ID=GENE_02395-RB:exon:563;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; >> Scaffold maker >> exon 12126005 >> 12126313 . >> - . 
>> ID=GENE_02395-RB:exon:562;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; >> Scaffold maker >> exon 12127460 >> 12127687 . >> - . >> ID=GENE_02395-RB:exon:561;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA; >> Scaffold maker >> five_prime_UTR 12128112 >> 12128433 . >> - . >> ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; >> Scaffold maker >> five_prime_UTR 12127460 >> 12127687 . >> - . >> ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; >> Scaffold maker >> five_prime_UTR 12126005 >> 12126313 . >> - . >> ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; >> Scaffold maker >> five_prime_UTR 12125119 >> 12125418 . >> - . >> ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; >> Scaffold maker >> five_prime_UTR 12123995 >> 12124303 . >> - . >> ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; >> Scaffold maker >> five_prime_UTR 12123591 >> 12123893 . >> - . >> ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; >> Scaffold maker >> five_prime_UTR 12118882 >> 12122493 . >> - . >> ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA; >> Scaffold maker >> CDS 12118818 >> 12118881 . >> - 0 >> ID=GENE_02395-RA:cds;Parent=GENE_02395-RA; >> Scaffold maker >> CDS 12118386 >> 12118539 . >> - 2 >> ID=GENE_02395-RA:cds;Parent=GENE_02395-RA; >> Scaffold maker >> CDS 12118141 >> 12118301 . >> - 1 >> ID=GENE_02395-RA:cds;Parent=GENE_02395-RA; >> Scaffold maker >> CDS 12117709 >> 12118046 . >> - 2 >> ID=GENE_02395-RA:cds;Parent=GENE_02395-RA; >> Scaffold maker >> three_prime_UTR 12117462 >> 12117708 . >> - . 
>> ID=GENE_02395-RA:three_prime_utr;Parent=GENE_02395-RA;
>>
>>
>>
>> Patrick Tran Van
>>
>> Bioinformatician: Lab Chapuisat & Schwander
>> Department of Ecology and Evolution
>> University of Lausanne
>> Lausanne - Switzerland
>> Office 3206
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at yandell-lab.org
>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From niconm89 at gmail.com Tue May 26 12:58:22 2020
From: niconm89 at gmail.com (Nicolás Moreyra)
Date: Tue, 26 May 2020 15:58:22 -0300
Subject: [maker-devel] different number of annotated genes and transcripts
In-Reply-To:
References:
Message-ID:

Hi Carson, thanks for your reply.

Yes, I did the same as you. Here are different outputs for the same annotation file:

> grep -c -P "\tgene\t" Dato_struct-annot.noseq.gff
> 17688
> grep -c -P "RNA\t" Dato_struct-annot.noseq.gff
> 17688
> grep -c -P "mRNA\t" Dato_struct-annot.noseq.gff
> 17205
> grep -P "RNA\t" Dato_struct-annot.noseq.gff| cut -f3 | sort -u
> mRNA
> tRNA

After using a tool to extract transcript sequences into a FASTA file, I obtained 17205 sequences. Looking for the genes without an associated transcript, it seems that they are all annotated tRNAs. It is odd:

> Backbone_23 maker gene 486041 486112 . - .
> ID=Dato03103;Name=Dato03103;Alias=trnascan-Backbone_23-noncoding-Glu_CTC-gene-4.38;
> Backbone_23 maker tRNA 486041 486112 . - .
> ID=Dato03103-RA;Parent=Dato03103;Name=Dato03103-RA;_AED=1.00;_QI=0|-1|0|0|-1|0|1|73|0;_eAED=1.00;
> Backbone_23 maker exon 486041 486112 . - .
> ID=Dato03103-RA:exon:45875;Parent=Dato03103-RA;

The AED is bad in this example, so I'm thinking it is possible that this gene had no evidence supporting it. I also do not understand the "Alias" on the gene line; it looks like tRNAscan detected the gene. Any ideas?
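
The grep counts above can be reproduced in one pass by tallying column 3 of the GFF3. The following is only a minimal sketch: the helper names `count_feature_types` and `transcript_summary` are made up for illustration, and the `GeneX` records in the example are hypothetical (only the Dato03103 lines come from the message above, with attributes shortened). With the figures reported above, 17688 - 17205 = 483, which would be the number of tRNA records if mRNA and tRNA are the only RNA types, as the `sort -u` output suggests.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Tally GFF3 column 3 (feature type), skipping comments and any ##FASTA section."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


def transcript_summary(counts):
    """Return (number of genes, {RNA feature type: count})."""
    rna = {t: n for t, n in counts.items() if t.endswith("RNA")}
    return counts.get("gene", 0), rna


# Small in-memory example; for a real file, pass an open file handle instead,
# e.g. count_feature_types(open("Dato_struct-annot.noseq.gff")).
example = [
    "##gff-version 3",
    "Backbone_23\tmaker\tgene\t486041\t486112\t.\t-\t.\tID=Dato03103",
    "Backbone_23\tmaker\ttRNA\t486041\t486112\t.\t-\t.\tID=Dato03103-RA;Parent=Dato03103",
    "Backbone_23\tmaker\texon\t486041\t486112\t.\t-\t.\tID=Dato03103-RA:exon:45875;Parent=Dato03103-RA",
    # Hypothetical protein-coding gene, added only so both RNA types appear.
    "Backbone_1\tmaker\tgene\t100\t900\t.\t+\t.\tID=GeneX",
    "Backbone_1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=GeneX-RA;Parent=GeneX",
]

genes, rna = transcript_summary(count_feature_types(example))
print("genes:", genes)
for rna_type in sorted(rna):
    print(rna_type, rna[rna_type])
```

If the per-type breakdown shows that genes = mRNA + tRNA, the "missing" transcripts are probably just the tRNA genes, which a transcript FASTA built from mRNA records would not include.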
Nicol?s *--* *Nicolas Nahuel Moreyra* *BSc/MSc in Bioinformatics* *CONICET PhD Fellow @ IEGEBA* *PhD Student in Comparative Genomics @ EGE (**FCEyN - UBA) **-> **nmoreyra at ege.fcen.uba.ar * Professor of Bioinformatics @ Favaloro University Professor of Informatics @ IFTS N? 7 *Argentina* El mar., 26 de may. de 2020 a la(s) 14:54, Carson Holt (carsonhh at gmail.com) escribi?: > Perhaps you are counting wrong. If you want to know the number go genes, > you must look at the GFF3. You can use ?grep -c -P ?\tgene\t? file.gff?, > then the number of transcripts would be ?grep -c -P ?RNA\t? file.gff" > > Note that if you are using things like tRNAscan, you will get tRNA > transcripts and associated genes. If you are trying to count from the > fasta files, make sure you use the right file (maker.proteins.fasta and > maker.transcripts.fasta). > > Thanks, > Carson > > > On May 21, 2020, at 7:58 AM, Nicol?s Moreyra wrote: > > Dear all, > > First of all, thank you for sharing your experiences here. I tried to find > this issue in the posts already made but failed. > Secondly, I am sorry for asking you a silly question (I think), but after > I complete the genome annotation of four species, I obtained fewer > transcripts than genes. I do not understand why MAKER annotated genes > unable to transcribe. > I was trying to find the reason for this issue to discuss it in my thesis > but I am a bit lost. Has this happened to anyone? Is there any possible > cause that comes to mind? > > Thanks in advance. > > Nicol?s > > *--* > *Nicolas Nahuel Moreyra* > *BSc/MSc in Bioinformatics* > *CONICET PhD Fellow @ IEGEBA* > *PhD Student in Comparative Genomics @ EGE (**FCEyN - UBA) **-> **nmoreyra at ege.fcen.uba.ar > * > Professor of Bioinformatics @ Favaloro University > Professor of Informatics @ IFTS N? 
7
> *Argentina*
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Tue May 26 13:01:38 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 May 2020 13:01:38 -0600
Subject: [maker-devel] Unable to reproduce MAKER blastn results
In-Reply-To:
References:
Message-ID: <39657402-9B41-4B82-8164-199ECCA634AC@gmail.com>

The only downstream change to the blast results would be the removal of HSPs not meeting the bit_blastn minimum bitscore. Also note that pcov is not a blast parameter. It is a post-blast filter. The HSPs are tiled and flattened, then the percent coverage against the original query is calculated (i.e. if every base of the query is represented at least once in the result, then coverage is 100%).

The blast results are used only for identifying the rough region a model overlaps, which is then passed to exonerate. The exonerate alignment is used to generate the splice-aware est2genome model. Many good blastn alignments will produce poor exonerate alignments, and no est2genome results.

--Carson

> On May 5, 2020, at 3:46 AM, Lior Glick wrote:
>
> Hello,
>
> I am running MAKER 2.31.10 with a very simple configuration, with only EST evidence and the est2genome option enabled (basically a lift-over procedure).
> I noticed that some of my transcripts are not included in the annotation output and when I looked at the blastn results the reason was clear - they do not pass the coverage cutoff defined in maker_bopts.ctl. Interestingly, when I tried running blastn myself, using the same command (taken from the maker log) and the same blastn version, I got slightly different results. Specifically, for some of the transcripts the MAKER blastn run produced less HSPs than my blastn run, resulting in a lower total coverage.
The additional HSPs seem to have good % identity and E-values, so I don't understand why and how they are discarded. Are the blastn results changed by MAKER in subsequent steps (after the blastn run)? > Please find attached blastn results from MAKER run and from my run. You can look at transcript AT1G01740.3 as an example. in my.blastn, there are 8 HSPs, while MAKER.blastn only has 3 of them. > Can you explain the difference? Maybe it has to do with repeat masking or other processing of the genome sequence? > > Just to make sure you have all the details: > Relevant maker_bopts parameters: > pcov_blastn=0.7 #Blastn Percent Coverage Threhold EST-Genome Alignments > pid_blastn=0.85 #Blastn Percent Identity Threshold EST-Genome Aligments > eval_blastn=1e-10 #Blastn eval cutoff > bit_blastn=40 #Blastn bit cutoff > depth_blastn=0 #Blastn depth cutoff (0 to disable cutoff) > > Blastn command: > blastn -db /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/TAIR10_longest_trans%2Efasta.mpi.10.0 -query /groups/itay_mayrose/nosnap/liorglic/Projects/PGCM/output/A_thaliana_pan_genome/PGC_de_novo/RESULT_RG_new/per_sample/col-0/liftover_SRR1945757/chunks/chunk00.fa/TMP/maker_sPf3Rf/0/chunk00.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -word_size 28 -reward 1 -penalty -5 -gapopen 5 -gapextend 5 -dbsize 1000 -searchsp 500000000 -num_threads 10 -lcase_masking -dust yes -soft_masking true -show_gis -out > > Thank you! > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue May 26 13:03:31 2020 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 26 May 2020 13:03:31 -0600 Subject: [maker-devel] Question about maker. 
Maker2 failed In-Reply-To: References: Message-ID: <2F9626D1-F7C2-4BDA-A59F-C6566C0D558D@gmail.com> It is probably the formatting of the models provided; there is something wrong with them. They must be two-level match/match_part features for rm_gff. You can send us the file, and I can take a look if it helps. --Carson > On May 5, 2020, at 2:41 PM, ??? wrote: > > Hi maker developer, > > I'm using maker 2 to annotate a vertebrate genome. > When I try to provide an rm_gff file, it always fails. > Here is the log: > Now starting the contig!! > SeqID: chr_XXII > Length: 12689475 > #--------------------------------------------------------------------- > > > setting up GFF3 output and fasta chunks > doing repeat masking > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Did not specify a Hit End or Hit Begin > STACK: Error::throw > STACK: Bio::Root::Root::throw /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Root/Root.pm:449 > STACK: Bio::Search::HSP::GenericHSP::_subject_seq_feature /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:1604 > STACK: Bio::Search::HSP::GenericHSP::hit /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/lib/site_perl/5.26.2/Bio/Search/HSP/GenericHSP.pm:988 > STACK: repeat_mask_seq::separate_types /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/repeat_mask_seq.pm:307 > STACK: repeat_mask_seq::mask_chunk /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/repeat_mask_seq.pm:191 > STACK: Process::MpiChunk::_go /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:763 > STACK: Process::MpiChunk::run /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:341 > STACK: Process::MpiChunk::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiChunk.pm:357 > STACK: Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287 > STACK: 
Process::MpiTiers::run_all /gpfs/homefs/iee/zl19g775/miniconda3/envs/maker/bin/../lib/Process/MpiTiers.pm:287 > STACK: /home/ubelix/iee/zl19g775/miniconda3/envs/maker/bin/maker:689 > ----------------------------------------------------------- > --> rank=NA, hostname=submit02.ubelix.unibe.ch > ERROR: Failed while doing repeat masking > ERROR: Chunk failed at level:0, tier_type:1 > FAILED CONTIG:chr_XXII > > ERROR: Chunk failed at level:2, tier_type:0 > FAILED CONTIG:chr_XXII > > examining contents of the fasta file and run log > > > > I also searched the Google group and tried updating my BioPerl to 1.7.7, the latest version, but it didn't help. > > Could you please help me? > > Thanks a lot. > > Zuyao -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 26 13:15:20 2020 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 26 May 2020 13:15:20 -0600 Subject: [maker-devel] gene:multiple_Einit and overlaps_prev_exon errors in first round of SNAP training In-Reply-To: References: Message-ID: <92F27280-90B8-4AF2-8C1D-B54955A7C521@gmail.com> You have an ID collision in the GFF3. Check the GFF3 being sent to maker2zff. If you are using GFF3 as input to MAKER, you likely have non-unique IDs there that are causing the issue in the first place. --Carson > On May 6, 2020, at 11:36 AM, Carneiro,Celine M wrote: > > Hello, > > I am getting the errors gene:multiple_Einit, gene:multiple_Eterm, and exon:overlaps_prev_exon, at just about every gene model. I've run the first round of maker on a bird genome I'm annotating with no errors and have started the steps to train SNAP. However, after running fathom -categorize, just about every single gene model has the same set of errors. 
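For the ID collision Carson mentions in his reply, a quick way to look for repeated IDs in a GFF3 before handing it to maker2zff might look like the sketch below. It is an assumption-laden illustration, not part of MAKER: the function name `find_id_collisions` is hypothetical, and the check is limited to gene and mRNA rows by default because multi-line features such as CDS can legitimately share one ID in GFF3.

```python
from collections import defaultdict


def find_id_collisions(lines, unique_types=("gene", "mRNA")):
    """Return {ID: [line numbers]} for IDs used on more than one row.

    Only feature types listed in unique_types are checked (an assumption:
    these are the rows where a repeated ID is most likely a real collision).
    """
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in unique_types:
            continue
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "ID" in attrs:
            seen[attrs["ID"]].append(lineno)
    return {fid: nums for fid, nums in seen.items() if len(nums) > 1}


# Made-up rows: the ID "g1" is used for two different genes.
example = [
    "##gff-version 3\n",
    "chr1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1;Name=g1\n",
    "chr1\tmaker\tmRNA\t1\t100\t.\t+\t.\tID=g1-RA;Parent=g1\n",
    "chr2\tmaker\tgene\t5\t50\t.\t-\t.\tID=g1\n",
]
print(find_id_collisions(example))  # {'g1': [2, 4]}
```

To run it on a real file, pass an open file handle instead of the list, for example `with open(path) as fh: find_id_collisions(fh)`, where `path` is whatever GFF3 is being checked.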
Here is an example from my log file after running fathom -categorize: > > MODEL117 1 1 8 - errors(6): gene:multiple_Einit gene:multiple_Eterm exon-7:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL851 1 1 100 - errors(78): gene:multiple_Einit gene:multiple_Eterm exon-99:overlaps_prev_exon exon-98:overlaps_prev_exon exon-97:overlaps_prev_exon exon-95:overlaps_prev_exon exon-94:overlaps_prev_exon exon-93:overlaps_prev_exon exon-91:overlaps_prev_exon exon-90:overlaps_prev_exon exon-89:overlaps_prev_exon exon-87:overlaps_prev_exon exon-86:overlaps_prev_exon exon-85:overlaps_prev_exon exon-83:overlaps_prev_exon exon-82:overlaps_prev_exon exon-81:overlaps_prev_exon exon-79:overlaps_prev_exon exon-78:overlaps_prev_exon exon-77:overlaps_prev_exon exon-75:overlaps_prev_exon exon-74:overlaps_prev_exon exon-73:overlaps_prev_exon exon-71:overlaps_prev_exon exon-70:overlaps_prev_exon exon-69:overlaps_prev_exon exon-67:overlaps_prev_exon exon-66:overlaps_prev_exon exon-65:overlaps_prev_exon exon-63:overlaps_prev_exon exon-62:overlaps_prev_exon exon-61:overlaps_prev_exon exon-59:overlaps_prev_exon exon-58:overlaps_prev_exon exon-57:overlaps_prev_exon exon-55:overlaps_prev_exon exon-54:overlaps_prev_exon exon-53:overlaps_prev_exon exon-51:overlaps_prev_exon exon-50:overlaps_prev_exon exon-49:overlaps_prev_exon exon-48:overlaps_prev_exon exon-47:overlaps_prev_exon exon-46:overlaps_prev_exon exon-45:overlaps_prev_exon exon-43:overlaps_prev_exon exon-42:overlaps_prev_exon exon-41:overlaps_prev_exon exon-39:overlaps_prev_exon exon-38:overlaps_prev_exon exon-37:overlaps_prev_exon exon-35:overlaps_prev_exon exon-34:overlaps_prev_exon exon-33:overlaps_prev_exon exon-31:overlaps_prev_exon exon-30:overlaps_prev_exon exon-29:overlaps_prev_exon exon-27:overlaps_prev_exon exon-26:overlaps_prev_exon exon-25:overlaps_prev_exon exon-23:overlaps_prev_exon exon-22:overlaps_prev_exon exon-21:overlaps_prev_exon exon-19:overlaps_prev_exon 
exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-14:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-10:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-2:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL190 1 1 39 + errors(35): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-3:overlaps_prev_exon exon-4:overlaps_prev_exon exon-5:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-9:overlaps_prev_exon exon-11:overlaps_prev_exon exon-12:overlaps_prev_exon exon-13:overlaps_prev_exon exon-14:overlaps_prev_exon exon-15:overlaps_prev_exon exon-16:overlaps_prev_exon exon-17:overlaps_prev_exon exon-18:overlaps_prev_exon exon-20:overlaps_prev_exon exon-21:overlaps_prev_exon exon-22:overlaps_prev_exon exon-23:overlaps_prev_exon exon-24:overlaps_prev_exon exon-25:overlaps_prev_exon exon-26:overlaps_prev_exon exon-27:overlaps_prev_exon exon-29:overlaps_prev_exon exon-30:overlaps_prev_exon exon-32:overlaps_prev_exon exon-33:overlaps_prev_exon exon-34:overlaps_prev_exon exon-35:overlaps_prev_exon exon-36:overlaps_prev_exon exon-38:overlaps_prev_exon exon-39:overlaps_prev_exon > MODEL424 1 1 10 - errors(8): gene:multiple_Einit gene:multiple_Eterm exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL902 1 1 20 - errors(14): gene:multiple_Einit gene:multiple_Eterm exon-19:overlaps_prev_exon exon-18:overlaps_prev_exon exon-17:overlaps_prev_exon exon-15:overlaps_prev_exon exon-13:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL238 1 1 14 - errors(11): gene:multiple_Einit 
gene:multiple_Eterm exon-13:overlaps_prev_exon exon-12:overlaps_prev_exon exon-11:overlaps_prev_exon exon-9:overlaps_prev_exon exon-7:overlaps_prev_exon exon-6:overlaps_prev_exon exon-5:overlaps_prev_exon exon-3:overlaps_prev_exon exon-1:overlaps_prev_exon > MODEL39 1 1 6 - errors(1): exon-3:overlaps_prev_exon > MODEL119 1 1 10 + errors(8): gene:multiple_Einit gene:multiple_Eterm exon-2:overlaps_prev_exon exon-4:overlaps_prev_exon exon-6:overlaps_prev_exon exon-7:overlaps_prev_exon exon-8:overlaps_prev_exon exon-10:overlaps_prev_exon > > Furthermore, I checked my genome.ann file and noticed that my Einit and Exon sites are duplicated. For example: > > >ScdimlH_1004;HRSCAF=1084 > Einit 38730 38677 MODEL851 > Exon 38255 38178 MODEL851 > Exon 38074 38021 MODEL851 > Exon 24755 24717 MODEL851 > Exon 24213 24149 MODEL851 > Exon 23176 23098 MODEL851 > Exon 22037 21961 MODEL851 > Exon 21269 21080 MODEL851 > Exon 20232 20167 MODEL851 > Exon 19742 19704 MODEL851 > Exon 14705 14590 MODEL851 > Exon 14255 13980 MODEL851 > Exon 14169 13980 MODEL851 > Exon 13303 13223 MODEL851 > Exon 13303 13223 MODEL851 > Exon 12782 12639 MODEL851 > Exon 12782 12639 MODEL851 > Exon 5761 5592 MODEL851 > Exon 5482 5404 MODEL851 > Exon 5140 5064 MODEL851 > Exon 4951 4750 MODEL851 > Exon 4567 4502 MODEL851 > Exon 4256 4185 MODEL851 > Exon 3569 3403 MODEL851 > Exon 3157 3076 MODEL851 > Exon 2936 2800 MODEL851 > Eterm 2186 2000 MODEL851 > Einit 38730 38677 MODEL851 > Exon 38255 38178 MODEL851 > Exon 38074 38021 MODEL851 > Exon 24755 24717 MODEL851 > Exon 24213 24149 MODEL851 > Exon 23176 23098 MODEL851 > Exon 22037 21961 MODEL851 > Exon 21269 21080 MODEL851 > Exon 20232 20167 MODEL851 > Exon 19742 19704 MODEL851 > Exon 14705 14590 MODEL851 > Exon 14255 13980 MODEL851 > Exon 14169 13980 MODEL851 > Exon 13303 13223 MODEL851 > Exon 13303 13223 MODEL851 > Exon 12782 12639 MODEL851 > Exon 12782 12639 MODEL851 > Exon 5761 5592 MODEL851 > Exon 5482 5404 MODEL851 > Exon 5140 5064 MODEL851 > Exon 4951 4750 
MODEL851 > Exon 4567 4502 MODEL851 > Exon 4256 4185 MODEL851 > Exon 3569 3403 MODEL851 > Exon 3157 3076 MODEL851 > Exon 2936 2800 MODEL851 > Eterm 2186 2000 MODEL851 > > Any ideas why I'm seeing this duplication? Lastly, any ideas why my exons are overlapping so much? I appreciate any input and please let me know if you require any more information. > > Thank you! > > Celine -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 26 13:26:03 2020 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 26 May 2020 13:26:03 -0600 Subject: [maker-devel] Maker v3.01 change-log + 3'UTR question In-Reply-To: References: Message-ID: <07D0AA31-0FCF-4E60-AC50-1E377CBF870C@gmail.com> 2. If I were to improve the annotation of my 3' UTRs within a certain (non-model species) gff3, is there a particular way or a protocol to follow? I was thinking for example that Lexogen has released their 3' UTR kit for RNA-seq of the three prime end of transcripts. Would it be possible to feed those reads to maker and somehow suggest that the reads are originating from the three-prime end so that this info is then passed in the gff3 file? You could pass final models back in as pred_gff (no UTR on the models), then pass in just the evidence you want as UTR as est_gff (it would have to be assembled, not individual reads). As long as they overlap the pred_gff models, MAKER would try to make UTR out of them. Might be worth an experiment. --Carson -------------- next part -------------- An HTML attachment was scrubbed...
URL: From shawn.trojahn at wsu.edu Fri May 29 15:49:38 2020 From: shawn.trojahn at wsu.edu (Trojahn, Shawn Michael) Date: Fri, 29 May 2020 21:49:38 +0000 Subject: [maker-devel] Intron lengths below minimum cutoff Message-ID: Hello, I have been having a problem with the final annotation coming from Maker2 where I have a few thousand introns that are below the minimum intron value I have set. Most of the exons around these problem introns have no support in the final merged gff file, but a few are supported by blast hits. Is there a reason why these introns would remain in the final gff? Thanks, Shawn -------------- next part -------------- An HTML attachment was scrubbed... URL:
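To see exactly which introns fall under the cutoff and which transcripts they belong to, one option is to derive intron lengths from the exon rows of the final GFF3. The following is a minimal sketch under stated assumptions: the function name `short_introns` is hypothetical, exons are grouped by their Parent attribute, and the cutoff is passed in by hand rather than read from any MAKER control file.

```python
from collections import defaultdict


def short_introns(gff_lines, min_intron):
    """List introns shorter than min_intron, inferred from exon rows.

    Returns tuples of (seqid, transcript, intron_start, intron_end, length).
    """
    exons = defaultdict(list)  # (seqid, transcript ID) -> [(start, end)]
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        # An exon shared by several transcripts lists them comma-separated.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[(cols[0], parent)].append((int(cols[3]), int(cols[4])))
    hits = []
    for (seqid, parent), spans in exons.items():
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            length = next_start - prev_end - 1
            # Negative lengths mean overlapping exons, not introns; skip them.
            if 0 <= length < min_intron:
                hits.append((seqid, parent, prev_end + 1, next_start - 1, length))
    return hits
```

Passing an open file handle for the merged GFF3 and the cutoff in use returns one tuple per short intron, which can then be compared against the evidence rows at the same coordinates.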