From carsonhh at gmail.com Fri Aug 21 10:56:37 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 10:56:37 -0600
Subject: [maker-devel] maker-devel post from jgrant@smith.edu requires approval
In-Reply-To:
References:
Message-ID: <2DC10381-54DB-4386-9DC0-A723E01C20B3@gmail.com>

MAKER on its own can run under MPI to use multiple cores, or to run across multiple machines on a local computer cluster. It can also restart, as long as you run it in the same directory. However, when it is controlled by Galaxy, I do not know whether a restart is possible, since Galaxy controls the run directory. Similarly, I don't know whether Galaxy can launch it via MPI.

--Carson

> From: Jessica Grant
> Subject: Possible to restart maker run through a local galaxy?
> Date: August 6, 2020 at 12:06:57 PM MDT
> To: maker-devel at yandell-lab.org
>
> Hi,
>
> I have a local galaxy instance and installed maker through the tool shed. I have been running it on a large genome and it had been running on one core for a few weeks. Then my IT guys needed to take the server down so stopped the run.
>
> I wonder if I can restart it - I have all the intermediate files in my /galaxy/database/jobs_directory and also, is there a way to run it on multiple cores?
>
> Thank you!
>
> Jessica
>
> From: maker-devel-request at yandell-lab.org
> Subject: confirm 6c52cdc17b7f8c4718d930157625ecc62c32c681
> Date: August 6, 2020 at 12:07:14 PM MDT
>
> If you reply to this message, keeping the Subject: header intact,
> Mailman will discard the held message. Do this if the message is
> spam. If you reply to this message and include an Approved: header
> with the list password in it, the message will be approved for posting
> to the list. The Approved: header can also appear in the first line
> of the body of the reply.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Aug 21 11:09:26 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 11:09:26 -0600
Subject: [maker-devel] Maker Snap annotation yields match_part
In-Reply-To:
References:
Message-ID: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com>

If you just want a count, you can grep for match with tabs on either side --> grep -P '\tmatch\t' file.gff

If you want the model, you need to assemble the match parts onto the match parent.

--Carson

> On Jul 26, 2020, at 11:31 PM, Emmanuel Nnadi wrote:
>
> Hello,
> I am annotating my genome using SNAP after running SNAP twice my GFF has quite a number of match_part. How can I get an actual match?
>
> ilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 890 1003 +;Gap=M114
> contig_844_pilon_pilon_pilon snap_masked match_part 66719 66729 8.644 + . ID=contig_844_pilon_pilon_pilon:hsp:2866471:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 1004 1014 +;Gap=M11
> Nnadi Nnaemeka Emmanuel,Ph.D
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> +2348068124819
> Publications:
> https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Aug 21 11:12:41 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 11:12:41 -0600
Subject: [maker-devel] maker-devel post from wei.xiong@wur.nl requires approval
In-Reply-To:
References:
Message-ID: <7C34D2E2-2C8D-4723-9512-CED5B98F9144@gmail.com>

Sorry for the slow reply.

You can merge them into one file, or specify multiple files using a comma to separate the list. If you choose the comma option, you can also add tags to the GFF3 by including a ':' and then a tag after each file name.
Example protein=file1.fasta:swissprot,file2.fasta:NCBI,file3.fasta:N_vect

--Carson

> From: "Xiong, Wei"
> Subject: protein homology evidence
> Date: July 24, 2020 at 3:04:43 AM MDT
> To: "'maker-devel at yandell-lab.org'"
>
> Dear colleague,
>
> I couldn't find a related answer online for the following question that might sound silly.
> If I want to include protein homology evidence from multiple species, what should I do?
> Can I directly merge the protein sequence files of different species and feed it to MAKER?
> Thank you for your help in advance. I look forward to hearing from you.
> I wish you a pleasant and healthy summer!
>
> Met vriendelijke groet,
> With kind regards,
>
> Wei Xiong
> PhD candidate | Wageningen University & Research
> Plant Science Group | Biosystematics Group Radix Building 107 Droevendaalsesteeg 1
> 6708 PB Wageningen
> The Netherlands
> E-mail: wei.xiong at wur.nl
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eennadi at gmail.com Fri Aug 21 12:38:20 2020
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Fri, 21 Aug 2020 19:38:20 +0100
Subject: [maker-devel] Maker Snap annotation yields match_part
In-Reply-To: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com>
References: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com>
Message-ID:

Thanks Carson

Please how do I assemble the match-part on the match parent? Any special instruction on the maker control file?
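
[Editorial illustration] A minimal sketch of what "assembling the match parts onto the match parent" can look like, assuming a tab-delimited GFF3 file in which every match_part row carries a Parent attribute naming its match row. This is not a MAKER feature or setting: the function names are hypothetical, and the three example rows use made-up IDs and coordinates rather than the SNAP output quoted in this thread.

```python
from collections import defaultdict


def parse_attributes(field):
    """Split a GFF3 column-9 string into a dict of key -> value."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def assemble_matches(gff_lines):
    """Group match_part rows under their Parent ID, ordered by start."""
    parts = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "match_part":
            continue  # only match_part rows contribute segments
        parent = parse_attributes(cols[8]).get("Parent")
        if parent is not None:
            parts[parent].append((int(cols[3]), int(cols[4])))
    return {parent: sorted(spans) for parent, spans in parts.items()}


# Hypothetical two-segment hit; IDs and coordinates are invented.
example = [
    "ctg1\tsnap_masked\tmatch\t100\t500\t20.1\t+\t.\tID=hit1;Name=snap-gene-0.1",
    "ctg1\tsnap_masked\tmatch_part\t300\t500\t10.0\t+\t.\tID=hsp2;Parent=hit1",
    "ctg1\tsnap_masked\tmatch_part\t100\t200\t10.1\t+\t.\tID=hsp1;Parent=hit1",
]
models = assemble_matches(example)
for parent, segments in models.items():
    print(parent, segments)
```

Each key of the returned dictionary is a match ID, and its value lists the segments of that hit in coordinate order, which is the exon-like structure the match_part rows describe.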

On Fri, 21 Aug 2020 at 6:09 PM, Carson Holt wrote:

> If you just want a count you can grep for match with tabs at either side --> grep -P '\tmatch\t' file.gff
>
> If you want the model, you need to assemble the match parts onto the match parent.
>
> --Carson
>
> On Jul 26, 2020, at 11:31 PM, Emmanuel Nnadi wrote:
>
> Hello,
> I am annotating my genome using SNAP after running SNAP twice my GFF has quite a number of match_part. How can I get an actual match?
>
> ilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 890 1003 +;Gap=M114
> contig_844_pilon_pilon_pilon snap_masked match_part 66719 66729 8.644 + . ID=contig_844_pilon_pilon_pilon:hsp:2866471:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 1004 1014 +;Gap=M11
> Nnadi Nnaemeka Emmanuel,Ph.D
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> +2348068124819
> Publications:
> https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
>

--
Nnadi Nnaemeka Emmanuel,Ph.D
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
+2348068124819
Publications:
https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Aug 21 12:47:11 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 12:47:11 -0600
Subject: [maker-devel] Maker Snap annotation yields match_part
In-Reply-To:
References: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com>
Message-ID:

If you want to try to do this programmatically, you can try libraries like the Gene Annotation Library (GAL) from the Sequence Ontology --> https://github.com/The-Sequence-Ontology/GAL

Or, if you are looking for visualization or other manipulation, you can try one of the GMOD tools --> http://gmod.org/wiki/Main_Page

--Carson

> On Aug 21, 2020, at 12:38 PM, Emmanuel Nnadi wrote:
>
> Thanks Carson
>
> Please how do I assemble the match-part on the match parent? Any special instruction on the maker control file?
>
> On Fri, 21 Aug 2020 at 6:09 PM, Carson Holt wrote:
> If you just want a count you can grep for match with tabs at either side --> grep -P '\tmatch\t' file.gff
>
> If you want the model, you need to assemble the match parts onto the match parent.
>
> --Carson
>
>> On Jul 26, 2020, at 11:31 PM, Emmanuel Nnadi wrote:
>>
>> Hello,
>> I am annotating my genome using SNAP after running SNAP twice my GFF has quite a number of match_part. How can I get an actual match?
>>
>> ilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 890 1003 +;Gap=M114
>> contig_844_pilon_pilon_pilon snap_masked match_part 66719 66729 8.644 + . ID=contig_844_pilon_pilon_pilon:hsp:2866471:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 1004 1014 +;Gap=M11
>> Nnadi Nnaemeka Emmanuel,Ph.D
>> Department of Microbiology,
>> Faculty of Natural and Applied Science,
>> Plateau State University, Bokkos, Plateau State, Nigeria.
>> +2348068124819
>> Publications:
>> https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
>>
>
> --
> Nnadi Nnaemeka Emmanuel,Ph.D
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> +2348068124819
> Publications:
> https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Aug 21 12:49:56 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 12:49:56 -0600
Subject: [maker-devel] Maker Snap annotation yields match_part
In-Reply-To:
References: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com>
Message-ID: <24B6BD02-B57C-4426-B919-52533BA3DC2B@gmail.com>

Also, if you are just trying to see what the gene model of a match/match_part would have been, you can change the maker_opts.ctl settings to keep all models (keep_preds=1), or pass in the models you want as match/match_part (pred_gff=) and then let them be turned into mRNA/exon/CDS features. match/match_part are reference features that let you see where the annotations came from.

--Carson

> On Aug 21, 2020, at 12:47 PM, Carson Holt wrote:
>
> If you want to try and do this programmatically, you can try libraries like the Gene Annotation Library (GAL) from the Sequence Ontology --> https://github.com/The-Sequence-Ontology/GAL
>
> Or if you are looking for visualization or other manipulation you can try one of the GMOD tools --> http://gmod.org/wiki/Main_Page
>
> --Carson
>
>> On Aug 21, 2020, at 12:38 PM, Emmanuel Nnadi wrote:
>>
>> Thanks Carson
>>
>> Please how do I assemble the match-part on the match parent? Any special instruction on the maker control file?
>>
>> On Fri, 21 Aug 2020 at 6:09 PM, Carson Holt wrote:
>> If you just want a count you can grep for match with tabs at either side --> grep -P '\tmatch\t' file.gff
>>
>> If you want the model, you need to assemble the match parts onto the match parent.
>>
>> --Carson
>>
>>> On Jul 26, 2020, at 11:31 PM, Emmanuel Nnadi wrote:
>>>
>>> Hello,
>>> I am annotating my genome using SNAP after running SNAP twice my GFF has quite a number of match_part. How can I get an actual match?
>>>
>>> ilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 890 1003 +;Gap=M114
>>> contig_844_pilon_pilon_pilon snap_masked match_part 66719 66729 8.644 + . ID=contig_844_pilon_pilon_pilon:hsp:2866471:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 1004 1014 +;Gap=M11
>>> Nnadi Nnaemeka Emmanuel,Ph.D
>>> Department of Microbiology,
>>> Faculty of Natural and Applied Science,
>>> Plateau State University, Bokkos, Plateau State, Nigeria.
>>> +2348068124819
>>> Publications:
>>> https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
>>>
>>
>> --
>> Nnadi Nnaemeka Emmanuel,Ph.D
>> Department of Microbiology,
>> Faculty of Natural and Applied Science,
>> Plateau State University, Bokkos, Plateau State, Nigeria.
>> +2348068124819
>> Publications:
>> https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Aug 21 12:55:40 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 12:55:40 -0600
Subject: [maker-devel] Intron lengths below minimum cutoff
In-Reply-To:
References:
Message-ID:

Looking through old messages, it looks like this one fell through the cracks.

The min_intron parameter is for exonerate alignments (it gets passed directly to the algorithm). However, gene predictors like SNAP and Augustus can still call introns of any length they wish (the parameter doesn't affect them).
So you can still get shorter introns in the model; you just won't get short introns from exonerate or in the evidence hints passed to SNAP and Augustus.

--Carson

> On May 29, 2020, at 3:49 PM, Trojahn, Shawn Michael wrote:
>
> Hello,
>
> I have been having a problem with the final annotation coming from Maker2 where I have a few thousand introns that are below the minimum intron value I have set. Most of the exons around these problem introns have no support in the final merged gff file, but a few are supported by blast hits. Is there a reason why these introns would remain in the final gff?
>
> Thanks,
> Shawn
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Aug 21 14:45:43 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 14:45:43 -0600
Subject: [maker-devel] maker-devel post from christopher.keeling.1@ulaval.ca requires approval
In-Reply-To:
References:
Message-ID:

Hi Chris,

Sorry for the slow reply.

Actually, the desired behavior is to capture the entire name from the FASTA header and not to try to divide it on the pipes. NCBI BLAST versions have historically done this dividing, but only for NCBI-sourced data (it won't do it for Swiss-Prot, for example, or at least it wouldn't with all previous versions). If it is doing that now, that is a rather big behavior change, but it can be turned off by adding -show_gis to the blast command line.
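
[Editorial illustration] A minimal sketch of the difference between keeping the entire name and dividing it on the pipes, using the two header patterns quoted in Chris's message in this thread. The patterns are transcribed from Perl into Python's re module purely for illustration; the helper name parse_header and the returned dictionary layout are hypothetical and are not part of MAKER.

```python
import re

# Pattern proposed in the quoted message: keep only the accession between the pipes.
ACCESSION_ONLY = re.compile(
    r">sp\|(\S+)\|\S+\s+(.*?)\s+OS=(.*?)\s+OX=\S+\s+(GN=(.*?)\s+)?PE="
)
# Pattern quoted from MAKER 3.01.03: keep the whole first token as the ID.
WHOLE_NAME = re.compile(
    r">(\S+)\s+(.*?)\s+OS=(.*?)\s+OX=(.*?)\s+(GN=(.*?)\s+)?PE="
)


def parse_header(header, pattern, gene_group):
    """Return id/desc/org/name from a Swiss-Prot header, or None if no match."""
    m = pattern.search(header)
    if m is None:
        return None
    return {
        "id": m.group(1),
        "desc": m.group(2),
        "org": m.group(3),
        "name": m.group(gene_group) or "",  # GN= is optional in the header
    }


header = (
    ">sp|Q62559|IFT52_MOUSE Intraflagellar transport protein 52 homolog "
    "OS=Mus musculus OX=10090 GN=Ift52 PE=1 SV=2"
)
split_id = parse_header(header, ACCESSION_ONLY, 5)
whole_id = parse_header(header, WHOLE_NAME, 6)
print(split_id["id"])
print(whole_id["id"])
```

Both patterns recover the same description, organism, and gene name; they differ only in the ID, which has to agree with whatever name BLAST reports for the hit.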

Thanks,
Carson

> From: Christopher Keeling
> Subject: Re: Maker 2.31.10: maker_functional_gff and maker_functional_fasta not parsing correctly, Can't use string ("") as a HASH ref while "strict refs" in use
> Date: July 7, 2020 at 6:12:37 PM MDT
> To: "maker-devel at yandell-lab.org"
>
> Hi Carson,
>
> I'm now using Maker 3.01.03, and I'm finding that maker_functional_gff and maker_functional_fasta still are not behaving as they should. I'm getting an error:
>
> Can't use string ("") as a HASH ref while "strict refs" in use at /usr/local/bin/maker/bin/maker_functional_gff line 55, <$IN> line 167.
>
> Version 2020_03 of uniprot_sprot.fasta starts like this:
>
> >sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 GN=FV3-001R PE=4 SV=1
>
> Based on your scripts, this is the example of your first condition. However, I find that I need to change it (in red) to get it to work as I understand it should work:
>
> #>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 GN=FV3-001R PE=4 SV=1
> if (/>sp\|(\S+)\|\S+\s+(.*?)\s+OS=(.*?)\s+OX=\S+\s+(GN=(.*?)\s+)?PE=/) {
>     $id = $1;
>     $desc = $2;
>     $org = $3;
>     $name = $5 || '';
> }
>
> Compared to what is in 3.01.03:
> #>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 GN=FV3-001R PE=4 SV=1
> if (/>(\S+)\s+(.*?)\s+OS=(.*?)\s+OX=(.*?)\s+(GN=(.*?)\s+)?PE=/) {
>     $id = $1;
>     $desc = $2;
>     $org = $3;
>     $name = $6 || '';
> }
>
> Thus, with my edits:
> >sp|Q62559|IFT52_MOUSE Intraflagellar transport protein 52 homolog OS=Mus musculus OX=10090 GN=Ift52 PE=1 SV=2
>
> maker_functional_gff would result in:
> ...Note=Similar to Ift52: Intraflagellar transport protein 52 homolog (Mus musculus);
>
> And maker_functional_fasta would result in:
> Name:"Similar to Ift52 Intraflagellar transport protein 52 homolog (Mus musculus)"
>
> Are these the expected behaviours?
>
> Cheers,
>
> Chris
>
>> On Mar 14, 2020, at 1:24 PM, Christopher Keeling wrote:
>>
>> Hello,
>>
>> In sub parse_blast{, during parsing of uniprot fasta file:
>>
>> if (/>(\S+)\s+(.*?)\s+OS=(.*?)\s+(GN=(.*?)\s+)?PE=/) {
>>
>> should be changed to:
>>
>> if (/>sp\|(\S+)\|\S+\s+(.*?)\s+OS=(.*?)\s+OX=\S+\s+(GN=(.*?)\s+)?PE=/) {
>>
>> to avoid "Can't use string ("") as a HASH ref while "strict refs" in use at..." errors.
>>
>> For UniProt release 2020_01: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
>>
>> Cheers,
>> Chris
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Aug 21 14:56:52 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 14:56:52 -0600
Subject: [maker-devel] maker-devel post from dianad.mosa@gmail.com requires approval
In-Reply-To:
References:
Message-ID:

MAKER's internal logic has not changed at all. The new release is a minimal update to handle newer versions of GeneMark, tRNAScan, RepeatMasker, and Swiss-Prot (slight deviations from historical output formats).

*Update to handle most recent tRNAScan output format.
*Update to handle most recent GeneMark output format.
*Update to downstream putative function scripts to handle newer Swiss-Prot FASTA header format.
*Some versions/configurations of RepeatMasker can report 0 as a start/end coordinate when it should be 1 (0 is out of bounds), so we fix it when we see it.

--Carson

> On Jul 17, 2020, at 3:42 PM, maker-devel-owner at yandell-lab.org wrote:
>
> As list administrator, your authorization is requested for the
> following mailing list posting:
>
> List: maker-devel at yandell-lab.org
> From: dianad.mosa at gmail.com
> Subject: INCONSISTENCY BETWEEN PROTEINS AND GFF
> Reason: Post by non-member to a members-only list
>
> At your convenience, visit:
>
> http://yandell-lab.org/mailman/admindb/maker-devel_yandell-lab.org
>
> to approve or deny the request.
>
> From: Diana Moreno Santillán
> Subject: INCONSISTENCY BETWEEN PROTEINS AND GFF
> Date: July 17, 2020 at 3:32:20 PM MDT
> To: maker-devel at yandell-lab.org
>
> Hello,
>
> I am using MAKER for mammalian genomes.
> I am running 3 rounds for most of them, as it seems to increase completeness according to BUSCO.
> The issue is that the round 3 output has inconsistencies.
>
> For example, at the protein.fasta file I have sequences that after performing blast and rename are annotated, but at the gff file are missing, for example:
>
> %grep AnoCau_scaffold_40730 acaudifer_3rd_run_all_maker_proteins_renamed_putative_function.fasta
> augustus_masked-AnoCau_scaffold_40730-processed-gene-0.0-mRNA-1 protein Name:"Similar to IFNL3 Interferon lambda-3 (Homo sapiens OX=9606)" AED:0.28 eAED:0.23 QI:0|0|0|1|1|1|4|0|183
>
> %grep AnoCau_scaffold_40730 acaudifer_3rd_run_all_maker_renamed_putative_function.gff
> nothing
>
> Why is the sequence present in my proteins file, but not at my gff3 file?
>
> Is worth to notice that when I tried to run maps_id I got:
> WARNING: No mapping available for AnoCau_scaffold_40730
>
> I found that if I use the gff file from previous rounds of Maker as evidence, this could happen, but I haven't found a solution yet. What do you recommend to avoid these changes of names?
> Why I have sequences in my protein file but not in my gff file?
>
> Thank you for your attention.
>
> Diana Moreno
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Aug 21 15:00:06 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 15:00:06 -0600
Subject: [maker-devel] Maker2 changelog and Maker3
In-Reply-To:
References:
Message-ID: <39B39BFE-864C-4DD0-B538-15F3CAD6043F@gmail.com>

Also, regarding MAKER2 vs. MAKER3: apart from EVM support, MAKER3 has modified logic for est2genome and protein2genome that results in better forward mapping of old genes between assemblies, and it can use est_gff for UTRs. MAKER2 uses est_gff only for predictor hints, not for UTR generation.

--Carson

> On Jun 2, 2020, at 12:57 AM, Xabier Vázquez-Campos wrote:
>
> Hi Carson,
>
> I was looking for some Maker-related stuff (not important what) and I realised that there was a new release in April for Maker 2 and Maker 3 was not in beta anymore.
>
> We have been using Maker 2.31.9 in our cluster for a few years now and while the new (2.31.11) includes the most recent change (maker_functional_fasta and maker_functional_gff handle newer UniProt/Swiss-Prot format. Also fix for newer genemark command line structure) I can't find what changed in 2.31.10 nor a general changelog.
> By the way, do you know when (version) the genemark command line structure changed? What about the Uniprot format?
>
> Also, is there any fundamental difference aside of the integration of EVM in Maker3 vs Maker2? I found this entry from 2 years ago and wanted to check if this is still the case.
> http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/2018-August/003279.html
>
> Thank you,
> Xabi
>
> PS: is there a way to receive notifications for new releases? I didn't notice anything in the mail list
> --
> Xabier Vázquez-Campos, PhD
> Research Associate
> NSW Systems Biology Initiative
> School of Biotechnology and Biomolecular Sciences
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mj.gomez12 at uniandes.edu.co Wed Aug 26 22:35:06 2020
From: mj.gomez12 at uniandes.edu.co (Maria Jose Gomez Hughes)
Date: Thu, 27 Aug 2020 04:35:06 +0000
Subject: [maker-devel] fasta_merge problem
Message-ID:

Hi!

I have been using maker to annotate a eukaryote genome. Since the genome was big (~3 Gb), I divided it into 10 chunks with fasta_utils and ran maker on each separately. After that, I used gff3_merge and fasta_merge to produce output for each of the chunks. Then I tried using these same tools for merging the different outputs into one, and while gff3_merge ran as expected, fasta_merge did not.
I used the following command:

fasta_merge -o maker.all.proteins.fasta -i maker.chunk-00.proteins.fasta maker.chunk-01.proteins.fasta maker.chunk-02.proteins.fasta maker.chunk-03.proteins.fasta maker.chunk-04.proteins.fasta maker.chunk-05.proteins.fasta maker.chunk-06.proteins.fasta maker.chunk-07.proteins.fasta maker.chunk-08.proteins.fasta maker.chunk-09.proteins.fasta

But instead of running, it just showed me the help page. I tried running just two of them, but that didn't help either.

Any help will be much appreciated!

Maria Jose Gomez
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Aug 27 13:52:53 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 27 Aug 2020 13:52:53 -0600
Subject: [maker-devel] fasta_merge problem
In-Reply-To:
References:
Message-ID:

The main function of fasta_merge is to use the datastore index file to find and concatenate FASTA files spread throughout the datastore. If you wish to merge a handful of FASTA files into one because you made 10 chunks, you can simply merge the files you want using the Linux 'cat' command.

Example 1: 'cat file1 file2 file3 file4 > merge.fasta'

Example 2: 'cat *output/*.maker.proteins.fasta > merge.maker.proteins.fasta'

--Carson

> On Aug 26, 2020, at 10:35 PM, Maria Jose Gomez Hughes wrote:
>
> Hi!
>
> I have been using maker to annotate an eukaryote genome. Since the genome was big (~3Gb), I divided it into 10 chunks with fasta_utils and run maker on each separately. After that, I used gff3_merge and fasta_merge to produce output for each of the chunks. Then I tried using these same tools for merging the different outputs into one, and while gff3_merge ran as expected, fasta_merge did not. I used the following command:
>
> fasta_merge -o maker.all.proteins.fasta -i maker.chunk-00.proteins.fasta maker.chunk-01.proteins.fasta maker.chunk-02.proteins.fasta maker.chunk-03.proteins.fasta maker.chunk-04.proteins.fasta maker.chunk-05.proteins.fasta maker.chunk-06.proteins.fasta maker.chunk-07.proteins.fasta maker.chunk-08.proteins.fasta maker.chunk-09.proteins.fasta
>
> But instead of running it just showed me the help page. I tried just running two of them, but that didn't help either.
>
> Any help will be much appreciated!
>
> Maria Jose Gomez
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wei.xiong at wur.nl Fri Aug 28 01:33:33 2020
From: wei.xiong at wur.nl (Xiong, Wei)
Date: Fri, 28 Aug 2020 07:33:33 +0000
Subject: [maker-devel] ERROR: Failed while processing all repeats
Message-ID:

Dear Colleague,

I encountered an error when I used MAKER to annotate my genome. It is a large plant genome (more than 3 Gb). I included one TE data set (GFF), one transcriptome data set (FASTA), and four protein sequence files (FASTA) for the homology annotation. There are 29 scaffolds in total. Three scaffolds failed with "ERROR: Failed while processing all repeats," while the other 26 finished successfully. I have tried the following methods from the online forum. However, I still cannot fix the error.

* Check the RepeatMasker configuration
* Check the maker_exe.ctl
* Replace .../maker/lib/Widget/RepeatMasker.pm
  http://gmod.827538.n3.nabble.com/MAKER-v3-ERROR-Failed-while-processing-all-repeats-td4059410.html
* Increase the try_count in the maker_opts.ctl

Could you please help me to solve this problem?
Thank you for reading my email. I look forward to hearing from you.
Best wishes and stay healthy, Met vriendelijke groet, With kind regards, Wei Xiong PhD candidate | Wageningen University & Research Plant Science Group | Biosystematics Group Radix Building 107 Droevendaalsesteeg 1 6708 PB Wageningen The Netherlands Email: wei.xiong at wur.nl -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Aug 21 10:56:37 2020 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Aug 2020 10:56:37 -0600 Subject: [maker-devel] maker-devel post from jgrant@smith.edu requires approval In-Reply-To: References: Message-ID: <2DC10381-54DB-4386-9DC0-A723E01C20B3@gmail.com> MAKER on its own can run under MPI to run on multiple cores or across multiple machines on a local computer cluster. It can also restart as long as you run in the same directory. However when controlled by Galaxy I do not know if the restart is possible since Galaxy controls the run directory. Similarly I don?t know if Galaxy can launch it via MPI. ?Carson > From: Jessica Grant > Subject: Possible to restart maker run through a local galaxy? > Date: August 6, 2020 at 12:06:57 PM MDT > To: maker-devel at yandell-lab.org > > > Hi, > > I have a local galaxy instance and installed maker through the tool shed. I have been running it on a large genome and it had been running on one core for a few weeks. Then my IT guys needed to take the server down so stopped the run. > > I wonder if I can restart it - I have all the intermediate files in my /galaxy/database/jobs_directory and also, is there a way to run it on multiple cores? > > Thank you! > > Jessica > > > > From: maker-devel-request at yandell-lab.org > Subject: confirm 6c52cdc17b7f8c4718d930157625ecc62c32c681 > Date: August 6, 2020 at 12:07:14 PM MDT > > > If you reply to this message, keeping the Subject: header intact, > Mailman will discard the held message. Do this if the message is > spam. 
If you reply to this message and include an Approved: header > with the list password in it, the message will be approved for posting > to the list. The Approved: header can also appear in the first line > of the body of the reply. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Aug 21 11:09:26 2020 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Aug 2020 11:09:26 -0600 Subject: [maker-devel] Maker Snap annotation yields match_part In-Reply-To: References: Message-ID: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com> If you just want a count you can grep for match with tabs at either side ?> grep -P ?\tmatch\t? file.gff If you want the model, you need to assemble the match parts onto the match parent. ?Carson > On Jul 26, 2020, at 11:31 PM, Emmanuel Nnadi wrote: > > > Hello, > I am annotating my genome using SNAP after running SNAP twice my GFF has quite a number of match_part. How can I get an actual match? > > > > ilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 890 1003 +;Gap=M114 > contig_844_pilon_pilon_pilon snap_masked match_part 66719 66729 8.644 + . ID=contig_844_pilon_pilon_pilon:hsp:2866471:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 1004 1014 +;Gap=M11 > Nnadi Nnaemeka Emmanuel,Ph.D > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. > +2348068124819 > Publications: > https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Fri Aug 21 11:12:41 2020 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Aug 2020 11:12:41 -0600 Subject: [maker-devel] maker-devel post from wei.xiong@wur.nl requires approval In-Reply-To: References: Message-ID: <7C34D2E2-2C8D-4723-9512-CED5B98F9144@gmail.com> Sorry for the slow reply. You can merge them into one file, specify multiple files using a comma to separate the list. If you choose the comma option you can also add tags the GFF3 by including a ?:? and then a tag after each file name. Example protein=file1.fasta:swissprot,file2.fasta:NCBI,file3.fasta:N_vect ?Carson > From: "Xiong, Wei" > > Subject: protein homology evidence > Date: July 24, 2020 at 3:04:43 AM MDT > To: "'maker-devel at yandell-lab.org '" > > > > Dear colleague, > > I couldn?t find a related answer online for the following question that might sound silly. > If I want to include protein homology evidence from multiple species, what should I do? > Can I directly merge the protein sequence files of different species and feed it to MAKER? > Thank you for your help in advance. I look forward to hearing from you. > I wish you a pleasant and healthy summer! > > Met vriendelijke groet, > With kind regards, > > Wei Xiong > PhD candidate | Wageningen University & Research > Plant Science Group | Biosystematics Group Radix Building 107 Droevendaalsesteeg 1 > 6708 PB Wageningen > The Netherlands > E-mail: wei.xiong at wur.nl > > > > > From: maker-devel-request at yandell-lab.org > Subject: confirm 3d4c53a4046dc382d8a1b23e1e2077e90cc821b6 > Date: July 24, 2020 at 3:04:58 AM MDT > > > If you reply to this message, keeping the Subject: header intact, > Mailman will discard the held message. Do this if the message is > spam. If you reply to this message and include an Approved: header > with the list password in it, the message will be approved for posting > to the list. The Approved: header can also appear in the first line > of the body of the reply. 
From eennadi at gmail.com Fri Aug 21 12:38:20 2020
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Fri, 21 Aug 2020 19:38:20 +0100
Subject: [maker-devel] Maker Snap annotation yields match_part
In-Reply-To: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com>
References: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com>
Message-ID:

Thanks, Carson.

How do I assemble the match_part features onto the match parent? Is there any special instruction for this in the MAKER control file?

On Fri, 21 Aug 2020 at 6:09 PM, Carson Holt wrote:

> If you just want a count you can grep for match with tabs at either side
> --> grep -P '\tmatch\t' file.gff
>
> If you want the model, you need to assemble the match parts onto the match
> parent.
>
> --Carson
>
> On Jul 26, 2020, at 11:31 PM, Emmanuel Nnadi wrote:
>
> Hello,
> I am annotating my genome using SNAP after running SNAP twice my GFF has
> quite a number of match_part. How can I get an actual match?
>
> ilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 890 1003 +;Gap=M114
> contig_844_pilon_pilon_pilon snap_masked match_part 66719 66729 8.644 + . ID=contig_844_pilon_pilon_pilon:hsp:2866471:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 1004 1014 +;Gap=M11
>
> Nnadi Nnaemeka Emmanuel,Ph.D
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> +2348068124819
> Publications:
> https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
>

--
Nnadi Nnaemeka Emmanuel,Ph.D
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
+2348068124819
Publications:
https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From carsonhh at gmail.com Fri Aug 21 12:47:11 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 12:47:11 -0600
Subject: [maker-devel] Maker Snap annotation yields match_part
In-Reply-To:
References: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com>
Message-ID:

If you want to do this programmatically, you can try a library such as the Gene Annotation Library (GAL) from the Sequence Ontology:
https://github.com/The-Sequence-Ontology/GAL

If you are looking for visualization or other manipulation, you can try one of the GMOD tools:
http://gmod.org/wiki/Main_Page

--Carson

> On Aug 21, 2020, at 12:38 PM, Emmanuel Nnadi wrote:
>
> Thanks Carson
>
> Please how do I assemble the match-part on the match parent? Any special instruction on the maker control file?
>
> On Fri, 21 Aug 2020 at 6:09 PM, Carson Holt wrote:
> If you just want a count you can grep for match with tabs at either side --> grep -P '\tmatch\t' file.gff
>
> If you want the model, you need to assemble the match parts onto the match parent.
>
> --Carson
>
>> On Jul 26, 2020, at 11:31 PM, Emmanuel Nnadi wrote:
>>
>> Hello,
>> I am annotating my genome using SNAP after running SNAP twice my GFF has quite a number of match_part. How can I get an actual match?
>>
>> ilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 890 1003 +;Gap=M114
>> contig_844_pilon_pilon_pilon snap_masked match_part 66719 66729 8.644 + .
ID=contig_844_pilon_pilon_pilon:hsp:2866471:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 1004 1014 +;Gap=M11
>> Nnadi Nnaemeka Emmanuel,Ph.D
>> Department of Microbiology,
>> Faculty of Natural and Applied Science,
>> Plateau State University, Bokkos, Plateau State, Nigeria.
>> +2348068124819
>> Publications:
>> https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
>>
>
> --
> Nnadi Nnaemeka Emmanuel,Ph.D
> Department of Microbiology,
> Faculty of Natural and Applied Science,
> Plateau State University, Bokkos, Plateau State, Nigeria.
> +2348068124819
> Publications:
> https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From carsonhh at gmail.com Fri Aug 21 12:49:56 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 12:49:56 -0600
Subject: [maker-devel] Maker Snap annotation yields match_part
In-Reply-To:
References: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com>
Message-ID: <24B6BD02-B57C-4426-B919-52533BA3DC2B@gmail.com>

Also, if you are just trying to see what the gene model of a match/match_part would have been, you can change the maker_opts.ctl settings to keep all models (keep_preds=1), or pass in the models you want as match/match_part (pred_gff=) and then let MAKER turn them into mRNA/exon/CDS features. match/match_part are reference features that let you see where the annotations came from.
--Carson

> On Aug 21, 2020, at 12:47 PM, Carson Holt wrote:
>
> If you want to try and do this programmatically, you can try libraries like the Gene Annotation Library (GAL) from the Sequence Ontology --> https://github.com/The-Sequence-Ontology/GAL
>
> Or if you are looking for visualization or other manipulation you can try one of the GMOD tools --> http://gmod.org/wiki/Main_Page
>
> --Carson
>
>> On Aug 21, 2020, at 12:38 PM, Emmanuel Nnadi wrote:
>>
>> Thanks Carson
>>
>> Please how do I assemble the match-part on the match parent? Any special instruction on the maker control file?
>>
>> On Fri, 21 Aug 2020 at 6:09 PM, Carson Holt wrote:
>> If you just want a count you can grep for match with tabs at either side --> grep -P '\tmatch\t' file.gff
>>
>> If you want the model, you need to assemble the match parts onto the match parent.
>>
>> --Carson
>>
>>> On Jul 26, 2020, at 11:31 PM, Emmanuel Nnadi wrote:
>>>
>>> Hello,
>>> I am annotating my genome using SNAP after running SNAP twice my GFF has quite a number of match_part. How can I get an actual match?
>>>
>>> ilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 890 1003 +;Gap=M114
>>> contig_844_pilon_pilon_pilon snap_masked match_part 66719 66729 8.644 + . ID=contig_844_pilon_pilon_pilon:hsp:2866471:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 1004 1014 +;Gap=M11
>>> Nnadi Nnaemeka Emmanuel,Ph.D
>>> Department of Microbiology,
>>> Faculty of Natural and Applied Science,
>>> Plateau State University, Bokkos, Plateau State, Nigeria.
>>> +2348068124819
>>> Publications:
>>> https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
>>>
>>
>> --
>> Nnadi Nnaemeka Emmanuel,Ph.D
>> Department of Microbiology,
>> Faculty of Natural and Applied Science,
>> Plateau State University, Bokkos, Plateau State, Nigeria.
>> +2348068124819
>> Publications:
>> https://www.researchgate.net/profile/Emmanuel_Nnadi/publications
>

From carsonhh at gmail.com Fri Aug 21 12:55:40 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 12:55:40 -0600
Subject: [maker-devel] Intron lengths below minimum cutoff
In-Reply-To:
References:
Message-ID:

Looking through old messages, it looks like this one fell through the cracks.

The min_intron parameter applies only to exonerate alignments (it gets passed directly to the algorithm). It does not affect gene predictors like SNAP and Augustus, which can still call introns of any length they wish. So you can still get shorter introns in the model; you just won't get short introns from exonerate or in the evidence hints passed to SNAP and Augustus.

--Carson

> On May 29, 2020, at 3:49 PM, Trojahn, Shawn Michael wrote:
>
> Hello,
>
> I have been having a problem with the final annotation coming from Maker2 where I have a few thousand introns that are below the minimum intron value I have set. Most of the exons around these problem introns have no support in the final merged gff file, but a few are supported by blast hits. Is there a reason why these introns would remain in the final gff?
>
> Thanks,
> Shawn
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri Aug 21 14:45:43 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 14:45:43 -0600
Subject: [maker-devel] maker-devel post from christopher.keeling.1@ulaval.ca requires approval
In-Reply-To:
References:
Message-ID:

Hi Chris,

Sorry for the slow reply. Actually, the desired behavior is to capture the entire name from the FASTA header and not try to divide on the pipes.
NCBI BLAST versions have historically done this dividing, but only for NCBI-sourced data (it won't do it for Swiss-Prot, for example, or at least it wouldn't with all previous versions). If it is doing that now, that is a rather big behavior change, but it can be turned off by adding -show_gis to the blast command line.

Thanks,
Carson

> From: Christopher Keeling
> Subject: Re: Maker 2.31.10: maker_functional_gff and maker_functional_fasta not parsing correctly, Can't use string ("") as a HASH ref while "strict refs" in use
> Date: July 7, 2020 at 6:12:37 PM MDT
> To: "maker-devel at yandell-lab.org"
>
> Hi Carson,
>
> I'm now using Maker 3.01.03, and I'm finding that maker_functional_gff and maker_functional_fasta still are not behaving as they should. I'm getting an error:
>
> Can't use string ("") as a HASH ref while "strict refs" in use at /usr/local/bin/maker/bin/maker_functional_gff line 55, <$IN> line 167.
>
> Version 2020_03 of uniprot_sprot.fasta starts like this:
>
> >sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 GN=FV3-001R PE=4 SV=1
>
> Based on your scripts, this is the example of your first condition.
However, I find that I need to change it to get it to work as I understand it should work:
>
> #>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 GN=FV3-001R PE=4 SV=1
> if (/>sp\|(\S+)\|\S+\s+(.*?)\s+OS=(.*?)\s+OX=\S+\s+(GN=(.*?)\s+)?PE=/) {
>     $id = $1;
>     $desc = $2;
>     $org = $3;
>     $name = $5 || '';
> }
>
> Compared to what is in 3.01.03:
> #>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 GN=FV3-001R PE=4 SV=1
> if (/>(\S+)\s+(.*?)\s+OS=(.*?)\s+OX=(.*?)\s+(GN=(.*?)\s+)?PE=/) {
>     $id = $1;
>     $desc = $2;
>     $org = $3;
>     $name = $6 || '';
> }
>
> Thus, with my edits:
> >sp|Q62559|IFT52_MOUSE Intraflagellar transport protein 52 homolog OS=Mus musculus OX=10090 GN=Ift52 PE=1 SV=2
>
> maker_functional_gff would result in:
> ...Note=Similar to Ift52: Intraflagellar transport protein 52 homolog (Mus musculus);
>
> And maker_functional_fasta would result in:
> Name:"Similar to Ift52 Intraflagellar transport protein 52 homolog (Mus musculus)"
>
> Are these the expected behaviours?
>
> Cheers,
>
> Chris
>
>> On Mar 14, 2020, at 1:24 PM, Christopher Keeling wrote:
>>
>> Hello,
>>
>> In sub parse_blast{, during parsing of uniprot fasta file:
>>
>> if (/>(\S+)\s+(.*?)\s+OS=(.*?)\s+(GN=(.*?)\s+)?PE=/) {
>>
>> should be changed to:
>>
>> if (/>sp\|(\S+)\|\S+\s+(.*?)\s+OS=(.*?)\s+OX=\S+\s+(GN=(.*?)\s+)?PE=/) {
>>
>> to avoid "Can't use string ("") as a HASH ref while "strict refs" in use at..." errors.
>>
>> For UniProt release 2020_01: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
>>
>> Cheers,
>> Chris
>>
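The difference between the two header patterns discussed in this thread is easier to see when both are run against the same Swiss-Prot header. The following is a Python translation of the two Perl regular expressions quoted in the message (Python's `re` module accepts the same syntax for these patterns). It is only an illustration and not the code in maker_functional_gff; the variable names are made up.

```python
import re

header = (
    ">sp|Q62559|IFT52_MOUSE Intraflagellar transport protein 52 homolog "
    "OS=Mus musculus OX=10090 GN=Ift52 PE=1 SV=2"
)

# Pattern quoted from MAKER 3.01.03: group 1 captures the whole 'sp|...|...' token.
shipped = re.compile(r">(\S+)\s+(.*?)\s+OS=(.*?)\s+OX=(.*?)\s+(GN=(.*?)\s+)?PE=")

# Pattern proposed in the message: group 1 captures only the accession between the pipes.
proposed = re.compile(r">sp\|(\S+)\|\S+\s+(.*?)\s+OS=(.*?)\s+OX=\S+\s+(GN=(.*?)\s+)?PE=")

m_old = shipped.search(header)
m_new = proposed.search(header)

print("shipped  id:", m_old.group(1), "| gene:", m_old.group(6) or "")
print("proposed id:", m_new.group(1), "| gene:", m_new.group(5) or "")
print("description:", m_new.group(2), "| organism:", m_new.group(3))

# The GN= field is optional in both patterns; without it the gene group is None.
no_gene = ">sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 PE=4 SV=1"
m_none = proposed.search(no_gene)
print("gene when GN= is absent:", repr(m_none.group(5) or ""))
```

Which of the two IDs is the right one depends on how the BLAST report names its hits, which is the point of the reply: if BLAST reports the full `sp|...|...` name, the full-token pattern is the one that matches it.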
From carsonhh at gmail.com Fri Aug 21 14:56:52 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 14:56:52 -0600
Subject: [maker-devel] maker-devel post from dianad.mosa@gmail.com requires approval
In-Reply-To:
References:
Message-ID:

MAKER's internal logic has not changed at all. The new release is a minimal update to handle newer versions of GeneMark, tRNAScan, RepeatMasker, and Swiss-Prot (slight deviations from historical output formats).

* Update to handle the most recent tRNAScan output format.
* Update to handle the most recent GeneMark output format.
* Update to the downstream putative function scripts to handle the newer Swiss-Prot FASTA header format.
* Some versions/configurations of RepeatMasker can report 0 as a start/end coordinate when it should be 1 (0 is out of bounds), so we fix it when we see it.

--Carson

> On Jul 17, 2020, at 3:42 PM, maker-devel-owner at yandell-lab.org wrote:
>
> As list administrator, your authorization is requested for the
> following mailing list posting:
>
> List: maker-devel at yandell-lab.org
> From: dianad.mosa at gmail.com
> Subject: INCONSISTENCY BETWEEN PROTEINS AND GFF
> Reason: Post by non-member to a members-only list
>
> At your convenience, visit:
>
> http://yandell-lab.org/mailman/admindb/maker-devel_yandell-lab.org
>
> to approve or deny the request.
>
> From: Diana Moreno Santillán
> Subject: INCONSISTENCY BETWEEN PROTEINS AND GFF
> Date: July 17, 2020 at 3:32:20 PM MDT
> To: maker-devel at yandell-lab.org
>
> Hello,
>
> I am using MAKER for mammalian genomes.
> I am running 3 rounds for most of them, as it seems to increase completeness according to BUSCO.
> The issue is that the round 3 output has inconsistencies.
>
> For example, at the protein.fasta file I have sequences that after performing blast and rename are annotated, but at the gff file are missing, for example:
>
> %grep AnoCau_scaffold_40730 acaudifer_3rd_run_all_maker_proteins_renamed_putative_function.fasta
> augustus_masked-AnoCau_scaffold_40730-processed-gene-0.0-mRNA-1 protein Name:"Similar to IFNL3 Interferon lambda-3 (Homo sapiens OX=9606)" AED:0.28 eAED:0.23 QI:0|0|0|1|1|1|4|0|183
>
> %grep AnoCau_scaffold_40730 acaudifer_3rd_run_all_maker_renamed_putative_function.gff
> nothing
>
> Why is the sequence present in my proteins file, but not at my gff3 file?
>
> Is worth to notice that when I tried to run maps_id I got:
> WARNING: No mapping available for AnoCau_scaffold_40730
>
> I found that if I use the gff file from previous rounds of Maker as evidence, this could happen, but I haven't found a solution yet. What do you recommend to avoid these changes of names?
> Why I have sequences in my protein file but not in my gff file?
>
> Thank you for your attention.
>
> Diana Moreno
From carsonhh at gmail.com Fri Aug 21 15:00:06 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 15:00:06 -0600
Subject: [maker-devel] Maker2 changelog and Maker3
In-Reply-To:
References:
Message-ID: <39B39BFE-864C-4DD0-B538-15F3CAD6043F@gmail.com>

Also, regarding MAKER2 vs. MAKER3: apart from EVM support, MAKER3 has modified logic for est2genome and protein2genome that results in better forward mapping of old genes between assemblies, and it can use est_gff for UTRs. MAKER2 uses est_gff only for predictor hints, not for UTR generation.

--Carson

> On Jun 2, 2020, at 12:57 AM, Xabier Vázquez-Campos wrote:
>
> Hi Carson,
>
> I was looking for some Maker-related stuff (not important what) and I realised that there was a new release in April for Maker 2 and Maker 3 was not in beta anymore.
>
> We have been using Maker 2.31.9 in our cluster for a few years now and while the new (2.31.11) includes the most recent change (maker_functional_fasta and maker_functional_gff handle newer UniProt/Swiss-Prot format. Also fix for newer genemark command line structure) I can't find what changed in 2.31.10 nor a general changelog.
> By the way, do you know when (version) the genemark command line structure changed? What about the Uniprot format?
>
> Also, is there any fundamental difference aside of the integration of EVM in Maker3 vs Maker2? I found this entry from 2 years ago and wanted to check if this is still the case.
> http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/2018-August/003279.html
>
> Thank you,
> Xabi
>
> PS: is there a way to receive notifications for new releases?
I didn't notice anything in the mail list
> --
> Xabier Vázquez-Campos, PhD
> Research Associate
> NSW Systems Biology Initiative
> School of Biotechnology and Biomolecular Sciences
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA

From mj.gomez12 at uniandes.edu.co Wed Aug 26 22:35:06 2020
From: mj.gomez12 at uniandes.edu.co (Maria Jose Gomez Hughes)
Date: Thu, 27 Aug 2020 04:35:06 +0000
Subject: [maker-devel] fasta_merge problem
Message-ID:

Hi!

I have been using MAKER to annotate a eukaryote genome. Since the genome is big (~3 Gb), I divided it into 10 chunks with fasta_utils and ran MAKER on each chunk separately. After that, I used gff3_merge and fasta_merge to produce output for each of the chunks. Then I tried using these same tools to merge the different outputs into one, and while gff3_merge ran as expected, fasta_merge did not. I used the following command:

fasta_merge -o maker.all.proteins.fasta -i maker.chunk-00.proteins.fasta maker.chunk-01.proteins.fasta maker.chunk-02.proteins.fasta maker.chunk-03.proteins.fasta maker.chunk-04.proteins.fasta maker.chunk-05.proteins.fasta maker.chunk-06.proteins.fasta maker.chunk-07.proteins.fasta maker.chunk-08.proteins.fasta maker.chunk-09.proteins.fasta

But instead of running, it just showed me the help page. I tried merging just two of the files, but that didn't help either.

Any help will be much appreciated!

Maria Jose Gomez
From carsonhh at gmail.com Thu Aug 27 13:52:53 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 27 Aug 2020 13:52:53 -0600
Subject: [maker-devel] fasta_merge problem
In-Reply-To:
References:
Message-ID:

The main function of fasta_merge is to use the datastore index file to find and concatenate FASTA files spread throughout the datastore. If you just want to merge a handful of FASTA files into one because you made 10 chunks, you can simply concatenate them with the Linux 'cat' command.

Example 1:

    cat file1 file2 file3 file4 > merge.fasta

Example 2:

    cat *output/*.maker.proteins.fasta > merge.maker.proteins.fasta

--Carson

> On Aug 26, 2020, at 10:35 PM, Maria Jose Gomez Hughes wrote:
>
> Hi!
>
> I have been using maker to annotate an eukaryote genome. Since the genome was big (~3Gb), I divided it into 10 chunks with fasta_utils and run maker on each separately. After that, I used gff3_merge and fasta_merge to produce output for each of the chunks. Then I tried using these same tools for merging the different outputs into one, and while gff3_merge ran as expected, fasta_merge did not. I used the following command:
>
> fasta_merge -o maker.all.proteins.fasta -i maker.chunk-00.proteins.fasta maker.chunk-01.proteins.fasta maker.chunk-02.proteins.fasta maker.chunk-03.proteins.fasta maker.chunk-04.proteins.fasta maker.chunk-05.proteins.fasta maker.chunk-06.proteins.fasta maker.chunk-07.proteins.fasta maker.chunk-08.proteins.fasta maker.chunk-09.proteins.fasta
>
> But instead of running it just showed me the help page. I tried just running two of them, but that didn't help either.
>
> Any help will be much appreciated!
>
> Maria Jose Gomez
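For anyone who prefers to do the same concatenation from a script, a rough Python equivalent of the 'cat' examples might look like the sketch below. The function name, file names and demo sequences are invented for the illustration. It also does one thing plain 'cat' does not: it makes sure every input ends with a newline, so the last sequence line of one file cannot run into the first header of the next.

```python
import glob
import os
import tempfile


def concat_fasta(paths, out_path):
    """Concatenate FASTA files in the given order; return the number of '>' headers written."""
    n_records = 0
    with open(out_path, "w") as out:
        for path in paths:
            last = "\n"
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        n_records += 1
                    out.write(line)
                    last = line
            if not last.endswith("\n"):
                out.write("\n")  # keep records from adjacent files apart
    return n_records


# Demo on two tiny made-up chunk files in a temporary directory.
tmp = tempfile.mkdtemp()
for i, seq in enumerate(["MKT", "MSS"]):
    with open(os.path.join(tmp, f"chunk-{i:02d}.proteins.fasta"), "w") as fh:
        fh.write(f">prot{i}\n{seq}")  # deliberately no trailing newline

chunks = sorted(glob.glob(os.path.join(tmp, "chunk-*.proteins.fasta")))
merged = os.path.join(tmp, "merged.proteins.fasta")
count = concat_fasta(chunks, merged)
print(count, "records written to", merged)
```

Sorting the glob result keeps the chunks in a stable order; the order itself does not matter for a protein FASTA file.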
URL: From wei.xiong at wur.nl Fri Aug 28 01:33:33 2020 From: wei.xiong at wur.nl (Xiong, Wei) Date: Fri, 28 Aug 2020 07:33:33 +0000 Subject: [maker-devel] ERROR: Failed while processing all repeats Message-ID: Dear Colleague, I had encountered an ERROR when I used MAKER to annotate my genome. It is a large plant genome (more than 3Gb), I included a TE (gff) data, one Transcriptome data (fasta), and four protein sequences (fasta) for the homology annotation. There are in total 29 scaffolds. Three scaffolds faced the "ERROR: Failed while processing all repeats," while the other 26 finished successfully. I have tried the following methods from the online forum. However, I still cannot fix the error. * Check the RepeatMasker configuration * Check the maker_exe.ctl * replace .../maker/lib/Widget/RepeatMasker.pm http://gmod.827538.n3.nabble.com/MAKER-v3-ERROR-Failed-while-processing-all-repeats-td4059410.html * increase the try_count in the maker_opt.ctl Could you please help me to solve this problem? Thank you for reading my email. I look forward to hearing from you. Best wishes and stay healthy, Met vriendelijke groet, With kind regards, Wei Xiong PhD candidate | Wageningen University & Research Plant Science Group | Biosystematics Group Radix Building 107 Droevendaalsesteeg 1 6708 PB Wageningen The Netherlands Email: wei.xiong at wur.nl -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Aug 21 10:56:37 2020 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Aug 2020 10:56:37 -0600 Subject: [maker-devel] maker-devel post from jgrant@smith.edu requires approval In-Reply-To: References: Message-ID: <2DC10381-54DB-4386-9DC0-A723E01C20B3@gmail.com> MAKER on its own can run under MPI to run on multiple cores or across multiple machines on a local computer cluster. It can also restart as long as you run in the same directory. 
However when controlled by Galaxy I do not know if the restart is possible since Galaxy controls the run directory. Similarly I don?t know if Galaxy can launch it via MPI. ?Carson > From: Jessica Grant > Subject: Possible to restart maker run through a local galaxy? > Date: August 6, 2020 at 12:06:57 PM MDT > To: maker-devel at yandell-lab.org > > > Hi, > > I have a local galaxy instance and installed maker through the tool shed. I have been running it on a large genome and it had been running on one core for a few weeks. Then my IT guys needed to take the server down so stopped the run. > > I wonder if I can restart it - I have all the intermediate files in my /galaxy/database/jobs_directory and also, is there a way to run it on multiple cores? > > Thank you! > > Jessica > > > > From: maker-devel-request at yandell-lab.org > Subject: confirm 6c52cdc17b7f8c4718d930157625ecc62c32c681 > Date: August 6, 2020 at 12:07:14 PM MDT > > > If you reply to this message, keeping the Subject: header intact, > Mailman will discard the held message. Do this if the message is > spam. If you reply to this message and include an Approved: header > with the list password in it, the message will be approved for posting > to the list. The Approved: header can also appear in the first line > of the body of the reply. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Aug 21 11:09:26 2020 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Aug 2020 11:09:26 -0600 Subject: [maker-devel] Maker Snap annotation yields match_part In-Reply-To: References: Message-ID: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com> If you just want a count you can grep for match with tabs at either side ?> grep -P ?\tmatch\t? file.gff If you want the model, you need to assemble the match parts onto the match parent. 
?Carson > On Jul 26, 2020, at 11:31 PM, Emmanuel Nnadi wrote: > > > Hello, > I am annotating my genome using SNAP after running SNAP twice my GFF has quite a number of match_part. How can I get an actual match? > > > > ilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 890 1003 +;Gap=M114 > contig_844_pilon_pilon_pilon snap_masked match_part 66719 66729 8.644 + . ID=contig_844_pilon_pilon_pilon:hsp:2866471:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 1004 1014 +;Gap=M11 > Nnadi Nnaemeka Emmanuel,Ph.D > Department of Microbiology, > Faculty of Natural and Applied Science, > Plateau State University, Bokkos, Plateau State, Nigeria. > +2348068124819 > Publications: > https://www.researchgate.net/profile/Emmanuel_Nnadi/publications -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Aug 21 11:12:41 2020 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Aug 2020 11:12:41 -0600 Subject: [maker-devel] maker-devel post from wei.xiong@wur.nl requires approval In-Reply-To: References: Message-ID: <7C34D2E2-2C8D-4723-9512-CED5B98F9144@gmail.com> Sorry for the slow reply. You can merge them into one file, specify multiple files using a comma to separate the list. If you choose the comma option you can also add tags the GFF3 by including a ?:? and then a tag after each file name. Example protein=file1.fasta:swissprot,file2.fasta:NCBI,file3.fasta:N_vect ?Carson > From: "Xiong, Wei" > > Subject: protein homology evidence > Date: July 24, 2020 at 3:04:43 AM MDT > To: "'maker-devel at yandell-lab.org '" > > > > Dear colleague, > > I couldn?t find a related answer online for the following question that might sound silly. > If I want to include protein homology evidence from multiple species, what should I do? 
> Can I directly merge the protein sequence files of different species and feed it to MAKER? > Thank you for your help in advance. I look forward to hearing from you. > I wish you a pleasant and healthy summer! > > Met vriendelijke groet, > With kind regards, > > Wei Xiong > PhD candidate | Wageningen University & Research > Plant Science Group | Biosystematics Group Radix Building 107 Droevendaalsesteeg 1 > 6708 PB Wageningen > The Netherlands > E-mail: wei.xiong at wur.nl > > > > > From: maker-devel-request at yandell-lab.org > Subject: confirm 3d4c53a4046dc382d8a1b23e1e2077e90cc821b6 > Date: July 24, 2020 at 3:04:58 AM MDT > > > If you reply to this message, keeping the Subject: header intact, > Mailman will discard the held message. Do this if the message is > spam. If you reply to this message and include an Approved: header > with the list password in it, the message will be approved for posting > to the list. The Approved: header can also appear in the first line > of the body of the reply. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eennadi at gmail.com Fri Aug 21 12:38:20 2020 From: eennadi at gmail.com (Emmanuel Nnadi) Date: Fri, 21 Aug 2020 19:38:20 +0100 Subject: [maker-devel] Maker Snap annotation yields match_part In-Reply-To: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com> References: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com> Message-ID: Thanks Carson Please how do I assemble the match-part on the match parent? Any special instruction on the maker control file? On Fri, 21 Aug 2020 at 6:09 PM, Carson Holt wrote: > If you just want a count you can grep for match with tabs at either side > ?> grep -P ?\tmatch\t? file.gff > > If you want the model, you need to assemble the match parts onto the match > parent. > > ?Carson > > > On Jul 26, 2020, at 11:31 PM, Emmanuel Nnadi wrote: > > > Hello, > I am annotating my genome using SNAP after running SNAP twice my GFF has > quite a number of match_part. 
How can I get an actual match?
>
> ilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 890 1003 +;Gap=M114
> contig_844_pilon_pilon_pilon snap_masked match_part 66719 66729 8.644 + . ID=contig_844_pilon_pilon_pilon:hsp:2866471:4.5.0.0;Parent=contig_844_pilon_pilon_pilon:hit:2447078:4.5.0.0;Target=snap_masked-contig_844_pilon_pilon_pilon-abinit-gene-0.1-mRNA-1 1004 1014 +;Gap=M11

--
Nnadi Nnaemeka Emmanuel, Ph.D
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
+2348068124819
Publications:
https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From carsonhh at gmail.com Fri Aug 21 12:47:11 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 12:47:11 -0600
Subject: [maker-devel] Maker Snap annotation yields match_part
In-Reply-To:
References: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com>
Message-ID:

If you want to do this programmatically, you can try libraries like the Gene Annotation Library (GAL) from the Sequence Ontology --> https://github.com/The-Sequence-Ontology/GAL

Or, if you are looking for visualization or other manipulation, you can try one of the GMOD tools --> http://gmod.org/wiki/Main_Page

--Carson

> On Aug 21, 2020, at 12:38 PM, Emmanuel Nnadi wrote:
>
> Thanks Carson
>
> Please, how do I assemble the match_part onto the match parent? Any special instruction in the maker control file?

From carsonhh at gmail.com Fri Aug 21 12:49:56 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 12:49:56 -0600
Subject: [maker-devel] Maker Snap annotation yields match_part
In-Reply-To:
References: <7F42897F-8BBA-4FA0-AC64-128A0CF727C6@gmail.com>
Message-ID: <24B6BD02-B57C-4426-B919-52533BA3DC2B@gmail.com>

Also, if you are just trying to see what the gene model of a match/match_part would have been, you can change the maker_opts.ctl settings to keep all models (keep_preds=1) or pass in the models you want as match/match_part (pred_gff=) and then let them be turned into mRNA/exon/CDS features. match/match_part are reference features that let you see where the annotations came from.

--Carson

From carsonhh at gmail.com Fri Aug 21 12:55:40 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 12:55:40 -0600
Subject: [maker-devel] Intron lengths below minimum cutoff
In-Reply-To:
References:
Message-ID:

Looking through old messages, it looks like this one fell through the cracks.

The min_intron parameter is for exonerate alignments (it gets passed directly to the algorithm). However, gene predictors like SNAP and Augustus can still call introns of any length they wish (the parameter doesn't affect them). So you can still get shorter introns in the model; you just won't get short introns from exonerate or in the evidence hints passed to SNAP and Augustus.

--Carson

> On May 29, 2020, at 3:49 PM, Trojahn, Shawn Michael wrote:
>
> Hello,
>
> I have been having a problem with the final annotation coming from Maker2 where I have a few thousand introns that are below the minimum intron value I have set. Most of the exons around these problem introns have no support in the final merged gff file, but a few are supported by blast hits. Is there a reason why these introns would remain in the final gff?
>
> Thanks,
> Shawn
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
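Since min_intron constrains only exonerate, it can help to list which final models still contain short introns before deciding how to filter them. The following is a minimal Python sketch, not part of MAKER: it assumes a GFF3 file in which exon features carry a Parent attribute, and the function name short_introns and the toy records are invented for illustration.

```python
from collections import defaultdict


def short_introns(gff_lines, min_intron):
    """Return (parent, intron_start, intron_end, length) for each intron
    shorter than min_intron, inferred from gaps between exons of a parent."""
    exons = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # An exon may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((int(cols[3]), int(cols[4])))

    hits = []
    for parent, spans in exons.items():
        spans.sort()
        for (_, left_end), (right_start, _) in zip(spans, spans[1:]):
            length = right_start - left_end - 1
            if 0 < length < min_intron:
                hits.append((parent, left_end + 1, right_start - 1, length))
    return hits


# Toy records (hypothetical IDs and coordinates), tab-separated as in GFF3.
example = [
    "chr1\tmaker\texon\t100\t200\t.\t+\t.\tID=e1;Parent=mRNA-1\n",
    "chr1\tmaker\texon\t211\t300\t.\t+\t.\tID=e2;Parent=mRNA-1\n",
    "chr1\tmaker\texon\t900\t950\t.\t+\t.\tID=e3;Parent=mRNA-1\n",
]
print(short_introns(example, min_intron=20))
```

On a real annotation the same function could be fed the lines of the merged GFF3 file. Whether the reported models should be dropped or kept is a separate judgment call, since the predictors may have called those introns deliberately.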
From carsonhh at gmail.com Fri Aug 21 14:45:43 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 14:45:43 -0600
Subject: [maker-devel] maker-devel post from christopher.keeling.1@ulaval.ca requires approval
In-Reply-To:
References:
Message-ID:

Hi Chris,

Sorry for the slow reply. The desired behavior is actually to capture the entire name from the FASTA header and not to try to divide it on the pipes. NCBI BLAST versions have historically done this dividing, but only for NCBI-sourced data (it won't do it for Swiss-Prot, for example, or at least it wouldn't with all previous versions). If it is doing that now, that is a rather big behavior change, but it can be turned off by adding -show_gis to the blast command line.

Thanks,
Carson

> From: Christopher Keeling
> Subject: Re: Maker 2.31.10: maker_functional_gff and maker_functional_fasta not parsing correctly, Can't use string ("") as a HASH ref while "strict refs" in use
> Date: July 7, 2020 at 6:12:37 PM MDT
> To: "maker-devel at yandell-lab.org"
>
> Hi Carson,
>
> I'm now using Maker 3.01.03, and I'm finding that maker_functional_gff and maker_functional_fasta are still not behaving as they should. I'm getting an error:
>
> Can't use string ("") as a HASH ref while "strict refs" in use at /usr/local/bin/maker/bin/maker_functional_gff line 55, <$IN> line 167.
>
> Version 2020_03 of uniprot_sprot.fasta starts like this:
>
> >sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 GN=FV3-001R PE=4 SV=1
>
> Based on your scripts, this is the example of your first condition. However, I find that I need to change it (in red) to get it to work as I understand it should work:
>
> #>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 GN=FV3-001R PE=4 SV=1
> if (/>sp\|(\S+)\|\S+\s+(.*?)\s+OS=(.*?)\s+OX=\S+\s+(GN=(.*?)\s+)?PE=/) {
>     $id = $1;
>     $desc = $2;
>     $org = $3;
>     $name = $5 || '';
> }
>
> Compared to what is in 3.01.03:
>
> #>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 GN=FV3-001R PE=4 SV=1
> if (/>(\S+)\s+(.*?)\s+OS=(.*?)\s+OX=(.*?)\s+(GN=(.*?)\s+)?PE=/) {
>     $id = $1;
>     $desc = $2;
>     $org = $3;
>     $name = $6 || '';
> }
>
> Thus, with my edits:
>
> >sp|Q62559|IFT52_MOUSE Intraflagellar transport protein 52 homolog OS=Mus musculus OX=10090 GN=Ift52 PE=1 SV=2
>
> maker_functional_gff would result in:
> ...Note=Similar to Ift52: Intraflagellar transport protein 52 homolog (Mus musculus);
>
> And maker_functional_fasta would result in:
> Name:"Similar to Ift52 Intraflagellar transport protein 52 homolog (Mus musculus)"
>
> Are these the expected behaviours?
>
> Cheers,
>
> Chris
>
>> On Mar 14, 2020, at 1:24 PM, Christopher Keeling wrote:
>>
>> Hello,
>>
>> In sub parse_blast{, during parsing of the uniprot fasta file:
>>
>> if (/>(\S+)\s+(.*?)\s+OS=(.*?)\s+(GN=(.*?)\s+)?PE=/) {
>>
>> should be changed to:
>>
>> if (/>sp\|(\S+)\|\S+\s+(.*?)\s+OS=(.*?)\s+OX=\S+\s+(GN=(.*?)\s+)?PE=/) {
>>
>> to avoid "Can't use string ("") as a HASH ref while "strict refs" in use at..." errors.
>>
>> For UniProt release 2020_01: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
>>
>> Cheers,
>> Chris

From carsonhh at gmail.com Fri Aug 21 14:56:52 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 14:56:52 -0600
Subject: [maker-devel] maker-devel post from dianad.mosa@gmail.com requires approval
In-Reply-To:
References:
Message-ID:

MAKER's internal logic has not changed at all. The new release is a minimal update to handle newer versions of GeneMark, tRNAScan, RepeatMasker, and Swiss-Prot (slight deviations from historical output formats).

* Update to handle the most recent tRNAScan output format.
* Update to handle the most recent GeneMark output format.
* Update to the downstream putative function scripts to handle the newer Swiss-Prot FASTA header format.
* Some versions/configurations of RepeatMasker can report 0 as a start/end coordinate when it should be 1 (0 is out of bounds), so we fix it when we see it.

--Carson

> From: Diana Moreno Santillán
> Subject: INCONSISTENCY BETWEEN PROTEINS AND GFF
> Date: July 17, 2020 at 3:32:20 PM MDT
> To: maker-devel at yandell-lab.org
>
> Hello,
>
> I am using MAKER for mammalian genomes. I am running 3 rounds for most of them, as it seems to increase completeness according to BUSCO. The issue is that the round 3 output has inconsistencies.
>
> For example, in the protein FASTA file I have sequences that are annotated after performing BLAST and renaming, but that are missing from the GFF file:
>
> %grep AnoCau_scaffold_40730 acaudifer_3rd_run_all_maker_proteins_renamed_putative_function.fasta
> augustus_masked-AnoCau_scaffold_40730-processed-gene-0.0-mRNA-1 protein Name:"Similar to IFNL3 Interferon lambda-3 (Homo sapiens OX=9606)" AED:0.28 eAED:0.23 QI:0|0|0|1|1|1|4|0|183
>
> %grep AnoCau_scaffold_40730 acaudifer_3rd_run_all_maker_renamed_putative_function.gff
> nothing
>
> Why is the sequence present in my proteins file but not in my GFF3 file?
>
> It is worth noting that when I tried to run maps_id I got:
> WARNING: No mapping available for AnoCau_scaffold_40730
>
> I found that this could happen if I use the GFF file from previous rounds of MAKER as evidence, but I haven't found a solution yet. What do you recommend to avoid these changes of names? Why do I have sequences in my protein file but not in my GFF file?
>
> Thank you for your attention.
>
> Diana Moreno
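The difference between the two header patterns discussed in the Swiss-Prot exchange above can be illustrated with a small Python sketch. This is not MAKER's code (the MAKER scripts are Perl); WHOLE_ID is a named-group translation of the 3.01.03 pattern as quoted in Chris Keeling's message, ACCESSION_ONLY is a translation of his proposed pattern, and the names parse_header, WHOLE_ID and ACCESSION_ONLY are hypothetical.

```python
import re

# Translation of the 3.01.03 pattern quoted above: the whole first token is
# kept as the ID, which matches the behavior Carson describes as intended.
WHOLE_ID = re.compile(
    r">(?P<id>\S+)\s+(?P<desc>.*?)\s+OS=(?P<org>.*?)\s+OX=.*?\s+"
    r"(?:GN=(?P<name>.*?)\s+)?PE="
)

# Translation of the proposed pattern: only the accession between the pipes
# is kept as the ID.
ACCESSION_ONLY = re.compile(
    r">sp\|(?P<id>\S+)\|\S+\s+(?P<desc>.*?)\s+OS=(?P<org>.*?)\s+OX=\S+\s+"
    r"(?:GN=(?P<name>.*?)\s+)?PE="
)


def parse_header(header, pattern):
    """Return id/desc/org/name from a Swiss-Prot FASTA header, or None."""
    match = pattern.search(header)
    if match is None:
        return None
    fields = match.groupdict()
    # GN= is optional in Swiss-Prot headers; fall back to an empty name.
    fields["name"] = fields["name"] or ""
    return fields


header = (
    ">sp|Q62559|IFT52_MOUSE Intraflagellar transport protein 52 homolog "
    "OS=Mus musculus OX=10090 GN=Ift52 PE=1 SV=2"
)
whole = parse_header(header, WHOLE_ID)
accession = parse_header(header, ACCESSION_ONLY)
print(whole["id"])
print(accession["id"])
```

Both patterns extract the same description, organism and gene name from this header; they differ only in the ID. That ID has to agree with whatever subject ID appears in the BLAST report, which is why the BLAST defline handling matters here.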
From carsonhh at gmail.com Fri Aug 21 15:00:06 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Aug 2020 15:00:06 -0600
Subject: [maker-devel] Maker2 changelog and Maker3
In-Reply-To:
References:
Message-ID: <39B39BFE-864C-4DD0-B538-15F3CAD6043F@gmail.com>

Also, regarding MAKER2 vs. MAKER3: apart from EVM support, MAKER3 has modified logic for est2genome and protein2genome that results in better forward mapping of old genes between assemblies, and it can use est_gff for UTRs. MAKER2 uses est_gff only for predictor hints, not for UTR generation.

--Carson

> On Jun 2, 2020, at 12:57 AM, Xabier Vázquez-Campos wrote:
>
> Hi Carson,
>
> I was looking for some Maker-related stuff (not important what) and I realised that there was a new release in April for Maker 2 and that Maker 3 was not in beta anymore.
>
> We have been using Maker 2.31.9 in our cluster for a few years now, and while the new release (2.31.11) lists its most recent change (maker_functional_fasta and maker_functional_gff handle the newer UniProt/Swiss-Prot format; also a fix for the newer genemark command line structure), I can't find what changed in 2.31.10, nor a general changelog.
> By the way, do you know when (in which version) the genemark command line structure changed? What about the UniProt format?
>
> Also, is there any fundamental difference in Maker3 vs. Maker2 aside from the integration of EVM? I found this entry from 2 years ago and wanted to check if this is still the case.
> http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/2018-August/003279.html
>
> Thank you,
> Xabi
>
> PS: is there a way to receive notifications for new releases? I didn't notice anything in the mailing list.
> --
> Xabier Vázquez-Campos, PhD
> Research Associate
> NSW Systems Biology Initiative
> School of Biotechnology and Biomolecular Sciences
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA

From mj.gomez12 at uniandes.edu.co Wed Aug 26 22:35:06 2020
From: mj.gomez12 at uniandes.edu.co (Maria Jose Gomez Hughes)
Date: Thu, 27 Aug 2020 04:35:06 +0000
Subject: [maker-devel] fasta_merge problem
Message-ID:

Hi!

I have been using maker to annotate a eukaryote genome. Since the genome was big (~3Gb), I divided it into 10 chunks with fasta_utils and ran maker on each separately. After that, I used gff3_merge and fasta_merge to produce output for each of the chunks. Then I tried using these same tools to merge the different outputs into one, and while gff3_merge ran as expected, fasta_merge did not. I used the following command:

fasta_merge -o maker.all.proteins.fasta -i maker.chunk-00.proteins.fasta maker.chunk-01.proteins.fasta maker.chunk-02.proteins.fasta maker.chunk-03.proteins.fasta maker.chunk-04.proteins.fasta maker.chunk-05.proteins.fasta maker.chunk-06.proteins.fasta maker.chunk-07.proteins.fasta maker.chunk-08.proteins.fasta maker.chunk-09.proteins.fasta

But instead of running, it just showed me the help page. I tried running it with just two of the files, but that didn't help either.

Any help will be much appreciated!

Maria Jose Gomez
From carsonhh at gmail.com Thu Aug 27 13:52:53 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 27 Aug 2020 13:52:53 -0600
Subject: [maker-devel] fasta_merge problem
In-Reply-To:
References:
Message-ID:

The main function of fasta_merge is to use the datastore index file to find and concatenate fasta files spread throughout the datastore. If you wish to merge a handful of fasta files into one because you made 10 chunks, you can simply merge the files you want using the Linux 'cat' command.

Example 1: 'cat file1 file2 file3 file4 > merge.fasta'
Example 2: 'cat *output/*.maker.proteins.fasta > merge.maker.proteins.fasta'

--Carson
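For scripted pipelines, the same concatenation can be sketched in Python. This is only an illustration of the 'cat' approach described above, not a MAKER tool: the function name concat_fasta is hypothetical, the chunk file names mirror those in the question, and the sequences are invented. Unlike plain cat, the sketch also adds a newline when an input file does not end with one.

```python
import glob
import os
import shutil
import tempfile


def concat_fasta(paths, out_path):
    """Write the input files to out_path in the given order, like `cat`."""
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as handle:
                shutil.copyfileobj(handle, out)
                # Unlike plain cat, add a newline when a file does not end
                # with one, so the next header starts on its own line.
                if handle.tell() > 0:
                    handle.seek(-1, os.SEEK_END)
                    if handle.read(1) != b"\n":
                        out.write(b"\n")


# Toy demonstration in a temporary directory with two invented records.
with tempfile.TemporaryDirectory() as tmp:
    for index, text in enumerate([">protA\nMKV\n", ">protB\nMTT"]):
        name = f"maker.chunk-{index:02d}.proteins.fasta"
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(text)
    inputs = sorted(glob.glob(os.path.join(tmp, "maker.chunk-*.proteins.fasta")))
    merged_path = os.path.join(tmp, "maker.all.proteins.fasta")
    concat_fasta(inputs, merged_path)
    with open(merged_path) as fh:
        merged_text = fh.read()

print(merged_text)
```

The output name is chosen so that it does not match the input glob, which avoids the merged file being picked up as one of its own inputs on a rerun.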
From wei.xiong at wur.nl Fri Aug 28 01:33:33 2020
From: wei.xiong at wur.nl (Xiong, Wei)
Date: Fri, 28 Aug 2020 07:33:33 +0000
Subject: [maker-devel] ERROR: Failed while processing all repeats
Message-ID:

Dear Colleague,

I encountered an error when I used MAKER to annotate my genome. It is a large plant genome (more than 3Gb). I included one TE dataset (gff), one transcriptome dataset (fasta), and four protein sequence sets (fasta) for the homology annotation. There are 29 scaffolds in total. Three scaffolds hit the "ERROR: Failed while processing all repeats" message, while the other 26 finished successfully.

I have tried the following methods from the online forum. However, I still cannot fix the error.

* Check the RepeatMasker configuration
* Check the maker_exe.ctl
* Replace .../maker/lib/Widget/RepeatMasker.pm http://gmod.827538.n3.nabble.com/MAKER-v3-ERROR-Failed-while-processing-all-repeats-td4059410.html
* Increase the try_count in the maker_opts.ctl

Could you please help me to solve this problem?

Thank you for reading my email. I look forward to hearing from you.

Best wishes and stay healthy,

With kind regards,

Wei Xiong
PhD candidate | Wageningen University & Research
Plant Science Group | Biosystematics Group
Radix Building 107
Droevendaalsesteeg 1
6708 PB Wageningen
The Netherlands
Email: wei.xiong at wur.nl