From arnstrm at gmail.com Fri Jan 1 12:31:06 2016
From: arnstrm at gmail.com (Arun Seetharam)
Date: Fri, 1 Jan 2016 12:31:06 -0600
Subject: [maker-devel] running MAKER to merge annotations

Hi all,

First of all, a very happy new year to all of you! I hope everyone is having a great holiday season.

I have a question about Maker. For my grass species, I ran 3 separate rounds of MAKER (low GC, regular GC and high GC) and I now have 3 maker gff files. So, what is the correct way to merge these files into a single gff file? Do I have to run a maker round with just the GFF files as input? It looks like EVM is especially meant to do this kind of job, but I am not sure if Maker does this too.

Thanks for any help or suggestions!

Have a nice day,
--
Arun Seetharam
Post Doctoral Research Associate
Genome Informatics Facility & EEOB
Office of Biotechnology
228 Science I
Iowa State University
Ames, Iowa 50011

From dence at genetics.utah.edu Fri Jan 1 12:37:38 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 1 Jan 2016 18:37:38 +0000
Subject: [maker-devel] running MAKER to merge annotations

Hi Arun, are the three rounds of maker on different parts/versions of the genome, or did you run maker on the same genome with three different settings? If it's the former, then you can merge the maker gff files with gff3_merge, which is included with your maker installation.

If it's the latter case, then I do think EVM could help if you want to give the different result sets different confidence weights. If you want to give them all the same weight, then you could do another run of maker, and pass them through as either models or predictions.
~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From arnstrm at gmail.com Fri Jan 1 13:17:18 2016
From: arnstrm at gmail.com (Arun Seetharam)
Date: Fri, 1 Jan 2016 13:17:18 -0600
Subject: [maker-devel] running MAKER to merge annotations

Hi Daniel,

Thanks very much for the reply! It is the latter: same input genome under 3 settings (training was done using a different set of genes for all the gene predictors). I simply want to get a single gff, retaining only the best model(s) for each locus.
Are you suggesting that I can run MAKER by providing 3 files for "maker_gff" (in maker_opts.ctl) and keeping everything else default? Or do I have to do something in the CTL file to achieve this? I would appreciate it if you could provide more details on how to do this!

Thanks once again for the reply!

--
Arun Seetharam
Post Doctoral Research Associate
Genome Informatics Facility & EEOB
Office of Biotechnology
228 Science I
Iowa State University
Ames, Iowa 50011

From carsonhh at gmail.com Fri Jan 1 13:26:14 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 1 Jan 2016 12:26:14 -0700
Subject: [maker-devel] running MAKER to merge annotations
Message-ID: <28388091-0956-412B-B472-0A272FA31269@gmail.com>

If you are running with different settings on the exact same contig, you will have to merge the models using the -l legacy option of gff3_merge to ensure there will be no ID collisions (some things will have the same IDs in the different runs). Then supply just the genes to pred_gff on the rerun.

Alternatively you could have just provided your different predictor files as a comma separated list (i.e. snaphmm=hmm1,hmm2,hmm3). MAKER would have run each one and kept just the one that best matched the evidence. However, because MAKER passes hints to the predictors (which override the HMM for the most part), I have found that running with different predictor settings because of GC differences between contigs doesn't provide the benefit you would think.

--Carson

From zhangzb554 at nenu.edu.cn Sat Jan 2 10:09:24 2016
From: zhangzb554 at nenu.edu.cn (张志斌)
Date: Sun, 3 Jan 2016 00:09:24 +0800 (GMT+08:00)
Subject: [maker-devel] maker-devel Digest, Vol 92, Issue 1

Hi everyone,
I wonder where I can download the Perl package proc::signal? I cannot find it on CPAN. Could someone send me the package or point me to a website where I can get it?

Thanks for your help.

From arnstrm at iastate.edu Fri Jan 1 12:05:35 2016
From: arnstrm at iastate.edu (Arun Seetharam)
Date: Fri, 1 Jan 2016 12:05:35 -0600
Subject: [maker-devel] running MAKER to merge annotations

Hi all,

First of all, a very happy new year to all of you! I hope everyone is having a great holiday season.

I have a question about Maker. For my grass species, I ran 3 separate rounds of MAKER (low GC, regular GC and high GC) and I now have 3 maker gff files. So, what is the correct way to merge these files into a single gff file? Do I have to run a maker round with just the GFF files as input? It looks like EVM is especially meant to do this kind of job, but I am not sure if Maker does this too.

Thanks for any help or suggestions!

Have a nice day,
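Carson's reply in this thread notes that separate MAKER runs over the same contigs can reuse feature IDs, which is why he recommends the -l legacy option of gff3_merge before a rerun. As a rough illustration only, the Python sketch below shows how such collisions could be spotted ahead of merging. It assumes tab-separated GFF3 with an `ID=` attribute in column 9; the function names and the sample records are made up for the example and are not part of MAKER.

```python
import re
from collections import defaultdict

# Matches the ID attribute at the start of column 9 or after a ';'.
ID_RE = re.compile(r"(?:^|;)ID=([^;]+)")


def feature_ids(gff_text):
    """Yield the ID attribute of every 9-column feature line in GFF3 text."""
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip anything that is not a feature line (e.g. FASTA)
        match = ID_RE.search(cols[8])
        if match:
            yield match.group(1)


def colliding_ids(runs):
    """Map each ID that occurs in more than one run to the sorted run names."""
    seen = defaultdict(set)
    for run_name, gff_text in runs.items():
        for fid in feature_ids(gff_text):
            seen[fid].add(run_name)
    return {fid: sorted(names) for fid, names in seen.items() if len(names) > 1}


# Hypothetical records standing in for two of the three GC-specific runs.
low_gc = "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=maker-chr1-snap-gene-0.1;Name=x\n"
high_gc = (
    "chr1\tmaker\tgene\t120\t950\t.\t+\t.\tID=maker-chr1-snap-gene-0.1;Name=y\n"
    "chr1\tmaker\tgene\t2000\t2500\t.\t-\t.\tID=maker-chr1-snap-gene-0.2\n"
)

collisions = colliding_ids({"low_gc": low_gc, "high_gc": high_gc})
print(collisions)
```

Any ID reported here exists in more than one run, so a plain concatenation of the files would not be valid GFF3. The check only detects the problem; it does not rename anything.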
From carsonhh at gmail.com Mon Jan 4 10:04:29 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Jan 2016 09:04:29 -0700
Subject: [maker-devel] Error with maker_functional_gff
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>

Perhaps the easiest way to look at this is if you send us the files. I'm still leaning towards a format error. But it's the kind of thing where I would need the files to find the specific entry.

--Carson

> On Dec 16, 2015, at 11:32 PM, Ole Kristian Tørresen wrote:
>
> Here are the hits for GAMO_00029233:
> >sp|Q9SUR9|SGT1A_ARATH Protein SGT1 homolog A OS=Arabidopsis thaliana GN=SGT1A PE=1 SV=1
> >sp|Q9SUT5|SGT1B_ARATH Protein SGT1 homolog B OS=Arabidopsis thaliana GN=SGT1B PE=1 SV=1
> >sp|Q2KIK0|SGT1_BOVIN Protein SGT1 homolog OS=Bos taurus GN=SUGT1 PE=2 SV=1
> >sp|Q55ED0|SGT1_DICDI Protein SGT1 homolog OS=Dictyostelium discoideum GN=sugt1 PE=2 SV=1
> >sp|Q9Y2Z0|SGT1_HUMAN Protein SGT1 homolog OS=Homo sapiens GN=SUGT1 PE=1 SV=3
> >sp|Q9CX34|SGT1_MOUSE Protein SGT1 homolog OS=Mus musculus GN=Sugt1 PE=1 SV=3
> >sp|Q0JL44|SGT1_ORYSJ Protein SGT1 homolog OS=Oryza sativa subsp. japonica GN=SGT1 PE=1 SV=1
> >sp|B0BN85|SGT1_RAT Protein SGT1 homolog OS=Rattus norvegicus GN=Sugt1 PE=2 SV=1
>
> The BOVIN entry is the first hit. I can't really see anything different about it.
>
> I don't know Perl that well. Do you have some code which I can use to debug this? In line 58 it tries to access the blast hash with the ID as a key, if I understand this correctly. Either the hash is empty where the key tries to access it, or the key is empty. If I could print each ID as it is found, maybe I can find a pattern. And/or print each blast entry when the blast hash is created.
>
> Thank you.
>
> Ole
>
> On 16 December 2015 at 21:55, Carson Holt wrote:
> Find the hit for GAMO_00029233 and then pull its header line out of the Uniprot fasta file. There may be an unexpected formatting difference in that header.
> > ?Carson > > > >> On Dec 16, 2015, at 1:53 PM, Ole Kristian T?rresen > wrote: >> >> Daniel, >> this is the previous gene, before maker_functional_gff: >> LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325; >> LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45; >> LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; >> LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; >> LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; >> LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; >> LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; >> LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; >> LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; >> LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; >> LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13654085 13654164 . 
- 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; >> LG08 maker mRNA 13786695 13806565 . - . ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47; >> >> After : >> LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus); >> LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus); >> LG08 maker exon 13648888 13648944 . - . 
ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; >> LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; >> LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; >> LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; >> LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; >> LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; >> LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; >> LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; >> LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13652270 13652365 . 
- 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> >> Carson, I saw that, but I did use Uniprot/Swiss-prot. A snap of the blast-output used as input here: >> GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 81 348 33 307 2e-92 285 >> GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 76 347 33 308 4e-92 284 >> GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 44 351 13 316 2e-86 270 >> GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 44 351 13 316 3e-86 269 >> GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 44 351 13 316 5e-84 264 >> GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 44 351 13 316 8e-83 261 >> GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 60 351 31 317 1e-80 255 >> GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 32 351 39 371 6e-69 226 >> GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 29 351 46 366 8e-66 218 >> GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 53 351 34 345 2e-59 201 >> GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 34 351 20 348 2e-59 201 >> GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 142 351 27 237 3e-24 103 >> GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 113 337 1 222 5e-22 97.1 >> GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 5 268 17 310 5e-89 275 >> GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 5 268 16 308 5e-86 268 >> GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 5 268 16 308 8e-86 267 >> GAMO_00029233-RA 
sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 5 268 16 337 1e-80 254
>> GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 10 268 16 337 5e-36 137
>> GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 9 268 11 328 3e-35 135
>> GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 24 268 26 320 7e-35 134
>> GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 138 268 196 357 5e-25 107
>>
>> 521 genes had a function added before maker_functional_gff choked on one particular gene, GAMO_00029233.
>>
>> Thank you.
>>
>> Ole
>>
>> On 16 December 2015 at 20:37, Carson Holt wrote:
>> I've seen this exact same error before (https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ).
>>
>> It is caused by the ID from the blast report and input protein fasta. maker_functional_gff is not a generic script that can work on any input; it only works on blast results against Uniprot/Swiss-prot. The script is expecting a very specific header format in both the report and the protein fasta, and if it doesn't see it, then it is missing certain pieces of needed information.
>>
>> Thanks,
>> Carson
>>
>>> On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote:
>>>
>>> Hi Ole, can you send a line for a gene feature that does work?
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>>
>>>> On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen wrote:
>>>>
>>>> Hi,
>>>> I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error:
>>>> Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947.
>>>> Line 108947 in the input gff is this:
>>>>
>>>> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
>>>> It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines.
>>>>
>>>> Any help tracking this down is really appreciated. Do you need any other information?
>>>>
>>>> Thank you.
>>>>
>>>> Sincerely,
>>>>
>>>> Ole Kristian Tørresen

From ole.toerresen at gmail.com Mon Jan 4 13:08:43 2016
From: ole.toerresen at gmail.com (Ole Kristian Tørresen)
Date: Mon, 4 Jan 2016 20:08:43 +0100
Subject: [maker-devel] Error with maker_functional_gff
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>

I found the mistake: I used different versions of SwissProt/UniProt for BLASTing and as an option for maker_functional_gff. When I changed to the same version, the error went away. Sad to say, but things like different versions of SwissProt/UniProt do accumulate a bit over time...

Thank you.

Ole

On 4 January 2016 at 17:04, Carson Holt wrote:
> Perhaps the easiest way to look at this is if you send us the files. I'm still leaning towards a format error. But it's the kind of thing where I would need the files to find the specific entry.
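The cause Ole reports (a BLAST report made against one SwissProt/UniProt release, and a FASTA from another release passed to maker_functional_gff) can be checked before running the script. The Python sketch below is one possible sanity check, not part of MAKER. It assumes tabular BLAST output with the subject ID in column 2, and it assumes that the mismatch shows up as a subject ID with no header in the FASTA, which the thread does not confirm. The function names are invented for the example; the two BLAST rows are taken from the thread, while the truncated FASTA and its sequence are placeholders.

```python
def fasta_ids(fasta_text):
    """Return the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # ignore empty header lines
                ids.add(fields[0])
    return ids


def missing_subjects(blast_text, fasta_text):
    """Map each BLAST subject ID absent from the FASTA to the queries hitting it."""
    known = fasta_ids(fasta_text)
    missing = {}
    for line in blast_text.splitlines():
        if line.startswith("#"):
            continue  # skip comment lines from commented tabular output
        cols = line.split()
        if len(cols) < 2:
            continue
        query, subject = cols[0], cols[1]
        if subject not in known:
            missing.setdefault(subject, []).append(query)
    return missing


# Placeholder FASTA that lacks the SGT1_BOVIN entry (hypothetical scenario).
fasta = ">sp|Q8BJZ3|LFG3_MOUSE Protein lifeguard 3 OS=Mus musculus GN=Tmbim1\nXXXX\n"

# Two rows copied from the BLAST output quoted in the thread.
blast = (
    "GAMO_00029212-RA\tsp|Q8BJZ3|LFG3_MOUSE\t53.93\t280\t112\t3\t81\t348\t33\t307\t2e-92\t285\n"
    "GAMO_00029233-RA\tsp|Q2KIK0|SGT1_BOVIN\t53.18\t299\t100\t3\t5\t268\t17\t310\t5e-89\t275\n"
)

missing = missing_subjects(blast, fasta)
print(missing)
```

An empty result means every subject in the report has a header in the FASTA. A non-empty result points to the queries whose annotation lookup may fail, which is cheaper to find this way than from the Perl error message.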
- 0 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13651319 13651468 . - 0 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13649816 13651318 . - >> . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13649295 13649577 . - >> . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13648888 13648944 . - >> . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> >> Carson, I saw that, but I did use Uniprot/Swiss-prot. A snap of the >> blast-output used as input here: >> GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 >> 81 348 33 307 2e-92 285 >> GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 >> 76 347 33 308 4e-92 284 >> GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 >> 44 351 13 316 2e-86 270 >> GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 >> 44 351 13 316 3e-86 269 >> GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 >> 44 351 13 316 5e-84 264 >> GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 >> 44 351 13 316 8e-83 261 >> GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 >> 60 351 31 317 1e-80 255 >> GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 >> 32 351 39 371 6e-69 226 >> GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 >> 29 351 46 366 8e-66 218 >> GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 >> 53 351 34 345 2e-59 201 >> GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 >> 11 34 351 20 348 2e-59 201 >> GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 >> 142 351 27 237 3e-24 103 >> GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 >> 113 337 1 222 5e-22 97.1 >> GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 >> 5 268 17 310 5e-89 275 >> GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 >> 5 268 16 308 5e-86 268 >> GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 >> 5 268 16 308 8e-86 267 >> GAMO_00029233-RA sp|Q9Y2Z0|SGT1_HUMAN 46.83 
331 100 5 >> 5 268 16 337 1e-80 254 >> GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 >> 10 268 16 337 5e-36 137 >> GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 >> 9 268 11 328 3e-35 135 >> GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 >> 24 268 26 320 7e-35 134 >> GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 >> 138 268 196 357 5e-25 107 >> >> 521 genes have had added function before maker_functional_gff choked >> particular gene GAMO_00029233. >> >> Thank you. >> >> Ole >> >> >> On 16 December 2015 at 20:37, Carson Holt wrote: >> >>> I?ve seen this exact same error before ( >>> https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ >>> ). >>> >>> It is caused by the ID from the blast report and input protein >>> fasta. maker_functional_gff is not a generic script that can work on any >>> input, it only works on blast results against Uniprot/Swiss-prot. The >>> script is expecting a very specific header format in both the report and >>> the protein fasta and if it doesn?t see it, then it is missing certain >>> pieces of needed information. >>> >>> Thanks, >>> Carson >>> >>> On Dec 16, 2015, at 12:27 PM, Daniel Ence >>> wrote: >>> >>> Hi Ole, can you send a line for a gene feature that does work? >>> >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> >>> On Dec 14, 2015, at 12:21 PM, Ole Kristian T?rresen < >>> ole.toerresen at gmail.com> wrote: >>> >>> Hi, >>> I'm trying to update my annotation with some functional annotations >>> with maker_functional_gff, but get this annoying error: >>> Can't use string ("") as a HASH ref while "strict refs" in use at >>> /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, >>> <$IN> line 108947. 
>>> >>> Line 108947 in the input gff is this: >>> >>> LG08 maker gene 13786695 13806565 . - >>> . >>> ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; >>> >>> It seems like the regexp in line 55 in the maker_functional_gff script >>> doesn't pick up the ID, but I can't see any difference between that line >>> and other similar lines. >>> >>> Any help to trace down this is really appreciated. Do you need any other >>> information? >>> >>> Thank you. >>> >>> Sincerely, >>> >>> Ole Kristian T?rresen >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.shaw at abdn.ac.uk Fri Jan 8 09:05:16 2016 From: s.shaw at abdn.ac.uk (Shaw, Sophie) Date: Fri, 8 Jan 2016 15:05:16 +0000 Subject: [maker-devel] Moving Annotation to New Assembly Message-ID: Dear Maker Team, I have reassembled some data that was previously assembled with different software and then annotated using MAKER. I want to transfer the MAKER annotation to the new fasta file. I've followed the instructions in the post here - https://groups.google.com/forum/#!searchin/maker-devel/est_forward/maker-devel/q9fxXGKO8mk/0ATwhJvZeI4J However all of the information in the final column of the GFF has not been transferred over, just the gene name. For example: The original annotation is as follows: scaffold_252 maker gene 3018 4307 . + . 
ID=CAUR_05562;Name=CAUR_05562;Alias=augustus_masked-scaffold_252-processed-gene-0.0;Note=Similar to VHS1: Serine/threonine-protein kinase VHS1 (Saccharomyces cerevisiae (strain ATCC 204508 / S288c));Dbxref=Gene3D:G3DSA:1.10.510.10,Gene3D:G3DSA:3.30.200.20,InterPro:IPR000719,InterPro:IPR002290,InterPro:IPR008271,InterPro:IPR011009,InterPro:IPR017441,PANTHER:PTHR24343,PANTHER:PTHR24343:SF90,Pfam:PF00069,ProSitePatterns:PS00107,ProSitePatterns:PS00108,ProSiteProfiles:PS50011,SMART:SM00220,SUPERFAMILY:SSF56112;Ontology_term=GO:0004672,GO:0005524,GO:0006468,GO:0016772; And the new annotation after running MAKER with est_forward=1: scaffold_21 maker gene 18116 19405 . - . ID=maker-scaffold_21-exonerate_est2genome-gene-0.25;Name=CAUR_05562-RA-gene Is there a way of pulling the Note part of the gff file over as well as the gene name (and is this even a correct thing to do - should I be re-running MAKER entirely?). The researchers don't want to lose the information gained from the work on the previous annotation. All the Best, Sophie Shaw - Dr. Sophie Shaw Bioinformatician Centre for Genome Enabled Biology and Medicine University of Aberdeen 23 St. Machar Drive Old Aberdeen AB24 3RY https://www.abdn.ac.uk/genomics/ The University of Aberdeen is a charity registered in Scotland, No SC013683. Tha Oilthigh Obar Dheathain na charthannas cl?raichte ann an Alba, ?ir. SC013683. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hcma at uci.edu Mon Jan 11 18:21:11 2016 From: hcma at uci.edu (hcma) Date: Mon, 11 Jan 2016 16:21:11 -0800 Subject: [maker-devel] basic question for MAKER Message-ID: <2ed9dc6119cdaa218cf453b8390d28e8@uci.edu> Hi, I have some basic questions regarding how to use MAKER. Do I have to download the following file myself? 
Repeatmasker.gff file, genome sequence, protein, EST. I would like to incorporate my RNA-seq data. I have a transcriptome assembly generated using Trinity; how do I incorporate this, and can I use MAKER or do I have to use MAKER2? Thanks for your time and any comments will be greatly appreciated. Best Regards Karen From hcma at uci.edu Wed Jan 13 12:09:14 2016 From: hcma at uci.edu (hcma) Date: Wed, 13 Jan 2016 10:09:14 -0800 Subject: [maker-devel] basic question on maker Message-ID: Hi, I would like to include a de novo assembled transcriptome assembly for running MAKER. The organism I am working with is fly, and I am wondering what the best way to do this is. Do I need to get the input files for running RepeatMasker or just set: model_org=all What's the best protein sequence file to use? Is 'uniprot_sprot.fasta' OK? Some people use a Trinity transcriptome assembly to generate a training set for Augustus and then run MAKER again; is this better than running MAKER just once? Thanks for your time and any comments will be greatly appreciated. Best Regards Karen From carsonhh at gmail.com Thu Jan 14 14:01:00 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 14 Jan 2016 13:01:00 -0700 Subject: [maker-devel] basic question on maker In-Reply-To: References: Message-ID: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Hi Karen, All your questions may be best answered by this tutorial on the MAKER wiki -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014 There is also a video link on the wiki page if you want to follow that. Thanks, Carson > On Jan 13, 2016, at 11:09 AM, hcma wrote: > > Hi, > > I would like to include a de novo assembled transcriptome assembly for running maker. The organism i am working with is fly and I am wondering what is the best way to do this? > > Do I need to get the input files for running Repeatmasker or just set: > > model_org=all > > What's the best protein sequence file to use?
> > is ' uniprot_sprot.fasta' ok? > > > Some people use Trinity transcriptome assembly to generate a train set for Augustus and then run maker again, is this a better way than running maker just once? > > > Thanks for your time and any comments will be greatly appreciated. > > Best Regards > Karen > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu Jan 14 14:35:10 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 14 Jan 2016 13:35:10 -0700 Subject: [maker-devel] Moving Annotation to New Assembly In-Reply-To: References: Message-ID: <7418369D-6EDB-4C61-B3F7-CF5FFF797FA2@gmail.com> We do not have a tool that will copy attributes from one GFF3 file to another based on an ID match. Your needs are specific enough that you may have to write a script yourself to copy the attributes you care about. Truthfully, I would recommend rerunning interproscan and blastp against Swiss-Prot, as these could probably use an update anyway. The est_forward option used to pull IDs forward is based solely on alignment (they will not all be exact matches or complete matches, just best matches), so you cannot guarantee that all domain content will be completely identical. InterPro and Swiss-Prot also get updated periodically, so running against the most recent releases can give more functional info. The purist in me would be inclined to redo the interproscan analysis and blastp against Swiss-Prot. Then you can use the maker_functional_gff, ipr_update_gff, and iprscan2gff3 scripts to properly add everything back in a way similar to the previous annotations.
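For anyone who does want to write the attribute-copying script mentioned above, here is a minimal sketch in Python. It is not a MAKER tool. It assumes that the forwarded gene Name is the old gene ID plus a suffix such as "-RA-gene" (as in the CAUR_05562 example from the original question), that only Note, Dbxref and Ontology_term are worth carrying over, and that the inputs are tab-delimited GFF3. The function names are hypothetical. Because est_forward matches are best alignments rather than exact matches, the copied attributes should be treated as provisional.

```python
import re

# Attributes to carry over from the old annotation (an assumption; adjust as needed).
KEEP = ("Note", "Dbxref", "Ontology_term")


def parse_attrs(col9):
    """Split a GFF3 attribute column into a dict of key -> value (order preserved)."""
    attrs = {}
    for field in col9.strip().strip(";").split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def old_gene_id(new_name):
    """Guess the old gene ID from a forwarded Name such as 'CAUR_05562-RA-gene'."""
    return re.sub(r"-R[A-Z]+-gene$", "", new_name)


def collect_old(old_lines):
    """Map each old gene ID to the attributes listed in KEEP."""
    table = {}
    for line in old_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = parse_attrs(cols[8])
        if "ID" in attrs:
            table[attrs["ID"]] = {k: attrs[k] for k in KEEP if k in attrs}
    return table


def transfer(old_lines, new_lines):
    """Yield the new GFF3 lines, appending old gene attributes where the Name matches."""
    table = collect_old(old_lines)
    for line in new_lines:
        cols = line.rstrip("\n").split("\t")
        if not line.startswith("#") and len(cols) == 9 and cols[2] == "gene":
            attrs = parse_attrs(cols[8])
            extra = table.get(old_gene_id(attrs.get("Name", "")), {})
            for key, value in extra.items():
                if key not in attrs:  # never overwrite what the new run produced
                    attrs[key] = value
            cols[8] = ";".join(f"{k}={v}" for k, v in attrs.items()) + ";"
        yield "\t".join(cols)
```

Usage would be along the lines of opening both files and writing each line from transfer(old_file, new_file) to a third file. Only gene features are touched here; mRNA-level Notes would need the same treatment keyed on the transcript ID.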
?Carson > On Jan 8, 2016, at 8:05 AM, Shaw, Sophie wrote: > > Dear Maker Team, > > I have reassembled some data that was previously assembled with different software and then annotated using MAKER. I want to transfer the MAKER annotation to the new fasta file. I?ve followed the instructions in the post here - https://groups.google.com/forum/#!searchin/maker-devel/est_forward/maker-devel/q9fxXGKO8mk/0ATwhJvZeI4J > > However all of the information in the final column of the GFF has not been transferred over, just the gene name. For example: > > The original annotation is as follows: > scaffold_252 maker > gene 3018 > 4307 . > + . > ID=CAUR_05562;Name=CAUR_05562;Alias=augustus_masked-scaffold_252-processed-gene-0.0;Note=Similar to VHS1: Serine/threonine-protein kinase VHS1 (Saccharomyces cerevisiae (strain ATCC 204508 / S288c));Dbxref=Gene3D:G3DSA:1.10.510.10,Gene3D:G3DSA:3.30.200.20,InterPro:IPR000719,InterPro:IPR002290,InterPro:IPR008271,InterPro:IPR011009,InterPro:IPR017441,PANTHER:PTHR24343,PANTHER:PTHR24343:SF90,Pfam:PF00069,ProSitePatterns:PS00107,ProSitePatterns:PS00108,ProSiteProfiles:PS50011,SMART:SM00220,SUPERFAMILY:SSF56112;Ontology_term=GO:0004672,GO:0005524,GO:0006468,GO:0016772; > > And the new annotation after running MAKER with est_forward=1: > scaffold_21 maker > gene 18116 > 19405 . > - . > ID=maker-scaffold_21-exonerate_est2genome-gene-0.25;Name=CAUR_05562-RA-gene > > Is there a way of pulling the Note part of the gff file over as well as the gene name (and is this even a correct thing to do - should I be re-running MAKER entirely?). The researchers don?t want to lose the information gained from the work on the previous annotation. > > All the Best, > > Sophie Shaw > > ? > Dr. Sophie Shaw > Bioinformatician > Centre for Genome Enabled Biology and Medicine > University of Aberdeen > 23 St. Machar Drive > Old Aberdeen > AB24 3RY > https://www.abdn.ac.uk/genomics/ > > > > > The University of Aberdeen is a charity registered in Scotland, No SC013683. 
> Tha Oilthigh Obar Dheathain na charthannas cl?raichte ann an Alba, ?ir. SC013683. > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From hcma at uci.edu Thu Jan 14 17:44:39 2016 From: hcma at uci.edu (hcma) Date: Thu, 14 Jan 2016 15:44:39 -0800 Subject: [maker-devel] basic question on maker In-Reply-To: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Message-ID: Hi Carson, Thanks for the link. Can maker2 be run without inputting any protein sequences? How to turn this off in the control files? Also, can i run maker using Augustus and not SNAP? Again, how do i turn SNAP off? Does maker also predict non-coding genes? Thanks. Best Regards Karen On 2016-01-14 12:01, Carson Holt wrote: > Hi Karen, > > All your questions may be best answered from this tutorial on the > MAKER wiki ?> > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014 > [1] > > There is also a video link on the wiki page if you want to follow > that. > > Thanks, > Carson > >> On Jan 13, 2016, at 11:09 AM, hcma wrote: >> >> Hi, >> >> I would like to include a de novo assembled transcriptome assembly >> for running maker. The organism i am working with is fly and I am >> wondering what is the best way to do this? >> >> Do I need to get the input files for running Repeatmasker or just >> set: >> >> model_org=all >> >> What's the best protein sequence file to use? >> >> is ' uniprot_sprot.fasta' ok? >> >> Some people use Trinity transcriptome assembly to generate a train >> set for Augustus and then run maker again, is this a better way than >> running maker just once? >> >> Thanks for your time and any comments will be greatly appreciated. 
>> >> Best Regards >> Karen >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > Links: > ------ > [1] > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014 From carsonhh at gmail.com Fri Jan 15 11:16:27 2016 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 15 Jan 2016 10:16:27 -0700 Subject: [maker-devel] basic question on maker In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Message-ID: > Can maker2 be run without inputting any protein sequences? Yes. But it will not perform as well. > How to turn this off in the control files? Any option left blank is off. > Also, can i run maker using Augustus and not SNAP? Again, how do i turn SNAP off? Yes. Leave it blank. > Does maker also predict non-coding genes? You can run it with tRNAscan or snoscan. Snoscan requires you to have rRNAs from your organism to train with though. --Carson From hcma at uci.edu Fri Jan 15 16:39:25 2016 From: hcma at uci.edu (hcma) Date: Fri, 15 Jan 2016 14:39:25 -0800 Subject: [maker-devel] basic question on maker In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Message-ID: Hi Carlson, Regarding non-coding RNA predictions, MAKER only predicts tRNAs and rRNAs, but not other RNAs, for example, lncRNAs? Thanks again. Best Regards Karen On 2016-01-15 09:16, Carson Holt wrote: >> Can maker2 be run without inputting any protein sequences? > > Yes. But it will not perform as well. > >> How to turn this off in the control files? > > Any option left blank is off. > > >> Also, can i run maker using Augustus and not SNAP? Again, how do i >> turn SNAP off? > > Yes. Leave it blank. > > >> Does maker also predict non-coding genes? > > You can run it with tRNAscan or snoscan. Snoscan requires you to have > rRNAs from your organism to train with though.
> > --Carson From dence at genetics.utah.edu Fri Jan 15 16:51:44 2016 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 15 Jan 2016 22:51:44 +0000 Subject: [maker-devel] basic question on maker In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Message-ID: <1C311E8C-20F3-48DB-A982-925AEECD7636@genetics.utah.edu> Hi Karen, I don't know of a unified tool that predicts lncRNAs from genomic sequence. I found a tool that predicts lncRNAs from RNA-seq datasets, which you might be able to use for your project. I've never used it, but it might be a starting place. http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-311 Here's also a review that describes several workflows for annotating lncRNAs in insect genomes: http://www.sciencedirect.com/science/article/pii/S2214574515000061 Hope that helps, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On Jan 15, 2016, at 3:39 PM, hcma wrote: Hi Carlson, Regarding non-coding RNA predictions, MAKER only predicts tRNAs and rRNAs, but not other RNAs, for example, lncRNAs? Thanks again. Best Regards Karen On 2016-01-15 09:16, Carson Holt wrote: Can maker2 be run without inputting any protein sequences? Yes. But it will not perform as well. How to turn this off in the control files? Any option left blank is off. Also, can i run maker using Augustus and not SNAP? Again, how do i turn SNAP off? Yes. Leave it blank. Does maker also predict non-coding genes? You can run it with tRNAscan or snoscan. Snoscan requires you to have rRNAs from your organism to train with though. --Carson _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed...
URL: From michael.s.campbell1 at gmail.com Fri Jan 15 19:11:05 2016 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Fri, 15 Jan 2016 17:11:05 -0800 Subject: [maker-devel] basic question on maker In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Message-ID: <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> Hi Karen, Just a quick clarification: MAKER doesn't predict the rRNAs. If you give MAKER the rRNA sequence with the O-methylation sites, it will run snoscan to predict snoRNAs. Take care, Mike > On Jan 15, 2016, at 2:39 PM, hcma wrote: > > Hi Carlson, > > Regarding non-coding RNA predictions, MAKER only predicts tRNAs and rRNAs, but not other RNAs, for example, lncRNAs? > > Thanks again. > > Best Regards > Karen > > > > > On 2016-01-15 09:16, Carson Holt wrote: >>> Can maker2 be run without inputting any protein sequences? >> Yes. But it will not perform as well. >>> How to turn this off in the control files? >> Any option left blank is off. >>> Also, can i run maker using Augustus and not SNAP? Again, how do i turn SNAP off? >> Yes. Leave it blank. >>> Does maker also predict non-coding genes? >> You can run it with tRNAscan or snoscan. Snoscan requires you to have >> rRNAs from your organism to train with though. >> --Carson > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From nguyenan at mail.nih.gov Tue Jan 19 14:18:36 2016 From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C]) Date: Tue, 19 Jan 2016 20:18:36 +0000 Subject: [maker-devel] MAKER version 3 beta Message-ID: Hello, I just wanted to know if MAKER version 3 beta (EVM integration) is already available for download? https://groups.google.com/forum/#!searchin/maker-devel/EVM|sort:date/maker-devel/YzsN-t0gu0U/-A_7YT2gFwAJ Thank you very much!
Anh-Dao From carsonhh at gmail.com Tue Jan 19 14:23:54 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 19 Jan 2016 13:23:54 -0700 Subject: [maker-devel] MAKER version 3 beta In-Reply-To: References: Message-ID: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com> Yes. Go to the registration page for the standard MAKER download. After registering, you will be redirected to a page with links to both the current version of MAKER as well as the beta ?> http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi ?Carson > On Jan 19, 2016, at 1:18 PM, Nguyen, Anh-Dao (NIH/NHGRI) [C] wrote: > > Hello, > > I just wanted to know if MAKER version 3 beta (EVM integration) has > already been available for downloading? > > https://groups.google.com/forum/#!searchin/maker-devel/EVM|sort:date/maker- > devel/YzsN-t0gu0U/-A_7YT2gFwAJ > > Thank you very much! > Anh-Dao > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From macmanes at gmail.com Tue Jan 19 14:34:38 2016 From: macmanes at gmail.com (Matthew MacManes) Date: Tue, 19 Jan 2016 15:34:38 -0500 Subject: [maker-devel] MAKER version 3 beta In-Reply-To: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com> References: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com> Message-ID: Just checking, when installing from the beta, I still see ./maker -?version 2.32 was expecting 3.00.. Thanks, Matt ______________________________________________ Matthew MacManes, Ph.D. University of New Hampshire? I? Assistant Professor of Genome Enabled Biology Department of Molecular, Cellular, & Biomedical Sciences Durham, NH? 03824 Phone: 603-862-4052? 
| ?Twitter:?@macmanes??| Web:?genomebio.org Office: 189 Rudman Hall | Laboratory: 145 Rudman Hall On January 19, 2016 at 3:24:16 PM, Carson Holt (carsonhh at gmail.com) wrote: Yes. ?Go to the registration page for the standard MAKER download. After registering, you will be redirected to a page with links to both the current version of MAKER as well as the beta ?>?http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi ?Carson On Jan 19, 2016, at 1:18 PM, Nguyen, Anh-Dao (NIH/NHGRI) [C] wrote: Hello, I just wanted to know if MAKER version 3 beta (EVM integration) has already been available for downloading? https://groups.google.com/forum/#!searchin/maker-devel/EVM|sort:date/maker- devel/YzsN-t0gu0U/-A_7YT2gFwAJ Thank you very much! Anh-Dao _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Jan 19 14:35:55 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 19 Jan 2016 13:35:55 -0700 Subject: [maker-devel] MAKER version 3 beta In-Reply-To: References: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com> Message-ID: Thanks. I?ll fix that. ?Carson > On Jan 19, 2016, at 1:34 PM, Matthew MacManes wrote: > > Just checking, when installing from the beta, I still see > > ./maker -?version > 2.32 > was expecting 3.00.. > > Thanks, Matt > > > > > ______________________________________________ > Matthew MacManes, Ph.D. > University of New Hampshire I Assistant Professor of Genome Enabled Biology > Department of Molecular, Cellular, & Biomedical Sciences > Durham, NH 03824 > Phone: 603-862-4052 | Twitter: @macmanes? 
| Web: genomebio.org > Office: 189 Rudman Hall | Laboratory: 145 Rudman Hall > > On January 19, 2016 at 3:24:16 PM, Carson Holt (carsonhh at gmail.com ) wrote: > >> Yes. Go to the registration page for the standard MAKER download. After registering, you will be redirected to a page with links to both the current version of MAKER as well as the beta ?> http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi >> >> ?Carson >> >> >> >>> On Jan 19, 2016, at 1:18 PM, Nguyen, Anh-Dao (NIH/NHGRI) [C] > wrote: >>> >>> Hello, >>> >>> I just wanted to know if MAKER version 3 beta (EVM integration) has >>> already been available for downloading? >>> >>> https://groups.google.com/forum/#!searchin/maker-devel/EVM|sort:date/maker- >>> devel/YzsN-t0gu0U/-A_7YT2gFwAJ >>> >>> Thank you very much! >>> Anh-Dao >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daren.card at gmail.com Wed Jan 20 09:27:28 2016 From: daren.card at gmail.com (Daren C. Card) Date: Wed, 20 Jan 2016 09:27:28 -0600 Subject: [maker-devel] Passing pre-masked repeats into Maker Message-ID: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com> Hello all, I?m about to use Maker to begin annotating a vertebrate genome. We use successive rounds of RepeatMasker to annotate repeats due to some library issues we?ve noticed with Repbase (at least in our critters) and to incorporate de novo repeats from RepeatModeler, a process I don?t think Maker could match. I?m wonder what the best way to pass these annotations into Maker would be. 
I see the thread at https://groups.google.com/forum/#!topic/maker-devel/7UbOIvwaaRM nicely outlines what Maker does with repeats, and it looks like I have 3 options: (1) reannotate in Maker, (2) pass in a RepeatMasker GFF, or (3) pass in a masked genome. #1 is problematic due to the reasons above. #2 looks like it would hard mask the complex repeats like we want, but will also hard mask the simple repeats, which wouldn't be ideal for evidence mapping from transcripts/proteins. #3 is cautioned against in the link above, and without an accompanying GFF, I would imagine that Maker wouldn't be able to release the masking to perform Exonerate polishing (Ns could be gaps or could be hard masking; it wouldn't know). The way I thought to get around these apparent issues (but let me know if my thinking is incorrect) is to separate simple and complex repeats from the final RepeatMasker GFF. Feed only the complex repeats into Maker as a GFF, so that they are hard masked and accounted for, and have Maker also run RepeatMasker, thus re-masking the simple repeats (and maybe some other complex hits, primarily through RepeatRunner). Then Maker can presumably release the masking as needed. Would this type of workaround be a good idea or are there other options? Or am I just overthinking something that isn't really a problem? Thanks in advance for any help. Daren Daren Card Castoe Lab University of Texas at Arlington www.darencard.net From carsonhh at gmail.com Wed Jan 20 10:20:51 2016 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 20 Jan 2016 09:20:51 -0700 Subject: [maker-devel] Passing pre-masked repeats into Maker In-Reply-To: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com> References: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com> Message-ID: <6FD7CE4B-B944-4793-A822-9D395725ED6D@gmail.com> The strategy outlined would work.
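The splitting step in that strategy (separating simple from complex repeats in a RepeatMasker GFF) might be sketched in Python as below. This is only an illustration under stated assumptions, not something shipped with MAKER or RepeatMasker: it assumes the repeat name sits in a Target attribute (either GFF2 style, Target "Motif:(CA)n" 1 20, or GFF3 style, Target=(CA)n 1 20), and that simple or low-complexity repeats can be recognised by names such as "(CA)n", "AT_rich" or "GA-rich". Both the function names and the naming heuristic are hypothetical and should be checked against your own RepeatMasker output.

```python
import re

# Assumed naming conventions for simple / low-complexity repeats,
# e.g. "(CA)n", "AT_rich", "GA-rich". Check against your own output.
SIMPLE_NAME = re.compile(r"^\([ACGTN]+\)n$|[_-]rich$", re.IGNORECASE)


def repeat_name(col9):
    """Pull the repeat name out of a Target attribute (GFF2 or GFF3 style)."""
    match = re.search(r'Target[= ]"?(?:Motif:)?([^\s";]+)', col9)
    return match.group(1) if match else ""


def split_repeats(lines):
    """Return (simple, complex) lists of GFF feature lines; comments are skipped."""
    simple, complex_ = [], []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        if SIMPLE_NAME.search(repeat_name(cols[8])):
            simple.append(line)
        else:
            complex_.append(line)
    return simple, complex_
```

The complex list is what would be written out and handed to MAKER as the external repeat GFF; the simple list is dropped, on the expectation that MAKER's own RepeatMasker run recovers those regions.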
To get RepeatMasker to call only simple repeats in MAKER, set model_org=simple in the control files.

--Carson
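
The split Daren describes (complex repeats to a GFF for MAKER, simple repeats left for MAKER's own RepeatMasker pass) could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes repeat names appear as `Motif:<name>` in column 9, as in RepeatMasker's `-gff` output, and that simple and low-complexity hits can be recognized by names such as `(CA)n`, `AT_rich` or `GA-rich`. The function name `split_repeats` is hypothetical, and the naming pattern should be checked against your own RepeatMasker output.

```python
import re

# Assumption: column 9 looks like 'Target "Motif:(CA)n" 1 100'.
# Simple repeats are named like '(CA)n'; low-complexity hits like
# 'AT_rich' or 'GA-rich'. Adjust the pattern to your own files.
SIMPLE = re.compile(r"Motif:(\([ACGTN]+\)n|[ACGTN]+[-_]rich)\b")


def split_repeats(lines):
    """Return (simple, complex_) lists of GFF feature lines."""
    simple, complex_ = [], []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a 9-column feature line
        if SIMPLE.search(cols[8]):
            simple.append(line)
        else:
            complex_.append(line)
    return simple, complex_
```

The complex-only lines would then be written to the file handed to MAKER as the external repeat GFF (the `rm_gff` entry in maker_opts.ctl, which expects GFF3, so a format conversion may be needed first), with `model_org=simple` set as Carson suggests.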

From dence at genetics.utah.edu Wed Jan 20 10:21:38 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 20 Jan 2016 16:21:38 +0000
Subject: [maker-devel] Passing pre-masked repeats into Maker
In-Reply-To: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com>
References: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com>
Message-ID:

Hi Daren,

I think the solution you described sounds appropriate. If you're concerned about how the simple repeats will be handled by maker in the gff, then you can just take those out. If they're important for downstream analysis, you can add them back in then. Let me know if that helps or if other issues arise.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From jcornel3 at asu.edu Fri Jan 22 15:38:14 2016
From: jcornel3 at asu.edu (John Cornelius)
Date: Fri, 22 Jan 2016 14:38:14 -0700
Subject: [maker-devel] Question on post processing of annotations
Message-ID:

Hi, I'm using the maker_map_ids script to change the gene ids on an annotation that I just finished. However, I noticed that it does not change the name of genes predicted by SNAP. Is there any way to include SNAP genes for consideration by maker_map_ids? Thanks.
--
John Cornelius
MCB PhD Candidate
Arizona State University

From carsonhh at gmail.com Fri Jan 22 16:01:29 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 22 Jan 2016 15:01:29 -0700
Subject: [maker-devel] Question on post processing of annotations
In-Reply-To:
References:
Message-ID:

Hi John,

Do you mean the match/match_part features with source snap_masked? Those are not genes, they are reference alignments representing the ab initio SNAP calls, and it would be incorrect to rename them. They do not have a 1 to 1 relationship with the final gene models. Sometimes a gene model will overlap 2 or more uninformed SNAP ab initio reference alignments, or one SNAP reference alignment may overlap multiple final gene models, so names cannot just be passed from one to the other.

If you want to add specific SNAP models to the final annotation set, you would need to upgrade them to being a gene/mRNA/exon/CDS feature before you can do that. You can do that with manual editors like Apollo, or you can supply a subset of the features you want to upgrade to maker in the pred_gff= option as a separate run, put existing models in model_gff=, and run with keep_preds=1.

I know I have covered this previously in greater detail as part of the devel list. If you search the archives for the keywords pred_gff, keep_preds, and iprscan you should come across a number of threads that may be helpful --> https://groups.google.com/forum/#!forum/maker-devel

Thanks,
Carson
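
Pulling out "a subset of the features you want to upgrade" for pred_gff= might look something like the sketch below. It is an illustration only, not part of MAKER: it assumes the SNAP reference alignments are 9-column GFF3 lines with source `snap_masked`, that each `match` line carries `ID` and `Name` attributes, and that each `match` line precedes its `match_part` children. The function names and the example feature names used with them are hypothetical.

```python
def parse_attrs(col9):
    """Parse 'key=value;key=value' GFF3 attributes into a dict."""
    return dict(f.split("=", 1) for f in col9.strip().split(";") if "=" in f)


def select_predictions(lines, wanted_names, source="snap_masked"):
    """Keep match features named in wanted_names plus their match_parts.

    Assumes each match line appears before its match_part children.
    """
    kept, kept_ids = [], set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[1] != source:
            continue  # comments, directives, other sources
        attrs = parse_attrs(cols[8])
        if cols[2] == "match" and attrs.get("Name") in wanted_names:
            kept_ids.add(attrs.get("ID"))
            kept.append(line)
        elif cols[2] == "match_part" and attrs.get("Parent") in kept_ids:
            kept.append(line)
    return kept
```

The selected lines would be written to a file named in pred_gff=, with the existing models supplied through model_gff= and keep_preds=1 set, as described in the message above.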

From jcornel3 at asu.edu Fri Jan 22 16:06:17 2016
From: jcornel3 at asu.edu (John Cornelius)
Date: Fri, 22 Jan 2016 15:06:17 -0700
Subject: [maker-devel] Question on post processing of annotations
In-Reply-To:
References:
Message-ID:

I'll look into that, thanks. Previously I had just been looking for things regarding the script itself and its functionality.

--
John Cornelius
MCB PhD Candidate
Arizona State University

From sapuizait at gmail.com Wed Jan 27 07:14:45 2016
From: sapuizait at gmail.com (Panos Sapou)
Date: Wed, 27 Jan 2016 14:14:45 +0100
Subject: [maker-devel] prokaryotic genome annotation
Message-ID:

Dear all

I recently started using maker for the annotation of my prokaryotic genomes, and even though I managed to get some nice results I would like to check with you whether what I did was right and also ask you a couple of questions about the procedure.

I also apologize in advance if I ask something silly, since I am a newbie in bioinformatics and I might ask very basic stuff.

I have only DNA sequences available; I have no ESTs and no proteins.

1) I started by using the protein2genome option and as reference I used the Uniref50 database.
Then I generated a merged gff file (following a procedure similar to the one in the maker tutorial).

2) I used Genemark.S and created a model with the gmsn.pl command, using the assembled contigs of my bacteria as input.

3) After finishing the above 2 steps I ran maker again, using the gff file from step 1 as input:
#-------Re-annotation using maker derived GFF3: maker_gff=input.gff
and I also set
protein_pass=1
Is that correct? Do you think it helps?
In the #-----gene prediction section I used the hmm.mod file generated in step 2.

My questions:

Does the above sound correct?

It is my understanding that I can only use genemark for prokaryotic genomes; is that correct?

When I run maker the second time (step 3), should I set protein2genome=1 or 0? Or is having the gff file (from step 1) in the re-annotation options enough, because prediction based on protein2genome has already been done?

Also, if I use a gff file (from step 1), will it make any difference if I set protein2genome=1 and use an extra (different) database? (I was wondering whether it would improve the results.)

Finally, regarding the choice of the database: would you advise me to use uniref or the proteomes of closely related bacteria? (I have downloaded approximately 100 proteomes of closely related bacteria and combined them into a single fasta.)

Thank you in advance, and once again I apologize if what I am asking is pretty basic; I just wanted to make sure.

Best
Panos

From carsonhh at gmail.com Wed Jan 27 09:17:37 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 27 Jan 2016 08:17:37 -0700
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To:
References:
Message-ID: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com>

Hi Panos,

The strategy for annotating prokaryotes is very different than that for eukaryotes.
Basically my recommendation is to use Genemark S and set protein2genome=1, keep_preds=1, always_complete=1; there is no need for ESTs (irrelevant in prokaryotes). No need to do multiple iterations like you would for eukaryotes either. The bootstrapping procedure is not relevant for prokaryotes. I'd also avoid using the GFF3 passthrough option; you will lose some information about the alignment that affects the reading frame of the protein evidence. It can be convenient for large eukaryotes when you are pulling evidence from a database, but if it's just from a previous maker run, you should just rerun in the same directory with the protein fasta. MAKER will detect that it already ran blastx and pull the raw reports from the previous datastore.

Thanks,
Carson
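
As a rough illustration of the settings recommended above, the following sketch rewrites `key=value #comment` lines in the text of a control file. The helper `set_ctl_options` is hypothetical and not part of MAKER; it assumes each option sits on its own line in that form, and it leaves any key it does not find untouched rather than adding it.

```python
import re


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with 'key=value' lines rewritten for keys in updates.

    Trailing '#...' comments are preserved. Keys that do not already
    appear in ctl_text are not added.
    """
    out = []
    for line in ctl_text.splitlines():
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in updates:
            comment = " " + m.group(3) if m.group(3) else ""
            line = f"{m.group(1)}={updates[m.group(1)]}{comment}"
        out.append(line)
    return "\n".join(out)


# Values taken from the recommendation above.
PROKARYOTE_SETTINGS = {
    "protein2genome": 1,
    "keep_preds": 1,
    "always_complete": 1,
}
```

Something like `set_ctl_options(open("maker_opts.ctl").read(), PROKARYOTE_SETTINGS)` would return the edited text. The GeneMark model and the protein FASTA would still need to be filled in separately.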

From cjfields at illinois.edu Wed Jan 27 12:30:29 2016
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 27 Jan 2016 18:30:29 +0000
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com>
References: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com>
Message-ID: <6B0EA45F-9526-4ED2-AED7-1DA3E1AEDD24@illinois.edu>

We're thinking of switching our bacterial pipeline to MAKER actually. We generally use other bacteria-specific gene prediction tools like Glimmer and Prodigal, though I anticipate these could be added using pred_gff (as long as the GFF3 is fine)?

chris

From carsonhh at gmail.com Wed Jan 27 16:42:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 27 Jan 2016 15:42:59 -0700
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To: <6B0EA45F-9526-4ED2-AED7-1DA3E1AEDD24@illinois.edu>
References: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com> <6B0EA45F-9526-4ED2-AED7-1DA3E1AEDD24@illinois.edu>
Message-ID:

GFF3 is just fine for adding predictions. For prokaryotes, I don't like to add protein evidence that way, but for predictions it's fine. The only issue I could see going forward would be a lack of support for alternate codon usage in MAKER right now. Everything is being interpreted using the canonical codon table. It's not an insurmountable issue, but it would take some work to let it do that.

--Carson

From sapuizait at gmail.com Fri Jan 29 04:12:35 2016
From: sapuizait at gmail.com (Panos Sapou)
Date: Fri, 29 Jan 2016 11:12:35 +0100
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To:
References:
Message-ID:

Dear all

I am trying to annotate a new Spiroplasma strain and I would like to know if there is a way to change the stop codons (not take 'tga' into account), because otherwise I get too many premature stop codons and fragmented genes that are not real.

Best
Panos

From carsonhh at gmail.com Sun Jan 31 14:43:21 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 31 Jan 2016 13:43:21 -0700
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To:
References:
Message-ID:

MAKER doesn't support alternate codon usage yet.
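
To see why the canonical table fragments Spiroplasma genes, here is a small standalone sketch (nothing to do with MAKER's internals) that finds the first in-frame stop codon under two stop sets. It relies on the fact that in NCBI translation table 4, used by Mycoplasma and Spiroplasma, TGA encodes tryptophan rather than a stop. The function name and the toy sequence are made up for illustration.

```python
STANDARD_STOPS = {"TAA", "TAG", "TGA"}  # canonical table (NCBI table 1)
TABLE4_STOPS = {"TAA", "TAG"}           # NCBI table 4: TGA encodes Trp


def first_stop(cds, stops):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in stops:
            return i // 3
    return None


# Toy CDS: ATG TGG TGA AAA TAA
toy = "ATGTGGTGAAAATAA"
print(first_stop(toy, STANDARD_STOPS))  # 2: read as a premature stop
print(first_stop(toy, TABLE4_STOPS))    # 4: the real stop at the end
```

Under the canonical table the internal TGA ends the reading frame early, which matches the premature stops and fragmented genes described in the question.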
?Carson > On Jan 29, 2016, at 3:12 AM, Panos Sapou wrote: > > Dear all > > I am trying to annotate a new spiroplasma strain and I would like to know if there is a way to change the stop codons (not take into account 'tga') > > cause eitherwise I get too many premature stop codons and fragmented genes that are not real > > Best > Panos > > On 27 January 2016 at 14:14, Panos Sapou > wrote: > Dear all > > I recently started using maker for the annotation of my prokaryotic genomes and even if i managed to get some nice results I would like to check with you if what I did was right and also ask you a couple of questions about the procedure > > I also apologize in advance if I ask sth silly since I am a newbie in bionformatics and I might ask very basic stuff > > > I have only available DNA sequences, I have no ESTs and no proteins > > 1) I started by using the protein2genome option and as reference I used the Uniref50 database. Then I generated a merged gff file (similar procedure like the one in the tutorial maker) > > 2) I used Genemark.S and I created a model by using the gmsn.pl command and as input the assembled contigs of my bacteria > > 3) after finishing the above 2 steps I run maker again by using as input the gff file from step 1: #-------Re-annotation using maker derived GFF3: maker_gff=input.gff > and I also set > protein_pass=1 > is that correct? do you think it helps? > and at the #-----gene prediction I used the hmm.mod file generated in step 2 > > my questions: > Do the above sound correct? > > it is in my understanding that I can only use genemark for prokaryotic genomes, is that correct? > > when I run maker the second time (step 3) should I set protein2genome=1 or 0? or just having the gff file (from step 1) in the re-annotation options is enough? and thefore prediction based on the protein2genome has already been done? 
> > Also if I use a gff file (from step 1) will it make any difference if I set protein2genome=1 and use an extra (different) database? (I was wondering if it will improve the results?) > > finally regarding the choice of the database: would you advise me to use uniref or the proteomes of closely related bacteria (I have downloaded and created a single fasta from appx 100 proteomes of closely related bacteria) > > thank you in advance > and once again I apologize if it is pretty basic what I am asking, just wanted to make sure... > > > Best > Panos > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnstrm at gmail.com Fri Jan 1 11:31:06 2016 From: arnstrm at gmail.com (Arun Seetharam) Date: Fri, 1 Jan 2016 12:31:06 -0600 Subject: [maker-devel] running MAKER to merge annotations Message-ID: Hi all, First of all, a very happy new year to all of you! I hope everyone is having a great holiday season. I have a question about Maker. For my grass species, I ran 3 separate rounds of MAKER (low GC, regular GC and high GC) and I now have 3 maker gff files. So, what is the correct way to merge these files to a single gff file? Do I have to run a maker round with just the GFF files as input? It looks like EVM especially meant to do this kind of job, but not sure if Maker does this too. Thanks for any help or suggestions! Have a nice day, -- Arun Seetharam Post Doctoral Research Associate Genome Informatics Facility & EEOB Office of Biotechnology 228 Science I Iowa State University Ames, Iowa 50011 -------------- next part -------------- An HTML attachment was scrubbed... 
From dence at genetics.utah.edu Fri Jan 1 11:37:38 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 1 Jan 2016 18:37:38 +0000
Subject: [maker-devel] running MAKER to merge annotations
In-Reply-To:
References:
Message-ID:

Hi Arun, are the three rounds of maker on different parts/versions of the genome or did you run maker on the same genome with three different settings? If it's the former, then you can merge the maker gff files with gff3_merge, which is included with your maker installation.

If it's the latter case then I do think EVM could help if you want to give the different result sets different confidence weights. If you want to give them all the same weight, then you could do another run of maker, and pass them through as either models or predictions.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From arnstrm at gmail.com Fri Jan 1 12:17:18 2016
From: arnstrm at gmail.com (Arun Seetharam)
Date: Fri, 1 Jan 2016 13:17:18 -0600
Subject: [maker-devel] running MAKER to merge annotations
In-Reply-To:
References:
Message-ID:

Hi Daniel,

Thanks very much for the reply! It is the latter: same input genome under 3 settings (training was done using a different set of genes for all the gene predictors). I simply want to get a single gff, retaining only the best model(s) for each locus.
Are you suggesting that I can run MAKER by providing 3 files for "maker_gff" (in maker_opts.ctl) and keeping everything else default? Or do I have to do something in the CTL file to achieve this? I would appreciate it if you could provide more details on how to do this!

Thanks once again for the reply!

--
Arun Seetharam
Post Doctoral Research Associate
Genome Informatics Facility & EEOB
Office of Biotechnology
228 Science I
Iowa State University
Ames, Iowa 50011

From carsonhh at gmail.com Fri Jan 1 12:26:14 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 1 Jan 2016 12:26:14 -0700
Subject: [maker-devel] running MAKER to merge annotations
In-Reply-To:
References:
Message-ID: <28388091-0956-412B-B472-0A272FA31269@gmail.com>

If you are running with different settings on the exact same contig, you will have to merge the models using the -l legacy option of gff3_merge to ensure there will be no ID collisions (some things will have the same IDs in the different runs).
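The ID collisions mentioned here can be looked for before merging. What follows is only a sketch and not part of MAKER: the function names are made up for illustration, and it assumes standard nine-column, tab-delimited GFF3 with an ID= attribute in the last column.

```python
import re
from collections import defaultdict

# Matches the ID attribute at the start of column 9 or after a ';'.
ID_PATTERN = re.compile(r"(?:^|;)ID=([^;]+)")


def collect_ids(gff_lines):
    """Return the set of ID values found in the feature lines of one GFF3 input."""
    ids = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue
        match = ID_PATTERN.search(columns[8])
        if match:
            ids.add(match.group(1))
    return ids


def colliding_ids(named_gffs):
    """Map every ID seen in more than one input to the sorted names of those inputs."""
    seen = defaultdict(set)
    for name, lines in named_gffs.items():
        for feature_id in collect_ids(lines):
            seen[feature_id].add(name)
    return {fid: sorted(names) for fid, names in seen.items() if len(names) > 1}
```

With real files, each value passed to colliding_ids would be an open file handle, for example {"low_gc": open("low_gc.gff"), "high_gc": open("high_gc.gff")} (hypothetical file names). A non-empty result suggests that a plain concatenation of the runs would produce ambiguous IDs.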
Then supply just the genes to pred_gff on the rerun.

Alternatively you could have just provided your different predictor files as a comma-separated list (i.e. snaphmm=hmm1,hmm2,hmm3). MAKER would have run each one and kept just the one that best matched the evidence. However, because MAKER passes hints to the predictors (which override the HMM for the most part), I have found that running with different predictor settings because of GC differences between contigs doesn't provide the benefit you would think.

--Carson
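One possible reading of "just the genes" is the rows MAKER wrote as final gene models, i.e. rows whose source column (column 2) is maker, as opposed to evidence alignments and raw predictions. That reading is an assumption, not something stated in the thread. Under it, a small filter could look like the sketch below; keep_gene_models is a hypothetical helper, not a MAKER tool.

```python
GENE_MODEL_SOURCE = "maker"  # assumption: final gene models carry this source tag


def keep_gene_models(gff_lines, source=GENE_MODEL_SOURCE):
    """Keep the GFF3 version pragma plus feature rows whose source column matches."""
    kept = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # drop any embedded sequence section
        if line.startswith("#"):
            if line.startswith("##gff-version"):
                kept.append(line)
            continue
        columns = line.split("\t")
        if len(columns) >= 9 and columns[1] == source:
            kept.append(line)
    return kept
```

The filtered rows could then be written to a new file and that file named in pred_gff in maker_opts.ctl, as suggested above.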
From zhangzb554 at nenu.edu.cn Sat Jan 2 09:09:24 2016
From: zhangzb554 at nenu.edu.cn (张志斌)
Date: Sun, 3 Jan 2016 00:09:24 +0800 (GMT+08:00)
Subject: [maker-devel] maker-devel Digest, Vol 92, Issue 1
In-Reply-To:
Message-ID:

Hi everyone,
I wonder where I can download the Perl package Proc::Signal? I cannot find it on CPAN. Could someone send me the package or point me to a website where I can get it?

Thanks for your help.

From arnstrm at iastate.edu Fri Jan 1 11:05:35 2016
From: arnstrm at iastate.edu (Arun Seetharam)
Date: Fri, 1 Jan 2016 12:05:35 -0600
Subject: [maker-devel] running MAKER to merge annotations
Message-ID:

Hi all,

First of all, a very happy new year to all of you! I hope everyone is having a great holiday season.

I have a question about Maker. For my grass species, I ran 3 separate rounds of MAKER (low GC, regular GC and high GC) and I now have 3 maker gff files. So, what is the correct way to merge these files to a single gff file? Do I have to run a maker round with just the GFF files as input? It looks like EVM is especially meant to do this kind of job, but I am not sure if Maker does this too.

Thanks for any help or suggestions!

Have a nice day,
From carsonhh at gmail.com Mon Jan 4 09:04:29 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Jan 2016 09:04:29 -0700
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To:
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
Message-ID:

Perhaps the easiest way to look at this is if you send us the files. I'm still leaning towards a format error. But it's the kind of thing where I would need the files to find the specific entry.

--Carson

> On Dec 16, 2015, at 11:32 PM, Ole Kristian Tørresen wrote:
>
> Here are the hits for GAMO_00029233:
> >sp|Q9SUR9|SGT1A_ARATH Protein SGT1 homolog A OS=Arabidopsis thaliana GN=SGT1A PE=1 SV=1
> >sp|Q9SUT5|SGT1B_ARATH Protein SGT1 homolog B OS=Arabidopsis thaliana GN=SGT1B PE=1 SV=1
> >sp|Q2KIK0|SGT1_BOVIN Protein SGT1 homolog OS=Bos taurus GN=SUGT1 PE=2 SV=1
> >sp|Q55ED0|SGT1_DICDI Protein SGT1 homolog OS=Dictyostelium discoideum GN=sugt1 PE=2 SV=1
> >sp|Q9Y2Z0|SGT1_HUMAN Protein SGT1 homolog OS=Homo sapiens GN=SUGT1 PE=1 SV=3
> >sp|Q9CX34|SGT1_MOUSE Protein SGT1 homolog OS=Mus musculus GN=Sugt1 PE=1 SV=3
> >sp|Q0JL44|SGT1_ORYSJ Protein SGT1 homolog OS=Oryza sativa subsp. japonica GN=SGT1 PE=1 SV=1
> >sp|B0BN85|SGT1_RAT Protein SGT1 homolog OS=Rattus norvegicus GN=Sugt1 PE=2 SV=1
>
> The bovin is the first hit. I can't really see anything different about that.
>
> I don't know perl that well. Do you have some code which I can use to debug this? In line 58 it tries to access the blast hash with the ID as a key, if I understand this correctly. Either the hash is empty where the key tries to access, or the key is empty. If I could print each ID as it is found, maybe I can find a pattern. And/or print each blast entry when the blast hash is created.
>
> Thank you.
>
> Ole
>
> On 16 December 2015 at 21:55, Carson Holt wrote:
> Find the hit for GAMO_00029233 and then pull its header line out of the Uniprot fasta file. There may be an unexpected formatting difference in that header.
>
> --Carson
>
>> On Dec 16, 2015, at 1:53 PM, Ole Kristian Tørresen wrote:
>>
>> Daniel,
>> this is the previous gene, before maker_functional_gff:
>> LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;
>> LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;
>> LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
>> LG08 maker mRNA 13786695 13806565 . - . ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47;
>>
>> After:
>> LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus);
>> LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus);
>> LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>>
>> Carson, I saw that, but I did use Uniprot/Swiss-prot. A snippet of the blast output used as input here:
>> GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 81 348 33 307 2e-92 285
>> GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 76 347 33 308 4e-92 284
>> GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 44 351 13 316 2e-86 270
>> GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 44 351 13 316 3e-86 269
>> GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 44 351 13 316 5e-84 264
>> GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 44 351 13 316 8e-83 261
>> GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 60 351 31 317 1e-80 255
>> GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 32 351 39 371 6e-69 226
>> GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 29 351 46 366 8e-66 218
>> GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 53 351 34 345 2e-59 201
>> GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 34 351 20 348 2e-59 201
>> GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 142 351 27 237 3e-24 103
>> GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 113 337 1 222 5e-22 97.1
>> GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 5 268 17 310 5e-89 275
>> GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 5 268 16 308 5e-86 268
>> GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 5 268 16 308 8e-86 267
>> GAMO_00029233-RA sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 5 268 16 337 1e-80 254
>> GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 10 268 16 337 5e-36 137
>> GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 9 268 11 328 3e-35 135
>> GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 24 268 26 320 7e-35 134
>> GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 138 268 196 357 5e-25 107
>>
>> 521 genes had a function added before maker_functional_gff choked on this particular gene, GAMO_00029233.
>>
>> Thank you.
>>
>> Ole
>>
>> On 16 December 2015 at 20:37, Carson Holt wrote:
>> I've seen this exact same error before (https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ).
>>
>> It is caused by the ID from the blast report and input protein fasta. maker_functional_gff is not a generic script that can work on any input, it only works on blast results against Uniprot/Swiss-prot. The script is expecting a very specific header format in both the report and the protein fasta and if it doesn't see it, then it is missing certain pieces of needed information.
>>
>> Thanks,
>> Carson
>>
>>> On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote:
>>>
>>> Hi Ole, can you send a line for a gene feature that does work?
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>>
>>>> On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen wrote:
>>>>
>>>> Hi,
>>>> I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error:
>>>> Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947.
>>>> Line 108947 in the input gff is this:
>>>>
>>>> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
>>>> It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines.
>>>>
>>>> Any help to track this down is really appreciated. Do you need any other information?
>>>>
>>>> Thank you.
>>>>
>>>> Sincerely,
>>>>
>>>> Ole Kristian Tørresen

From ole.toerresen at gmail.com Mon Jan 4 12:08:43 2016
From: ole.toerresen at gmail.com (Ole Kristian Tørresen)
Date: Mon, 4 Jan 2016 20:08:43 +0100
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To:
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
Message-ID:

I found the mistake: I used different versions of SwissProt/UniProt for BLASTing and as an option for maker_functional_gff. When I changed to the same version, the error went away. Sad to say, but stuff like different versions of SwissProt/UniProt does accumulate over time a bit...

Thank you.

Ole

On 4 January 2016 at 17:04, Carson Holt wrote:

> Perhaps the easiest way to look at this is if you send us the files. I'm
> still leaning towards a format error. But it's the kind of thing where I
> would need the files to find the specific entry.
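A mismatch of this kind, where the BLAST database and the FASTA file given to maker_functional_gff come from different UniProt/Swiss-Prot releases, can be checked for before running the script. The sketch below is not part of MAKER and its function names are made up. It assumes whitespace-delimited tabular BLAST output with the subject ID in the second column (as in the rows quoted in this thread) and FASTA headers whose first token is that same ID. It would only catch entries that were added, removed, or renamed between releases.

```python
def fasta_ids(fasta_lines):
    """Collect the first whitespace-delimited token of every FASTA header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def missing_subjects(blast_lines, fasta_lines):
    """Return {subject_id: [query_ids]} for BLAST subjects that have no FASTA header."""
    known = fasta_ids(fasta_lines)
    missing = {}
    for line in blast_lines:
        if line.startswith("#"):
            continue  # comment lines in commented tabular output
        fields = line.split()
        if len(fields) < 2:
            continue
        query, subject = fields[0], fields[1]
        if subject not in known:
            missing.setdefault(subject, []).append(query)
    return missing
```

With real data, both arguments would be open file handles for the BLAST report and the UniProt FASTA file. An empty result means every hit can be looked up; any subject that is reported points to an ID the FASTA file does not contain.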
331 100 5 >> 5 268 16 337 1e-80 254 >> GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 >> 10 268 16 337 5e-36 137 >> GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 >> 9 268 11 328 3e-35 135 >> GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 >> 24 268 26 320 7e-35 134 >> GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 >> 138 268 196 357 5e-25 107 >> >> 521 genes have had added function before maker_functional_gff choked >> particular gene GAMO_00029233. >> >> Thank you. >> >> Ole >> >> >> On 16 December 2015 at 20:37, Carson Holt wrote: >> >>> I?ve seen this exact same error before ( >>> https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ >>> ). >>> >>> It is caused by the ID from the blast report and input protein >>> fasta. maker_functional_gff is not a generic script that can work on any >>> input, it only works on blast results against Uniprot/Swiss-prot. The >>> script is expecting a very specific header format in both the report and >>> the protein fasta and if it doesn?t see it, then it is missing certain >>> pieces of needed information. >>> >>> Thanks, >>> Carson >>> >>> On Dec 16, 2015, at 12:27 PM, Daniel Ence >>> wrote: >>> >>> Hi Ole, can you send a line for a gene feature that does work? >>> >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> >>> On Dec 14, 2015, at 12:21 PM, Ole Kristian T?rresen < >>> ole.toerresen at gmail.com> wrote: >>> >>> Hi, >>> I'm trying to update my annotation with some functional annotations >>> with maker_functional_gff, but get this annoying error: >>> Can't use string ("") as a HASH ref while "strict refs" in use at >>> /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, >>> <$IN> line 108947. 
>>> >>> Line 108947 in the input gff is this: >>> >>> LG08 maker gene 13786695 13806565 . - >>> . >>> ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; >>> >>> It seems like the regexp in line 55 in the maker_functional_gff script >>> doesn't pick up the ID, but I can't see any difference between that line >>> and other similar lines. >>> >>> Any help to trace down this is really appreciated. Do you need any other >>> information? >>> >>> Thank you. >>> >>> Sincerely, >>> >>> Ole Kristian T?rresen >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.shaw at abdn.ac.uk Fri Jan 8 08:05:16 2016 From: s.shaw at abdn.ac.uk (Shaw, Sophie) Date: Fri, 8 Jan 2016 15:05:16 +0000 Subject: [maker-devel] Moving Annotation to New Assembly Message-ID: Dear Maker Team, I have reassembled some data that was previously assembled with different software and then annotated using MAKER. I want to transfer the MAKER annotation to the new fasta file. I've followed the instructions in the post here - https://groups.google.com/forum/#!searchin/maker-devel/est_forward/maker-devel/q9fxXGKO8mk/0ATwhJvZeI4J However all of the information in the final column of the GFF has not been transferred over, just the gene name. For example: The original annotation is as follows: scaffold_252 maker gene 3018 4307 . + . 
ID=CAUR_05562;Name=CAUR_05562;Alias=augustus_masked-scaffold_252-processed-gene-0.0;Note=Similar to VHS1: Serine/threonine-protein kinase VHS1 (Saccharomyces cerevisiae (strain ATCC 204508 / S288c));Dbxref=Gene3D:G3DSA:1.10.510.10,Gene3D:G3DSA:3.30.200.20,InterPro:IPR000719,InterPro:IPR002290,InterPro:IPR008271,InterPro:IPR011009,InterPro:IPR017441,PANTHER:PTHR24343,PANTHER:PTHR24343:SF90,Pfam:PF00069,ProSitePatterns:PS00107,ProSitePatterns:PS00108,ProSiteProfiles:PS50011,SMART:SM00220,SUPERFAMILY:SSF56112;Ontology_term=GO:0004672,GO:0005524,GO:0006468,GO:0016772; And the new annotation after running MAKER with est_forward=1: scaffold_21 maker gene 18116 19405 . - . ID=maker-scaffold_21-exonerate_est2genome-gene-0.25;Name=CAUR_05562-RA-gene Is there a way of pulling the Note part of the gff file over as well as the gene name (and is this even a correct thing to do - should I be re-running MAKER entirely?). The researchers don't want to lose the information gained from the work on the previous annotation. All the Best, Sophie Shaw - Dr. Sophie Shaw Bioinformatician Centre for Genome Enabled Biology and Medicine University of Aberdeen 23 St. Machar Drive Old Aberdeen AB24 3RY https://www.abdn.ac.uk/genomics/ The University of Aberdeen is a charity registered in Scotland, No SC013683. Tha Oilthigh Obar Dheathain na charthannas cl?raichte ann an Alba, ?ir. SC013683. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hcma at uci.edu Mon Jan 11 17:21:11 2016 From: hcma at uci.edu (hcma) Date: Mon, 11 Jan 2016 16:21:11 -0800 Subject: [maker-devel] basic question for MAKER Message-ID: <2ed9dc6119cdaa218cf453b8390d28e8@uci.edu> Hi, I have some basic questions regarding how to use MAKER. Do I have to download the following file myself? 
Repeatmasker.gff file genome sequence protein EST I would like to incorporate my RNA-seq data, I have a transcriptome assembly generated using Trinity, how do I incorporate this and can i use MAKER or do i have to use MAKER2? Thanks for your time and any comments will be greatly appreciated. Best Regards Karen From hcma at uci.edu Wed Jan 13 11:09:14 2016 From: hcma at uci.edu (hcma) Date: Wed, 13 Jan 2016 10:09:14 -0800 Subject: [maker-devel] basic question on maker Message-ID: Hi, I would like to include a de novo assembled transcriptome assembly for running maker. The organism i am working with is fly and I am wondering what is the best way to do this? Do I need to get the input files for running Repeatmasker or just set: model_org=all What's the best protein sequence file to use? is ' uniprot_sprot.fasta' ok? Some people use Trinity transcriptome assembly to generate a train set for Augustus and then run maker again, is this a better way than running maker just once? Thanks for your time and any comments will be greatly appreciated. Best Regards Karen From carsonhh at gmail.com Thu Jan 14 13:01:00 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 14 Jan 2016 13:01:00 -0700 Subject: [maker-devel] basic question on maker In-Reply-To: References: Message-ID: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Hi Karen, All your questions may be best answered from this tutorial on the MAKER wiki ?> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014 There is also a video link on the wiki page if you want to follow that. Thanks, Carson > On Jan 13, 2016, at 11:09 AM, hcma wrote: > > Hi, > > I would like to include a de novo assembled transcriptome assembly for running maker. The organism i am working with is fly and I am wondering what is the best way to do this? > > Do I need to get the input files for running Repeatmasker or just set: > > model_org=all > > What's the best protein sequence file to use? 
>
> is ' uniprot_sprot.fasta' ok?
>
> Some people use Trinity transcriptome assembly to generate a train set for Augustus and then run maker again, is this a better way than running maker just once?
>
> Thanks for your time and any comments will be greatly appreciated.
>
> Best Regards
> Karen

From carsonhh at gmail.com Thu Jan 14 13:35:10 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Jan 2016 13:35:10 -0700
Subject: [maker-devel] Moving Annotation to New Assembly
In-Reply-To:
References:
Message-ID: <7418369D-6EDB-4C61-B3F7-CF5FFF797FA2@gmail.com>

We do not have a tool that will copy attributes from one GFF3 file to another based on an ID match. Your needs are specific enough that you may have to write a script yourself to copy the attributes you care about.

Truthfully, I would recommend rerunning InterProScan and blastp against Swiss-Prot, as these could probably use an update anyway. The est_forward option used to pull IDs forward is based solely on alignment (the hits will not all be exact or complete matches, just best matches), so you cannot guarantee that all domain content will be completely identical. InterPro and Swiss-Prot also get updated periodically, so running against the most recent releases can give more functional information.

The purist in me would be inclined to redo the InterProScan analysis and the blastp against Swiss-Prot. Then you can use the maker_functional_gff, ipr_update_gff, and iprscan2gff3 scripts to properly add everything back in a way similar to the previous annotations.
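For anyone who does go the route of writing their own script, here is a minimal Python sketch of what copying attributes by ID match could look like. It is not part of MAKER. It assumes, based only on the single example quoted in this thread, that the forwarded gene's Name is the old transcript ID plus "-gene" (for example CAUR_05562-RA-gene for old gene CAUR_05562); the function names and the list of attributes to copy are hypothetical choices. Since est_forward matches are best alignments rather than guaranteed identical models, the copied Note/Dbxref/Ontology_term values should be treated as provisional.

```python
import re

# Attributes to carry over from the old annotation (an assumption; adjust as needed).
COPY_KEYS = ("Note", "Dbxref", "Ontology_term")


def parse_attrs(col9):
    """Split a GFF3 attribute column into a dict of key -> value (order preserved)."""
    attrs = {}
    for field in col9.strip().strip(";").split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def old_gene_id(new_name):
    """Guess the old gene ID from a forwarded Name such as 'CAUR_05562-RA-gene'.

    Assumes the '<gene>-R<letters>-gene' naming seen in the thread's example.
    """
    return re.sub(r"(-R[A-Z]+)?-gene$", "", new_name)


def collect_old(old_lines):
    """Map old gene ID -> {attribute: value} for the attributes listed in COPY_KEYS."""
    kept = {}
    for line in old_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = parse_attrs(cols[8])
        if "ID" in attrs:
            kept[attrs["ID"]] = {k: attrs[k] for k in COPY_KEYS if k in attrs}
    return kept


def transfer(new_lines, kept):
    """Yield new GFF3 lines, appending old attributes to matching gene features."""
    for line in new_lines:
        cols = line.rstrip("\n").split("\t")
        if not line.startswith("#") and len(cols) == 9 and cols[2] == "gene":
            attrs = parse_attrs(cols[8])
            extra = kept.get(old_gene_id(attrs.get("Name", "")), {})
            for key, value in extra.items():
                attrs.setdefault(key, value)  # never overwrite what the new file has
            cols[8] = ";".join(f"{k}={v}" for k, v in attrs.items()) + ";"
        yield "\t".join(cols)
```

In use, one would read both GFF3 files line by line, call collect_old on the old file, and write out whatever transfer yields for the new one. Only gene features are touched here; extending it to mRNA features would need the same kind of ID mapping.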
?Carson > On Jan 8, 2016, at 8:05 AM, Shaw, Sophie wrote: > > Dear Maker Team, > > I have reassembled some data that was previously assembled with different software and then annotated using MAKER. I want to transfer the MAKER annotation to the new fasta file. I?ve followed the instructions in the post here - https://groups.google.com/forum/#!searchin/maker-devel/est_forward/maker-devel/q9fxXGKO8mk/0ATwhJvZeI4J > > However all of the information in the final column of the GFF has not been transferred over, just the gene name. For example: > > The original annotation is as follows: > scaffold_252 maker > gene 3018 > 4307 . > + . > ID=CAUR_05562;Name=CAUR_05562;Alias=augustus_masked-scaffold_252-processed-gene-0.0;Note=Similar to VHS1: Serine/threonine-protein kinase VHS1 (Saccharomyces cerevisiae (strain ATCC 204508 / S288c));Dbxref=Gene3D:G3DSA:1.10.510.10,Gene3D:G3DSA:3.30.200.20,InterPro:IPR000719,InterPro:IPR002290,InterPro:IPR008271,InterPro:IPR011009,InterPro:IPR017441,PANTHER:PTHR24343,PANTHER:PTHR24343:SF90,Pfam:PF00069,ProSitePatterns:PS00107,ProSitePatterns:PS00108,ProSiteProfiles:PS50011,SMART:SM00220,SUPERFAMILY:SSF56112;Ontology_term=GO:0004672,GO:0005524,GO:0006468,GO:0016772; > > And the new annotation after running MAKER with est_forward=1: > scaffold_21 maker > gene 18116 > 19405 . > - . > ID=maker-scaffold_21-exonerate_est2genome-gene-0.25;Name=CAUR_05562-RA-gene > > Is there a way of pulling the Note part of the gff file over as well as the gene name (and is this even a correct thing to do - should I be re-running MAKER entirely?). The researchers don?t want to lose the information gained from the work on the previous annotation. > > All the Best, > > Sophie Shaw > > ? > Dr. Sophie Shaw > Bioinformatician > Centre for Genome Enabled Biology and Medicine > University of Aberdeen > 23 St. Machar Drive > Old Aberdeen > AB24 3RY > https://www.abdn.ac.uk/genomics/ > > > > > The University of Aberdeen is a charity registered in Scotland, No SC013683. 
> Tha Oilthigh Obar Dheathain na charthannas cl?raichte ann an Alba, ?ir. SC013683. > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From hcma at uci.edu Thu Jan 14 16:44:39 2016 From: hcma at uci.edu (hcma) Date: Thu, 14 Jan 2016 15:44:39 -0800 Subject: [maker-devel] basic question on maker In-Reply-To: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Message-ID: Hi Carson, Thanks for the link. Can maker2 be run without inputting any protein sequences? How to turn this off in the control files? Also, can i run maker using Augustus and not SNAP? Again, how do i turn SNAP off? Does maker also predict non-coding genes? Thanks. Best Regards Karen On 2016-01-14 12:01, Carson Holt wrote: > Hi Karen, > > All your questions may be best answered from this tutorial on the > MAKER wiki ?> > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014 > [1] > > There is also a video link on the wiki page if you want to follow > that. > > Thanks, > Carson > >> On Jan 13, 2016, at 11:09 AM, hcma wrote: >> >> Hi, >> >> I would like to include a de novo assembled transcriptome assembly >> for running maker. The organism i am working with is fly and I am >> wondering what is the best way to do this? >> >> Do I need to get the input files for running Repeatmasker or just >> set: >> >> model_org=all >> >> What's the best protein sequence file to use? >> >> is ' uniprot_sprot.fasta' ok? >> >> Some people use Trinity transcriptome assembly to generate a train >> set for Augustus and then run maker again, is this a better way than >> running maker just once? >> >> Thanks for your time and any comments will be greatly appreciated. 
>> >> Best Regards >> Karen >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > Links: > ------ > [1] > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014 From carsonhh at gmail.com Fri Jan 15 10:16:27 2016 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 15 Jan 2016 10:16:27 -0700 Subject: [maker-devel] basic question on maker In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Message-ID: > Can maker2 be run without inputting any protein sequences? Yes. But it will not perform as well. > How to turn this off in the control files? Any option left blank is off. > Also, can i run maker using Augustus and not SNAP? Again, how do i turn SNAP off? Yes. Leave it blank. > Does maker also predict non-coding genes? You can run it with tRNAscan or snoscan. Snoscan requires you to have rRNAs from your organism to train with though. ?Carson From hcma at uci.edu Fri Jan 15 15:39:25 2016 From: hcma at uci.edu (hcma) Date: Fri, 15 Jan 2016 14:39:25 -0800 Subject: [maker-devel] basic question on maker In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Message-ID: Hi Carlson, Regarding non-coding RNA predictions, MAKER only predicts tRNAs and rRNAs, but not other RNAs, for example, lncRNAs? Thanks again. Best Regards Karen On 2016-01-15 09:16, Carson Holt wrote: >> Can maker2 be run without inputting any protein sequences? > > Yes. But it will not perform as well. > >> How to turn this off in the control files? > > Any option left blank is off. > > >> Also, can i run maker using Augustus and not SNAP? Again, how do i >> turn SNAP off? > > Yes. Leave it blank. > > >> Does maker also predict non-coding genes? > > You can run it with tRNAscan or snoscan. Snoscan requires you to have > rRNAs from your organism to train with though. 
>
> --Carson

From dence at genetics.utah.edu Fri Jan 15 15:51:44 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 15 Jan 2016 22:51:44 +0000
Subject: [maker-devel] basic question on maker
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
Message-ID: <1C311E8C-20F3-48DB-A982-925AEECD7636@genetics.utah.edu>

Hi Karen,

I don't know of a unified tool that predicts lncRNAs from genomic sequence. I found a tool that predicts lncRNAs from an RNA-seq dataset, which you might be able to use for your project. I've never used it, but it might be a starting place.

http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-311

Here's also a review that describes several workflows for annotating lncRNAs in insect genomes:

http://www.sciencedirect.com/science/article/pii/S2214574515000061

Hope that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Jan 15, 2016, at 3:39 PM, hcma wrote:

Hi Carlson,

Regarding non-coding RNA predictions, MAKER only predicts tRNAs and rRNAs, but not other RNAs, for example, lncRNAs?

Thanks again.

Best Regards
Karen

On 2016-01-15 09:16, Carson Holt wrote:

Can maker2 be run without inputting any protein sequences?
Yes. But it will not perform as well.

How to turn this off in the control files?
Any option left blank is off.

Also, can i run maker using Augustus and not SNAP? Again, how do i turn SNAP off?
Yes. Leave it blank.

Does maker also predict non-coding genes?
You can run it with tRNAscan or snoscan. Snoscan requires you to have rRNAs from your organism to train with though.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michael.s.campbell1 at gmail.com Fri Jan 15 18:11:05 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 15 Jan 2016 17:11:05 -0800
Subject: [maker-devel] basic question on maker
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
Message-ID: <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>

Hi Karen,

Just a quick clarification: MAKER doesn't predict the rRNAs. If you give MAKER the rRNA sequence with the O-methylation sites, it will run snoscan to predict snoRNAs.

Take care,
Mike

> On Jan 15, 2016, at 2:39 PM, hcma wrote:
>
> Hi Carlson,
>
> Regarding non-coding RNA predictions, MAKER only predicts tRNAs and rRNAs, but not other RNAs, for example, lncRNAs?
>
> Thanks again.
>
> Best Regards
> Karen
>
> On 2016-01-15 09:16, Carson Holt wrote:
>>> Can maker2 be run without inputting any protein sequences?
>> Yes. But it will not perform as well.
>>> How to turn this off in the control files?
>> Any option left blank is off.
>>> Also, can i run maker using Augustus and not SNAP? Again, how do i turn SNAP off?
>> Yes. Leave it blank.
>>> Does maker also predict non-coding genes?
>> You can run it with tRNAscan or snoscan. Snoscan requires you to have
>> rRNAs from your organism to train with though.
>> --Carson

From nguyenan at mail.nih.gov Tue Jan 19 13:18:36 2016
From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C])
Date: Tue, 19 Jan 2016 20:18:36 +0000
Subject: [maker-devel] MAKER version 3 beta
Message-ID:

Hello,

I just wanted to know whether the MAKER version 3 beta (EVM integration) is already available for download.

https://groups.google.com/forum/#!searchin/maker-devel/EVM|sort:date/maker-devel/YzsN-t0gu0U/-A_7YT2gFwAJ

Thank you very much!
Anh-Dao From carsonhh at gmail.com Tue Jan 19 13:23:54 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 19 Jan 2016 13:23:54 -0700 Subject: [maker-devel] MAKER version 3 beta In-Reply-To: References: Message-ID: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com> Yes. Go to the registration page for the standard MAKER download. After registering, you will be redirected to a page with links to both the current version of MAKER as well as the beta ?> http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi ?Carson > On Jan 19, 2016, at 1:18 PM, Nguyen, Anh-Dao (NIH/NHGRI) [C] wrote: > > Hello, > > I just wanted to know if MAKER version 3 beta (EVM integration) has > already been available for downloading? > > https://groups.google.com/forum/#!searchin/maker-devel/EVM|sort:date/maker- > devel/YzsN-t0gu0U/-A_7YT2gFwAJ > > Thank you very much! > Anh-Dao > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From macmanes at gmail.com Tue Jan 19 13:34:38 2016 From: macmanes at gmail.com (Matthew MacManes) Date: Tue, 19 Jan 2016 15:34:38 -0500 Subject: [maker-devel] MAKER version 3 beta In-Reply-To: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com> References: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com> Message-ID: Just checking, when installing from the beta, I still see ./maker -?version 2.32 was expecting 3.00.. Thanks, Matt ______________________________________________ Matthew MacManes, Ph.D. University of New Hampshire? I? Assistant Professor of Genome Enabled Biology Department of Molecular, Cellular, & Biomedical Sciences Durham, NH? 03824 Phone: 603-862-4052? 
| ?Twitter:?@macmanes??| Web:?genomebio.org Office: 189 Rudman Hall | Laboratory: 145 Rudman Hall On January 19, 2016 at 3:24:16 PM, Carson Holt (carsonhh at gmail.com) wrote: Yes. ?Go to the registration page for the standard MAKER download. After registering, you will be redirected to a page with links to both the current version of MAKER as well as the beta ?>?http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi ?Carson On Jan 19, 2016, at 1:18 PM, Nguyen, Anh-Dao (NIH/NHGRI) [C] wrote: Hello, I just wanted to know if MAKER version 3 beta (EVM integration) has already been available for downloading? https://groups.google.com/forum/#!searchin/maker-devel/EVM|sort:date/maker- devel/YzsN-t0gu0U/-A_7YT2gFwAJ Thank you very much! Anh-Dao _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Jan 19 13:35:55 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 19 Jan 2016 13:35:55 -0700 Subject: [maker-devel] MAKER version 3 beta In-Reply-To: References: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com> Message-ID: Thanks. I?ll fix that. ?Carson > On Jan 19, 2016, at 1:34 PM, Matthew MacManes wrote: > > Just checking, when installing from the beta, I still see > > ./maker -?version > 2.32 > was expecting 3.00.. > > Thanks, Matt > > > > > ______________________________________________ > Matthew MacManes, Ph.D. > University of New Hampshire I Assistant Professor of Genome Enabled Biology > Department of Molecular, Cellular, & Biomedical Sciences > Durham, NH 03824 > Phone: 603-862-4052 | Twitter: @macmanes? 
| Web: genomebio.org > Office: 189 Rudman Hall | Laboratory: 145 Rudman Hall > > On January 19, 2016 at 3:24:16 PM, Carson Holt (carsonhh at gmail.com ) wrote: > >> Yes. Go to the registration page for the standard MAKER download. After registering, you will be redirected to a page with links to both the current version of MAKER as well as the beta ?> http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi >> >> ?Carson >> >> >> >>> On Jan 19, 2016, at 1:18 PM, Nguyen, Anh-Dao (NIH/NHGRI) [C] > wrote: >>> >>> Hello, >>> >>> I just wanted to know if MAKER version 3 beta (EVM integration) has >>> already been available for downloading? >>> >>> https://groups.google.com/forum/#!searchin/maker-devel/EVM|sort:date/maker- >>> devel/YzsN-t0gu0U/-A_7YT2gFwAJ >>> >>> Thank you very much! >>> Anh-Dao >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daren.card at gmail.com Wed Jan 20 08:27:28 2016 From: daren.card at gmail.com (Daren C. Card) Date: Wed, 20 Jan 2016 09:27:28 -0600 Subject: [maker-devel] Passing pre-masked repeats into Maker Message-ID: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com> Hello all, I?m about to use Maker to begin annotating a vertebrate genome. We use successive rounds of RepeatMasker to annotate repeats due to some library issues we?ve noticed with Repbase (at least in our critters) and to incorporate de novo repeats from RepeatModeler, a process I don?t think Maker could match. I?m wonder what the best way to pass these annotations into Maker would be. 

I see the thread at https://groups.google.com/forum/#!topic/maker-devel/7UbOIvwaaRM nicely outlines what Maker does with repeats, and it looks like I have 3 options: (1) reannotate in Maker, (2) pass in a RepeatMasker GFF, or (3) pass in a masked genome.

#1 is problematic due to the reasons above.

#2 looks like it would hard mask the complex repeats like we want, but will also hard mask the simple repeats, which wouldn't be ideal for evidence mapping from transcripts/proteins.

#3 is cautioned against in the link above, and without an accompanying GFF, I would imagine that Maker wouldn't be able to release the masking to perform Exonerate polishing (Ns could be gaps or could be hard masking, it wouldn't know).

The way I thought to get around these apparent issues (but let me know if my thinking is incorrect) is to separate simple and complex repeats from the final RepeatMasker GFF. Feed only the complex repeats into Maker as a GFF, so that they are hard masked and accounted for, and have Maker also run RepeatMasker, thus remasking the simple repeats (and maybe some other complex hits, primarily through RepeatRunner). Then Maker can presumably release the masking as needed.

Would this type of workaround be a good idea or are there other options? Or am I just overthinking something that isn't really a problem?

Thanks in advance for any help.

Daren

Daren Card
Castoe Lab
University of Texas at Arlington
www.darencard.net

From carsonhh at gmail.com Wed Jan 20 09:20:51 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 20 Jan 2016 09:20:51 -0700
Subject: [maker-devel] Passing pre-masked repeats into Maker
In-Reply-To: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com>
References: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com>
Message-ID: <6FD7CE4B-B944-4793-A822-9D395725ED6D@gmail.com>

The strategy outlined would work.
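
The split of simple and complex repeats described in the question could be sketched in Python roughly as below. This is only an illustration, not a MAKER or RepeatMasker utility. It assumes simple repeats can be recognized from text in column 9 of the RepeatMasker GFF: class labels such as Simple_repeat, Low_complexity, or Satellite if the GFF carries them, or motif names like "(CA)n" and "GA-rich" if it does not. Check those markers against your own file, since the attribute layout depends on how the GFF was produced. The function names are hypothetical.

```python
# Substrings taken to indicate a simple/low-complexity repeat (an assumption;
# inspect column 9 of your own RepeatMasker GFF and adjust).
SIMPLE_MARKERS = ("Simple_repeat", "Low_complexity", "Satellite", ")n", "-rich")


def is_simple(gff_line, markers=SIMPLE_MARKERS):
    """Return True if column 9 of a GFF line contains any simple-repeat marker."""
    cols = gff_line.rstrip("\n").split("\t")
    return len(cols) == 9 and any(marker in cols[8] for marker in markers)


def split_repeats(lines, markers=SIMPLE_MARKERS):
    """Separate GFF feature lines into (simple, complex) lists.

    Comment and blank lines are skipped; everything that is not flagged as
    simple is treated as complex.
    """
    simple, complex_ = [], []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        target = simple if is_simple(line, markers) else complex_
        target.append(line.rstrip("\n"))
    return simple, complex_
```

The complex list is what would be written out and handed to MAKER as the repeat GFF (the rm_gff option in maker_opts.ctl), while the simple list can be kept aside in case it is needed for downstream analysis.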
To get RepeatMasker to call only simple repeats in MAKER, set model_org=simple in the control files.

--Carson

From dence at genetics.utah.edu Wed Jan 20 09:21:38 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 20 Jan 2016 16:21:38 +0000
Subject: [maker-devel] Passing pre-masked repeats into Maker
In-Reply-To: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com>
References: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com>
Message-ID:

Hi Daren,

I think the solution you described sounds appropriate. If you're concerned about how the simple repeats will be handled by maker in the gff, then you can just take those out. If they're important for downstream analysis, you can add them back in then.

Let me know if that helps or if other issues arise.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From jcornel3 at asu.edu Fri Jan 22 14:38:14 2016
From: jcornel3 at asu.edu (John Cornelius)
Date: Fri, 22 Jan 2016 14:38:14 -0700
Subject: [maker-devel] Question on post processing of annotations
Message-ID:

Hi, I'm using the maker_map_ids script to change the gene ids on an annotation that I just finished. However, I noticed that it does not change the name of genes predicted by SNAP. Is there any way to include SNAP genes for consideration by maker_map_ids? Thanks.
--
John Cornelius
MCB PhD Candidate
Arizona State University

From carsonhh at gmail.com Fri Jan 22 15:01:29 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 22 Jan 2016 15:01:29 -0700
Subject: [maker-devel] Question on post processing of annotations
In-Reply-To:
References:
Message-ID:

Hi John,

Do you mean the match/match_part features that are source snap_masked? Those are not genes, they are reference alignments representing the ab initio SNAP calls, and it would be incorrect to rename them. They do not have a 1 to 1 relationship with the final gene models. Sometimes a gene model will overlap 2 or more uninformed SNAP ab initio reference alignments, or one SNAP reference alignment may overlap multiple final gene models, so names cannot just be passed from one to the other.

If you want to add specific SNAP models to the final annotation set, you would need to upgrade them to being a gene/mRNA/exon/CDS feature before you can do that. You can do that with manual editors like Apollo, or you can supply a subset of the features you want to upgrade to maker in the pred_gff= option as a separate run, put existing models in model_gff=, and run with keep_preds=1.

I know I have covered this previously in greater detail as part of the devel list. If you search the archives for the keywords pred_gff, keep_preds, and iprscan you should come across a number of threads that may be helpful --> https://groups.google.com/forum/#!forum/maker-devel

Thanks,
Carson

From jcornel3 at asu.edu Fri Jan 22 15:06:17 2016
From: jcornel3 at asu.edu (John Cornelius)
Date: Fri, 22 Jan 2016 15:06:17 -0700
Subject: [maker-devel] Question on post processing of annotations
In-Reply-To:
References:
Message-ID:

I'll look into that, thanks. I had previously just been looking for things in regards to the script itself and its functionality.

--
John Cornelius
MCB PhD Candidate
Arizona State University

From sapuizait at gmail.com Wed Jan 27 06:14:45 2016
From: sapuizait at gmail.com (Panos Sapou)
Date: Wed, 27 Jan 2016 14:14:45 +0100
Subject: [maker-devel] prokaryotic genome annotation
Message-ID:

Dear all

I recently started using maker for the annotation of my prokaryotic genomes and even if I managed to get some nice results I would like to check with you if what I did was right and also ask you a couple of questions about the procedure

I also apologize in advance if I ask sth silly since I am a newbie in bioinformatics and I might ask very basic stuff

I have only available DNA sequences, I have no ESTs and no proteins

1) I started by using the protein2genome option and as reference I used the Uniref50 database.
Then I generated a merged gff file (similar procedure like the one in the tutorial maker)

2) I used Genemark.S and I created a model by using the gmsn.pl command and as input the assembled contigs of my bacteria

3) after finishing the above 2 steps I ran maker again by using as input the gff file from step 1: #-------Re-annotation using maker derived GFF3: maker_gff=input.gff
and I also set
protein_pass=1
is that correct? do you think it helps?
and at the #-----gene prediction I used the hmm.mod file generated in step 2

my questions:
Do the above sound correct?

it is in my understanding that I can only use genemark for prokaryotic genomes, is that correct?

when I run maker the second time (step 3) should I set protein2genome=1 or 0? or just having the gff file (from step 1) in the re-annotation options is enough? and therefore prediction based on the protein2genome has already been done?

Also if I use a gff file (from step 1) will it make any difference if I set protein2genome=1 and use an extra (different) database? (I was wondering if it will improve the results?)

finally regarding the choice of the database: would you advise me to use uniref or the proteomes of closely related bacteria (I have downloaded and created a single fasta from appx 100 proteomes of closely related bacteria)

thank you in advance
and once again I apologize if it is pretty basic what I am asking, just wanted to make sure...

Best
Panos

From carsonhh at gmail.com Wed Jan 27 08:17:37 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 27 Jan 2016 08:17:37 -0700
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To:
References:
Message-ID: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com>

Hi Panos,

The strategy for annotating prokaryotes is very different than that for eukaryotes.
Basically my recommendation is to use Genemark S and set protein2genome=1, keep_preds=1, always_complete=1, and no need for ESTs (irrelevant in prokaryotes). No need to do multiple iterations like you would for eukaryotes either. The bootstrapping procedure is not relevant for prokaryotes. I'd also avoid using the GFF3 passthrough option, you will lose some information about the alignment that affects reading frame of the protein evidence. It can be convenient for large eukaryotes when you are pulling evidence from a database, but if it's just from a previous maker run, you should just rerun in the same directory with the protein fasta. MAKER will detect that it already ran blastx and pull the raw reports from the previous datastore.

Thanks,
Carson

From cjfields at illinois.edu Wed Jan 27 11:30:29 2016
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 27 Jan 2016 18:30:29 +0000
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com>
References: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com>
Message-ID: <6B0EA45F-9526-4ED2-AED7-1DA3E1AEDD24@illinois.edu>

We're thinking of switching our bacterial pipeline to MAKER actually. We generally use other bacteria-specific gene pred tools like Glimmer and Prodigal, though I anticipate these could be added using pred_gff (as long as the GFF3 is fine)?

chris

From carsonhh at gmail.com Wed Jan 27 15:42:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 27 Jan 2016 15:42:59 -0700
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To: <6B0EA45F-9526-4ED2-AED7-1DA3E1AEDD24@illinois.edu>
References: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com> <6B0EA45F-9526-4ED2-AED7-1DA3E1AEDD24@illinois.edu>
Message-ID:

GFF3 is just fine for adding predictions. For prokaryotes, I don't like to add protein evidence that way, but for predictions it's fine. The only issue I could see going forward would be a lack of support for alternate codon usage in MAKER right now. Everything is being interpreted using the canonical codon table. It's not an insurmountable issue, but it would take some work to let it do that.
--Carson

From sapuizait at gmail.com Fri Jan 29 03:12:35 2016
From: sapuizait at gmail.com (Panos Sapou)
Date: Fri, 29 Jan 2016 11:12:35 +0100
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To:
References:
Message-ID:

Dear all

I am trying to annotate a new spiroplasma strain and I would like to know if there is a way to change the stop codons (not take into account 'tga')

because otherwise I get too many premature stop codons and fragmented genes that are not real

Best
Panos

From carsonhh at gmail.com Sun Jan 31 13:43:21 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 31 Jan 2016 13:43:21 -0700
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To:
References:
Message-ID:

MAKER doesn't support alternate codon usage yet.
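
For readers wondering why the canonical table fragments Spiroplasma genes: these organisms read TGA as tryptophan (NCBI translation table 4) rather than as a stop. The following is only a minimal Python sketch of that effect, not anything MAKER does or offers; the function names and the toy sequence are hypothetical, made up for illustration.

```python
# Illustration only: nothing here calls MAKER or changes its behaviour.
BASES = "TCAG"

# Amino acids in NCBI codon order (TTT, TTC, TTA, TTG, TCT, ...).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"    # table 1
MYCOPLASMA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"  # table 4

def build_table(amino_acids):
    """Map every codon to its amino acid for one translation table."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, amino_acids))

def translate(seq, table):
    """Translate in frame 0 and stop at the first stop codon."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = table.get(seq[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

cds = "ATGTGAAAATAA"  # hypothetical toy ORF: ATG TGA AAA TAA
print(translate(cds, build_table(STANDARD)))    # M   (TGA read as a stop)
print(translate(cds, build_table(MYCOPLASMA)))  # MWK (TGA read as Trp)
```

With the canonical table the toy ORF ends after one residue, which is the same kind of premature stop described in the question above.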
--Carson

From arnstrm at gmail.com Fri Jan 1 11:31:06 2016
From: arnstrm at gmail.com (Arun Seetharam)
Date: Fri, 1 Jan 2016 12:31:06 -0600
Subject: [maker-devel] running MAKER to merge annotations
Message-ID:

Hi all,

First of all, a very happy new year to all of you! I hope everyone is having a great holiday season.

I have a question about Maker. For my grass species, I ran 3 separate rounds of MAKER (low GC, regular GC and high GC) and I now have 3 maker gff files. So, what is the correct way to merge these files to a single gff file? Do I have to run a maker round with just the GFF files as input? It looks like EVM is especially meant to do this kind of job, but not sure if Maker does this too.

Thanks for any help or suggestions!

Have a nice day,
--
Arun Seetharam
Post Doctoral Research Associate
Genome Informatics Facility & EEOB
Office of Biotechnology
228 Science I
Iowa State University
Ames, Iowa 50011

From dence at genetics.utah.edu Fri Jan 1 11:37:38 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 1 Jan 2016 18:37:38 +0000
Subject: [maker-devel] running MAKER to merge annotations
In-Reply-To:
References:
Message-ID:

Hi Arun, are the three rounds of maker on different parts/versions of the genome or did you run maker on the same genome with three different settings? If it's the former, then you can merge the maker gff files with gff3_merge, which is included with your maker installation.

If it's the latter case then I do think EVM could help if you want to give the different result sets different confidence weights. If you want to give them all the same weight, then you could do another run of maker, and pass them through as either models or predictions.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From arnstrm at gmail.com Fri Jan 1 12:17:18 2016
From: arnstrm at gmail.com (Arun Seetharam)
Date: Fri, 1 Jan 2016 13:17:18 -0600
Subject: [maker-devel] running MAKER to merge annotations
In-Reply-To:
References:
Message-ID:

Hi Daniel,

Thanks very much for the reply! It is the latter: same input genome under 3 settings (training was done using a different set of genes for all the gene predictors). I simply want to get a single gff, retaining only the best model(s) for each locus.
Are you suggesting that I can run MAKER by providing 3 files for "maker_gff" (in maker_opts.ctl) and keeping everything else default? Or do I have to do something in the CTL file to achieve this? I would appreciate it if you could provide more details on how to do this!

Thanks once again for the reply!

--
Arun Seetharam
Post Doctoral Research Associate
Genome Informatics Facility & EEOB
Office of Biotechnology
228 Science I
Iowa State University
Ames, Iowa 50011

From carsonhh at gmail.com Fri Jan 1 12:26:14 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 1 Jan 2016 12:26:14 -0700
Subject: [maker-devel] running MAKER to merge annotations
In-Reply-To:
References:
Message-ID: <28388091-0956-412B-B472-0A272FA31269@gmail.com>

If you are running with different settings on the exact same contig, you will have to merge the models using the -l legacy option of gff3_merge to ensure there will be no ID collisions (some things will have the same IDs in the different runs).
Then supply just the genes to pred_gff on the rerun. Alternatively you could have just provided your different predictor files as a comma-separated list (e.g. snaphmm=hmm1,hmm2,hmm3). MAKER would have run each one and kept just the one that best matched the evidence. However, because MAKER passes hints to the predictors (which override the HMM for the most part), I have found that running with different predictor settings because of GC differences between contigs doesn't provide the benefit you would think. --Carson > On Jan 1, 2016, at 12:17 PM, Arun Seetharam wrote: > > Hi Daniel, > > Thanks very much for the reply! It is the latter: same input genome under 3 settings (training was done using a different set of genes for all the gene predictors). I simply want to get a single gff, retaining only the best model(s) for each locus. > Are you suggesting that I can run MAKER by providing 3 files for "maker_gff" (in maker_opts.ctl) and keeping everything else default? or do I have to do something in the CTL file to achieve this? I appreciate if you can provide more details for how to do this! > > Thanks once again for the reply! > > On Fri, Jan 1, 2016 at 12:37 PM, Daniel Ence wrote: > Hi Arun, are the three rounds of maker on different parts/versions of the genome or did you run maker on the same genome with three different settings? If it's the former, then you can merge the maker gff files with gff3_merge, which is included with your maker installation. > > If it's the latter case then I do think EVM could help if you want to give the different result sets different confidence weights. If you want to give them all the same weight, then you could do another run of maker, and pass them through as either models or predictions. 
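The ID collisions mentioned above (the same feature IDs turning up in separate MAKER runs over the same contig) can be checked for before merging. The following is only a minimal sketch in Python, not part of MAKER: the function names, the run labels and the example GFF3 lines are all made up for illustration, and it assumes standard 9-column GFF3 with an ID= attribute in the last column.

```python
import re
from collections import defaultdict

# Matches the ID attribute at the start of column 9 or after a ';'.
ID_PATTERN = re.compile(r"(?:^|;)ID=([^;]+)")


def collect_ids(gff_lines):
    """Return the set of ID attribute values found in GFF3 feature lines."""
    ids = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more features follow
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        match = ID_PATTERN.search(columns[8])
        if match:
            ids.add(match.group(1))
    return ids


def find_collisions(named_gffs):
    """Map every ID seen in more than one annotation set to those sets."""
    seen = defaultdict(list)
    for name, lines in named_gffs.items():
        for feature_id in collect_ids(lines):
            seen[feature_id].append(name)
    return {fid: sorted(names) for fid, names in seen.items() if len(names) > 1}


# Hypothetical feature lines standing in for two runs on the same contig.
low_gc = [
    "ctg1\tmaker\tgene\t100\t900\t.\t+\t.\tID=maker-ctg1-snap-gene-0.1;Name=a\n",
]
high_gc = [
    "ctg1\tmaker\tgene\t150\t950\t.\t+\t.\tID=maker-ctg1-snap-gene-0.1;Name=b\n",
    "ctg1\tmaker\tgene\t2000\t2900\t.\t-\t.\tID=maker-ctg1-snap-gene-0.2;Name=c\n",
]

collisions = find_collisions({"low_gc": low_gc, "high_gc": high_gc})
for feature_id, runs in collisions.items():
    print(feature_id, "appears in", ", ".join(runs))
```

In real use the lists would be replaced by open file handles for each run's GFF3 output. A non-empty result suggests the files cannot simply be concatenated and that the legacy merge described in the thread is the safer route.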
> > ~Daniel > > > > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > > On Jan 1, 2016, at 11:31 AM, Arun Seetharam wrote: > > > > Hi all, > > > > First of all, a very happy new year to all of you! I hope everyone is having a great holiday season. > > > > I have a question about Maker. For my grass species, I ran 3 separate rounds of MAKER (low GC, regular GC and high GC) and I now have 3 maker gff files. So, what is the correct way to merge these files to a single gff file? Do I have to run a maker round with just the GFF files as input? It looks like EVM especially meant to do this kind of job, but not sure if Maker does this too. > > > > Thanks for any help or suggestions! > > > > Have a nice day, > > -- > > Arun Seetharam > > Post Doctoral Research Associate > > Genome Informatics Facility & EEOB > > Office of Biotechnology > > 228 Science I > > Iowa State University > > Ames, Iowa 50011 > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > -- > Arun Seetharam > Post Doctoral Research Associate > Genome Informatics Facility & EEOB > Office of Biotechnology > 228 Science I > Iowa State University > Ames, Iowa 50011 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhangzb554 at nenu.edu.cn Sat Jan 2 09:09:24 2016 From: zhangzb554 at nenu.edu.cn (张志斌) Date: Sun, 3 Jan 2016 00:09:24 +0800 (GMT+08:00) Subject: [maker-devel] maker-devel Digest, Vol 92, Issue 1 In-Reply-To: Message-ID: Hi everyone,
I wonder where I can download the Perl package Proc::Signal? I cannot find it on CPAN. Could someone send me the package or point me to a website where I can get it?

Thanks for your help. From arnstrm at iastate.edu Fri Jan 1 11:05:35 2016 From: arnstrm at iastate.edu (Arun Seetharam) Date: Fri, 1 Jan 2016 12:05:35 -0600 Subject: [maker-devel] running MAKER to merge annotations Message-ID: Hi all, First of all, a very happy new year to all of you! I hope everyone is having a great holiday season. I have a question about Maker. For my grass species, I ran 3 separate rounds of MAKER (low GC, regular GC and high GC) and I now have 3 maker gff files. So, what is the correct way to merge these files to a single gff file? Do I have to run a maker round with just the GFF files as input? It looks like EVM is especially meant to do this kind of job, but I am not sure if Maker does this too. Thanks for any help or suggestions! Have a nice day, -------------- next part -------------- An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Mon Jan 4 09:04:29 2016 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 4 Jan 2016 09:04:29 -0700 Subject: [maker-devel] Error with maker_functional_gff In-Reply-To: References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu> Message-ID: Perhaps the easiest way to look at this is if you send us the files. I'm still leaning towards a format error. But it's the kind of thing where I would need the files to find the specific entry. --Carson > On Dec 16, 2015, at 11:32 PM, Ole Kristian Tørresen wrote: > > Here are the hits for GAMO_00029233 > >sp|Q9SUR9|SGT1A_ARATH Protein SGT1 homolog A OS=Arabidopsis thaliana GN=SGT1A PE=1 SV=1 > >sp|Q9SUT5|SGT1B_ARATH Protein SGT1 homolog B OS=Arabidopsis thaliana GN=SGT1B PE=1 SV=1 > >sp|Q2KIK0|SGT1_BOVIN Protein SGT1 homolog OS=Bos taurus GN=SUGT1 PE=2 SV=1 > >sp|Q55ED0|SGT1_DICDI Protein SGT1 homolog OS=Dictyostelium discoideum GN=sugt1 PE=2 SV=1 > >sp|Q9Y2Z0|SGT1_HUMAN Protein SGT1 homolog OS=Homo sapiens GN=SUGT1 PE=1 SV=3 > >sp|Q9CX34|SGT1_MOUSE Protein SGT1 homolog OS=Mus musculus GN=Sugt1 PE=1 SV=3 > >sp|Q0JL44|SGT1_ORYSJ Protein SGT1 homolog OS=Oryza sativa subsp. japonica GN=SGT1 PE=1 SV=1 > >sp|B0BN85|SGT1_RAT Protein SGT1 homolog OS=Rattus norvegicus GN=Sugt1 PE=2 SV=1 > > The bovin is the first hit. I can't really see anything different about that. > > I don't know Perl that well. Do you have some code which I can use to debug this? In line 58 it tries to access the blast hash with the ID as a key, if I understand this correctly. Either the hash is empty where the key tries to access it, or the key is empty. If I could print each ID as it is found, maybe I can find a pattern. And/or print each blast entry when the blast hash is created. > > Thank you. > > Ole > > On 16 December 2015 at 21:55, Carson Holt wrote: > Find the hit for GAMO_00029233 and then pull its header line out of the Uniprot fasta file. There may be an unexpected formatting difference in that header. 
> > --Carson > > > >> On Dec 16, 2015, at 1:53 PM, Ole Kristian Tørresen wrote: >> >> Daniel, >> this is the previous gene, before maker_functional_gff: >> LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325; >> LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45; >> LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; >> LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; >> LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; >> LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; >> LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; >> LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; >> LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; >> LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; >> LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13654085 13654164 . 
- 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; >> LG08 maker mRNA 13786695 13806565 . - . ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47; >> >> After : >> LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus); >> LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus); >> LG08 maker exon 13648888 13648944 . - . 
ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; >> LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; >> LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; >> LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; >> LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; >> LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; >> LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; >> LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; >> LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13652270 13652365 . 
- 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> >> Carson, I saw that, but I did use Uniprot/Swiss-prot. A snap of the blast-output used as input here: >> GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 81 348 33 307 2e-92 285 >> GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 76 347 33 308 4e-92 284 >> GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 44 351 13 316 2e-86 270 >> GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 44 351 13 316 3e-86 269 >> GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 44 351 13 316 5e-84 264 >> GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 44 351 13 316 8e-83 261 >> GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 60 351 31 317 1e-80 255 >> GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 32 351 39 371 6e-69 226 >> GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 29 351 46 366 8e-66 218 >> GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 53 351 34 345 2e-59 201 >> GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 34 351 20 348 2e-59 201 >> GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 142 351 27 237 3e-24 103 >> GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 113 337 1 222 5e-22 97.1 >> GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 5 268 17 310 5e-89 275 >> GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 5 268 16 308 5e-86 268 >> GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 5 268 16 308 8e-86 267 >> GAMO_00029233-RA 
sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 5 268 16 337 1e-80 254 >> GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 10 268 16 337 5e-36 137 >> GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 9 268 11 328 3e-35 135 >> GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 24 268 26 320 7e-35 134 >> GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 138 268 196 357 5e-25 107 >> >> 521 genes had function added before maker_functional_gff choked on this particular gene, GAMO_00029233. >> >> Thank you. >> >> Ole >> >> >> On 16 December 2015 at 20:37, Carson Holt wrote: >> I've seen this exact same error before (https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ ). >> >> It is caused by the IDs in the blast report and the input protein fasta. maker_functional_gff is not a generic script that can work on any input; it only works on blast results against Uniprot/Swiss-prot. The script is expecting a very specific header format in both the report and the protein fasta, and if it doesn't see it, then it is missing certain pieces of needed information. >> >> Thanks, >> Carson >> >>> On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote: >>> >>> Hi Ole, can you send a line for a gene feature that does work? >>> >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> >>>> On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen wrote: >>>> >>>> Hi, >>>> I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error: >>>> Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947. >>>> Line 108947 in the input gff is this: >>>> >>>> LG08 maker gene 13786695 13806565 . - . 
ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; >>>> It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines. >>>> >>>> Any help to trace down this is really appreciated. Do you need any other information? >>>> >>>> Thank you. >>>> >>>> Sincerely, >>>> >>>> Ole Kristian Tørresen >>>> >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ole.toerresen at gmail.com Mon Jan 4 12:08:43 2016 From: ole.toerresen at gmail.com (Ole Kristian Tørresen) Date: Mon, 4 Jan 2016 20:08:43 +0100 Subject: [maker-devel] Error with maker_functional_gff In-Reply-To: References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu> Message-ID: I found the mistake: I used one version of SwissProt/UniProt for BLASTing and a different version as the option for maker_functional_gff. When I changed to the same version, the error went away. Sad to say, but things like different versions of SwissProt/UniProt do accumulate a bit over time... Thank you. Ole On 4 January 2016 at 17:04, Carson Holt wrote: > Perhaps the easiest way to look at this is if you send us the files. I'm > still leaning towards a format error. But it's the kind of thing where I > would need the files to find the specific entry. 
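A version mismatch of the kind described in this thread (BLAST run against one UniProt/Swiss-Prot release, a different release's FASTA passed to maker_functional_gff) can be spotted ahead of time by checking that every subject ID in the tabular BLAST report has a header in the FASTA file. This is only a rough sketch in Python, not anything shipped with MAKER: the function names are hypothetical, it assumes the subject ID is the second whitespace-separated column of the report and the first token of each FASTA header, and the one-entry FASTA below is deliberately incomplete for illustration.

```python
def fasta_ids(fasta_lines):
    """Collect the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def missing_subjects(blast_lines, known_ids):
    """Return subject IDs (column 2 of tabular BLAST output) not in known_ids."""
    missing = set()
    for line in blast_lines:
        columns = line.split()
        if len(columns) >= 2 and columns[1] not in known_ids:
            missing.add(columns[1])
    return sorted(missing)


# Header and report lines taken from the thread; the FASTA is cut down to a
# single entry here to imitate a release that lacks one of the hits.
fasta = [
    ">sp|Q2KIK0|SGT1_BOVIN Protein SGT1 homolog OS=Bos taurus GN=SUGT1 PE=2 SV=1",
    "MAAA",  # placeholder sequence, not the real protein
]
blast = [
    "GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 5 268 17 310 5e-89 275",
    "GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 5 268 16 308 5e-86 268",
]

missing = missing_subjects(blast, fasta_ids(fasta))
for subject in missing:
    print("no FASTA header for", subject)
```

An empty result does not prove the two files come from the same release, since headers can change while accessions stay the same, but a non-empty one is a strong hint that they do not.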
331 100 5 >> 5 268 16 337 1e-80 254 >> GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 >> 10 268 16 337 5e-36 137 >> GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 >> 9 268 11 328 3e-35 135 >> GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 >> 24 268 26 320 7e-35 134 >> GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 >> 138 268 196 357 5e-25 107 >> >> 521 genes have had added function before maker_functional_gff choked >> particular gene GAMO_00029233. >> >> Thank you. >> >> Ole >> >> >> On 16 December 2015 at 20:37, Carson Holt wrote: >> >>> I?ve seen this exact same error before ( >>> https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ >>> ). >>> >>> It is caused by the ID from the blast report and input protein >>> fasta. maker_functional_gff is not a generic script that can work on any >>> input, it only works on blast results against Uniprot/Swiss-prot. The >>> script is expecting a very specific header format in both the report and >>> the protein fasta and if it doesn?t see it, then it is missing certain >>> pieces of needed information. >>> >>> Thanks, >>> Carson >>> >>> On Dec 16, 2015, at 12:27 PM, Daniel Ence >>> wrote: >>> >>> Hi Ole, can you send a line for a gene feature that does work? >>> >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> >>> On Dec 14, 2015, at 12:21 PM, Ole Kristian T?rresen < >>> ole.toerresen at gmail.com> wrote: >>> >>> Hi, >>> I'm trying to update my annotation with some functional annotations >>> with maker_functional_gff, but get this annoying error: >>> Can't use string ("") as a HASH ref while "strict refs" in use at >>> /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, >>> <$IN> line 108947. 
>>> >>> Line 108947 in the input gff is this: >>> >>> LG08 maker gene 13786695 13806565 . - >>> . >>> ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; >>> >>> It seems like the regexp in line 55 in the maker_functional_gff script >>> doesn't pick up the ID, but I can't see any difference between that line >>> and other similar lines. >>> >>> Any help to trace down this is really appreciated. Do you need any other >>> information? >>> >>> Thank you. >>> >>> Sincerely, >>> >>> Ole Kristian Tørresen >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >> >> > > From s.shaw at abdn.ac.uk Fri Jan 8 08:05:16 2016 From: s.shaw at abdn.ac.uk (Shaw, Sophie) Date: Fri, 8 Jan 2016 15:05:16 +0000 Subject: [maker-devel] Moving Annotation to New Assembly Message-ID: Dear Maker Team, I have reassembled some data that was previously assembled with different software and then annotated using MAKER. I want to transfer the MAKER annotation to the new fasta file. I've followed the instructions in the post here - https://groups.google.com/forum/#!searchin/maker-devel/est_forward/maker-devel/q9fxXGKO8mk/0ATwhJvZeI4J However, most of the information in the final column of the GFF has not been transferred over; only the gene name has. For example: The original annotation is as follows: scaffold_252 maker gene 3018 4307 . + . 
ID=CAUR_05562;Name=CAUR_05562;Alias=augustus_masked-scaffold_252-processed-gene-0.0;Note=Similar to VHS1: Serine/threonine-protein kinase VHS1 (Saccharomyces cerevisiae (strain ATCC 204508 / S288c));Dbxref=Gene3D:G3DSA:1.10.510.10,Gene3D:G3DSA:3.30.200.20,InterPro:IPR000719,InterPro:IPR002290,InterPro:IPR008271,InterPro:IPR011009,InterPro:IPR017441,PANTHER:PTHR24343,PANTHER:PTHR24343:SF90,Pfam:PF00069,ProSitePatterns:PS00107,ProSitePatterns:PS00108,ProSiteProfiles:PS50011,SMART:SM00220,SUPERFAMILY:SSF56112;Ontology_term=GO:0004672,GO:0005524,GO:0006468,GO:0016772; And the new annotation after running MAKER with est_forward=1: scaffold_21 maker gene 18116 19405 . - . ID=maker-scaffold_21-exonerate_est2genome-gene-0.25;Name=CAUR_05562-RA-gene Is there a way of pulling the Note part of the gff file over as well as the gene name (and is this even a correct thing to do - should I be re-running MAKER entirely?). The researchers don't want to lose the information gained from the work on the previous annotation. All the Best, Sophie Shaw - Dr. Sophie Shaw Bioinformatician Centre for Genome Enabled Biology and Medicine University of Aberdeen 23 St. Machar Drive Old Aberdeen AB24 3RY https://www.abdn.ac.uk/genomics/ The University of Aberdeen is a charity registered in Scotland, No SC013683. Tha Oilthigh Obar Dheathain na charthannas cl?raichte ann an Alba, ?ir. SC013683. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hcma at uci.edu Mon Jan 11 17:21:11 2016 From: hcma at uci.edu (hcma) Date: Mon, 11 Jan 2016 16:21:11 -0800 Subject: [maker-devel] basic question for MAKER Message-ID: <2ed9dc6119cdaa218cf453b8390d28e8@uci.edu> Hi, I have some basic questions regarding how to use MAKER. Do I have to download the following file myself? 
Repeatmasker.gff file, genome sequence, protein, EST. I would like to incorporate my RNA-seq data. I have a transcriptome assembly generated using Trinity; how do I incorporate this, and can I use MAKER or do I have to use MAKER2? Thanks for your time and any comments will be greatly appreciated. Best Regards Karen From hcma at uci.edu Wed Jan 13 11:09:14 2016 From: hcma at uci.edu (hcma) Date: Wed, 13 Jan 2016 10:09:14 -0800 Subject: [maker-devel] basic question on maker Message-ID: Hi, I would like to include a de novo assembled transcriptome assembly for running maker. The organism I am working with is fly and I am wondering what is the best way to do this? Do I need to get the input files for running Repeatmasker or just set: model_org=all What's the best protein sequence file to use? Is 'uniprot_sprot.fasta' ok? Some people use a Trinity transcriptome assembly to generate a training set for Augustus and then run maker again. Is this a better way than running maker just once? Thanks for your time and any comments will be greatly appreciated. Best Regards Karen From carsonhh at gmail.com Thu Jan 14 13:01:00 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 14 Jan 2016 13:01:00 -0700 Subject: [maker-devel] basic question on maker In-Reply-To: References: Message-ID: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Hi Karen, All your questions may be best answered by this tutorial on the MAKER wiki --> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014 There is also a video link on the wiki page if you want to follow that. Thanks, Carson > On Jan 13, 2016, at 11:09 AM, hcma wrote: > > Hi, > > I would like to include a de novo assembled transcriptome assembly for running maker. The organism i am working with is fly and I am wondering what is the best way to do this? > > Do I need to get the input files for running Repeatmasker or just set: > > model_org=all > > What's the best protein sequence file to use? 
> > is ' uniprot_sprot.fasta' ok? > > > Some people use Trinity transcriptome assembly to generate a train set for Augustus and then run maker again, is this a better way than running maker just once? > > > Thanks for your time and any comments will be greatly appreciated. > > Best Regards > Karen > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu Jan 14 13:35:10 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 14 Jan 2016 13:35:10 -0700 Subject: [maker-devel] Moving Annotation to New Assembly In-Reply-To: References: Message-ID: <7418369D-6EDB-4C61-B3F7-CF5FFF797FA2@gmail.com> We do not have a tool that will copy attributes from one GFF3 file to another based on an ID match. Your needs are specific enough that you may have to write a script yourself to copy the attributes you care about. Truthfully, I would recommend rerunning interproscan and blastp against swiss-prot, as these could probably use an update anyway. The est_forward tool used to pull IDs forward is based solely on alignment (they will not all be exact matches or complete matches, just best matches), so you cannot guarantee that all domain content will be completely identical. Interpro and swiss-prot are also updated periodically, so running against the most recent releases can give more functional info. The purist in me would be inclined to redo the interproscan analysis and the blastp against swiss-prot. Then you can use the maker_functional_gff, ipr_update_gff, and iprscan2gff3 scripts to properly add everything back in a way similar to the previous annotations. 
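For anyone who still wants to carry the old attributes forward, the "write a script yourself" route could look roughly like the sketch below. It is only an illustration, not part of MAKER: the function names are made up, the list of attributes to keep (Note, Dbxref, Ontology_term) is just taken from Sophie's example, and the rule that the new Name is the old gene ID plus a "-RA-gene" style suffix is an assumption based on the single record shown in this thread. Since est_forward matches are best matches by alignment, anything copied this way should be treated as provisional.

```python
import re

# Attributes to carry over; chosen from the example record in this thread.
KEEP = ("Note", "Dbxref", "Ontology_term")


def parse_attrs(col9):
    """Split a GFF3 attribute column into a dict, preserving order."""
    attrs = {}
    for field in col9.strip().strip(";").split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def format_attrs(attrs):
    """Join a dict back into a GFF3 attribute column."""
    return ";".join(f"{key}={value}" for key, value in attrs.items())


def old_gene_id(new_name):
    """Recover the old gene ID from a forwarded Name.

    Assumption: names look like '<old gene ID>-RA-gene', as in the
    single example above; adjust the pattern if yours differ.
    """
    return re.sub(r"-R[A-Z]+-gene$", "", new_name)


def collect(old_lines, keep=KEEP):
    """Map old gene ID -> the attributes worth keeping."""
    by_id = {}
    for line in old_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = parse_attrs(cols[8])
        if "ID" in attrs:
            by_id[attrs["ID"]] = {k: attrs[k] for k in keep if k in attrs}
    return by_id


def transfer(new_lines, by_id):
    """Yield the new GFF3 lines, adding kept attributes to gene features."""
    for line in new_lines:
        cols = line.rstrip("\n").split("\t")
        if not line.startswith("#") and len(cols) == 9 and cols[2] == "gene":
            attrs = parse_attrs(cols[8])
            extra = by_id.get(old_gene_id(attrs.get("Name", "")), {})
            for key, value in extra.items():
                attrs.setdefault(key, value)  # never overwrite new values
            cols[8] = format_attrs(attrs)
            yield "\t".join(cols)
        else:
            yield line.rstrip("\n")
```

To use it, open both files, pass the old one to collect(), and write out the lines that transfer() yields for the new one. Only gene features are handled here; mRNA-level attributes would need the same treatment keyed on the mRNA name.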
?Carson > On Jan 8, 2016, at 8:05 AM, Shaw, Sophie wrote: > > Dear Maker Team, > > I have reassembled some data that was previously assembled with different software and then annotated using MAKER. I want to transfer the MAKER annotation to the new fasta file. I?ve followed the instructions in the post here - https://groups.google.com/forum/#!searchin/maker-devel/est_forward/maker-devel/q9fxXGKO8mk/0ATwhJvZeI4J > > However all of the information in the final column of the GFF has not been transferred over, just the gene name. For example: > > The original annotation is as follows: > scaffold_252 maker > gene 3018 > 4307 . > + . > ID=CAUR_05562;Name=CAUR_05562;Alias=augustus_masked-scaffold_252-processed-gene-0.0;Note=Similar to VHS1: Serine/threonine-protein kinase VHS1 (Saccharomyces cerevisiae (strain ATCC 204508 / S288c));Dbxref=Gene3D:G3DSA:1.10.510.10,Gene3D:G3DSA:3.30.200.20,InterPro:IPR000719,InterPro:IPR002290,InterPro:IPR008271,InterPro:IPR011009,InterPro:IPR017441,PANTHER:PTHR24343,PANTHER:PTHR24343:SF90,Pfam:PF00069,ProSitePatterns:PS00107,ProSitePatterns:PS00108,ProSiteProfiles:PS50011,SMART:SM00220,SUPERFAMILY:SSF56112;Ontology_term=GO:0004672,GO:0005524,GO:0006468,GO:0016772; > > And the new annotation after running MAKER with est_forward=1: > scaffold_21 maker > gene 18116 > 19405 . > - . > ID=maker-scaffold_21-exonerate_est2genome-gene-0.25;Name=CAUR_05562-RA-gene > > Is there a way of pulling the Note part of the gff file over as well as the gene name (and is this even a correct thing to do - should I be re-running MAKER entirely?). The researchers don?t want to lose the information gained from the work on the previous annotation. > > All the Best, > > Sophie Shaw > > ? > Dr. Sophie Shaw > Bioinformatician > Centre for Genome Enabled Biology and Medicine > University of Aberdeen > 23 St. Machar Drive > Old Aberdeen > AB24 3RY > https://www.abdn.ac.uk/genomics/ > > > > > The University of Aberdeen is a charity registered in Scotland, No SC013683. 
> Tha Oilthigh Obar Dheathain na charthannas clàraichte ann an Alba, Àir. SC013683. > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From hcma at uci.edu Thu Jan 14 16:44:39 2016 From: hcma at uci.edu (hcma) Date: Thu, 14 Jan 2016 15:44:39 -0800 Subject: [maker-devel] basic question on maker In-Reply-To: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Message-ID: Hi Carson, Thanks for the link. Can maker2 be run without inputting any protein sequences? How do I turn this off in the control files? Also, can I run maker using Augustus and not SNAP? Again, how do I turn SNAP off? Does maker also predict non-coding genes? Thanks. Best Regards Karen On 2016-01-14 12:01, Carson Holt wrote: > Hi Karen, > > All your questions may be best answered from this tutorial on the > MAKER wiki --> > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014 > [1] > > There is also a video link on the wiki page if you want to follow > that. > > Thanks, > Carson > >> On Jan 13, 2016, at 11:09 AM, hcma wrote: >> >> Hi, >> >> I would like to include a de novo assembled transcriptome assembly >> for running maker. The organism i am working with is fly and I am >> wondering what is the best way to do this? >> >> Do I need to get the input files for running Repeatmasker or just >> set: >> >> model_org=all >> >> What's the best protein sequence file to use? >> >> is ' uniprot_sprot.fasta' ok? >> >> Some people use Trinity transcriptome assembly to generate a train >> set for Augustus and then run maker again, is this a better way than >> running maker just once? >> >> Thanks for your time and any comments will be greatly appreciated. 
>> >> Best Regards >> Karen >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > Links: > ------ > [1] > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014 From carsonhh at gmail.com Fri Jan 15 10:16:27 2016 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 15 Jan 2016 10:16:27 -0700 Subject: [maker-devel] basic question on maker In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Message-ID: > Can maker2 be run without inputting any protein sequences? Yes, but it will not perform as well. > How to turn this off in the control files? Any option left blank is off. > Also, can i run maker using Augustus and not SNAP? Again, how do i turn SNAP off? Yes. Leave the SNAP option blank. > Does maker also predict non-coding genes? You can run it with tRNAscan or snoscan. Snoscan requires you to have rRNAs from your organism to train with, though. --Carson From hcma at uci.edu Fri Jan 15 15:39:25 2016 From: hcma at uci.edu (hcma) Date: Fri, 15 Jan 2016 14:39:25 -0800 Subject: [maker-devel] basic question on maker In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Message-ID: Hi Carson, Regarding non-coding RNA predictions, does MAKER only predict tRNAs and rRNAs, and not other RNAs such as lncRNAs? Thanks again. Best Regards Karen On 2016-01-15 09:16, Carson Holt wrote: >> Can maker2 be run without inputting any protein sequences? > > Yes. But it will not perform as well. > >> How to turn this off in the control files? > > Any option left blank is off. > > >> Also, can i run maker using Augustus and not SNAP? Again, how do i >> turn SNAP off? > > Yes. Leave it blank. > > >> Does maker also predict non-coding genes? > > You can run it with tRNAscan or snoscan. Snoscan requires you to have > rRNAs from your organism to train with though. 
> > --Carson From dence at genetics.utah.edu Fri Jan 15 15:51:44 2016 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 15 Jan 2016 22:51:44 +0000 Subject: [maker-devel] basic question on maker In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Message-ID: <1C311E8C-20F3-48DB-A982-925AEECD7636@genetics.utah.edu> Hi Karen, I don't know of a unified tool that predicts lncRNAs from genomic sequence. I found a tool that predicts lncRNAs from an RNA-seq dataset, which you might be able to use for your project. I've never used it, but it might be a starting place. http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-311 Here's also a review that describes several workflows for annotating lncRNAs in insect genomes: http://www.sciencedirect.com/science/article/pii/S2214574515000061 Hope that helps, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On Jan 15, 2016, at 3:39 PM, hcma wrote: Hi Carlson, Regarding non-coding RNA predictions, MAKER only predicts tRNAs and rRNAs, but not other RNAs, for example, lncRNAs? Thanks again. Best Regards Karen On 2016-01-15 09:16, Carson Holt wrote: Can maker2 be run without inputting any protein sequences? Yes. But it will not perform as well. How to turn this off in the control files? Any option left blank is off. Also, can i run maker using Augustus and not SNAP? Again, how do i turn SNAP off? Yes. Leave it blank. Does maker also predict non-coding genes? You can run it with tRNAscan or snoscan. Snoscan requires you to have rRNAs from your organism to train with though. --Carson _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From michael.s.campbell1 at gmail.com Fri Jan 15 18:11:05 2016 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Fri, 15 Jan 2016 17:11:05 -0800 Subject: [maker-devel] basic question on maker In-Reply-To: References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com> Message-ID: <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com> Hi Karen, Just a quick clarification: MAKER doesn't predict the rRNAs. If you give MAKER the rRNA sequence with the O-methylation sites, it will run snoscan to predict snoRNAs. Take care, Mike > On Jan 15, 2016, at 2:39 PM, hcma wrote: > > Hi Carlson, > > Regarding non-coding RNA predictions, MAKER only predicts tRNAs and rRNAs, but not other RNAs, for example, lncRNAs? > > Thanks again. > > Best Regards > Karen > > > > > On 2016-01-15 09:16, Carson Holt wrote: >>> Can maker2 be run without inputting any protein sequences? >> Yes. But it will not perform as well. >>> How to turn this off in the control files? >> Any option left blank is off. >>> Also, can i run maker using Augustus and not SNAP? Again, how do i turn SNAP off? >> Yes. Leave it blank. >>> Does maker also predict non-coding genes? >> You can run it with tRNAscan or snoscan. Snoscan requires you to have >> rRNAs from your organism to train with though. >> --Carson > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From nguyenan at mail.nih.gov Tue Jan 19 13:18:36 2016 From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C]) Date: Tue, 19 Jan 2016 20:18:36 +0000 Subject: [maker-devel] MAKER version 3 beta Message-ID: Hello, I just wanted to know if MAKER version 3 beta (EVM integration) is already available for download? https://groups.google.com/forum/#!searchin/maker-devel/EVM|sort:date/maker-devel/YzsN-t0gu0U/-A_7YT2gFwAJ Thank you very much! 
Anh-Dao From carsonhh at gmail.com Tue Jan 19 13:23:54 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 19 Jan 2016 13:23:54 -0700 Subject: [maker-devel] MAKER version 3 beta In-Reply-To: References: Message-ID: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com> Yes. Go to the registration page for the standard MAKER download. After registering, you will be redirected to a page with links to both the current version of MAKER and the beta --> http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi --Carson > On Jan 19, 2016, at 1:18 PM, Nguyen, Anh-Dao (NIH/NHGRI) [C] wrote: > > Hello, > > I just wanted to know if MAKER version 3 beta (EVM integration) has > already been available for downloading? > > https://groups.google.com/forum/#!searchin/maker-devel/EVM|sort:date/maker- > devel/YzsN-t0gu0U/-A_7YT2gFwAJ > > Thank you very much! > Anh-Dao > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From macmanes at gmail.com Tue Jan 19 13:34:38 2016 From: macmanes at gmail.com (Matthew MacManes) Date: Tue, 19 Jan 2016 15:34:38 -0500 Subject: [maker-devel] MAKER version 3 beta In-Reply-To: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com> References: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com> Message-ID: Just checking: when installing from the beta, I still see ./maker --version 2.32 Was expecting 3.00. Thanks, Matt ______________________________________________ Matthew MacManes, Ph.D. University of New Hampshire I Assistant Professor of Genome Enabled Biology Department of Molecular, Cellular, & Biomedical Sciences Durham, NH 03824 Phone: 603-862-4052 
| Twitter: @macmanes | Web: genomebio.org Office: 189 Rudman Hall | Laboratory: 145 Rudman Hall On January 19, 2016 at 3:24:16 PM, Carson Holt (carsonhh at gmail.com) wrote: Yes. Go to the registration page for the standard MAKER download. After registering, you will be redirected to a page with links to both the current version of MAKER as well as the beta --> http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi --Carson On Jan 19, 2016, at 1:18 PM, Nguyen, Anh-Dao (NIH/NHGRI) [C] wrote: Hello, I just wanted to know if MAKER version 3 beta (EVM integration) has already been available for downloading? https://groups.google.com/forum/#!searchin/maker-devel/EVM|sort:date/maker-devel/YzsN-t0gu0U/-A_7YT2gFwAJ Thank you very much! Anh-Dao _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue Jan 19 13:35:55 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 19 Jan 2016 13:35:55 -0700 Subject: [maker-devel] MAKER version 3 beta In-Reply-To: References: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com> Message-ID: Thanks. I'll fix that. --Carson > On Jan 19, 2016, at 1:34 PM, Matthew MacManes wrote: > > Just checking, when installing from the beta, I still see > > ./maker --version > 2.32 > was expecting 3.00.. > > Thanks, Matt > > > > > ______________________________________________ > Matthew MacManes, Ph.D. > University of New Hampshire I Assistant Professor of Genome Enabled Biology > Department of Molecular, Cellular, & Biomedical Sciences > Durham, NH 03824 > Phone: 603-862-4052 | Twitter: @macmanes 
| Web: genomebio.org > Office: 189 Rudman Hall | Laboratory: 145 Rudman Hall > > On January 19, 2016 at 3:24:16 PM, Carson Holt (carsonhh at gmail.com) wrote: > >> Yes. Go to the registration page for the standard MAKER download. After registering, you will be redirected to a page with links to both the current version of MAKER as well as the beta --> http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi >> >> --Carson >> >> >> >>> On Jan 19, 2016, at 1:18 PM, Nguyen, Anh-Dao (NIH/NHGRI) [C] wrote: >>> >>> Hello, >>> >>> I just wanted to know if MAKER version 3 beta (EVM integration) has >>> already been available for downloading? >>> >>> https://groups.google.com/forum/#!searchin/maker-devel/EVM|sort:date/maker- >>> devel/YzsN-t0gu0U/-A_7YT2gFwAJ >>> >>> Thank you very much! >>> Anh-Dao >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From daren.card at gmail.com Wed Jan 20 08:27:28 2016 From: daren.card at gmail.com (Daren C. Card) Date: Wed, 20 Jan 2016 09:27:28 -0600 Subject: [maker-devel] Passing pre-masked repeats into Maker Message-ID: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com> Hello all, I'm about to use Maker to begin annotating a vertebrate genome. We use successive rounds of RepeatMasker to annotate repeats due to some library issues we've noticed with Repbase (at least in our critters) and to incorporate de novo repeats from RepeatModeler, a process I don't think Maker could match. I'm wondering what the best way to pass these annotations into Maker would be. 
I see that the thread at https://groups.google.com/forum/#!topic/maker-devel/7UbOIvwaaRM nicely outlines what Maker does with repeats, and it looks like I have 3 options: (1) reannotate in Maker, (2) pass in a RepeatMasker GFF, or (3) pass in a masked genome. #1 is problematic due to the reasons above. #2 looks like it would hard mask the complex repeats like we want, but will also hard mask the simple repeats, which wouldn't be ideal for evidence mapping from transcripts/proteins. #3 is cautioned against in the link above, and without an accompanying GFF, I would imagine that Maker wouldn't be able to release the masking to perform Exonerate polishing (Ns could be gaps or could be hard masking; it wouldn't know). The way I thought to get around these apparent issues (but let me know if my thinking is incorrect) is to separate simple and complex repeats from the final RepeatMasker GFF. Feed only the complex repeats into Maker as a GFF, so that they are hard masked and accounted for, and have Maker also run RepeatMasker, thus remasking the simple repeats (and maybe some other complex hits, primarily through RepeatRunner). Then Maker can presumably release the masking as needed. Would this type of workaround be a good idea or are there other options? Or am I just overthinking something that isn't really a problem? Thanks in advance for any help. Daren Daren Card Castoe Lab University of Texas at Arlington www.darencard.net From carsonhh at gmail.com Wed Jan 20 09:20:51 2016 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 20 Jan 2016 09:20:51 -0700 Subject: [maker-devel] Passing pre-masked repeats into Maker In-Reply-To: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com> References: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com> Message-ID: <6FD7CE4B-B944-4793-A822-9D395725ED6D@gmail.com> The strategy outlined would work. 
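As a rough illustration of the "separate simple and complex repeats" step, a small filter along these lines might be enough. This is a sketch under stated assumptions, not something MAKER or RepeatMasker ships: it assumes the GFF written by RepeatMasker carries the repeat name in the ninth column as Target "Motif:NAME", and that simple and low-complexity hits can be recognised by names such as (CA)n or AT_rich. Both the function names and the naming pattern are hypothetical and should be checked against your own output. Whether the complex-repeat file needs further conversion before MAKER accepts it is not covered here.

```python
import re

# Assumption: the repeat name appears as Motif:NAME in column 9.
MOTIF = re.compile(r'Motif:([^"\s;]+)')

# Assumption: simple repeats look like (CA)n and low-complexity
# regions end in _rich or -rich. Adjust for your library.
SIMPLE = re.compile(r"^\([ACGTN]+\)n$|[-_]rich$", re.IGNORECASE)


def is_simple(attr_col):
    """Return True if the attribute column names a simple repeat."""
    match = MOTIF.search(attr_col)
    return bool(match and SIMPLE.search(match.group(1)))


def split_repeats(lines):
    """Split RepeatMasker GFF lines into (simple, complex) lists.

    Comment lines and lines with fewer than nine columns are skipped.
    Hits with no recognisable motif name are treated as complex.
    """
    simple, complex_ = [], []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            continue
        target = simple if is_simple(cols[8]) else complex_
        target.append(line.rstrip("\n"))
    return simple, complex_
```

Reading the RepeatMasker GFF line by line and writing the second list to its own file gives the complex-only set described in the question; the first list is kept only in case the simple repeats are wanted later.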
To get RepeatMasker to call only simple repeats in MAKER, set model_org=simple in the control files. --Carson > On Jan 20, 2016, at 8:27 AM, Daren C. Card wrote: > > Hello all, > > I'm about to use Maker to begin annotating a vertebrate genome. We use successive rounds of RepeatMasker to annotate repeats due to some library issues we've noticed with Repbase (at least in our critters) and to incorporate de novo repeats from RepeatModeler, a process I don't think Maker could match. I'm wonder what the best way to pass these annotations into Maker would be. > > I see the thread at https://groups.google.com/forum/#!topic/maker-devel/7UbOIvwaaRM nicely outlines what Maker does with repeats, and it looks like I have 3 options: (1) reannotate in Maker, (2) pass in a RepeatMasker GFF, or (3) pass in a masked genome. > > #1 is problematic due to the reasons above. > > #2 looks like it would hard mask the complex repeats like we want, but will also hard mask the simple repeats, which wouldn't be ideal for evidence mapping from transcripts/proteins. > > #3 is cautioned against in the link above, and without an accompanying GFF, I would imagine that Maker wouldn't be able to release the masking to perform Exonerate polishing (Ns could be gaps or could be hard masking, it wouldn't know). > > The way I thought to get around these apparent issues (but let me know if my thinking is incorrect) is to separate simple and complex repeats from the final RepeatMasker GFF. Feed only the complex repeats into Maker as a GFF, so that they are hard masked and accounted for, and have Maker also run RepeatMasker, thus remaking the simple repeats (and maybe some other complex hits, primarily through RepeatRunner). Then Maker can presumedly release the masking as needed. > > Would this type of workaround be a good idea or are there other options? Or am I just overthinking something that isn't really a problem? > > Thanks in advance for any help. 
> > Daren > > Daren Card > Castoe Lab > University of Texas at Arlington > www.darencard.net _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Wed Jan 20 09:21:38 2016 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 20 Jan 2016 16:21:38 +0000 Subject: [maker-devel] Passing pre-masked repeats into Maker In-Reply-To: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com> References: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com> Message-ID: Hi Daren, I think the solution you described sounds appropriate. If you're concerned about how the simple repeats in the GFF will be handled by maker, then you can just take those out. If they're important for downstream analysis, you can add them back in afterwards. Let me know if that helps or if other issues arise. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On Jan 20, 2016, at 8:27 AM, Daren C. Card wrote: Hello all, I'm about to use Maker to begin annotating a vertebrate genome. We use successive rounds of RepeatMasker to annotate repeats due to some library issues we've noticed with Repbase (at least in our critters) and to incorporate de novo repeats from RepeatModeler, a process I don't think Maker could match. I'm wonder what the best way to pass these annotations into Maker would be. I see the thread at https://groups.google.com/forum/#!topic/maker-devel/7UbOIvwaaRM nicely outlines what Maker does with repeats, and it looks like I have 3 options: (1) reannotate in Maker, (2) pass in a RepeatMasker GFF, or (3) pass in a masked genome. #1 is problematic due to the reasons above. 
#2 looks like it would hard mask the complex repeats like we want, but will also hard mask the simple repeats, which wouldn?t be ideal for evidence mapping from transcripts/proteins. #3 is cautioned against in the link above, and without an accompanying GFF, I would imagine that Maker wouldn?t be able to release the masking to perform Exonerate polishing (Ns could be gaps or could be hard masking, it wouldn?t know). The way I thought to get around these apparent issues (but let me know if my thinking is incorrect) is to separate simple and complex repeats from the final RepeatMasker GFF. Feed only the complex repeats into Maker as a GFF, so that they are hard masked and accounted for, and have Maker also run RepeatMasker, thus remaking the simple repeats (and maybe some other complex hits, primarily through RepeatRunner). Then Maker can presumedly release the masking as needed. Would this type of workaround be a good idea or are there other options? Or am I just overthinking something that isn?t really a problem? Thanks in advance for any help. Daren Daren Card Castoe Lab University of Texas at Arlington www.darencard.net _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcornel3 at asu.edu Fri Jan 22 14:38:14 2016 From: jcornel3 at asu.edu (John Cornelius) Date: Fri, 22 Jan 2016 14:38:14 -0700 Subject: [maker-devel] Question on post processing of annotations Message-ID: Hi, I'm using the maker_map_ids script to change the gene ids on an annotation that I just finished. However, I noticed that it does not change the name of genes predicted by SNAP. Is there any way to include SNAP genes for consideration by maker_map_ids? Thanks. 
--
John Cornelius
MCB PhD Candidate
Arizona State University

From carsonhh at gmail.com Fri Jan 22 15:01:29 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 22 Jan 2016 15:01:29 -0700
Subject: [maker-devel] Question on post processing of annotations
In-Reply-To:
References:
Message-ID:

Hi John,

Do you mean the match/match_part features that are source snap_masked? Those are not genes, they are reference alignments representing the ab initio SNAP calls, and it would be incorrect to rename them. They do not have a 1 to 1 relationship with the final gene models. Sometimes a gene model will overlap 2 or more uninformed SNAP ab initio reference alignments, or one SNAP reference alignment may overlap multiple final gene models, so names cannot just be passed from one to the other.

If you want to add specific SNAP models to the final annotation set, you would need to upgrade them to being a gene/mRNA/exon/CDS feature before you can do that. You can do that with manual editors like Apollo, or you can supply a subset of the features you want to upgrade to maker in the pred_gff= option as a separate run, put existing models in model_gff=, and run with keep_preds=1.

I know I have covered this previously in greater detail as part of the devel list. If you search the archives for the keywords pred_gff, keep_preds, and iprscan you should come across a number of threads that may be helpful --> https://groups.google.com/forum/#!forum/maker-devel

Thanks,
Carson

From jcornel3 at asu.edu Fri Jan 22 15:06:17 2016
From: jcornel3 at asu.edu (John Cornelius)
Date: Fri, 22 Jan 2016 15:06:17 -0700
Subject: [maker-devel] Question on post processing of annotations
In-Reply-To:
References:
Message-ID:

I'll look into that, thanks. Previously I had just been looking for things related to the script itself and its functionality.

--
John Cornelius
MCB PhD Candidate
Arizona State University

From sapuizait at gmail.com Wed Jan 27 06:14:45 2016
From: sapuizait at gmail.com (Panos Sapou)
Date: Wed, 27 Jan 2016 14:14:45 +0100
Subject: [maker-devel] prokaryotic genome annotation
Message-ID:

Dear all

I recently started using maker for the annotation of my prokaryotic genomes and even if I managed to get some nice results I would like to check with you if what I did was right and also ask you a couple of questions about the procedure

I also apologize in advance if I ask something silly since I am a newbie in bioinformatics and I might ask very basic stuff

I have only available DNA sequences, I have no ESTs and no proteins

1) I started by using the protein2genome option and as reference I used the Uniref50 database.
Then I generated a merged gff file (similar procedure to the one in the MAKER tutorial)

2) I used Genemark.S and I created a model by using the gmsn.pl command and as input the assembled contigs of my bacteria

3) after finishing the above 2 steps I ran maker again, using as input the gff file from step 1:
#-------Re-annotation using maker derived GFF3: maker_gff=input.gff
and I also set
protein_pass=1
is that correct? do you think it helps?
and at the #-----gene prediction I used the hmm.mod file generated in step 2

my questions:
Do the above sound correct?

it is my understanding that I can only use genemark for prokaryotic genomes, is that correct?

when I run maker the second time (step 3) should I set protein2genome=1 or 0? or is just having the gff file (from step 1) in the re-annotation options enough, because prediction based on protein2genome has already been done?

Also if I use a gff file (from step 1) will it make any difference if I set protein2genome=1 and use an extra (different) database? (I was wondering if it will improve the results?)

finally regarding the choice of the database: would you advise me to use uniref or the proteomes of closely related bacteria (I have downloaded and created a single fasta from approx. 100 proteomes of closely related bacteria)

thank you in advance
and once again I apologize if what I am asking is pretty basic, just wanted to make sure...

Best
Panos

From carsonhh at gmail.com Wed Jan 27 08:17:37 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 27 Jan 2016 08:17:37 -0700
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To:
References:
Message-ID: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com>

Hi Panos,

The strategy for annotating prokaryotes is very different than that for eukaryotes. Basically my recommendation is to use Genemark S and set protein2genome=1, keep_preds=1, always_complete=1, and no need for ESTs (irrelevant in prokaryotes). No need to do multiple iterations like you would for eukaryotes either. The bootstrapping procedure is not relevant for prokaryotes.

I'd also avoid using the GFF3 passthrough option, you will lose some information about the alignment that affects reading frame of the protein evidence. It can be convenient for large eukaryotes when you are pulling evidence from a database, but if it's just from a previous maker run, you should just rerun in the same directory with the protein fasta. MAKER will detect that it already ran blastx and pull the raw reports from the previous datastore.

Thanks,
Carson

From cjfields at illinois.edu Wed Jan 27 11:30:29 2016
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 27 Jan 2016 18:30:29 +0000
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com>
References: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com>
Message-ID: <6B0EA45F-9526-4ED2-AED7-1DA3E1AEDD24@illinois.edu>

We're thinking of switching our bacterial pipeline to MAKER actually. We generally use other bacteria-specific gene pred tools like Glimmer and Prodigal, though I anticipate these could be added using pred_gff (as long as the GFF3 is fine)?

chris

From carsonhh at gmail.com Wed Jan 27 15:42:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 27 Jan 2016 15:42:59 -0700
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To: <6B0EA45F-9526-4ED2-AED7-1DA3E1AEDD24@illinois.edu>
References: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com> <6B0EA45F-9526-4ED2-AED7-1DA3E1AEDD24@illinois.edu>
Message-ID:

GFF3 is just fine for adding predictions. For prokaryotes, I don't like to add protein evidence that way, but for predictions it's fine. The only issue I could see going forward would be a lack of support for alternate codon usage in MAKER right now. Everything is being interpreted using the canonical codon table. It's not an insurmountable issue, but it would take some work to let it do that.
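
An illustrative sketch, not part of the original messages: the prokaryote settings recommended in this thread (protein2genome=1, keep_preds=1, always_complete=1, plus pred_gff for predictions from external tools such as Prodigal) are plain key=value lines in maker_opts.ctl, so they can be applied with a few lines of Python. The helper name set_ctl_options, the abbreviated sample comments, and the file name prodigal.gff3 are assumptions made for the example; only the option names come from the thread.

```python
import re

def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the `key=value` lines named in `updates` rewritten.

    Any trailing `#comment` on a rewritten line is preserved. Keys that do not
    already appear in the text are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        # Option lines look like: key=value #comment (section headers start with '#')
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            comment = m.group(3) or ""
            line = f"{key}={remaining.pop(key)} {comment}".rstrip()
        out.append(line)
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"

# Hypothetical excerpt of a control file; the comments are abbreviated placeholders.
SAMPLE_CTL = "\n".join([
    "#-----Gene Prediction",
    "protein2genome=0 #infer predictions from protein homology",
    "keep_preds=0 #concordance threshold for unsupported predictions",
    "pred_gff= #external ab initio predictions (GFF3)",
])

# Values taken from the thread; prodigal.gff3 is a made-up file name.
PROKARYOTE_UPDATES = {
    "protein2genome": 1,
    "keep_preds": 1,
    "always_complete": 1,
    "pred_gff": "prodigal.gff3",
}

if __name__ == "__main__":
    print(set_ctl_options(SAMPLE_CTL, PROKARYOTE_UPDATES), end="")
```

Editing the text rather than regenerating the file keeps the comments MAKER writes into its control files, which makes the result easier to review by eye.
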
--Carson

From sapuizait at gmail.com Fri Jan 29 03:12:35 2016
From: sapuizait at gmail.com (Panos Sapou)
Date: Fri, 29 Jan 2016 11:12:35 +0100
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To:
References:
Message-ID:

Dear all

I am trying to annotate a new spiroplasma strain and I would like to know if there is a way to change the stop codons (not take into account 'tga'), because otherwise I get too many premature stop codons and fragmented genes that are not real

Best
Panos

From carsonhh at gmail.com Sun Jan 31 13:43:21 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 31 Jan 2016 13:43:21 -0700
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To:
References:
Message-ID:

MAKER doesn't support alternate codon usage yet.
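
An illustrative sketch, not from the thread: Spiroplasma uses NCBI translation table 4, in which TGA encodes tryptophan instead of a stop. The Python below translates the same toy sequence with the canonical table and with TGA reassigned, which shows why models interpreted with the canonical table appear to contain premature stops. The function names and the example sequence are made up for the demonstration, and nothing here changes how MAKER itself behaves.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code (NCBI table 1), listed for codons in
# the order TTT, TTC, TTA, TTG, TCT, ... (third base cycling fastest over T, C, A, G).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def codon_table(reassign=None):
    """Build a codon -> amino acid dict, optionally overriding some codons."""
    table = {"".join(codon): aa
             for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
    table.update(reassign or {})
    return table

STANDARD = codon_table()
# Table 4 (Mycoplasma/Spiroplasma) differs from the standard code only at TGA.
TABLE4 = codon_table({"TGA": "W"})

def translate(cds, table):
    """Translate a DNA coding sequence codon by codon; '*' marks a stop."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(table[cds[i:i + 3]] for i in range(0, usable, 3))

if __name__ == "__main__":
    toy = "ATGAAATGAGGTTAA"  # ATG AAA TGA GGT TAA
    print(translate(toy, STANDARD))
    print(translate(toy, TABLE4))
```

With the canonical table the internal TGA is read as a stop, so the toy gene looks truncated; with the reassignment it reads through to the final TAA.
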
?Carson > On Jan 29, 2016, at 3:12 AM, Panos Sapou wrote: > > Dear all > > I am trying to annotate a new spiroplasma strain and I would like to know if there is a way to change the stop codons (not take into account 'tga') > > cause eitherwise I get too many premature stop codons and fragmented genes that are not real > > Best > Panos > > On 27 January 2016 at 14:14, Panos Sapou > wrote: > Dear all > > I recently started using maker for the annotation of my prokaryotic genomes and even if i managed to get some nice results I would like to check with you if what I did was right and also ask you a couple of questions about the procedure > > I also apologize in advance if I ask sth silly since I am a newbie in bionformatics and I might ask very basic stuff > > > I have only available DNA sequences, I have no ESTs and no proteins > > 1) I started by using the protein2genome option and as reference I used the Uniref50 database. Then I generated a merged gff file (similar procedure like the one in the tutorial maker) > > 2) I used Genemark.S and I created a model by using the gmsn.pl command and as input the assembled contigs of my bacteria > > 3) after finishing the above 2 steps I run maker again by using as input the gff file from step 1: #-------Re-annotation using maker derived GFF3: maker_gff=input.gff > and I also set > protein_pass=1 > is that correct? do you think it helps? > and at the #-----gene prediction I used the hmm.mod file generated in step 2 > > my questions: > Do the above sound correct? > > it is in my understanding that I can only use genemark for prokaryotic genomes, is that correct? > > when I run maker the second time (step 3) should I set protein2genome=1 or 0? or just having the gff file (from step 1) in the re-annotation options is enough? and thefore prediction based on the protein2genome has already been done? 
> > Also if I use a gff file (from step 1) will it make any difference if I set protein2genome=1 and use an extra (different) database? (I was wondering if it will improve the results?) > > finally regarding the choice of the database: would you advise me to use uniref or the proteomes of closely related bacteria (I have downloaded and created a single fasta from appx 100 proteomes of closely related bacteria) > > thank you in advance > and once again I apologize if it is pretty basic what I am asking, just wanted to make sure... > > > Best > Panos > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnstrm at gmail.com Fri Jan 1 11:31:06 2016 From: arnstrm at gmail.com (Arun Seetharam) Date: Fri, 1 Jan 2016 12:31:06 -0600 Subject: [maker-devel] running MAKER to merge annotations Message-ID: Hi all, First of all, a very happy new year to all of you! I hope everyone is having a great holiday season. I have a question about Maker. For my grass species, I ran 3 separate rounds of MAKER (low GC, regular GC and high GC) and I now have 3 maker gff files. So, what is the correct way to merge these files to a single gff file? Do I have to run a maker round with just the GFF files as input? It looks like EVM especially meant to do this kind of job, but not sure if Maker does this too. Thanks for any help or suggestions! Have a nice day, -- Arun Seetharam Post Doctoral Research Associate Genome Informatics Facility & EEOB Office of Biotechnology 228 Science I Iowa State University Ames, Iowa 50011 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dence at genetics.utah.edu Fri Jan 1 11:37:38 2016 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 1 Jan 2016 18:37:38 +0000 Subject: [maker-devel] running MAKER to merge annotations In-Reply-To: References: Message-ID: Hi Arun, are the three rounds of maker on different parts/versions of the genome or did you maker on the same genome with three different settings? If it?s the former, then you can merge the maker gff files with gff3_merge, which is included with your maker installation. If it?s the latter case then I do think EVM could help if you want to give the different result sets different confidence weights. If you want to give them all the same weight, then you could do another run of maker, and pass them through as either models or predictions. ~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 > On Jan 1, 2016, at 11:31 AM, Arun Seetharam wrote: > > Hi all, > > First of all, a very happy new year to all of you! I hope everyone is having a great holiday season. > > I have a question about Maker. For my grass species, I ran 3 separate rounds of MAKER (low GC, regular GC and high GC) and I now have 3 maker gff files. So, what is the correct way to merge these files to a single gff file? Do I have to run a maker round with just the GFF files as input? It looks like EVM especially meant to do this kind of job, but not sure if Maker does this too. > > Thanks for any help or suggestions! 
> > Have a nice day, > -- > Arun Seetharam > Post Doctoral Research Associate > Genome Informatics Facility & EEOB > Office of Biotechnology > 228 Science I > Iowa State University > Ames, Iowa 50011 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From arnstrm at gmail.com Fri Jan 1 12:17:18 2016 From: arnstrm at gmail.com (Arun Seetharam) Date: Fri, 1 Jan 2016 13:17:18 -0600 Subject: [maker-devel] running MAKER to merge annotations In-Reply-To: References: Message-ID: Hi Daniel, Thanks very much for the reply! It is the latter: same input genome under 3 settings (training was done using a different set of genes for all the gene predictors). I simply want to get a single gff, retaining only the best model(s) for each locus. Are you suggesting that I can run MAKER by providing 3 files for "maker_gff" (in maker_opts.ctl) and keeping everything else default? or do I have to do something in the CTL file to achieve this? I appreciate if you can provide more details for how to do this! Thanks once again for the reply! On Fri, Jan 1, 2016 at 12:37 PM, Daniel Ence wrote: > Hi Arun, are the three rounds of maker on different parts/versions of the > genome or did you maker on the same genome with three different settings? > If it?s the former, then you can merge the maker gff files with gff3_merge, > which is included with your maker installation. > > If it?s the latter case then I do think EVM could help if you want to give > the different result sets different confidence weights. If you want to give > them all the same weight, then you could do another run of maker, and pass > them through as either models or predictions. 
> > ~Daniel > > > > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > > On Jan 1, 2016, at 11:31 AM, Arun Seetharam wrote: > > > > Hi all, > > > > First of all, a very happy new year to all of you! I hope everyone is > having a great holiday season. > > > > I have a question about Maker. For my grass species, I ran 3 separate > rounds of MAKER (low GC, regular GC and high GC) and I now have 3 maker gff > files. So, what is the correct way to merge these files to a single gff > file? Do I have to run a maker round with just the GFF files as input? It > looks like EVM especially meant to do this kind of job, but not sure if > Maker does this too. > > > > Thanks for any help or suggestions! > > > > Have a nice day, > > -- > > Arun Seetharam > > Post Doctoral Research Associate > > Genome Informatics Facility & EEOB > > Office of Biotechnology > > 228 Science I > > Iowa State University > > Ames, Iowa 50011 > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Arun Seetharam Post Doctoral Research Associate Genome Informatics Facility & EEOB Office of Biotechnology 228 Science I Iowa State University Ames, Iowa 50011 -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Jan 1 12:26:14 2016 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 1 Jan 2016 12:26:14 -0700 Subject: [maker-devel] running MAKER to merge annotations In-Reply-To: References: Message-ID: <28388091-0956-412B-B472-0A272FA31269@gmail.com> If you are running with different settings on the exact same contig, you will have to merge the models using the -l legacy option of gff3_merge to ensure there will be no ID collisions (some things will have the same IDs in the different runs). 
Then supply just the genes to pred_gff on the rerun.

Alternatively, you could have provided your different predictor files as a comma-separated list (e.g. snaphmm=hmm1,hmm2,hmm3). MAKER would have run each one and kept just the one that best matched the evidence. However, because MAKER passes hints to the predictors (which override the HMM for the most part), I have found that running with different predictor settings because of GC differences between contigs doesn't provide the benefit you would think.

--Carson

From zhangzb554 at nenu.edu.cn Sat Jan 2 09:09:24 2016
From: zhangzb554 at nenu.edu.cn (张志斌)
Date: Sun, 3 Jan 2016 00:09:24 +0800 (GMT+08:00)
Subject: [maker-devel] maker-devel Digest, Vol 92, Issue 1
In-Reply-To:
Message-ID:

Hi everyone,
I wonder where I can download the Perl package Proc::Signal? I cannot find it on CPAN. Could someone send me the package or point me to the website where I can get it?

Thanks for your help.

From arnstrm at iastate.edu Fri Jan 1 11:05:35 2016
From: arnstrm at iastate.edu (Arun Seetharam)
Date: Fri, 1 Jan 2016 12:05:35 -0600
Subject: [maker-devel] running MAKER to merge annotations
Message-ID:

Hi all,

First of all, a very happy new year to all of you! I hope everyone is having a great holiday season.

I have a question about MAKER. For my grass species, I ran 3 separate rounds of MAKER (low GC, regular GC and high GC) and I now have 3 MAKER GFF files. So, what is the correct way to merge these files into a single GFF file? Do I have to run a MAKER round with just the GFF files as input? It looks like EVM is especially meant to do this kind of job, but I am not sure if MAKER does this too.

Thanks for any help or suggestions!

Have a nice day,

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Jan 4 09:04:29 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 4 Jan 2016 09:04:29 -0700
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To:
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
Message-ID:

Perhaps the easiest way to look at this is if you send us the files. I'm still leaning towards a format error. But it's the kind of thing where I would need the files to find the specific entry.

--Carson

> On Dec 16, 2015, at 11:32 PM, Ole Kristian Tørresen wrote:
>
> Here's the hits for GAMO_00029233
> >sp|Q9SUR9|SGT1A_ARATH Protein SGT1 homolog A OS=Arabidopsis thaliana GN=SGT1A PE=1 SV=1
> >sp|Q9SUT5|SGT1B_ARATH Protein SGT1 homolog B OS=Arabidopsis thaliana GN=SGT1B PE=1 SV=1
> >sp|Q2KIK0|SGT1_BOVIN Protein SGT1 homolog OS=Bos taurus GN=SUGT1 PE=2 SV=1
> >sp|Q55ED0|SGT1_DICDI Protein SGT1 homolog OS=Dictyostelium discoideum GN=sugt1 PE=2 SV=1
> >sp|Q9Y2Z0|SGT1_HUMAN Protein SGT1 homolog OS=Homo sapiens GN=SUGT1 PE=1 SV=3
> >sp|Q9CX34|SGT1_MOUSE Protein SGT1 homolog OS=Mus musculus GN=Sugt1 PE=1 SV=3
> >sp|Q0JL44|SGT1_ORYSJ Protein SGT1 homolog OS=Oryza sativa subsp. japonica GN=SGT1 PE=1 SV=1
> >sp|B0BN85|SGT1_RAT Protein SGT1 homolog OS=Rattus norvegicus GN=Sugt1 PE=2 SV=1
>
> The bovin is the first hit. I can't really see anything different about that.
>
> I'm don't know perl that well. Do you have some code which I can use to debug this? In line 58 it tries to access the blast hash with the ID as a key, if I understand this correctly. Either the hash is empty where the key tries to access, or the key is empty. If I could print each ID as it is found, maybe I can find a pattern. And/or print each blast entry when the blast hash is created.
>
> Thank you.
>
> Ole
>
> On 16 December 2015 at 21:55, Carson Holt wrote:
> Find the hit for GAMO_00029233 and then pull it's header line out of the Uniprot fasta file. There may be an unexpected formatting difference in that header.
>
> --Carson
>
>> On Dec 16, 2015, at 1:53 PM, Ole Kristian Tørresen wrote:
>>
>> Daniel,
>> this is the previous gene, before maker_functional_gff:
>> LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;
>> LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;
>> LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
>> LG08 maker mRNA 13786695 13806565 . - . ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47;
>>
>> After :
>> LG08 maker gene 13648888 13656687 . - . ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus);
>> LG08 maker mRNA 13648888 13656687 . - . ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar to Tmbim1: Protein lifeguard 3 (Mus musculus);
>> LG08 maker exon 13648888 13648944 . - . ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13649295 13649577 . - . ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13649816 13651468 . - . ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13651736 13651789 . - . ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13652270 13652365 . - . ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13652643 13652730 . - . ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653175 13653212 . - . ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653587 13653641 . - . ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653764 13653817 . - . ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13653910 13653974 . - . ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13654085 13654164 . - . ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13654474 13654828 . - . ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA;
>> LG08 maker exon 13656667 13656687 . - . ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13656667 13656687 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13654474 13654828 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13654085 13654164 . - 2 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653910 13653974 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653764 13653817 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653587 13653641 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13653175 13653212 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13652643 13652730 . - 1 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13652270 13652365 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13651736 13651789 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>>
>> Carson, I saw that, but I did use Uniprot/Swiss-prot. A snap of the blast-output used as input here:
>> GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 81 348 33 307 2e-92 285
>> GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 76 347 33 308 4e-92 284
>> GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 44 351 13 316 2e-86 270
>> GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 44 351 13 316 3e-86 269
>> GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 44 351 13 316 5e-84 264
>> GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 44 351 13 316 8e-83 261
>> GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 60 351 31 317 1e-80 255
>> GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 32 351 39 371 6e-69 226
>> GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 29 351 46 366 8e-66 218
>> GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 53 351 34 345 2e-59 201
>> GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 34 351 20 348 2e-59 201
>> GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 142 351 27 237 3e-24 103
>> GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 113 337 1 222 5e-22 97.1
>> GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 5 268 17 310 5e-89 275
>> GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 5 268 16 308 5e-86 268
>> GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 5 268 16 308 8e-86 267
>> GAMO_00029233-RA sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 5 268 16 337 1e-80 254
>> GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 10 268 16 337 5e-36 137
>> GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 9 268 11 328 3e-35 135
>> GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 24 268 26 320 7e-35 134
>> GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 138 268 196 357 5e-25 107
>>
>> 521 genes have had added function before maker_functional_gff choked particular gene GAMO_00029233.
>>
>> Thank you.
>>
>> Ole
>>
>>
>> On 16 December 2015 at 20:37, Carson Holt wrote:
>> I've seen this exact same error before (https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ ).
>>
>> It is caused by the ID from the blast report and input protein fasta. maker_functional_gff is not a generic script that can work on any input, it only works on blast results against Uniprot/Swiss-prot. The script is expecting a very specific header format in both the report and the protein fasta and if it doesn't see it, then it is missing certain pieces of needed information.
>>
>> Thanks,
>> Carson
>>
>>> On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote:
>>>
>>> Hi Ole, can you send a line for a gene feature that does work?
>>>
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>>
>>>> On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen wrote:
>>>>
>>>> Hi,
>>>> I'm trying to update my annotation with some functional annotations with maker_functional_gff, but get this annoying error:
>>>> Can't use string ("") as a HASH ref while "strict refs" in use at /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58, <$IN> line 108947.
>>>> Line 108947 in the input gff is this:
>>>>
>>>> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
>>>> It seems like the regexp in line 55 in the maker_functional_gff script doesn't pick up the ID, but I can't see any difference between that line and other similar lines.
>>>>
>>>> Any help to trace down this is really appreciated. Do you need any other information?
>>>>
>>>> Thank you.
>>>>
>>>> Sincerely,
>>>>
>>>> Ole Kristian Tørresen

From ole.toerresen at gmail.com Mon Jan 4 12:08:43 2016
From: ole.toerresen at gmail.com (Ole Kristian Tørresen)
Date: Mon, 4 Jan 2016 20:08:43 +0100
Subject: [maker-devel] Error with maker_functional_gff
In-Reply-To:
References: <1EBE8B59-ED4E-4017-99CE-6CD5A5662B74@genetics.utah.edu>
Message-ID:

I found the mistake, I used different versions of SwissProt/UniProt for BLASTing and as an option for maker_functional_gff. When I changed to the same version, the error went away. Sad to say, but stuff like different versions of SwissProt/UniProt do accumulate over time a bit...

Thank you.

Ole

On 4 January 2016 at 17:04, Carson Holt wrote:
> Perhaps the easiest way to look at this is if you send us the files. I'm
> still leaning towards a format error. But it's the kind of thing where I
> would need the files to find the specific entry.
> > ?Carson > > > > On Dec 16, 2015, at 11:32 PM, Ole Kristian T?rresen < > ole.toerresen at gmail.com> wrote: > > Here's the hits for GAMO_00029233 > >sp|Q9SUR9|SGT1A_ARATH Protein SGT1 homolog A OS=Arabidopsis thaliana > GN=SGT1A PE=1 SV=1 > >sp|Q9SUT5|SGT1B_ARATH Protein SGT1 homolog B OS=Arabidopsis thaliana > GN=SGT1B PE=1 SV=1 > >sp|Q2KIK0|SGT1_BOVIN Protein SGT1 homolog OS=Bos taurus GN=SUGT1 PE=2 SV=1 > >sp|Q55ED0|SGT1_DICDI Protein SGT1 homolog OS=Dictyostelium discoideum > GN=sugt1 PE=2 SV=1 > >sp|Q9Y2Z0|SGT1_HUMAN Protein SGT1 homolog OS=Homo sapiens GN=SUGT1 PE=1 > SV=3 > >sp|Q9CX34|SGT1_MOUSE Protein SGT1 homolog OS=Mus musculus GN=Sugt1 PE=1 > SV=3 > >sp|Q0JL44|SGT1_ORYSJ Protein SGT1 homolog OS=Oryza sativa subsp. japonica > GN=SGT1 PE=1 SV=1 > >sp|B0BN85|SGT1_RAT Protein SGT1 homolog OS=Rattus norvegicus GN=Sugt1 > PE=2 SV=1 > > The bovin is the first hit. I can't really see anything different about > that. > > I'm don't know perl that well. Do you have some code which I can use to > debug this? In line 58 it tries to access the blast hash with the ID as a > key, if I understand this correctly. Either the hash is empty where the key > tries to access, or the key is empty. If I could print each ID as it is > found, maybe I can find a pattern. And/or print each blast entry when the > blast hash is created. > > Thank you. > > Ole > > On 16 December 2015 at 21:55, Carson Holt wrote: > >> Find the hit for GAMO_00029233 and then pull it?s header line out of the >> Uniprot fasta file. There may be an unexpected formatting difference in >> that header. >> >> ?Carson >> >> >> >> On Dec 16, 2015, at 1:53 PM, Ole Kristian T?rresen < >> ole.toerresen at gmail.com> wrote: >> >> Daniel, >> this is the previous gene, before maker_functional_gff: >> LG08 maker gene 13648888 13656687 . - . >> ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325; >> LG08 maker mRNA 13648888 13656687 . - . 
>> >> ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45; >> LG08 maker exon 13648888 13648944 . - . >> ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; >> LG08 maker exon 13649295 13649577 . - . >> ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; >> LG08 maker exon 13649816 13651468 . - . >> ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; >> LG08 maker exon 13651736 13651789 . - . >> ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; >> LG08 maker exon 13652270 13652365 . - . >> ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; >> LG08 maker exon 13652643 13652730 . - . >> ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653175 13653212 . - . >> ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653587 13653641 . - . >> ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653764 13653817 . - . >> ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653910 13653974 . - . >> ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; >> LG08 maker exon 13654085 13654164 . - . >> ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; >> LG08 maker exon 13654474 13654828 . - . >> ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; >> LG08 maker exon 13656667 13656687 . - . >> ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13656667 13656687 . - 0 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13654474 13654828 . - 0 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13654085 13654164 . - 2 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653910 13653974 . - 0 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653764 13653817 . - 1 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653587 13653641 . 
- 1 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653175 13653212 . - 0 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13652643 13652730 . - 1 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13652270 13652365 . - 0 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13651736 13651789 . - 0 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13651319 13651468 . - 0 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13649816 13651318 . - >> . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13649295 13649577 . - >> . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker three_prime_UTR 13648888 13648944 . - >> . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA; >> LG08 maker gene 13786695 13806565 . - . >> ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343; >> LG08 maker mRNA 13786695 13806565 . - . >> >> ID=GAMO_00029233-RA;Parent=GAMO_00029233;Name=GAMO_00029233-RA;Alias=maker-LG08-snap-gene-46.343-mRNA-1;_AED=0.47;_QI=173|0.78|0.66|1|0.21|0.26|15|0|301;_eAED=0.47; >> >> After : >> LG08 maker gene 13648888 13656687 . - . >> >> ID=GAMO_00029212;Name=GAMO_00029212;Alias=maker-LG08-snap-gene-46.325;Note=Similar >> to Tmbim1: Protein lifeguard 3 (Mus musculus); >> LG08 maker mRNA 13648888 13656687 . - . >> >> ID=GAMO_00029212-RA;Parent=GAMO_00029212;Name=GAMO_00029212-RA;Alias=maker-LG08-snap-gene-46.325-mRNA-1;_AED=0.45;_QI=0|0.83|0.84|1|0.5|0.61|13|1843|351;_eAED=0.45;Note=Similar >> to Tmbim1: Protein lifeguard 3 (Mus musculus); >> LG08 maker exon 13648888 13648944 . - . >> ID=GAMO_00029212-RA:exon:9363;Parent=GAMO_00029212-RA; >> LG08 maker exon 13649295 13649577 . - . >> ID=GAMO_00029212-RA:exon:9362;Parent=GAMO_00029212-RA; >> LG08 maker exon 13649816 13651468 . - . 
>> ID=GAMO_00029212-RA:exon:9361;Parent=GAMO_00029212-RA; >> LG08 maker exon 13651736 13651789 . - . >> ID=GAMO_00029212-RA:exon:9360;Parent=GAMO_00029212-RA; >> LG08 maker exon 13652270 13652365 . - . >> ID=GAMO_00029212-RA:exon:9359;Parent=GAMO_00029212-RA; >> LG08 maker exon 13652643 13652730 . - . >> ID=GAMO_00029212-RA:exon:9358;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653175 13653212 . - . >> ID=GAMO_00029212-RA:exon:9357;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653587 13653641 . - . >> ID=GAMO_00029212-RA:exon:9356;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653764 13653817 . - . >> ID=GAMO_00029212-RA:exon:9355;Parent=GAMO_00029212-RA; >> LG08 maker exon 13653910 13653974 . - . >> ID=GAMO_00029212-RA:exon:9354;Parent=GAMO_00029212-RA; >> LG08 maker exon 13654085 13654164 . - . >> ID=GAMO_00029212-RA:exon:9353;Parent=GAMO_00029212-RA; >> LG08 maker exon 13654474 13654828 . - . >> ID=GAMO_00029212-RA:exon:9352;Parent=GAMO_00029212-RA; >> LG08 maker exon 13656667 13656687 . - . >> ID=GAMO_00029212-RA:exon:9351;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13656667 13656687 . - 0 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13654474 13654828 . - 0 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13654085 13654164 . - 2 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653910 13653974 . - 0 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653764 13653817 . - 1 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653587 13653641 . - 1 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13653175 13653212 . - 0 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13652643 13652730 . - 1 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13652270 13652365 . - 0 >> ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA; >> LG08 maker CDS 13651736 13651789 . 
- 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker CDS 13651319 13651468 . - 0 ID=GAMO_00029212-RA:cds;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13649816 13651318 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13649295 13649577 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>> LG08 maker three_prime_UTR 13648888 13648944 . - . ID=GAMO_00029212-RA:three_prime_utr;Parent=GAMO_00029212-RA;
>>
>> Carson, I saw that, but I did use Uniprot/Swiss-prot. A snap of the
>> blast-output used as input here:
>> GAMO_00029212-RA sp|Q8BJZ3|LFG3_MOUSE 53.93 280 112 3 81 348 33 307 2e-92 285
>> GAMO_00029212-RA sp|Q969X1|LFG3_HUMAN 54.51 288 103 5 76 347 33 308 4e-92 284
>> GAMO_00029212-RA sp|Q9BWQ8|LFG2_HUMAN 45.73 328 134 6 44 351 13 316 2e-86 270
>> GAMO_00029212-RA sp|Q5R4I4|LFG2_PONAB 45.73 328 134 6 44 351 13 316 3e-86 269
>> GAMO_00029212-RA sp|Q1LZ71|LFG2_BOVIN 45.03 322 145 5 44 351 13 316 5e-84 264
>> GAMO_00029212-RA sp|O88407|LFG2_RAT 44.65 327 139 6 44 351 13 316 8e-83 261
>> GAMO_00029212-RA sp|Q8K097|LFG2_MOUSE 45.16 310 129 5 60 351 31 317 1e-80 255
>> GAMO_00029212-RA sp|Q7Z429|LFG1_HUMAN 39.32 351 164 9 32 351 39 371 6e-69 226
>> GAMO_00029212-RA sp|Q32L53|LFG1_BOVIN 41.69 343 158 8 29 351 46 366 8e-66 218
>> GAMO_00029212-RA sp|Q9ESF4|LFG1_MOUSE 40.43 324 156 8 53 351 34 345 2e-59 201
>> GAMO_00029212-RA sp|Q6P6R0|LFG1_RAT 39.71 345 165 11 34 351 20 348 2e-59 201
>> GAMO_00029212-RA sp|Q9DA39|LFG4_MOUSE 35.59 222 120 7 142 351 27 237 3e-24 103
>> GAMO_00029212-RA sp|Q49P94|GAAP_VACCL 33.47 239 128 9 113 337 1 222 5e-22 97.1
>> GAMO_00029233-RA sp|Q2KIK0|SGT1_BOVIN 53.18 299 100 3 5 268 17 310 5e-89 275
>> GAMO_00029233-RA sp|B0BN85|SGT1_RAT 51.51 299 104 3 5 268 16 308 5e-86 268
>> GAMO_00029233-RA sp|Q9CX34|SGT1_MOUSE 51.51 299 104 3 5 268 16 308 8e-86 267
>> GAMO_00029233-RA sp|Q9Y2Z0|SGT1_HUMAN 46.83 331 100 5 5 268 16 337 1e-80 254
>> GAMO_00029233-RA sp|Q0JL44|SGT1_ORYSJ 30.75 322 160 4 10 268 16 337 5e-36 137
>> GAMO_00029233-RA sp|Q9SUT5|SGT1B_ARATH 27.99 318 171 4 9 268 11 328 3e-35 135
>> GAMO_00029233-RA sp|Q9SUR9|SGT1A_ARATH 28.28 297 159 5 24 268 26 320 7e-35 134
>> GAMO_00029233-RA sp|Q55ED0|SGT1_DICDI 37.72 167 63 3 138 268 196 357 5e-25 107
>>
>> 521 genes have had added function before maker_functional_gff choked on
>> particular gene GAMO_00029233.
>>
>> Thank you.
>>
>> Ole
>>
>> On 16 December 2015 at 20:37, Carson Holt wrote:
>>
>>> I've seen this exact same error before (
>>> https://groups.google.com/forum/#!searchin/maker-devel/$2Fmaker_functional_gff$20line$2058/maker-devel/cBuQMKTJj2M/aXGnARZ7JhsJ
>>> ).
>>>
>>> It is caused by the ID from the blast report and input protein
>>> fasta. maker_functional_gff is not a generic script that can work on any
>>> input, it only works on blast results against Uniprot/Swiss-prot. The
>>> script is expecting a very specific header format in both the report and
>>> the protein fasta and if it doesn't see it, then it is missing certain
>>> pieces of needed information.
>>>
>>> Thanks,
>>> Carson
>>>
>>> On Dec 16, 2015, at 12:27 PM, Daniel Ence wrote:
>>>
>>> Hi Ole, can you send a line for a gene feature that does work?
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>>
>>> On Dec 14, 2015, at 12:21 PM, Ole Kristian Tørresen <ole.toerresen at gmail.com> wrote:
>>>
>>> Hi,
>>> I'm trying to update my annotation with some functional annotations
>>> with maker_functional_gff, but get this annoying error:
>>> Can't use string ("") as a HASH ref while "strict refs" in use at
>>> /cluster/software/VERSIONS/maker-2.31.8/bin/maker_functional_gff line 58,
>>> <$IN> line 108947.
>>>
>>> Line 108947 in the input gff is this:
>>>
>>> LG08 maker gene 13786695 13806565 . - . ID=GAMO_00029233;Name=GAMO_00029233;Alias=maker-LG08-snap-gene-46.343;
>>>
>>> It seems like the regexp in line 55 in the maker_functional_gff script
>>> doesn't pick up the ID, but I can't see any difference between that line
>>> and other similar lines.
>>>
>>> Any help to trace down this is really appreciated. Do you need any other
>>> information?
>>>
>>> Thank you.
>>>
>>> Sincerely,
>>>
>>> Ole Kristian Tørresen

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From s.shaw at abdn.ac.uk Fri Jan 8 08:05:16 2016
From: s.shaw at abdn.ac.uk (Shaw, Sophie)
Date: Fri, 8 Jan 2016 15:05:16 +0000
Subject: [maker-devel] Moving Annotation to New Assembly
Message-ID:

Dear Maker Team,

I have reassembled some data that was previously assembled with different software and then annotated using MAKER. I want to transfer the MAKER annotation to the new fasta file. I've followed the instructions in the post here - https://groups.google.com/forum/#!searchin/maker-devel/est_forward/maker-devel/q9fxXGKO8mk/0ATwhJvZeI4J

However all of the information in the final column of the GFF has not been transferred over, just the gene name. For example:

The original annotation is as follows:
scaffold_252 maker gene 3018 4307 . + . ID=CAUR_05562;Name=CAUR_05562;Alias=augustus_masked-scaffold_252-processed-gene-0.0;Note=Similar to VHS1: Serine/threonine-protein kinase VHS1 (Saccharomyces cerevisiae (strain ATCC 204508 / S288c));Dbxref=Gene3D:G3DSA:1.10.510.10,Gene3D:G3DSA:3.30.200.20,InterPro:IPR000719,InterPro:IPR002290,InterPro:IPR008271,InterPro:IPR011009,InterPro:IPR017441,PANTHER:PTHR24343,PANTHER:PTHR24343:SF90,Pfam:PF00069,ProSitePatterns:PS00107,ProSitePatterns:PS00108,ProSiteProfiles:PS50011,SMART:SM00220,SUPERFAMILY:SSF56112;Ontology_term=GO:0004672,GO:0005524,GO:0006468,GO:0016772;

And the new annotation after running MAKER with est_forward=1:
scaffold_21 maker gene 18116 19405 . - . ID=maker-scaffold_21-exonerate_est2genome-gene-0.25;Name=CAUR_05562-RA-gene

Is there a way of pulling the Note part of the gff file over as well as the gene name (and is this even a correct thing to do - should I be re-running MAKER entirely?). The researchers don't want to lose the information gained from the work on the previous annotation.

All the Best,

Sophie Shaw

-
Dr. Sophie Shaw
Bioinformatician
Centre for Genome Enabled Biology and Medicine
University of Aberdeen
23 St. Machar Drive
Old Aberdeen
AB24 3RY
https://www.abdn.ac.uk/genomics/

The University of Aberdeen is a charity registered in Scotland, No SC013683.
Tha Oilthigh Obar Dheathain na charthannas clàraichte ann an Alba, Àir. SC013683.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From hcma at uci.edu Mon Jan 11 17:21:11 2016
From: hcma at uci.edu (hcma)
Date: Mon, 11 Jan 2016 16:21:11 -0800
Subject: [maker-devel] basic question for MAKER
Message-ID: <2ed9dc6119cdaa218cf453b8390d28e8@uci.edu>

Hi,

I have some basic questions regarding how to use MAKER.

Do I have to download the following files myself?
Repeatmasker.gff file
genome sequence
protein
EST

I would like to incorporate my RNA-seq data. I have a transcriptome assembly generated using Trinity; how do I incorporate this, and can I use MAKER or do I have to use MAKER2?

Thanks for your time and any comments will be greatly appreciated.

Best Regards
Karen

From hcma at uci.edu Wed Jan 13 11:09:14 2016
From: hcma at uci.edu (hcma)
Date: Wed, 13 Jan 2016 10:09:14 -0800
Subject: [maker-devel] basic question on maker
Message-ID:

Hi,

I would like to include a de novo assembled transcriptome assembly for running maker. The organism I am working with is fly and I am wondering what is the best way to do this?

Do I need to get the input files for running Repeatmasker or just set:

model_org=all

What's the best protein sequence file to use?

Is 'uniprot_sprot.fasta' ok?

Some people use a Trinity transcriptome assembly to generate a training set for Augustus and then run maker again. Is this a better way than running maker just once?

Thanks for your time and any comments will be greatly appreciated.

Best Regards
Karen

From carsonhh at gmail.com Thu Jan 14 13:01:00 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Jan 2016 13:01:00 -0700
Subject: [maker-devel] basic question on maker
In-Reply-To:
References:
Message-ID: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>

Hi Karen,

All your questions may be best answered from this tutorial on the MAKER wiki -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014

There is also a video link on the wiki page if you want to follow that.

Thanks,
Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Thu Jan 14 13:35:10 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Jan 2016 13:35:10 -0700
Subject: [maker-devel] Moving Annotation to New Assembly
In-Reply-To:
References:
Message-ID: <7418369D-6EDB-4C61-B3F7-CF5FFF797FA2@gmail.com>

We do not have a tool that will copy over attributes from one GFF3 file to another based off of ID match. Your needs are specific enough that you may have to write a script yourself to copy the attributes you care about.

Truthfully I would recommend rerunning interproscan and blastp against swiss-prot, as these could probably use an update anyway. The est_forward tool used to pull IDs forward is based solely off of alignment (they will not all be exact matches or complete matches - just best matches), so you cannot guarantee that all domain content will be completely identical. Interpro and swiss-prot also get periodically updated, so running these against the most recent releases can give more functional info.

The purist in me would be inclined to redo the interproscan analysis and blastp against swiss-prot. Then you can use the maker_functional_gff, ipr_update_gff, and iprscan2gff3 scripts to properly add everything back in a way similar to the previous annotations.
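
For readers who still want to carry the old attributes forward, a script of the kind mentioned above might look roughly like the following. This is only a minimal sketch and not a MAKER tool. It assumes that the est_forward run names each new gene `<old gene ID>-RA-gene` (inferred from the single example in the question), that both files are tab-delimited GFF3, and that only `Note`, `Dbxref` and `Ontology_term` should be copied. The function names are hypothetical.

```python
import re

# Attributes to carry over from the old annotation (an assumption; adjust as needed).
CARRY = ("Note", "Dbxref", "Ontology_term")


def parse_attrs(col9):
    """Split a GFF3 attribute column into a dict of key -> value."""
    attrs = {}
    for field in col9.strip().strip(";").split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def old_id_from_name(name):
    """Recover the old gene ID from a Name such as 'CAUR_05562-RA-gene'.

    The '-R<letters>-gene' suffix is assumed from one example, not documented."""
    m = re.match(r"^(.+)-R[A-Z]+-gene$", name)
    return m.group(1) if m else None


def transfer(old_lines, new_lines):
    """Return new_lines with old gene attributes appended where the ID matches."""
    old = {}
    for line in old_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "gene":
            attrs = parse_attrs(cols[8])
            if "ID" in attrs:
                old[attrs["ID"]] = attrs

    out = []
    for line in new_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "gene":
            attrs = parse_attrs(cols[8])
            src = old.get(old_id_from_name(attrs.get("Name", "")))
            if src:
                extra = [f"{k}={src[k]}" for k in CARRY if k in src and k not in attrs]
                if extra:
                    cols[8] = cols[8].rstrip(";") + ";" + ";".join(extra)
        out.append("\t".join(cols))
    return out
```

Because est_forward matches are best alignments rather than guaranteed identical models, anything copied this way should be treated as provisional until interproscan and blastp have been rerun.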
--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From hcma at uci.edu Thu Jan 14 16:44:39 2016
From: hcma at uci.edu (hcma)
Date: Thu, 14 Jan 2016 15:44:39 -0800
Subject: [maker-devel] basic question on maker
In-Reply-To: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
Message-ID:

Hi Carson,

Thanks for the link.

Can maker2 be run without inputting any protein sequences? How to turn this off in the control files?

Also, can I run maker using Augustus and not SNAP? Again, how do I turn SNAP off?

Does maker also predict non-coding genes?

Thanks.

Best Regards
Karen

From carsonhh at gmail.com Fri Jan 15 10:16:27 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 15 Jan 2016 10:16:27 -0700
Subject: [maker-devel] basic question on maker
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
Message-ID:

> Can maker2 be run without inputting any protein sequences?

Yes. But it will not perform as well.

> How to turn this off in the control files?

Any option left blank is off.

> Also, can I run maker using Augustus and not SNAP? Again, how do I turn SNAP off?

Yes. Leave it blank.

> Does maker also predict non-coding genes?

You can run it with tRNAscan or snoscan. Snoscan requires you to have rRNAs from your organism to train with though.

--Carson

From hcma at uci.edu Fri Jan 15 15:39:25 2016
From: hcma at uci.edu (hcma)
Date: Fri, 15 Jan 2016 14:39:25 -0800
Subject: [maker-devel] basic question on maker
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
Message-ID:

Hi Carson,

Regarding non-coding RNA predictions, does MAKER only predict tRNAs and rRNAs, but not other RNAs, for example, lncRNAs?

Thanks again.

Best Regards
Karen

From dence at genetics.utah.edu Fri Jan 15 15:51:44 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 15 Jan 2016 22:51:44 +0000
Subject: [maker-devel] basic question on maker
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
Message-ID: <1C311E8C-20F3-48DB-A982-925AEECD7636@genetics.utah.edu>

Hi Karen,

I don't know of a unified tool that predicts lncRNAs from genomic sequence. I found a tool that predicts lncRNAs from an RNA-seq dataset, which you might be able to use for your project. I've never used it, but it might be a starting place.
http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-311

Here's also a review that describes several workflows for annotating lncRNAs in insect genomes:
http://www.sciencedirect.com/science/article/pii/S2214574515000061

Hope that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From michael.s.campbell1 at gmail.com Fri Jan 15 18:11:05 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 15 Jan 2016 17:11:05 -0800
Subject: [maker-devel] basic question on maker
In-Reply-To:
References: <07C1C0FA-243E-4F05-AE17-FBECCE5BC3A5@gmail.com>
Message-ID: <980827F7-C4F0-4E26-A480-F588B2E09C7B@gmail.com>

Hi Karen,

Just a quick clarification, MAKER doesn't predict the rRNAs. If you give MAKER the rRNA sequence with the O-methylation sites it will run snoscan to predict snoRNAs.

Take care,
Mike

From nguyenan at mail.nih.gov Tue Jan 19 13:18:36 2016
From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C])
Date: Tue, 19 Jan 2016 20:18:36 +0000
Subject: [maker-devel] MAKER version 3 beta
Message-ID:

Hello,

I just wanted to know if MAKER version 3 beta (EVM integration) is already available for download?

https://groups.google.com/forum/#!searchin/maker-devel/EVM|sort:date/maker-devel/YzsN-t0gu0U/-A_7YT2gFwAJ

Thank you very much!
Anh-Dao

From carsonhh at gmail.com Tue Jan 19 13:23:54 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 19 Jan 2016 13:23:54 -0700
Subject: [maker-devel] MAKER version 3 beta
In-Reply-To:
References:
Message-ID: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com>

Yes. Go to the registration page for the standard MAKER download. After registering, you will be redirected to a page with links to both the current version of MAKER as well as the beta -> http://yandell.topaz.genetics.utah.edu/cgi-bin/maker_license.cgi

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From macmanes at gmail.com Tue Jan 19 13:34:38 2016
From: macmanes at gmail.com (Matthew MacManes)
Date: Tue, 19 Jan 2016 15:34:38 -0500
Subject: [maker-devel] MAKER version 3 beta
In-Reply-To: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com>
References: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com>
Message-ID:

Just checking, when installing from the beta, I still see

./maker --version
2.32

was expecting 3.00..

Thanks, Matt

______________________________________________
Matthew MacManes, Ph.D.
University of New Hampshire I Assistant Professor of Genome Enabled Biology
Department of Molecular, Cellular, & Biomedical Sciences
Durham, NH 03824
Phone: 603-862-4052 | Twitter: @macmanes | Web: genomebio.org
Office: 189 Rudman Hall | Laboratory: 145 Rudman Hall

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Tue Jan 19 13:35:55 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 19 Jan 2016 13:35:55 -0700
Subject: [maker-devel] MAKER version 3 beta
In-Reply-To:
References: <6F128D66-685F-4F7F-9097-2A9065ECBC94@gmail.com>
Message-ID:

Thanks. I'll fix that.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From daren.card at gmail.com Wed Jan 20 08:27:28 2016
From: daren.card at gmail.com (Daren C. Card)
Date: Wed, 20 Jan 2016 09:27:28 -0600
Subject: [maker-devel] Passing pre-masked repeats into Maker
Message-ID: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com>

Hello all,

I'm about to use Maker to begin annotating a vertebrate genome. We use successive rounds of RepeatMasker to annotate repeats due to some library issues we've noticed with Repbase (at least in our critters) and to incorporate de novo repeats from RepeatModeler, a process I don't think Maker could match. I'm wondering what the best way to pass these annotations into Maker would be.
I see the thread at https://groups.google.com/forum/#!topic/maker-devel/7UbOIvwaaRM nicely outlines what Maker does with repeats, and it looks like I have 3 options: (1) reannotate in Maker, (2) pass in a RepeatMasker GFF, or (3) pass in a masked genome.

#1 is problematic due to the reasons above.

#2 looks like it would hard mask the complex repeats like we want, but will also hard mask the simple repeats, which wouldn't be ideal for evidence mapping from transcripts/proteins.

#3 is cautioned against in the link above, and without an accompanying GFF, I would imagine that Maker wouldn't be able to release the masking to perform Exonerate polishing (Ns could be gaps or could be hard masking, it wouldn't know).

The way I thought to get around these apparent issues (but let me know if my thinking is incorrect) is to separate simple and complex repeats from the final RepeatMasker GFF. Feed only the complex repeats into Maker as a GFF, so that they are hard masked and accounted for, and have Maker also run RepeatMasker, thus remasking the simple repeats (and maybe some other complex hits, primarily through RepeatRunner). Then Maker can presumably release the masking as needed.

Would this type of workaround be a good idea or are there other options? Or am I just overthinking something that isn't really a problem?

Thanks in advance for any help.

Daren

Daren Card
Castoe Lab
University of Texas at Arlington
www.darencard.net

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Wed Jan 20 09:20:51 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 20 Jan 2016 09:20:51 -0700
Subject: [maker-devel] Passing pre-masked repeats into Maker
In-Reply-To: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com>
References: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com>
Message-ID: <6FD7CE4B-B944-4793-A822-9D395725ED6D@gmail.com>

The strategy outlined would work.
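
The splitting step in that strategy (separating simple from complex repeats in the final RepeatMasker GFF) could be sketched roughly as below. This is an illustration only, not part of MAKER or RepeatMasker. It assumes tab-delimited input and that simple or low-complexity hits can be recognised from the ninth column by motif names such as `(CA)n` or `AT_rich`, or by a `Simple_repeat` / `Low_complexity` label. The pattern and the function name are hypothetical and would need checking against the actual file.

```python
import re

# Assumed markers of simple / low-complexity repeats in the attribute column.
SIMPLE = re.compile(r"\([ACGTN]+\)n|[ACGT]+[_-]rich|Simple_repeat|Low_complexity")


def split_repeats(lines):
    """Split GFF feature lines into (complex_hits, simple_hits).

    Comment lines, blank lines and lines with fewer than 9 columns are skipped."""
    complex_hits, simple_hits = [], []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        if SIMPLE.search(cols[8]):
            simple_hits.append(line)
        else:
            complex_hits.append(line)
    return complex_hits, simple_hits
```

Only the complex set would then be passed on to MAKER. Converting it to the GFF3 layout MAKER expects for external repeat annotations is outside this sketch.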
To get RepeatMasker to call only simple repeats in MAKER, set model_org=simple in the control files.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dence at genetics.utah.edu Wed Jan 20 09:21:38 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 20 Jan 2016 16:21:38 +0000
Subject: [maker-devel] Passing pre-masked repeats into Maker
In-Reply-To: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com>
References: <306F04FB-DFFC-4CAB-8289-494FC87F13BA@gmail.com>
Message-ID:

Hi Daren,

I think the solution you described sounds appropriate. If you're concerned about how the simple repeats will be handled by maker in the gff, then you can just take those out. If they're important for downstream analysis, you can add them back in then.

Let me know if that helps or if other issues arise.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jcornel3 at asu.edu Fri Jan 22 14:38:14 2016
From: jcornel3 at asu.edu (John Cornelius)
Date: Fri, 22 Jan 2016 14:38:14 -0700
Subject: [maker-devel] Question on post processing of annotations
Message-ID:

Hi, I'm using the maker_map_ids script to change the gene ids on an annotation that I just finished. However, I noticed that it does not change the name of genes predicted by SNAP. Is there any way to include SNAP genes for consideration by maker_map_ids? Thanks.
--
John Cornelius
MCB PhD Candidate
Arizona State University

From carsonhh at gmail.com Fri Jan 22 15:01:29 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 22 Jan 2016 15:01:29 -0700
Subject: [maker-devel] Question on post processing of annotations
In-Reply-To:
References:
Message-ID:

Hi John,

Do you mean the match/match_part features whose source is snap_masked? Those are not genes; they are reference alignments representing the ab initio SNAP calls, and it would be incorrect to rename them. They do not have a 1 to 1 relationship with the final gene models. Sometimes a gene model will overlap 2 or more uninformed SNAP ab initio reference alignments, or one SNAP reference alignment may overlap multiple final gene models, so names cannot just be passed from one to the other.

If you want to add specific SNAP models to the final annotation set, you would need to upgrade them to being a gene/mRNA/exon/CDS feature before you can do that. You can do that with manual editors like Apollo, or you can supply a subset of the features you want to upgrade to maker in the pred_gff= option as a separate run, put existing models in model_gff=, and run with keep_preds=1.

I know I have covered this previously in greater detail on the devel list. If you search the archives for the keywords pred_gff, keep_preds, and iprscan you should come across a number of threads that may be helpful -> https://groups.google.com/forum/#!forum/maker-devel

Thanks,
Carson
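The "supply a subset of the features" step in Carson's reply could be sketched roughly as follows. This is only an illustration, assuming the SNAP calls appear in the MAKER GFF3 as match/match_part features with source snap_masked, as described in the reply. The function name and the feature IDs are hypothetical, and the result would still have to be passed to pred_gff= in a separate run with model_gff= and keep_preds=1.

```python
def subset_snap_matches(gff_lines, wanted_ids, source="snap_masked"):
    """Return GFF3 lines for the chosen `match` features and their `match_part` children.

    gff_lines  -- iterable of GFF3 lines (tab-separated, 9 columns)
    wanted_ids -- IDs of the SNAP `match` features to keep (hypothetical IDs)
    source     -- value expected in column 2; snap_masked is taken from the thread
    """
    wanted = set(wanted_ids)
    kept = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[1] != source:
            continue  # not a feature line from the requested source
        # Parse the attribute column (key=value pairs separated by ';').
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if cols[2] == "match" and attrs.get("ID") in wanted:
            kept.append(line)
        elif cols[2] == "match_part":
            # A GFF3 Parent attribute may list several comma-separated parents.
            parents = attrs.get("Parent", "").split(",")
            if any(p in wanted for p in parents):
                kept.append(line)
    return kept
```

The kept lines could then be written to a file and named in pred_gff=; whether they are promoted to models still depends on the MAKER run itself.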
From jcornel3 at asu.edu Fri Jan 22 15:06:17 2016
From: jcornel3 at asu.edu (John Cornelius)
Date: Fri, 22 Jan 2016 15:06:17 -0700
Subject: [maker-devel] Question on post processing of annotations
In-Reply-To:
References:
Message-ID:

I'll look into that, thanks. Previously I had only been looking for things regarding the script itself and its functionality.
--
John Cornelius
MCB PhD Candidate
Arizona State University

From sapuizait at gmail.com Wed Jan 27 06:14:45 2016
From: sapuizait at gmail.com (Panos Sapou)
Date: Wed, 27 Jan 2016 14:14:45 +0100
Subject: [maker-devel] prokaryotic genome annotation
Message-ID:

Dear all,

I recently started using maker for the annotation of my prokaryotic genomes, and although I managed to get some nice results, I would like to check with you whether what I did was right and also ask a couple of questions about the procedure.

I also apologize in advance if I ask something silly, since I am a newbie in bioinformatics and I might ask very basic stuff.

I only have DNA sequences available; I have no ESTs and no proteins.

1) I started by using the protein2genome option and as reference I used the Uniref50 database.
Then I generated a merged gff file (a procedure similar to the one in the maker tutorial).

2) I used Genemark.S and created a model using the gmsn.pl command, with the assembled contigs of my bacteria as input.

3) After finishing the above 2 steps I ran maker again, using as input the gff file from step 1:
#-------Re-annotation using maker derived GFF3: maker_gff=input.gff
and I also set
protein_pass=1
Is that correct? Do you think it helps?
And at the #-----gene prediction section I used the hmm.mod file generated in step 2.

My questions:

Does the above sound correct?

It is my understanding that I can only use genemark for prokaryotic genomes; is that correct?

When I run maker the second time (step 3), should I set protein2genome=1 or 0? Or is just having the gff file (from step 1) in the re-annotation options enough, so that prediction based on protein2genome has already been done?

Also, if I use a gff file (from step 1), will it make any difference if I set protein2genome=1 and use an extra (different) database? (I was wondering if it will improve the results.)

Finally, regarding the choice of the database: would you advise me to use uniref or the proteomes of closely related bacteria? (I have downloaded and created a single fasta from approximately 100 proteomes of closely related bacteria.)

Thank you in advance, and once again I apologize if what I am asking is pretty basic; I just wanted to make sure...

Best
Panos

From carsonhh at gmail.com Wed Jan 27 08:17:37 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 27 Jan 2016 08:17:37 -0700
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To:
References:
Message-ID: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com>

Hi Panos,

The strategy for annotating prokaryotes is very different from that for eukaryotes.
Basically, my recommendation is to use Genemark S and set protein2genome=1, keep_preds=1, and always_complete=1; there is no need for ESTs (irrelevant in prokaryotes). There is no need to do multiple iterations like you would for eukaryotes either. The bootstrapping procedure is not relevant for prokaryotes.

I'd also avoid using the GFF3 passthrough option; you will lose some information about the alignment that affects the reading frame of the protein evidence. It can be convenient for large eukaryotes when you are pulling evidence from a database, but if it's just from a previous maker run, you should just rerun in the same directory with the protein fasta. MAKER will detect that it already ran blastx and pull the raw reports from the previous datastore.

Thanks,
Carson
From cjfields at illinois.edu Wed Jan 27 11:30:29 2016
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 27 Jan 2016 18:30:29 +0000
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com>
References: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com>
Message-ID: <6B0EA45F-9526-4ED2-AED7-1DA3E1AEDD24@illinois.edu>

We're thinking of switching our bacterial pipeline to MAKER actually. We generally use other bacteria-specific gene prediction tools like Glimmer and Prodigal, though I anticipate these could be added using pred_gff (as long as the GFF3 is fine)?

chris
From carsonhh at gmail.com Wed Jan 27 15:42:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 27 Jan 2016 15:42:59 -0700
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To: <6B0EA45F-9526-4ED2-AED7-1DA3E1AEDD24@illinois.edu>
References: <032FB687-8EDD-49A7-9198-3A5E7FE04C88@gmail.com> <6B0EA45F-9526-4ED2-AED7-1DA3E1AEDD24@illinois.edu>
Message-ID:

GFF3 is just fine for adding predictions. For prokaryotes, I don't like to add protein evidence that way, but for predictions it's fine. The only issue I could see going forward would be a lack of support for alternate codon usage in MAKER right now. Everything is being interpreted using the canonical codon table. It's not an insurmountable issue, but it would take some work to let it do that.
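As a rough illustration, the prokaryote settings discussed in this thread (protein2genome=1, keep_preds=1, always_complete=1, plus external predictions via pred_gff) could be applied to a control file with a small helper like the one below. This is only a sketch: the helper name, the prodigal.gff3 file name and the assumption that the control file consists of simple `key=value #comment` lines are illustrative, not part of MAKER.

```python
import re


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with `key=value` lines rewritten for each key in `updates`.

    A trailing `#comment` on a rewritten line is preserved. Keys that are not
    found in the text are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        # Option lines look like `key=value #comment`; section headers start with '#'.
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            new_line = f"{key}={remaining.pop(key)}"
            if m.group(3):
                new_line += " " + m.group(3)
            out.append(new_line)
        else:
            out.append(line)
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Settings named in the thread; the GFF3 file name is a hypothetical placeholder.
PROKARYOTE_SETTINGS = {
    "protein2genome": "1",
    "keep_preds": "1",
    "always_complete": "1",
    "pred_gff": "prodigal.gff3",
}
```

Typical use would be to read maker_opts.ctl, pass its text through `set_ctl_options(text, PROKARYOTE_SETTINGS)`, and write the result back before launching MAKER.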
--Carson
From sapuizait at gmail.com Fri Jan 29 03:12:35 2016
From: sapuizait at gmail.com (Panos Sapou)
Date: Fri, 29 Jan 2016 11:12:35 +0100
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To:
References:
Message-ID:

Dear all,

I am trying to annotate a new Spiroplasma strain and I would like to know if there is a way to change the stop codons (not take 'tga' into account), because otherwise I get too many premature stop codons and fragmented genes that are not real.

Best
Panos
From carsonhh at gmail.com Sun Jan 31 13:43:21 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 31 Jan 2016 13:43:21 -0700
Subject: [maker-devel] prokaryotic genome annotation
In-Reply-To:
References:
Message-ID:

MAKER doesn't support alternate codon usage yet.
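To make the symptom concrete: in NCBI translation table 4 (used for Mycoplasma/Spiroplasma) TGA encodes tryptophan, whereas the canonical table 1 reads it as a stop. The short sketch below is independent of MAKER and only illustrates why a canonical-table interpretation produces premature stops in such genomes; the test sequence is made up.

```python
from itertools import product

# Codons in the conventional NCBI order (bases T, C, A, G; first base varies slowest).
BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

# Amino-acid strings for NCBI translation table 1 (standard) and table 4.
# Their amino-acid assignments differ only at TGA ('*' in table 1, 'W' in table 4).
TABLE_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE_4 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(dna, table):
    """Translate `dna` in frame 0 with the given 64-character table string.

    Codons containing characters other than T, C, A, G are reported as 'X'.
    """
    lookup = dict(zip(CODONS, table))
    dna = dna.upper()
    return "".join(lookup.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


# Made-up example: an internal TGA is a stop under table 1 but Trp under table 4.
example = "ATGTGAAAATAA"
standard = translate(example, TABLE_1)
spiroplasma = translate(example, TABLE_4)
```

Comparing `standard` and `spiroplasma` for a real coding sequence shows how many of the apparent premature stops are simply TGA codons.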
--Carson